
Thread: starting up hive within my spring data app

  1. #1

    I have a script that runs fine when I connect remotely to my Hive server, but fails with the very generic "Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:1"
    when I run the Hive server as part of my Spring application. I assume I need to have it start up with the same parameters or similar. Has anyone run into this? I have hive-site.xml on my classpath.

    Thanks,
    David

  2. #2
    Join Date: Jan 2005
    Location: Bucharest, Romania
    Posts: 5,403

    hive-site.xml is needed on the server side, not the client, since in this case most of the configuration is done through Spring. Hive and Hadoop in general are fairly cryptic and there's not much SHDP (Spring for Apache Hadoop) can do about this.
    Make sure to look at the Hive server logs as well, to see whether something goes wrong on the server; a Derby instance that isn't running or a missing library are common causes.
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  3. #3

    Just to clarify, my Spring Data app is trying to bootstrap Hive. When I run the code so that the client connects to an already running Hive server, it works fine. When I run the script in the Hive CLI it also works, but not when I run it in the instance created by this XML:
    <hdp:hive-server port="${hive.port}" auto-startup="true"
    properties-location="hive-server.properties"/>

    That is when I get the cryptic errors. So I am assuming that the defaults the embedded Hive server starts with somehow differ from those it uses when I bring it up from the command line, and I am wondering how I can make the Spring-managed Hive instance start exactly as it does from the command line.
    My preference is to bring it up in-process so that, given the warnings about thread safety, we don't end up with multiple processes accidentally sharing the same Hive server.

    Thanks,
    David

  4. #4

    Understood. It depends on how you start your Hive server by hand: do you rely on any services or configuration? Make sure these are passed properly to hive-server.
    Note that, by definition, hive-server starts a Thrift server for use with hive-client (a Thrift client).
    Also, make sure that the hive-conf.xml properties are properly passed to hive-server. It's best to specify them through properties-location rather than relying on the file being picked up from the classpath, since the classpath can differ. Note also that all the Hive-related libraries and dependencies need to be on the classpath, not just Hive itself.
    Costin Leau

  5. #5

    So I can point to an XML file for the properties instead of a plain properties file?
    As for libraries, the Hive server does start, and I include the Hive libs installed on the machine in my classpath.

  6. #6

    No, the properties attribute points to just that: a properties file.
    You can, however, create a dedicated hdp:configuration element and pass the XML to it (and potentially set any other properties you want, nested inside it):
    http://static.springsource.org/sprin...#hadoop:config
    Costin Leau
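To make the distinction above concrete: properties-location reads a Java properties file (key=value lines), whereas hive-site.xml and the other *-site.xml files use Hadoop's <configuration>/<property>/<name>/<value> layout, so one cannot simply be pointed at the other. The SiteXmlToProperties class below is hypothetical (it is not part of SHDP or Hadoop) and is only a minimal sketch assuming a plain site file: it flattens the XML into java.util.Properties and ignores <final>, <description> and ${...} variable expansion. Passing the XML through hdp:configuration, as suggested above, remains the route recommended in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of SHDP or Hadoop: flattens a Hadoop-style
// *-site.xml (<configuration><property><name/><value/></property>...</configuration>)
// into java.util.Properties, i.e. the key=value form a properties file holds.
// Simplification: <final>, <description> and ${var} expansion are ignored.
public class SiteXmlToProperties {

    public static Properties fromSiteXml(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Properties props = new Properties();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            String value = childText(property, "value");
            // Entries without both a <name> and a <value> are skipped
            if (name != null && value != null) {
                props.setProperty(name.trim(), value);
            }
        }
        return props;
    }

    // Text of the first direct child element named tag, or null if absent.
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Sample data only
        String xml = "<configuration><property>"
                + "<name>hive.metastore.warehouse.dir</name>"
                + "<value>/user/hive/warehouse</value>"
                + "</property></configuration>";
        Properties props = fromSiteXml(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringWriter out = new StringWriter();
        props.store(out, null); // a date comment line, then key=value lines
        System.out.print(out);
    }
}
```

Run as-is, main prints a date comment followed by hive.metastore.warehouse.dir=/user/hive/warehouse; the property and its value are just sample data.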

  7. #7

    Should I be passing the Hadoop and Hive XML files to it? I tried this (Cloudera install of Hadoop) and still get the same error.

    <hdp:configuration resources="classpath:/core-site.xml, classpath:/hdfs-site.xml"/>

    <hdp:configuration id="hive" configuration-ref="hadoopConfiguration" resources="classpath:/hive-site.xml"/>

  8. #8
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    I've just double-checked one of the Hive server tests and there's nothing special about my setup. Note that I'm taking the hard road: a Hive client on a Windows machine (my dev box) accessing a Hive server on a Windows machine (the same one), talking to a Hadoop cluster on a remote/VM (*nix) machine.

    First, make sure the Hive libraries are in place; you typically get a CNFE (ClassNotFoundException) if they aren't. In my case this meant adding hive-builtins/hive-metastore to the classpath. Also make sure you're using a proper version of antlr (antlr-runtime 3.0.x). This can be an issue if you have Pig in the classpath, since it pulls in a more recent version of antlr with which Hive is not compatible (and you'll get a cryptic NoSuchFieldError).

    Below are my config file and the artifact dependencies from Gradle:

    Code:
        <hdp:hive-server properties-ref="props" properties-location="cfg-1.properties, cfg-2.properties" port="${hive.port}" configuration-ref="hadoopConfiguration">
            star=chasing
            return=captain eo
            train=last
        </hdp:hive-server>
    
        <hdp:hive-client-factory host="${hive.host}" port="${hive.port}"/>
    Gradle dependencies (note this is based on the SHDP trunk, so you'll see some extra dependencies in there, such as Pig):
    https://gist.github.com/costin/7459c90dc8d589247a5e
    Costin Leau
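Since a CNFE or a NoSuchFieldError usually comes down to which jar actually supplies a class, a small probe run with the application's exact classpath can show where each class is loaded from, or that it is missing. This is a minimal, stdlib-only sketch: the ClasspathProbe class is hypothetical, the two Hive class names are taken from this thread, and org.antlr.runtime.CommonTokenStream is an assumed example of a class from the antlr 3 runtime package.

```java
import java.security.CodeSource;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: reports, for each class name, the jar or directory
// it is loaded from, or MISSING when the class loader cannot find or link it.
public class ClasspathProbe {

    public static Map<String, String> probe(ClassLoader loader, String... classNames) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running static initializers
                Class<?> cls = Class.forName(name, false, loader);
                CodeSource source = cls.getProtectionDomain().getCodeSource();
                // Classes from the JDK's bootstrap loader usually report no code source
                result.put(name, (source == null || source.getLocation() == null)
                        ? "(no code source, e.g. a JDK class)"
                        : source.getLocation().toString());
            } catch (ClassNotFoundException | LinkageError e) {
                result.put(name, "MISSING (" + e.getClass().getSimpleName() + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Class names from this thread, plus one assumed antlr 3 runtime class
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.hive.ql.exec.ExecDriver",
            "org.apache.hadoop.hive.ql.exec.MapRedTask",
            "org.antlr.runtime.CommonTokenStream"
        };
        probe(ClasspathProbe.class.getClassLoader(), names)
                .forEach((cls, where) -> System.out.println(cls + " -> " + where));
    }
}
```

Run it with the same classpath the Spring application uses; it prints one line per class. The jar file name on the antlr line usually reveals its version, which makes a Pig-induced antlr conflict easy to spot.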

  9. #9
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    By the way, the properties passed to hive-server are just for testing; you can safely ignore them. Only hadoopConfiguration is relevant (and that is passed by default anyway). Note that I don't have any hive-site.xml in the classpath.
    In fact you can see the test (both the server and the client that runs against it) in the project test suite:
    https://github.com/SpringSource/spri...hive/basic.xml
    Costin Leau

  10. #10

    Thanks. My environment runs on a configured Cloudera Hadoop cluster (CentOS). Hive does come up, but the job ultimately fails, even though the same job succeeds when Hive runs out of process. There is a CNFE that seems to be a red herring, since the class is there: I built my app's classpath from /usr/lib/hadoop, /usr/lib/hadoop/lib and /usr/lib/hive/lib, Hive comes up, and the jars are present, including the one with the class reported as not found. So in stdout I see:
    Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.ExecDriver

    But in my logs I see Hive running and, eventually:

    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$LogHelper] (pool-1-thread-1) Execution failed with exit status: 1
    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$LogHelper] (pool-1-thread-1) Obtaining error information
    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$LogHelper] (pool-1-thread-1)
    Task failed!
    Task ID:
    Stage-1

    Logs:

    and a few lines later:

    2013-02-13 09:41:05,777 ERROR [org.apache.hadoop.hive.ql.exec.MapRedTask] (pool-1-thread-1) Execution failed with exit status: 1
    2013-02-13 09:41:05,781 ERROR [org.apache.hadoop.hive.ql.session.SessionState$LogHelper] (pool-1-thread-1) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) </PERFLOG method=Driver.execute start=1360748464589 end=1360748465781 duration=1192 >
    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) <PERFLOG method=releaseLocks>
    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) </PERFLOG method=releaseLocks start=1360748465781 end=1360748465781 duration=0>
    2013-02-13 09:41:05,787 DEBUG [org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker] (pool-1-thread-1) 13: Call -> null@isk-vsrv802.qualcomm.com/10.231.148.183:8020: delete {src: "/var/lib/hadoop-hdfs/tmp/hive_2013-02-13_09-41-02_866_2200460261977945997" recursive: true}

    So, for now, my solution is just to run Hive as a standalone server...
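One way to double-check the claim that the class is there is to scan the same directories for the jar containing org/apache/hadoop/hive/ql/exec/ExecDriver.class. This is a minimal sketch with a hypothetical JarScan class: it assumes the jars sit directly in /usr/lib/hadoop, /usr/lib/hadoop/lib and /usr/lib/hive/lib (no recursion into subdirectories) and uses only java.util.jar. A hit shows that the class is in those directories; it does not show that the JVM which printed the exception was started with them on its classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists *.jar files in the given directories, reports
// which of them contain a given class, and joins them into a classpath string.
public class JarScan {

    // *.jar files directly inside each directory, sorted by name per directory.
    public static List<File> jarsIn(String... dirs) {
        List<File> jars = new ArrayList<>();
        for (String dir : dirs) {
            File[] files = new File(dir).listFiles((d, name) -> name.endsWith(".jar"));
            if (files == null) {
                continue; // directory missing or unreadable
            }
            Arrays.sort(files);
            jars.addAll(Arrays.asList(files));
        }
        return jars;
    }

    // Jars that contain the .class entry for the fully qualified className.
    public static List<File> jarsContaining(List<File> jars, String className) {
        String entry = className.replace('.', '/') + ".class";
        List<File> hits = new ArrayList<>();
        for (File jar : jars) {
            try (JarFile jf = new JarFile(jar)) {
                if (jf.getJarEntry(entry) != null) {
                    hits.add(jar);
                }
            } catch (IOException e) {
                System.err.println("Skipping unreadable jar " + jar + ": " + e.getMessage());
            }
        }
        return hits;
    }

    // Joins jar paths with the platform path separator (':' on Linux).
    public static String toClasspath(List<File> jars) {
        List<String> paths = new ArrayList<>();
        for (File jar : jars) {
            paths.add(jar.getPath());
        }
        return String.join(File.pathSeparator, paths);
    }

    public static void main(String[] args) {
        List<File> jars = jarsIn("/usr/lib/hadoop", "/usr/lib/hadoop/lib", "/usr/lib/hive/lib");
        String cls = "org.apache.hadoop.hive.ql.exec.ExecDriver";
        System.out.println(jars.size() + " jars found");
        for (File jar : jarsContaining(jars, cls)) {
            System.out.println(cls + " found in " + jar);
        }
    }
}
```

toClasspath joins the jars with File.pathSeparator (':' on CentOS), which is one way to assemble a -cp value from those directories.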
