
Thread: How to fire a MapReduce Job that performs operations on hbase?

  1. #1
    Join Date
    Feb 2013
    Posts
    10

    HBase: Properties of an hdp:hbase-configuration bean are not passed to an hdp:job?

    Hi, we had a problem: the Job instance created by Spring did not have the hbase.zookeeper.quorum property set, even though we specified it in applicationContext.xml. We solved it by putting the HBase config into the hdp:configuration element.

    Our initial application context looked like this:

    Code:
    ...
    	<hdp:configuration id="hadoopConfiguration">
    		fs.defaultFS=hdfs://namenode.example.com:8020
    	</hdp:configuration>
    
    	<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration">
    		hbase.zookeeper.quorum=zookeeper1.example.com
    	</hdp:hbase-configuration>
    
    	<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
    		<property name="configuration" ref="hbaseConfiguration" />
    	</bean>
    
    	<hdp:job id="exampleJob"
    		input-path="hdfs://namenode.example.com:/examples/data/big.data.txt"
    		output-path=""
    		mapper="com.example.ExampleMapper"
    		reducer="com.example.ExampleReducer"/>
    ...
    Then we tried to execute a job like this:

    Code:
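    	import org.apache.hadoop.conf.Configuration;
    	import org.apache.hadoop.hbase.client.Put;
    	import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    	import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    	import org.apache.hadoop.mapreduce.Job;
    	import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    	import org.springframework.context.ApplicationContext;
    	import org.springframework.context.support.ClassPathXmlApplicationContext;
    
    	import com.example.ExampleReducer;
    
    	public class ExampleJobRunner // hypothetical class name
    	{
    		public static void main(final String[] args) throws Exception
    		{
    			final ApplicationContext ctx = new ClassPathXmlApplicationContext("/application-context.xml");
    
    			// Note: conf is retrieved here but never passed to the job below.
    			final Configuration conf = (Configuration) ctx.getBean("hbaseConfiguration");
    
    			final Job job = (Job) ctx.getBean("exampleJob");
    
    			job.setInputFormatClass(TextInputFormat.class);
    
    			job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    			job.setMapOutputValueClass(Put.class);
    
    			// Sets TableOutputFormat and the target table; ExampleReducer must extend TableReducer.
    			TableMapReduceUtil.initTableReducerJob("exampleTable", ExampleReducer.class, job);
    
    			System.exit(job.waitForCompletion(true) ? 0 : 1);
    		}
    	}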

    This leads to the following exception:

    Code:
    13/02/18 18:37:05 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
    	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047)
    13/02/18 18:37:05 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    We figured out that this is because the Configuration instance retrieved with Configuration conf = (Configuration) ctx.getBean("hbaseConfiguration"); did not have the hbase.zookeeper.quorum=zookeeper1.example.com property set. That is why the client fails with a ConnectionLoss on /hbase/master in the stack trace.
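    To make the missing step in that reasoning explicit, here is a minimal plain-Java model of how an HBase client derives its ZooKeeper connect string, using java.util.Properties as a stand-in for Hadoop's Configuration. It assumes the stock HBase defaults (hbase.zookeeper.quorum falls back to localhost, hbase.zookeeper.property.clientPort to 2181), which would explain a "Connection refused" when the quorum property never reaches the job's configuration. QuorumLookup is a hypothetical name, and the real client also accepts host:port entries in the quorum, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical helper mimicking how a client could build its ZooKeeper
 * connect string. Defaults are ASSUMED to match stock HBase:
 * quorum "localhost", client port 2181.
 */
final class QuorumLookup {
    static final String QUORUM_KEY = "hbase.zookeeper.quorum";
    static final String PORT_KEY = "hbase.zookeeper.property.clientPort";

    static String connectString(Properties conf) {
        // An unset quorum silently falls back to localhost, which is what a
        // client sees when the property never reaches its configuration.
        String quorum = conf.getProperty(QUORUM_KEY, "localhost");
        String port = conf.getProperty(PORT_KEY, "2181");
        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed + ":" + port);
            }
        }
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        Properties missing = new Properties();
        System.out.println(connectString(missing));    // localhost:2181

        Properties configured = new Properties();
        configured.setProperty(QUORUM_KEY, "zookeeper1.example.com");
        System.out.println(connectString(configured)); // zookeeper1.example.com:2181
    }
}
```

    With nothing configured, every lookup ends up pointing at localhost:2181, which matches the symptom of a client retrying against a ZooKeeper that is not there.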

    Workaround
    To work around this, we simply specified the property in the hdp:configuration element instead:

    Code:
    ...
    	<hdp:configuration id="hadoopConfiguration">
    		fs.defaultFS=hdfs://namenode.example.com:8020
    		hbase.zookeeper.quorum=zookeeper1.example.com
    	</hdp:configuration>
    
    	<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration">
    	</hdp:hbase-configuration>
    
    	<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
    		<property name="configuration" ref="hbaseConfiguration" />
    	</bean>
    
    	<hdp:job id="exampleJob"
    		input-path="hdfs://namenode.example.com:/examples/data/big.data.txt"
    		output-path=""
    		mapper="com.example.ExampleMapper"
    		reducer="com.example.ExampleReducer"/>
    ...
    That solved all our problems.
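    The workaround can be pictured with a small plain-Java model: one java.util.Properties stands in for the hadoopConfiguration bean, and a second one chained to it with new Properties(parent) stands in for hbaseConfiguration. The sketch assumes, without confirming it, that the hdp:job bean is built from hadoopConfiguration rather than from hbaseConfiguration; WorkaroundModel and its methods are hypothetical names, not Spring Hadoop code.

```java
import java.util.Properties;

/**
 * Hypothetical model: the "hadoop" Properties stands for the
 * hadoopConfiguration bean, the "hbase" Properties for hbaseConfiguration
 * layered on top of it. The job is ASSUMED to read the hadoop layer.
 */
final class WorkaroundModel {
    static final String QUORUM = "hbase.zookeeper.quorum";

    /** Quorum a job would see if it were built from the given layer. */
    static String quorumSeenByJob(Properties jobConf) {
        return jobConf.getProperty(QUORUM); // null when the key is absent
    }

    /** Initial layout: quorum declared only on the HBase layer. */
    static Properties[] initialLayout() {
        Properties hadoop = new Properties();
        hadoop.setProperty("fs.defaultFS", "hdfs://namenode.example.com:8020");
        Properties hbase = new Properties(hadoop); // falls back to hadoop for missing keys
        hbase.setProperty(QUORUM, "zookeeper1.example.com");
        return new Properties[] {hadoop, hbase};
    }

    /** Workaround layout: quorum moved to the Hadoop layer. */
    static Properties[] workaroundLayout() {
        Properties hadoop = new Properties();
        hadoop.setProperty("fs.defaultFS", "hdfs://namenode.example.com:8020");
        hadoop.setProperty(QUORUM, "zookeeper1.example.com");
        Properties hbase = new Properties(hadoop); // no local properties at all
        return new Properties[] {hadoop, hbase};
    }

    public static void main(String[] args) {
        Properties[] before = initialLayout();
        System.out.println("before, job on hadoop layer: " + quorumSeenByJob(before[0]));
        System.out.println("before, hbase layer:         " + quorumSeenByJob(before[1]));
        Properties[] after = workaroundLayout();
        System.out.println("after, job on hadoop layer:  " + quorumSeenByJob(after[0]));
        System.out.println("after, hbase layer:          " + quorumSeenByJob(after[1]));
    }
}
```

    Under that assumption, a setting placed only on the child layer is invisible to anything reading the parent, while a setting on the parent is visible to both, which is consistent with the workaround fixing the job.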

    Can someone explain why this happens and whether our approach is correct?

    Best Regards,
    Christian.
    Last edited by d0x; Feb 18th, 2013 at 03:07 PM.

  2. #2


    I am looking for suggestions on a similar problem. I want to migrate existing MapReduce code in which the reducer writes to HBase using TableMapReduceUtil, and I would like to know how this job can be modeled with Spring Hadoop.

  3. #3
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403


    Sounds like a bug: the locally defined nested properties should take precedence over the other properties. We're using the HBase API, so it might be a side effect of that.
    By the way, since RC2 there are dedicated attributes on the HBase configuration for specifying the ZooKeeper quorum and port; see [1].

    Can you confirm whether this works for you and report back?

    Thanks,

    [1] http://static.springsource.org/sprin...tml/hbase.html
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  4. #4


    This works for me.

    However, the distributed cache is not working for me with HBase. I am trying to set up the job configuration for HBase (TableMapReduceUtil.initTableReducerJob) using the following bean:
    Code:
    	<bean id="setupConf4HBase" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    		<property name="targetClass"><value>dimension.setup.InitializeMRJob</value></property>
    		<property name="targetMethod"><value>initReducerJob</value></property>
    		<property name="arguments">
    			<list>
    				<value>${tbl}</value>
    				<value>${dimension.calculator.reducerclass}</value>
    				<ref local="dimension.calculator"/>
    			</list>
    		</property>
    	</bean>

    When the reducer runs, it fails because it cannot locate any files in the distributed cache. I checked job.xml: no files are registered in the cache. Any other non-HBase job that does not need this HBase setup uses the distributed cache properly.
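    For readers less familiar with MethodInvokingFactoryBean: with targetClass, targetMethod and arguments set, the bean above amounts to calling the static method InitializeMRJob.initReducerJob(tbl, reducerClass, job) once, when the bean is initialized. The reflective sketch below approximates that lookup under simplifying assumptions (first public static method with matching name and arity, no type conversion); it is not Spring's implementation, and StaticMethodInvoker and InitializeMRJobStub are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Rough model of a MethodInvokingFactoryBean configured with targetClass,
 * targetMethod and arguments: find a public static method by name and
 * arity, then invoke it. Overload resolution is deliberately simplified.
 */
final class StaticMethodInvoker {
    static Object invoke(Class<?> target, String methodName, Object... args) throws Exception {
        for (Method m : target.getMethods()) {
            if (m.getName().equals(methodName)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == args.length) {
                return m.invoke(null, args); // null receiver: static call
            }
        }
        throw new NoSuchMethodException(target.getName() + "." + methodName);
    }

    /** Hypothetical stand-in for dimension.setup.InitializeMRJob. */
    static final class InitializeMRJobStub {
        public static String initReducerJob(String table, String reducerClass, Object job) {
            // A real helper would call TableMapReduceUtil.initTableReducerJob here.
            return table + "|" + reducerClass + "|" + job;
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = invoke(InitializeMRJobStub.class, "initReducerJob",
                "myTable", "com.example.ExampleReducer", "jobBean");
        System.out.println(result);
    }
}
```

    Seen this way, whatever the static helper does to the job's configuration happens once, at context start-up, so it is worth checking in job.xml whether cache entries added elsewhere survive that call.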

  5. #5
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403


    Have you double-checked "path.separator" again? If it is set properly at the system level and the distributed cache (DC) still doesn't work, check the logs to see whether your jars are actually deployed and then retrieved by your reducer.
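    One way a wrong separator can break the distributed cache, sketched under an assumption: if the client joins cache or classpath entries with its own path.separator (";" on Windows) while the cluster splits them with ":", the entries come back as a single bogus path. The model below uses plain strings only; SeparatorMismatch is a hypothetical name and the jar paths are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Plain-string model of joining and splitting path lists with a separator. */
final class SeparatorMismatch {
    static String join(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    static List<String> split(String joined, String separator) {
        // Pattern.quote so separators are matched literally, not as regex.
        return Arrays.asList(joined.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        System.out.println("this JVM's path.separator: " + System.getProperty("path.separator"));

        List<String> jars = Arrays.asList("/libs/a.jar", "/libs/b.jar");
        // Joined on a Windows-style client, split on a Unix-style node:
        System.out.println(split(join(jars, ";"), ":")); // [/libs/a.jar;/libs/b.jar]
        // Matching separators round-trip cleanly:
        System.out.println(split(join(jars, ":"), ":")); // [/libs/a.jar, /libs/b.jar]
    }
}
```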

    P.S. See the util namespace to simplify your bean declaration instead of using MethodInvokingFactoryBean directly.
    P.P.S. It probably makes sense to start a separate thread on the HBase/DC issue.
    Costin Leau

  6. #6

    Default

    Costin, thanks for the suggestion to use the util namespace; I have adopted it now. I will start a separate thread for the DC issue.

  7. #7
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403


    I've checked our test suite and added more tests, and everything looks okay: the hadoopConfiguration settings are read and then overridden by the locally defined ones.
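    The precedence described here (parent settings read first, locally defined properties applied on top) can be pictured with a small plain-Java model, again using java.util.Properties in place of Hadoop's Configuration. OverridePrecedence is a hypothetical name; the real merge happens inside Spring Hadoop and HBase.

```java
import java.util.Properties;

/**
 * Model of the described precedence: inherit the parent
 * (hadoopConfiguration) settings, then let locally defined properties
 * override them. The parent itself is left untouched.
 */
final class OverridePrecedence {
    static Properties resolve(Properties parent, Properties local) {
        Properties effective = new Properties();
        effective.putAll(parent); // 1. inherit everything from the parent
        effective.putAll(local);  // 2. local definitions win on conflicting keys
        return effective;
    }

    public static void main(String[] args) {
        Properties parent = new Properties();
        parent.setProperty("fs.defaultFS", "hdfs://namenode.example.com:8020");
        parent.setProperty("hbase.zookeeper.quorum", "localhost");
        Properties local = new Properties();
        local.setProperty("hbase.zookeeper.quorum", "zookeeper1.example.com");

        Properties effective = resolve(parent, local);
        System.out.println(effective.getProperty("fs.defaultFS"));           // inherited
        System.out.println(effective.getProperty("hbase.zookeeper.quorum")); // overridden locally
        System.out.println(parent.getProperty("hbase.zookeeper.quorum"));    // parent unchanged
    }
}
```

    Note that the merged result is a separate object: anything that reads the parent directly still sees the parent's values.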
    Costin Leau
