Thread: Chaining Hadoop Jobs in Spring Batch

  1. #1
    Join Date
    Apr 2012
    Posts
    5

    Chaining Hadoop Jobs in Spring Batch

    Hi,

    This is a question about Hadoop job-chaining within Spring Batch.

    I have successfully kicked off a Hadoop job in Spring Batch, but I cannot run two Hadoop tasklets as part of the same Spring Batch job.

    I would like to chain my two Hadoop tasklets (i.e. map-reduce jobs) so that the output of the first tasklet is the input to the second tasklet.

    When I launch the Spring Batch job, I get an error stating that "my/tempoutput" does not exist.

    But if I remove all references to the second tasklet, the first tasklet completes successfully and writes my results to "my/tempoutput".

    Am I missing something? Is there another way to chain Hadoop jobs using Spring Batch?
    Thanks for any help you can offer,

    Rob.

    Code:
    <!-- First MapReduce job: reads my/input/ and writes my/tempoutput/ -->
    <hdp:job	id="myMRJob1"
    		input-path="my/input/"
    		output-path="my/tempoutput/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper1"
    		reducer="foo.MyReducer1"/>
    <!-- Second MapReduce job: reads the first job's output from my/tempoutput/ -->
    <hdp:job	id="myMRJob2"
    		input-path="my/tempoutput/"
    		output-path="my/output/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper2"
    		reducer="foo.MyReducer2"/>
    
    <!-- One tasklet per Hadoop job; each waits for its job to finish -->
    <hdp:tasklet id="myTasklet1" job-ref="myMRJob1" wait-for-job="true" />
    <hdp:tasklet id="myTasklet2" job-ref="myMRJob2" wait-for-job="true" />
    
    <!-- Spring Batch job that runs the two tasklets in sequence -->
    <batch:job id="myBatchJob" job-repository="jobRepository">
    	<batch:step id="myStep1" next="myStep2" >
    		<batch:tasklet ref="myTasklet1"/>
    	</batch:step>
    
    	<batch:step id="myStep2" >
    		<batch:tasklet ref="myTasklet2"/>
    	</batch:step>
    </batch:job>

  2. #2
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Set the validate-paths attribute (default is true) to false for the second job. By default, each job verifies at startup that its input folder exists. In this case the folder is not present at startup, because the first job has not run yet, so the check fails.
    We might actually change the default to false to prevent this issue.
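
    A minimal sketch of that change, reusing the second job definition from the first post. Only the validate-paths attribute is new here, and it applies only to Spring Hadoop releases whose schema still declares it:

    ```xml
    <!-- Second job: skip the startup check on my/tempoutput/,
         which the first job only creates at run time -->
    <hdp:job	id="myMRJob2"
    		input-path="my/tempoutput/"
    		output-path="my/output/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper2"
    		reducer="foo.MyReducer2"
    		validate-paths="false"/>
    ```

    The first job can keep the default, since my/input/ already exists when the context starts.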
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  3. #3
    Join Date
    Apr 2012
    Posts
    5

    Thanks Costin. I'll give that a try this morning.

    Rob.

  4. #4
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Let us know how it's working.
    Costin Leau

  5. #5
    Join Date
    Apr 2012
    Posts
    5

    Yes, that worked a treat. Thanks again, Costin!

    Rob.

  6. #6
    Join Date
    Nov 2012
    Posts
    1

    Hi,

    I tried this solution, but when I set validate-paths="false" I get this error at launch:

    Code:
    ERROR [org.springframework.batch.core.launch.support.CommandLineJobRunner] - <Job Terminated in error: Line 72 in XML document from class path resource [META-INF/application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.>
    org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line 72 in XML document from class path resource [META-INF/application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.
    	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.doLoadBeanDefinitions(XmlBeanDefinitionReader.java:396)
    	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:334)
    	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:302)
    	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:143)
    	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:178)
    	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:149)
    	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:212)
    	at org.springframework.context.support.AbstractXmlApplicationContext.loadBeanDefinitions(AbstractXmlApplicationContext.java:126)
    	at org.springframework.context.support.AbstractXmlApplicationContext.loadBeanDefinitions(AbstractXmlApplicationContext.java:92)
    	at org.springframework.context.support.AbstractRefreshableApplicationContext.refreshBeanFactory(AbstractRefreshableApplicationContext.java:130)
    	at org.springframework.context.support.AbstractApplicationContext.obtainFreshBeanFactory(AbstractApplicationContext.java:467)
    	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:397)
    	at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
    	at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
    	at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:282)
    	at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574)
    	at com.sadiel.e3mel.Inicio.main(Inicio.java:32)
    Caused by: org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.
    	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    	at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
    	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    	at org.apache.xerces.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(Unknown Source)
    	at org.apache.xerces.impl.xs.XMLSchemaValidator.reportSchemaError(Unknown Source)
    	at org.apache.xerces.impl.xs.XMLSchemaValidator.processAttributes(Unknown Source)
    	at org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement(Unknown Source)
    	at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
    	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    	at org.springframework.beans.factory.xml.DefaultDocumentLoader.loadDocument(DefaultDocumentLoader.java:75)
    	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.doLoadBeanDefinitions(XmlBeanDefinitionReader.java:388)
    	... 16 more
    The attribute 'validate-paths' is offered by autocompletion, and my namespace declaration looks like this:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
    	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:batch="http://www.springframework.org/schema/batch"
    	xmlns:context="http://www.springframework.org/schema/context"
    	xmlns:jdbc="http://www.springframework.org/schema/jdbc" xmlns:hdp="http://www.springframework.org/schema/hadoop"
    	xmlns:task="http://www.springframework.org/schema/task" xmlns:p="http://www.springframework.org/schema/p"
    	xsi:schemaLocation="
    		http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
    		http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd
    		http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
    		http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
    		http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
    		http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd"
    	default-lazy-init="false">
    Thanks
    Ramon

  7. #7
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    That's because we have removed the validate-paths option altogether in the latest release. Try using the latest version of the Hadoop namespace.
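
    As a rough sketch, assuming a release where the attribute is gone and the input path is no longer checked at startup (worth confirming against the documentation for your version), the second job definition would simply omit the attribute:

    ```xml
    <!-- Second job without validate-paths; assumes the release in use
         no longer validates the input path when the context starts -->
    <hdp:job	id="myMRJob2"
    		input-path="my/tempoutput/"
    		output-path="my/output/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper2"
    		reducer="foo.MyReducer2"/>
    ```

    Note that the unversioned spring-hadoop.xsd location is normally resolved from the spring-hadoop jar on the classpath, so the attributes accepted at runtime depend on which jar version is deployed, not on what the editor autocompletes.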
    Costin Leau
