Hi,
This is a question about Hadoop job-chaining within Spring Batch.
I have successfully launched a single Hadoop job from Spring Batch, but I cannot get two Hadoop tasklets to run as part of the same Spring Batch job.
I would like to chain my two hadoop tasklets (i.e. map-reduce jobs) so that the output of the first tasklet is the input to the second tasklet.
When I launch the Spring Batch job, I get an error stating that "my/tempoutput" does not exist.
But, if I remove any references to the second tasklet, the first tasklet completes successfully and outputs my results to "my/tempoutput".
Am I missing something? Is there another way to chain Hadoop jobs using Spring Batch?
Thanks for any help you can offer,
Rob.
Code:

<hdp:job id="myMRJob1"
    input-path="my/input/"
    output-path="my/tempoutput/"
    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    mapper="foo.MyMapper1"
    reducer="foo.MyReducer1"/>

<hdp:job id="myMRJob2"
    input-path="my/tempoutput/"
    output-path="my/output/"
    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    mapper="foo.MyMapper2"
    reducer="foo.MyReducer2"/>

<hdp:tasklet id="myTasklet1" job-ref="myMRJob1" wait-for-job="true"/>
<hdp:tasklet id="myTasklet2" job-ref="myMRJob2" wait-for-job="true"/>

<batch:job id="myBatchJob" job-repository="jobRepository">
    <batch:step id="myStep1" next="myStep2">
        <batch:tasklet ref="myTasklet1"/>
    </batch:step>
    <batch:step id="myStep2">
        <batch:tasklet ref="myTasklet2"/>
    </batch:step>
</batch:job>


