Chaining Hadoop Jobs in Spring Batch
Hi,
This is a question about Hadoop job-chaining within Spring Batch.
I can successfully launch a single Hadoop job from Spring Batch, but I cannot get two Hadoop tasklets to run as part of the same Spring Batch job.
I would like to chain my two Hadoop tasklets (i.e. MapReduce jobs) so that the output of the first tasklet is the input to the second.
When I launch the Spring Batch job, I get an error stating that "my/tempoutput" does not exist.
However, if I remove all references to the second tasklet, the first tasklet completes successfully and writes its results to "my/tempoutput".
Am I missing something? Is there another way to chain Hadoop jobs using Spring Batch?
Thanks for any help you can offer,
Rob.
Code:
<!-- First MapReduce job: reads my/input/ and writes the intermediate result to my/tempoutput/ -->
<hdp:job id="myMRJob1"
    input-path="my/input/"
    output-path="my/tempoutput/"
    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    mapper="foo.MyMapper1"
    reducer="foo.MyReducer1"/>
<!-- Second MapReduce job: reads the first job's output from my/tempoutput/ and writes to my/output/ -->
<hdp:job id="myMRJob2"
    input-path="my/tempoutput/"
    output-path="my/output/"
    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    mapper="foo.MyMapper2"
    reducer="foo.MyReducer2"/>
<!-- One tasklet per MapReduce job; each waits for its job to finish -->
<hdp:tasklet id="myTasklet1" job-ref="myMRJob1" wait-for-job="true" />
<hdp:tasklet id="myTasklet2" job-ref="myMRJob2" wait-for-job="true" />
<!-- Spring Batch job: runs myStep1 (first tasklet), then myStep2 (second tasklet) -->
<batch:job id="myBatchJob" job-repository="jobRepository">
    <batch:step id="myStep1" next="myStep2">
        <batch:tasklet ref="myTasklet1"/>
    </batch:step>
    <batch:step id="myStep2">
        <batch:tasklet ref="myTasklet2"/>
    </batch:step>
</batch:job>