Iterate/Loop over job?



anthonymak
Jul 31st, 2012, 02:11 AM
Hi all,

I have a graph algorithm for which I need to run a MapReduce job iteratively, each time using the output path of one iteration as the input path of the next. Currently, I do this with org.apache.hadoop.mapreduce.Job and org.apache.hadoop.util.ToolRunner, launching the driver from the command line.

How can I do this with Spring Hadoop?
I found that I can chain different MapReduce jobs using Spring Batch, but I don't know how to iterate over the same MapReduce job with Spring Hadoop/Batch, each time setting the output path of one iteration as the input path of the next.

Any suggestions and references will be greatly appreciated.

Kind Regards,
Anthony Mak
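
For readers following along, the loop described above can be sketched in plain Java roughly as follows. This is only an illustration under assumptions, not the poster's actual driver: the class name IterativeDriver, the "iter-N" directory naming, and the BiPredicate standing in for one MapReduce run are all hypothetical. In a real driver, the step would presumably build an org.apache.hadoop.mapreduce.Job, call FileInputFormat.addInputPath and FileOutputFormat.setOutputPath with the two paths, and return the result of job.waitForCompletion(true).

```java
import java.util.function.BiPredicate;

// Hypothetical sketch: chains iterations by feeding each output path
// to the next iteration as its input path.
class IterativeDriver {

    /**
     * Runs the step maxIterations times. The step receives (inputPath, outputPath)
     * and returns true on success; it stands in for one MapReduce job run.
     * Returns the output path of the last iteration, or initialInput if
     * maxIterations is 0.
     */
    static String runIterations(String initialInput, String outputBase,
                                int maxIterations, BiPredicate<String, String> step) {
        String input = initialInput;
        for (int i = 1; i <= maxIterations; i++) {
            // Assumed naming scheme: one output directory per iteration.
            String output = outputBase + "/iter-" + i;
            if (!step.test(input, output)) {
                throw new IllegalStateException("Iteration " + i + " failed");
            }
            // The output of this iteration becomes the input of the next.
            input = output;
        }
        return input;
    }

    public static void main(String[] args) {
        // Placeholder step that only logs the paths instead of running a job.
        String last = runIterations("/data/graph/input", "/data/graph/output", 3,
                (in, out) -> {
                    System.out.println(in + " -> " + out);
                    return true;
                });
        System.out.println("final output: " + last);
    }
}
```

A graph algorithm that runs until convergence would likely replace the fixed iteration count with a check on the job's counters or output, which this sketch leaves out.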

Costin Leau
Aug 2nd, 2012, 04:16 AM
Loops are supported by Spring Batch - see the user guide for more info [1].
As for the dynamic nature, you can use SpEL (Spring Expression Language) or a dedicated FactoryBean for programmatic generation - see the late-binding section.
In short, rather than passing a hard-coded value to each step, you can either pass a SpEL expression that gets evaluated each time (the easiest route) or a reference to a factory that returns the proper value.
I recommend the SpEL route, since it provides first-class support for invoking methods on beans or arbitrary classes without forcing you to implement an interface or extend a class.

[1] http://static.springsource.org/spring-batch/reference/html/configureStep.html
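
To make the SpEL suggestion a little more concrete, here is a minimal sketch of the kind of plain bean such an expression could call. Everything in it is an assumption for illustration: the class name IterationPaths, its methods, and the "iter-N" naming are hypothetical and not part of Spring Hadoop or Spring Batch. The idea is that a late-bound (step-scoped) expression along the lines of #{iterationPaths.inputPath} and #{iterationPaths.outputPath} would be re-evaluated on each pass of the loop, and something in the flow (for example a listener or a small tasklet) would call advance() after each successful run.

```java
// Hypothetical helper bean: tracks the current iteration and derives the
// input/output paths from it. It implements no framework interface, which is
// what makes it convenient to call from a SpEL expression.
// As a real Spring bean it would normally be a public class in its own file.
class IterationPaths {
    private final String initialInput;
    private final String outputBase;
    private int iteration = 0; // number of completed iterations

    IterationPaths(String initialInput, String outputBase) {
        this.initialInput = initialInput;
        this.outputBase = outputBase;
    }

    /** Input of the next run: the original input first, then the previous output. */
    public String getInputPath() {
        return iteration == 0 ? initialInput : outputBase + "/iter-" + iteration;
    }

    /** Output of the next run, one directory per iteration (assumed naming). */
    public String getOutputPath() {
        return outputBase + "/iter-" + (iteration + 1);
    }

    /** Marks the current run as finished so the next run reads its output. */
    public void advance() {
        iteration++;
    }

    public int getIteration() {
        return iteration;
    }
}
```

Because the bean is an ordinary class, the same object can also be used from a hand-written driver or a unit test without any Spring context.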