Feb 19th, 2009, 10:48 AM
I have a question about achieving parallel processing: What is the recommended methodology to do so?
I see there is the repeat template, but based on the documentation it appears to be nestable only within a step. However, another post suggested it might be possible to use it to launch multiple jobs.
I would like to process a large amount of data (flat files). My idea was to break the data up into several input files, but I would like them all to be executed under the same job, because I have a wrapper service that needs to take cumulative statistics on the data regardless of which input file it comes from.
Steps require input readers and writers, but I don't see an option to execute steps in parallel. Can I wrap a single step in a repeat template and change the file name with each iteration, obviously limiting the iterations to a specific number?
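To make the idea concrete, here is a minimal plain-JDK sketch of what I'm after, outside of Spring Batch entirely: one worker per pre-split input file, all feeding a single cumulative statistic. All names here are illustrative stand-ins, not Spring Batch APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSplitSketch {

    // Runs one "step" per input file in parallel and returns the cumulative
    // record count, standing in for the wrapper service's statistics.
    static long run(List<List<String>> inputFiles) throws InterruptedException {
        final AtomicLong recordsProcessed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(inputFiles.size());
        for (final List<String> file : inputFiles) {
            pool.submit(new Runnable() {
                public void run() {
                    for (String record : file) {
                        // ... real record processing would happen here ...
                        recordsProcessed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return recordsProcessed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for the pre-split flat files.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5"),
                Arrays.asList("r6", "r7", "r8", "r9"));
        System.out.println("total records = " + run(files)); // prints 9 total
    }
}
```

The question is essentially how to get this shape of concurrency inside a single Spring Batch job rather than hand-rolling it.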
Feb 19th, 2009, 11:53 AM
Feb 19th, 2009, 02:17 PM
The user guide section on partitioning still isn't done, but the APIs are there (PartitionHandler and StepExecutionSplitter), including one implementation of each. The javadocs might get you started until we can get the user guide done. (RC1 will probably be out tomorrow, but I'm assuming for now that the partitioning chapter will be missing or incomplete.)
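Until the partitioning chapter lands, a rough sketch of the idea behind those APIs: a splitter turns one step execution into several, each carrying its own execution context (e.g. a different file name), and a handler farms the same step out once per partition. The types and names below are local stand-ins for illustration only; the real StepExecutionSplitter and PartitionHandler signatures in Spring Batch may differ, so check the javadocs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSketch {

    // Splits the work into named partitions, each with its own small
    // context map. In Spring Batch the context would be an
    // ExecutionContext; a plain Map stands in for it here.
    static Map<String, Map<String, String>> split(String[] fileNames) {
        Map<String, Map<String, String>> partitions =
                new LinkedHashMap<String, Map<String, String>>();
        for (int i = 0; i < fileNames.length; i++) {
            Map<String, String> ctx = new LinkedHashMap<String, String>();
            ctx.put("input.file", fileNames[i]); // each partition reads its own file
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> parts =
                split(new String[] {"data-0.txt", "data-1.txt", "data-2.txt"});
        // A PartitionHandler-like component would now execute the same step
        // once per entry, e.g. on a task executor, and the job would wait
        // for and aggregate all the step executions.
        System.out.println(parts.keySet()); // [partition0, partition1, partition2]
    }
}
```

The key point for the original question is that every partition runs under the same job execution, so a wrapper service can still accumulate statistics across all of the input files.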