Nov 20th, 2009, 10:07 AM
Selection-dependent multi-file reader
I'm new to Spring Batch and haven't been able to figure out how to handle this situation.
A database SELECT returns the list of our merchants that have updated their catalog files.
For each merchant id there is a similarly named directory that contains one or more flat CSV files that need to be processed.
At first sight, this seems to require one job that reads the selected merchant ids and then, for each of them, spawns a new job that handles the files using something like a MultiResourceItemReader. Is this a valid way to handle the problem?
I haven't figured out yet how to spawn a secondary job correctly, but maybe there is a simpler way to do this?
Any advice is welcome.
Nov 23rd, 2009, 11:31 AM
You could put your merchant SELECT in a StepExecutionSplitter. There's a sample that is close (partitionFileJob).
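A minimal plain-Java sketch of what such a split could produce: one partition per selected merchant, each carrying its merchant id. In Spring Batch this would correspond to implementing a Partitioner (partition(int gridSize) returning Map<String, ExecutionContext>), which the step execution splitter delegates to. Here a Map<String, String> stands in for ExecutionContext so the sketch runs without Spring, and the key "merchantId" and the "partition-" naming are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a Spring Batch Partitioner: one partition per merchant id.
// Map<String, String> plays the role of ExecutionContext (assumption, for illustration).
class MerchantPartitionSketch {

    // Hypothetical key under which each partition stores its merchant id.
    static final String MERCHANT_ID_KEY = "merchantId";

    static Map<String, Map<String, String>> partition(List<String> selectedMerchantIds) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        for (String merchantId : selectedMerchantIds) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put(MERCHANT_ID_KEY, merchantId);
            // Partition names must be unique, so derive them from the merchant id.
            partitions.put("partition-" + merchantId, context);
        }
        return partitions;
    }
}
```

The number of partitions follows the number of selected merchants rather than the grid size. In the partitioned step, a step-scoped reader could then pick up its merchant id through late binding, for example #{stepExecutionContext['merchantId']}.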
Nov 26th, 2009, 03:32 AM
Thanks for your help.
From the documentation and the sample code, I understood that a StepExecutionSplitter is used to partition a data set into chunks that are executed remotely or in parallel, mainly for optimization purposes.
In my case, I have two distinct data sets: the list of merchants and, derived from it, the list of catalog file names to process for each merchant.
Unless I'm missing something obvious, I don't see how I could use the splitter logic.
Nov 26th, 2009, 06:19 AM
Seems like your merchant id would be an input to the step that processed the related catalogs. Did I miss something?
Nov 27th, 2009, 03:28 AM
That's exactly it.
First I determine which merchant catalogs need to be processed; then each selected merchant id should be passed, in turn, to the second step, which processes the catalog files themselves. For the second step I can use a MultiResourceItemReader [derived tasklet] with a wildcard file specification based on the merchant's id.
My problem is how to trigger the second step from the first one, once for each merchant id.
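A minimal plain-Java sketch of the per-merchant wildcard lookup, i.e. the set of files a pattern such as file:<root>/<merchantId>/*.csv would resolve to before being handed to MultiResourceItemReader.setResources(...). The directory layout (one directory per merchant id under a common root) follows the post; the root path and the *.csv glob are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Resolves the catalog files of one merchant, as a wildcard resource pattern would.
// Assumed layout: <root>/<merchantId>/*.csv (root and glob are hypothetical).
class MerchantCatalogFiles {

    static List<Path> catalogFiles(Path root, String merchantId) throws IOException {
        Path merchantDir = root.resolve(merchantId);
        List<Path> files = new ArrayList<>();
        if (!Files.isDirectory(merchantDir)) {
            return files; // no directory: nothing to process for this merchant
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(merchantDir, "*.csv")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    files.add(file);
                }
            }
        }
        files.sort(null); // natural Path order, so files are processed deterministically
        return files;
    }
}
```

Returning an empty list for a merchant without a directory lets the caller skip that merchant instead of failing the whole run; whether that is the right policy depends on the business rules.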
I'm looking into two alternatives:
The first is a repeat policy derived from a CompletionPolicy, as described in this article: http://angelborroy.wordpress.com/200...eat-a-tasklet/. In my case, the RepeatPolicy object would iterate over the selected merchants, returning each merchant id as the current resource until all selected merchants have been processed.
The second is to develop a tasklet similar to MultiResourceItemReader, but working on the merchant selection to generate the list of catalog files (resources) on the fly and process them. This is less "elegant" because the two operations are combined in a single step.
Are these valid ways to do this, or am I still missing something?
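A minimal plain-Java sketch of the "repeat until all selected merchants are processed" idea. Rather than a custom CompletionPolicy, it uses the simpler route of a tasklet that handles one merchant per call and reports whether more work remains, the way a Spring Batch Tasklet returning RepeatStatus.CONTINUABLE gets called again. The nested Status enum stands in for RepeatStatus, and processCatalogs is a hypothetical hook for the per-merchant work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Processes one selected merchant per call, mimicking a tasklet that returns
// CONTINUABLE until every merchant has been handled.
class MerchantRepeatSketch {

    // Local stand-in for Spring Batch's RepeatStatus.
    enum Status { CONTINUABLE, FINISHED }

    private final Deque<String> pending;
    private final Consumer<String> processCatalogs; // hypothetical per-merchant work

    MerchantRepeatSketch(List<String> selectedMerchantIds, Consumer<String> processCatalogs) {
        this.pending = new ArrayDeque<>(selectedMerchantIds);
        this.processCatalogs = processCatalogs;
    }

    Status executeOnce() {
        String merchantId = pending.poll();
        if (merchantId == null) {
            return Status.FINISHED; // nothing was selected, or everything is done
        }
        processCatalogs.accept(merchantId);
        return pending.isEmpty() ? Status.FINISHED : Status.CONTINUABLE;
    }

    // Drives executeOnce() the way the step's repeat loop would: until FINISHED.
    static List<String> runAll(List<String> ids) {
        List<String> processed = new ArrayList<>();
        MerchantRepeatSketch sketch = new MerchantRepeatSketch(ids, processed::add);
        while (sketch.executeOnce() == Status.CONTINUABLE) {
            // keep calling until the last merchant has been processed
        }
        return processed;
    }
}
```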
Thanks in advance for your help
Nov 27th, 2009, 09:27 AM
I finally found a [hopefully] elegant and modular way to do it: a SystemCommandTasklet that launches a separate job process for each selected merchant.
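A minimal sketch of the command lines such a setup could run, one per selected merchant, each starting the catalog job in its own JVM through Spring Batch's CommandLineJobRunner (arguments: job configuration path, job name, then key=value job parameters). The classpath "lib/*", the file catalogJob.xml, the job name catalogJob and the parameter key merchantId are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Builds one command line per merchant for launching the catalog job in a separate process.
// Classpath, job XML path, job name and parameter key are hypothetical placeholders.
class MerchantJobCommands {

    static final String RUNNER = "org.springframework.batch.core.launch.support.CommandLineJobRunner";

    static String commandFor(String classpath, String jobPath, String jobName, String merchantId) {
        // Job parameters follow the job name as key=value pairs; a different
        // merchantId gives a different JobInstance for each merchant.
        return String.join(" ", "java", "-cp", classpath, RUNNER, jobPath, jobName,
                "merchantId=" + merchantId);
    }

    static List<String> commandsFor(List<String> merchantIds) {
        List<String> commands = new ArrayList<>();
        for (String id : merchantIds) {
            commands.add(commandFor("lib/*", "catalogJob.xml", "catalogJob", id));
        }
        return commands;
    }
}
```

Each string could then be given to a SystemCommandTasklet via setCommand(...), which also expects a timeout to be configured.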
Thanks again for your help
Nov 27th, 2009, 10:02 AM
There is a JobStep in version 2.1.0-M3 (not released yet).
Nov 28th, 2009, 05:12 AM
Thanks a lot, that's very good news indeed, as it addresses exactly my issue.
For now I'll stick to a simple job-spawning setup, but I'm eagerly waiting for version 2.1.0 to be released.
Jan 13th, 2010, 02:11 PM
I have a similar problem to solve with Spring Batch:
Step 1: a database SELECT returns a list of ids.
For each id returned by step 1, steps 2 and 3 have to be executed.
Step 2: a set of database SELECTs; the id value from step 1 has to be passed to the SELECT statement.
Step 3: the rows returned by the step 2 SELECT have to be written to a file.
The file name should contain the id value and the current timestamp.
Any suggestions on how to solve this?
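A minimal plain-Java sketch of the per-id flow described above: for each id from step 1, fetch its rows (step 2) and write them to a file whose name combines the id and a timestamp (step 3). fetchRows is a hypothetical stand-in for the parameterized SELECT, and the <id>_<yyyyMMdd_HHmmss>.csv naming is an assumption. In Spring Batch the same shape could, for example, be a partitioned step over the ids with a JdbcCursorItemReader and a step-scoped FlatFileItemWriter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// For each id: fetch its rows (step 2) and write them to an id + timestamp file (step 3).
// fetchRows and the file-name pattern are hypothetical.
class PerIdExportSketch {

    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    static String fileNameFor(String id, LocalDateTime now) {
        return id + "_" + now.format(STAMP) + ".csv";
    }

    static List<Path> export(List<String> ids, Function<String, List<String>> fetchRows,
                             Path outputDir, LocalDateTime now) throws IOException {
        List<Path> written = new ArrayList<>();
        for (String id : ids) {
            List<String> rows = fetchRows.apply(id);             // step 2: SELECT ... WHERE id = ?
            Path file = outputDir.resolve(fileNameFor(id, now)); // step 3: name from id + timestamp
            Files.write(file, rows);                             // one line per row
            written.add(file);
        }
        return written;
    }
}
```

Passing the timestamp in, rather than calling LocalDateTime.now() inside the loop, keeps all files of one run on the same timestamp and makes the naming testable.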