Jun 14th, 2010, 11:00 AM
Input file recursion within a job instance
Hi Folks, I am new to Spring Batch. I have a requirement to map and load files to a DB. My only question is that the driver will be a directory that can contain 0 to about 25 files per day. The files are large, so I would like to process them FIFO and use chunking to maintain a smaller memory footprint (if I understand chunking properly). I could use a script to execute Spring Batch once for each file on a given day, but stopping and starting the JVM up to 25 times makes me want to look at other options. Is Spring Batch designed to support this kind of processing recursion? Could I create a Tasklet to return the next file, do some processing, and then at the end have another Tasklet check whether another file is present and loop back? If so, would this adversely affect a proper rollback/restart?
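For what it's worth, the "check and loop back" idea described above can be expressed in Spring Batch 2.x as a flow with a JobExecutionDecider rather than a Tasklet that jumps back. A rough sketch in the XML namespace (all bean ids and status names here are made up):

```xml
<batch:job id="directoryJob">
    <batch:step id="loadNextFile" next="moreFiles">
        <batch:tasklet ref="loadFileTasklet"/>
    </batch:step>
    <!-- fileDecider is a JobExecutionDecider bean that checks whether
         another file is waiting in the directory -->
    <batch:decision id="moreFiles" decider="fileDecider">
        <batch:next on="MORE_FILES" to="loadNextFile"/>
        <batch:end on="NO_MORE_FILES"/>
    </batch:decision>
</batch:job>
```

Note that looping back to a step that has already run creates a new StepExecution on each pass, which does complicate restart semantics, so the rollback/restart concern raised here is a real one.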
Jun 16th, 2010, 06:03 AM
A neat way to do that is to partition the directory and process each file as a separate step in the same PartitionStep. The MultiResourcePartitioner does this and there is a sample to follow in spring-batch-samples.
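To make the suggestion concrete, a partitioned-step configuration along these lines would work (bean ids, paths, and the reader details are placeholders). MultiResourcePartitioner creates one partition per matched file and puts the file into each partition's step execution context under the key `fileName`, which a step-scoped reader can bind to:

```xml
<batch:job id="fileLoadJob">
    <batch:step id="partitionedStep">
        <batch:partition step="loadFile" partitioner="partitioner">
            <batch:handler grid-size="5" task-executor="taskExecutor"/>
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="loadFile">
    <batch:tasklet>
        <batch:chunk reader="itemReader" writer="itemWriter" commit-interval="100"/>
    </batch:tasklet>
</batch:step>

<bean id="partitioner"
      class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="file:/data/inbox/*.dat"/>
</bean>

<bean id="itemReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- late binding: each partition sees only its own file -->
    <property name="resource" value="#{stepExecutionContext[fileName]}"/>
    <!-- line mapper etc. omitted -->
</bean>
```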
Jun 21st, 2010, 09:54 AM
Thanks for the reply Dave. Due to the business rules, on any given run we wouldn't know what to expect... but in reality, we could narrow the input down to the Year/Quarter but that still would involve 420 possible dataset names. Too many for each to have its own step but I guess we could rename the first 10...
In the end, we will probably just have a Perl script stage a new file for processing and handle the renaming, etc. Then we will just have the script executed via cron or AutoSys a number of times per night. We will have the script only launch Spring Batch if there is a file to process so the JVM won't be started if it isn't needed.
The "recursion" I was speaking of seems to be outside the design philosophy of Spring Batch. The Spring Batch answer to my question seems to be separate "Jobs", rather than recursion within a single "Job".
Jun 21st, 2010, 11:45 AM
420 steps is not such a bad thing, but if you don't like it, all you have to do is rewrite the partitioner to generate a list of resources per step instead of a single one.
I'm not sure what you mean by "recursion" (is what you need just a simple iterator?), so you'll have to explain. Then I can comment on the design principles.
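To give a feel for the rewrite Dave suggests: a custom `Partitioner` returns a `Map` of partition name to `ExecutionContext`, so the core of a "list of resources per step" variant is just grouping the directory listing into buckets. Here is that grouping logic alone, with the Spring types omitted so the sketch stays self-contained (class and key names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Core of a hypothetical "list of resources per step" partitioner:
// deal the directory listing round-robin into a fixed number of
// buckets. In a real Partitioner, each bucket's file list would be
// stored in that partition's ExecutionContext.
public class ResourceGrouper {
    public static Map<String, List<String>> group(List<String> files, int buckets) {
        Map<String, List<String>> partitions = new LinkedHashMap<String, List<String>>();
        for (int i = 0; i < files.size(); i++) {
            String key = "partition" + (i % buckets);
            List<String> bucket = partitions.get(key);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                partitions.put(key, bucket);
            }
            bucket.add(files.get(i));
        }
        return partitions;
    }
}
```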
Jul 1st, 2010, 10:46 AM
Dave, I am looking at the partitioner, but I am not sure it will meet all of our requirements. Here is what our requirements are at a high level:
1) 0 to 420 files may exist in a directory upon start of the process. They are SFTP'd from a legacy system and we have no control over when and how often files arrive.
2) There are no headers or trailers in the files and we have to rely upon the filename in order to perform proper mapping and loading to the correct table.
3) There are essentially 12 file formats.
4) We need to run the "process" every night one time to process any and all files that are present at a cutoff time (say 9pm).
5) On an annual average, there will be many more runs with no files than with files.
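Since there are no headers or trailers and the filename alone drives the mapping (requirements 2 and 3), the routing step could be a simple pattern table consulted before choosing a line mapper and target table. A minimal sketch, where the patterns and format names are entirely hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: resolve a legacy filename to one of the ~12 formats by
// regex. The patterns and format names below are made up; the real
// table would have one entry per legacy dataset naming convention.
public class FormatResolver {
    private static final Map<Pattern, String> FORMATS = new LinkedHashMap<Pattern, String>();
    static {
        FORMATS.put(Pattern.compile("^CUST_\\d{8}\\.dat$"), "customer");
        FORMATS.put(Pattern.compile("^TXN_\\d{8}\\.dat$"), "transaction");
        // ... one entry per remaining format
    }

    public static String resolve(String filename) {
        for (Map.Entry<Pattern, String> e : FORMATS.entrySet()) {
            if (e.getKey().matcher(filename).matches()) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("Unrecognized file: " + filename);
    }
}
```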
I am leaning towards a script staging the files and passing them one at a time into the Spring Batch process. I see an advantage in that separate jobs would be created for each file... a nice audit trail. I don't like the idea of starting and stopping the JVM more than once, but it may not be a big deal since we won't start the JVM if there are no files; and if we relied on Spring Batch to do this, the JVM would have to start/stop every time it is scheduled to run -- a tradeoff?