Jan 11th, 2008, 06:11 PM
Spring Batch - database input, database output
I need to build a batch process that reads records (in pages) from the database, feeds each record to a record-processing handler that processes the input, and stores some results in the database. I would like to use a multithreaded pool where each thread processes one record.
My questions are:
1) Can Spring batch deal with such situations?
2) Is there any Spring thread pool available?
3) Assuming Spring batch can provide this functionality are there any limitations I have to keep in mind?
Many thanks for helping with this,
Jan 13th, 2008, 06:29 AM
You do have to be careful with restartability and synchronization of the input source. We recommend a "process indicator" pattern in the input data (or a staging table, as in the sample). This is described in the reference guide (http://static.springframework.org/sp....html#d0e5 73) and also in Wayne Lund's talk at TSE, which should be available on the website. See the parallelJob sample for an example. N.B. The best idiom for this kind of thing will change with the m4 release, when we start providing chunk-oriented processing.
Also note that the thread pool model is not Spring Batch - we just use the TaskExecutor strategy from Spring Core (which see).
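To illustrate the point about synchronizing the input source, here is a minimal sketch. It is not code from Spring Batch or from the parallelJob sample. A plain java.util.concurrent pool stands in for a Spring TaskExecutor (both accept a Runnable through execute), and the names SharedReaderSketch, SynchronizedReader and processAll are made up for this example. It assumes a current JDK and an in-memory list in place of a real database cursor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from Spring Batch.
class SharedReaderSketch {

    // Wraps a single-threaded source so several worker threads can pull from it safely.
    static final class SynchronizedReader<T> {
        private final Iterator<T> source;

        SynchronizedReader(Iterator<T> source) {
            this.source = source;
        }

        // Returns the next record, or null once the input is exhausted.
        synchronized T read() {
            return source.hasNext() ? source.next() : null;
        }
    }

    // Hands each record to exactly one worker and returns the records that were handled.
    static List<Integer> processAll(List<Integer> records, int threads) {
        SynchronizedReader<Integer> reader = new SynchronizedReader<>(records.iterator());
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                Integer record;
                while ((record = reader.read()) != null) {
                    processed.add(record); // stand-in for the real record handler
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            records.add(i);
        }
        System.out.println("processed " + processAll(records, 4).size() + " records");
    }
}
```

Only read() is synchronized, so the workers contend briefly while fetching a record and then do the processing in parallel. The order in which records are handled is not guaranteed.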
Jan 14th, 2008, 11:29 AM
Does anybody know where to find a copy of Wayne's talk? I can't seem to locate it. Thanks.
Jan 14th, 2008, 02:01 PM
I believe you need to have attended TSE to get the recorded presentations.
All the 'process indicator' approach entails is creating a flag in the data that marks definitively whether or not the record has been processed. It requires an extra column in your input, but doesn't require any extra information to be persisted about what has been processed, and it is easy to restart by adding a simple where clause to your SQL statement (WHERE process_indicator != 'y').
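As a rough illustration of how the flag makes a restart simple, here is a minimal sketch in which an in-memory map stands in for the input table. The class ProcessIndicatorSketch, its methods and the 'n'/'y' values are assumptions made for this example, not anything taken from Spring Batch; the SQL in the comments only shows what each step would correspond to against a real table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for the input table.
class ProcessIndicatorSketch {

    // id -> process_indicator ('n' = pending, 'y' = processed)
    private final Map<Integer, Character> table = new LinkedHashMap<>();

    ProcessIndicatorSketch(int rows) {
        for (int id = 1; id <= rows; id++) {
            table.put(id, 'n');
        }
    }

    // Corresponds to: SELECT id FROM input WHERE process_indicator != 'y'
    List<Integer> selectPending() {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Character> row : table.entrySet()) {
            if (row.getValue() != 'y') {
                ids.add(row.getKey());
            }
        }
        return ids;
    }

    // Processes the pending rows, flagging each one as soon as it succeeds.
    // failOnId simulates a crash on that row; pass -1 for a clean run.
    // Returns the number of rows processed in this run.
    int run(int failOnId) {
        int done = 0;
        for (int id : selectPending()) {
            if (id == failOnId) {
                throw new IllegalStateException("simulated failure on row " + id);
            }
            // Corresponds to: UPDATE input SET process_indicator = 'y' WHERE id = ?
            table.put(id, 'y');
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        ProcessIndicatorSketch job = new ProcessIndicatorSketch(10);
        try {
            job.run(6); // first attempt dies part way through
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // The restart needs no bookkeeping: it simply selects what is still pending.
        System.out.println("restart processed " + job.run(-1) + " rows");
        System.out.println("still pending: " + job.selectPending().size());
    }
}
```

Against a real database the flag update would normally be committed in the same transaction as the results written for that record, so the two cannot drift apart. Note also that in SQL a NULL flag does not satisfy != 'y', so the column should default to a non-null value such as 'n'.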
Jan 14th, 2008, 02:13 PM
Yes, I am aware of that technique and have used it many times in the past. Thanks for the response. When I looked up the description of the talk (by Wayne Lund), it looked like it had a lot of good information in general.
Jan 14th, 2008, 03:49 PM
Send me a PM with your email address and I'll send you a copy of the presentation.
Jan 19th, 2008, 11:39 AM
Originally Posted by Dave Syer
Thanks for your invaluable input. Indeed, our envisioned architecture will dump raw data files into corresponding staging tables as the prerequisite for the Spring Batch processing. Our tables will have a column called ProcessedFlag (just a suggestion), which will be set according to the outcome of processing each table row. Now, from reading about the parallel processing pattern in Spring Batch, my understanding is that I have to run multiple single-threaded Java processes, where each process deals with a pre-defined data range.
This is not what I have in mind. My proposal is to use a single multithreaded Java process (which can scale to many multithreaded processes) to take data from the staging area, validate, transform and amalgamate it, and finally store it in the system's application database.
We will use the thread pool provided by Spring Core, which I assume can be seamlessly integrated with Spring Batch.
My best regards,
Jan 20th, 2008, 03:01 AM
That sounds like the parallelJob sample from Spring Batch. Did you look at that? I think we might provide more than just a sample at some point, but for now you can adapt the sample to your needs quite easily, by the sounds of it.
Jan 20th, 2008, 09:43 AM
I cannot find this example in the "spring-batch-1.0.0.m3-with-dependencies.zip" which I downloaded.
Originally Posted by Dave Syer
Should I look under http://springframework.svn.sourcefor.../spring-batch/ ?
!!!! Also I am planning to use Spring Core V2.0.6 to integrate with Spring batch. Do you see any problem with this? This is very important to me (to use Spring core v2.0.6 ) as this is our supported enterprise Spring version.
Thanks a lot for your help,
Last edited by phanae; Jan 20th, 2008 at 10:08 AM.
Jan 21st, 2008, 06:47 AM
Sorry, I forgot, the parallelJob was added just after m3. You can get it from SVN or from the snapshot builds (backporting to m3 should be trivial up to this point in time).
As far as 2.0.x goes, we haven't started testing yet, but we will, and I know there are projects using 2.0.x. With x=6 I think you should be OK, but we are only going to test against the latest release (currently x=8). If you need help just ask on the forum.