
Thread: Spring Batch - database input, database output

  1. #1
    Join Date
    Jan 2008
    Posts
    12


    Hi,


    I need to build a batch process that reads records (in pages) from the database and feeds each record to a record-processing handler, which processes the input and stores some results in the database. I would like to use a multithreaded pool where each thread processes one record.
    My questions are:
    1) Can Spring Batch deal with such situations?
    2) Is there any Spring thread pool available?
    3) Assuming Spring Batch can provide this functionality, are there any limitations I have to keep in mind?

    Many thanks for helping with this,

    Stefan

  2. #2
    Join Date
    Jun 2005
    Posts
    4,230


    You do have to be careful with restartability and synchronization of the input source. We recommend a "process indicator" pattern in the input data (or a staging table, as in the sample). This is described in the reference guide (http://static.springframework.org/sp....html#d0e5 73) and also in Wayne Lund's talk at TSE, which should be available on the website. See the parallelJob sample for an example. N.B. The best idiom for this kind of thing will change with the m4 release, when we start providing chunk-oriented processing.

    Also note that the thread pool model is not Spring Batch - we just use the TaskExecutor strategy from Spring Core (which see).
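
    As a rough illustration of the one-record-per-task idea from the original question, here is a minimal sketch that uses only the JDK's java.util.concurrent classes. It is not Spring Batch or Spring Core code: the class PooledRecordProcessing and the methods handle and processPage are hypothetical names, and the in-memory list stands in for a page of rows read from the database. In a Spring application the pool would typically be supplied through a TaskExecutor instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: plain JDK classes only, no Spring API involved.
public class PooledRecordProcessing {

    // Stand-in for the real per-record handler (validate, transform, store).
    public static String handle(int recordId) {
        return "processed-" + recordId;
    }

    // Submits one task per record in the page, then waits for all of them.
    public static List<String> processPage(List<Integer> page, ExecutorService pool)
            throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (final Integer id : page) {
            Callable<String> task = () -> handle(id);
            futures.add(pool.submit(task));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            // get() blocks until that record is done and rethrows any failure.
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // One "page" of record ids; a real reader would fetch these from the database.
            System.out.println(processPage(Arrays.asList(1, 2, 3), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

    Results come back in submission order because the futures are collected in the order the tasks were submitted, even though the tasks themselves may finish in any order.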

  3. #3
    Join Date
    Aug 2007
    Location
    Toronto
    Posts
    66


    does anybody know where to find a copy of Wayne's talk....I can't seem to locate it....thanks

  4. #4
    Join Date
    Dec 2006
    Posts
    1,061


    I believe you need to have attended TSE to get the recorded presentations.

    All the 'process indicator' approach entails is creating a flag in the data that definitively marks whether or not the record has been processed. It requires an extra column in your input, but doesn't require any extra information to be persisted about what has been processed, and it is easy to restart by adding a simple where clause to your SQL statement (WHERE process_indicator != 'Y').
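
    The restart behaviour can be sketched with an in-memory stand-in for the staging table. Everything here is hypothetical (the class ProcessIndicatorSketch, the "N"/"Y" flag values, the limit parameter that simulates a crash), and no database or Spring Batch API is involved; the point is only that a rerun selects whatever is still unflagged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a staging table with a process_indicator column.
public class ProcessIndicatorSketch {

    // record id -> indicator ("N" = not yet processed, "Y" = processed)
    private final Map<Integer, String> stagingTable = new LinkedHashMap<>();

    public ProcessIndicatorSketch(int recordCount) {
        for (int id = 1; id <= recordCount; id++) {
            stagingTable.put(id, "N");
        }
    }

    // Mirrors: SELECT id FROM staging WHERE process_indicator != 'Y'
    public List<Integer> findUnprocessed() {
        List<Integer> pending = new ArrayList<>();
        for (Map.Entry<Integer, String> row : stagingTable.entrySet()) {
            if (!"Y".equals(row.getValue())) {
                pending.add(row.getKey());
            }
        }
        return pending;
    }

    // Processes pending records and flags each one; stops after 'limit'
    // records to simulate a job that dies part-way through.
    public int run(int limit) {
        int done = 0;
        for (Integer id : findUnprocessed()) {
            if (done == limit) {
                break;
            }
            // ... real per-record work would happen here ...
            stagingTable.put(id, "Y"); // mirrors: UPDATE staging SET process_indicator = 'Y'
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        ProcessIndicatorSketch job = new ProcessIndicatorSketch(5);
        job.run(2);                                 // first attempt "crashes" after two records
        System.out.println(job.findUnprocessed());  // what a restart would pick up
        job.run(Integer.MAX_VALUE);                 // restart finishes the remainder
        System.out.println(job.findUnprocessed());
    }
}
```

    Against a real database, the flag update would normally be committed in the same transaction as the output write, so that a record is never flagged without its results having been stored.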

  5. #5
    Join Date
    Aug 2007
    Location
    Toronto
    Posts
    66


    Yes....I am aware of that technique...used it many times in the past.....thanks for the response...when I looked up the description of the talk (by Wayne Lund) it looked like it had a lot of good information in general.

  6. #6
    Join Date
    Dec 2006
    Posts
    1,061


    Send me a PM with your email address and I'll send you a copy of the presentation.

  7. #7
    Join Date
    Jan 2008
    Posts
    12


    Quote: Originally posted by Dave Syer
    You do have to be careful with restartability and synchronization of the input source. We recommend a "process indicator" pattern in the input data (or a staging table, as in the sample). This is described in the reference guide (http://static.springframework.org/sp....html#d0e5 73) and also in Wayne Lund's talk at TSE, which should be available on the website. See the parallelJob sample for an example. N.B. The best idiom for this kind of thing will change with the m4 release, when we start providing chunk-oriented processing.

    Also note that the thread pool model is not Spring Batch - we just use the TaskExecutor strategy from Spring Core (which see).
    Hi Dave,

    Thanks for your invaluable input. Indeed, our envisioned architecture will dump raw data files into corresponding staging tables as the prerequisite for the Spring Batch processing. Our tables will have a column called ProcessedFlag (just a suggestion), which will be set based on the outcome of processing each table row. Now, from reading about the parallel Spring Batch processing pattern, my understanding is that I have to run multiple single-threaded Java processes, where each process deals with a pre-defined data range.
    This is not what I have in mind. My proposed solution is to use a single multithreaded Java process (which can scale to many multithreaded processes) to read data from the staging area, validate, transform, and amalgamate it, and finally store it in the system's application database.
    We will use the thread pool provided by Spring Core, which I assume can be seamlessly integrated with Spring Batch.

    My best regards,

    Stefan

  8. #8
    Join Date
    Jun 2005
    Posts
    4,230


    That sounds like the parallelJob sample from Spring Batch. Did you look at that? I think we might provide more than just a sample at some point, but for now you can adapt the sample to your needs quite easily, by the sounds of it.

  9. #9
    Join Date
    Jan 2008
    Posts
    12


    Quote: Originally posted by Dave Syer
    That sounds like the parallelJob sample from Spring Batch. Did you look at that? I think we might provide more than just a sample at some point, but for now you can adapt the sample to your needs quite easily, by the sounds of it.
    I cannot find this example in the "spring-batch-1.0.0.m3-with-dependencies.zip" which I downloaded.
    Should I look under http://springframework.svn.sourcefor.../spring-batch/ ?

    Also, I am planning to use Spring Core 2.0.6 with Spring Batch. Do you see any problem with this? Using Spring Core 2.0.6 is very important to me, as it is our supported enterprise Spring version.

    Thanks a lot for your help,

    Regards,

    Stefan
    Last edited by phanae; Jan 20th, 2008 at 10:08 AM.

  10. #10
    Join Date
    Jun 2005
    Posts
    4,230


    Sorry, I forgot, the parallelJob was added just after m3. You can get it from SVN or from the snapshot builds (backporting to m3 should be trivial up to this point in time).

    As far as 2.0.x goes, we haven't started testing yet, but we will, and I know there are projects using 2.0.x. With x=6 I think you should be OK, but we are only going to test against the latest release (currently x=8). If you need help just ask on the forum.
