
Thread: Thread Safety While Running Multiple Instances Of The Same Job

  1. #1
    Join Date
    Aug 2008
    Posts
    19

    Thread Safety While Running Multiple Instances Of The Same Job

    I have a simple job which reads from one database and writes to another.
    The reader is a JdbcCursorItemReader that reads only records that have not yet been processed (processed = 0), and the writer executes an INSERT statement.
    The chunk size is 100. I mark each record in the source table as processed = 1
    in the writer's beforeWrite() method. The flag is updated during the write, inside the transaction, so if anything fails it is rolled back.

    I tested this job by running only one instance. It works fine without any problem.

    Now my question: I want to run multiple instances of the same job at the same time, distinguishing them by a version parameter. If I start, say, 3 instances of the same job, they will all be working on the same set of data.
    Is it possible that different instances will pick up the same chunk? If 2 instances pick up the same record, they will both insert it into the destination table, producing duplicates, because the destination table has no constraints.

    Correct me if I am on the wrong track.
    Let's say the first instance picks up the first 100 records and is processing them. It has processed and marked 80 records, but the commit interval is 100, so nothing has been committed to the database yet. If a second instance of the same job starts at that point, will it pick up the same first 100 records as its chunk?

    Please help, because we are planning to run multiple instances of the same job in production for better performance. Is it advisable to run multiple instances of the same job for faster performance, or should I use multithreading within the job itself?
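
The scenario in the question can be illustrated with a toy in-memory model. It is only a sketch: it assumes a database where a reader sees only committed values of the processed flag and does not block on rows another transaction has updated (actual behaviour depends on the isolation level and locking of the database in use). The class and method names are invented for illustration and are not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the source table: readers see only committed processed flags. */
final class ProcessedFlagRace {
    private final Set<Integer> committedProcessed = new HashSet<>();
    private final int rowCount;

    ProcessedFlagRace(int rowCount) {
        this.rowCount = rowCount;
    }

    /** Models "SELECT id ... WHERE processed = 0 ORDER BY id", cut off at chunkSize rows. */
    List<Integer> readChunk(int chunkSize) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= rowCount && ids.size() < chunkSize; id++) {
            if (!committedProcessed.contains(id)) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Models the commit that makes processed = 1 visible to other readers. */
    void commit(List<Integer> ids) {
        committedProcessed.addAll(ids);
    }
}
```

In this model, two instances that call readChunk(100) before either one commits both receive ids 1 to 100, so both would insert those rows into the destination table. Only after commit(...) does the next read start at id 101.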

  2. #2
    Join Date
    Jul 2008
    Posts
    27

    That's a very interesting question, and I am in a similar situation and would like to add the following specific question:

    Actually I even want to use the StaxEventItemReader in several Jobs that have to run at the same time, and according to the Javadoc it's not thread-safe. What exactly could happen in such a scenario if I registered all jobs using something like a TaskExecutorLauncher (from the adhoc job sample)?

  3. #3
    Join Date
    Jun 2005
    Posts
    4,230

    Kmisaal: you are doing the right things with your process indicator flag. But it really doesn't make sense to run more than one job concurrently using the same indicator. I guess you need a separate indicator column for each job?

    dmarzi: your question is only partly related. You just need to prevent two jobs from using the same *instance* of the writer (as well as making sure that the instances are writing to different files of course). This is easy to do - just create two bean instances. Or put them in different application contexts to be on the safe side (that's how all the samples work).

  4. #4
    Join Date
    Jul 2008
    Posts
    27

    That's simple indeed. Thanks.

  5. #5
    Join Date
    Jul 2008
    Posts
    27

    Oh, and it seems the original post was expanded significantly after I wrote mine. Or I wasn't concentrating when I read it.

  6. #6
    Join Date
    Aug 2008
    Posts
    19

    Hi Dave.
    Creating and maintaining a processed indicator per job instance seems to add overhead and may kill performance.

    Can you suggest some configuration best practices or multi-threading best practices within the job? Is there any rule of thumb for deciding the chunk size for faster performance? Do we have any control over JDBC batch updates?

    Will it improve performance if I merge the functionality of the reader and writer into a single tasklet step by merging the select and insert queries (i.e. something like insert into XYZ select * from PQR)? This would eliminate the row mapper.

    Please share your views on how to use the Spring Batch framework efficiently.
    Thanks for your prompt reply.

  7. #7
    Join Date
    Jun 2005
    Posts
    4,230

    "Can you suggest some configuration best practices or multi-threading best practices within the job? Is there any rule of thumb for deciding the chunk size for faster performance? Do we have any control over JDBC batch updates?"
    The optimal chunk size for throughput varies according to the data and the database configuration. Usually it will be in the region of 100 or so. This, however, has nothing to do with threading. Batch updates can be done using the BatchSqlUpdateItemWriter.
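
As a rough illustration of what a JDBC batch update amounts to underneath, the sketch below queues one INSERT per item of a chunk and sends them with a single executeBatch() call using plain java.sql. It is not the BatchSqlUpdateItemWriter implementation; the columns id and payload and the Row class are hypothetical, and XYZ is borrowed from the placeholder table name used earlier in the thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class ChunkBatchInsert {
    // Hypothetical column names; XYZ is the thread's placeholder destination table.
    static final String INSERT_SQL = "INSERT INTO XYZ (id, payload) VALUES (?, ?)";

    /** One item of the chunk; the shape is an assumption made for this sketch. */
    static final class Row {
        final long id;
        final String payload;

        Row(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /**
     * Queues one INSERT per row and sends them to the driver as a single batch.
     * Returns the per-statement update counts reported by the driver.
     * The caller is expected to manage the transaction (commit or rollback).
     */
    static int[] writeChunk(Connection connection, List<Row> chunk) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (Row row : chunk) {
                ps.setLong(1, row.id);
                ps.setString(2, row.payload);
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

Whether the driver actually sends the batch in fewer round trips depends on the JDBC driver and its configuration, so the gain is worth measuring rather than assuming.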

    "Will it improve performance if I merge the functionality of the reader and writer into a single tasklet step by merging the select and insert queries (i.e. something like insert into XYZ select * from PQR)? This would eliminate the row mapper."
    Probably. If you can do the whole step in a single SQL statement that is likely to be much more efficient.

  8. #8
    Join Date
    Aug 2008
    Posts
    19

    Thanks Dave.

    I can combine everything into a single SQL statement and execute it in a tasklet.
    My question is: if I execute the whole query in a tasklet, will I still get the benefit of chunk processing? I think it will process everything in one go and we will lose the advantage of chunks.

    Alternatively, I can execute the combined query in an ItemWriter. However, an ItemWriter requires an ItemReader, so in this case I would use a blank reader that does nothing.

    Please suggest which is the best way, and how we can still get the advantage of chunk processing if we merge the queries.

  9. #9
    Join Date
    Jun 2005
    Posts
    4,230

    I'm not sure there is any advantage in chunk processing if you can do your update in a single query. A blank ItemReader would behave identically to a Tasklet, so please use the latter.

  10. #10
    Join Date
    Aug 2008
    Posts
    19

    Hey Dave

    I tried merging everything into one query and executed it in the writer.
    It works, but there is no chunk processing: one huge query runs for 2 hours.

    In this case, if something goes wrong while processing the last record, everything that was processed successfully is rolled back as well. That is a big loss compared with chunk processing.
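
One possible middle ground, sketched below, is to keep the set-based INSERT ... SELECT but run it once per id window, committing after each window so that a failure rolls back only the current window. This assumes the source table has a numeric, reasonably dense key column with non-negative values (called id here, which the thread does not mention); XYZ and PQR are the placeholder table names used earlier in the thread, and the class itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class IdWindows {
    // Both statements take the window bounds as their two parameters.
    static final String INSERT_WINDOW_SQL =
        "INSERT INTO XYZ SELECT * FROM PQR WHERE processed = 0 AND id BETWEEN ? AND ?";
    static final String MARK_WINDOW_SQL =
        "UPDATE PQR SET processed = 1 WHERE processed = 0 AND id BETWEEN ? AND ?";

    /** Inclusive id range handled by one INSERT ... SELECT, one UPDATE and one commit. */
    static final class Window {
        final long from;
        final long to;

        Window(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Splits [minId, maxId] into consecutive inclusive windows of at most size ids. */
    static List<Window> split(long minId, long maxId, long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<Window> windows = new ArrayList<>();
        long from = minId;
        while (from <= maxId) {
            // The last window may be shorter than size.
            long to = (maxId - from < size) ? maxId : from + size - 1;
            windows.add(new Window(from, to));
            if (to == maxId) {
                break;
            }
            from = to + 1;
        }
        return windows;
    }
}
```

Each window would be executed as the INSERT followed by the flag UPDATE in one transaction, binding from and to to the two placeholders of each statement. For ids 1 to 250 and a window size of 100, split returns the windows 1 to 100, 101 to 200 and 201 to 250.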

    Secondly, I profiled the code and found that updating the processed flag for every record in the afterWrite() method is what is killing us. Is there a way to issue only one update statement for all the records in a chunk once the chunk has been processed successfully? That is, with a chunk size of 100, I want to issue a single update after the chunk that marks all 100 records as processed, rather than 100 update statements.
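
A single statement per chunk could look like the sketch below, which builds one UPDATE with an IN list holding one placeholder per id in the chunk. The key column id is an assumption, PQR is the placeholder source table name from earlier in the thread, and the class is hypothetical rather than part of Spring Batch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class ChunkFlagUpdate {
    /** Builds "UPDATE PQR SET processed = 1 WHERE id IN (?, ?, ...)" with idCount placeholders. */
    static String buildSql(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "UPDATE PQR SET processed = 1 WHERE id IN (" + placeholders + ")";
    }

    /**
     * Marks every id of the chunk as processed with a single statement.
     * Returns the number of rows the database reports as updated.
     * Intended to run inside the same transaction as the chunk's inserts.
     */
    static int markProcessed(Connection connection, List<Long> ids) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(buildSql(ids.size()))) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i)); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

With a chunk of 100 this issues one UPDATE with 100 bound parameters instead of 100 separate statements. Some databases limit the number of entries in an IN list, so very large chunks may need to be split.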

    Please suggest.
