Jan 8th, 2010, 12:39 PM
Reading from database at the start of a job
I have a Spring Batch job that has two steps. Let's call them step A and step B. Step A reads records from the database that have a status of 'READY_FOR_PROCESSING'. After step A is done, it updates the status of the records to 'PRE_PROCESSED'. Step B then starts and picks up the records that have a status of 'PRE_PROCESSED'. Both step A and step B work with the same records; I just need to update the status after each step.
READY_FOR_PROCESSING -> (Step A) -> PRE_PROCESSED -> (Step B) -> PROCESSED
When I execute the batch job, it executes step A, but when step B starts it finds no records marked 'PRE_PROCESSED' to pick up, and the batch job finishes. Afterwards, all the values in the database are marked as PRE_PROCESSED. If I run the job again, it picks up the records marked PRE_PROCESSED and step B executes without a problem.
I have both steps defined in my job's XML configuration file.
When the job is initialized, how are the records for the steps read? Is it this way:
1) Read records from database for step A and perform step A
2) Read records from database for step B and perform step B
Or this way:
1) Read records from database for both steps A and B
2) Perform step A
3) Perform step B
If it works the second way, it makes sense that step B does not have any records to work with because when the job starts, no records are marked as PRE_PROCESSED yet.
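To make the difference between the two orderings concrete, here is a minimal plain-Java sketch. It is only a model: the "table" is an in-memory map, and the class and method names (StepReadTiming, readPerStep, readAllUpFront) are made up for illustration and are not Spring Batch APIs. It says nothing about which ordering Spring Batch actually uses; it only shows what each ordering would leave in the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the two read orderings; no Spring Batch classes are used. */
final class StepReadTiming {

    /** Returns the ids of all records currently in the given status. */
    static List<Integer> readByStatus(Map<Integer, String> table, String status) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> row : table.entrySet()) {
            if (row.getValue().equals(status)) {
                ids.add(row.getKey());
            }
        }
        return ids;
    }

    /** Sets every listed record to the new status (stands in for a step's update). */
    static void updateStatus(Map<Integer, String> table, List<Integer> ids, String status) {
        for (Integer id : ids) {
            table.put(id, status);
        }
    }

    /** Builds a hypothetical table of three records, all READY_FOR_PROCESSING. */
    static Map<Integer, String> newTable() {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (int id = 1; id <= 3; id++) {
            table.put(id, "READY_FOR_PROCESSING");
        }
        return table;
    }

    /** First ordering: each step runs its query when the step itself starts. */
    static Map<Integer, String> readPerStep() {
        Map<Integer, String> table = newTable();
        List<Integer> forA = readByStatus(table, "READY_FOR_PROCESSING");
        updateStatus(table, forA, "PRE_PROCESSED");
        List<Integer> forB = readByStatus(table, "PRE_PROCESSED"); // sees step A's update
        updateStatus(table, forB, "PROCESSED");
        return table;
    }

    /** Second ordering: both queries run up front, before step A changes anything. */
    static Map<Integer, String> readAllUpFront() {
        Map<Integer, String> table = newTable();
        List<Integer> forA = readByStatus(table, "READY_FOR_PROCESSING");
        List<Integer> forB = readByStatus(table, "PRE_PROCESSED"); // empty at this point
        updateStatus(table, forA, "PRE_PROCESSED");
        updateStatus(table, forB, "PROCESSED"); // nothing to update
        return table;
    }

    public static void main(String[] args) {
        System.out.println("read per step:     " + readPerStep());
        System.out.println("read all up front: " + readAllUpFront());
    }
}
```

In this model the second ordering leaves every record at PRE_PROCESSED after one run, which is the same end state described above for the first run of the job.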
Any ideas? It doesn't seem to be a commit problem. Maybe I am missing something in my job config XML file. Or maybe this job has to be broken up into two jobs because step B is dependent on step A updating the database first.
Jan 9th, 2010, 02:39 AM
All items from Step A should be committed before Step B is started. I think we need to know more about how and where you are updating the status of those records. There is a similar use case in the parallelJob in Spring Batch Samples, so you could look there to see something that works.
Jan 12th, 2010, 10:34 AM
Dave, thanks for the reply. I did some more testing and it appears that after Step A is finished, the status is successfully updated in the database (before Step B executes). It is not a commit problem. I think it has to do with when the records are read for each step. Step B doesn't execute because whenever the records are read for this step, there are no PRE_PROCESSED records. It seems to me that this job is behaving the following way:
1) Read records from database for both steps A and B (done at startup when there are no PRE_PROCESSED records)
2) Perform step A (read READY_FOR_PROCESSING records, set to PRE_PROCESSED after step is finished)
3) Perform step B (read records that are marked PRE_PROCESSED)
If I change the SQL for step B to look for records marked READY_FOR_PROCESSING instead of PRE_PROCESSED, the batch job runs as desired with no issues. This problem must be caused by when the records are being read.
I will take a look at the parallelJob.
Last edited by nine-three; Jan 12th, 2010 at 01:55 PM.
Jan 13th, 2010, 05:33 AM
I have a similar requirement and it works
I have a similar requirement and I have it working just as you outlined. I would suggest looking into the commit-interval (as Dave suggested), and also making sure the pre-processing step's next attribute is set to the processing step.
Hope this helps.
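For anyone unsure what the commit-interval controls: in a chunk-oriented step it is the number of items handled per transaction. The sketch below is a plain-Java model of that grouping, not Spring Batch code; the class name ChunkCommitModel and the method commits are hypothetical. It shows that the final, possibly partial, chunk still gets its own commit, so in this model nothing from the step is left uncommitted when the next step starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of chunked commits; it does not use Spring Batch classes. */
final class ChunkCommitModel {

    /**
     * Splits the items into chunks of at most commitInterval items.
     * Each returned chunk stands for one transaction that commits on its own.
     */
    static List<List<Integer>> commits(List<Integer> items, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Five hypothetical record ids with a commit interval of 2.
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(commits(ids, 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```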