May 15th, 2008, 04:21 AM
Inconsistency in persisted Spring Batch metadata?
I've come across something that might be an issue just by chance.
I have an AOP logger that prints out a log message every time saveOrUpdateExecutionContext is called on the JobRepository. That way I can show that the batch job is actually moving forward and processing elements (i.e., I log every commit that is supposed to happen).
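The logging advice described above can be approximated without Spring AOP at all. Here is a minimal sketch using a JDK dynamic proxy; the `Repository` interface and its method are illustrative stand-ins for the real JobRepository, not its actual API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class LoggingProxyDemo {
    // Stand-in for the JobRepository method we want to observe.
    interface Repository {
        void saveOrUpdateExecutionContext(String stepName);
    }

    static final List<String> LOG = new ArrayList<>();

    // Wrap any Repository so every call is recorded before delegating,
    // analogous to a "before" advice on the real JobRepository.
    static Repository withLogging(Repository target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("calling " + method.getName() + "(" + args[0] + ")");
            return method.invoke(target, args);
        };
        return (Repository) Proxy.newProxyInstance(
                Repository.class.getClassLoader(),
                new Class<?>[] {Repository.class},
                handler);
    }

    public static void main(String[] args) {
        Repository repo = withLogging(step -> { /* persist context here */ });
        repo.saveOrUpdateExecutionContext("step1");
        System.out.println(LOG.get(0));
        // prints: calling saveOrUpdateExecutionContext(step1)
    }
}
```

The same idea in Spring AOP would be a before-advice pointcut on the JobRepository method; the proxy version just makes the interception explicit.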
Today we have been having some network problems, and suddenly our batch processes started stalling mid-execution. We discovered that these long waits were due to DB access: our jobs were waiting for commits to complete. We keep the Spring Batch metadata in one DBMS while the actual work is done against another, which is why commits were working fine on the metadata side while having trouble on the other. Since I had my logger wired into the jobs, I had a shiny message stating that 50 elements had already been processed. And, of course, the Spring Batch metadata in the DB showed the same number of processed items, supposedly done in one single commit (commitCount was 1).
That's when I jumped into the ItemOrientedStep code and discovered that it first calls saveOrUpdateExecutionContext() and only afterwards commits the current transaction. So here's my question: should this order be the other way around?
Thanks for reading this long post and please forgive my stealing your time if this is irrelevant.
May 15th, 2008, 06:07 AM
ItemOrientedStep simply puts item processing and metadata persistence in the same transaction, fundamentally saying "do both or none", which makes perfect sense, I suppose. My point is that the issue you described is a matter of tx setup, not of ItemOrientedStep's execution logic.
When working with multiple databases you need to use a transaction manager that supports two-phase commit (JTA) - maybe that's the catch?
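For reference, a minimal sketch of what that wiring might look like in Spring XML. The bean name is illustrative; you would also need a JTA provider (typically supplied by the application server) and XA-capable datasources for each database:

```xml
<!-- Illustrative config fragment: delegates to the JTA -->
<!-- implementation exposed by the application server. -->
<bean id="transactionManager"
      class="org.springframework.transaction.jta.JtaTransactionManager"/>
```

With this in place, a single JTA transaction can span the metadata datasource and the business datasources, so they commit or roll back together.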
May 15th, 2008, 08:42 AM
I agree with Robert. Spring Batch meta-data is intended to be in the same transaction as the 'business transaction', which makes sense because things like the ExecutionContext are tied to whether or not the business transaction was successful. For example, if the FlatFileItemReader stores the fact that it's on line 100 in the ExecutionContext, you only want that persisted if the output from those lines was also persisted. As Robert said, if you have a real need to use two databases, you will need two-phase commit support.
May 16th, 2008, 02:06 AM
That makes sense. We're actually not using 2 DBMSs but 3, although we only write to one of them in each step (plus the Spring Batch metadata). We'll have to look for a JTA transaction manager, then.
May 19th, 2008, 02:26 AM
We've been talking it over here, and we think that having a single transaction for both the metadata and the actual business data may not be a good solution: if you only have one transaction, what happens when you roll it back? We suspect the batch meta-data won't end up being written precisely because the transaction was rolled back.
May 19th, 2008, 03:16 AM
Having both the business and batch data updates in the same transaction means they should stay in sync, and the tx manager is supposed to handle that. When a transaction is rolled back, neither business nor batch data will be written - I don't see anything wrong with that.
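The both-or-none behaviour described above can be sketched with a toy buffered "transaction" in plain Java (no Spring; all names here are illustrative): business output and batch metadata are written in the same unit of work, so a rollback discards both together.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SameTxDemo {
    // Toy "transaction": writes are buffered and only become visible
    // in the store on commit; rollback discards the whole buffer.
    static class TxStore {
        final Map<String, String> store = new HashMap<>();
        final List<String[]> buffer = new ArrayList<>();

        void write(String key, String value) {
            buffer.add(new String[] {key, value});
        }

        void commit() {
            for (String[] w : buffer) store.put(w[0], w[1]);
            buffer.clear();
        }

        void rollback() {
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        TxStore tx = new TxStore();
        // One chunk: business output and batch metadata in the same tx.
        tx.write("business:line-100", "processed");
        tx.write("metadata:execution-context", "lines.read=100");
        tx.rollback(); // failure before commit: neither write survives
        System.out.println(tx.store.isEmpty()); // prints: true
    }
}
```

This is exactly the invariant the thread is after: the metadata never claims progress that the business data doesn't also reflect.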