Jul 27th, 2009, 01:18 PM
Chunks with AsyncProcessing
Our system uses async processing because we want to run 3-4 threads at a time when running Spring Batch (heck, even with 3-4 threads it can take a week to run). However, I want to use the skip-step items because I need to calculate how many steps fail, and this seems like the "best" and least "hacky" way of doing it.
So we are using the FaultTolerantStepFactoryBean for our step and a SimpleAsyncTaskExecutor (although I am thinking of switching to the pooled one).
HOWEVER ... we are getting an error from ChunkMonitor saying
"ItemStream was opened in a different thread. Restart data could be compromised."
I notice the Javadoc says it's not thread safe. Now, we are only using 1 item at a time in the lists passed to the writers, and all the data seems to be getting loaded correctly.
Do you think we will be OK? I noticed 2.0.3 does not have the ChunkMonitor in it. Is there a reason?
Jul 29th, 2009, 02:15 AM
If you are happy to forgo restartability it is trivial to make an ItemReader thread safe - just wrap the call to read() in a synchronized block. Then if your wrapper doesn't implement ItemStream you won't get that warning from the chunk monitor. Whether it helps in any given scenario is highly dependent on the details, and don't forget you lose restartability (but then you already did by using a multi-threaded step).
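A minimal sketch of the wrapper described above, for illustration only. To keep it self-contained, the `ItemReader` interface here is a local stand-in for `org.springframework.batch.item.ItemReader`, and `SynchronizedItemReader`, `InMemoryReader` and `SynchronizedReaderDemo` are hypothetical names, not framework classes. The wrapper deliberately does not implement `ItemStream`, so no restart state is saved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Local stand-in for org.springframework.batch.item.ItemReader (assumption:
// same single-method shape, null signals end of input).
interface ItemReader<T> {
    T read() throws Exception;
}

// Hypothetical wrapper: serializes calls to read() on a delegate that is not
// thread safe. It does NOT implement ItemStream, so nothing is saved for restart.
final class SynchronizedItemReader<T> implements ItemReader<T> {
    private final ItemReader<T> delegate;

    SynchronizedItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

// Hypothetical non-thread-safe reader used only to exercise the wrapper.
final class InMemoryReader<T> implements ItemReader<T> {
    private final Iterator<T> iterator;

    InMemoryReader(List<T> items) {
        this.iterator = items.iterator();
    }

    @Override
    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

final class SynchronizedReaderDemo {
    // Drains the reader from several threads and returns the total number of items read.
    static int drain(ItemReader<Integer> reader, int threads) {
        AtomicInteger count = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (reader.read() != null) {
                        count.incrementAndGet();
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(i);
        }
        ItemReader<Integer> reader = new SynchronizedItemReader<>(new InMemoryReader<>(items));
        System.out.println(drain(reader, 4)); // every item is handed out exactly once
    }
}
```

In a real step you would wrap your existing reader the same way and pass the wrapper to the step in place of the original reader.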
Best practice for restartable multi-threaded jobs involving file input is to load the file as quickly as possible into a relational database, and take it from there in parallel (as per the parallel sample). If your job takes a week to finish, this might be a classic example where the pre-load step is worth the extra work.
Chunk monitor is still there. It is really an internal implementation detail though, so don't rely on it always being there.
Jul 29th, 2009, 08:49 AM
Jul 30th, 2009, 03:22 AM
If all you did was synchronize the read() in an existing reader and plug a task executor into a Step, I doubt you are really restartable (unless the reader was already thread safe). Most existing readers implement ItemStream by counting the number of items (or similar), so if you are reading in multiple threads it is unlikely that a single counter will work, because a failed chunk could come before a successful one in the input.
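To make the counter problem concrete, here is a small, deterministic sketch. It does not use Spring Batch or real threads; it just replays, in sequence, an interleaving that two threads could produce. `CountingReader` and `RestartCounterDemo` are hypothetical names, and the "saved count" stands in for whatever an ItemStream would put in the execution context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader whose only restart state is "how many items have been read".
final class CountingReader {
    private final List<String> items;
    private int position;

    CountingReader(List<String> items, int startAt) {
        this.items = items;
        this.position = startAt;
    }

    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    int itemCount() {
        return position;
    }
}

final class RestartCounterDemo {
    // Returns the items that are never written when a failed chunk
    // precedes a successful one in the input.
    static List<String> lostOnRestart() {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f");
        CountingReader reader = new CountingReader(input, 0);

        // "Thread 1" reads chunk [a, b]; "thread 2" reads chunk [c, d].
        List<String> chunk1 = Arrays.asList(reader.read(), reader.read());
        List<String> chunk2 = Arrays.asList(reader.read(), reader.read());

        // chunk2 commits and the shared counter (now 4) is saved.
        // chunk1 then fails and the job stops, so chunk1 is never written.
        List<String> written = new ArrayList<>(chunk2);
        int savedCount = reader.itemCount();

        // On restart the reader skips savedCount items and carries on.
        CountingReader restarted = new CountingReader(input, savedCount);
        String item;
        while ((item = restarted.read()) != null) {
            written.add(item);
        }

        List<String> lost = new ArrayList<>(input);
        lost.removeAll(written);
        return lost; // [a, b]: the failed chunk is silently skipped
    }

    public static void main(String[] args) {
        System.out.println(lostOnRestart());
    }
}
```

The single counter only records how far the reader got, not which chunks were actually committed, so the failed items are skipped on restart.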