Occasional deadlocks on MapJobRegistry.getJob(...)
I am using Spring Batch 2.1.0 for a large number of data feeds that run as often as every 15 seconds. Things run fine for a few weeks or so, but then a deadlock occurs (randomly, of course) that tends to wreck my day. It happens inside a synchronized block in MapJobRegistry.getJob(), at line 78.
I have a bounded thread pool (from Quartz), and eventually every thread locks up on this method, so the system grinds to a halt and has to be restarted. I confirmed this by dumping all the thread stack traces with VisualVM, and I can also see the blocked threads in the VisualVM GUI.
It looks like there is no easy way to make this call unsynchronized. In my case, I'd much rather deal with the consequences of a race condition than have the whole system lock up on the getJob() call.
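For comparison, here is a minimal sketch of a registry whose lookups take no shared lock, assuming jobs can simply be kept in a ConcurrentHashMap. The class ConcurrentJobRegistrySketch and its methods are hypothetical and are not Spring Batch's MapJobRegistry API; it throws IllegalArgumentException for an unknown name only to stay self-contained.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical registry (NOT Spring Batch's MapJobRegistry) that stores jobs
 * in a ConcurrentHashMap, so getJob() never waits on a shared monitor.
 */
class ConcurrentJobRegistrySketch<J> {

    private final ConcurrentMap<String, J> jobs = new ConcurrentHashMap<String, J>();

    /** Registers a job; returns false if the name is already taken. */
    public boolean register(String name, J job) {
        return jobs.putIfAbsent(name, job) == null;
    }

    /** Lock-free read: no synchronized block, so one stuck caller cannot stall the others. */
    public J getJob(String name) {
        J job = jobs.get(name);
        if (job == null) {
            throw new IllegalArgumentException("No job registered under name: " + name);
        }
        return job;
    }

    public void unregister(String name) {
        jobs.remove(name);
    }

    /** Sorted copy of the names; the underlying iteration is weakly consistent, not fail-fast. */
    public Collection<String> getJobNames() {
        return Collections.unmodifiableSet(new TreeSet<String>(jobs.keySet()));
    }

    public static void main(String[] args) {
        ConcurrentJobRegistrySketch<String> registry = new ConcurrentJobRegistrySketch<String>();
        registry.register("job_fetch_data", "fetch job");
        System.out.println(registry.getJob("job_fetch_data"));
        System.out.println(registry.getJobNames());
    }
}
```

Reads on a ConcurrentHashMap do not block, so the failure mode described above (every pool thread parked on one monitor) cannot arise from lookups alone. The trade-off is that a name list taken while jobs are being registered may not reflect the very latest changes.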
A sample stack trace from one of the threads is below. There are many such threads, and all are BLOCKED on this single method at line 78 in MapJobRegistry.
"job_fetch_data" prio=6 tid=0x00a08000 nid=0x13a4 waiting for monitor entry [0x2947f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.batch.core.configuration.support.MapJobRegistry.getJob(MapJobRegistry.java:78)
- waiting to lock <0x0ed65f78> (a java.util.HashMap)
Perhaps it is not a deadlock, but some thread is not releasing the lock for another reason. The end result is my entire system is hosed because of a single ill-behaved thread. I am still investigating and will report my findings.
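To tell a true deadlock apart from a thread that simply never releases the monitor, the JDK's java.lang.management.ThreadMXBean can report which thread owns the lock each BLOCKED thread is waiting for (in a plain thread dump, the owner is the thread showing "- locked <0x0ed65f78>"). The sketch below is a self-contained demo under that assumption: the class LockOwnerReport, the map registryMap and the thread name ill_behaved_holder are hypothetical and only reproduce the symptom, one thread holding a HashMap monitor while another waits on it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class LockOwnerReport {

    /** Describes every BLOCKED thread and the thread that owns the monitor it wants. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<String>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers); lock name/owner are reported either way
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** True only for a genuine lock cycle; a monitor held by a stuck thread is not reported. */
    static boolean hasDeadlock() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids != null && ids.length > 0;
    }

    public static void main(String[] args) throws Exception {
        final Map<String, Object> registryMap = new HashMap<String, Object>(); // hypothetical stand-in
        final CountDownLatch locked = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(new Runnable() {
            public void run() {
                synchronized (registryMap) {          // grabs the monitor and does not let go
                    locked.countDown();
                    try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }
            }
        }, "ill_behaved_holder");
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                synchronized (registryMap) { registryMap.get("job_fetch_data"); }
            }
        }, "job_fetch_data");

        holder.start();
        locked.await();                               // holder now owns the monitor
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) { Thread.sleep(10); }

        for (String line : describeBlockedThreads()) System.out.println(line);
        System.out.println("deadlock detected: " + hasDeadlock());

        release.countDown();                          // let both threads finish
        holder.join();
        waiter.join();
    }
}
```

In this demo the report names ill_behaved_holder as the owner while findDeadlockedThreads() finds no cycle, which is the "not a deadlock, just a thread that never releases the lock" case.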
-Trey