Oct 28th, 2010, 03:10 AM
Job hangs occasionally - need help to pinpoint error
We are experiencing this problem (the job stalls at a step indefinitely) very rarely in production, and we have not been able to reproduce it in the DEV/QA environments. When the job is resubmitted, it runs fine. The batch job is configured to use Oracle. My explanation (a guess) is that it stalls because of a resource (connections) issue.
It stops at various steps of the batch job. This morning it stopped at a Tasklet, where the framework uses a maximum of 3 transactions: the 'Version' column of STEP_EXECUTION is at 1 of 3, with read and write counts at 0.
I am attaching the screenshot of the table.
Could anyone advise which package/class I should turn logging on for, so that we can identify the problem? (Note: I do not want to turn on logging for all classes in PROD.)
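For illustration only, here is a minimal sketch of what package-scoped logging could look like, assuming the job is a Spring Batch job and using plain java.util.logging so the example is self-contained. The logger names are assumptions about where step, transaction and connection activity would be logged; with log4j the same names would go into the logging configuration instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class TargetedLogging {
    // java.util.logging holds loggers weakly, so keep strong references
    // to the ones we configure or their levels may be lost after a GC.
    private static final Map<String, Logger> PINNED = new LinkedHashMap<String, Logger>();

    // Raise the level only for the named packages; everything else keeps its level.
    public static void enable(Level level, String... packages) {
        for (String name : packages) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(level);
            PINNED.put(name, logger);
        }
    }

    public static void main(String[] args) {
        // Assumed package names (not taken from the post).
        enable(Level.FINE,
                "org.springframework.batch.core.step",   // step execution
                "org.springframework.transaction",       // transaction begin/commit
                "org.springframework.jdbc.datasource");  // connection acquisition
        // Note: the attached handlers must also accept FINE records,
        // otherwise the messages are filtered out before they are written.
        for (String name : PINNED.keySet()) {
            System.out.println(name + " -> " + PINNED.get(name).getLevel());
        }
    }
}
```

Limiting the change to a few packages keeps the log volume in PROD small while still showing the last framework call made before the stall.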
Oct 30th, 2010, 03:30 AM
I would monitor the Oracle instance with Enterprise Manager. Also check that the Oracle version (and the JDBC driver version) matches between your prod and QA environments.
Dec 4th, 2010, 08:20 AM
Thanks. Somehow I did not get a notification when the response was posted. This morning the job stopped responding yet again.
I am listing the changes I made since the initial post:
I updated Apache DBCP to the current version and set the max idle connections equal to the initial size (30). The evictable idle time was also set.
Since it is still happening, it does not seem to be a problem with pooling either.
Based on the response, I will monitor the Oracle instance. (Does the monitoring need to be real-time?)
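As a rough sketch of the settings described above, the pool properties might look like the following, assuming DBCP 1.x property names as read by BasicDataSourceFactory.createDataSource(Properties). Only the value 30 comes from this post; the eviction timings are hypothetical placeholders, and maxWait is an extra setting not mentioned here, shown because its DBCP 1.x default makes a caller wait indefinitely when the pool is exhausted.

```java
import java.util.Properties;

class PoolSettings {
    // Builds the pool settings as plain properties (DBCP 1.x names assumed).
    public static Properties dbcpProperties() {
        Properties p = new Properties();
        p.setProperty("initialSize", "30"); // from the post
        p.setProperty("maxIdle", "30");     // from the post: equal to initial size
        // Hypothetical values: the post says eviction was configured
        // but does not give the numbers.
        p.setProperty("timeBetweenEvictionRunsMillis", "60000");
        p.setProperty("minEvictableIdleTimeMillis", "300000");
        // Assumption, not in the post: bound the wait for a connection so
        // that an exhausted pool raises an error instead of blocking forever.
        p.setProperty("maxWait", "30000");
        return p;
    }

    public static void main(String[] args) {
        Properties p = dbcpProperties();
        for (String key : p.stringPropertyNames()) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

With a bounded wait, a stall caused by connection starvation would show up as an exception in the job log rather than as a step that never finishes, which would help confirm or rule out the pooling theory.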