Sep 8th, 2010, 07:01 AM
We are creating an application that reads data from a set of DB2 tables, formats and validates the data, and then writes it to a CSV file. We used Spring Batch to achieve this, as it seemed to be a viable option. Below are the application-specific details and requirements:
1. The job has around 50 steps and uses an HSQL cached-table database as the job repository.
2. We fire off one job per message, because the parameters passed to each step are account-specific and we want one output CSV file per message.
3. Hence multiple job instances are created and started in parallel. We currently limit the number of threads to 50.
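The fan-out described above can be sketched with a plain `ExecutorService` (a minimal sketch, not our actual code: the class name, the placeholder standing in for the `JobLauncher` call, and the `accountId` parameter are all hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fan-out: one job launch per incoming message, throttled
// to at most 50 concurrent worker threads. In the real application each
// task would call a Spring Batch JobLauncher; here a counter stands in
// for the job so the sketch is self-contained.
public class JobFanOut {

    // Launches one "job" per message on a bounded pool; returns how many completed.
    static int runAll(int messages) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(50); // thread cap from the setup above
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < messages; i++) {
            final int accountId = i; // account-specific job parameter (unused in this sketch)
            // Placeholder for jobLauncher.run(job, parametersFor(accountId))
            pool.submit(completed::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("jobs completed: " + runAll(500));
    }
}
```

Each submitted task here finishes immediately; the point is only the shape of the submission loop, which matches one job instance per message with a bounded pool.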
The current maximum heap size is 1 GB. After executing around 500 jobs, the process goes out of memory. I have tried reducing the job commit interval to 1, but it has no impact on the process.
A memory histogram of the process shows that a large number of ConcurrentHashMap objects are being created:
 num   #instances      #bytes  class name
   1:     2143688    68598016  java.util.concurrent.ConcurrentHashMap$Segment
   2:     2143948    51454752  java.util.concurrent.locks.ReentrantLock$NonfairSync
   3:      409778    36625864  [C
   4:     2143808    34635960  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
   5:       54225    20745928  [I
   6:      258069    20658560  [Ljava.util.HashMap$Entry;
   7:      636912    15285888  java.util.HashMap$Entry
   8:      201778    10957048  [Ljava.lang.Object;
   9:      133984    10718496  [Ljava.util.concurrent.ConcurrentHashMap$Segment;
  10:      413230     9917520  java.lang.String
Is there any obvious reason that might lead to the out-of-memory error in Spring Batch? If so, is there any mechanism to prevent it from happening? Does this mean that Spring Batch is not suitable for submitting multiple jobs in parallel?
Sep 9th, 2010, 09:40 AM
I tried implementing a custom MapJobRepository with scope="prototype", so the repository should not be held in memory after the job finishes. I also tried creating a job with a dummy step that reads nothing and writes nothing to the file, and I still get the same memory issue when running these multiple threads. Is there any way to resolve this?
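For reference, the prototype-scoped in-memory repository attempt might have looked something like this (a sketch, not the poster's actual configuration; the bean id and the transaction manager reference are assumptions):

```xml
<!-- Hypothetical reconstruction of the prototype MapJobRepository attempt -->
<bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean"
      scope="prototype">
    <!-- a ResourcelessTransactionManager is typical with the map-based repository -->
    <property name="transactionManager" ref="transactionManager"/>
</bean>
```

Note that a prototype-scoped repository only helps if nothing else retains a reference to it after the job completes, which is why the leak persisted.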
Sep 10th, 2010, 04:51 AM
I thought you said you were using HSQL for the JobRepository?
Anyway, I'm not aware of any leak issues in Spring Batch under "normal" usage. We can probably only make progress here if you gather more information about those map instances: you need to find out at least who holds the references (not just how many there are). A decent profiler can do that for you.
Dec 6th, 2010, 04:49 PM
I ran into a similar issue when we started using StepScope, since the 2.0.4 version of PlaceHolderTargetSource.getTarget() was creating a new DefaultListableBeanFactory and then cloning the parent bean factory's configuration into it:
DefaultListableBeanFactory beanFactory = new DefaultListableBeanFactory(listableBeanFactory);
My heap dump indicates that creating new bean factories while sharing the single beanExpressionResolver instance from the top-level DefaultListableBeanFactory (the one held by the application context) allows the StandardBeanExpressionResolver's evaluationCache to grow endlessly: the cache keeps adding new entries for each new BeanFactory, one per step, and nothing ever evicts the entries for the throwaway factories.
I worked around this by resetting the beanExpressionResolver on the top-level DefaultListableBeanFactory to a new instance when starting a new job, so the accumulated evaluationCache entries can get GC'd. That took care of the memory issue and hasn't so far caused problems for our functionality, but I would like to look into whether there might be a better way to solve this.
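The leak pattern and the reset workaround can be illustrated with a stdlib-only analogy (all names here are hypothetical; this is not Spring code): a single shared "resolver" keeps a cache entry per factory, so every short-lived factory pins entries forever, until the resolver itself is replaced and the whole cache becomes garbage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stdlib-only analogy of the evaluationCache leak (names hypothetical):
// one shared resolver caches an entry per (factory, expression) pair, so
// every throwaway factory leaves a permanent entry behind; swapping in a
// fresh resolver (as in the workaround above) drops the cache at once.
public class EvaluationCacheLeakDemo {

    static class Resolver {
        // Mirrors a per-BeanFactory evaluation cache held by a shared resolver.
        final Map<Object, Map<String, Object>> cacheByFactory = new ConcurrentHashMap<>();

        void evaluate(Object factory, String expression) {
            cacheByFactory
                .computeIfAbsent(factory, f -> new ConcurrentHashMap<>())
                .computeIfAbsent(expression, e -> new Object()); // cached "result"
        }

        int size() { return cacheByFactory.size(); }
    }

    public static void main(String[] args) {
        Resolver shared = new Resolver();
        // Each step creates a fresh factory but reuses the shared resolver,
        // so the cache grows by one factory entry per step and never shrinks.
        for (int step = 0; step < 1000; step++) {
            shared.evaluate(new Object(), "#{jobParameters['accountId']}");
        }
        System.out.println("cached factories before reset: " + shared.size());
        shared = new Resolver(); // the workaround: old cache becomes unreachable
        System.out.println("cached factories after reset: " + shared.size());
    }
}
```

In the actual workaround, the equivalent of the reset is presumably a call along the lines of setBeanExpressionResolver(new StandardBeanExpressionResolver()) on the context's bean factory, though the poster does not show the exact code.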