We've set up a Spring Batch batch email-sender job that does the following:
- Read incoming email details from a JMS Queue.
- Don't do any processing, just pass them through...
- Turn the details into emails and "write" them to SMTP
- Repeat ad infinitum
This generally works fine, but we've discovered a massive memory leak. After 9 hours or so, can see a single FlowJob, but 748 JobExecution (all 'STARTED'), each holding 778 (!) StepExecution instances. All in all, 900 MB of stuff.
Here's the Spring config (Spring 3.1, Spring Batch 2.1.9):
Our InfiniteThrottledExecutionDecider basically returns new FlowExecutionStatus("CONTINUE") every time, to ensure that the fetchEmailsStep gets executed at the end of the flow, and that the Job never completes - at least not until we're ready to stop it ourselves.Code:<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" /> <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher"> <property name="jobRepository" ref="jobRepository" /> <property name="taskExecutor"> <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/> </property> </bean> <bean id="jmsEmailFetcher" class="org.springframework.batch.item.jms.JmsItemReader"> <property name="jmsTemplate" ref="batchEmailJmsTemplate" /> </bean> <bean id="passthroughProcessor" class="org.springframework.batch.item.support.PassThroughItemProcessor" /> <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" /> <!-- The Spring Batch *Limiter/Decider* --> <bean id="ourLimitDecider" class="our.special.InfiniteThrottledExecutionDecider" /> <!-- The Spring Batch *Job* --> <batch:job id="fetchEmailsJobBatch" restartable="true"> <batch:step id="fetchEmailsStep" next="limitDecision"> <batch:tasklet throttle-limit="10"> <batch:chunk reader="jmsEmailFetcher" processor="passthroughProcessor" writer="batchEmailService.javaMailItemWriter" commit-interval="100" skip-limit="999999999"> <!-- Might gets *lots* of MQ-not-up-yet fails --> <batch:skippable-exception-classes> <batch:include class="org.springframework.jms.JmsException" /> <!-- Generally MQ-not-up-yet ConnectionException, or Session-closed (in tests) --> <batch:include class="java.lang.IllegalStateException" /> <!-- Yuk, usually/presumably a test SMTP server isn't yet connected --> <batch:include class="org.springframework.mail.MailSendException" /> <!-- SMTP down... --> </batch:skippable-exception-classes> </batch:chunk> </batch:tasklet> </batch:step> <batch:decision id="limitDecision" decider="ourLimitDecider"> <batch:next on="CONTINUE" to="fetchEmailsStep" /> <batch:end on="COMPLETED" /> </batch:decision> </batch:job>
We're not using a database, so we expect some stuff to be held in memory, but not a complete record of everything that's ever been run...
Is there something wrong with our configuration? Or our approach?
* * * * *
Here's a bit from our log file for what we're told is Step #778 and what should be the one and only Job instance.
Problem is, the heap dump shows 748 JobExecutions, and 581,911 StepExecutions. Where are they all coming from?Code:23:58:18,782 - INFO (org.springframework.batch.core.job.SimpleStepHandler) - Duplicate step [fetchEmailsStep] detected in execution of job=[fetchEmailsJobBatch]. If either step fails, both will be executed again on restart. 23:59:52,257 - INFO (org.springframework.batch.core.job.SimpleStepHandler) - Executing step: [fetchEmailsStep] 23:59:52,257 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Executing: id=778 23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context. 23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1 23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Preparing chunk execution for StepContext: org.springframework.batch.core.scope.context.StepContext@1be1ee 23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Chunk execution starting: queue size=0 23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context. 23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1 ... 5 second JMS timeout ... 23:59:57,716 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value. 23:59:57,716 - DEBUG (org.springframework.batch.core.step.item.ChunkOrientedTasklet) - Inputs not busy, ended: true 23:59:57,716 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Applying contribution: [StepContribution: read=0, written=0, filtered=0, readSkips=0, writeSkips=0, processSkips=0, exitStatus=EXECUTING] 23:59:57,719 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Saving step execution before commit: StepExecution: id=778, version=1, name=fetchEmailsStep, status=STARTED, exitStatus=EXECUTING, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0, exitDescription= 23:59:57,721 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value. 23:59:57,721 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution success: id=778 23:59:57,722 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution complete: StepExecution: id=778, version=3, name=fetchEmailsStep, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0 23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.fetchEmailsStep with status=COMPLETED 23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.limitDecision100 23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.limitDecision100 with status=CONTINUE 23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.fetchEmailsStep
* * * * *
We've tried upping the JMS timeout to a much higher number (30 minutes), and obviously that massively reduces churn in periods of inactivity, like the above scenario. But as activity increases, we're back where we started.


Reply With Quote
