
Thread: Spring Batch Meta Data Table Insert- Slowing down the Batch Job

  1. #1
    Join Date
    Nov 2012
    Posts
    1


    We have jobs that might process up to 20,000 files. We are using a MultiResourcePartitioner to set things up. The job does run, but we have noticed a bottleneck.

    Spring Batch creates an entry in the BATCH_STEP_EXECUTION table for each file found, and it will not process any files until it has created a table entry for every file. Loading this table takes a very long time.

    In local testing, trying to process just 1,000 files, it is taking 38-40 minutes to add the rows to 'BATCH_STEP_EXECUTION'. Once the table is loaded, the files are processed quite rapidly (usually under 1 minute).

    The bottleneck seems to be loading the steps into the BATCH_STEP_EXECUTION table: Spring Batch commits after every insert, so with 20,000 files this slows down the whole job.
    Setting the isolation level on the TransactionManager to the default or to "ISOLATION_READ_COMMITTED" does not make any difference.

    How can we change the commit interval for the Spring Batch metadata tables, or otherwise improve performance in this case? Please suggest.

    Here is how the database is set up. We actually subclass 'OracleDataSource' (the class comes from 'ojdbc6.jar'), and db_file is a properties file that holds the URL, password, etc.:
    <bean id="dataSource" class="oracle.jdbc.pool.OracleDataSource" destroy-method="close">
        <constructor-arg value="db_file" />
        <property name="connectionCachingEnabled" value="true" />
        <property name="connectionCacheProperties">
            <props merge="default">
                <prop key="InitialLimit">10</prop>
                <prop key="MinLimit">25</prop>
                <prop key="MaxLimit">50</prop>
                <prop key="InactivityTimeout">1800</prop>
                <prop key="AbandonedConnectionTimeout">900</prop>
                <prop key="MaxStatementsLimit">20</prop>
                <prop key="PropertyCheckInterval">20</prop>
            </props>
        </property>
    </bean>

    Here is the rest of the JobRepository definition:
    <bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource" />
    </bean>

    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="databaseType" value="oracle" />
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="isolationLevelForCreate" value="ISOLATION_DEFAULT" />
    </bean>

    <bean id="jobExplorer" class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean">
        <property name="dataSource" ref="dataSource" />
    </bean>

    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>

    <bean id="jobParametersIncrementer" class="org.springframework.batch.core.launch.support.RunIdIncrementer" />

    Anyone have any ideas?

    Last edited by vrundab; Nov 19th, 2012 at 08:21 AM.

  2. #2
    Join Date
    Sep 2008
    Location
    Chicagoland, IL
    Posts
    366


    You are right that the bottleneck is the creation of the StepExecutionContexts for each partition. It looks like we could use some batching around the storage of StepExecutionContexts when partitioning. I've created a JIRA issue to track this: https://jira.springsource.org/browse/BATCH-1908.
    Michael Minella
    Spring Batch Lead
    Author - Pro Spring Batch
    http://www.michaelminella.com
    Twitter: @MichaelMinella

  3. #3
    Join Date
    Jan 2013
    Posts
    1

    Any work-around?

    Originally posted by mminella:
    You are right that the bottleneck is the creation of the StepExecutionContexts for each partition. It looks like we could use some batching around the storage of StepExecutionContexts when partitioning. I've created a JIRA issue to track this: https://jira.springsource.org/browse/BATCH-1908.

    Hello, I'm so glad I found the root cause here. Is there any workaround for this issue? Also, roughly how many partitions can Spring Batch handle without running into performance problems? Thanks!
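
One possible workaround, offered only as a sketch and not confirmed anywhere in this thread, is to create fewer partitions. Each partition produces one BATCH_STEP_EXECUTION row, and MultiResourcePartitioner creates one partition per file. Grouping the files into a fixed number of partitions should therefore reduce the number of inserts. The class below is hypothetical and uses plain Java with no Spring dependency. It only shows the grouping logic. The names `FileGrouper`, `group`, and the `partitionN` keys are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spring Batch): splits a list of file names
// into at most gridSize groups so that each group can become one partition.
final class FileGrouper {

    private FileGrouper() {
    }

    // Distributes the files round-robin over min(gridSize, files.size()) groups.
    // Returns an empty map when there are no files.
    static Map<String, List<String>> group(List<String> files, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        int groups = Math.min(gridSize, files.size());
        Map<String, List<String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < groups; i++) {
            partitions.put("partition" + i, new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            partitions.get("partition" + (i % groups)).add(files.get(i));
        }
        return partitions;
    }
}
```

In a real job, a custom implementation of Spring Batch's `Partitioner` interface could call something like this from `partition(int gridSize)` and store each group of file names in that partition's `ExecutionContext`. The worker step would then have to read several files per partition instead of one. With 1,000 files and a grid size of 10, this yields 10 partitions instead of 1,000. Whether that actually removes the slowdown in a given setup would need to be measured.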
