Oct 29th, 2009, 12:14 PM
Framework performance after using it quite a while
I haven't found a similar thread, so I'm starting a new one.
We have a problem when running Spring Batch jobs over a long period. With the database in its initial state - all tables empty - our stopwatch metrics show that Spring Batch does its own work in 7 seconds, while the whole job runs for about 2 minutes.
We then launch ~30,000 jobs and re-take the same metrics. Now Spring Batch's overhead is about 190 seconds and the whole job takes about 5 minutes.
The number of jobs grows by about 2,000 per week, so the performance degradation becomes visible after 2 months of usage, and after 5 months Spring Batch will spend more time on its own bookkeeping than on running the job itself.
Any ideas what's the deal?
Our current plan is to override StepExecutionDao and ExecutionContextDao so they don't bother the database with unnecessary saves, since we don't use restartable jobs. How does that sound?
Oct 29th, 2009, 01:21 PM
What makes you think it has anything to do with the database? Did you look at the query plans? Can you provide details? There is no point guessing with optimisation: you have to measure something before it's worth making any changes.
Originally Posted by Eduard Dudar
Oct 29th, 2009, 01:42 PM
Let's say the overall job consists of 2 parts:
- the framework processing
- application code
We measured both of them on an empty database and on one loaded with 30,000 finished jobs. The results, as already mentioned in the original post, are the following:
Empty BATCH_ tables:
- batch: ~7 s
- application code: ~110 s
With 30000-jobs-filled BATCH_ tables:
- batch: ~190 s
- application code: ~110 s
Application code is the same, job and parameters are the same and incoming data is the same in all cases.
Furthermore, we observed that the most time-consuming query is:
UPDATE BATCH_EXECUTION_CONTEXT set TYPE_CD = 'LONG', STRING_VAL = null, DOUBLE_VAL = 0.0, LONG_VAL = 1, OBJECT_VAL = null where EXECUTION_ID = 28749 and KEY_NAME = '...'
Oct 29th, 2009, 04:36 PM
It might also be worth looking at your commit intervals. I had one job that originally had a commit-interval of 2 and it was taking about 75 minutes to run. As it turns out, restartability wasn't really a concern, so I upped the commit-interval to 50 and it now takes about 3 minutes. (So, the other 72 minutes were taken up by updates to Spring Batch tables - mostly updating the step context).
Oct 29th, 2009, 04:59 PM
Thanks for the advice, but that also means your job commits its results every 50 processed items. That is not acceptable in our case because of strict item-by-item processing: if the last item fails, we don't want to roll back all 49 items processed before the failure.
Oct 29th, 2009, 06:27 PM
I just want to make sure that your requirements really do require a commit-interval=1. You are right that if your commit interval is 50 and record #50 fails, the whole chunk will roll back, but on restart processing will start again at record #1, so it's not like the data will be inconsistent. Moreover, if record #50 skips, then Spring Batch will roll back the chunk and then automatically reprocess records #1-49 before committing and moving on to record #51. So, unless there's some further reason why you would need a commit-interval=1, you should probably increase the interval to get better performance.
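For reference, the commit interval is set on the chunk element of a chunk-oriented step. A minimal sketch using the Spring Batch 2.x XML namespace (the job, step, and reader/writer bean ids here are placeholders, not names from this thread):

```xml
<!-- Illustrative step definition; "itemReader" and "itemWriter"
     are assumed bean ids for your own reader/writer beans. -->
<job id="sampleJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="sampleStep">
        <tasklet>
            <!-- commit every 50 items instead of after each item -->
            <chunk reader="itemReader" writer="itemWriter"
                   commit-interval="50"/>
        </tasklet>
    </step>
</job>
```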
Oct 30th, 2009, 10:01 AM
Which DB platform are you using? Based on the information in this thread, it appears that your DB platform is not scaling very well when the record count in your BATCH_EXECUTION_CONTEXT table starts to reach ~30,000.
On the DB2 platform, this table is created without any indexes (it is not referenced by a foreign key constraint defined on other tables, nor does it have a unique constraint or primary key defined on it). Thus, performance for the mentioned "update" statement will be great while the table contains very few records, but it will progressively deteriorate as the table grows. This is exactly what you are experiencing.
In strict database terms, this problem can be resolved in at least the following 2 ways:
1. Add an index on the table to support the most frequent and expensive SQL that you will be running. You could start out with an index on (EXECUTION_ID, KEY_NAME) as that is the key needed for the "update" statement mentioned.
2. Periodically, purge the data from your Spring Batch tables to keep them to a reasonable size.
I believe it would be a good idea to do both of the suggestions above.
It would be nice if the example DDL scripts that create the Spring Batch tables included at least one index for each table.
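To make suggestion #1 concrete, the index could be created like this (table and column names are taken from the UPDATE statement quoted earlier in the thread; the index name is arbitrary and the exact DDL syntax may vary by platform):

```sql
-- Supports: UPDATE BATCH_EXECUTION_CONTEXT ...
--           WHERE EXECUTION_ID = ? AND KEY_NAME = ?
CREATE INDEX BATCH_EXEC_CTX_IX1
    ON BATCH_EXECUTION_CONTEXT (EXECUTION_ID, KEY_NAME);
```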
Nov 2nd, 2009, 03:40 PM
Well, since there is no native solution, we have a couple of ways to work around the problem. We are going to experiment with the commit-interval and clean up the tables regularly. Indexes on the heavily used columns may also help, but I'm quite skeptical here, because the cost of maintaining the index on every write may cancel out the improvement gained by faster lookups.
We'll keep you posted when we have results to share, so they may help the community.
Sep 6th, 2010, 03:58 AM
Purge BATCH_* tables data
Does the Spring Batch framework itself provide any solution for purging data from those Spring Batch tables?
It would be nice to have an out-of-the-box purging mechanism (through job-context XML settings?).
Thanks & Regards,
Clelio de Souza
Sep 6th, 2010, 11:26 AM
There is a test utility (JobRepositoryTestUtils, with some bugs just fixed) that you can use to delete data from the relational tables, but no plans currently to make it a core feature.
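Until something like that exists, a periodic purge can be done with plain SQL. A rough sketch against the Spring Batch 2.x default schema (table names are the framework's defaults and may differ in older schema versions; run only while no jobs are executing, and add date predicates on the execution tables if you want to keep recent history):

```sql
-- Delete child rows before parent rows to satisfy the
-- foreign key constraints between the BATCH_* tables.
DELETE FROM BATCH_STEP_EXECUTION_CONTEXT;
DELETE FROM BATCH_STEP_EXECUTION;
DELETE FROM BATCH_JOB_EXECUTION_CONTEXT;
DELETE FROM BATCH_JOB_EXECUTION;
DELETE FROM BATCH_JOB_PARAMS;
DELETE FROM BATCH_JOB_INSTANCE;
```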