Apr 14th, 2008, 09:47 AM
BatchUpdate in MySql - Performance
I am reading from a file and writing to a database. As I am building a prototype with Spring Batch, I am using the tradeDAO sample that ships with the Spring Batch samples.
I am using MySQL and have about 50,000 records in my file. Just reading all the data in the file takes about 10-12 minutes, which I believe is long.
Reading the records and inserting them into the MySQL database takes 40 minutes.
This is just sample data; eventually we will get millions of records.
My question: is this the normal performance I can expect from Spring Batch, or is there a fine-tuning mechanism available? If so, how much of a performance improvement could I expect?
Currently the whole batch process on the mainframe completes in 1 hr 30 minutes (including the file read and other business logic processing for millions of records). I am a little concerned about the performance here.
Any help or input to improve performance would be GREAT.
Apr 14th, 2008, 10:16 AM
Make sure you configure the commit interval property on the step to some reasonable value (it is 1 by default, which means metadata is saved after processing each item).
Next, you can consider writing to the database using batch updates (see batchUpdateJob in the samples).
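To make the effect of the commit interval concrete, here is a minimal plain-Java sketch (not Spring Batch code; the class and method names are made up for illustration). It only counts how many chunk boundaries, i.e. commits, a run of 50,000 records would need at a given interval, assuming one commit per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration class; it does not use any Spring Batch API.
class CommitIntervalSketch {

    /**
     * Hands items to chunkWriter in groups of commitInterval and returns
     * the number of groups written. Each group stands in for one commit.
     */
    public static <T> int processInChunks(Iterable<T> items,
                                          int commitInterval,
                                          Consumer<List<T>> chunkWriter) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        List<T> chunk = new ArrayList<>(commitInterval);
        int commits = 0;
        for (T item : items) {
            chunk.add(item);
            if (chunk.size() == commitInterval) {
                // A real job would write the chunk and commit here.
                chunkWriter.accept(new ArrayList<>(chunk));
                chunk.clear();
                commits++;
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, partially filled chunk.
            chunkWriter.accept(new ArrayList<>(chunk));
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            records.add(i);
        }
        int withOne = processInChunks(records, 1, chunk -> { });
        int withHundred = processInChunks(records, 100, chunk -> { });
        System.out.println("commitInterval=1   -> " + withOne + " commits");
        System.out.println("commitInterval=100 -> " + withHundred + " commits");
    }
}
```

With an interval of 1 every record pays the cost of its own commit; with 100 the same 50,000 records need only 500. The actual time saved depends on the database and driver, so treat the counts as the mechanism rather than a prediction.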
Apr 14th, 2008, 11:36 AM
Thanks for the input!! I set the commitInterval to 100 and it improved the performance dramatically! From 40 minutes it came down to somewhere between 5 and 6 minutes. (This is without BatchUpdate, though.)
Apr 14th, 2008, 02:02 PM
Some performance tests I recorded with Spring Batch 1.0:
Test case: transfer data from a table (with 23 columns) in one database to another database. No business logic was applied.
Repository database: Apache Derby
Number of rows processed: 110662
The Sybase bcp tool was used to run the same test case from one Sybase database to another Sybase database. 110662 rows were transferred in 15.68 sec.
Obviously bcp was very fast at transferring data from one Sybase database to another. What Spring Batch provides is object interaction; it is not a bulk copy tool. Business processes can be easily integrated with Spring Batch using simple Java objects.
Apr 14th, 2008, 08:04 PM
I would say that's a pretty fair assessment. Spring Batch isn't trying to replace a bulk data-load tool. As you said, it allows for applying business logic in Java, rather than in some scripting language with an ETL tool, or not at all without one. One caveat I would add is that the status tables provide a nice advantage if you have a lot of batch processes running. They give you a consistent place to look (one table instead of many tables and/or many log files) for the status of a process. Even if you use a simple data-load tool such as bcp or SQL*Loader, you might see an advantage from calling such a tool from a Tasklet, so that you could still see what time it was kicked off, whether it completed successfully, etc., in a consistent way across multiple processes.
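As a rough sketch of that last idea, the plain-Java class below launches an external command and records when it started, when it finished, and its exit code. It deliberately does not implement the Spring Batch Tasklet interface; the class names are made up, and `java -version` stands in for a real bcp or SQL*Loader invocation. A Tasklet could wrap something like this and hand the result to the framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Spring Batch.
class ExternalToolRunSketch {

    /** What we want to see afterwards: start time, end time, exit code. */
    static final class RunRecord {
        final Instant startedAt;
        final Instant finishedAt;
        final int exitCode;

        RunRecord(Instant startedAt, Instant finishedAt, int exitCode) {
            this.startedAt = startedAt;
            this.finishedAt = finishedAt;
            this.exitCode = exitCode;
        }

        /** Assumes the tool follows the usual "0 means success" convention. */
        boolean succeeded() {
            return exitCode == 0;
        }
    }

    /** Runs the command, waits for it to finish, and returns the record. */
    public static RunRecord run(List<String> command) {
        Instant startedAt = Instant.now();
        try {
            Process process = new ProcessBuilder(command)
                    .inheritIO() // let the tool's output go to our console
                    .start();
            int exitCode = process.waitFor();
            return new RunRecord(startedAt, Instant.now(), exitCode);
        } catch (IOException e) {
            throw new UncheckedIOException("could not start " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in command: the JVM's own launcher. Replace with the real tool.
        String javaLauncher = System.getProperty("java.home") + "/bin/java";
        RunRecord record = run(Arrays.asList(javaLauncher, "-version"));
        System.out.println("started:   " + record.startedAt);
        System.out.println("finished:  " + record.finishedAt);
        System.out.println("succeeded: " + record.succeeded());
    }
}
```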
Apr 14th, 2008, 10:21 PM
I will emphasize these points too in the presentation that I will be giving in the next few days.
To everyone on the Spring Batch team:
Thanks for the great documentation.