Feb 26th, 2013, 12:55 AM
Batch Performance using Spring Batch
I've been trying to search online for performance issues with Spring Batch, but can't really find anything substantial.
What angle should we be looking at in terms of performance?
- Spring Batch Framework?
- File loading/reading mechanism?
- Network? Connection to database or server?
- Hardware? Server Capacity?
In terms of performance tuning, which are the common areas that require tuning?
Feb 26th, 2013, 01:55 AM
Could you be a bit more specific? Your question is quite generic and as such hard to answer. Do you have a problem? If so, describe it.
It could be your database (indexes should be switched off before batch processing and re-enabled afterwards). Chunks that are too small generate too many transactions; chunks that are too large, combined with Hibernate, can cause a lot of excessive dirty checking. If you are reading over FTP, it could be the network (latency is generally the problem). Memory issues: too little memory leads to excessive garbage collection...
So there can be a whole lot of issues.
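To put rough numbers on the chunk size point, here is a minimal sketch in plain Java. It assumes one transaction commit per chunk, and ChunkMath and commits are hypothetical names made up for illustration, not part of Spring Batch.

```java
// Hypothetical helper, not part of Spring Batch: estimates how many
// chunk transactions a step commits for a given chunk size,
// assuming one commit per chunk.
class ChunkMath {

    // Ceiling division: a final partial chunk still needs its own commit.
    static long commits(long items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (items + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        // Print the estimated commit count for a few candidate chunk sizes.
        for (int size : new int[] {10, 100, 1_000, 10_000}) {
            System.out.println("chunk size " + size + " -> "
                    + commits(records, size) + " commits");
        }
    }
}
```

The estimate only counts commits. The other side of the trade-off (larger chunks holding more objects in the session and in memory) has to be measured on the actual job.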
Feb 26th, 2013, 03:52 AM
Thanks Marten for replying. Agreed that the question is generic, but the audience is not very technical or has little knowledge of Spring Batch.
I'm trying to ascertain, when a user raises a concern about performance while executing a series of batch programs that process millions of records, which area is likely to cause the drop in performance. Will the Spring Batch Framework be the cause, or other areas such as the network, server, database, etc.?
I'm trying to explain to management that using the Spring Batch Framework (SBF) will not in itself affect performance, and that slowdowns are usually caused by external factors such as the network, server, database, or the business logic in the batch programs, not the SBF.
Feb 28th, 2013, 05:00 PM
Other than the database and external resources, some of the performance bottlenecks could be reading from sources like a flat file or an unindexed data source, and writing to a file. But these are problems that could plague any batch processor and are not specific to Spring Batch. In addition, Spring Batch provides ways of mitigating some of them, such as partitioning the read by splitting the file or doing a partitioned read on a database. It also provides remote chunking as a strategy, where you can split the workload among multiple remote slaves when the sink (the writer) is slow, for example when it makes web service calls. In short, as you mentioned, most performance issues depend on external sources, and Spring Batch gives us some options for dealing with them.
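To illustrate the partitioned database read, here is a rough sketch in plain Java of how a numeric key range might be cut into contiguous slices, one per partition. RangeSplitter, split, minId, maxId and gridSize are hypothetical names chosen for this example. In a real job, logic like this would sit behind Spring Batch's partitioning support rather than in a standalone class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch: splits an inclusive id range
// into at most gridSize contiguous, non-overlapping sub-ranges.
class RangeSplitter {

    static List<long[]> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize > 0 and maxId >= minId");
        }
        // Slice width, rounded up so the slices cover the whole range.
        long targetSize = (maxId - minId) / gridSize + 1;
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += targetSize) {
            // Clamp the last slice so it never runs past maxId.
            long end = Math.min(start + targetSize - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each slice could back one partition, e.g. WHERE id BETWEEN start AND end.
        for (long[] r : split(1, 1_000_000, 4)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Even slices only balance the work if the ids are spread fairly uniformly. With large gaps in the key range, some partitions would get far fewer rows than others.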