Nov 25th, 2012, 02:02 PM
I am using spring batch and am facing problems with respect to the times taken for writing records from database to file.
Please find below the timings and the configurations.
To read from the database and write 1,400,000 records took 17 hours, and writing 7,400,000 records took 48 hours.
1) Trying to write 2 files in parallel (in 2 steps). Later I would run 14 steps in parallel, each step writing huge chunks of data, around the same volume as above.
2) Commit interval of 10,000.
3) Throttle limit of 2.
4) FlatFileItemWriter to write the files.
5) The database, app tier and file store are at different locations.
6) Have only a 2-CPU machine (64-bit) for now but will be running on a 64-CPU machine.
7) Heap size is set to 3 GB.
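A simplified sketch of the configuration above (bean ids, file paths and readers are placeholders, not my actual config):

```xml
<!-- Two steps run in parallel via a split; each writes one file.
     Names, paths and referenced beans are illustrative placeholders. -->
<batch:job id="extractJob">
    <batch:split id="parallelSplit" task-executor="taskExecutor">
        <batch:flow>
            <batch:step id="writeFile1">
                <batch:tasklet throttle-limit="2">
                    <batch:chunk reader="reader1" writer="writer1"
                                 commit-interval="10000"/>
                </batch:tasklet>
            </batch:step>
        </batch:flow>
        <batch:flow>
            <batch:step id="writeFile2">
                <batch:tasklet throttle-limit="2">
                    <batch:chunk reader="reader2" writer="writer2"
                                 commit-interval="10000"/>
                </batch:tasklet>
            </batch:step>
        </batch:flow>
    </batch:split>
</batch:job>

<!-- File writer; the file store is on a remote location. -->
<bean id="writer1"
      class="org.springframework.batch.item.file.FlatFileItemWriter">
    <property name="resource" value="file:/data/out/extract1.dat"/>
    <property name="lineAggregator" ref="lineAggregator"/>
</bean>
```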
Can you please help by suggesting any performance changes to reduce the times for the 2 files that I am writing?
Thanks a lot in advance.
Nov 26th, 2012, 01:04 PM
What is your disk infrastructure?
Nov 26th, 2012, 07:46 PM
I guess your process has three parts: ItemReader, ItemProcessor, ItemWriter.
How long does each part take?
And do you use a LineAggregator in the FlatFileItemWriter?
Could this information point the way?
A paged select SQL can take a long time, and a complex LineAggregator also takes a long time.
And I have one more question.
Is your paging SQL correct?
If the paging SQL is not correct, duplicate records can be selected any number of times.
I'm sorry if I misunderstood your question.
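For example, a JdbcPagingItemReader only produces stable pages when the sort key is unique; a sketch with hypothetical table and column names:

```xml
<!-- Hypothetical example: sorting on a non-unique column such as
     CREATED_DATE can repeat or skip rows across page boundaries.
     Sorting on the unique primary key gives deterministic pages. -->
<bean id="pagingReader"
      class="org.springframework.batch.item.database.JdbcPagingItemReader">
    <property name="dataSource" ref="dataSource"/>
    <property name="queryProvider">
        <bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
            <property name="dataSource" ref="dataSource"/>
            <property name="selectClause" value="select ID, NAME"/>
            <property name="fromClause" value="from ORDERS"/>
            <!-- sort on the unique primary key, not CREATED_DATE -->
            <property name="sortKey" value="ID"/>
        </bean>
    </property>
    <property name="pageSize" value="10000"/>
    <property name="rowMapper" ref="orderRowMapper"/>
</bean>
```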
Nov 30th, 2012, 06:26 AM
Thanks a lot for the responses. I looked at the box that we were using and, as mentioned before, mine was a 2-CPU machine writing 2 files with 2 parallel steps. I shifted to a better box, i.e. a 64-CPU box, and could write the same volumes of data as mentioned in my earlier post in 5 hours and 45 minutes. I think it's got to do with the way threads are scheduled by the CPUs and how they are prioritized. With multiple CPUs, the threads created by other processes do not block the extract-creation threads.
I have one more question, and that is with respect to fetch size. I had changed the fetch size to be the same as my commit interval, i.e. 10,000, but that does not seem to give any better times. The default fetch size, from what I read, is 10. So even after changing the fetch size, the times that I am getting are the same. Does playing with fetch size really help with large volumes of data?
Thanks once again.
Nov 30th, 2012, 10:30 AM
It can make a difference on large volumes of data. Which ItemReader you're using to access the database will affect how much of a difference it makes. For example, for every page fetched using the JdbcPagingItemReader a new SQL query is executed, so if you have a larger page, you run fewer queries. Obviously this operation doesn't occur in a vacuum, though, and all aspects of your job need to be looked at to determine the true performance impact.
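For instance, if you use the cursor-based JdbcCursorItemReader instead, there is only one query and the JDBC fetch size is the knob to tune (table, column and bean names here are illustrative):

```xml
<!-- Illustrative only: with a cursor-based reader a single query is
     executed, and fetchSize controls how many rows each network round
     trip brings back; many drivers default to a very small fetch size. -->
<bean id="cursorReader"
      class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="dataSource"/>
    <property name="sql" value="select ID, NAME from ORDERS order by ID"/>
    <property name="fetchSize" value="10000"/>
    <property name="rowMapper" ref="orderRowMapper"/>
</bean>
```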