I have a scenario and am confused about which type of job to use.
Below is the scenario.
I have around 10 (could be more) input files, and all of them need to be processed and written to one output file. I am using MapJobRepositoryBean as my job repository, so no database is used.
I am planning to do the processing in parallel to save time, but I don't know which of the options below performs best.
1) MultiResourceItemReader (synchronizing its methods) with an AsyncTaskExecutor.
2) A partitioned job.
Can someone suggest the best of these as far as performance is concerned, or let me know if any other approach is advisable?
Thanks in advance.
Are all 10 files going through the same logic, or is there different logic per file?
Multiple readers won't help, since you have one hard drive; reading the files one by one should give the best read performance.
Likewise, writing a file with more than one writer won't help: your hard drive can only write to one place at a time.
If your job is CPU-light, you will be fine with one thread all the way through. If you are doing lots of heavy CPU work per line, I would suggest:
1 File reader -> Lots of worker threads -> 1 File Writer
I would use something like a LinkedBlockingQueue to hand rows off to and from the worker threads, in blocks of 1000 lines per block.
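The hand-off the answer describes could be sketched like this (a minimal, self-contained example: the "file" is just a list of strings, the block size and the upper-casing "processing" are made-up stand-ins, and the queue capacities are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// 1 reader -> N workers -> 1 writer, decoupled by blocking queues.
public class QueueHandOff {
    static final List<String> POISON = new ArrayList<>(); // end-of-input marker

    public static List<String> run(List<String> lines, int blockSize, int workers)
            throws Exception {
        BlockingQueue<List<String>> inQueue = new LinkedBlockingQueue<>(10);
        BlockingQueue<String> outQueue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Workers: take a block, process each line, push results to the output queue.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        List<String> block = inQueue.take();
                        if (block == POISON) {
                            inQueue.put(POISON); // let the other workers see it too
                            return null;
                        }
                        for (String line : block) {
                            outQueue.put(line.toUpperCase()); // stand-in "processing"
                        }
                    }
                } catch (InterruptedException e) {
                    return null;
                }
            });
        }

        // Single reader: chop the input into blocks and feed the input queue.
        List<String> block = new ArrayList<>();
        for (String line : lines) {
            block.add(line);
            if (block.size() == blockSize) {
                inQueue.put(block);
                block = new ArrayList<>();
            }
        }
        if (!block.isEmpty()) inQueue.put(block);
        inQueue.put(POISON);

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        // Single "writer": drain results (order across blocks is not guaranteed).
        List<String> out = new ArrayList<>();
        outQueue.drainTo(out);
        return out;
    }
}
```

Note the bounded input queue: if the workers fall behind, the reader blocks instead of loading the whole file into memory.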
How can I implement lots of worker threads in a Spring Batch job?
Do they go in the processor?
Your processor still handles one line at a time. Keep it simple: it should process just one line and store no local state.
You would just make a step:
Read data from queue -> process data -> write to output queue
And multi-thread this step.
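A stateless processor in this design is just a pure line-in, line-out function. A sketch (shown as a plain class so it stands alone; in Spring Batch terms it would implement ItemProcessor<String, String>, and the CSV-to-TSV transformation is a made-up example):

```java
// A stateless line processor: one line in, one line out, no shared state,
// so any number of worker threads can call it safely.
public class LineProcessor {
    public String process(String line) {
        // Hypothetical transformation: trim the line and tab-join its CSV fields.
        String[] fields = line.trim().split(",");
        return String.join("\t", fields);
    }
}
```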
Look at the sample "parallelJob.xml". You wrap your chunk like this:
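A sketch of the wrap, loosely based on the Spring Batch parallelJob.xml sample (the bean ids, throttle limit, and commit interval are placeholders):

```xml
<!-- Chunk-oriented step driven by a TaskExecutor; ids are hypothetical. -->
<step id="processLines">
    <tasklet task-executor="taskExecutor" throttle-limit="8">
        <chunk reader="queueReader" processor="lineProcessor"
               writer="queueWriter" commit-interval="1000"/>
    </tasklet>
</step>

<bean id="taskExecutor"
      class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
```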
And if that taskExecutor is a multi-threaded TaskExecutor, it will multi-thread your step.