Can I scale a batch job by using distributed steps?
My use case is a batch job that consists of steps. The first step reads input items and, for each item, saves the downloaded result to a file.
The second step reads the items back from those files, processes them, and writes the results to a database.
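For concreteness, here is a minimal sketch of how the job is currently laid out, assuming Spring Batch's Java DSL (Spring Batch 5 builders); the bean names, item types, and chunk size are placeholders, not my actual code:

```java
// Sketch of the current two-step job. All reader/processor/writer beans
// and item types (InputItem, File, Record) are placeholders.
@Bean
public Job downloadAndProcessJob(JobRepository jobRepository,
                                 PlatformTransactionManager txManager) {
    Step downloadStep = new StepBuilder("downloadStep", jobRepository)
            .<InputItem, File>chunk(10, txManager)
            .reader(inputItemReader())       // reads the input items
            .processor(downloadProcessor())  // downloads each item's result
            .writer(localFileWriter())       // saves results to local files
            .build();

    Step processStep = new StepBuilder("processStep", jobRepository)
            .<File, Record>chunk(10, txManager)
            .reader(downloadedFileReader())  // reads items back from the files
            .processor(recordProcessor())    // processes them
            .writer(jdbcWriter())            // writes results to the database
            .build();

    return new JobBuilder("downloadAndProcessJob", jobRepository)
            .start(downloadStep)
            .next(processStep)
            .build();
}
```

The problem below comes from the fact that `localFileWriter()` and `downloadedFileReader()` both assume a local filesystem.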
Now I want to be able to scale it. However, the scaling solution, remote chunking, only allows delegating the chunk processing of a single step to remote slaves. That is a problem in my case: if the chunk processing of the first step is done on the slaves, the downloaded files end up saved on different slaves. And if I then use the remote chunking pattern for step 2 as well, it won't be able to correctly pick up those files, because its processing might be delegated to different slaves than the ones that downloaded them.
Is there any way I can use remote chunking not just for a single chunk processor but for a series of steps? For example, after a slave finishes processing its chunk of items, could it run the next steps of the job as well? Theoretically I could programmatically trigger a job containing those steps on the slave side, but that would create a separate job entry.
For now, the only solution I can think of is to write the data to a message queue instead of local files. But I would prefer a solution that doesn't require changing the program much.
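The queue-based alternative I'm considering would look roughly like this, assuming Spring Batch's `AmqpItemWriter`/`AmqpItemReader` and a shared broker such as RabbitMQ (the bean names and the `DownloadedResult` type are placeholders):

```java
// Rough sketch of the alternative: step 1 writes downloaded results to a
// shared queue instead of local files, so any slave can pick them up in
// step 2. Assumes a configured AmqpTemplate bean; names are placeholders.
@Bean
public AmqpItemWriter<DownloadedResult> queueWriter(AmqpTemplate amqpTemplate) {
    // replaces the local file writer in step 1
    return new AmqpItemWriter<>(amqpTemplate);
}

@Bean
public AmqpItemReader<DownloadedResult> queueReader(AmqpTemplate amqpTemplate) {
    // replaces the file reader in step 2
    AmqpItemReader<DownloadedResult> reader = new AmqpItemReader<>(amqpTemplate);
    reader.setItemType(DownloadedResult.class);
    return reader;
}
```

This would decouple the two steps from any slave's local filesystem, but it means touching both the step 1 writer and the step 2 reader, which is the change I was hoping to avoid.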
Thank you for your help.