Multi-threading and long-running transactions
I am working on improving the performance of an existing Spring Batch implementation. The current batch works on a fairly large data set (about 100k rows) and creates one PDF document per row.
My environment: the batch runs in a clustered WebSphere setup in the context of a WorkManager thread. It calls EJBs to retrieve and write the data.
It currently works, although slowly, because it creates one document after another. The production setup has three separately deployed PDF-creation services, so multithreading will definitely speed up processing. The batch also executes on only one of the cluster members, which means that from a batch perspective the cluster provides only failover and no load balancing.
My question is which approach to take. I have looked at partitioning, but it does not seem to be the right approach, because I cannot break the work into batches that are small enough and still keep the number of threads under control. In other words, 100k rows split across 100 threads still means 1000 PDFs in one step, and that is not really feasible because I have to call an EJB service with a transaction timeout of 2 minutes. I suppose I could call the EJB 1000 times from the writer or processor, but that just does not feel right.
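To make the numbers concrete, here is a small sketch of the arithmetic. It assumes a whole partition would be handled inside a single EJB call bound by the 2-minute timeout. The class and method names are made up for illustration.

```java
public class PartitionBudget {

    // Rows each partition receives when the work is split evenly (rounded up).
    static int rowsPerPartition(int totalRows, int partitions) {
        return (totalRows + partitions - 1) / partitions;
    }

    // Average time available per row if all rows share one transaction timeout.
    static double millisPerRow(int rowsInCall, long timeoutSeconds) {
        return timeoutSeconds * 1000.0 / rowsInCall;
    }

    public static void main(String[] args) {
        int rows = rowsPerPartition(100_000, 100);
        System.out.println(rows + " rows per partition");        // 1000 rows per partition
        System.out.println(millisPerRow(rows, 120) + " ms per PDF"); // 120.0 ms per PDF
    }
}
```

So under that assumption each PDF would have to be produced in about 120 ms on average to stay inside the timeout.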
The perfect solution, in my mind, would take the 100k rows and process them 10 at a time in parallel: as soon as one finishes, another starts, so that there are always 10 executing.
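Outside of Spring Batch, the behaviour I am after looks roughly like the plain-Java sketch below: a fixed pool of 10 workers pulls rows from a queue, so a new row starts the moment a worker becomes free. `PdfTask` is a hypothetical stand-in for the EJB call that renders one PDF. I am also aware that in WebSphere the threads should come from the container (the WorkManager) rather than from `Executors`, so this only illustrates the scheduling, not the deployment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedPdfRunner {

    // Hypothetical stand-in for the real EJB call that renders one PDF.
    interface PdfTask {
        void render(int rowId) throws Exception;
    }

    // Runs the task for rows 0..rowCount-1 with at most `parallelism` rows
    // in flight at any time; returns the number of rows that completed.
    static int processAll(int rowCount, int parallelism, PdfTask task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (int row = 0; row < rowCount; row++) {
                final int rowId = row;
                // Submitted rows wait in the pool's queue until a worker is free.
                futures.add(pool.submit(() -> {
                    task.render(rowId);
                    return null;
                }));
            }
            int done = 0;
            for (Future<Object> f : futures) {
                f.get(); // rethrows a failure from render() as ExecutionException
                done++;
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate rendering with a short sleep instead of a real service call.
        int done = processAll(100, 10, rowId -> Thread.sleep(10));
        System.out.println("rendered " + done + " documents");
    }
}
```

Each row here would be its own short EJB call, so no single transaction would come anywhere near the 2-minute timeout.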