Sep 8th, 2011, 02:31 AM
Parallel processing of XML files
I am new to the Spring Framework and to Spring Batch as well. I am working on a project where I need to come up with a design, based on Spring Batch, that can do the following:
- Collect XML files from one server and transfer them to another server
- Parse each XML file and load its data into a database.
We have multiple XML files, and the above steps need to be done for each XML file in parallel.
Can someone point me to a sample program that demonstrates similar functionality? Mainly I am looking for sample code that demonstrates running multiple tasks in parallel threads.
Thanks for your help in advance.
Sep 8th, 2011, 10:34 AM
Sorry for my original post; I had intended to reply to another thread. Look here for an example of running chunk processing in parallel. It does not fit your needs exactly, since you want to process files, not records, in parallel. But do you really need this? Imagine that one file is much bigger than the others: the thread that processes it may keep running inefficiently on one CPU, whereas with parallel chunks all CPUs will be involved (if you set the number of threads equal to the number of CPUs).
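To make the "parallel chunks" idea concrete, here is a minimal plain-Java sketch rather than Spring Batch configuration. The class and method names (ParallelChunkSketch, processInChunks, processChunk) are hypothetical, and processChunk is only a placeholder for the real work. The records are split into chunks and spread over one thread per CPU, so a large input keeps every core busy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from Spring Batch.
class ParallelChunkSketch {

    // Splits the records into chunks of chunkSize, processes the chunks on a
    // pool with one thread per available CPU, and returns the record count.
    static int processInChunks(List<String> records, int chunkSize) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < records.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, records.size());
                List<String> chunk = records.subList(start, end);
                results.add(pool.submit(() -> processChunk(chunk)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // wait for each chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the real work, e.g. writing the chunk to a database.
    static Integer processChunk(List<String> chunk) {
        return chunk.size();
    }
}
```

Calling ParallelChunkSketch.processInChunks(records, 100) would process the list in chunks of 100 records; the chunk size is an arbitrary choice here.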
Last edited by dma_k; Sep 8th, 2011 at 10:43 AM.
Sep 8th, 2011, 04:52 PM
Originally Posted by dma_k
Thanks for your reply. Yes, I really need to process multiple files in multiple processes, since the system is going to get multiple files from an upstream application and I see no benefit in processing them sequentially. Do you believe Spring can provide that functionality, or would it be better to develop a customized batch framework using a thread pool and JMS?
Also, I am not planning to process all the files together. If the system is going to get ~50 files, then I am planning to have ~10 threads running, and as soon as any one thread is free it will start processing another file. Another reason to have such flexibility is that the system will not receive all the files together from the upstream application; they will arrive in random order.
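The "~10 threads for ~50 files" arrangement described above can be sketched with a fixed thread pool from the JDK. This is only an illustration under assumptions: FilePoolSketch, onFileArrived and parseAndLoad are hypothetical names, and parseAndLoad stands in for the real parse-and-load step. Files that arrive while all threads are busy simply wait in the pool's queue until a thread is free.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a bounded pool that accepts files as they arrive.
class FilePoolSketch {
    private final ExecutorService pool;
    private final AtomicInteger processed = new AtomicInteger();

    FilePoolSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Called whenever a file arrives, in any order; the task is queued
    // and picked up as soon as one of the pool threads is free.
    void onFileArrived(String fileName) {
        pool.execute(() -> parseAndLoad(fileName));
    }

    // Placeholder for parsing the XML file and loading it into the database.
    private void parseAndLoad(String fileName) {
        processed.incrementAndGet();
    }

    // Stops accepting new files, waits for the queued ones to finish,
    // and returns how many files were processed.
    int shutdownAndCount() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for files");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

With new FilePoolSketch(10), at most ten files are processed at any moment, however many have been submitted.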
Sep 9th, 2011, 06:24 AM
In this case I would define parallel steps, each referring to the same XMLFileProvider class, which hands out a new XML file on request. Each step should acquire / open / read / close an XML file (this breaks the reader API, but is doable) and exit when XMLFileProvider signals that there are no more files. Perhaps somebody can advise a better solution.
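A rough plain-Java sketch of such a shared provider follows. It is not Spring Batch code: XmlFileProviderSketch, nextFile and runWorkers are hypothetical names, threads stand in for the parallel steps, and the open / read / close work is reduced to a placeholder. The point is only that a thread-safe queue lets several workers acquire files without handing out the same file twice.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the XMLFileProvider idea: a thread-safe source of file names.
class XmlFileProviderSketch {
    private final Queue<String> pending;

    XmlFileProviderSketch(Collection<String> fileNames) {
        this.pending = new ConcurrentLinkedQueue<>(fileNames);
    }

    // Returns the next file name, or null to signal that no files are left.
    String nextFile() {
        return pending.poll();
    }

    // Starts workerCount workers that each keep acquiring files until the
    // provider is empty, then returns the names of all processed files.
    static Set<String> runWorkers(XmlFileProviderSketch provider, int workerCount) {
        Set<String> done = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                String file;
                while ((file = provider.nextFile()) != null) {
                    done.add(file); // placeholder for open / read / close / load
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return done;
    }
}
```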
Sep 9th, 2011, 08:47 AM
You can also look at partitioning. If you need to process 10 files, you could create 10 partitions.
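As a rough illustration of "one partition per file", the plain-Java sketch below builds a named context for each file. In Spring Batch this role is played by a Partitioner implementation that returns one ExecutionContext per partition; here nested maps only mimic that shape, and FilePartitionSketch, the "partition" prefix and the "fileName" key are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: maps stand in for Spring Batch execution contexts.
class FilePartitionSketch {

    // Creates one partition per file, e.g. "partition0" -> {"fileName": "a.xml"}.
    static Map<String, Map<String, String>> partition(List<String> fileNames) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < fileNames.size(); i++) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put("fileName", fileNames.get(i)); // the file this partition handles
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```

Each worker would then read the file name from its own context and process only that file.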