Dec 1st, 2008, 09:59 AM
Should I consider Spring Batch?
I am completely new to Spring Batch but am considering using it along with Quartz to oversee batch processing of XML files.
The XML files need to be converted to domain objects and then persisted using Hibernate.
The basic process I have in place is:
- Get a list of XML files to process from a directory
- For each XML file, convert it (using JDOM) to a domain object (which requires calling a DAO to set certain properties)
- Persist the domain object to the database via a DAO
- Write a flat text file which is referenced in the db record
- Move the file to a success location on the server
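A minimal sketch of that loop, using only the JDK (the standard DOM parser stands in for JDOM here, and the directory names are hypothetical; the DAO/persistence steps are elided):

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlFileProcessor {

    public static void processDirectory(Path inbox, Path success) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // 1. Get the list of XML files to process from the directory
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox, "*.xml")) {
            for (Path file : files) {
                // 2. Parse the file into a DOM (JDOM in the original setup)
                Document doc = builder.parse(file.toFile());
                // 3./4. Convert to a domain object, persist via DAO,
                //       and write the flat text file (omitted in this sketch)
                // 5. Move the file to the success location
                Files.move(file, success.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```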
This process needs to occur every minute. My program is pretty messy at the moment, and the process above is taking around 7 seconds PER XML file, which I feel is way too long.
If anyone has done anything similar, I'd love to hear.
Dec 1st, 2008, 10:29 AM
How large are the xml files (how many records/domain objects)?
Dec 1st, 2008, 10:34 AM
Currently the XML files are under 10k each. Per XML file, I populate a number of domain objects: the first uses most of the elements from the XML, and the others are returned from Hibernate based on some other values in the XML.
Dec 1st, 2008, 10:42 AM
So, you are doing about 10k db inserts as well as 10k+ queries?
Originally Posted by Sprang
Dec 1st, 2008, 10:47 AM
Sorry, 10k in file size.
Initially I might be processing 60 XML files per hour, but long term there is no real way of knowing just how many files I will be dealing with. As our editorial system becomes more widely used within the business, more XML files will need processing.
Dec 1st, 2008, 10:48 AM
Edit: oops, 7 SECONDS, not 7 minutes....
Originally Posted by chudak
If I were you, I'd run my program through a profiler. Otherwise, you are just guessing at what the bottleneck is.
Is your app running in process, or are you launching it as a separate VM every time? What is the network latency? Is your db correctly indexed? Is it a local file write/move or a remote one (e.g. NFS, SMB)?
Last edited by chudak; Dec 1st, 2008 at 10:55 AM.
Dec 1st, 2008, 10:59 AM
Thanks for replying - I should point out the following though!
- I am not using Spring Batch at the moment; it is just a service method that will be triggered via a Quartz cron expression (every minute).
- The file size of each XML file is below 10k.
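For reference, an every-minute trigger can be wired through Spring's Quartz integration along these lines (the bean names and the `processPendingFiles` method are hypothetical; Spring 2.x class names shown):

```xml
<bean id="processJob"
      class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
    <property name="targetObject" ref="xmlProcessingService"/>
    <property name="targetMethod" value="processPendingFiles"/>
    <!-- avoid overlapping runs if one pass takes longer than a minute -->
    <property name="concurrent" value="false"/>
</bean>

<bean id="processTrigger"
      class="org.springframework.scheduling.quartz.CronTriggerBean">
    <property name="jobDetail" ref="processJob"/>
    <!-- fire at second 0 of every minute -->
    <property name="cronExpression" value="0 * * * * ?"/>
</bean>

<bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="triggers">
        <list><ref bean="processTrigger"/></list>
    </property>
</bean>
```

Setting `concurrent` to false matters here: with a 7-second-per-file cost, a single minute's batch can easily overrun the next trigger firing.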
So, if there are 100 XML files which need to be converted and loaded into the DB, then it takes:
100 files × 7s / 60 = nearly 12 minutes.
For each conversion, the following occurs:
- XML file is loaded by JDOM
- 1 main domain object is created from elements within the XML
- 5 basic SQL selects are used to set other properties of main domain object
- 1 flat file is written
- 3 SQL inserts are fired
- XML file is moved to either success or fail directory
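The last step (moving to a success or fail directory) might look like the sketch below; the directory names and the `succeeded` flag are hypothetical stand-ins for whatever the conversion reports:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FileMover {

    /**
     * Move a processed XML file into the success or fail directory,
     * depending on whether conversion and persistence succeeded.
     */
    public static Path moveProcessed(Path file, Path successDir, Path failDir,
                                     boolean succeeded) throws IOException {
        Path target = (succeeded ? successDir : failDir).resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```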
Currently this is just a unit test running locally which processes files from a Samba share; that could be a bottleneck right there. I will use locally stored files for testing.
But my main question was really whether using Spring Batch could offer me any benefits (I know it supports error recovery, etc.); at the same time, I don't want to overcomplicate the project.
Last edited by Sprang; Dec 1st, 2008 at 11:02 AM.
Reason: various updates
Dec 1st, 2008, 12:05 PM
I don't see why you couldn't use Spring Batch; it may be overkill for you, however. Since each file contains a DOM that represents basically one domain object, you can just process the files asynchronously.
Originally Posted by Sprang
Have your cron job scan the directory and, for every file, serialize the file into a JMS message and send it to a JMS queue for processing. The JMS queue can scale the listeners horizontally. The whole thing can scale horizontally if the files you need to process and the directory where you write the output files are mounted on each application server that is running your app. That's how we are doing something similar.
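That architecture could be wired with Spring's JMS support roughly as follows (the queue name, listener bean, and consumer count are hypothetical; the broker's `connectionFactory` is assumed to be defined elsewhere):

```xml
<!-- Producer side: the cron job sends each file's content to a queue -->
<bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
    <property name="connectionFactory" ref="connectionFactory"/>
    <property name="defaultDestinationName" value="xml.process.queue"/>
</bean>

<!-- Consumer side: scale by raising concurrentConsumers,
     or by running the same container on more app servers -->
<bean class="org.springframework.jms.listener.DefaultMessageListenerContainer">
    <property name="connectionFactory" ref="connectionFactory"/>
    <property name="destinationName" value="xml.process.queue"/>
    <property name="messageListener" ref="xmlFileListener"/>
    <property name="concurrentConsumers" value="5"/>
</bean>
```

With 7 seconds per file, even five concurrent consumers would bring a 100-file backlog down from roughly 12 minutes to under 3.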
Dec 2nd, 2008, 04:33 AM
Me too, me too!
This is kind of creepy: I started looking at Spring Batch yesterday for the same setup. I need to process what ultimately becomes an XML file and populate database tables based on that. These files are in the 10s of MB in size, typically with < 50K entries.
I was looking at an online tutorial and liked the idea of using an item reader (e.g. the StAX one), but from what I can tell, the flow if I go that route is to convert my XML to a FieldSet. That is a flat data structure, though, and in my case the XML can convert to many rows in many parent and child tables. I can see where I could just toss the whole processing into a tasklet, but that doesn't seem to take advantage of a lot of the infrastructure Spring Batch can provide.
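One pattern worth a look: the StAX reader does not have to go through a FieldSet. `StaxEventItemReader` can hand each XML fragment to a Spring OXM `Unmarshaller`, so the reader emits full domain objects (with their children), and an ItemWriter persists the parent and child rows together. A rough, hypothetical configuration (the `order` element and `com.example.Order` class are made up, and the property names here should be checked against the Spring Batch version you use):

```xml
<bean id="xmlReader" class="org.springframework.batch.item.xml.StaxEventItemReader">
    <property name="resource" value="file:input/orders.xml"/>
    <!-- each <order> fragment becomes one item -->
    <property name="fragmentRootElementName" value="order"/>
    <property name="unmarshaller" ref="orderUnmarshaller"/>
</bean>

<bean id="orderUnmarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="classesToBeBound" value="com.example.Order"/>
</bean>
```

Because the reader streams fragment by fragment, a 10s-of-MB file never has to be held in memory whole, which a single tasklet doing its own DOM parse would likely do.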
So... is there a methodology, a pattern, that I should be looking at? One last thing (of course, always one last thing): I need to process both asynchronously and synchronously. I assume Spring Batch is happy with either, yes?