Jul 29th, 2008, 09:31 AM
I'm building an application that will read files, process them and store a modified version of each file. Preferably, I would like this application to run until someone stops/kills it. If there are no files to read it should wait until there are new files available.
Do you have any suggestions on a suitable approach? I was thinking of having a main thread which checks for files and kicks off a job when new files are available. However, this would put some "batch job" logic outside the actual job, and I would prefer my runtime environment to be able to start different types of jobs with other input sources. Would it be a good idea to implement a custom reader whose read blocks until it finds a new file to read from?
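The "main thread which checks for files" idea can be sketched with plain JDK classes. This is only a minimal illustration: the class name `NewFilePoller` and its `poll` method are hypothetical and are not part of Spring Batch or Quartz. It simply remembers which paths it has already reported, and it does not handle files that are still being written.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: reports each regular file in a directory exactly once. */
class NewFilePoller {
    private final Path directory;
    private final Set<Path> seen = new HashSet<>();

    NewFilePoller(Path directory) {
        this.directory = directory;
    }

    /** Returns the regular files that have appeared since the previous call. */
    List<Path> poll() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
            for (Path p : stream) {
                // Set.add returns false for paths that were already reported.
                if (Files.isRegularFile(p) && seen.add(p)) {
                    fresh.add(p);
                }
            }
        }
        return fresh;
    }
}
```

A caller would invoke `poll()` on a timer or from a scheduler trigger and launch one job per returned path, which keeps the "wait for files" logic outside the job itself.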
Jul 29th, 2008, 10:13 AM
If I were you, I would use Quartz. It has a built-in 'listener' that polls a file directory and can be used to kick off a Spring Batch job when a file has been dropped.
Jul 29th, 2008, 10:42 AM
That sounds pretty good actually. I guess that listener could send that file as a parameter to the job so the job knows which file it is and can take a backup of it before processing?
Jul 29th, 2008, 08:13 PM
That's what I would do. It would also allow Spring Batch to identify the JobInstances via the file name (since the file names would be JobParameters).
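The identification idea can be modelled in plain Java: the same job name with the same file name maps to the same instance, and a different file name maps to a new one. The classes `InstanceKey` and `InstanceRegistry` and the parameter name `input.file` are hypothetical stand-ins for illustration, not the Spring Batch API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a job instance key: job name plus parameters. */
final class InstanceKey {
    final String jobName;
    final Map<String, String> parameters;

    InstanceKey(String jobName, Map<String, String> parameters) {
        this.jobName = jobName;
        this.parameters = new HashMap<>(parameters); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InstanceKey)) {
            return false;
        }
        InstanceKey other = (InstanceKey) o;
        return jobName.equals(other.jobName) && parameters.equals(other.parameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, parameters);
    }
}

/** Hypothetical registry that hands out one id per distinct key. */
class InstanceRegistry {
    private final Map<InstanceKey, Integer> ids = new HashMap<>();

    /** Returns the same id for the same job name and file, a new id otherwise. */
    int instanceIdFor(String jobName, String fileName) {
        Map<String, String> params = new HashMap<>();
        params.put("input.file", fileName); // assumed parameter name
        InstanceKey key = new InstanceKey(jobName, params);
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }
}
```

With this model, dropping the same file twice resolves to the same instance, so the runtime can decide whether to restart it or refuse to run it again.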
Jul 30th, 2008, 05:48 AM
It seems to be a bit harder to get this working than I thought, compared to the sample application, which uses the predefined Spring Quartz implementations. I can easily exchange the QuartzJobBean for the FileScanJob, but the FileScanListener seems a bit tricky since it only takes a string as its argument. Would it be best to let the Quartz job instance and the listener be the same object?
Also, at a later stage we would move to a queue-based solution. The queue would hold a file location and a job name. In that solution, would I register some kind of queue listener and drop the Quartz implementation? The queue listener would then launch the job and, when the job is finished, remove the message from the queue.
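The queue listener described above might look roughly like the following sketch. An in-memory `Deque` stands in for the real queue, and `JobRequest` and `QueueDrivenLauncher` are hypothetical names. The point it illustrates is the ordering: the message is removed only after the job has finished without an exception, so a failed run leaves it available for a retry.

```java
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical message: which job to run and where its input file is. */
final class JobRequest {
    final String jobName;
    final String fileLocation;

    JobRequest(String jobName, String fileLocation) {
        this.jobName = jobName;
        this.fileLocation = fileLocation;
    }
}

/** Hypothetical listener: launches a job per message, then removes the message. */
class QueueDrivenLauncher {
    private final Deque<JobRequest> queue;
    private final Consumer<JobRequest> launcher;

    QueueDrivenLauncher(Deque<JobRequest> queue, Consumer<JobRequest> launcher) {
        this.queue = queue;
        this.launcher = launcher;
    }

    /** Handles the head message; returns true only if a job ran to completion. */
    boolean handleNext() {
        JobRequest head = queue.peekFirst(); // look, but do not remove yet
        if (head == null) {
            return false; // nothing to do
        }
        try {
            launcher.accept(head); // assumed to block until the job ends
        } catch (RuntimeException e) {
            return false; // job failed: leave the message on the queue
        }
        queue.pollFirst(); // job finished: now remove the message
        return true;
    }
}
```

In a real setup the `launcher` callback would wrap whatever actually starts the job, and a message that keeps failing would need some limit on retries.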
Jul 30th, 2008, 10:59 AM
That's more of a Quartz question, but I *think* you can combine the listener and job interfaces.
Kicking off a job via a queue is perfectly acceptable as well.
Aug 11th, 2008, 08:13 AM
Thanks for all the help so far.
I've started adapting my solution to use Amazon Simple Queue Service. I'm currently using Quartz to check whether there are any messages on the queue. If there are, the job is launched with the message content as a parameter. I've now come across a problem that I haven't figured out how to solve yet. The job needs to retrieve a large amount of data from a database for some analysis, which means it takes some time before the job actually starts processing. I would like to keep this data in cache/memory between jobs so that a job can start processing much faster. It seems that each job runs within its own context, and I cannot figure out how to share beans between jobs.
Aug 11th, 2008, 11:53 AM
Not really a batch-specific question - you are only limited by what Spring can do really. But note that the Quartz sample we publish uses a parent application context, which is shared between job instances. Any bean in that context would be shareable. But be careful with restartability - if you rely on that shared data to be there, one day it will not be because the lights went out and you had to restart the process.
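A shared bean that tolerates the restart case could be sketched as below. `ReferenceDataCache` is a hypothetical class, and the `Supplier` stands in for the database query. The data is loaded on first use and loaded again whenever it is missing, so a job never assumes that an earlier job already filled the cache.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical shared bean: loads reference data lazily and reloads if lost. */
class ReferenceDataCache {
    private final Supplier<Map<String, String>> loader;
    private Map<String, String> data; // null until first use or after clear()

    ReferenceDataCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached data, invoking the loader only when nothing is cached. */
    synchronized Map<String, String> get() {
        if (data == null) {
            data = loader.get(); // the slow database read happens here
        }
        return data;
    }

    /** Drops the cached data, for example when the source tables change. */
    synchronized void clear() {
        data = null;
    }
}
```

Defined once in the shared parent context, such a bean would make the first job after a process restart pay the loading cost, while later jobs reuse the data.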