Jan 30th, 2008, 11:03 AM
Pre-step and post-step methods
We need to do some initialization at the beginning of a step
(e.g. mark all records in the target database table with some flag).
In the same way, we need to execute some cleanup at the end of the step (when all lines of the input file have been processed).
The "pre-step" and "post-step" methods should belong to the step's transaction.
Thank you in advance
Jan 31st, 2008, 02:07 AM
What do you mean by "the step's transaction"? Normally there are many transactions per step, unless you write your own Tasklet that returns ExitStatus.FINISHED, or have an extremely large commit interval. If you are writing your own Tasklet, then the answer would be obvious, so I must be missing something?
Jan 31st, 2008, 03:10 AM
You can implement the initialization and cleanup as separate steps. There is no single "step transaction", so you should get the same result as you would with the proposed "pre-step" and "post-step" hooks.
Jan 31st, 2008, 04:03 AM
Thank you for your replies.
I'll try to explain what I'd like to achieve.
(1) start tx
(2) reset a status flag for all records in a DB table (pre-step)
(3) read a CSV file and insert/update data in the DB table (the flag gets modified for some of the records).
(4) store log info (post-step)
(5) commit tx
The whole file should be read in a single (potentially long) transaction.
What is the "correct" way to realize this scenario?
Thank you in advance
Jan 31st, 2008, 04:25 AM
As Dave indicated, you can implement a Tasklet whose execute body does (2); (3); (4); return ExitStatus.FINISHED;
However, with a single transaction you get almost no benefit from the batch framework. The only one I can think of is automatic deletion of created files if (4) is rolled back (assuming you actually want to store the logs as files and use FlatFileItemWriter).
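To make the single-transaction idea concrete, here is a minimal sketch in plain Java of what such an execute body would do. It deliberately avoids the Spring Batch Tasklet interface (whose signature differs between versions); the FlagTable and SingleTransactionJob classes are hypothetical, and an in-memory map plus a snapshot stand in for the database table and its transaction. The point it shows: if any line of the file fails, the flag reset (2), the inserts/updates (3) and the log entry (4) are all undone together.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the target DB table (key -> status flag) and a log. */
class FlagTable {
    final Map<String, Boolean> rows = new HashMap<>();
    final List<String> log = new ArrayList<>();
}

/** Runs steps (2)-(4) as one unit of work; a failure restores the snapshot ("rollback"). */
class SingleTransactionJob {
    static void run(FlagTable table, List<String> csvLines) {
        // (1) "start tx": remember the current state so it can be restored on rollback
        Map<String, Boolean> rowSnapshot = new HashMap<>(table.rows);
        List<String> logSnapshot = new ArrayList<>(table.log);
        try {
            // (2) pre-step: reset the status flag for every record
            table.rows.replaceAll((key, flag) -> false);
            // (3) insert/update one record per CSV line and set its flag
            int processed = 0;
            for (String line : csvLines) {
                String key = line.split(",")[0].trim();
                if (key.isEmpty()) {
                    throw new IllegalArgumentException("empty key in line: " + line);
                }
                table.rows.put(key, true);
                processed++;
            }
            // (4) post-step: store log info
            table.log.add("processed " + processed + " lines");
            // (5) "commit": simply keep the new state
        } catch (RuntimeException e) {
            // rollback: undo everything, including the flag reset from (2)
            table.rows.clear();
            table.rows.putAll(rowSnapshot);
            table.log.clear();
            table.log.addAll(logSnapshot);
            throw e;
        }
    }
}
```

In a real job the snapshot/restore would of course be the database transaction itself; the sketch only illustrates that all of (2), (3) and (4) share one commit or one rollback.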
The "correct" way to realize your scenario may very well be "just write a Ruby/Groovy/... script", but that depends mostly on the context of what you are doing.
Last edited by robert.kasanicky; Jan 31st, 2008 at 04:33 AM.
Feb 1st, 2008, 04:38 AM
I would suggest using a multi-stage operation for this, e.g.:
Step 1: Reset the status flag for all records in the DB table (either 1 transaction, or M / commitFrequency transactions, where M is the number of records in the table, if you update one record at a time; the second strategy might be better if your table holds a large amount of data or if the DBMS in your environment is prone to blocking)
Step 2: For each line in the CSV file, insert/update data in a holding table (N / commitFrequency transactions, where N is the number of lines in the file)
Step 3: Migrate data from the holding table to the production table (1 transaction)
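The transaction counts above come from committing once every commitFrequency records; strictly, the count is the ceiling of M / commitFrequency (or N / commitFrequency), since the last chunk may be partial. A minimal sketch in plain Java, where the helper ChunkedWriter.writeInChunks and its parameters are hypothetical names and each call to the callback stands for one physical transaction:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: hands items to commitChunk in chunks, one "transaction" per chunk. */
class ChunkedWriter {
    /** Returns the number of commits, i.e. ceil(items.size() / commitInterval). */
    static <T> int writeInChunks(List<T> items, int commitInterval, Consumer<List<T>> commitChunk) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        int commits = 0;
        for (int start = 0; start < items.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, items.size());
            // In a real job this sublist would be written and committed in its own transaction
            commitChunk.accept(new ArrayList<>(items.subList(start, end)));
            commits++;
        }
        return commits;
    }
}
```

For example, 10 lines with a commit interval of 4 give chunks of 4, 4 and 2 records, i.e. 3 transactions.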
In my experience this avoids a lot of problems, since you are not updating production data at the same time as you are interacting with the error-prone part of your processing (i.e. reading from the file).
This can easily be represented in Spring Batch if you do it this way:
a) the transaction strategy used in the recommended "simple" configuration would handle these needs seamlessly
b) step 2 (the CSV load) can be built almost entirely by configuring pre-packaged classes
Originally Posted by mpalicka
Feb 1st, 2008, 04:43 AM
Just some further context: the operations you're talking about aren't really clean-up operations; they have real meaning to you, even if it only means "these records haven't been updated yet today." There are mechanisms in place for setup and clean-up (e.g. opening and closing files, connections, etc.), but they wouldn't serve your purposes because they occur outside of the transactions created by Spring Batch. That's why I recommended that, instead of thinking of these tasks as "maintenance" tasks, you consider them to be atomic steps that must occur in a certain order.
Doing all your processing in a holding table first and then moving the data over in a single step lets you perform one logical transaction over several physical transactions by "committing" everything (i.e. updating production) at the end.
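A minimal sketch of the holding-table idea in plain Java, assuming in-memory maps stand in for the holding and production tables (the class HoldingTableLoad and its methods are hypothetical). Staging may fail part-way through, but production only changes in the final promote step, which in a real database would be a single transaction (e.g. an INSERT ... SELECT from the holding table):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-ins for a holding (staging) table and the production table. */
class HoldingTableLoad {
    final Map<String, String> holding = new HashMap<>();
    final Map<String, String> production = new HashMap<>();

    /** Stage each "key,value" CSV line in the holding table (chunked commits in a real job). */
    void stage(List<String> csvLines) {
        for (String line : csvLines) {
            String[] parts = line.split(",", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty()) {
                throw new IllegalArgumentException("bad line: " + line);
            }
            holding.put(parts[0].trim(), parts[1].trim());
        }
    }

    /** Promote everything to production in one step: the single "logical commit". */
    void promote() {
        production.putAll(holding);
        holding.clear();
    }

    /** Production is only touched if staging of the whole file succeeded. */
    void run(List<String> csvLines) {
        holding.clear(); // start each run with an empty holding table
        stage(csvLines);
        promote();
    }
}
```

If the file contains a bad line, the exception escapes from stage() before promote() runs, so production still holds the previous day's data and the run can simply be repeated.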