
Originally Posted by Dave Syer
Simple: it prevents the same data from being processed twice.
Isn't there a whole class of problems where you don't necessarily know what data is going to be processed (or when), and therefore you need to delegate the responsibility of ensuring data doesn't get processed twice? In our case, it is up to the SQL query passed to our JdbcCursorItemReader to ensure we don't process duplicate data. So forcing developers to parametrize *and* ensure identity just leads people to work around the constraint. I do a sync similar to what is described above, and I will have to pass in a timestamp purely to guarantee identity, for no other purpose.
That really isn't a big deal (unless two people somehow kick off a job at the same time in an async task environment). And I see the benefit of uniqueness if, say, a set of XXXX-[date].xml files came in and you wanted to ensure today's files got processed only once. But in the real world weird things happen, and there is a remote chance they *might* need to be processed again, and not just via the restart functionality.
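To make the timestamp workaround above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model of "job name plus parameters equals identity", not the Spring Batch API: the class `JobIdentitySketch`, its methods, and the `run.timestamp` key are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model; it only imitates the idea of parameter-based identity.
class JobIdentitySketch {
    // Identities already run; stands in for a persistent job repository.
    private final Set<String> seen = new HashSet<>();

    // Returns true if the job runs, false if the same name and parameters were seen before.
    boolean launch(String jobName, Map<String, String> params) {
        // Sort the parameters so that insertion order does not affect identity.
        String identity = jobName + "|" + new TreeMap<>(params);
        return seen.add(identity);
    }

    // The workaround: add a parameter whose only purpose is to make the identity unique.
    static Map<String, String> withRunTimestamp(Map<String, String> params, long timestampMillis) {
        Map<String, String> copy = new HashMap<>(params);
        copy.put("run.timestamp", Long.toString(timestampMillis));
        return copy;
    }

    public static void main(String[] args) {
        JobIdentitySketch repository = new JobIdentitySketch();
        Map<String, String> params = Map.of("source", "orders");

        System.out.println(repository.launch("sync", params)); // true: first run
        System.out.println(repository.launch("sync", params)); // false: same identity
        // A different timestamp gives a new identity, so the same sync runs again.
        System.out.println(repository.launch("sync", withRunTimestamp(params, 1L))); // true
    }
}
```

Note that in this model two launches carrying the same timestamp value would still collide, which is the "two people kick off a job at the same time" case mentioned above.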
What if ensuring parameter identity were optional?
Code:
<job unique="false" ... > ... </job>
P.S. Other than this concern, I am very impressed with Spring Batch. Keep up the good work.