I'm processing a series of files, which are named with datestamps. I need the filename because I've got to do some record keeping, so I followed the partitioning approach found in this thread.
The issue is that I need the files processed in order, but this isn't happening. Although the files are partitioned in order (that is, the partitioned steps are named in the correct order), they're processed out of order (that is, the step IDs are not in the correct order).
I believe this is because, once the files are partitioned and assigned to step executions, the step executions are stored in a HashMap. A HashMap makes no guarantee about iteration order, so the steps end up being processed in arbitrary order.
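To illustrate the ordering point outside of Spring Batch, here is a minimal plain-Java sketch. The class and method names (MapOrderDemo, keysAfterInsert) are made up for this example, and the "partition" + i keys only imitate the partition names. HashMap is free to return the keys in any order, while LinkedHashMap returns them in insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {

    // Inserts partition0 .. partition(n-1) in that order and returns
    // the keys in whatever order the map iterates over them.
    public static List<String> keysAfterInsert(Map<String, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put("partition" + i, i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: iteration order is unspecified and may differ from insertion order.
        System.out.println("HashMap:       " + keysAfterInsert(new HashMap<>(), 12));
        // LinkedHashMap: iteration order is always insertion order.
        System.out.println("LinkedHashMap: " + keysAfterInsert(new LinkedHashMap<>(), 12));
    }
}
```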
The fix I employed was to create a version of MultiResourcePartitioner that uses LinkedHashMap instead of HashMap in partition(), and a version of SimpleStepExecutionSplitter that uses LinkedHashSet instead of HashSet in split(). This maintains the order.
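The idea behind the fix can be sketched roughly as follows. This is not the actual Spring Batch code: OrderedPartitionSketch and its two methods are hypothetical stand-ins, plain strings take the place of resources and step executions, and the file names in the usage note are invented. It assumes the datestamps sort chronologically as plain strings (for example yyyyMMdd).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrderedPartitionSketch {

    // Stand-in for partition(): one entry per file, kept in sorted order.
    public static Map<String, String> partition(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted); // natural String order (assumed to match date order)
        Map<String, String> partitions = new LinkedHashMap<>(); // keeps insertion order
        int i = 0;
        for (String name : sorted) {
            partitions.put("partition" + i++, name);
        }
        return partitions;
    }

    // Stand-in for split(): one step name per partition, in the map's order.
    public static Set<String> split(String stepName, Map<String, String> partitions) {
        Set<String> steps = new LinkedHashSet<>(); // keeps insertion order
        for (String key : partitions.keySet()) {
            steps.add(stepName + ":" + key);
        }
        return steps;
    }
}
```

For example, partition(Arrays.asList("data-20110103.csv", "data-20110101.csv")) yields partition0 for the 20110101 file and partition1 for the 20110103 file, and split("load", ...) then iterates them in that same order.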
Does this sound reasonable?


