I'm looking for advice on whether Spring Batch is a good fit for my use case. Here is the basic requirement/flow I have to implement:
Input data: a collection of file paths to download from a local share.
The strategy interfaces are PartitionHandler and StepExecutionSplitter (with a MultiResourcePartitioner), so each partitioned step takes one file/item, downloads it, and then processes it (let's say I download 10-15 files in parallel).
1. Download file
- a custom ItemReader reads the file from the share
- saves it to the local file system
- passes it to the ItemProcessor as a reference/location link
2. Split it into parts (with NIO FileChannel)
- the ItemProcessor reads the file from the file system, based on the reference/location link it gets from the ItemReader
- breaks it into fixed-size chunks (for example, 64 KB)
- computes an MD5 sum for each part
- deletes the original file from the file system
- passes the parts to the ItemWriter. Here I see a problem: if the input files are large, holding all the parts in memory could cause an OutOfMemoryError. How can I avoid this without saving the parts to disk inside the ItemProcessor, given that persisting them is really the ItemWriter's job?
3. Store the parts in a local cache (in memory or on disk, depending on how much memory is available at that moment).
- Add or update the part locations in a collection/storage object that can be used outside of the job execution. Is that possible?
After that, I need a reference to that storage/collection so I can use it later in my flow to read the parts and apply other transformations.
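Steps 2 and 3 can be sketched in plain Java, without any Spring Batch types, to show one way around the memory concern: reuse a single chunk-sized buffer and write each part to disk as soon as it has been read, so only one chunk is ever held in memory. This is a minimal sketch under stated assumptions, not a definitive implementation. The names `FilePartSplitter`, `Part`, and `PART_REGISTRY` are hypothetical, and the static map only stands in for whatever shared storage would actually be used outside the job.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FilePartSplitter {

    /** Location, size and MD5 checksum of one stored part (hypothetical type). */
    static final class Part {
        final Path location;
        final int size;
        final String md5;

        Part(Path location, int size, String md5) {
            this.location = location;
            this.size = size;
            this.md5 = md5;
        }
    }

    /** Hypothetical shared registry (source file name -> parts), readable outside the job. */
    static final Map<String, List<Part>> PART_REGISTRY = new ConcurrentHashMap<>();

    private FilePartSplitter() { }

    /** Splits source into chunkSize-byte parts under partsDir, then deletes source. */
    static List<Part> split(Path source, Path partsDir, int chunkSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize); // reused for every chunk
        List<Part> parts = new ArrayList<>();
        Files.createDirectories(partsDir);

        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            for (int index = 0; ; index++) {
                buffer.clear();
                while (buffer.hasRemaining() && in.read(buffer) != -1) {
                    // keep reading until the chunk is full or the file ends
                }
                buffer.flip();
                if (!buffer.hasRemaining()) {
                    break; // nothing was read: end of file
                }
                int size = buffer.remaining();
                md5.update(buffer.array(), 0, size);
                // digest() also resets the MessageDigest for the next chunk
                String checksum = String.format("%032x", new BigInteger(1, md5.digest()));

                // Write the part immediately so it never accumulates in memory.
                Path partPath = partsDir.resolve(source.getFileName() + ".part" + index);
                try (FileChannel out = FileChannel.open(partPath, StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
                    while (buffer.hasRemaining()) {
                        out.write(buffer);
                    }
                }
                parts.add(new Part(partPath, size, checksum));
            }
        }
        Files.delete(source); // the original file is no longer needed
        List<Part> result = Collections.unmodifiableList(parts);
        PART_REGISTRY.put(source.getFileName().toString(), result);
        return result;
    }
}
```

With this shape, what travels between components is only the small list of `Part` references, not the file content. Whether the splitting belongs in the ItemProcessor or the ItemWriter is still the open design question.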
A few other questions:
1. How can I launch a job so that it partitions input data from a queue/collection in an infinite loop, waiting whenever no data is available at that moment?
As a prototype I used the configuration below, but it is not dynamic and is tied to a single folder. How can I read and partition the data dynamically? For example, could the job read from an external object store or a thread-safe collection that a separate process updates on the fly with the list of files for step 1 to download?
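The "wait until data is available, then hand it off" behaviour can be sketched with a plain `BlockingQueue`, independent of Spring Batch. This is only an illustration of the idea, under stated assumptions: `QueueBatchFeeder` and the `jobStarter` callback are hypothetical names, and the callback stands in for whatever would actually launch a job or build partitions from the batch of paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class QueueBatchFeeder implements Runnable {

    private final BlockingQueue<String> pendingPaths;  // filled by another process/thread
    private final int maxBatchSize;                    // e.g. 10-15 files per run
    private final Consumer<List<String>> jobStarter;   // hypothetical hook that starts the work

    QueueBatchFeeder(BlockingQueue<String> pendingPaths, int maxBatchSize,
                     Consumer<List<String>> jobStarter) {
        this.pendingPaths = pendingPaths;
        this.maxBatchSize = maxBatchSize;
        this.jobStarter = jobStarter;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                List<String> batch = new ArrayList<>(maxBatchSize);
                // take() blocks until at least one path is available.
                batch.add(pendingPaths.take());
                // drainTo() does not block; it grabs whatever else is already queued.
                pendingPaths.drainTo(batch, maxBatchSize - 1);
                jobStarter.accept(batch);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and leave the loop; this is the shutdown signal.
            Thread.currentThread().interrupt();
        }
    }
}
```

The feeder would run on its own thread, for example `new Thread(new QueueBatchFeeder(queue, 15, paths -> { /* start job */ })).start()`, and is stopped by interrupting that thread.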
I would appreciate your input. Don't hesitate to ask questions if anything is unclear.
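<beans:bean name="step1:master" class="org.springframework.batch.core.partition.support.PartitionStep">
    <beans:property name="jobRepository" ref="jobRepository" />
    <beans:property name="stepExecutionSplitter">
        <beans:bean class="org.springframework.batch.core.partition.support.SimpleStepExecutionSplitter">
            <beans:constructor-arg ref="jobRepository" />
            <beans:constructor-arg ref="step1" />
            <beans:constructor-arg>
                <beans:bean class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
                    <beans:property name="resources" value="file:D:/Music/*.mp3" />
                </beans:bean>
            </beans:constructor-arg>
        </beans:bean>
    </beans:property>
    <beans:property name="partitionHandler">
        <beans:bean class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
            <!-- TaskExecutorPartitionHandler is quite useful for IO-intensive steps, like copying large numbers of files or replicating filesystems into content management systems. -->
            <beans:property name="taskExecutor" ref="asyncTaskExecutor" />
            <beans:property name="step" ref="step1" />
            <!-- <beans:property name="gridSize" value="3" /> -->
        </beans:bean>
    </beans:property>
</beans:bean>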