Spring Batch: Design question
Hello there! We are starting a new project, and I'm considering Spring Batch as the core of our architecture.
I've just read the docs, and now have a basic understanding of it. There are some design questions that I would like to discuss here, and get some feedback from the community.
Our main bottleneck is network latency. We need to crawl data from several thousand sites, and we have a 10-minute time window to do that. Doing this in parallel on several machines would not be a problem today: we could spawn 50-100 threads on each machine and get the results in a couple of minutes.
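To make the crawling side concrete, here is a minimal sketch of what I mean by running the fetches on a bounded thread pool inside a time window. It is plain JDK code, not Spring Batch API; the class name `CrawlPool`, the method `crawlAll`, and the `fetch` function are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper: crawls all sites on a fixed-size pool within a time window.
final class CrawlPool {

    // Returns site -> result for every task that finished in time without throwing.
    // Sites that failed or were cut off by the window are simply absent from the map.
    static Map<String, String> crawlAll(List<String> sites, int threads, Duration window,
                                        Function<String, String> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, String> results = new LinkedHashMap<>();
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String site : sites) {
                tasks.add(() -> fetch.apply(site));
            }
            // Blocks until all tasks are done or the window expires;
            // tasks still unfinished at that point are cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, window.toMillis(), TimeUnit.MILLISECONDS);
            // invokeAll returns futures in the same order as the submitted tasks.
            for (int i = 0; i < sites.size(); i++) {
                Future<String> future = futures.get(i);
                if (future.isCancelled()) {
                    continue; // did not finish inside the window
                }
                try {
                    results.put(sites.get(i), future.get());
                } catch (ExecutionException e) {
                    // the fetch threw; leave this site out of the results
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdownNow();
        }
        return results;
    }
}
```

Something like `CrawlPool.crawlAll(sites, 50, Duration.ofMinutes(10), myFetch)` would be called once per machine, with `myFetch` standing in for the real HTTP call.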
What comes afterwards is what drove us toward Spring Batch. We need consistency: we must make sure that each job (a crawl task) has been executed, and if it has failed, it must be retried a few times before we give up and log it to an audit location to be looked at. All of this seems to be provided by Spring Batch.
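To be clear about the behaviour I am after, here is a rough sketch of the retry-then-audit semantics in plain Java. This is not Spring Batch code, just the logic I would otherwise have to write by hand; `RetryingTask`, `runWithRetry`, and the in-memory `auditLog` list are hypothetical names, and a real audit location would presumably be a table or a file.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper: retry a task a few times, then record the failure for later review.
final class RetryingTask {

    // Runs the task up to maxAttempts times. Returns the result of the first
    // successful attempt, or an empty Optional after writing one audit entry.
    // Note: a task that returns null is also reported as an empty Optional.
    static <T> Optional<T> runWithRetry(String taskName, int maxAttempts,
                                        Callable<T> task, List<String> auditLog) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(task.call());
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        // All attempts failed: record the task so someone can look at it later.
        auditLog.add(taskName + " failed after " + maxAttempts + " attempts: " + lastFailure);
        return Optional.empty();
    }
}
```

In practice I would also want a delay between attempts, which this sketch leaves out.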
Given this scenario, my question is about the best job flow for this. Should I write an ItemReader that spawns many threads, or create parallel jobs (one for each site)? Partitioning would also be a must, since our time window is short, but I'll deal with that later and just start with the basics for now.
Any suggestions for this?