-
Feb 26th, 2008, 04:07 AM
#1
Scalable processing in cluster-safe fashion
Hi,
I think this is a basic newbie question, but I haven't found anything on it thus far.
We have several use-cases, which share a common problem:
- At a certain time, bill all customers in a particular state
- At a certain time, send an e-mail to all customers in a particular state
- At a certain time, send an SMS to all customers in a particular state.
- etc...
The application is a webapp using Spring and Hibernate; it will run on a variety of servers and databases. At the scale we are talking about, typically there will be a clustered database which appears as a single logical database to the application, and multiple JEE servers in a cluster running the webapp.
My questions are about having the webapp running on a cluster.
- We don't want to have node-specific builds if at all avoidable.
- How best to allow each node to process, say, 100 customers at a time during each task, while ensuring that no customer is charged more than once or sent duplicate messages because several nodes picked up the same work.
In the past, I've done this using raw SQL to do SELECT FOR UPDATE style semantics to ensure each node has an exclusive lock on the customers that it's processing for a given task. Is this the recommended way to do it, or am I missing something?
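For illustration, the locking pattern described above might look roughly like the following JDBC sketch. The table and column names (customer, id, state, billed_at), the CustomerBatchClaimer class and the CustomerTask callback are all hypothetical, and the FOR UPDATE syntax and row-limiting behaviour vary between databases, so treat this as a sketch under those assumptions rather than a portable implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Sketch only: schema and class names are hypothetical.
class CustomerBatchClaimer {

    // Selects and locks customers in one state that have not been billed yet.
    // Assumes the database accepts ORDER BY together with FOR UPDATE.
    static final String CLAIM_SQL =
        "SELECT id FROM customer WHERE state = ? AND billed_at IS NULL "
        + "ORDER BY id FOR UPDATE";

    // Marks a customer as done so later queries skip it.
    static final String MARK_SQL =
        "UPDATE customer SET billed_at = CURRENT_TIMESTAMP WHERE id = ?";

    // Hypothetical callback for the actual work (bill, e-mail, SMS).
    interface CustomerTask {
        void process(long customerId) throws Exception;
    }

    // Locks up to batchSize customers, processes them and marks them done
    // in a single transaction. Returns the number of customers processed.
    static int processBatch(Connection conn, String state, int batchSize,
                            CustomerTask task) throws Exception {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // row locks are held until commit or rollback
        try {
            List<Long> ids = new ArrayList<>();
            try (PreparedStatement select = conn.prepareStatement(CLAIM_SQL)) {
                select.setMaxRows(batchSize); // caps rows returned; rows locked may differ per database
                select.setString(1, state);
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                }
            }
            try (PreparedStatement mark = conn.prepareStatement(MARK_SQL)) {
                for (long id : ids) {
                    task.process(id);
                    mark.setLong(1, id);
                    mark.executeUpdate();
                }
            }
            conn.commit();
            return ids.size();
        } catch (Exception e) {
            conn.rollback(); // releases the locks; already-sent messages are not undone
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Each node would call processBatch in a loop until it returns 0. With a plain FOR UPDATE, a second node running the same query will typically wait until the first one commits, so the exact degree of parallelism depends on the database and isolation level.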
Cheers,
James
-
Feb 29th, 2008, 10:55 AM
#2
It doesn't seem like you are using the Spring Batch framework to implement your solution, so perhaps you posted to the wrong forum?
-
Mar 3rd, 2008, 08:17 AM
#3
No, I did mean to post here. I'm trying to evaluate what support Spring Batch provides for cluster-safe running, and I haven't found much mention of it in the documentation.
I framed the question in terms of how I might approach it using only Spring's TimerTask support. My apologies if that wasn't made clear.
-
May 25th, 2009, 02:41 AM
#5
The PartitionStep is designed to handle this scenario. You need to provide a PartitionHandler that knows about your cluster and probably a Partitioner for your data set. As long as the database is shared you will have no problems with duplicate processing.
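As a rough illustration of what a Partitioner for this data set might compute, the sketch below splits a range of customer ids into contiguous, non-overlapping slices, one per partition. It is plain Java and deliberately does not use Spring Batch types: the IdRangePartitions class and its split method are hypothetical. In Spring Batch itself, a Partitioner implements `Map<String, ExecutionContext> partition(int gridSize)`, and each slice's bounds would be stored in the ExecutionContext for that partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical helper, not part of Spring Batch.
class IdRangePartitions {

    // Splits the inclusive id range [minId, maxId] into at most gridSize
    // contiguous slices. Each value is a two-element array {start, end}.
    static Map<String, long[]> split(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division

        Map<String, long[]> partitions = new LinkedHashMap<>();
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            partitions.put("partition" + index, new long[] {start, end});
            start = end + 1;
            index++;
        }
        return partitions;
    }
}
```

Because the slices do not overlap, each customer id belongs to exactly one partition, which is what lets the nodes work in parallel without handling the same customer twice.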