I agree with this definition. I don't agree that clustering is only for sharing state across nodes. To me, if I need high availability of servers, that means clustering. Now, everyone does clustering a little differently, so you have to evaluate what the product you are considering actually offers. Does it provide the most efficient, and hopefully least invasive, solution? I think Spring makes it easy to plug in such a solution without changing your POJOs. http://www.javaworld.com/javaworld/j...31-spring.html demonstrates this for a stateless service. You are not going to tell me that having multiple servers, providing the same stateless service, is bad, are you?
Greg L. Turnquist (@gregturn), SpringSource/VMware
Project Lead: Spring Python and author of Spring Python 1.1 and Python Testing Cookbook.
Listen to Pond Jumpers, the international podcast for open source developers.
These comments are my own personal opinions, and do not reflect those of my company.
Yes, you are of course correct. I wasn't being clear.
My point was that the load balancing can happen outside of the application, i.e. with Apache or a hardware load balancer. Each "node" can work independently from the others as there is no state being replicated, hence no server affinity.
If the web tier state is replicated, then making the middle tier cluster aware has little or no advantage.
From what you say, I see no problem deploying N instances (with the web tier replicating state) and then having Apache distribute the calls (round robin etc.).
The question I was asking is: exactly what benefit do you get from introducing clustering into the middle tier itself?
You can of course replace the POJO with a cluster-aware proxy (using RMI maybe), but I see no benefit, only complexity....
- Yagiz -
http://blog.decaresystems.ie (Shameless "company blog link" )
To bring several posts together...
For HA, use a load balancer in front of your incoming requests (be it web services or browsers). Co-locate your Spring services with your web server (Tomcat, Resin, whatever), keep them stateless, and cluster the pair. That way you can easily survive the loss of a machine.
When you get serious about HA (5 9's), dual-path everything. Your load balancer gets a hot spare. Your internet connection becomes 2+ through different companies with as different traceroutes as possible. You dual nic everything through different switches. Your database gets clustered as well (Oracle RAC or DB2 EEE). Leave nothing to be a single point of failure.
Having your Spring services be stateless is absolutely the way to go. Still, other issues always crop up with clustering. For example, if I'm using Hibernate, I want to take advantage of the 2nd level cache as much as possible to avoid pounding the db server with the same request over and over again. But if each machine has its own cache, how do I handle updates to objects?
One solution might be to have entries in the cache invalidated after a set amount of time, so they will be refetched the next time they're used. If you don't need to see real-time updates as they occur, this could work. Or you could use JMS to communicate among the nodes and publish events. When a node catches an event saying an object was updated, it could invalidate that object's entry in its local cache. You could also use Tangosol or Terracotta, as they provide distributed caches.
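To make the first two ideas a bit more concrete, here is a minimal Java sketch of a per-node cache whose entries expire after a fixed time and can also be dropped on demand. It is only an illustration: the class name `ExpiringLocalCache`, the injected clock and the loader function are all made up for this example, and it is not how Hibernate's 2nd level cache works internally. The `invalidate` method is what you would call from whatever receives the "object updated" event (a JMS listener, for instance); the messaging part itself is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical per-node cache: entries expire after a TTL or when a peer announces an update. */
class ExpiringLocalCache<K, V> {

    /** A cached value together with the time it was loaded. */
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;   // injected so the expiry logic can be tested
    private final Function<K, V> loader; // stands in for "fetch from the database"

    ExpiringLocalCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if it is missing or older than the TTL. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }

    /** Drops one entry; call this when another node reports that the object changed. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

With the TTL alone, a node can serve stale data for up to `ttlMillis`; adding the event-driven `invalidate` call shortens that window to roughly the message delivery time, at the cost of running a messaging layer between the nodes.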
Another problem is if you're using Quartz for jobs. If you've got a job you want to run every night at midnight, how do you make sure that only one of the nodes does the job? In the same cluster, you might also want to have scheduled jobs that run on all the nodes. How do you accomplish that? I think Quartz can use a database for storing job information, and I know I've seen references to using Tangosol or Terracotta for accomplishing this sort of thing.
One last problem that I can think of is if you're using Lucene to provide search capabilities in your application. You could use a database-backed Directory, but this has performance problems. This is another area where you could have indexes on each machine and use some communication method (like JMS, Tangosol, or Terracotta) to tell the others when an object is added/updated and needs to be (re)indexed.
These are all problems and some ideas I've seen floated around before. I've never actually heard any "we wanted to implement a cluster and here's how we did it" case studies. I'd be very interested to see someone talk about how they actually solved these problems and any additional hurdles they had to overcome.
PS: When I say cluster, I just mean a group of servers all running the same application with minimal communication between them. Ideally they wouldn't have to have any communication, but because of the issues I've highlighted above that doesn't seem to always be possible. I guess it would be better to call it a "farm" rather than a cluster.
Yes, you are correct - the problem with scheduled tasks really occurs if there are several servers in a cluster (for example - email sending, scheduled generation of some documents, file system monitoring etc).
We solved that problem in quite a straightforward way - all servers (in the system where the cluster was used) communicated through a shared database. To eliminate task duplication, each task locked the appropriate record before execution (by writing the ID of the server and the time of locking) - so other servers could check for that lock and execute the task only if no lock existed.
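A rough Java sketch of that locking idea, for illustration only. The names `TaskLockTable` and `ClusteredTaskRunner` are made up for this example, and the lock table here is an in-memory map standing in for the record in the shared database; in a real cluster the "write the lock only if none exists" step has to be a single atomic database operation (for example an insert guarded by a unique key). It also assumes the task ID identifies one scheduled occurrence, so the lock is never released.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the shared lock table (assumption: the real one lives in the database). */
class TaskLockTable {

    /** What a server writes when it claims a task: its own ID and the time of locking. */
    static final class LockRecord {
        final String serverId;
        final long lockedAtMillis;

        LockRecord(String serverId, long lockedAtMillis) {
            this.serverId = serverId;
            this.lockedAtMillis = lockedAtMillis;
        }
    }

    private final Map<String, LockRecord> locks = new ConcurrentHashMap<>();

    /** Atomically writes the lock record; returns true only for the first server to claim the task. */
    boolean tryLock(String taskId, String serverId, long nowMillis) {
        return locks.putIfAbsent(taskId, new LockRecord(serverId, nowMillis)) == null;
    }

    /** Returns the ID of the server holding the lock, or null if the task is unclaimed. */
    String ownerOf(String taskId) {
        LockRecord record = locks.get(taskId);
        return record == null ? null : record.serverId;
    }
}

/** Runs a scheduled task on this server only if no other server has locked it first. */
class ClusteredTaskRunner {
    private final TaskLockTable lockTable;
    private final String serverId;

    ClusteredTaskRunner(TaskLockTable lockTable, String serverId) {
        this.lockTable = lockTable;
        this.serverId = serverId;
    }

    /** Returns true if this server ran the task, false if another server already held the lock. */
    boolean runIfUnlocked(String taskId, long nowMillis, Runnable task) {
        if (!lockTable.tryLock(taskId, serverId, nowMillis)) {
            return false;
        }
        task.run();
        return true;
    }
}
```

Every server fires the same schedule, but only the one that wins `tryLock` does the work. The stored lock time is not used here; it would be the natural place to detect a lock left behind by a server that crashed mid-task.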
Caching - yes, another issue. We used some caching on the servers, but kept the caches synchronized via a distributed cache.
Actually, I developed the Cluster4Spring project because at that moment there was no ready-to-use clustering solution for Spring, and we needed to cluster an existing system. Yes, in most cases we used stateless services, so clustering the system we developed was mostly a matter of adjusting the appropriate XML mapping.
Of course, not every system can be clustered easily, since there are specific requirements that the system architecture has to support (like stateless services). However, if the system is designed from scratch, it's possible to satisfy them.
As for figures - the configuration of the real-life cluster we developed included servers of several types (based on the server's purpose) - web servers, application servers, image processors, image generation servers, PDF generation servers, upload processors (not counting database servers).
In production, we have:
8 web servers
5 upload processing servers
8 image processing servers
3 PDF generation servers
7 image generation servers
At the time of writing, the uptime of the system is more than 3 months.
I think I'm in this situation right now!
We have one web server and one db server, but our client wants to scale up the site to get more performance out of the software. The proposed solution is to set up a new webapp server and improve the db server.
Could you please suggest the best way of implementing this?
Of course I'm thinking about keeping the applications in sync, because data is stored in a local cache. Just a few words about the app: WebWork - Spring - Hibernate.
A prediction game with a lot of rankings and ways of grouping players.
As far as I've understood - either use a distributed cache or send messages as a signal to clear the cache.
Correct? What would you suggest?
My first question would be:
- Currently, is there a performance problem?
I guess the answer to this question is "yes".
Then, the following question is "do you know where the bottleneck is?"