
Originally Posted by
gcorro
ok, the actual problem is:
We have to design an application that has to run thousands of complex statistical algorithms. In order to achieve the best performance we need parallel computing.
Each of those algorithms is composed of ten sub-algorithms. First we start four sub-algorithms in parallel; based on their output we may run the next four in parallel, and, again based on their output, we may execute the last two in parallel.
So, which would be the best way to go here? Terracotta? Hadoop? GridGain? My own threads? Somebody even suggested Quartz clusters :-|
Thanks!
For parallelization within a single virtual machine, I would look at the java.util.concurrent library.
You could do something like this:
Code:
// Assumes an ExecutorService 'executor' and Callable tasks task1..task6 are in scope.
Future<?> future1 = executor.submit(task1);
Future<?> future2 = executor.submit(task2);
Future<?> future3 = executor.submit(task3);
// get() blocks until the corresponding task has finished
if (!someCondition(future1.get(), future2.get(), future3.get()))
    return "Oh dear... no reason to try further";
// The first 3 tasks indicated we should try the rest.
Future<?> future4 = executor.submit(task4);
Future<?> future5 = executor.submit(task5);
Future<?> future6 = executor.submit(task6);
// ... and now check again
This can all be done with the functionality the java.util.concurrent library provides. The missing part is the distribution functionality. A proof of concept version would be easy to set up with Terracotta.
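For the single-JVM part, the staged approach could be fleshed out roughly as follows. This is only a sketch under assumptions: the class name StagedPipeline, the Double result type, and the shouldContinue rule (continue while the stage average is positive) are all made up for illustration, and lambdas are used for brevity, so it needs Java 8 or later. Only ExecutorService.invokeAll and Future.get come from java.util.concurrent itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StagedPipeline {

    /** Runs all tasks of one stage in parallel and returns their results in submission order. */
    public static List<Double> runStage(ExecutorService executor, List<Callable<Double>> tasks) {
        try {
            List<Double> results = new ArrayList<>();
            // invokeAll blocks until every task of the stage has completed
            for (Future<Double> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a stage", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-algorithm failed", e.getCause());
        }
    }

    /** Hypothetical gate between stages: continue only if the stage average is positive. */
    public static boolean shouldContinue(List<Double> results) {
        double sum = 0;
        for (double r : results) sum += r;
        return !results.isEmpty() && sum / results.size() > 0;
    }

    /** Runs the stages in order and returns how many stages were actually executed. */
    public static int runPipeline(List<List<Callable<Double>>> stages, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            int executed = 0;
            for (List<Callable<Double>> stage : stages) {
                List<Double> results = runStage(executor, stage);
                executed++;
                if (!shouldContinue(results)) break; // no reason to try further
            }
            return executed;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Callable<Double> good = () -> 1.0;  // stand-ins for real sub-algorithms
        Callable<Double> bad = () -> -5.0;
        List<List<Callable<Double>>> stages = Arrays.asList(
                Arrays.asList(good, good, good, good),  // first four
                Arrays.asList(good, bad, good, bad),    // next four
                Arrays.asList(good, good));             // last two
        // The second stage averages -2.0, so the third stage is skipped.
        System.out.println("stages executed: " + runPipeline(stages, 4)); // stages executed: 2
    }
}
```

Using invokeAll per stage keeps the "wait for all four, then decide" logic in one place. If a stage should be abandoned as soon as one result rules it out, an ExecutorCompletionService would probably be a better fit.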
A few of the other grid solutions (like JavaSpaces) also make it easy to distribute the code over the nodes; I don't know whether Terracotta provides something similar. Other functionality is also missing from the example: persisting tasks, failover of tasks (resubmitting them when something fails), etc. You can build all of this by hand, but it is usually the kind of thing grid solutions take care of (at least partially).
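To give an idea of what "by hand" means for the resubmitting part, here is a minimal sketch that retries a failed task within a single JVM. The names RetryingSubmitter and submitWithRetry are hypothetical, and it does nothing about persistence or node failure, which is exactly the part a grid product would add.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class RetryingSubmitter {

    /** Submits the task and resubmits it when it throws, up to maxAttempts attempts in total. */
    public static <T> T submitWithRetry(ExecutorService executor, Callable<T> task, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Throwable lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return executor.submit(task).get();
            } catch (ExecutionException e) {
                lastFailure = e.getCause(); // the task itself threw; loop to resubmit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the task", e);
            }
        }
        throw new IllegalStateException("task failed after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky task: fails on the first two calls, then succeeds.
        Callable<String> flaky = () -> {
            if (calls.incrementAndGet() < 3) throw new IllegalStateException("simulated failure");
            return "done";
        };
        try {
            System.out.println(submitWithRetry(executor, flaky, 5) + " after " + calls.get() + " calls");
        } finally {
            executor.shutdown();
        }
    }
}
```

This only works if the tasks are safe to run more than once; a task with side effects would need extra care before being resubmitted blindly.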
Java 7 is receiving some new functionality in the java.util.concurrent library:
http://www.ibm.com/developerworks/ja...-jtp11137.html
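Assuming the additions meant here include the fork/join framework (ForkJoinPool and RecursiveTask did ship in java.util.concurrent with Java 7), a divide-and-conquer computation could look roughly like this. The SumOfSquares task and its THRESHOLD value are made up for illustration, and ForkJoinPool.commonPool() requires Java 8; on Java 7 you would create a ForkJoinPool yourself.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumOfSquares extends RecursiveTask<Double> {
    private static final int THRESHOLD = 1_000; // hypothetical cutoff for going sequential
    private final double[] data;
    private final int from;
    private final int to;

    SumOfSquares(double[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: compute directly
            double sum = 0;
            for (int i = from; i < to; i++) sum += data[i] * data[i];
            return sum;
        }
        // Otherwise split in two, run the left half asynchronously and the right half here
        int mid = (from + to) >>> 1;
        SumOfSquares left = new SumOfSquares(data, from, mid);
        SumOfSquares right = new SumOfSquares(data, mid, to);
        left.fork();
        return right.compute() + left.join();
    }

    /** Convenience entry point that runs the task on the common pool. */
    public static double sumOfSquares(double[] data) {
        return ForkJoinPool.commonPool().invoke(new SumOfSquares(data, 0, data.length));
    }

    public static void main(String[] args) {
        double[] data = new double[10_000];
        Arrays.fill(data, 2.0);
        System.out.println(sumOfSquares(data));
    }
}
```

Fork/join suits work that splits recursively into similar pieces. The staged "four, then four, then two" structure from the question maps more naturally onto plain executor stages.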