Thread: heavy multithreading

  1. #1

    heavy multithreading

    Hi,


    I'm starting the analysis of a new project requirement that involves running thousands of heavy calculations. These calculations are CPU- and memory-intensive, so I need to run them in a multi-threaded fashion to get maximum performance.

    The question is whether Spring has a component that helps with programming advanced concurrent applications.

    Thanks a lot.

  2. #2
    Join Date
    Nov 2004
    Location
    Hilversum - The Netherlands
    Posts
    1,054

    Are you planning to distribute the load over a grid or just run on a single vm?

    Spring has some support for multithreaded code, but if you need more control I would go for a more specific solution. You could use the java.util.concurrent library, but there are other alternatives as well. It all depends on the amount of control and the type of solution you need.
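
    A minimal sketch of the single-VM option with java.util.concurrent, assuming the calculations are independent of each other. The class name CalculationPool and the heavyCalculation method are hypothetical stand-ins for the real algorithms; sizing the pool to the number of available processors is a common starting point for CPU-bound work, not a rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CalculationPool {

    // Hypothetical stand-in for one heavy calculation: sums the squares 1..n.
    static long heavyCalculation(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs one calculation per input on a fixed pool; results come back in input order.
    static List<Long> runAll(List<Integer> inputs) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (final int n : inputs) {
                Callable<Long> task = () -> heavyCalculation(n);
                futures.add(executor.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // blocks until that calculation is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("calculation failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList(10, 100, 1000)));
    }
}
```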
    Last edited by Alarmnummer; Nov 19th, 2007 at 11:34 AM.

  3. #3

    Hi,

    We are not sure yet, but most probably it will be distributed over 2-3 servers/JVMs.

    Thanks

  4. #4
    Join Date
    Nov 2004
    Location
    Hilversum - The Netherlands
    Posts
    1,054

    In that case the type of solution is going to be different. There are open source alternatives like Terracotta and Blitz Spaces (a JavaSpaces implementation), but there are also commercial implementations like GigaSpaces (which also has a JavaSpaces implementation) that can be used to create a computational grid.

    The first step I would take is to figure out how your tasks can be parallelized and see if there are any dependencies between tasks.

    Spring can probably be used to wire up most or all of these implementations (that is how I like my Spring).
    Last edited by Alarmnummer; Nov 20th, 2007 at 02:56 AM.

  5. #5
    Join Date
    Jul 2006
    Location
    Kolkata, India
    Posts
    217

    Quote Originally Posted by Alarmnummer View Post
    The first step I would take is to figure out how your tasks can be parallelized and see if there are any dependencies between tasks.
    Sure. Parallelizing effectively also allows you to apply techniques like map/reduce, for example with Hadoop. The NY Times recently used this quite effectively.

    Cheers.
    - Debasish

  6. #6

    OK, the actual problem is:

    We have to design an application that has to run thousands of complex statistical algorithms. In order to achieve the best performance we need parallel computing.

    Each of those algorithms is composed of ten sub-algorithms. First, we start four sub-algorithms in parallel. Based on their output we may run the next four sub-algorithms in parallel, and once again based on their output we may execute the last two in parallel.

    So, which would be the best way to go here? Terracotta? Hadoop? GridGain? My own threads? Somebody even suggested Quartz clusters :-|

    Thanks!

  7. #7
    Join Date
    Nov 2004
    Location
    Hilversum - The Netherlands
    Posts
    1,054

    For parallelization within a single virtual machine, I would look at the java.util.concurrent library.

    You could do something like this:

    Code:
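    // Assumes an ExecutorService named executor and Callable tasks task1..task6.
    // Future.get() blocks and can throw InterruptedException or ExecutionException.
    Future<?> future1 = executor.submit(task1);
    Future<?> future2 = executor.submit(task2);
    Future<?> future3 = executor.submit(task3);

    // Wait for the first stage and decide whether to continue.
    if (!someCondition(future1.get(), future2.get(), future3.get()))
       return "Oh dear... no reason to try further";

    // The first 3 tasks indicated we should try the rest.
    Future<?> future4 = executor.submit(task4);
    Future<?> future5 = executor.submit(task5);
    Future<?> future6 = executor.submit(task6);

    // ... and now check again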
    This can all be done with the functionality the java.util.concurrent library provides. The missing part is the distribution functionality. A proof of concept version would be easy to set up with Terracotta.
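
    The staged pattern above can also be written as a loop over stages, which fits the 4/4/2 layout from the question. This is only a sketch under assumptions: the class name StagedRun, the constant helper, and the allPositive gate are hypothetical placeholders for the real sub-algorithms and the real continue/stop decision. It uses ExecutorService.invokeAll, which blocks until every task of a stage has completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class StagedRun {

    // Hypothetical sub-algorithm that just returns a fixed value.
    static Callable<Double> constant(double value) {
        return () -> value;
    }

    // Hypothetical gate: continue only if every result of the stage is positive.
    static boolean allPositive(List<Double> results) {
        for (double r : results) {
            if (r <= 0) return false;
        }
        return true;
    }

    // Runs each stage's tasks in parallel and stops once the gate rejects a stage.
    // Returns the number of stages that were actually executed.
    static int runStages(ExecutorService executor,
                         List<List<Callable<Double>>> stages,
                         Predicate<List<Double>> shouldContinue) {
        int executed = 0;
        try {
            for (List<Callable<Double>> stage : stages) {
                List<Future<Double>> futures = executor.invokeAll(stage);
                executed++;
                List<Double> results = new ArrayList<>();
                for (Future<Double> future : futures) {
                    results.add(future.get());
                }
                if (!shouldContinue.test(results)) {
                    break; // no reason to try further
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-algorithm failed", e.getCause());
        }
        return executed;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<List<Callable<Double>>> stages = Arrays.asList(
                Arrays.asList(constant(1.0), constant(2.0), constant(3.0), constant(4.0)),
                Arrays.asList(constant(5.0), constant(-6.0), constant(7.0), constant(8.0)),
                Arrays.asList(constant(9.0), constant(10.0)));
        try {
            System.out.println("stages executed: "
                    + runStages(executor, stages, StagedRun::allPositive));
        } finally {
            executor.shutdown();
        }
    }
}
```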

    A few of the other grid solutions (like JavaSpaces) also make it easy to distribute the code over the nodes. I don't know if Terracotta provides a similar solution. The example is also missing other functionality: persisting tasks, failover of tasks (resubmitting them when something fails), etc. You can build it all by hand, but this is usually the stuff grid solutions take care of (at least partially).

    Java 7 is receiving some new functionality in the java.util.concurrent library:
    http://www.ibm.com/developerworks/ja...-jtp11137.html
    Last edited by Alarmnummer; Nov 21st, 2007 at 06:54 AM.

  8. #8
    Join Date
    May 2007
    Posts
    15

    Terracotta clusters util.concurrent

    Alarmnummer is correct with respect to Terracotta. Start with util.concurrent and then cluster / distribute it with our stuff. That is, if you choose Terracotta in the first place. I leave it to you, somewhat obviously, to decide which clustering approach you take.

    The main reason I am posting here is that if you do choose Terracotta and util.concurrent (or our own MasterWorker framework) you need to consider performance in your particular use case. I say this only because I assume you are going parallel for throughput and performance of otherwise CPU-intensive computations.

    Here's my thinking: queue striping and associated lock striping. Using MasterWorker, have a queue per Master and 4 worker threads. Then have a map of Masters to which you can assign a 4-way parallel task. Pick a Master/Worker tuple from the map at random or by other means, and then send the Master the work via a simple Java interface. The Master then enqueues the work for its workers and gathers their responses. This way each Master/Worker tuple is completely partitioned from the other tuples in terms of workload, locks, and concurrency, and this should scale linearly. As for how you partition the Masters and Workers across JVMs, that's up to you.
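
    The striping idea can be sketched in plain java.util.concurrent, inside a single JVM. To be clear, this does not use Terracotta's MasterWorker API; the StripedMasters and Master classes are hypothetical, a list stands in for the map of Masters, and each Master's queue is simply the internal queue of its own 4-thread executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

class StripedMasters {

    // One Master owns its own work queue (inside the executor) and 4 worker threads.
    static final class Master {
        private final ExecutorService workers = Executors.newFixedThreadPool(4);

        // Enqueues the sub-tasks for this Master's workers and gathers their responses.
        <T> List<T> execute(List<Callable<T>> subTasks) {
            try {
                List<T> results = new ArrayList<>();
                for (Future<T> future : workers.invokeAll(subTasks)) {
                    results.add(future.get());
                }
                return results;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("sub-task failed", e.getCause());
            }
        }

        void shutdown() {
            workers.shutdown();
        }
    }

    private final List<Master> masters = new ArrayList<>();

    StripedMasters(int masterCount) {
        for (int i = 0; i < masterCount; i++) {
            masters.add(new Master());
        }
    }

    // Picks a Master at random; Masters share no queue or worker threads with each other.
    <T> List<T> submit(List<Callable<T>> subTasks) {
        int index = ThreadLocalRandom.current().nextInt(masters.size());
        return masters.get(index).execute(subTasks);
    }

    void shutdown() {
        for (Master master : masters) {
            master.shutdown();
        }
    }

    public static void main(String[] args) {
        StripedMasters grid = new StripedMasters(3);
        List<Callable<Integer>> subTasks = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int v = i;
            subTasks.add(() -> v * v);
        }
        System.out.println(grid.submit(subTasks));
        grid.shutdown();
    }
}
```

    Random selection keeps the sketch short; round-robin or picking the least-loaded Master are other ways to choose a tuple.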

    Hope this helps...

    --Ari
    Last edited by ikarzali; Nov 21st, 2007 at 06:15 PM.