
Thread: In memory database or Caching solution

  1. #11
    Join Date
    Apr 2005
    Posts
    15

    Default

    Quote Originally Posted by Paul Newport View Post
    There is a big section in Hibernate in Action all about caching and performance tuning.

    I don't think you'll gain anything from using an in memory database to be honest.

    On a final note, don't forget, "Tin is cheap". Sometimes it's just easier to be pragmatic and throw some hardware at a problem. A monster server is after all about the same price as a contractor's wages for a week.
    In the performance testing environment of our current project, we will probably use HP rx4640s, which are priced somewhere in the area of 60,000 GBP... I would LOVE to earn that much in a week... :-)

    BR,

    Robert

  2. #12
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424

    Default

    Quote Originally Posted by dinsush View Post
    The reason I am hesitant to use an in-memory database is mainly that the amount of data I would like to cache is huge, and with large chunks of data HSQLDB, which we evaluated, seems to become slow, or only marginally faster than what we have been doing so far.
    If you are dealing with huge amounts of data I'm not sure HSQLDB will play very nicely. I've had lots of problems with it in the past as the data set got larger. The H2 database was much better, but if you are dealing with lots of data I'm not sure how it would scale.

  3. #13
    Join Date
    Aug 2006
    Location
    Now Germany, previously Ukraine
    Posts
    1,546

    Default

    I have no idea about the details, or even the general characteristics, of your project, so I can only make a wild guess...

    But in 95% of cases bad performance is due to flaws in the DB design and/or badly written SQL. So first make sure that everything is OK in this area. Only then look for other solutions.

    BTW, sometimes relatively small changes in the DB design give a tremendous performance gain.

    If you provide more info, more targeted advice can be given.

    BTW, the overhead of Spring JdbcTemplate is rather minor, so changing to plain JDBC will save you (in the best case) only a few percent, more likely a fraction of a percent.

    Regards,
    Oleksandr Alesinskyy

    Quote Originally Posted by dinsush View Post
    Hello, we have an application running that uses Spring, and the database is SQL Server.
    The application is giving us problems with respect to performance.
    We are thinking of using some caching mechanism in our application.

    Any thoughts on which is better: in-memory databases like HSQLDB or H2, or
    Java caching solutions like Ehcache or JCS?

    We want to cache lots of data in memory/on disk that is queried again and again from the database, and run our application validations against that data. Basically mostly reads, very few writes.

    Any suggestions, tips will be very helpful

    Thanks
    Diana

  4. #14
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424

    Default

    I guess efficient SQL is always a good starting point. As you say, the improvements can be massive. It was mentioned that this is mostly reads. If that's the case, efficient queries, an in-memory database and possibly a cache should solve most problems. I think the previous posts have summed up all the options in detail.

  5. #15
    Join Date
    Nov 2006
    Location
    Boston, MA
    Posts
    303

    Default

    We are facing a similar issue: we have an application that must perform traversals of very large graphs. There are use cases when we need to traverse up and down a graph structure that contains, potentially, 200,000-300,000 nodes. Today, the app makes a trip to the DB for each node element and retrieves the IDs of the element's children and parents. The application then goes on to retrieve each child or ancestor element from the database. Needless to say, it is extremely inefficient. It works ok so far only because the data volume has not yet reached its projected size. We have been considering caching the data graph (elements and their associations) using in-memory and disk cache. Would EhCache or JCS be suitable for this? Can any of you guys suggest a good approach?

    Also, are there any good examples of using spring-modules-cache that are complete and working? The SM 0.8 distribution does not seem to contain any...

    Thanks for any advice!

  6. #16
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424

    Default

    I'm sure caching will help your situation. It does sound like your current approach is very inefficient however. It would be good to improve that as well as caching. Your cache is limited by memory so unless you have bucket loads of the stuff you are still going to be hitting the database sometimes.
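To illustrate the point that a cache is bounded by memory, here is a minimal sketch of a size-limited LRU cache built on the standard `java.util.LinkedHashMap`. The class name `LruCache` and the `maxEntries` limit are assumptions made for illustration and are not from the thread. Once the limit is reached the least recently used entry is dropped, so later lookups for it go back to the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a bounded LRU cache; not thread-safe without external synchronization. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // hypothetical limit, to be tuned to the available heap

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(): evict the least recently used entry once over the limit
        return size() > maxEntries;
    }
}
```

For concurrent use the map could be wrapped with `Collections.synchronizedMap`, at the cost of serializing every access.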

  7. #17
    Join Date
    Nov 2006
    Location
    Boston, MA
    Posts
    303

    Default

    Thanks for the reply. The current approach comes with the legacy implementation, and we are now working on a completely new architecture. We are thinking of possibly preloading the complete graph into the cache (it's OK if a portion of it spills onto disk). There are some specific requirements that dictate the necessity of that kind of traversal, though. Each element may have to be analyzed before you decide which way to go next, etc. Some elements may contain thousands of children, and the data from all of them is necessary for the analysis to complete. If these elements are stored in the cache, it should be fairly fast.

    Do you know of any good examples of using declarative caching with Spring Modules? I have found some stuff on the net, but none of it is in working condition; it is obsolete, perhaps. The latest SM 0.8 distribution does not have examples for caching, for some reason.
    Thanks.

  8. #18
    Join Date
    Aug 2006
    Location
    Now Germany, previously Ukraine
    Posts
    1,546

    Default

    Hello,

    Do you know in advance all the elements that belong to the given graph? If so, your best bet is to read the whole graph into memory with a single DB query and only then operate on it. 300,000 nodes is not that many. Even if the data associated with each node is in the 1-2 KB range, your graph would fit comfortably in memory on any decent machine.
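A minimal sketch of the "load everything, then operate in memory" idea. It assumes the single bulk query returns one row per association as a `{parentId, childId}` pair. The class `InMemoryGraph`, its method names and the row layout are hypothetical, and the actual JDBC call is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: adjacency maps built in one pass over the rows of a single bulk query. */
final class InMemoryGraph {
    private final Map<Long, List<Long>> children = new HashMap<>();
    private final Map<Long, List<Long>> parents = new HashMap<>();

    /** Each row is assumed to be {parentId, childId}. */
    static InMemoryGraph fromEdges(List<long[]> edgeRows) {
        InMemoryGraph graph = new InMemoryGraph();
        for (long[] row : edgeRows) {
            // Index every association in both directions for up and down traversal
            graph.children.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
            graph.parents.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return graph;
    }

    List<Long> childrenOf(long id) {
        return children.getOrDefault(id, Collections.emptyList());
    }

    List<Long> parentsOf(long id) {
        return parents.getOrDefault(id, Collections.emptyList());
    }
}
```

After the one-time load, looking up the children or parents of a node is a hash lookup instead of a database round trip.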

    If you can discover elements only while traversing, then a cache may help, provided you visit the same element many times. But in that case it may (or may not) be better to go with a rudimentary DIY ("do it yourself") cache implementation, just
    Code:
    // Needs java.util.HashMap and java.util.Map; NodeID, Node and getNodeFromDB are the application's own.
    Map<NodeID, Node> myRudimentaryCache = new HashMap<NodeID, Node>();
    Node getNode(NodeID id) {
        Node result = myRudimentaryCache.get(id);
        if (result == null) {
            result = getNodeFromDB(id);          // cache miss: load from the DB
            myRudimentaryCache.put(id, result);  // remember it for the next lookup
        }
        return result;
    }
    BTW, how many graphs do you have to traverse simultaneously?

    Regards,
    Oleksandr

  9. #19
    Join Date
    Nov 2006
    Location
    Boston, MA
    Posts
    303

    Default

    Thanks, Oleksandr. I am indeed planning to read all the nodes with a single query. My original thought was to read everything into a concurrent hashmap, and I have actually implemented that solution. The concern is that the potential number of objects, and their sizes, could create problems. So, we thought, a safer solution would be to use some existing caching engine that would allow us to spill the data onto disk. We could, of course, implement our own, but what's the point of reinventing the wheel?

    The service that traverses the graph will be accessed by multiple clients; however, there will be only one instance of the graph itself. The service needs to be able to traverse the graph and return subsets of the graph (e.g. by skipping the nodes that don't satisfy the client's criteria). All that can be done efficiently by storing the elements and their associations in maps, and I already have a prototype working with smaller volumes of data. I was just really tempted to try declarative caching. That would make things much cleaner.
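A rough sketch of the map-based traversal described above, returning only the nodes that satisfy a client's criteria. The class `FilteredTraversal`, the method `reachableMatching` and the use of `Long` ids are assumptions for illustration. The sketch also assumes that a non-matching node is left out of the result but still traversed, so its descendants remain reachable; the real service may prune instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongPredicate;

final class FilteredTraversal {
    /** Breadth-first walk from root; returns visited node ids that pass the criteria. */
    static List<Long> reachableMatching(Map<Long, List<Long>> children,
                                        long root, LongPredicate criteria) {
        List<Long> result = new ArrayList<>();
        Set<Long> visited = new HashSet<>(); // guards against revisiting shared nodes
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(root);
        visited.add(root);
        while (!queue.isEmpty()) {
            long id = queue.remove();
            if (criteria.test(id)) {
                result.add(id); // keep only nodes the client asked for
            }
            for (Long child : children.getOrDefault(id, Collections.emptyList())) {
                if (visited.add(child)) { // add() returns false if already seen
                    queue.add(child);
                }
            }
        }
        return result;
    }
}
```

If the map is a `ConcurrentHashMap` populated once at startup and only read afterwards, several clients can run this traversal at the same time, since each call keeps its own `visited` set and queue.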

  10. #20
    Join Date
    Aug 2006
    Location
    Now Germany, previously Ukraine
    Posts
    1,546

    Default

    Hello,

    A disk cache would not be much more effective than DB access (remember that production-class databases have their own caches for "hot" database pages/blocks). A memory cache is certainly quite effective, but it would compete with your application for memory. My assumption is that the memory overhead of advanced caches is higher than that of simple DIY solutions, and in your case, with a single graph instance, the advantages of an advanced cache over the simplest DIY solution are not so obvious.

    But how big is the data for a single node? Is your concern about memory really well-grounded? Since 4 GB of memory is no big deal nowadays, you may easily spend more than 2 GB just on your graph, and that means 1,000,000 nodes of 2 KB each. That seems enough to satisfy your needs. If so, forget about any kind of caching, DIY or third-party. Do not forget the KISS principle.
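The back-of-the-envelope arithmetic above can be written out as a small sketch. It counts node payload only. JVM overhead such as object headers and map entries is not included and would add to these figures. The class and method names are hypothetical, and 2 KB is taken as 2,048 bytes.

```java
final class GraphMemoryEstimate {
    /** Payload bytes for nodeCount nodes of bytesPerNode each; throws on overflow. */
    static long payloadBytes(long nodeCount, long bytesPerNode) {
        return Math.multiplyExact(nodeCount, bytesPerNode);
    }

    /** Converts bytes to mebibytes (1 MiB = 1,048,576 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

With these assumptions, 1,000,000 nodes of 2 KB come to 2,048,000,000 bytes (roughly 1,953 MiB, just under 2 GiB), and 300,000 such nodes to 614,400,000 bytes (roughly 586 MiB).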

    BTW, I did a similar (to some extent) task almost 20 years ago, on 4 MB 386/486 computers with MS-DOS and C. The node data was relatively small, but the number of nodes reached ~100,000.

    Regards,
    Oleksandr
