I know neither the details nor even the general characteristics of your project, so I can only make a wild guess...
But in 95% of cases bad performance is due to flaws in the DB design and/or badly written SQL. So first make sure that everything is OK in this area. Only then look for other solutions.
BTW, sometimes relatively small changes in the DB design give a tremendous performance gain.
If you provide more info, more targeted advice can be given.
BTW, the overhead of Spring JdbcTemplate is rather minor, so switching to plain JDBC will save you (in the best case) only a few percent, more likely a fraction of a percent.
Regards,
Oleksandr Alesinskyy
I guess efficient SQL is always a good starting point. As you say, the improvements can be massive. It was mentioned that this is mostly reads. If that's the case, making sure the queries are efficient, plus an in-memory database and possibly a cache, should solve most problems. I think the previous posts have summed up all the options in detail.
We are facing a similar issue: we have an application that must perform traversals of very large graphs. There are use cases when we need to traverse up and down a graph structure that contains, potentially, 200,000-300,000 nodes. Today, the app makes a trip to the DB for each node element and retrieves the IDs of the element's children and parents. The application then goes on to retrieve each child or ancestor element from the database. Needless to say, it is extremely inefficient. It works ok so far only because the data volume has not yet reached its projected size. We have been considering caching the data graph (elements and their associations) using in-memory and disk cache. Would EhCache or JCS be suitable for this? Can any of you guys suggest a good approach?
Also, are there any good, complete and working examples of using spring-modules-cache? The SM 0.8 distribution does not seem to contain any...
Thanks for any advice!
I'm sure caching will help your situation. It does sound like your current approach is very inefficient however. It would be good to improve that as well as caching. Your cache is limited by memory so unless you have bucket loads of the stuff you are still going to be hitting the database sometimes.
Thanks for the reply. The current approach comes with the legacy implementation, and we are now working on a completely new architecture. We are thinking of possibly preloading the complete graph into the cache (it's OK if a portion of it spills onto disk). There are some specific requirements that dictate the necessity of that kind of traversal, though. Each element may have to be analyzed before you decide which way to go next, etc. Some elements may have thousands of children, and the data from all of them is necessary for the analysis to complete. If these elements are stored in the cache, it should be fairly fast.
Do you know of any good examples of using declarative caching with Spring Modules? I have found some stuff on the net, but none of it is in working condition; it is obsolete, perhaps. The latest SM 0.8 distribution does not have examples for caching, for some reason.
Thanks.
Hello,
do you know all the elements that belong to the given graph in advance? If so, your best bet is to read the whole graph into memory with a single DB query and only then operate on it. 300,000 nodes are not that many. Even if the data associated with each node is in the 1K-2K range, your graph would fit comfortably in memory on any decent machine.
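To make the single-query idea concrete, here is a minimal sketch. It assumes the associations live in a hypothetical node_edge table with parent_id and child_id columns and that node IDs are numeric; the class name InMemoryGraph and its methods are made up for illustration, and it needs Java 8 or later for computeIfAbsent. Treat it as a starting point, not as the way your schema has to look.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory adjacency index: node ID -> child IDs and node ID -> parent IDs. */
final class InMemoryGraph {
    private final Map<Long, List<Long>> children = new HashMap<>();
    private final Map<Long, List<Long>> parents = new HashMap<>();

    /** Records one parent -> child association in both directions. */
    void addEdge(long parentId, long childId) {
        children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
        parents.computeIfAbsent(childId, k -> new ArrayList<>()).add(parentId);
    }

    /** Child IDs of a node, or an empty list if it has none. */
    List<Long> childrenOf(long id) {
        return children.getOrDefault(id, Collections.emptyList());
    }

    /** Parent IDs of a node, or an empty list if it has none. */
    List<Long> parentsOf(long id) {
        return parents.getOrDefault(id, Collections.emptyList());
    }

    /** Loads every edge with one query. Table and column names are hypothetical. */
    static InMemoryGraph load(Connection connection) throws SQLException {
        InMemoryGraph graph = new InMemoryGraph();
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT parent_id, child_id FROM node_edge")) {
            while (rs.next()) {
                graph.addEdge(rs.getLong(1), rs.getLong(2));
            }
        }
        return graph;
    }
}
```

After load() returns, going up or down the graph is just a map lookup (childrenOf / parentsOf), with no further trips to the DB. The node payloads could be loaded the same way into a third map keyed by ID.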
If you can discover elements only in the process of traversing, then a cache may help, provided that you visit the same element many times. But in this very case it may (or may not) be better to go with a rudimentary DIY ("do it yourself") implementation of the cache - just
Code:
HashMap<NodeID,Node> myRudimentaryCache = new HashMap<NodeID,Node>();

Node getNode(NodeID id) {
    Node result = myRudimentaryCache.get(id);
    if (result == null) {
        result = getNodeFromDB(id);
        myRudimentaryCache.put(id, result); // remember the node so the next lookup skips the DB
    }
    return result;
}
BTW, how many graphs do you have to traverse simultaneously?
Regards,
Oleksandr
Thanks Oleksandr. I am indeed planning to read all the nodes with a single query. My original thought was to read everything into a concurrent hashmap, and I have actually implemented that solution. The concern is that the potential number of objects - and their sizes - could create problems. So, we thought, a safer solution would be to use some existing caching engine that would allow us to spill the data onto disk. We could, of course, implement our own, but what's the point of reinventing the wheel?
The service that traverses the graph will be accessed by multiple clients; however, there will be only one instance of the graph itself. The service needs to be able to traverse the graph and return subsets of the graph (e.g. by skipping the nodes that don't satisfy the client's criteria). All that can be done efficiently by storing the elements and their associations in maps, and I already have a prototype working with smaller volumes of data. I was just really tempted to try declarative caching. That would make things much cleaner.
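For illustration, the map-based traversal with client criteria could look roughly like the sketch below. It is not the actual prototype: the class and method names are hypothetical, node IDs are assumed to be numeric, and it assumes that a node failing the criteria is left out of the result while its children are still visited (if a rejected node should prune its whole subtree, the loop over the children would move inside the if). It needs Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class FilteredTraversal {
    /**
     * Breadth-first walk down from startId over a node -> children map.
     * Returns the visited IDs accepted by the criteria, in visiting order.
     */
    static List<Long> descendants(Map<Long, List<Long>> children,
                                  long startId,
                                  Predicate<Long> criteria) {
        List<Long> accepted = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        seen.add(startId);
        queue.add(startId);
        while (!queue.isEmpty()) {
            long id = queue.remove();
            if (criteria.test(id)) {
                accepted.add(id);
            }
            for (Long child : children.getOrDefault(id, Collections.emptyList())) {
                if (seen.add(child)) { // visit each node once, even with shared children or cycles
                    queue.add(child);
                }
            }
        }
        return accepted;
    }
}
```

Since the traversal only reads the map and keeps its working state in local variables, several clients can run it at the same time against the single graph instance, as long as nobody modifies the map while it is being read.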
Hello,
a disk cache would not be much more effective than DB access (remember that production-class databases have their own caches for "hot" database pages/blocks). A memory cache is certainly quite effective, but it would compete with your application for memory. My assumption is that the memory overhead of advanced caches is higher than that of simple DIY solutions, and in your case, with a single graph instance, the advantages of an advanced cache over the simplest DIY solution are not so obvious.
But - how big is the data for a single node? Are your concerns about memory really well-grounded? Nowadays 4 GB of memory is no big deal, so you may easily spend more than 2 GB just on your graph, and that means 1,000,000 nodes of 2 KB each. That seems enough to satisfy your needs. If that is really so, forget about any kind of caching - DIY or 3rd-party. Do not forget the KISS principle.
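The arithmetic is easy to redo with your own figures. Here is a tiny sketch (the class and method names are made up) that multiplies the node count by an assumed average payload per node. It counts payload only and ignores JVM object headers and the overhead of the maps themselves, so treat the result as a lower bound.

```java
final class GraphMemoryEstimate {
    /** Rough payload size in bytes: node count times assumed average bytes per node. */
    static long estimateBytes(long nodeCount, long bytesPerNode) {
        return nodeCount * bytesPerNode;
    }

    /** Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

With 2 KB (2048 bytes) per node, 300,000 nodes come to about 586 MiB and 1,000,000 nodes to about 1,953 MiB.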
BTW, I did a similar (to some extent) task almost 20 years ago - on 4 MB 386/486 computers with MS-DOS and C. The node data was relatively small, but the number of nodes reached ~100,000.
Regards,
Oleksandr