Jun 8th, 2011, 08:21 AM
No support for Neo4j's BatchInserter?
Using Spring Data to import a large data set into Neo4j takes forever. With Neo4j's Batch Inserter API it takes less than a minute to import 1000 records, but with Spring Data the same 1000 records had not finished importing even after running overnight.
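For reference, the Batch Inserter API mentioned here bypasses transactions entirely and writes directly to the store files. A minimal sketch against the Neo4j 1.x line that was current at the time (the store path, property keys, and relationship type are made up for illustration, and the code must run single-threaded against a database that is not otherwise open):

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class BatchInsertSketch {
    public static void main(String[] args) {
        // Writes directly to the store directory, no transactions involved.
        BatchInserter inserter = new BatchInserterImpl("target/batch-db");
        try {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("id", 1);
            props.put("name", "example");
            long a = inserter.createNode(props);  // returns the new node's id
            long b = inserter.createNode(props);
            inserter.createRelationship(a, b,
                    DynamicRelationshipType.withName("RELATED"), null);
        } finally {
            // Mandatory: flushes and closes the store; skipping it corrupts the db.
            inserter.shutdown();
        }
    }
}
```

Spring Data Graph's entity mapping layer did not sit on top of this API, which is why the original poster saw such a gap between the two approaches.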
Spring Data is not very useful to us without a proper bulk data load facility.
Jun 9th, 2011, 02:13 AM
Could you please post your code? What does one "record" mean in your case? Are you batching the insert? (I.e. transactional batches?)
I had Spring Data Graph inserting hundreds of thousands of records in mere seconds.
Jun 9th, 2011, 08:57 AM
I cannot post the code, but it doesn't do anything complex. I just cloned the Cineasts project and added a couple of additional domain objects to it. I created one type of relationship between these domain objects using @RelatedToVia. The relationship itself has a few properties. Each of the two domain objects and the relationship holds a very small amount of data, less than 100 bytes (a few ints and a couple of strings).
I wrote something very similar to MovieDbImportService in the Cineasts project to import from our relational database. The database itself returns data very fast (I verified this by commenting out the lines of code that load the data through Spring Data). I removed the @Transactional annotation that importMovie() had for loading an individual object; that sped things up slightly, but not by much. Just like the Cineasts import, we call movieRepository.findByPropertyValue("id", id) to ensure each record does not already exist.
Hopefully this information is useful to you, but you don't need to look into this on my account. We have given up on Spring Data and on the other Spring projects that act as interfaces to third-party products. We still like your core products such as Spring IoC, Spring Integration, Spring Batch, and Spring Web MVC. But we've found that the projects that act as facades over third-party products (JMS, AMQP, Data, ORM, etc.) only hide the true power of the underlying product, with the marginal benefit of easier and faster development.
Jun 9th, 2011, 06:25 PM
I think your problems are mainly due to two things. First, the transaction size is too small (you should insert about 10k elements in one tx). Second, using the relationship collections to add lots of new relationships has quadratic complexity; using entity.relateTo(target, type) is much faster. You should have gotten back to us (or onto the Neo4j mailing list) much sooner; then we could have helped you without you needing to abandon your invested time and work.
Did you profile the importer at some point? How large was your dataset?
Do you continue to use the underlying technologies, though? I.e. JMS, AMQP, Neo4j?