Criteria Query returning Duplicate Records / Objects when using Eager Fetching
I have an application that generates and stores directed graph data, then needs to reconstruct graphs from persisted data and perform analysis on them.
When I try to read in and graph data (nodes and links) using a Criteria Query containing nested joins I'm getting duplicate records/objects returned from the database.
My code, and console output of results are included below.
Why is GORM returning duplicate Node objects? I thought that including the outLinks collection in my criteria enabled eager fetching, not a SQL JOIN type of result set.
Why are additional (N+1) queries being executed when I access a Link in the outLinks collection (and also if I access the link.targetNode.id that an outLinks Link refers to)?
I had this working better using Node.findAllByDateCreatedBetween(startDate, endDate, [ Criteria Map ]) and using other LAZYish methods (e.g. requiring one additional query per Node to get its outLinks collection), but I couldn't get it to do eager fetching of the outLinks collection and their link.targetNode objects, which is what I'm trying to accomplish.
I really want to optimize this process of graph reconstruction because I'm dealing with very large graphs, on the order of 10^8 nodes (and an order of magnitude or two more links).
Code:
***** Grails Domain Classes ********************
package graphanalysis
class Node {
NodeCreationTask creationTask
String name
Date dateCreated
static belongsTo = NodeCreationTask
static mapping = {
table: 'nodes'
id generator: 'sequence'
version false
creationTask index: 'CreationTask_Name_Idx,CreationTask_Idx'
name index: 'creationTask_Name_Idx,Name_Idx'
dateCreated index: 'DateCreated_Idx'
}
static hasMany = [outLinks: Link,
inLinks: Link,
comunities: NodeCommunityMap]
static mappedBy = [outLinks: 'sourceNode',
inLinks: 'targetNode',
comunities: 'node']
static constraints = {
name maxSize: 1024, unique: ['creationTask']
dateCreated maxSize: 13
}
}
class Link {
Node sourceNode
Node targetNode
static belongsTo = Node
static mapping = {
table: 'links'
id generator: 'sequence'
version false
sourceNode index: 'Source_Idx'
targetNode index: 'Target_Idx'
}
static constraints = {
targetNode unique: ['sourceNode']
}
}
class NodeCreationTask {
...
}
class NodeCommunityMap {
...
}
***** Grails Service ********************
class CreateGraphService {
def sessionFactory
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP
def run(Date startDate, Date endDate) {
def PAGE_SIZE = 10
def curOffset = 0
def List<Node> nodes
def totNodes = 0
// create criteria instance
def criteria = Node.createCriteria()
// get result set
nodes = criteria.list {
between('dateCreated', startDate, endDate)
maxResults PAGE_SIZE
firstResult curOffset
join outLinks {
join targetNode {}
}
}
// -- do pagination here when ready --
// count the results
println( ">>>>> Criteria Query has " + webNodes.size() + " results <<<<<" )
// iterate through results
for (node in nodes) {
def size = node.outLinks.size()
println node
println " has " + size + " outlinks to:"
if (size > 0) {
node.outLinks.each { link ->
println " --> [ " + link.targetNode + " ]"
}
}
++totNodes
}
// -- loop pagination here when ready --
log.info("Loaded a total of " + totNodes + " nodes for the provided date range")
}
}
***** Console output ********************
Hibernate: select this_.id as id11_2_, this_.date_created as date2_11_2_, this_.name as name11_2_, this_.node_creation_task_id as nodecreation4_11_2_, outlinks_a1_.id as id0_0_, outlinks_a1_.source_node_id as source2_0_0_, outlinks_a1_.target_node_id as target3_0_0_, targetnode2_.id as id11_1_, targetnode2_.date_created as date2_11_1_, targetnode2_.name as name11_1_, targetnode2_.node_creation_task_id as web4_11_1_ from node this_ inner join link outlinks_a1_ on this_.id=outlinks_a1_.source_node_id inner join node targetnode2_ on outlinks_a1_.target_node_id=targetnode2_.id where this_.date_created between ? and ? limit ?
>>>>> Criteria Query has 10 results <<<<<
Hibernate: select outlinks0_.source_node_id as source2_11_1_, outlinks0_.id as id1_, outlinks0_.id as id0_0_, outlinks0_.source_node_id as source2_0_0_, outlinks0_.target_node_id as target3_0_0_ from link outlinks0_ where outlinks0_.source_node_id=?
graphanalysis.Node : 6495462
has 5 outlinks to:
Hibernate: select node0_.id as id11_0_, node0_.date_created as date2_11_0_, node0_.name as name11_0_, node0_.node_creation_task_id as web4_11_0_ from node node0_ where node0_.id=?
--> [ graphanalysis.Node : 6495568 ]
Hibernate: select node0_.id as id11_0_, node0_.date_created as date2_11_0_, node0_.name as name11_0_, node0_.node_creation_task_id as web4_11_0_ from node node0_ where node0_.id=?
--> [ graphanalysis.Node : 6495527 ]
Hibernate: select node0_.id as id11_0_, node0_.date_created as date2_11_0_, node0_.name as name11_0_, node0_.node_creation_task_id as web4_11_0_ from node node0_ where node0_.id=?
--> [ graphanalysis.Node : 6495617 ]
--> [ graphanalysis.Node : 6495502 ]
Hibernate: select node0_.id as id11_0_, node0_.date_created as date2_11_0_, node0_.name as name11_0_, node0_.node_creation_task_id as web4_11_0_ from node node0_ where node0_.id=?
--> [ graphanalysis.Node : 6495618 ]
graphanalysis.Node : 6495462
has 5 outlinks to:
--> [ graphanalysis.Node : 6495568 ]
--> [ graphanalysis.Node : 6495527 ]
--> [ graphanalysis.Node : 6495617 ]
--> [ graphanalysis.Node : 6495502 ]
--> [ graphanalysis.Node : 6495618 ]
graphanalysis.Node : 6495462
has 5 outlinks to:
--> [ graphanalysis.Node : 6495568 ]
--> [ graphanalysis.Node : 6495527 ]
--> [ graphanalysis.Node : 6495617 ]
--> [ graphanalysis.Node : 6495502 ]
--> [ graphanalysis.Node : 6495618 ]
graphanalysis.Node : 6495462
has 5 outlinks to:
--> [ graphanalysis.Node : 6495568 ]
--> [ graphanalysis.Node : 6495527 ]
--> [ graphanalysis.Node : 6495617 ]
--> [ graphanalysis.Node : 6495502 ]
--> [ graphanalysis.Node : 6495618 ]
...
graphanalysis.Node : 6495462
has 5 outlinks to:
--> [ graphanalysis.Node : 6495568 ]
--> [ graphanalysis.Node : 6495527 ]
--> [ graphanalysis.Node : 6495617 ]
--> [ graphanalysis.Node : 6495502 ]
--> [ graphanalysis.Node : 6495618 ]
29 Jun 2012 13:37:20,419 [INFO ] grails.app.services.graphanalysis.CreateGraphService- Loaded a total of 50 nodes for the provided date range