OK, I ran some experiments. Unless I'm doing something wrong (very possible), it looks like the answer is to 1) not touch the batch size at all, 2) insert objects one by one, and 3) not flush and clear.
I'm quite surprised, but the results speak for themselves (3700 rows with about 80 columns each, read from a CSV file). If anyone needs more details on the experiment I'm happy to provide them, but I'm 90% sure these are good results.
Code:
17:05:12,968 INFO Experiment:25 - create individually no batch size - 17.848 seconds.
17:05:31,273 INFO Experiment:32 - batch create with 8 51.575 seconds.
17:06:23,084 INFO Experiment:32 - batch create with 16 64.398 seconds.
17:07:27,688 INFO Experiment:32 - batch create with 32 70.037 seconds.
17:08:37,930 INFO Experiment:32 - batch create with 64 146.881 seconds.
17:11:05,063 INFO Experiment:32 - batch create with 128 107.341 seconds.
17:12:52,577 INFO Experiment:32 - batch create with 256 181.478 seconds.
Does anyone else find it odd that doing what Hibernate recommends (as in my first post) is actually slower? Maybe I'm doing something wrong, or maybe Spring does a great job by itself. I'm pretty sure the batch size is getting set correctly. Maybe it's because I'm effectively going through the data twice for the batch create (read from the CSV into a list, then create from the list)? That's bad, but it still wouldn't explain why setting the batch size to 8 or 32 is that much worse; it should be at most 2x the individual create. Any ideas?
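To put a number on "that much worse", here is a quick sketch that divides each batch timing by the individual-create baseline. The class and method names are made up for illustration; only the timings come from the log above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlowdownRatios {

    // Baseline from the log above: individual creates, no batch size set.
    static final double BASELINE_SECONDS = 17.848;

    /** How many times slower a run was than the baseline. */
    public static double ratio(double runSeconds, double baselineSeconds) {
        return runSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // Batch size -> elapsed seconds, copied from the log above.
        Map<Integer, Double> batchTimings = new LinkedHashMap<>();
        batchTimings.put(8, 51.575);
        batchTimings.put(16, 64.398);
        batchTimings.put(32, 70.037);
        batchTimings.put(64, 146.881);
        batchTimings.put(128, 107.341);
        batchTimings.put(256, 181.478);

        for (Map.Entry<Integer, Double> e : batchTimings.entrySet()) {
            System.out.printf("batch size %3d: %.2fx the baseline%n",
                    e.getKey(), ratio(e.getValue(), BASELINE_SECONDS));
        }
    }
}
```

Every batch run comes out above the 2x ceiling, from a bit under 3x at batch size 8 to roughly 10x at 256.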
Here are the relevant portions of my code:
Code:
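// Reads the CSV row by row and inserts each object as soon as it is parsed.
// newRows is assumed to be a counter field declared elsewhere in the class.
public int createIndividually() {
    while (csv.readRecord()) {
        MyObject myObject = readRowFromCsvFile(csv);
        if (myObject != null) { // skip rows that could not be parsed
            getDAO().create(myObject);
            ++newRows;
        }
    }
    return newRows;
}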
Code:
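// Reads the whole CSV into a list first, then hands the list to the DAO.
public int batchCreate() {
    List<MyObject> myObjectList = new ArrayList<MyObject>();
    while (csv.readRecord()) {
        MyObject myObject = readRowFromCsvFile(csv);
        if (myObject != null) { // skip rows that could not be parsed
            myObjectList.add(myObject);
            ++newRows;
        }
    }
    // The returned count comes from the DAO, not from newRows.
    return getDAO().batchCreate(myObjectList);
}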
Code:
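public int batchCreate(final List<Entity> entityList) { // in the DAO
    int insertedCount = 0; // int, to match the method's return type
    for (int i = 0; i < entityList.size(); ++i) {
        create(entityList.get(i));
        // Flush and clear the session after every batchSize inserts.
        if (++insertedCount % batchSize == 0) {
            flushAndClear();
        }
    }
    // Flush whatever is left over from the last partial batch.
    flushAndClear();
    return insertedCount;
}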
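protected void flushAndClear() {
    // Note: isDirty() has to check the session for pending changes itself,
    // so this guard is not free.
    if (getSession().isDirty()) {
        getSession().flush();
        getSession().clear();
    }
}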
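On the "going through the data twice" worry: a rough sketch of the same flush-every-N loop that inserts while reading instead of building a list first. `Sink` is a hypothetical stand-in for the DAO (it is not a Hibernate or Spring type), and the iterator stands in for the CSV reader. This only shows the loop shape; it makes no claim about where the time in the experiment actually goes.

```java
import java.util.Arrays;
import java.util.Iterator;

public class StreamingBatchInsert {

    /** Hypothetical stand-in for the DAO; not a Hibernate or Spring type. */
    public interface Sink<T> {
        void create(T entity);
        void flushAndClear();
    }

    /** Inserts rows as they are read, flushing after every batchSize inserts. */
    public static <T> int insertAll(Iterator<T> rows, Sink<T> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inserted = 0;
        int pending = 0; // inserts since the last flush
        while (rows.hasNext()) {
            T row = rows.next();
            if (row == null) {
                continue; // mirrors the null check in the post's loops
            }
            sink.create(row);
            inserted++;
            if (++pending == batchSize) {
                sink.flushAndClear();
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.flushAndClear(); // last partial batch
        }
        return inserted;
    }

    public static void main(String[] args) {
        final int[] flushes = {0};
        Sink<String> fake = new Sink<String>() {
            public void create(String entity) { /* no-op for the demo */ }
            public void flushAndClear() { flushes[0]++; }
        };
        int n = insertAll(Arrays.asList("a", "b", "c", "d", "e").iterator(), fake, 2);
        System.out.println(n + " rows, " + flushes[0] + " flushes"); // prints "5 rows, 3 flushes"
    }
}
```

Tracking `pending` separately means the final flush only happens when there is a partial batch left, so no session state needs to be queried to decide.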