
Thread: fastest way to batch insert w/ hibernate

  1. #1

    Fastest way to batch insert w/ Hibernate

    Hi,

    I have a situation where I need to create (persist) a List of, say, 10,000 objects. What's the best way to do this? Does Spring have something for it?

    From the Hibernate reference documentation, 13.1. Batch inserts:

    When making new objects persistent, you must flush() and then clear() the session regularly, to control the size of the first-level cache.

    Code:
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();

    for ( int i=0; i<100000; i++ ) {
        Customer customer = new Customer(.....);
        session.save(customer);
        if ( i % 20 == 0 ) { //20, same as the JDBC batch size
            //flush a batch of inserts and release memory:
            session.flush();
            session.clear();
        }
    }

    tx.commit();
    session.close();

    Now I'm wondering whether Spring does the flushing and clearing for me. Notice the tx.commit(): I surely don't need that with Spring. I'm just wondering if I'm wasting my time calling flush & clear, and what Spring can help me with, if anything, for batch processing.

    thx
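The loop quoted from the reference docs can be sketched without Hibernate on the classpath. In this minimal sketch, `BatchSink` is a hypothetical interface standing in for the three `Session` methods the loop uses, and `RecordingSink` is a hypothetical test double. It counts saves with `count % batchSize == 0`, so the first flush happens after a full batch rather than after the very first save (the docs' `i % 20 == 0` also fires at `i = 0`), and it flushes once more for a partial final batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertSketch {

    /** Hypothetical stand-in for the Session methods used by the batch loop. */
    interface BatchSink<T> {
        void save(T entity);
        void flush();
        void clear();
    }

    /** Saves every entity, flushing and clearing after each full batch and once more for a partial last batch. */
    static <T> int saveInBatches(List<T> entities, int batchSize, BatchSink<T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        for (T entity : entities) {
            sink.save(entity);
            count++;
            // Fires after batchSize, 2 * batchSize, ... saves (never after the first one).
            if (count % batchSize == 0) {
                sink.flush();
                sink.clear();
            }
        }
        // Anything saved since the last full batch is still pending.
        if (count % batchSize != 0) {
            sink.flush();
            sink.clear();
        }
        return count;
    }

    /** Hypothetical test double: records after how many saves each flush happened. */
    static class RecordingSink<T> implements BatchSink<T> {
        final List<Integer> flushPoints = new ArrayList<>();
        int saved;
        int clears;

        public void save(T entity) { saved++; }
        public void flush() { flushPoints.add(saved); }
        public void clear() { clears++; }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            ids.add(i);
        }
        RecordingSink<Integer> sink = new RecordingSink<>();
        int saved = saveInBatches(ids, 20, sink);
        System.out.println(saved + " saved, flushed at " + sink.flushPoints); // 45 saved, flushed at [20, 40, 45]
    }
}
```

With a real Session the same loop would call `session.save`, `session.flush()` and `session.clear()`; whether batching actually pays off still depends on the JDBC batch size setting and is best measured.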

  2. #2
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424


    I think the best way of answering this is simply to measure the performance. Flushing too often can cause just as many problems as not flushing often enough.

    HibernateTemplate.saveOrUpdateAll(..) lets you insert lots of items at once. It doesn't perform any flushing, though. If you want to flush, I would recommend HibernateTemplate.execute(HibernateCallback); you can then flush and clear as you want. As for the tx.commit(), if you are using Spring's TransactionTemplate or declarative transactions, you don't need to worry about it.
    http://www.springframework.org/docs/....C ollection)
    http://www.springframework.org/docs/...rnateCallback)
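The division of labour in the callback approach can be sketched as follows. `SessionLike`, `SessionCallback` and `SimpleTemplate` are hypothetical names, modelled only loosely on `HibernateTemplate.execute(HibernateCallback)`: the template owns opening and closing the session, and the callback decides when to flush and clear. This is not Spring's implementation, which also binds the Session to the current transaction and translates exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallbackSketch {

    /** Hypothetical minimal session: a first-level cache plus a count of inserts not yet sent. */
    static class SessionLike {
        final List<Object> managed = new ArrayList<>();
        int pendingInserts;
        int written;
        int flushes;
        boolean closed;

        void save(Object o) { managed.add(o); pendingInserts++; }
        void flush() { written += pendingInserts; pendingInserts = 0; flushes++; }
        void clear() { managed.clear(); pendingInserts = 0; } // drops anything not yet flushed
        void close() { closed = true; }
    }

    /** Hypothetical counterpart of HibernateCallback: work to run with an open session. */
    interface SessionCallback<T> {
        T doInSession(SessionLike session);
    }

    /** Hypothetical template: opens a session, runs the callback, always closes the session. */
    static class SimpleTemplate {
        SessionLike lastSession; // kept only so the example can inspect it afterwards

        <T> T execute(SessionCallback<T> callback) {
            SessionLike session = new SessionLike();
            lastSession = session;
            try {
                return callback.doInSession(session);
            } finally {
                session.close();
            }
        }
    }

    public static void main(String[] args) {
        SimpleTemplate template = new SimpleTemplate();
        int batchSize = 2;
        Integer saved = template.execute(session -> {
            int count = 0;
            for (String name : Arrays.asList("a", "b", "c", "d", "e")) {
                session.save(name);
                if (++count % batchSize == 0) { // the callback, not the template, decides when to flush
                    session.flush();
                    session.clear();
                }
            }
            session.flush(); // send the partial last batch
            session.clear();
            return count;
        });
        SessionLike s = template.lastSession;
        System.out.println(saved + " saved, " + s.written + " written, " + s.flushes + " flushes, closed=" + s.closed);
    }
}
```

The template guarantees cleanup through `finally`, so the callback body only contains the batching logic.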

  3. #3


    Yeah, I probably should have mentioned that I am using OSIV (Open Session in View) and all the Spring transaction management etc. This is very nice stuff! It would be really cool if I could figure out how to work without OSIV and HibernateCallback, but nothing I tried worked. Maybe I'll try again next week; if I can avoid OSIV it will make my testing context a little simpler.

    I'm just wondering whether my explicit call to

    Code:
    private void flushAndClear()  {
        if (getSession().isDirty())  {
             getSession().flush();
             getSession().clear();
        }
    }
    is actually doing anything. I would think that, because Spring's Hibernate infrastructure is so good, I would just be getting in its way by doing lower-level stuff like that myself.

    I guess I'll try a few experiments with a large dataset; we'll see what happens and I'll post back.

  4. #4
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424


    It's very easy not to use OSIV; I'm not using it in my current project. If you post a thread for your OSIV issue, I'm sure someone will help. Your call to flush will be doing something, though personally I can't remember the last time I had to call flush. The Spring transaction will do this for you on commit. Large datasets, however, might be one case where you want to use it.
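A toy model of that point, under stated assumptions: `ModelSession` and `run` are hypothetical and only count objects, and `commit()` flushes pending inserts the way a Spring-managed transaction flushes the Session before committing. `clear()` also discards unflushed inserts, mirroring Hibernate's `Session.clear()`, which is why `flush()` comes first. Both runs write every row; the difference is how many objects the session holds at its peak.

```java
public class CommitFlushSketch {

    /** Hypothetical session that only counts cached objects and written rows. */
    static class ModelSession {
        int managed;      // objects currently held in the first-level cache
        int pending;      // inserts not yet sent to the database
        int written;      // rows sent to the database
        int peakManaged;  // largest cache size seen during the run

        void save() {
            managed++;
            pending++;
            peakManaged = Math.max(peakManaged, managed);
        }

        void flush() { written += pending; pending = 0; }

        /** Evicts everything; unflushed inserts are discarded, so flush first. */
        void clear() { managed = 0; pending = 0; }

        /** Like a Spring-managed commit: flushes whatever is still pending. */
        void commit() { flush(); }
    }

    /** Saves n objects in one transaction, flushing and clearing every batchSize saves (0 = never). */
    static ModelSession run(int n, int batchSize) {
        ModelSession s = new ModelSession();
        for (int i = 1; i <= n; i++) {
            s.save();
            if (batchSize > 0 && i % batchSize == 0) {
                s.flush();
                s.clear();
            }
        }
        s.commit();
        return s;
    }

    public static void main(String[] args) {
        ModelSession noFlush = run(10_000, 0);
        ModelSession batched = run(10_000, 20);
        System.out.println("no explicit flush: written=" + noFlush.written + ", peak=" + noFlush.peakManaged);
        System.out.println("flush/clear every 20: written=" + batched.written + ", peak=" + batched.peakManaged);
    }
}
```

In this model the explicit flush changes memory use, not correctness; whether that translates into speed on a real database is something only a measurement can show.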

  5. #5


    OK, well, I ran some experiments here. Unless I am doing something wrong (very possible), it looks like the answer is to 1) not mess with the batch size at all, 2) insert objects one by one, and 3) not flush & clear.

    I'm quite surprised, but the results speak for themselves (3,700 rows with about 80 columns, read from a CSV file). If anyone needs more details on the experiment I'm happy to provide them, but I'm 90% sure these are good results:

    Code:
    17:05:12,968  INFO Experiment:25 - create individually no batch size - 17.848 seconds.
    
    17:05:31,273  INFO Experiment:32 -batch create with 8 51.575 seconds.
    17:06:23,084  INFO Experiment:32 - batch create with 16  64.398 seconds.
    17:07:27,688  INFO Experiment:32 - batch create with 32 70.037 seconds.
    17:08:37,930  INFO Experiment:32 - batch create with 64  146.881 seconds.
    17:11:05,063  INFO Experiment:32 - batch create with 128 107.341 seconds.
    17:12:52,577  INFO Experiment:32 - batch create with 256 181.478 seconds.
    Does anyone else find it odd that it is actually slower to do what Hibernate recommends, as in my first post? Maybe I'm doing something wrong, or maybe Spring does a great job by itself? I'm pretty sure the batch size is getting set correctly. Maybe it's because I'm iterating twice to do the batch create (once to read the CSV into a list, once to save the list)? That is wasteful, but it still wouldn't explain why a batch size of 8 or 32 is so much worse; at most it should take about 2x as long as the individual create. Any ideas?

    Here are the relevant portions of the code:

    Code:
    public int createIndividually()  {
        while (csv.readRecord())  {
            MyObject myObject = readRowFromCsvFile(csv);
            if (myObject != null)  {
                getDAO().create(myObject);
                ++newRows;
            }
        }
        return newRows;
    }


    Code:
    public int batchCreate()  {
        List<MyObject> myObjectList = new ArrayList<MyObject>();
        while (csv.readRecord())  {
            MyObject myObject = readRowFromCsvFile(csv);
            if (myObject != null)  {
                myObjectList.add(myObject);
                ++newRows;
            }
        }
        return getDAO().batchCreate(myObjectList);
    }

    Code:
    public int batchCreate(final List<Entity> entityList)  { // in the DAO
        int insertedCount = 0; // int, not Long: a Long cannot be returned from a method declared to return int
        for (int i = 0; i < entityList.size(); ++i) {
            create(entityList.get(i));
            if (++insertedCount % batchSize == 0) {
                flushAndClear();
            }
        }
        flushAndClear(); // flush any final partial batch
        return insertedCount;
    }

    protected void flushAndClear()  {
        if (getSession().isDirty()) {
            getSession().flush();
            getSession().clear();
        }
    }
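One way to check the "at most 2x" intuition is to count how many flush/clear cycles each run performs. This hypothetical helper assumes the DAO loop above, where one cycle happens per full batch plus the trailing `flushAndClear()` when a partial batch is left, and assumes `isDirty()` reports that partial batch as dirty.

```java
public class FlushCycles {

    /** Flush/clear cycles for the DAO loop: one per full batch, plus one if a partial batch remains. */
    static int flushCycles(int rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int fullBatches = rows / batchSize;
        return rows % batchSize == 0 ? fullBatches : fullBatches + 1;
    }

    public static void main(String[] args) {
        int rows = 3700; // row count reported in the experiment above
        for (int batchSize : new int[] {8, 16, 32, 64, 128, 256}) {
            System.out.println("batch size " + batchSize + ": " + flushCycles(rows, batchSize) + " flush/clear cycles");
        }
    }
}
```

The count falls from 463 cycles at a batch size of 8 to 15 at 256, while the measured times mostly rise, so the number of flushes alone does not seem to explain the slowdown.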

  6. #6
    Join Date
    Sep 2006
    Location
    UK
    Posts
    8,424


    I'm not surprised that the performance wasn't as clear-cut as assumed; these things tend to work like that. It would be good to build the list first and then run the tests, since there's not much point unless the comparison is fair. Another question is where the transaction boundary is here. I tend to stay away from flushing the Session myself and just let Hibernate manage it.
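A fair comparison along those lines can be sketched as a small timing harness: the input list is built once, outside the timed region, each variant gets warm-up runs, and the median of several runs is reported. `FairTiming`, `medianMillis` and the insert step in `main` are hypothetical; in a real database test the insert step would call the DAO inside a transaction, and each run should start from the same table state (for example by rolling back or truncating between runs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FairTiming {

    /** Median of the given durations (mean of the two middle values for an even count). */
    static double median(long[] nanos) {
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Times only the insert step; building the input happens before this method is called. */
    static <T> double medianMillis(List<T> input, Consumer<List<T>> insertStep, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            insertStep.accept(input); // warm-up: JIT, connection pool, caches
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            insertStep.accept(input);
            samples[i] = System.nanoTime() - start;
        }
        return median(samples) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the parsed CSV rows, built once before any timing.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3700; i++) {
            rows.add("row-" + i);
        }
        // Hypothetical insert step; replace with the individual or batched DAO call being compared.
        double ms = medianMillis(rows, list -> list.forEach(String::length), 3, 5);
        System.out.println("median: " + ms + " ms");
    }
}
```

Running every variant through the same harness keeps the CSV parsing and list building out of the numbers being compared.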

  7. #7
    Join Date
    Jan 2007
    Posts
    139


    Quote Originally Posted by lloyd.mcclendon
    OK, well, I ran some experiments here. Unless I am doing something wrong (very possible), it looks like the answer is to 1) not mess with the batch size at all, 2) insert objects one by one, and 3) not flush & clear.

    I am doing a similar thing with batch processing. I add elements to a Collection in a Swing client, then pass this collection via HttpInvoker to a service-layer bean, which reads the elements, saves each one, checks the designated batch size and, if it is reached, commits the transaction.

    This works fine if one instance of the process is being executed. However, when it runs concurrently with one or more of the same processes, I see the dreaded "multiple sessions attempted to access collection" exception. I have even tried changing the scope of the service bean to "prototype", thinking that each getBean() call would deliver a unique bean from the Spring container, and hence a separate collection, but no luck. I have also tried various combinations of AOP/interceptor-controlled transaction demarcation as well as programmatically demarcated code (not HibernateTemplate), using SessionFactory.openSession() for the programmatic style and SessionFactory.getCurrentSession() for AOP, and I get the same exception.

    Is there a way to successfully get concurrent batch inserts working when the process is initiated from the Client and not on the Server?

  8. #8
    Join Date
    Sep 2004
    Posts
    1,086

    The "multiple sessions attempted to access collection" exception means you have an object A referencing a collection C and an object B referencing the same collection C. You then try to bind object A to session1 and object B to session2. To which session should Hibernate bind collection C?

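To make that explanation concrete, here is a toy model, not Hibernate's code: `TrackedCollection` and `ToySession` are hypothetical classes that enforce only the rule described above, namely that one collection instance can be associated with at most one open session at a time.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionOwnershipSketch {

    /** Hypothetical session identified only by name. */
    static final class ToySession {
        final String name;

        ToySession(String name) {
            this.name = name;
        }
    }

    /** Toy model of a persistent collection that may belong to at most one open session at a time. */
    static final class TrackedCollection<E> {
        final List<E> elements = new ArrayList<>();
        private ToySession owner;

        void attachTo(ToySession session) {
            if (owner != null && owner != session) {
                throw new IllegalStateException(
                        "collection already attached to " + owner.name + ", cannot attach to " + session.name);
            }
            owner = session;
        }

        void detachFrom(ToySession session) {
            if (owner == session) {
                owner = null; // e.g. the owning session was closed
            }
        }
    }

    public static void main(String[] args) {
        TrackedCollection<String> shared = new TrackedCollection<>(); // collection C
        ToySession session1 = new ToySession("session1");
        ToySession session2 = new ToySession("session2");

        shared.attachTo(session1);     // object A saved in session1 brings C along
        try {
            shared.attachTo(session2); // object B saved in session2 references the same C
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        shared.detachFrom(session1);
        shared.attachTo(session2);     // allowed once session1 has let go
        System.out.println("attached to session2 after session1 released it");
    }
}
```

The second attach is rejected while session1 still owns the collection, which corresponds to the situation in the question.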
  9. #9
    Join Date
    Jan 2007
    Posts
    139


    Quote Originally Posted by dejanp
    The "multiple sessions attempted to access collection" exception means you have an object A referencing a collection C and an object B referencing the same collection C. You then try to bind object A to session1 and object B to session2. To which session should Hibernate bind collection C?
    I understand that part. However, I should be able to get this to work by setting the scope of the bean that handles the collection to prototype. For example, here is the snippet from my Spring application context, which runs within my distributed web app:
    Code:
    <bean id="importXTFService" class="com.xrite.ind.backcheck.service.imports.ImportXTFServiceImpl" scope="prototype">
        <property name="sessionFactory" ref="backcheckSessionFactory" />
    </bean>
    According to the Spring docs . . .
    creation of a new bean instance every time a request for that specific bean is made (that is, it is injected into another bean or it is requested via a programmatic getBean() method call on the container)
    This would imply that I should have a unique instance of my ImportXTFServiceImpl bean every time my client request the operation. However, the client requests the operation by way of Spring Remoting/HTTP Invoker, so I am wondering if the proxy has something to do w/the reason why I am not getting a unique instance of my bean for each call.

  10. #10
    Join Date
    Sep 2004
    Posts
    1,086


    Nope, that will not work. Hibernate doesn't care about instances of your service class, but instances of hibernate persistent classes referenced in multiple hibernate sessions.
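A toy sketch of why `scope="prototype"` does not help: `ImportService`, `OWNERS` and `bothImportsSucceed` are hypothetical, and the map only models the rule that one collection instance can belong to one open session. Creating a new service instance per call changes nothing when both calls receive the same collection object; what the model suggests matters is whether each session gets its own persistent objects and collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PrototypeScopeSketch {

    /** Toy registry of which session owns each collection instance. IdentityHashMap compares keys
        by reference, so two equal but distinct lists count as different collections. */
    static final Map<Object, String> OWNERS = new IdentityHashMap<>();

    /** Hypothetical import service; constructing a new instance per call mimics scope="prototype". */
    static final class ImportService {
        private final String sessionName;

        ImportService(String sessionName) {
            this.sessionName = sessionName;
        }

        /** Associates the incoming collection with this service's session, as saving its owner would. */
        void importAll(List<String> items) {
            String owner = OWNERS.get(items);
            if (owner != null && !owner.equals(sessionName)) {
                throw new IllegalStateException("collection already associated with " + owner);
            }
            OWNERS.put(items, sessionName);
        }
    }

    /** Two imports, each through a brand-new service with its own session, while session1 stays open. */
    static boolean bothImportsSucceed(List<String> first, List<String> second) {
        OWNERS.clear(); // start each scenario with no open sessions
        try {
            new ImportService("session1").importAll(first);
            new ImportService("session2").importAll(second);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>(Arrays.asList("a", "b"));
        // Two fresh service instances, but one collection instance: still rejected.
        System.out.println("same list instance: " + bothImportsSucceed(shared, shared));
        // Each call works on its own collection instance: no conflict.
        System.out.println("separate instances: " + bothImportsSucceed(new ArrayList<>(shared), new ArrayList<>(shared)));
    }
}
```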
