Good news. Dave, thank you for all of your help.
I was able to get my spring-hadoop project running with limited Hibernate support. Removing Hibernate from my dependencies entirely was not possible, so I had to find something that worked.
This went at the top of my Mapper:
Code:
//the following session magic is needed to ensure the whole map phase happens within one transaction
//which ensures that objects stay bound to their hibernate proxy objects
Session session = SessionFactoryUtils.getSession(sessionFactory, true);
TransactionSynchronizationManager.bindResource(sessionFactory, new SessionHolder(session));
And this went at the bottom of my Mapper:
Code:
TransactionSynchronizationManager.unbindResource(sessionFactory);
session.close(); //otherwise, sessions go to sleep and hit max connections very quickly
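One caveat with this pattern: if the map code throws, the unbind and close calls at the bottom are skipped and the session stays open. Wrapping the work in try/finally avoids that. The following is only a minimal sketch of that shape using plain Java stand-ins. FakeSession, ThreadBoundResources, Work, and runWithBoundSession are all hypothetical names made up for illustration, not Spring or Hibernate classes. In the real Mapper, the bind step would be the SessionFactoryUtils.getSession and TransactionSynchronizationManager.bindResource calls shown above, and the finally block would hold the unbindResource and session.close() calls.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: every type here is hypothetical and only mimics the
// bind / work / unbind / close shape of the Mapper code in the post.
final class SessionBindingSketch {

    /** Hypothetical stand-in for a Hibernate Session. */
    static final class FakeSession implements AutoCloseable {
        private boolean open = true;

        boolean isOpen() { return open; }

        @Override
        public void close() { open = false; }
    }

    /** Hypothetical stand-in for per-thread resource bookkeeping. */
    static final class ThreadBoundResources {
        private static final ThreadLocal<Map<Object, Object>> RESOURCES =
                ThreadLocal.withInitial(HashMap::new);

        static void bind(Object key, Object value) {
            if (RESOURCES.get().putIfAbsent(key, value) != null) {
                throw new IllegalStateException("Already bound: " + key);
            }
        }

        static Object unbind(Object key) {
            Object value = RESOURCES.get().remove(key);
            if (value == null) {
                throw new IllegalStateException("Not bound: " + key);
            }
            return value;
        }

        static boolean isBound(Object key) {
            return RESOURCES.get().containsKey(key);
        }
    }

    /** The unit of work that needs the session bound to the thread. */
    interface Work {
        void run(FakeSession session) throws Exception;
    }

    /** Binds a session, runs the work, then always unbinds and closes. */
    static FakeSession runWithBoundSession(Object factoryKey, Work work) throws Exception {
        FakeSession session = new FakeSession();
        ThreadBoundResources.bind(factoryKey, session);
        try {
            work.run(session);
        } finally {
            // Runs on success and on failure, so the session never leaks.
            ThreadBoundResources.unbind(factoryKey);
            session.close();
        }
        return session;
    }

    public static void main(String[] args) throws Exception {
        Object factoryKey = new Object();
        FakeSession s = runWithBoundSession(factoryKey, session ->
                System.out.println("bound during work: "
                        + ThreadBoundResources.isBound(factoryKey)));
        System.out.println("bound after work: " + ThreadBoundResources.isBound(factoryKey));
        System.out.println("session open after work: " + s.isOpen());
    }
}
```

The design point is simply that cleanup lives in finally, so a failed record cannot leave a session holding a connection.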
The performance boost was significant: the Spring Batch implementation of the same solution took 10.5 hours to process 93,000 records, while the spring-hadoop implementation completed the same 93,000 records in 3.25 hours (roughly 3.2 times faster) running in a single-node configuration. I haven't tried distributing the task yet.
This is an odd combination of technologies, but I hope this bit helps somebody out there. I hope to write some how-tos and/or example code shortly to submit to Spring Hadoop. Great project, guys. I look forward to seeing it grow.
Rajat Banerjee