large transaction handling strategies?
Heya,
One of the common problems I always run into when running really large batch load processes (millions of update) is how to handle large transaction sizes. More often then not, I have to add two extra columns to my data and perform lots of mini transactions (new transactions) followed by one large "update" which makes all the new records live, and/or deletes the records being replaced. The first column added ties a record to a particular update job and the second column indicates if that record is live or in the process of being added (this usually coincides with a database view over the data for the production instance which only shows "live" records). This is a particularly tricky operation, and performing cleanup if the batch process fails is also now not as easy as relying on a normal transaction to take on the processing. What I'd like to see is something to make this easier... any ideas?
I guess one option would be to generalise some of the code I wrote to do this and allow people to plug in their DAO of choice to actually store the records. The project I was working on previously was an open source biodiversity portal so there shouldn't be any issues as far as code sharing - just time to generalise the code into an acceptable solution.
Cheers,
Dave