May 21st, 2012, 07:59 PM
Partitioner feeding a list to a JdbcCursorItemReader
I'm trying to configure a job that uses a custom partitioner to parallelize processing of the resulting items. I'm having some trouble using the partitioner results correctly with a JdbcCursorItemReader as part of the partition step. Let me try to explain with an example. Given a data model where each customer has many accounts and each account has many transactions, I need to write a report for a given customer that contains all the transactions for all of that customer's accounts.
Ideally, I would like this partitioner to take a job parameter (for example, a customer ID) and split that customer's list of accounts into multiple buckets using the grid size parameter. Each bucket of accounts is then placed on a step execution context, which is used by the partition step. This step uses a standard JdbcCursorItemReader. The problem is that I would like to define a SQL string with a single parameter placeholder in the IN clause that can take the list of accounts. Unfortunately, the JdbcCursorItemReader is not able to take a list of values and set them on a single placeholder. Right now I've coded my partitioner to simply ignore the grid size and always put one account per partition, but I don't think this approach will scale well: if a customer has hundreds of accounts, the job creates hundreds of partitions when it starts.
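To make this more concrete, here is a rough sketch of the bucketing I have in mind, written as plain Java without any Spring Batch classes. The class name AccountBuckets, the methods split and buildSql, and the transactions table with its account_id column are all made up for illustration; they are not from my real code. The second method shows one possible workaround I am considering, which is to generate one "?" per account in the bucket rather than binding a list to a single placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AccountBuckets {

    /**
     * Splits the account IDs into at most gridSize buckets of near-equal size.
     * Fewer buckets are returned when there are fewer accounts than gridSize.
     */
    static List<List<Long>> split(List<Long> accountIds, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        List<List<Long>> buckets = new ArrayList<>();
        int total = accountIds.size();
        int bucketCount = Math.min(gridSize, total);
        int start = 0;
        for (int i = 0; i < bucketCount; i++) {
            // The first (total % bucketCount) buckets each take one extra item.
            int size = total / bucketCount + (i < total % bucketCount ? 1 : 0);
            buckets.add(new ArrayList<>(accountIds.subList(start, start + size)));
            start += size;
        }
        return buckets;
    }

    /**
     * Builds a query with one "?" per account ID in the bucket.
     * Table and column names are hypothetical.
     */
    static String buildSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("idCount must be >= 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM transactions WHERE account_id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<Long> accountIds = Arrays.asList(101L, 102L, 103L, 104L, 105L, 106L, 107L);
        for (List<Long> bucket : split(accountIds, 3)) {
            System.out.println(bucket + " -> " + buildSql(bucket.size()));
        }
    }
}
```

In the real partitioner, I assume each bucket would go into its own ExecutionContext in the map returned by Partitioner.partition(int gridSize). What I have not worked out is how the partitioned step would then hand the generated SQL and the matching values to the JdbcCursorItemReader, presumably through its sql property and a PreparedStatementSetter.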
It seems to me it is a very common case that a partitioner would split a list of things into several sublists, based on grid size, and then those sublists would need to be processed. Can someone please help me with this problem, or perhaps recommend a different design if you think there is a better way? Thank you!
May 29th, 2012, 06:00 PM
Is anyone able to help, please?