I'm writing an importer for bog-standard CSVs where the order the attributes come in is user-defined in the header (a comma-separated list of attribute names). I'm new to Spring Batch (particularly 2.0), but I thought this would be a relatively simple thing to arrange. I've now got it to work (I think), but since I had to go about it in such an arse-backwards way, I thought I would post my findings here for three reasons:
1) Someone might be able to explain a way of doing it better and faster.
2) Someone might be able to spot some important step I missed (I'm still not sure my job will recover from interruption correctly).
3) This might be a useful record for anyone else trying to solve the same (very common I would have thought) problem.
Let's start with the FlatFileItemReader, mainly because that was where I started and originally thought I would very quickly finish up.
Code:
<bean id="productFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:dataload/products.csv" />
    <property name="lineMapper" ref="lineMapper" />
    <property name="linesToSkip" value="1" />
    <property name="skippedLinesCallback" ref="csvHeaderCallbackHandler" />
</bean>

The last two attributes seem to be new to Spring Batch 2.0. Back when I was doing the same thing in Spring Batch 1.1 I only had a "firstLineIsHeader" attribute instead. The new flexibility is cooler, but one thing the old style did "for free" was automatically assume that the first line was full of attribute names and populate my tokenizer with them. The new FlatFileItemReader doesn't seem to do that, so I had to create a LineCallbackHandler to try to achieve the same thing myself.
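For anyone who hasn't met those two properties before, here is a minimal plain-Java sketch of the behaviour as I understand it: the first `linesToSkip` lines are not returned as items but are handed to a callback. This is only an illustration, not Spring Batch's implementation; `SkippedLinesSketch`, `SkippedLineCallback` and `readItems` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Library-free sketch; every name here is hypothetical, not Spring Batch API. */
class SkippedLinesSketch {

    /** Stand-in for a line callback: receives each skipped line. */
    interface SkippedLineCallback {
        void handleLine(String line);
    }

    /** Returns the data lines; the first linesToSkip lines go to the callback instead. */
    static List<String> readItems(List<String> lines, int linesToSkip, SkippedLineCallback callback) {
        List<String> items = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            if (lineNumber++ < linesToSkip) {
                callback.handleLine(line); // header line(s) are passed to the callback
            } else {
                items.add(line);           // everything else is treated as data
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("sku,name,price", "A1,Widget,9.99", "B2,Gadget,19.50");
        List<String> skipped = new ArrayList<>();
        List<String> items = readItems(file, 1, skipped::add);
        System.out.println("skipped = " + skipped);
        System.out.println("items   = " + items);
    }
}
```

The point is that the callback is the only place the header is ever seen, so whatever needs it later has to get it from there.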
Code:
<bean id="csvHeaderCallbackHandler" class="com.javelingroup.dataload.CSVHeaderCallbackHandler">
</bean>

Code:
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.LineCallbackHandler;

public class CSVHeaderCallbackHandler implements LineCallbackHandler {

    ExecutionContext jobExecutionContext;

    // Grab the job-level ExecutionContext before the step starts.
    @BeforeStep
    public void setJobExecutionContext(final StepExecution stepExecution) {
        final JobExecution jobExecution = stepExecution.getJobExecution();
        jobExecutionContext = jobExecution.getExecutionContext();
    }

    // Called by the reader for each skipped line (here, just the header).
    @Override
    public void handleLine(final String line) {
        jobExecutionContext.put("header", line);
    }
}

As you can see, handling the line is easy, but finding somewhere to save it for later reference is a major drag. After digging around in the forums (e.g. here and here) it began to look like I should get a handle on the JobContext object and bung the line in there. This would have the added bonus (in theory) of saving the header if my job was interrupted. However, getting hold of the JobContext is a bit of a drag. As well as defining my @BeforeStep, I had to register the listener with my job.
Code:
<batch:job id="ImportProductsJob" parent="simpleJob">
    <batch:step id="step1" parent="simpleStep">
        <batch:tasklet>
            <batch:chunk reader="productFileItemReader" writer="itemWriter" commit-interval="10">
                <batch:listeners>
                    <batch:listener ref="jobParamsProvider" />
                    <batch:listener ref="csvHeaderCallbackHandler" />
                    <batch:listener ref="csvHeaderBasedLineTokenizer" />
                </batch:listeners>
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
</batch:job>

In that job, I also had to register a listener for the bean that needs to get the header line back out of the JobContext in order to interpret individual lines (csvHeaderBasedLineTokenizer). Making that happen took a whole lot more config and code:
Code:
<bean id="csvHeaderBasedLineTokenizer" class="com.javelingroup.dataload.CSVHeaderBasedLineTokenizer">
</bean>

Code:
<bean id="lineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer" ref="csvHeaderBasedLineTokenizer" />
    <property name="fieldSetMapper">
        <bean class="com.javelingroup.dataload.product.ProductMapper" />
    </property>
</bean>

Code:
import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Logger;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;

public class CSVHeaderBasedLineTokenizer extends DelimitedLineTokenizer {

    private static Logger log = Logger.getLogger(CSVHeaderBasedLineTokenizer.class);

    ExecutionContext jobExecutionContext;

    @Override
    public FieldSet tokenize(final String line) {
        // On the first data line, tokenize the saved header and use its values as field names.
        if (!hasNames()) {
            setNames(super.tokenize((String) jobExecutionContext.get("header")).getValues());
            log.info("Token names successfully picked up from header [ "
                    + StringUtils.join(names, ", ") + "]");
        }
        return super.tokenize(line);
    }

    // Grab the job-level ExecutionContext before the step starts.
    @BeforeStep
    public void setJobExecutionContext(final StepExecution stepExecution) {
        final JobExecution jobExecution = stepExecution.getJobExecution();
        jobExecutionContext = jobExecution.getExecutionContext();
    }
}

With all that plugged together and wired in, the whole thing works, but REALLY!!! What a drag! Is the architecture really this cumbersome? I like having the header string saved in state, but I'd LOVE to find a cleaner way of doing it.
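For reference, stripped of all the Spring wiring, the core of what the tokenizer does is small. Below is a rough library-free sketch of it, assuming a naive comma split with no support for quoted fields; `HeaderTokenizerSketch` and its methods are made-up names for illustration, not Spring Batch classes. The field names are picked up lazily from the header the first time a line is tokenized, so the column order can be whatever the user put in the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Library-free sketch; every name here is hypothetical, not Spring Batch API. */
class HeaderTokenizerSketch {

    private final String header; // the saved header line
    private String[] names;      // filled lazily from the header on first use

    HeaderTokenizerSketch(String header) {
        this.header = header;
    }

    /** Maps each value in the line to the field name at the same position in the header. */
    Map<String, String> tokenize(String line) {
        if (names == null) {
            names = split(header); // first call: take field names from the header
        }
        String[] values = split(line);
        if (values.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " fields but found " + values.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], values[i]);
        }
        return fields;
    }

    /** Naive split on commas; assumes no quoted or escaped fields. */
    private static String[] split(String line) {
        String[] parts = line.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        // Two files with the same data but different user-defined column orders.
        Map<String, String> a = new HeaderTokenizerSketch("sku,name,price").tokenize("A1,Widget,9.99");
        Map<String, String> b = new HeaderTokenizerSketch("price,sku,name").tokenize("9.99,A1,Widget");
        System.out.println(a.get("price") + " / " + b.get("price"));
    }
}
```

Everything else in my setup (the listeners, the JobContext) is just plumbing to get the header string from the reader to this one method.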


