Results 1 to 10 of 12

Thread: Loading a file into the database in 1.0.?

Hybrid View

  1. #1

    Question Loading a file into the database in 1.0.?

    Hi,
    We have a huge file which needs to be uploaded into a table. We had done this using a customized ItemProviderProcessTasklet in m3. I don't know how to migrate this to current release 1.0.0.

    In m3 we just ask the user to configure the file path, linefilter (to skip the lines not just header and footer), table name, isBatchUpload, batchUploadSize..etc.

    Based on the configuration above, we used to create the ItemProvider and ItemProcessor in the afterProertiesSet() method. If isBatchUpload is true, then we write the contents into a temporary staging file in the table-needed format before uploading into the mysql table using the command "LOAD DATA LOCAL INFILE c:\equity.g INTO TABLE equity". Also if the item process count equals to batchUploadSize, we upload the data into the table and will reset the ItemProcessor in execute() to write records into the stagingfile from [batchUploadSize+1]. This would continue till all the items read from the itemProvider.

    This was the code in m3..

    public class FileLoaderTasklet extends ItemProviderProcessTasklet {
    ...
    ...
    @Override
    public ExitStatus execute() throws Exception {
    if(count++ == 0) {
    if(truncateTable) {
    jdbcTemplate.update("delete from " + tableName);
    }
    }

    ExitStatus status = ExitStatus.CONTINUABLE;

    // Handle the exceptions from ItemProvider gracefully
    try {

    status = super.execute();
    }
    /* This catch block is to catch and rethrow the exception while parsing an xml file of size ZERO
    *
    * */
    catch(DataAccessResourceFailureException eofException){
    if(eofException.getRootCause() instanceof WstxEOFException)
    throw eofException;
    }
    catch(Exception e){
    logger.error("Error in parsing/loading: "+e);
    handleProviderException(e);
    }

    // Finalize
    if(!status.isContinuable()) {
    if(useBatchUpload) {
    loadFileIntoDB(tempStagingFile, tableName);
    }
    count = 0;
    }
    else {
    if(useBatchUpload && count%batchUploadSize == 0){
    jdbcTemplate.update("LOAD DATA LOCAL INFILE c:\equity.g INTO TABLE equity");
    ItemWriterItemProcessor itemProcessor = new ItemWriterItemProcessor();
    FlatFileItemWriter writer = new FlatFileItemWriter();
    tempStagingFile = File.createTempFile("BFL", "data");
    writer.setResource(new FileSystemResource(tempStagingFile));
    String[] columns = BatchJdbcUtils.getColumnNames(jdbcTemplate, tableName);
    writer.setConverter(new FieldSetToDBStringConvertor(columns));
    itemProcessor.setItemWriter(writer);
    // Reset the Item processor
    setItemProcessor(itemProcessor);
    }
    }
    return status;
    }
    ...
    ...
    }

    I am not sure how do i do this using the current ItemOrientedStep approach.

    Any hints?. Please let me know if you need more information.

    thanks in advance,
    ramkris

  2. #2
    Join Date
    Dec 2006
    Posts
    1,061

    Default

    It's actually much more natural to implement this scenario in 1.0. You can simply use a StepExecutionListener and implement your logic in the afterStep method. The following section in the reference documentation should help to explain it:

    http://static.springframework.org/sp...n.html#d0e3618

  3. #3

    Default

    Thanks lucas for your quick reply.

    But implementing a stepExecutionListener will not solve all the issues. Here is my question. Lets say i have 1,00,000 records in the file. I don't want to upload all the records at once. I want to do that in batches. Lets say i want to insert 10,000 records at a time. How do i achieve this using ItemOrientedStep?. If you see the code above, i do that and create a new FlatFileItemWriter for every 10,000 records.


    However I find a new class BatchSqlUpdateItemWriterin 1.0.0 release. Should i use this class instead of the above approach?Also when the doflush() will be called in this case?.

    thanks,
    ramkris

  4. #4
    Join Date
    Dec 2006
    Posts
    1,061

    Default

    If your data is very clean, using the BatchSqlUpdateItemWriter is a good option. However, if you expect the data might cause a lot of write failures you might want to rethink using it. Flush is called just before the transaction is committed (when the commit interval is complete) If you have an interval of say, 10,000, and one record causes say a PK violation, you have absolutely no way of knowing which record caused the error in order to skip it. In that case, the write simply writes each one to the database one at a time in an attempt to figure out which record was bad.

  5. #5

    Question

    OK. Can you explain in detail how does the writer identifies the bad record?

    Also what is the best way to upload a file into the database considering the file might contain bad records?

  6. #6
    Join Date
    Dec 2006
    Posts
    1,061

    Default

    It's not very complex. When you call write on the writer it stores a list of items to be written. When flush is called, just before committing the business transaction, it attempts to write them all out using Jdbc batch mode. This means that if your commit interval is 100, there will be 100 items written out during this flush. If there's an error, there's no way for the framework to know which item caused it, so after rolling back the transaction and starting from the beginning, it will 'flush' after each item, meaning it will just write it out to the database. In this way the framework can know that a specific item caused the failure, and use it's normal skip logic to mark the record as bad. As you can imagine, if your data is fairly dirty, it will have to do a lot of rollbacks to find and skip the bad record, destroying any speed advantage you gained by using the batched update.

    To be honest, I wouldn't use the BatchUpdate writer at first for any job. Instead I would just use a normal DAO and write the record out like normal. Once you get the job working you can start playing around with the skip limit and commit interval, to see how long an average job takes to process. If it's within acceptable limits, there's really no reason to go any farther. If not, and you think the data is reasonably clean, you could try the BatchSqlUpdateItemWriter in conjunction with increasing the commit interval to see if you get any speed boost. You could also try dumping the data into a staging table, which shouldn't have any issues with write failures (keep in mind that read failures won't cause you any issues either way, since they don't cause a rollback) Once it's in a staging table there's lots of things you could try, including partitioning the data and processing with multiple jobs, or many other techniques. It all really depends upon your data and your processing requirements.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •