
Thread: CSV file process - multi-thread; which class to use?

  1. #11
    Join Date
    Dec 2006
    Posts
    1,061

    There have been discussions about this type of approach, but I have yet to hear of an actual implementation. Staging with a database table requires one step to load the staging table and a second step that processes using the staging table as input. The queue approach is different: you would need to write a special tasklet that takes each item returned from the provider and puts it in a queue, and each ItemProcessor would need to get its items from that queue. You would also need some way to throttle the producer (the ItemProvider) so that it can't add data faster than it is consumed and overflow the queue.
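To make the queue idea a bit more concrete, here is a rough sketch in plain Java. It is not Spring Batch code and not a tested design: the class `QueueHandoffSketch`, the method `process`, and the upper-casing "processing" step are all made up for illustration. The throttling comes from using a bounded `ArrayBlockingQueue`, whose `put` blocks the producer while the queue is full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: none of these names come from Spring Batch.
class QueueHandoffSketch {

    // Reads "lines" on the calling thread (the producer) and hands them to
    // workerCount consumer threads through a queue of fixed capacity.
    static List<String> process(List<String> lines, int workerCount, int capacity) {
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(capacity);
        List<String> results = Collections.synchronizedList(new ArrayList<>());

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Optional<String> item = queue.take(); // blocks until an item arrives
                        if (!item.isPresent()) {
                            return; // empty Optional is the end-of-input marker
                        }
                        // Stand-in for the real per-item processing.
                        results.add(item.get().toUpperCase(Locale.ROOT));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        try {
            for (String line : lines) {
                queue.put(Optional.of(line)); // blocks while the queue is full (the throttle)
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(Optional.empty()); // one end-of-input marker per worker
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing off items", e);
        }
        return results;
    }
}
```

Note that the order of the results is not guaranteed once more than one worker is involved, and this sketch ignores restartability and transactions entirely, which is where most of the real difficulty would be.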

    Again, I would only try this approach if you absolutely have to because of load issues. You could easily write an ItemProvider and ItemProcessor that could stay the same regardless of the solution, and try it without any additional threads.

  2. #12

    We implemented our batch using only files (XML files, each file being one record to be processed) and it is working fine. We did this partly because the existing business processes already handled transactions (and it was not easy to handle them from the batch right now), and partly because uploading XML to the database was not straightforward. If the transaction is controlled in the service and not in the batch, it doesn't make any difference either way.

    We created some classes to manipulate files (renaming, moving, and the like) and we manage the state of each file by adding to or changing its name. The (big) problem with this approach is that the process may commit and the batch VM may fail before renaming the file being processed, leaving an inconsistent state (though it is easily recoverable).
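For anyone wondering what state-by-file-name might look like, here is a minimal sketch using `java.nio.file`. It is not the classes mentioned above: `FileStateSketch`, `withState`, `transition`, and the state names `processing` and `done` are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: tracks a record file's state in its name,
// e.g. rec1.xml -> rec1.xml.processing -> rec1.xml.done
class FileStateSketch {

    // Path the file has while it is in the given state.
    static Path withState(Path original, String state) {
        return original.resolveSibling(original.getFileName() + "." + state);
    }

    // Renames the file from one state to the next. Pass null as "from"
    // when the file still has its original name.
    static Path transition(Path original, String from, String to) {
        Path source = (from == null) ? original : withState(original, from);
        Path target = withState(original, to);
        try {
            // ATOMIC_MOVE asks for an all-or-nothing rename; it throws
            // AtomicMoveNotSupportedException if the file system cannot do that.
            return Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An atomic rename only makes each rename all-or-nothing. It does not close the window described above: if the VM dies after the business transaction commits but before the move to `done`, the file is still left in `processing` and has to be reconciled on restart.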

    We are planning on contributing these classes for manipulating files.

    Just wanted to show another scenario where the staging table may not be as useful.

    Lucas, do we have an example with multiple chunks? How would the stepOperations and ItemProvider be configured?

  3. #13
    Join Date
    Dec 2006
    Posts
    1,061

    Quote Originally Posted by sotretus View Post
    because uploading XML to the database was not straightforward.
    I'm assuming this means importing XML files with spring-batch. If so, keep an eye out for some upcoming changes to XMLInputSources that should be committed tomorrow. Hopefully they will make that scenario a little more straightforward and extensible.

    Quote Originally Posted by sotretus View Post
    We created some classes to manipulate files (renaming, moving, and the like) and we manage the state of the file by adding or changing names. The (big) problem with this approach is that the process may commit and the batch VM may fail before renaming the file being processed and hence getting an inconsistent state (it is also easily recoverable).

    We are planning on contributing these classes for manipulating files.
    I would still not recommend manipulating files within your batch processes unless you absolutely have to. An EAI solution that renames, moves, or uploads a file once it is completed would be a better choice. I say this because, in my experience, file moving can cause a lot of issues that needlessly hold up a batch stream, even though it generally has nothing to do with whether or not processing was actually successful.


    Quote Originally Posted by sotretus View Post
    Lucas, do we have an example with multiple chunks? How would the stepOperations and ItemProvider be configured?
    I'm not sure I understand what you mean here. Do you mean an example of kicking off an ItemProvider in multiple threads? If so, we don't have an example yet. It's still strictly theoretical.

  4. #14

    Lucas

    Thanks for all the suggestions; we will make sure to take them into account.
    What I meant by "multiple chunks" is whether we have an example where a chunkOperation RepeatTemplate is executed several times by the stepOperations RepeatTemplate. In more functional terms, this is when you want to commit several records together in a chunk, but have multiple chunks of records to commit.

    For example: I need to transform and update 1000 rows, but I want to process them 100 at a time.
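To illustrate the shape of what is being asked for, here is a plain Java sketch of the two nested loops. It does not use RepeatTemplate or any Spring Batch class. `ChunkLoopSketch`, `processInChunks`, the doubling "transform", and the `committed` list standing in for a transaction commit are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an outer loop that runs once per chunk and an
// inner loop that runs once per row, with one "commit" per chunk.
class ChunkLoopSketch {

    // Transforms every row, committing chunkSize rows at a time.
    // Returns the number of commits performed.
    static int processInChunks(List<Integer> rows, int chunkSize, List<Integer> committed) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int commits = 0;
        // Outer loop: plays the role of the step-level repeat.
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            List<Integer> pending = new ArrayList<>();
            // Inner loop: plays the role of the chunk-level repeat.
            for (Integer row : rows.subList(start, end)) {
                pending.add(row * 2); // stand-in for the real transform
            }
            committed.addAll(pending); // stand-in for committing the transaction
            commits++;
        }
        return commits;
    }
}
```

With 1000 rows and a chunk size of 100, the outer loop runs 10 times, so there are 10 commits of 100 rows each. A failure partway through would lose only the chunk in progress, not the chunks already committed.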
