Thread: Merge multiple input source as one

  1. #1

    Merge multiple input source as one

    Hi,
    Two batch features that I am interested in are Merge Multiple Inputs and Sort Input. I see that both have been identified as Spring Batch features, but I cannot find out when they will be implemented or released.
    Could you please let me know when these features will be implemented? Thanks.

    • Merge - A program that reads records from multiple input files and produces one output file with combined data from the input files. Merges can be tailored or performed by parameter-driven standard system utilities.
    • Sort - A program that reads an input file and produces an output file where records have been re-sequenced according to a sort key field in the records. Sorts are usually performed by standard system utilities.
    David Yinwei Liu

  2. #2
    Join Date
    Jun 2005
    Posts
    4,231

    It would probably be more efficient to use the "standard system utilities" you mentioned, rather than try to code merge or sort in Java. You can still launch them from Spring Batch if you want to. Or do you need something in Java for a particular reason?
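
    A minimal sketch of launching such a utility from Java, assuming a POSIX `sort` is on the PATH and the input is a delimited text file. The class name `ExternalSortLauncher` and the file names are hypothetical; in a Spring Batch job, a call like `run(...)` would sit inside a Tasklet.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: builds and runs a POSIX `sort` command from Java.
final class ExternalSortLauncher {

    // Builds the command line: split fields on `delimiter`, sort on
    // field `keyField` only (1-based), and write the result to `output`.
    static List<String> sortCommand(Path input, Path output, char delimiter, int keyField) {
        return List.of(
                "sort",
                "-t", String.valueOf(delimiter),
                "-k", keyField + "," + keyField,
                "-o", output.toString(),
                input.toString());
    }

    // Starts the process, forwards its stdout/stderr to this JVM's streams,
    // and returns the exit code (0 normally means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumed file names, for illustration only.
        List<String> command = sortCommand(Paths.get("merged.csv"), Paths.get("sorted.csv"), ',', 1);
        System.out.println(String.join(" ", command));
        // To actually run it (requires `sort` and an existing merged.csv):
        // int exitCode = run(command);
    }
}
```

    A non-zero exit code can be mapped to a failed step, so the outcome still ends up recorded by the job.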

  3. #3

    Hi Dave,
    Thank you for your reply.
    I am thinking of this situation: we have two different input sources (one is data in a table, the other is a file). I want to merge both into one input in Spring Batch and then sort the records based on the PK. Next, I will process all items. In this case, I need to extract all the data from the table and write it to a file, and then use *system utilities* (cat on Unix) to merge them into one file.
    Is that the solution you prefer? Is there a Java solution? What about Windows? As far as I know, Windows does not have this kind of command.
    David Yinwei Liu
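
    For reference, a pure-Java merge is possible when both inputs can be read already ordered by the PK (for example, the table via an ORDER BY query and the file pre-sorted). The following is only a sketch under that assumption; `SortedMergeIterator` is a hypothetical name, not a Spring Batch class, and it works on any platform, including Windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper: merges two iterators that are each already sorted
// by `order` into one sorted stream, holding only one item per input in memory.
final class SortedMergeIterator<T> implements Iterator<T> {
    private final Iterator<T> left;
    private final Iterator<T> right;
    private final Comparator<? super T> order;
    private T nextLeft;
    private T nextRight;
    private boolean hasLeft;
    private boolean hasRight;

    SortedMergeIterator(Iterator<T> left, Iterator<T> right, Comparator<? super T> order) {
        this.left = left;
        this.right = right;
        this.order = order;
        advanceLeft();
        advanceRight();
    }

    private void advanceLeft() {
        hasLeft = left.hasNext();
        nextLeft = hasLeft ? left.next() : null;
    }

    private void advanceRight() {
        hasRight = right.hasNext();
        nextRight = hasRight ? right.next() : null;
    }

    @Override
    public boolean hasNext() {
        return hasLeft || hasRight;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Take from the left input when the right is exhausted or the left key is not larger.
        boolean takeLeft = hasLeft && (!hasRight || order.compare(nextLeft, nextRight) <= 0);
        T result = takeLeft ? nextLeft : nextRight;
        if (takeLeft) {
            advanceLeft();
        } else {
            advanceRight();
        }
        return result;
    }

    public static void main(String[] args) {
        // Integers stand in for records keyed by PK.
        Iterator<Integer> fromTable = Arrays.asList(1, 4, 7).iterator();
        Iterator<Integer> fromFile = Arrays.asList(2, 4, 9).iterator();
        List<Integer> merged = new ArrayList<>();
        new SortedMergeIterator<>(fromTable, fromFile, Comparator.<Integer>naturalOrder())
                .forEachRemaining(merged::add);
        System.out.println(merged); // prints [1, 2, 4, 4, 7, 9]
    }
}
```

    If either input is not sorted by the PK, the merged output will not be sorted either, so a separate sort step would still be needed.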

  4. #4
    Join Date
    Jun 2005
    Posts
    4,231

    Why don't you just load the data into the database and use SQL to sort it then?

  5. #5
    Join Date
    Apr 2008
    Posts
    174

    If your sorting and merging are going to be pretty straightforward, you can use a shell script with the sort utility (I have never tried it, but the documentation says you can call your script from a Tasklet):
    http://unixhelp.ed.ac.uk/utilities2/sort.html

    Also, there is a more powerful tool recommended by another member of this forum, but it has a per-license cost:
    http://forum.springframework.org/sho...ight=sort+file

    File I/O should be faster than Database I/O if you are talking about millions of records.

  6. #6

    Thank you very much for your reply.
    Yes, I can load the data into the database and use SQL to sort it, but what about performance? Operating on millions of records in files is much faster than in a database. As far as I know, many ETL tools suggest a job flow like this:
    a) load data from database -> b) write to file -> c) merge or join the files -> d) sort the resulting file
    This should improve performance considerably, since there are no database locks and it will not impact other database processes.
    What do you think?
    David Yinwei Liu
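
    Steps c) and d) of that flow could be sketched in plain Java roughly as follows. This is an illustration only: `FileMergeSort` is a hypothetical name, the key is compared as text, every line is assumed to contain the key field, and all records are held in memory, so for millions of records an external (on-disk) sort or a system utility would be the safer choice.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: concatenates delimited text files and sorts the
// combined records by one key field. In-memory only; not an external sort.
final class FileMergeSort {

    static void mergeAndSort(List<Path> inputs, Path output, String delimiter, int keyIndex)
            throws IOException {
        // Step c) merge: read every input file into one list of records.
        List<String> records = new ArrayList<>();
        for (Path input : inputs) {
            records.addAll(Files.readAllLines(input, StandardCharsets.UTF_8));
        }

        // Step d) sort: order records by the key field (0-based), compared as a string.
        String splitRegex = Pattern.quote(delimiter);
        records.sort(Comparator.comparing((String line) -> line.split(splitRegex, -1)[keyIndex]));

        Files.write(output, records, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the table extract and the second input file.
        Path tableExtract = Files.createTempFile("table", ".csv");
        Path otherFile = Files.createTempFile("file", ".csv");
        Path sorted = Files.createTempFile("sorted", ".csv");
        Files.write(tableExtract, Arrays.asList("003,carol", "001,alice"));
        Files.write(otherFile, Arrays.asList("002,bob"));

        mergeAndSort(Arrays.asList(tableExtract, otherFile), sorted, ",", 0);
        Files.readAllLines(sorted).forEach(System.out::println);
    }
}
```

    Because the key is compared as text, numeric keys only sort correctly here when they are zero-padded to the same width.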

  7. #7
    Join Date
    Jun 2005
    Posts
    4,231

    I think you are correct. So it comes back to the original suggestion that I made - if you want it to be as fast as possible, and you want the Spring Batch meta data, you could use a job to launch your ETL tool. If you don't care about the meta data, use the ETL tool directly.

  8. #8

    Sorry, what do you mean by *Spring Batch meta data*? Is it the JobExecutionContext and StepExecutionContext?

    And I agree with your suggestion: we should use the ETL tool directly if we want it to be as fast as possible.
    David Yinwei Liu

  9. #9

    The "Spring Batch meta data" here would be JobInstance, JobExecution and StepExecution - basically a log in the database (start & end time, outcome completed/failed etc.)

  10. #10

    Thanks a lot.
    David Yinwei Liu
