-
Apr 24th, 2008, 12:51 AM
#1
Merge multiple input source as one
Hi,
Two batch features I am interested in are Merge Multiple Input and Sort Input. I see that Spring Batch has identified both features, but I cannot find out when they will be implemented or released.
Could you please let me know when these features will be implemented? Thanks.
- Merge - A program that reads records from multiple input files and produces one output file with combined data from the input files. Merges can be tailored or performed by parameter-driven standard system utilities.
- Sort - A program that reads an input file and produces an output file where records have been re-sequenced according to a sort key field in the records. Sorts are usually performed by standard system utilities.
David Yinwei Liu
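For readers who want to see the two definitions above in code, here is a minimal plain-Java sketch. It is not a Spring Batch feature. The class name, the method names, and the "key,payload" record layout are all assumptions made for illustration. Keys are compared as strings, and the merge assumes each input is already sorted by key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class MergeSortSketch {

    // Assumed record layout: "key,payload". The sort key is the first comma-separated field.
    static String keyOf(String record) {
        int comma = record.indexOf(',');
        return comma < 0 ? record : record.substring(0, comma);
    }

    // Sort: return a copy of the records re-sequenced by key (string comparison).
    static List<String> sortByKey(List<String> records) {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(MergeSortSketch::keyOf));
        return sorted;
    }

    // Holds the current head record of one input plus the rest of that input.
    private static final class Head {
        final String record;
        final Iterator<String> rest;

        Head(String record, Iterator<String> rest) {
            this.record = record;
            this.rest = rest;
        }
    }

    // Merge: combine several inputs, each already sorted by key, into one sorted output.
    static List<String> mergeSorted(List<List<String>> inputs) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>(Comparator.comparing((Head h) -> keyOf(h.record)));
        for (List<String> input : inputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the smallest head, then refill the heap from the same input.
            Head smallest = heap.poll();
            out.add(smallest.record);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fileA = sortByKey(List.of("3,c", "1,a"));
        List<String> fileB = sortByKey(List.of("4,d", "2,b"));
        System.out.println(mergeSorted(List.of(fileA, fileB)));
    }
}
```

Because only one record per input sits in the heap at a time, the same merge logic would also work over file readers instead of in-memory lists.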
-
Apr 24th, 2008, 01:40 AM
#2
It would probably be more efficient to use the "standard system utilities" you mentioned, rather than try to code merge or sort in Java. You can still launch them from Spring Batch if you want to. Or do you need something in Java for a particular reason?
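As a rough illustration of launching a system utility from Java, the sketch below builds a command line for a Unix-style `sort` and runs it with `ProcessBuilder`. The class and method names are hypothetical, the comma separator is an assumption, and the flags (`-t`, `-k`, `-o`) are those of the Unix `sort` utility, so they would not apply to a different platform's tool. In Spring Batch, a step's Tasklet could call something like `run(...)`.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class SystemSortLauncher {

    // Builds the argument list for a Unix-style sort:
    //   -t ,    use a comma as the field separator (assumed record layout)
    //   -k N,N  sort on field N only
    //   -o FILE write the result to FILE
    static List<String> sortCommand(Path input, Path output, int keyField) {
        String key = keyField + "," + keyField;
        return List.of("sort", "-t", ",", "-k", key, "-o", output.toString(), input.toString());
    }

    // Runs the command, forwards its output to this process, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SystemSortLauncher <input> <output>");
            return;
        }
        int exitCode = run(sortCommand(Path.of(args[0]), Path.of(args[1]), 1));
        System.out.println("sort exited with code " + exitCode);
    }
}
```

Keeping the command as a list of arguments, rather than one shell string, avoids quoting problems with file names that contain spaces.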
-
Apr 24th, 2008, 04:16 AM
#3
Hi Dave,
Thank you for your reply.
I am thinking of this situation: we have two different input resources (one is data in a table, the other is a file). I want to merge both inputs into one input in Spring Batch and then sort the records based on the PK. Next, I will process all items. In this case, I need to extract all the data from the table and write it to a file, and then use *system utilities* (cat in Unix) to merge them into one file.
Is that the solution you prefer? Is there a Java solution? What about Windows? As far as I know, Windows does not have this kind of command.
David Yinwei Liu
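On the question of a Java solution that behaves the same on Unix and Windows, one possible sketch is shown below. It assumes the table rows have already been extracted into strings (the JDBC part is left out), that every record is "pk,payload" with a numeric PK, and that everything fits in memory. The class and method names are hypothetical. For millions of records an external sort or a system utility would be the safer choice.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PortableMergeSketch {

    // Assumed record layout: "pk,payload" with a numeric primary key in the first field.
    static long pkOf(String line) {
        int comma = line.indexOf(',');
        String pk = comma < 0 ? line : line.substring(0, comma);
        return Long.parseLong(pk.trim());
    }

    // Combines rows extracted from the table with the lines of the input file,
    // sorts the combined records by PK, and writes them to the output file.
    static void mergeAndSort(List<String> tableRows, Path inputFile, Path outputFile)
            throws IOException {
        List<String> all = new ArrayList<>(tableRows);
        all.addAll(Files.readAllLines(inputFile, StandardCharsets.UTF_8));
        all.sort(Comparator.comparingLong(PortableMergeSketch::pkOf));
        Files.write(outputFile, all, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("input", ".csv");
        Path output = Files.createTempFile("merged", ".csv");
        Files.write(input, List.of("20,from-file", "5,from-file"), StandardCharsets.UTF_8);

        // Stand-in for rows read from the database table.
        List<String> tableRows = List.of("10,from-table", "1,from-table");

        mergeAndSort(tableRows, input, output);
        Files.readAllLines(output, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Since this uses only `java.nio.file`, no `cat` or `sort` command is needed on the host system.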
-
Apr 24th, 2008, 04:56 AM
#4
Why don't you just load the data into the database and use SQL to sort it then?
-
Apr 24th, 2008, 08:48 AM
#5
If your sorting and merging are going to be pretty straightforward, you can use a shell script with the sort utility (I have never tried it, but the documentation says you can call your script from a Tasklet):
http://unixhelp.ed.ac.uk/utilities2/sort.html
There is also a more powerful tool recommended by another member of this forum, but it requires a paid license:
http://forum.springframework.org/sho...ight=sort+file
File I/O should be faster than Database I/O if you are talking about millions of records.
-
Apr 25th, 2008, 01:57 AM
#6
Thank you very much for your reply.
Yes, I can load the data into the database and use SQL to sort it, but what about performance? Working on millions of records in files is much faster than in the database. As far as I know, a lot of ETL tools suggest the following job flow:
a) load data from database -> b) write to file -> c) merge or join the files -> d) sort the resulting file
This improves performance considerably, since there are no database locks and it does not impact other database processes.
What do you think?
David Yinwei Liu
-
Apr 25th, 2008, 02:48 AM
#7
I think you are correct. So it comes back to the original suggestion that I made - if you want it to be as fast as possible, and you want the Spring Batch meta data, you could use a job to launch your ETL tool. If you don't care about the meta data, use the ETL tool directly.
-
Apr 25th, 2008, 02:55 AM
#8
Sorry, what do you mean by *Spring Batch meta data*? Is it JobExecutionContext, StepExecutionContext?
And I agree with your suggestion: we should use the ETL tool directly if we want it to be as fast as possible.
David Yinwei Liu
-
Apr 25th, 2008, 05:41 AM
#9
The "Spring Batch meta data" here would be JobInstance, JobExecution and StepExecution - basically a log in the database (start & end time, outcome completed/failed etc.)
-
Apr 28th, 2008, 04:04 AM
#10
David Yinwei Liu