Thread: linking multiple lines of a file together to process together

  1. #1
    Join Date
    Oct 2004
    Location
    San Diego, CA USA
    Posts
    58

    I'm looking for a solution to the following situation, without using a database. Let's say I have a flat file (it doesn't matter whether it's delimited or fixed width, but my file is delimited). Each line/record has a key and the file is sorted by this key, but the key is not unique in the file. In other words, the file could look like this:

    key,f1,f2,f3,f4
    111,a,b,c,d
    111,e,f,c,g
    222,a,x,y,z
    333,h,i,c,k
    333,m,n,o,a
    333,a,b,e,k

    What I need to do is read the file and "gather up" all the lines with the same key, then process them together and write one result per key, say the count of occurrences of a particular value in a particular column. Let's say it was the number of times 'c' appears in column 'f3' in the above example. The output would be:

    111,2
    222,0
    333,1

    Remember, no database. I already have a db solution. :-)
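
    To make the expected result concrete, here is a minimal plain-Java sketch of the grouping and counting described above. It relies on the file being sorted by key, so only the current group's running count is held in memory. KeyRunCounter and countPerKey are made-up names for illustration and are not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spring Batch: counts how often a value
// appears in one column for each run of consecutive lines sharing a key.
class KeyRunCounter {

    // lines: comma-delimited records already sorted by key (header excluded).
    // keyIndex and columnIndex are zero-based field positions.
    static List<String> countPerKey(List<String> lines, int keyIndex,
                                    int columnIndex, String wanted) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int count = 0;
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            String key = fields[keyIndex];
            if (currentKey != null && !key.equals(currentKey)) {
                out.add(currentKey + "," + count); // key changed: flush finished group
                count = 0;
            }
            currentKey = key;
            if (wanted.equals(fields[columnIndex])) {
                count++;
            }
        }
        if (currentKey != null) {
            out.add(currentKey + "," + count); // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "111,a,b,c,d", "111,e,f,c,g", "222,a,x,y,z",
            "333,h,i,c,k", "333,m,n,o,a", "333,a,b,e,k");
        // Column f3 is field index 3 when the key is field index 0.
        countPerKey(lines, 0, 3, "c").forEach(System.out::println);
    }
}
```

    Run against the sample data, this prints the three lines shown above (111,2 then 222,0 then 333,1).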

    I was looking at some kind of ChunkProvider or maybe a RecordSeparatorPolicy, but neither seems quite right. I could write a custom reader, but I was hoping there was a way to leverage the existing FlatFileItemReader and use existing extension points.

  2. #2

    You do need some kind of intermediate result storage. A simple Map(*) will do (as a Spring bean, preferably a ConcurrentHashMap); your writer then either adds a new entry for the key or raises its existing count.

    The map can then be written out in a second step or in an afterStep/afterJob method.

    (*) Instead of a map, files are possible too; it depends on the number of business items and the performance requirements.
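
    A rough sketch of that accumulator idea, assuming the writer calls it once per record. KeyCountAccumulator, record and snapshot are hypothetical names; only ConcurrentHashMap.merge is a real JDK call. Wiring it into a job as a bean is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical accumulator: in a batch job it could be a singleton bean that
// the writer updates and that a later step or listener writes out.
class KeyCountAccumulator {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    // Registers one record: adds the key if it is new, otherwise raises its
    // count. Non-matching records add 0 so their key still shows up.
    void record(String key, boolean matched) {
        counts.merge(key, matched ? 1 : 0, Integer::sum);
    }

    // Sorted copy of the totals, for writing out once all records are seen.
    Map<String, Integer> snapshot() {
        return new TreeMap<>(counts);
    }
}
```

    Unlike a grouping reader, this holds one entry per distinct key until the end of the step, which is why the number of business items matters.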
    Last edited by michael.lange; Oct 19th, 2011 at 03:00 AM.

  3. #3

    I forgot to post what I ended up with, which is a custom ItemReader.

    I created a class implementing ResourceAwareItemReaderItemStream. The class also has a delegate of type PeekableItemReader.

    I called my class FieldBasedAggregatingReader. You configure it with one or more key fields that indicate when to stop reading. It calls peek() on the delegate and checks whether the key fields have changed from the first record read in the current call to read(). In pseudo-code ...
    Code:
    List<?> read() {
        currentRecord = delegate.peek()

        // end of input: return null so the step knows nothing is left
        if (currentRecord == null) {
            return null
        }

        initialValues = extractKeyFields(currentRecord)
        done = false

        while (!done) {
            // true once the key fields differ from the first record's
            done = compareCurrentAndInitial(currentRecord, initialValues)

            if (!done) {
                add current record to list of results

                delegate.read() to advance delegate to next record

                currentRecord = delegate.peek()

                // no more records: the current group is complete
                if (currentRecord == null) {
                    done = true
                }
            }
        }

        return list of results
    }
    So, each call to read() on my class returns 1 record, which is a list of 1..n records from the underlying reader, where all the records have the same key values. Once the underlying reader is exhausted, read() returns null.
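
    For anyone wanting something compilable, here is a self-contained sketch of the same peek-and-compare loop. It deliberately avoids the Spring Batch types so it runs on its own: PeekableSource is a hypothetical stand-in for the PeekableItemReader delegate, and GroupingReader stands in for FieldBasedAggregatingReader with a single key extractor function instead of configurable key fields. It is not the actual class described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a peekable reader: read() and peek() both
// return null at end of input.
class PeekableSource<T> {
    private final Iterator<T> it;
    private T next;
    private boolean peeked;

    PeekableSource(Iterator<T> it) {
        this.it = it;
    }

    // Looks at the next item without consuming it.
    T peek() {
        if (!peeked) {
            next = it.hasNext() ? it.next() : null;
            peeked = true;
        }
        return next;
    }

    // Consumes and returns the next item.
    T read() {
        T item = peek();
        peeked = false;
        next = null;
        return item;
    }
}

// Groups consecutive items with equal keys; each read() returns one group.
class GroupingReader<T, K> {
    private final PeekableSource<T> delegate;
    private final Function<T, K> keyExtractor;

    GroupingReader(PeekableSource<T> delegate, Function<T, K> keyExtractor) {
        this.delegate = delegate;
        this.keyExtractor = keyExtractor;
    }

    List<T> read() {
        T current = delegate.peek();
        if (current == null) {
            return null; // end of input
        }
        K initialKey = keyExtractor.apply(current);
        List<T> group = new ArrayList<>();
        // Keep consuming while the peeked record still carries the first key.
        while (current != null && initialKey.equals(keyExtractor.apply(current))) {
            group.add(delegate.read());
            current = delegate.peek();
        }
        return group;
    }
}
```

    With the sample file from the first post and a key extractor such as line -> line.split(",")[0], successive calls would return groups of 2, 1 and 3 lines, then null.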
