Thread: Reading Multiple Files from HDFS using MultiResourceItemReader and HdfsItemReader

  1. #1

    Hi,

    I have a step in my Spring Batch job in which I have to read multiple files within a directory in HDFS. I am using MultiResourceItemReader as my reader and have added HdfsItemReader as the delegate. Currently, I am using Spring Hadoop 1.0.0.RC1 release and Spring Batch 2.1.9.RELEASE.

    Below is my configuration for the MultiResourceItemReader bean:

    Code:
    <bean id="multiResourceReader"
          class="org.springframework.batch.item.file.MultiResourceItemReader"
          scope="step">
        <property name="resources" value="hdfs://${hdfs_dir}/*.txt" />
        <property name="delegate" ref="hdfsItemReader" />
        <property name="strict" value="true" />
    </bean>
    The problem is that although files matching the pattern exist in the specified directory, the step fails with the following error:


    Code:
    java.lang.IllegalStateException: No resources to read. Set strict=false if this is not an error condition

    Does anyone know why the reader does not pick up the resources even though they exist?

    One more question: I saw that in the new 1.0.0.RC2 release, HdfsItemReader no longer exists because the batch package has been removed entirely. What should I use in place of HdfsItemReader if I move to 1.0.0.RC2?

    Thanks!

  2. #2
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Make sure you're registering the hdfs URL prefix; otherwise hdfs URLs will not be understood. You can do this through the file-system element. Also make sure to specify the host and the port, as these are sometimes required.
    The HdfsItemReader was removed because it didn't add any value; one can use the MultiResourceItemReader on its own just fine.
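
    As a rough illustration of the advice above (not taken from the thread), the sketch below declares the Hadoop configuration with an explicit host and port and adds the file-system element. The address hdfs://localhost:9000 is the value quoted elsewhere in this thread, and the bean id hadoopFs is an assumption; check the element's attributes against the spring-hadoop schema for your release.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:hdp="http://www.springframework.org/schema/hadoop"
           xsi:schemaLocation="
             http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
             http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

        <!-- Assumption: the NameNode listens on localhost:9000; adjust host and port. -->
        <hdp:configuration>
            fs.default.name=hdfs://localhost:9000
        </hdp:configuration>

        <!-- The file-system element mentioned above, with its attributes left at their defaults.
             The id "hadoopFs" is an assumption made for this sketch. -->
        <hdp:file-system id="hadoopFs" />

    </beans>
    ```

    With this in place, a resource pattern would carry the host and port explicitly, for example hdfs://localhost:9000/some_dir/*.txt (some_dir is a placeholder). In a URL such as hdfs://some_dir/*.txt, the segment right after hdfs:// is parsed as the host name rather than as a directory.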

    Additionally, to allow hdfs resources to be loaded through a ResourceLoader, one can use CustomResourceLoaderRegistrar from the fs package.
    It's a generic class, but together with the hdfs resource-loader it allows resources to be resolved from the hdfs space instead of just the class space:
    Code:
    <bean id="customRL" class="org.springframework.data.hadoop.fs.CustomResourceLoaderRegistrar" p:loader-ref="hadoopResourceLoader" />
    
    <!-- batch is read from hdfs:/ space by default -->
    <bean id="mrir" class="org.springframework.batch.item.file.MultiResourceItemReader" p:resources="/batch/*" ... />
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  3. #3

    Thanks for the reply.

    I tried to use CustomResourceLoaderRegistrar as you suggested, but now I am getting a BeanCreationException for multiResourceReader.

    Code:
    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'scopedTarget.multiResourceReader' defined in URL: Initialization of bean failed; nested exception is java.lang.NullPointerException
    My Hadoop configuration is as follows:

    Code:
    <hdp:configuration resources="classpath:/core-site.xml">
        fs.default.name=${hdfs_address}
    </hdp:configuration>

    <hdp:resource-loader id="hadoopResourceLoader" />

    <bean id="customResourceLoader"
          class="org.springframework.data.hadoop.fs.CustomResourceLoaderRegistrar"
          p:loader-ref="hadoopResourceLoader" />
    The value of hdfs_address is:

    Code:
    hdfs_address=hdfs://localhost:9000
    My MultiResourceItemReader configuration is as follows:

    Code:
    <bean id="multiResourceReader"
          class="org.springframework.batch.item.file.MultiResourceItemReader"
          scope="step">
        <property name="resources" value="${hdfs_address}/${hdfs_dir}/*.txt" />
        <property name="delegate" ref="hdfsItemReader" />
        <property name="strict" value="true" />
    </bean>
    Am I missing something that is causing the BeanCreationException?

  4. #4

    It looks like it has something to do with scope="step" in Spring Batch - probably something going wrong in the proxying process.
    Try it first without the scope and then add it back in - speaking of which, do you actually need it?
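
    A minimal sketch of that test, assuming the same ${hdfs_address} and ${hdfs_dir} placeholders as in the previous post: the reader is declared without scope="step" so that any remaining failure cannot come from step-scope proxying. The delegate shown here (a plain FlatFileItemReader with the bean id lineReader) is an assumption for illustration, not the poster's actual delegate.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="
             http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

        <!-- Same reader as before, minus the step scope, to isolate the NPE. -->
        <bean id="multiResourceReader"
              class="org.springframework.batch.item.file.MultiResourceItemReader">
            <property name="resources" value="${hdfs_address}/${hdfs_dir}/*.txt" />
            <property name="delegate" ref="lineReader" />
            <property name="strict" value="true" />
        </bean>

        <!-- Hypothetical delegate: returns each line of a file as a String. -->
        <bean id="lineReader" class="org.springframework.batch.item.file.FlatFileItemReader">
            <property name="lineMapper">
                <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper" />
            </property>
        </bean>

    </beans>
    ```

    If the context starts and the files are read with this definition, the scope attribute can be added back to see whether the exception returns.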
    Costin Leau

  5. #5

    I added scope="step" because one requirement I still have to implement is reading the hdfs_dir property from the job execution context.
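
    For reference, this kind of late binding in Spring Batch is usually written as an expression inside a step-scoped bean. The sketch below assumes the directory is stored in the job execution context under the key hdfs_dir (the key name is an assumption) and reuses the ${hdfs_address} placeholder from earlier in the thread; the lineReader delegate is hypothetical. It only shows the binding syntax and is not a confirmed fix for the exception discussed above.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="
             http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

        <!-- scope="step" defers creation until the step runs, so the
             jobExecutionContext expression can be evaluated at that point. -->
        <bean id="multiResourceReader"
              class="org.springframework.batch.item.file.MultiResourceItemReader"
              scope="step">
            <property name="resources"
                      value="${hdfs_address}/#{jobExecutionContext['hdfs_dir']}/*.txt" />
            <property name="delegate" ref="lineReader" />
            <property name="strict" value="true" />
        </bean>

        <!-- Hypothetical delegate: returns each line of a file as a String. -->
        <bean id="lineReader" class="org.springframework.batch.item.file.FlatFileItemReader">
            <property name="lineMapper">
                <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper" />
            </property>
        </bean>

    </beans>
    ```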

  6. #6

    I understand, but to narrow down the problem it helps to take it step by step, since the NPE seems to come from the proxying mechanism rather than from the HDFS resource.
    Costin Leau
