
Thread: Specifying a JobJar in the Tool Tasklet.


  1. #1

    Specifying a JobJar in the Tool Tasklet.

    Hi everyone,

    I have a use case where my project needs to configure several Hadoop Tool jobs. I do this with the following configuration in spring.cfg.xml:

    Code:
    <hdp:tool-tasklet id="testId" scope="step" configuration-ref="hadoop-configuration" tool-class="com.test.myClass">
        <!-- Some properties -->
    </hdp:tool-tasklet>
    The jar file that contains the Tool class is included as a dependency in my project, and this works fine. However, I am facing a problem: I have several job JAR files with dependencies, each bundling its own versions of various libraries. Since I have included all these job JARs as dependencies of my project, there are a bunch of duplicate classes and libraries, potentially in different versions.

    So here is my question: is there a way to run a Tool class by specifying the jar location, as is possible with Hadoop command line arguments such as -files or -libjars?

    Can you suggest some other way of running Tool classes without loading the actual JAR file onto the classpath and without using the tool-class attribute?

    P.S.: I am using spring-data-hadoop version 1.0.0.M1.

    Thanks in advance.


    Sincerely,
    David Gevorkyan

  2. #2
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Hi David,

    We currently don't expose these parameters on the Tool namespace (as we do with streaming or job); this looks like an omission. Can you please raise an issue on our tracker? If you can, also indicate what the command line looks like or what you would like to see in the namespace.

    Cheers,
    Costin Leau
    SpringSource - http://www.SpringSource.com - Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  3. #3

    Raised issue https://jira.springsource.org/browse/SHDP-49
    Feel free to use that to follow progress.
    Costin Leau

  4. #4

    Hi Costin,

    Thanks for the quick reply.

    Actually, besides exposing the JAR file in the Tool namespace, we also need the "-files" parameter, since we have use cases where we need to provide a properties file dynamically, on the fly.

    So our command line looks like this:

    Code:
    hadoop jar fullpath:myJar_withDependencies.jar -files fullpath:myProp.properties -Dprop1=value1 -Dprop2=value2 -Dconfig=myProp.properties
    So ideally I want to be able to specify any file (such as the properties file in the example above) to be uploaded to the cluster, and also to specify the jar with dependencies to be uploaded.

    So if you could expose the same parameters on the Tool namespace as you have done for the streaming job, namely "file", "archive" and "lib", that would be great.

    Sincerely,
    David

  5. #5

    Hi David,

    I'm almost done exposing the params (file/archive/lib), but I'm not sure about the "jar" param. "hadoop jar" currently just calls the main class of the jar as a way to pass in configuration (the command line arguments). That's not needed in a Spring app since it throws out any existing configuration (including the Hadoop one).

    With the upcoming improvements the command above would look like this:

    Code:
    <hdp:tool-runner id="someTool" tool-class="org.foo.SomeTool" configuration-ref="hadoop-configuration"
        properties-location="myProp.properties" files="myProp.properties">
        <hdp:arg value="data/in.txt"/>
        <hdp:arg value="data/out.txt"/>
        prop1=value1
        prop2=value2
    </hdp:tool-runner>
    Note that the Tool instance (which can be configured) or class is still required. That's because the Tool (which is just a glorified main) is executed in-process; we don't create a different JVM for it, so we need it to be available. If I understand your case correctly, you have a lot of dependencies, but that shouldn't be a problem: we only load the tool class and disregard the rest of the classes, so as long as your tool does the same, there shouldn't be an issue.
    Let me know whether this solves your problem and, if not, why.
    Last edited by Costin Leau; Apr 12th, 2012 at 10:38 AM.
    Costin Leau

  6. #6

    Committed the updates to master - you can pick up the changes in the next snapshot.
    Costin Leau

  7. #7

    Hi Costin,

    The issue is that we are attempting to replace our current shell script with Spring Batch. The shell script looks something like this:

    hadoop -jobjar job1.jar ...
    ...
    hadoop -jobjar job2.jar ...
    .......
    hadoop -jobjar job10.jar ...


    These job jars contain conflicting versions of libraries (for example jackson 1.4 and jackson 1.94), and even different versions of Spring.

    How would you propose handling this case? We cannot simply put all 10 jars on the classpath. Perhaps a classloader approach would work?
    Last edited by davidgevorkyan; Apr 12th, 2012 at 12:32 PM.
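
    The classloader idea raised above could be sketched roughly as follows. This is only an illustration using the plain JDK, not something spring-data-hadoop provides: the class name IsolatedJobRunner and both method names are hypothetical, and the sketch assumes each job jar is self-contained (it bundles its own dependencies, as a "with dependencies" jar would) and exposes a standard static main(String[]) entry point. Each jar gets its own URLClassLoader whose parent sees only the JDK, so libraries from one job jar should not clash with those of another or of the host application.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop or spring-data-hadoop.
public final class IsolatedJobRunner {

    private IsolatedJobRunner() {}

    /** Builds a loader that sees only the given jar plus the JDK's own classes. */
    public static URLClassLoader isolatedLoader(Path jobJar) throws Exception {
        // Use the loader above the application class path as parent, so the
        // host application's dependencies are not visible to the job.
        ClassLoader jdkOnly = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(new URL[] { jobJar.toUri().toURL() }, jdkOnly);
    }

    /** Loads mainClass from jobJar in isolation and calls its static main(String[]). */
    public static void runMain(Path jobJar, String mainClass, String... args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try (URLClassLoader loader = isolatedLoader(jobJar)) {
            // Many libraries look classes up through the context class loader,
            // so point it at the isolated loader while the job runs.
            current.setContextClassLoader(loader);
            Class<?> entry = Class.forName(mainClass, true, loader);
            Method main = entry.getMethod("main", String[].class);
            // Reflection is used because classes loaded here are distinct from
            // same-named classes loaded by the host application.
            main.invoke(null, (Object) args);
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

    A call might look like IsolatedJobRunner.runMain(Paths.get("job1.jar"), "com.test.myClass", "-Dprop1=value1"), once per job jar. One caveat under these assumptions: because the job's classes live in a separate loader, the host cannot cast the loaded object to its own copy of the Tool interface, which is why the sketch goes through main reflectively. A job that calls System.exit in its main would also terminate the host JVM.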
