Hi David,
I'm almost done exposing the params (file/archive/lib), but I'm not sure about the "jar" param. "hadoop jar" currently just calls the Main class of the jar as a way to pass in configuration (the command-line arguments). That's not needed in a Spring app, since it throws out any existing configuration (including the Hadoop one).
With the upcoming improvements, the command above would look like this:
Code:
<hdp:tool-runner id="someTool" tool-class="org.foo.SomeTool" configuration-ref="hadoop-configuration"
                 properties-location="myProp.properties" files="myProp.properties">
    <hdp:arg value="data/in.txt"/>
    <hdp:arg value="data/out.txt"/>
    prop1=value1
    prop2=value2
</hdp:tool-runner>
Note that the Tool instance (which can be configured) or class is still required. That's because the Tool (which is just a glorified Main) is executed in-process: we don't create a different JVM for it, so we need it to be available. If I understand your case correctly, you have a lot of dependencies, but that shouldn't be a problem: we only load the tool class and disregard the rest of the classes, so as long as your tool does the same, you should be fine.
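To make the in-process point a bit more concrete, here is a rough, plain-Java sketch of the idea. It is not the actual Spring Hadoop code and does not use the Hadoop API: SimpleTool, EchoTool and runInProcess are hypothetical stand-ins I made up for illustration (SimpleTool only mimics the "run the args, return an exit code" shape of a Tool). It shows that only the named class is loaded, that it runs in the current JVM, and that it fails if the class is not available.

```java
import java.util.Arrays;

public class InProcessToolDemo {

    /** Hypothetical stand-in for a Tool-like contract: run the args, return an exit code. */
    public interface SimpleTool {
        int run(String[] args) throws Exception;
    }

    /** Hypothetical example tool; in the config above this role is played by org.foo.SomeTool. */
    public static class EchoTool implements SimpleTool {
        @Override
        public int run(String[] args) {
            System.out.println("args = " + Arrays.toString(args));
            // Expect exactly an input and an output path.
            return args.length == 2 ? 0 : 1;
        }
    }

    /**
     * Loads only the named class from the current classpath and runs it in this JVM.
     * No new process is started, so a missing class surfaces as ClassNotFoundException.
     */
    public static int runInProcess(String toolClassName, String... args) throws Exception {
        Class<?> type = Class.forName(toolClassName);
        SimpleTool tool = (SimpleTool) type.getDeclaredConstructor().newInstance();
        return tool.run(args);
    }

    public static void main(String[] args) throws Exception {
        int exit = runInProcess(EchoTool.class.getName(), "data/in.txt", "data/out.txt");
        System.out.println("exit code = " + exit);
    }
}
```

Other classes that happen to be packaged alongside the tool are never touched unless the tool itself references them, which is why a large set of dependencies should not matter by itself.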
Let me know whether this solves your problem and, if not, why.