Hi,
I am trying to use Spring Data Hadoop with CDH4 to write a MapReduce job.
On startup, I get the following exception:
Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
    at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
    at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405)
    at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123)
    ... 2 more
I guess there is a problem with my Hadoop-related dependencies. I couldn't find any reference showing how to configure Spring Data Hadoop together with CDH4, but Costin has shown that he is able to configure it: https://build.springsource.org/brows...DOOP-CDH4-JOB1
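The NullPointerException comes from ReflectionUtils.makeAccessible while JobUtils runs its static initializer, which is why it surfaces as an ExceptionInInitializerError. One plausible reading is that a reflective lookup (for example a field on a Hadoop class) returned null because the Hadoop build on the classpath does not declare the member Spring Data Hadoop expects. The trace does not show which member that is, so the class and field names below are assumptions you would have to take from the JobUtils source around line 123. This is a minimal JDK-only probe (the class name FieldProbe is hypothetical) that walks the class hierarchy much like Spring's ReflectionUtils.findField and reports whether a field exists, without triggering static initialization:

```java
import java.lang.reflect.Field;

public class FieldProbe {

    // Searches the class and its superclasses (stopping at Object) for a declared
    // field with the given name; returns null when no such field exists.
    public static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getName().equals(name)) {
                    return f;
                }
            }
        }
        return null;
    }

    // Usage: java -cp <your runtime classpath> FieldProbe <className> <fieldName>
    public static void main(String[] args) throws ClassNotFoundException {
        // Load without initializing, so probing does not run any static blocks
        Class<?> type = Class.forName(args[0], false, FieldProbe.class.getClassLoader());
        Field field = findField(type, args[1]);
        if (field == null) {
            System.out.println("No field '" + args[1] + "' on " + type.getName()
                    + "; calling makeAccessible on this lookup would throw an NPE");
        } else {
            System.out.println("Found " + field);
        }
    }
}
```

Running it with the exact classpath the job uses shows whether the lookup can succeed against the Hadoop jars Maven actually resolved.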
Maven Setup
This is the complete pom file you need to reproduce the problem.
Code:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>com.example.main</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <java-version>1.7</java-version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring.version>3.2.0.RELEASE</spring.version>
        <spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
        <hadoop.version>2.0.0-cdh4.1.3</hadoop.version>
        <log4j.version>1.2.17</log4j.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${spring.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>commons-logging</groupId>
                    <artifactId>commons-logging</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-hadoop</artifactId>
            <version>${spring.hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- Hadoop Stuff -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-tools</artifactId>
            <version>2.0.0-mr1-cdh4.1.3</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${java-version}</source>
                    <target>${java-version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>spring-milestones</id>
            <url>http://repo.springsource.org/libs-milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-snapshot</id>
            <name>Spring Maven SNAPSHOT Repository</name>
            <url>http://repo.springframework.org/snapshot</url>
        </repository>
    </repositories>
</project>
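This POM combines hadoop-client 2.0.0-cdh4.1.3 with an MR1 artifact (hadoop-tools 2.0.0-mr1-cdh4.1.3). As far as I can tell, the plain CDH4 hadoop-client pulls in the MR2/YARN client jars, so two different builds of the MapReduce classes may end up on the classpath. A quick way to check which jar a class is really loaded from is a small JDK-only helper like the sketch below. The class name WhichJar and the default class it probes are illustrative assumptions, not part of Spring Data Hadoop:

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {

    // Returns the location (jar or directory) a class was loaded from,
    // or "(bootstrap)" for JDK classes, which have no CodeSource.
    public static String locate(String className) throws ClassNotFoundException {
        // initialize = false: do not run static initializers (e.g. JobUtils' failing one)
        Class<?> type = Class.forName(className, false, WhichJar.class.getClassLoader());
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "(bootstrap)";
        }
        URL location = source.getLocation();
        return location == null ? "(unknown)" : location.toString();
    }

    // Usage: java -cp <your runtime classpath> WhichJar [className]
    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.Job";
        System.out.println(name + " -> " + locate(name));
    }
}
```

`mvn dependency:tree` shows the same conflict from Maven's side; the helper confirms what the JVM actually picks at runtime.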
Application Context
Code:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd">

    <context:property-placeholder location="classpath:hadoop.properties" />

    <hdp:configuration id="hadoopConfiguration">
        fs.default.name=hdfs://namenode.example.com:8020
    </hdp:configuration>

    <hdp:job id="wordCountJob"
        mapper="com.example.WordMapper" reducer="com.example.WordReducer"
        input-path="/user/christian/input/test" output-path="/user/christian/output" />

    <hdp:job-runner job-ref="wordCountJob" run-at-startup="true" wait-for-completion="true" />
</beans>
Cluster version
Hadoop 2.0.0-cdh4.1.3
Note:
This small unit test runs fine with the current configuration:
Code:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:/applicationContext.xml" })
public class Starter {

    @Autowired
    private Configuration configuration;

    @Test
    public void shellOps() {
        Assert.assertNotNull(this.configuration);
        FsShell fsShell = new FsShell(this.configuration);
        final Collection<FileStatus> coll = fsShell.ls("/user");
        System.out.println(coll);
    }
}
It would be nice if someone could give me an example configuration.
Best Regards,
Christian.