Thread: Why doesn't DistributedCache work?

  1. #1
    Join Date
    Jul 2012
    Posts
    13

    Why doesn't DistributedCache work?

    Code:
     <hdp:job id="news_result_sim_job"
    	    input-path="${sim.output.path}/news/convert/" output-path="${sim.output.path}/news/result/" 
    		mapper="com.xxx.wap.algorithm.mapred.sim.SimResultJob.MapClass"
    		reducer="com.xxx.wap.algorithm.mapred.sim.SimResultJob.Reduce"	
    		combiner="com.xxx.wap.algorithm.mapred.sim.SimResultJob.Combine"
    		jar="file:/data/DATA/smc/whftest/newrecom/algorithmUtils/algorithmUtils-1.0-SNAPSHOT.jar"
    		input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"	
    		map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
    		map-value="org.apache.hadoop.io.DoubleWritable"
    		key="com.xxx.wap.algorithm.model.sim.SimKeyPair"
    		value="org.apache.hadoop.io.DoubleWritable"
    		number-reducers="1"
    		configuration-ref="hadoopConfiguration"
    		>
    	</hdp:job>
    	
    	<hdp:cache configuration-ref="hadoopConfiguration" file-system-ref="fs">
           <hdp:cache value="/tmp/wanghf/sim/output/news/length/part-r-00000.lzo_deflate" />
        </hdp:cache>
    Java code:
    Code:
    private static Map<Integer, Double> loadCache(Configuration conf) {
    		Map<Integer, Double> gidLength = new HashMap<Integer, Double>();
    		try {
    		    System.out.println("$$$$$$$$$$$$$$$$$$$$"+conf.get("mapred.cache.localFiles"));
    			Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    Why is localFiles null? Please help.
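
Since the array returned for the local cache files can be null, as reported above, a loader along these lines may be easier to debug if it guards against that case instead of failing later. The following is only a minimal sketch under stated assumptions: it uses java.nio.file.Path rather than Hadoop's Path so that it runs without Hadoop on the classpath, the class name CacheLoader and method loadLengths are hypothetical, and the "gid<TAB>length" plain-text line format is a guess (the actual file in this thread is an .lzo_deflate file and would need decompressing first).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class CacheLoader {

    /**
     * Hypothetical helper: reads "gid<TAB>length" lines from each local file.
     * The line format is an assumption, not something stated in the thread.
     */
    static Map<Integer, Double> loadLengths(Path[] localFiles) throws IOException {
        Map<Integer, Double> gidLength = new HashMap<>();
        if (localFiles == null) {
            // No cache files were registered on the configuration this task received.
            System.err.println("No local cache files found; returning an empty map");
            return gidLength;
        }
        for (Path file : localFiles) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t");
                    if (parts.length != 2) {
                        continue; // skip lines that do not match the assumed format
                    }
                    gidLength.put(Integer.parseInt(parts[0].trim()),
                                  Double.parseDouble(parts[1].trim()));
                }
            }
        }
        return gidLength;
    }
}
```

An empty result then signals that the cache entry never reached the job's configuration, which is a different problem from a file that exists but cannot be parsed.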

  2. #2
    Join Date
    Jul 2012
    Posts
    13

    Is this a Spring Hadoop bug?

  3. #3
    Join Date
    Jul 2012
    Posts
    13

    I execute the jobs in the order job1 -> job2 -> job3. The cache file is used only in job3, and it is created by job2. What is wrong? Please tell me.

  4. #4
    Join Date
    Jul 2012
    Posts
    13

    Can no one reply?

  5. #5
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Can you summarize what exactly the problem is and, specifically, how you are creating the cache in job2? Is this through SHDP and, if so, how exactly?
    Also, have you tried the other DistributedCache entries (such as classpath)?
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  6. #6
    Join Date
    Jul 2012
    Posts
    13

    Thank you for your reply.
    My configuration:
    Code:
    <hdp:configuration
    	resources="classpath:/hdp/capacity-scheduler.xml,classpath:/hdp/core-site.xml,classpath:/hdp/hadoop-policy.xml,classpath:/hdp/hdfs-site.xml,classpath:/hdp/mapred-site.xml" id="hadoopConfiguration" >
    	</hdp:configuration>
    <hdp:job id="news_length_sim_job" 
    	    input-path="${sim.input.path}/news/length/" output-path="${sim.output.path}/news/length/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.Reduce"
    		jar="${jar.file.path}"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		map-key="org.apache.hadoop.io.Text"
    		map-value="com.xx.wap.algorithm.model.sim.SimKeyValue"
    		key="org.apache.hadoop.io.Text"
    		value="org.apache.hadoop.io.DoubleWritable"
    		number-reducers="1"		
    		scope="prototype"
    		/>    
    	
    	<hdp:job id="news_convert_sim_job" 
    	    input-path="${sim.input.path}/news/length/" output-path="${sim.output.path}/news/convert/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.Reduce"
    		jar="${jar.file.path}"	
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
    		map-key="org.apache.hadoop.io.Text"
    		map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
    		key="org.apache.hadoop.io.Text"
    		value="com.sohu.wap.algorithm.model.sim.SimKeyValueSet"			
    		scope="prototype"
    		/>
    
    	
        
        <hdp:job id="news_result_sim_job"
    	    input-path="${sim.output.path}/news/convert/" output-path="${sim.output.path}/news/result/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimResultJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Reduce"	
    		combiner="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Combine"
    		jar="${jar.file.path}"
    		input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"	
    		map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
    		map-value="org.apache.hadoop.io.DoubleWritable"
    		key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
    		value="org.apache.hadoop.io.DoubleWritable"
    		number-reducers="1"		
    		scope="prototype"
    		>
    	</hdp:job>
    <hdp:cache  >
    		<hdp:cache value="${length.file.path}" />
    	</hdp:cache>
    My problem:
    The configuration of the job news_result_sim_job cannot see the cache file, because the cache file path has not been set on that configuration.

    I finally resolved it by placing the <hdp:cache> element between the news_convert_sim_job and news_result_sim_job definitions.
    The cache file (value="${length.file.path}") is generated by news_convert_sim_job. I had to give news_result_sim_job a new Hadoop configuration to resolve this problem.

  7. #7
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Interesting.

    hdp:cache is evaluated at startup, so I'm not sure whether you have actually solved your problem or are just seeing the side effects of a previous run.
    If I understand correctly, you want to add dynamic entries to the cache, evaluated through the property placeholder? Or is it SpEL you are looking at?
    Properties are evaluated at startup. SpEL is evaluated dynamically, but since hdp:cache is a singleton, that also means at startup.

    That is, the order in which hdp:cache is defined doesn't matter: it can be first or last. The same goes for hdp:job: even though the jobs are prototypes, their declaration order is irrelevant, as they are not executed through the declaration.
    Costin Leau

  8. #8
    Join Date
    Jul 2012
    Posts
    13

    I solved it by creating a new Hadoop configuration for news_result_sim_job.
    My config:
    Code:
    <hdp:configuration
    	resources="classpath:/hdp/capacity-scheduler.xml,classpath:/hdp/core-site.xml,classpath:/hdp/hadoop-policy.xml,classpath:/hdp/hdfs-site.xml,classpath:/hdp/mapred-site.xml" id="hadoopConfiguration" >
    	</hdp:configuration>
        
       <hdp:configuration id="simResultConfiguration" configuration-ref="hadoopConfiguration" >	
    	</hdp:configuration>
    
    	<hdp:job id="news_length_sim_job" 
    	    input-path="${sim.input.path}/news/length/" output-path="${sim.output.path}/news/length/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.Reduce"
    		jar="${jar.file.path}"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		map-key="org.apache.hadoop.io.Text"
    		map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
    		key="org.apache.hadoop.io.Text"
    		value="org.apache.hadoop.io.DoubleWritable"
    		number-reducers="1"
    		configuration-ref="hadoopConfiguration"
    		scope="prototype"
    		/>    
    	
    	<hdp:job id="news_convert_sim_job" 
    	    input-path="${sim.input.path}/news/length/" output-path="${sim.output.path}/news/convert/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.Reduce"
    		jar="${jar.file.path}"	
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
    		map-key="org.apache.hadoop.io.Text"
    		map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
    		key="org.apache.hadoop.io.Text"
    		value="com.sohu.wap.algorithm.model.sim.SimKeyValueSet"	
    		configuration-ref="hadoopConfiguration"
    		scope="prototype"
    		/>
    
    	<hdp:cache configuration-ref="simResultConfiguration" >
    		<hdp:cache value="${length.file.path}" />
    	</hdp:cache>
        
        <hdp:job id="news_result_sim_job"
    	    input-path="${sim.output.path}/news/convert/" output-path="${sim.output.path}/news/result/" 
    		mapper="com.sohu.wap.algorithm.mapred.sim.SimResultJob.MapClass"
    		reducer="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Reduce"	
    		combiner="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Combine"
    		jar="${jar.file.path}"
    		input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"	
    		map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
    		map-value="org.apache.hadoop.io.DoubleWritable"
    		key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
    		value="org.apache.hadoop.io.DoubleWritable"
    		number-reducers="1"
    		configuration-ref="simResultConfiguration"
    		scope="prototype"
    		>
    	</hdp:job>

  9. #9
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    What happens if you don't use a separate configuration?
    It looks like you're just cloning the initial config and adding the cache value - this should work with the initial configuration as well. Unless the cache entry gets replaced somehow.
    Costin Leau
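
The "clone the initial config and add the cache value" pattern described in this post can be modelled with plain java.util.Properties. This is only a stand-in sketch, not the SHDP or Hadoop API: the class ConfigCloneDemo and its helpers are hypothetical, and the file system URI and cache path are placeholder values. The key "mapred.cache.files" is the property under which Hadoop 1.x records cache files as a comma-separated list. The point illustrated is that an entry added to the derived configuration is not visible through the base one, so a job bound to the base configuration would not see it.

```java
import java.util.Properties;

public class ConfigCloneDemo {

    // Hadoop 1.x property that holds the comma-separated list of cache files.
    static final String CACHE_KEY = "mapred.cache.files";

    /** Returns an independent copy, standing in for a configuration derived from a parent. */
    static Properties copyOf(Properties source) {
        Properties copy = new Properties();
        for (String name : source.stringPropertyNames()) {
            copy.setProperty(name, source.getProperty(name));
        }
        return copy;
    }

    /** Appends a cache file URI to the comma-separated list stored under CACHE_KEY. */
    static void addCacheFile(Properties conf, String uri) {
        String existing = conf.getProperty(CACHE_KEY);
        conf.setProperty(CACHE_KEY, existing == null ? uri : existing + "," + uri);
    }

    public static void main(String[] args) {
        Properties hadoopConfiguration = new Properties();
        hadoopConfiguration.setProperty("fs.default.name", "hdfs://localhost:9000"); // placeholder

        // Stand-in for the derived simResultConfiguration from the post above.
        Properties simResultConfiguration = copyOf(hadoopConfiguration);
        addCacheFile(simResultConfiguration, "/tmp/example/length/part-r-00000"); // placeholder path

        System.out.println("base:    " + hadoopConfiguration.getProperty(CACHE_KEY));
        System.out.println("derived: " + simResultConfiguration.getProperty(CACHE_KEY));
    }
}
```

Whether the real setup behaves this way depends on which configuration object each job and the cache definition actually end up sharing, which is the open question in this thread.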
