Thread: Additional Reducers

  1. #1
    Join Date
    Sep 2012
    Posts
    5

    Additional Reducers

    Hi!

    I want to have multiple reducers in a Hadoop job to recreate the sorting problem described in the following blog post:
    http://www.philippeadjiman.com/blog/...-partitioning/

    Unfortunately, the number-reducers property of the Hadoop job does not work for me. I also failed to set the property via the JobFactoryBean.

    I have two VMs running as master/slave. Maybe there are some restrictions I don't understand that cause just one reducer to be started.

    For a longer description of the problem plus configuration snippets, see the post on Stack Overflow:
    http://stackoverflow.com/questions/1...le-hadoop-node
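
For readers without the linked post at hand, here is a minimal sketch of the sorting problem, assuming it is the usual one: each reducer sorts only its own partition, so concatenating the outputs of several reducers is not globally sorted. The partition formula mirrors Hadoop's default hash partitioner; the class name, the use of plain `String` keys (whose hash differs from Hadoop's `Text`), and the sample keys are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only, not Hadoop code.
class HashPartitionSketch {

    // Same arithmetic as the default hash partitioner: non-negative hash modulo reducer count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes each key to a reducer, sorts within each reducer (TreeSet, which also
    // drops duplicate keys), then concatenates the reducer outputs in reducer order.
    static List<String> concatenatedOutput(List<String> keys, int numReduceTasks) {
        List<TreeSet<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeSet<String>());
        }
        for (String key : keys) {
            reducers.get(partition(key, numReduceTasks)).add(key);
        }
        List<String> out = new ArrayList<>();
        for (TreeSet<String> reducer : reducers) {
            out.addAll(reducer);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        System.out.println(concatenatedOutput(keys, 1)); // [a, b, c, d]
        System.out.println(concatenatedOutput(keys, 2)); // [b, d, a, c]
    }
}
```

With one reducer the output is fully sorted; with two, each part is sorted but the concatenation is not, which is the effect a range-based (total order) partitioner is meant to avoid.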

  2. #2
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    You've already figured out how to set the number of reducers - you can go a step further and simplify the config using the namespace.
    As for your test, if you want to get the FactoryBean itself and not the object it produces, you should look it up by its bean name prefixed with "&".

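    As for your main question, the number of map tasks is driven by the number of input splits - the mapred.map.tasks parameter is really just a hint. The number of reducers, by contrast, is taken from the job configuration (mapred.reduce.tasks). See http://wiki.apache.org/hadoop/HowManyMapsAndReduces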
    Costin Leau
    SpringSource - http://www.SpringSource.com- Spring Training, Consulting, and Support - "From the Source"
    http://twitter.com/costinl
    Please use [ c o d e ] [ / c o d e ] tags

  3. #3
    Join Date
    Sep 2012
    Posts
    5

    Hi!

    Do you know a good way to retrace Hadoop's decisions concerning the number of reducers? Are there appropriate log files? (E.g. we have a certain number of mappers and a certain number of machines, therefore n reducers are used on machine a, m reducers on machine b, ...)

  4. #4
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    Sorry, no - googling around and asking on the Hadoop mailing list are probably the best options. There are multiple parameters for tweaking the Hadoop runtime (some of them even conflict). The fundamental topic here is tuning the parallelism of the job according to the M/R algorithm. The input itself needs to be properly splittable, but one also needs to take the storage layer into account (how big the block size is, whether an input split spans multiple DFS blocks, etc.).
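
To make the interplay between the requested number of maps, the minimum split size and the block size more concrete, here is a rough sketch modelled on how the old-API `FileInputFormat` sizes splits for a single splittable file, as far as I recall it. The class name, method names and sample sizes are assumptions for illustration; real jobs also depend on the input format, compression and the number of files.

```java
// Simplified single-file model of split sizing; not Hadoop code.
class SplitSizeSketch {

    // The last split is allowed to be up to 10% larger than the others.
    static final double SPLIT_SLOP = 1.1;

    // Split size is capped by the block size unless the minimum split size is larger.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // requestedMaps plays the role of mapred.map.tasks; minSize must be at least 1.
    static int countSplits(long fileSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final, possibly shorter or slightly longer, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;
        // 1 GB file, 4 maps requested: the block size wins, giving 16 splits.
        System.out.println(countSplits(1024 * mb, 4, 1, block));
        // 1 GB file, 32 maps requested: the hint lowers the split size, giving 32 splits.
        System.out.println(countSplits(1024 * mb, 32, 1, block));
        // 1 GB file, 4 maps requested, minimum split size 256 MB: 4 splits.
        System.out.println(countSplits(1024 * mb, 4, 256 * mb, block));
    }
}
```

In this model the requested map count can push the number of splits up, but it cannot push it below one split per block unless the minimum split size is raised, which is why the map setting behaves like a hint.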
    As an alternative, you can try playing around with an ad-hoc cluster (like Amazon EMR) that resembles the real-world scenario more closely than your own setup. That will give you more accurate results, and you'll see more clearly the impact of your tweaks and whether your setup works the way you want.

    Hope this helps
    Costin Leau

  5. #5
    Join Date
    Sep 2012
    Posts
    5

    OK, I was able to set the number of mappers and reducers for a different job by passing additional parameters to the hadoop call:

    hadoop jar /usr/lib/hadoop-0.20-cdh3u4-examples.jar wordcount -D mapred.reduce.tasks=8 -D mapred.map.tasks=4 /input /output

    How do I configure this in Spring Hadoop? I've tried to set the number of mappers and reducers for the job in the Hadoop context:
    Code:
    <configuration>
       <!-- The value after the colon is the default value used if no other value for hd.fs is provided -->
       fs.default.name=${hd.fs:hdfs://master:9000}
       mapred.reduce.tasks=8
       mapred.map.tasks=4
    </configuration>

    This has no effect.

    The web interface shows a Map Task Capacity of 2 and a Reduce Task Capacity of 2 on a cluster in pseudo-distributed mode, so the job should use more mappers and reducers.

    P.S.: Thank you for the help :-)

  6. #6
    Join Date
    Jan 2005
    Location
    Bucharest, Romania
    Posts
    5,403

    That's the correct way of setting the property (you can also set the property per-job):
    Code:
    <hdp:job ...>
       mapred.reduce.tasks=8
       mapred.map.tasks=4
    </hdp:job>
    Make sure the configuration is actually used by your job. If you want to run the jar instead, you can use the <hdp:jar/> namespace element.
    Costin Leau
