Launch separate TaskTracker instances for Map and Reduce slots #47

Open
@tarnfeld

Description

With the recent enhancements that free up resources when a TaskTracker becomes idle, Hadoop is a little less greedy about holding onto cluster resources it isn't actually using. However, because this depends on the whole TaskTracker being idle, we don't get the best chance of freeing resources when a TT has mixed slots (both map and reduce). For example, a single running reduce task keeps the whole TT, idle map slots included, from being released.

We should launch separate TTs for map and reduce slots. To do this effectively, we probably want to bunch up as many map or reduce slots onto each node as possible, as opposed to the current logic, which applies the map/reduce slot ratio to each incoming offer. Take the following example...


1 Slot = 1 CPU and 1GB RAM

Offers:

  • Slave(1): 10 CPUs, 10GB RAM
  • Slave(2): 10 CPUs, 10GB RAM
  • Slave(3): 10 CPUs, 10GB RAM

Pending tasks:

  • 1000 Map
  • 100 Reduce

Current result:

  • Slave(1) -> TaskTracker(9 Map, 1 Reduce)
  • Slave(2) -> TaskTracker(9 Map, 1 Reduce)
  • Slave(3) -> TaskTracker(9 Map, 1 Reduce)
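As a rough illustration of how the per-offer ratio produces the result above, here is a minimal sketch. The function names and the rounding rule are assumptions for illustration, not the framework's actual scheduler code.

```python
# Hypothetical sketch of the per-offer split; not the real scheduler code.
SLOT_CPUS = 1     # 1 slot = 1 CPU ...
SLOT_MEM_GB = 1   # ... and 1GB RAM


def slots_in_offer(cpus, mem_gb):
    """Number of whole slots that fit in a single offer."""
    return int(min(cpus // SLOT_CPUS, mem_gb // SLOT_MEM_GB))


def split_offer(cpus, mem_gb, pending_map, pending_reduce):
    """Apply the pending map/reduce ratio to one offer.

    Returns (map_slots, reduce_slots) for a single mixed TaskTracker.
    Assumption: the reduce share is rounded to the nearest whole slot.
    """
    slots = slots_in_offer(cpus, mem_gb)
    pending = pending_map + pending_reduce
    if slots == 0 or pending == 0:
        return 0, 0
    reduce_slots = round(slots * pending_reduce / pending)
    return slots - reduce_slots, reduce_slots


# Every 10 CPU / 10GB offer gets the same mixed split.
print(split_offer(10, 10, pending_map=1000, pending_reduce=100))  # (9, 1)
```

Because every offer is split the same way, every TaskTracker ends up with at least one reduce slot, so none of them can go fully idle while any reduce work is running.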

Ideal result:

  • Slave(1) -> TaskTracker(10 Map)
  • Slave(2) -> TaskTracker(10 Map)
  • Slave(3) -> TaskTracker(7 Map)
  • Slave(3) -> TaskTracker(3 Reduce)
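
One possible way to get the ideal result is to apply the map/reduce ratio once across all offers and then pack slots of the same type onto as few nodes as possible. The sketch below assumes offers have already been converted to whole slot counts; `plan_task_trackers` and its tuple layout are hypothetical names, not existing code.

```python
# Hypothetical sketch of cluster-wide packing; not the real scheduler code.
def plan_task_trackers(offers, pending_map, pending_reduce):
    """Plan single-type TaskTrackers across a batch of offers.

    offers: list of (slave_id, slots) tuples.
    Returns a list of (slave_id, kind, slots), where kind is "map" or "reduce".
    Assumption: the reduce share of the total is rounded to the nearest slot.
    """
    total = sum(slots for _, slots in offers)
    pending = pending_map + pending_reduce
    if total == 0 or pending == 0:
        return []

    # Apply the ratio once to the whole batch instead of to each offer.
    reduce_target = min(pending_reduce, round(total * pending_reduce / pending))
    map_target = min(pending_map, total - reduce_target)

    plan = []
    for slave_id, slots in offers:
        # Fill the node with map slots first, then use what is left for reduce.
        map_slots = min(slots, map_target)
        if map_slots:
            plan.append((slave_id, "map", map_slots))
            map_target -= map_slots
        reduce_slots = min(slots - map_slots, reduce_target)
        if reduce_slots:
            plan.append((slave_id, "reduce", reduce_slots))
            reduce_target -= reduce_slots
    return plan


offers = [(1, 10), (2, 10), (3, 10)]
for slave_id, kind, slots in plan_task_trackers(offers, 1000, 100):
    print(f"Slave({slave_id}) -> TaskTracker({slots} {kind.capitalize()})")
```

For the offers above this prints the four TaskTrackers listed in the ideal result. Only Slave(3) hosts both types, and it does so as two separate TTs, so each one can be released independently once it goes idle.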
