Description
With the recent enhancements that free up some resources when a TaskTracker becomes idle, Hadoop is a little less greedy about holding onto cluster resources it isn't actually using. However, because this is based on the whole TaskTracker being idle, a TT with mixed slots (both map and reduce) can keep holding all of its resources as long as either kind of slot is still busy, so we don't get the best chance of freeing them.
We should launch separate TTs for map and reduce slots. To do this effectively, we probably want to pack as many map or reduce slots onto each node as possible, as opposed to the current logic, which applies the map/reduce slot ratio to each incoming offer. Take the following example:
1 Slot = 1 CPU and 1GB RAM
Offers:
- Slave(1): 10 CPUs, 10GB RAM
- Slave(2): 10 CPUs, 10GB RAM
- Slave(3): 10 CPUs, 10GB RAM
Pending tasks:
- 1000 Map
- 100 Reduce
Current result:
- Slave(1) -> TaskTracker(9 Map, 1 Reduce)
- Slave(2) -> TaskTracker(9 Map, 1 Reduce)
- Slave(3) -> TaskTracker(9 Map, 1 Reduce)
Ideal result:
- Slave(1) -> TaskTracker(10 Map)
- Slave(2) -> TaskTracker(10 Map)
- Slave(3) -> TaskTracker(7 Map)
- Slave(3) -> TaskTracker(3 Reduce)
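
To make the difference between the two strategies concrete, here is a rough sketch in Python. It is not the scheduler's actual code: `Offer`, `slots_in`, `split_slots`, `per_offer` and `packed` are hypothetical names invented for illustration, and the rounding rule (round the reduce share to the nearest slot) is an assumption chosen because it reproduces the numbers in the example above.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Hypothetical stand-in for a resource offer from one slave.
    slave: str
    cpus: int
    mem_gb: int


def slots_in(offer, cpus_per_slot=1, mem_gb_per_slot=1):
    # A slot needs both CPU and RAM, so the scarcer resource sets the limit.
    return min(offer.cpus // cpus_per_slot, offer.mem_gb // mem_gb_per_slot)


def split_slots(total, pending_map, pending_reduce):
    # Divide `total` slots in proportion to the pending tasks, rounding the
    # reduce share to the nearest slot (an assumed rounding rule), and never
    # handing out more slots of a kind than there are pending tasks.
    pending = pending_map + pending_reduce
    if pending == 0:
        return 0, 0
    reduce_share = (total * pending_reduce + pending // 2) // pending
    reduce_slots = min(pending_reduce, reduce_share)
    map_slots = min(pending_map, total - reduce_slots)
    return map_slots, reduce_slots


def per_offer(offers, pending_map, pending_reduce):
    # Current behaviour as described above: apply the ratio to every offer,
    # producing one mixed TaskTracker per slave.
    result = []
    for offer in offers:
        m, r = split_slots(slots_in(offer), pending_map, pending_reduce)
        result.append((offer.slave, m, r))
    return result


def packed(offers, pending_map, pending_reduce):
    # Proposed behaviour: apply the ratio once to the total slot count, then
    # fill slaves with map slots first and reduce slots afterwards, launching
    # a separate TaskTracker per slot type.
    capacities = [(o.slave, slots_in(o)) for o in offers]
    total = sum(free for _, free in capacities)
    map_left, reduce_left = split_slots(total, pending_map, pending_reduce)
    trackers = []
    for slave, free in capacities:
        take = min(free, map_left)
        if take:
            trackers.append((slave, "map", take))
            map_left -= take
            free -= take
        take = min(free, reduce_left)
        if take:
            trackers.append((slave, "reduce", take))
            reduce_left -= take
    return trackers


offers = [Offer("Slave(%d)" % i, cpus=10, mem_gb=10) for i in (1, 2, 3)]

print(per_offer(offers, 1000, 100))
# [('Slave(1)', 9, 1), ('Slave(2)', 9, 1), ('Slave(3)', 9, 1)]

print(packed(offers, 1000, 100))
# [('Slave(1)', 'map', 10), ('Slave(2)', 'map', 10),
#  ('Slave(3)', 'map', 7), ('Slave(3)', 'reduce', 3)]
```

With the packed layout only Slave(3) carries reduce slots, so the map-only TaskTrackers on Slave(1) and Slave(2) can go idle and be released as soon as the map phase drains, instead of being pinned by a single reduce slot each.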