Commit 9b2a9f8

Address load-aware active-slot review comments
1 parent a717a1c commit 9b2a9f8

3 files changed

Lines changed: 6 additions & 2 deletions

common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala

Lines changed: 0 additions & 1 deletion
@@ -3158,7 +3158,6 @@ object CelebornConf extends Logging {

   val MASTER_SLOT_ASSIGN_LOADAWARE_ACTIVE_SLOTS_WEIGHT: ConfigEntry[Double] =
     buildConf("celeborn.master.slot.assign.loadAware.activeSlotsWeight")
-      .withAlternative("celeborn.slots.assign.loadAware.activeSlotsWeight")
       .categories("master")
       .doc(
         "Weight of active slots when calculating ordering in load-aware assignment strategy")

docs/configuration/master.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ license: |
 | celeborn.master.slot.assign.extraSlots | 2 | false | Extra slots number when master assign slots. Provided enough workers are available. | 0.3.0 | celeborn.slots.assign.extraSlots |
 | celeborn.master.slot.assign.interruptionAware | false | false | If this is set to true, Celeborn master will prioritize partition placement on workers that are not in scope for maintenance soon. | 0.7.0 | |
 | celeborn.master.slot.assign.interruptionAware.threshold | 50 | false | This controls what percentage of hosts would be selected for slot selection in the first iteration of creating partitions. Default is 50%. | 0.7.0 | |
-| celeborn.master.slot.assign.loadAware.activeSlotsWeight | 0.0 | false | Weight of active slots when calculating ordering in load-aware assignment strategy | 0.7.0 | celeborn.slots.assign.loadAware.activeSlotsWeight |
+| celeborn.master.slot.assign.loadAware.activeSlotsWeight | 0.0 | false | Weight of active slots when calculating ordering in load-aware assignment strategy | 0.7.0 | |
 | celeborn.master.slot.assign.loadAware.diskGroupGradient | 0.1 | false | This value means how many more workload will be placed into a faster disk group than a slower group. | 0.3.0 | celeborn.slots.assign.loadAware.diskGroupGradient |
 | celeborn.master.slot.assign.loadAware.fetchTimeWeight | 1.0 | false | Weight of average fetch time when calculating ordering in load-aware assignment strategy | 0.3.0 | celeborn.slots.assign.loadAware.fetchTimeWeight |
 | celeborn.master.slot.assign.loadAware.flushTimeWeight | 0.0 | false | Weight of average flush time when calculating ordering in load-aware assignment strategy | 0.3.0 | celeborn.slots.assign.loadAware.flushTimeWeight |

docs/developers/slotsallocation.md

Lines changed: 5 additions & 0 deletions
@@ -46,6 +46,11 @@ Load-aware slots allocation will take following elements into consideration.

 Slots allocator will find out all worker involved in this allocation and sort their disks by
 `disk's average flushtime * flush time weight + disk's average fetch time * fetch time weight + disk's active slots * active slots weight`.
+The average flush/fetch times are measured in nanoseconds, while active slots is a slot count, so
+`activeSlotsWeight` is effectively a nanoseconds-per-slot conversion factor. For example, if the
+average fetch time is around `100 ms` (`10^8` ns) and a disk has about `1000` active slots,
+`activeSlotsWeight=10^5` makes the active-slot term contribute about `10^8`, comparable to the
+fetch-time term.
 After getting the sorted disks list, Celeborn will split the disks into
 `celeborn.master.slot.assign.loadAware.numDiskGroups` groups. The slots number to be placed into a disk group
 is controlled by the `celeborn.master.slot.assign.loadAware.diskGroupGradient` which means that a group's
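
Illustration (not part of the commit): the ordering formula documented in `slotsallocation.md` can be sketched in Scala as below, to show how the worked example's numbers interact. `DiskLoad`, `LoadAwareScore`, and `Demo` are hypothetical names invented for this sketch and are not Celeborn classes. The sketch also assumes that a lower score sorts first, which the documentation text does not state explicitly. The weights used are the documented defaults (`flushTimeWeight=0.0`, `fetchTimeWeight=1.0`, `activeSlotsWeight=0.0`) and the `10^5` value from the example.

```scala
// Hypothetical stand-in for a disk's load metrics; not Celeborn's real disk model.
final case class DiskLoad(
    name: String,
    avgFlushTimeNs: Double, // average flush time, in nanoseconds
    avgFetchTimeNs: Double, // average fetch time, in nanoseconds
    activeSlots: Long)      // number of currently active slots

object LoadAwareScore {
  // Weighted sum from the documented formula. activeSlotsWeight converts
  // a slot count into nanoseconds so the three terms are comparable.
  def score(d: DiskLoad, flushW: Double, fetchW: Double, slotsW: Double): Double =
    d.avgFlushTimeNs * flushW + d.avgFetchTimeNs * fetchW + d.activeSlots * slotsW

  // Assumption: disks with a lower score are ordered first.
  def sortDisks(
      disks: Seq[DiskLoad],
      flushW: Double,
      fetchW: Double,
      slotsW: Double): Seq[DiskLoad] =
    disks.sortWith((x, y) =>
      score(x, flushW, fetchW, slotsW) < score(y, flushW, fetchW, slotsW))
}

object Demo {
  // Disk A: 100 ms average fetch time but 1000 active slots.
  val a: DiskLoad = DiskLoad("A", 0.0, 1e8, 1000L)
  // Disk B: slower fetch (150 ms) but currently idle.
  val b: DiskLoad = DiskLoad("B", 0.0, 1.5e8, 0L)

  // Default weights: active slots are ignored, so A sorts ahead of B.
  val defaultOrder: Seq[DiskLoad] =
    LoadAwareScore.sortDisks(Seq(a, b), 0.0, 1.0, 0.0)

  // activeSlotsWeight = 10^5: A scores 10^8 + 10^8, so B sorts ahead of A.
  val weightedOrder: Seq[DiskLoad] =
    LoadAwareScore.sortDisks(Seq(a, b), 0.0, 1.0, 1e5)
}
```

With the default weight of `0.0` the active-slot term vanishes and only fetch time decides the order. Setting the weight to `10^5` makes disk A's 1000 active slots count as much as its 100 ms fetch time, which is enough to push the idle disk B ahead of it.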
