Skip to content

Node-size awareness (for token-unaware load balancing) #109

Description

@dkropachev

Jira task: https://scylladb.atlassian.net/browse/DRIVER-553
Jira epic: https://scylladb.atlassian.net/browse/DRIVER-42

Copied from Jira epic DRIVER-42:

This issue is about the client-side load balancer needing to send fewer requests to smaller nodes than to bigger nodes.

Although in the past we recommended that a Scylla cluster should contain many identically-sized nodes, we've recently reversed this recommendation:

  1. With tablets, it's easy and natural to support different-sized nodes. A half-sized node to get half the tablets - and therefore half the storage and half the work - of a double-sized node.
  2. Different-sized nodes allow much finer granularity in cluster growth: If we have a 3-node cluster, one node on each rack, and want to grow it by adding same-sized nodes, the minimum growth we can achieve is doubling the cluster to 6 nodes. If we allow adding a smaller node to each rack, we can grow the cluster at smaller increments.

If we implement token-aware load balancing ([#11|https://github.com/scylladb/alternator-load-balancing/issues/11]) , we will support the different-sized-nodes case authomatically: Smaller nodes will have fewer tokens (i.e., fewer tablets), so will get fewer requests.

However, we still haven't implemented [#11|https://github.com/scylladb/alternator-load-balancing/issues/11], so unless we do it soon we need a different solution to support different-sized nodes.

What we can do is for the load balancing library to figure out the size of each node and instead of picking a node uniformly from the list of live nodes, give larger nodes a higher probability of being picked. We could expand the /localnodes response (perhaps if given a new optional parameter) to return the size of each node, or alternatively the load balancer could figure out the size of each node by other means (reading tablets system table? REST API?). Perhaps can use the number of shards on a node as a good estimate of its strength. The number of tablets on each node is probably an even better criterion, if we assume that Scylla already tries its best to distribute tablets according to the strength of of the nodes - because our state-of-the-art tablet-aware CQL drivers will balance the load according to these tablets.

Migrated from GitHub issue: scylladb/alternator-load-balancing#128

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions