implement clustering for horizontal scalability #1532

@slingamn

Description

Motivation

The original model of IRC as a distributed system was an open federation of symmetrical, equally privileged peer servers. This model failed almost immediately with the 1990 EFnet split. Modern IRC networks are under common management: they require all the server operators to agree on administration and policy issues. Similarly, the model of IRC as an AP system (available and partition-tolerant) failed with the introduction of services frameworks. In modern IRC networks, the services framework is a single point of failure: if it goes down, the network remains available but dangerously degraded (in particular, its security properties have been silently weakened).

The current Oragono architecture is a single process. A single Oragono instance can scale comfortably to 10,000 clients and 2,000 clients per channel; you can push those limits with bigger hardware, but ultimately the single instance is a serious bottleneck. The largest IRC network of all time was 2004-era QuakeNet, with 240,000 concurrent clients. Biella Coleman reports that the largest IRC channel of all time had 7,000 participants (#operationpayback on AnonOps in late 2010). This gives us our initial scalability targets: 250,000 concurrent clients, with 10,000 clients per channel.

Oragono's single-process architecture offers compelling advantages in flexibility and pace of development; it's significantly easier to prototype new features without having to worry about distributed systems issues. The architecture that best balances all of these considerations is a hub-and-spoke design: it acknowledges the need for centralized management, acknowledges the indispensability of user accounts, provides the horizontal scalability we need, and minimizes implementation complexity.

Design

The oragono executable will accept two modes of operation: "root" and "leaf". The root mode of operation will be much like the present single-process mode. In leaf mode, the process will not have direct access to a config file or a buntdb database: it will take the IP address of the root node (typically a virtual IP of some kind, as in Kubernetes), connect to it, and receive a serialized copy of the root node's authoritative configuration over the network.
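To make this concrete, here is a minimal sketch of the leaf-side startup handshake. Everything in it is an assumption for illustration: the wire format (JSON over plain TCP) and the names ClusterConfig and fetchRootConfig are hypothetical, not part of the codebase.

```go
package cluster

import (
	"encoding/json"
	"net"
)

// ClusterConfig is a hypothetical serialized form of the root node's
// authoritative configuration; the real payload would mirror the
// root's parsed config file.
type ClusterConfig struct {
	ServerName string `json:"server-name"`
	Network    string `json:"network"`
	// ... remaining config sections ...
}

// fetchRootConfig dials the root node at its (virtual) IP and blocks
// until the serialized configuration has been received; a leaf cannot
// accept clients before this completes.
func fetchRootConfig(rootAddr string) (*ClusterConfig, error) {
	conn, err := net.Dial("tcp", rootAddr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	var config ClusterConfig
	if err := json.NewDecoder(conn).Decode(&config); err != nil {
		return nil, err
	}
	return &config, nil
}
```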

Clients will then connect to the leaf nodes. Most commands received by the leaf nodes will be serialized and passed through to the root node, which will process them and return an answer. A few commands, like capability negotiation, will be handled locally. (This corresponds loosely to the Session vs. Client distinction in the current codebase: the leaf node will own the Session and the root node will own the Client. Anything affecting global server state is processed by the root; anything affecting only the client's connection is processed by the leaf.)
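A rough sketch of what leaf-side dispatch might look like under this split; the Message, Session, and LeafNode types and the handleLocally helper are hypothetical stand-ins for the real Session/Client machinery.

```go
package cluster

import "fmt"

// Message is a minimal parsed IRC message (a stand-in for the real
// message type).
type Message struct {
	Command string
	Params  []string
}

// Session represents one client connection owned by this leaf.
type Session struct {
	ID int
}

// LeafNode holds the leaf's link back to the root.
type LeafNode struct {
	toRoot chan Message // serialized commands bound for the root
}

// localCommands affect only the client's connection (the Session), so
// the leaf answers them without a round trip to the root.
var localCommands = map[string]bool{
	"CAP":  true,
	"PING": true,
	"PONG": true,
}

func (leaf *LeafNode) dispatch(session *Session, msg Message) {
	if localCommands[msg.Command] {
		leaf.handleLocally(session, msg) // session-scoped: handled here
	} else {
		leaf.toRoot <- msg // touches global state: pass through to root
	}
}

func (leaf *LeafNode) handleLocally(session *Session, msg Message) {
	// e.g., capability negotiation; placeholder for illustration
	fmt.Printf("session %d: handled %s locally\n", session.ID, msg.Command)
}
```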

The root node will unconditionally forward copies of all IRC messages (PRIVMSG, NOTICE, KICK, etc.) to each leaf node, which will then determine which sessions are eligible to receive them. This provides the crucial fan-out that generates the horizontal scalability: the traffic (and TLS) burden is divided evenly among the n leaf nodes.
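The corresponding fan-out path might look like the following sketch (again, all names are hypothetical): the root pays one write per leaf, while each leaf pays the per-session writes and the TLS overhead for its own clients.

```go
package cluster

import "net"

// RootNode tracks one link per attached leaf.
type RootNode struct {
	leafLinks []net.Conn
}

// broadcast forwards a serialized IRC message unconditionally to every
// leaf: the root's cost is O(leaves), independent of the client count.
func (root *RootNode) broadcast(line []byte) {
	for _, link := range root.leafLinks {
		link.Write(line)
	}
}

// FanoutLeaf owns the per-client sessions and their TLS connections.
type FanoutLeaf struct {
	// sessions indexed by channel/target name, for illustration only
	members map[string][]net.Conn
}

// deliver runs on each leaf: it decides which local sessions are
// eligible (e.g., members of the target channel) and writes to them.
func (leaf *FanoutLeaf) deliver(target string, line []byte) {
	for _, session := range leaf.members[target] {
		session.Write(line)
	}
}
```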

Unresolved questions

The intended deployment strategy for this system is Kubernetes. However, I don't currently have a complete picture of which Kubernetes primitives will be used. The key assumption of the design is that the network can be virtualized such that the leaf nodes only need to know a single, consistent IP address for the root node.

I'm not sure how to do history. The simplest architecture is for HISTORY and CHATHISTORY requests to be forwarded to the root, but this seems likely to become a scalability bottleneck. In a deployment that uses MySQL, the leaf nodes can connect directly to MySQL; this reduces the problem to deploying a highly available and scalable MySQL, which is nontrivial. The logical next step would seemingly be to abstract the history store behind an interface, then provide a Cassandra implementation. The problem with this is that we seem to be going in the direction of expecting read-after-write consistency from the history store (see this comment on #393 in particular).
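For reference, the abstraction might look something like the sketch below; the Store interface and its method set are speculative, and a Cassandra-backed implementation would have to confront the read-after-write question directly.

```go
package history

import "time"

// Item is one logged message (simplified from the existing history
// package's Item type).
type Item struct {
	Nick    string
	Target  string
	Message string
	Time    time.Time
}

// Store abstracts the history backend so the in-memory buffer, MySQL,
// and a hypothetical Cassandra implementation are interchangeable.
// Cassandra's tunable (typically eventual) consistency is in tension
// with the read-after-write expectations discussed in #393.
type Store interface {
	// Add records a message; callers may expect it to be visible to
	// an immediately following query (read-after-write).
	Add(item Item) error
	// Between returns up to limit items for a target within
	// [start, end), newest first.
	Between(target string, start, end time.Time, limit int) ([]Item, error)
}
```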

See also

This supersedes #343, #1000, and #1265; it's related to #747.
