This project is a backend-oriented demo with websocket capabilities that relies heavily on the Actor Model Framework implementation of Microsoft Orleans. I gave myself the luxury of experimenting with some technologies and practices that might be useful in a real-world distributed system.
Click to view it on Loom with sound.
The domain contains two aggregate roots: Teams and Tournaments.
A Team contains a name, a list of players (strings) and a list of IDs of the tournaments in which the team is registered.
A Tournament contains a name, a list of IDs of the teams participating in it, and a Fixture that holds the information for each Phase: quarterfinals, semifinals and finals.
For a Tournament to start, its list of teams must contain exactly 8 teams. When it starts, the participating teams are shuffled randomly and the Fixture's quarterfinals Phase is populated.
A Phase is a container for the list of Matches that belong to that bracket. Each Match contains the LocalTeamId, the AwayTeamId and a MatchResult. To avoid null values, the MatchResult is created with default values of 0 goals for each team and the property Played set to false.
When the user issues the SetMatchResult command, the result is assigned to the corresponding Match in the corresponding Phase. When all the Matches in a Phase have been played, the next Phase is generated. While the quarterfinals are generated by a random shuffle of the participating teams, the semifinals and the finals are not generated randomly but based on the results of the previous bracket.
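As a rough sketch of how these pieces could look (the type shapes below are illustrative, not the exact records in this repository):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of the domain shapes described above.
public record MatchResult(int LocalGoals, int AwayGoals, bool Played)
{
    // Default result: 0-0 and not played yet, so no null values are needed.
    public static MatchResult NotPlayed() => new(0, 0, false);
}

public record Match(Guid LocalTeamId, Guid AwayTeamId, MatchResult Result)
{
    public Match SetResult(int localGoals, int awayGoals) =>
        this with { Result = new MatchResult(localGoals, awayGoals, true) };
}

public record Phase(IReadOnlyList<Match> Matches)
{
    // A Phase is finished once every Match in the bracket has been played.
    public bool Finished => Matches.All(m => m.Result.Played);
}
```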
In an Actor Model Framework, each Actor (an instance of an aggregate root) contains a state that mutates when applying events. Orleans is no different, and each Grain (Actor) behaves the same way.
- Each `TournamentGrain` contains a `TournamentState`.
- When a `TournamentGrain` receives a `Command`, it first checks whether the command is valid business-wise.
- If the `Command` is valid, it publishes an `Event` informing what happened and modifies the `State` accordingly.
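A minimal sketch of that flow, assuming Orleans-style grains (the command, event and state types here are made up for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;

// Illustrative command flow: validate against the state, publish an event, apply it.
public record MatchResultSet(Guid TournamentId, Guid MatchId, int LocalGoals, int AwayGoals);

public class TournamentState
{
    private readonly HashSet<Guid> _matchIds = new();
    public bool HasMatch(Guid matchId) => _matchIds.Contains(matchId);
    public void Apply(MatchResultSet evt) { /* update the fixture accordingly */ }
}

public interface ITournamentGrain : IGrainWithGuidKey
{
    Task SetMatchResult(Guid matchId, int localGoals, int awayGoals);
}

public class TournamentGrain : Grain, ITournamentGrain
{
    private readonly TournamentState _state = new();

    public async Task SetMatchResult(Guid matchId, int localGoals, int awayGoals)
    {
        // 1. Business validation against the current state.
        if (!_state.HasMatch(matchId))
            throw new InvalidOperationException("Match does not belong to this tournament.");

        // 2. Describe what happened as an event.
        var evt = new MatchResultSet(this.GetPrimaryKey(), matchId, localGoals, awayGoals);

        // 3. Persist and publish the event (event store / stream wiring omitted here).
        await PublishAsync(evt);

        // 4. Mutate the in-memory state by applying the event.
        _state.Apply(evt);
    }

    private static Task PublishAsync(MatchResultSet evt) => Task.CompletedTask; // placeholder
}
```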
The TournamentState is not coupled to the Orleans framework. It is just a plain class that exposes methods that act upon certain events. This allows for replayability of events and event sourcing as a whole. The state itself is mutable, but that does not mean that the properties exposed by it are mutable as well. For example, the TournamentState contains the Fixture.
The Fixture is a value object implemented with C# records. This means that the methods exposed by it do not cause side effects and instead return a new Fixture with the changes reflected. This enables easier testability and predictability, as all the methods are deterministic; you can execute a method a hundred times and the result is the same.
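For example, a record-based value object could look roughly like this (illustrative shape, not the exact Fixture in the repo):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative value object: every "mutation" returns a new Fixture instance.
public record Fixture(IReadOnlyList<Guid> QuarterFinalTeamIds)
{
    public Fixture WithTeam(Guid teamId) =>
        this with { QuarterFinalTeamIds = QuarterFinalTeamIds.Append(teamId).ToList() };
}

public static class FixtureUsage
{
    public static void Example()
    {
        var empty = new Fixture(Array.Empty<Guid>());
        var withOneTeam = empty.WithTeam(Guid.NewGuid());
        // `empty` is untouched; `withOneTeam` is a brand new record.
    }
}
```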
One of the advantages of using an Actor Model Framework is the innate segregation of commands and queries. Commands are executed against a certain Grain and cause side effects, such as publishing an Event. Queries, however, do not go against a Grain; instead they go against a read database that gets populated via projections.
This is, in fact, eventual consistency. There is a delay of a few milliseconds before a change in the source of truth (the Grain) is reflected in the projections one can query. This should not be a problem, but one needs to make sure to enable retry strategies in case the database is down while an event is being consumed from the stream, etc.
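A projection handler for the read side could look roughly like this (the event and repository types are made up for the example):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative projection: consumes events from the stream and updates the read model.
public record MatchResultSet(Guid TournamentId, Guid MatchId, int LocalGoals, int AwayGoals);

// Stand-in for whatever talks to the read database (Postgres in this project).
public interface ITournamentReadModelRepository
{
    Task UpsertMatchResult(Guid tournamentId, Guid matchId, int localGoals, int awayGoals);
}

public class TournamentProjection
{
    private readonly ITournamentReadModelRepository _repository;

    public TournamentProjection(ITournamentReadModelRepository repository) =>
        _repository = repository;

    public Task Handle(MatchResultSet evt) =>
        // If the read database is down, this call should be retried (or the event
        // re-delivered) so the projection eventually catches up.
        _repository.UpsertMatchResult(evt.TournamentId, evt.MatchId, evt.LocalGoals, evt.AwayGoals);
}
```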
In the scenario of a Grain being killed because of inactivity, or the Silo restarting and losing the in-memory state of the Grain, each Grain can be recovered by applying the events in order:
- The `TournamentGrain` is not in memory.
- Initialize the `TournamentGrain` by replaying the Events stored in the write database.
- Execute the `SetMatchResult` command, etc.
As you can see, the Grain will never use the read state as the source of truth; instead it relies on the event-sourcing mechanism, applying all the events again one after another until the state is up to date.
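A sketch of that recovery, assuming an Orleans 7-style grain and a hypothetical event store abstraction:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Orleans;

// Hypothetical abstraction over the write database.
public interface IEventStore
{
    Task<IReadOnlyList<object>> LoadEvents(Guid aggregateId);
}

public class TournamentState
{
    public void Apply(object evt) { /* pattern-match on the event type and mutate */ }
}

// Illustrative recovery: on activation, rebuild the state by replaying stored events.
public class TournamentGrain : Grain
{
    private readonly IEventStore _eventStore;
    private readonly TournamentState _state = new();

    public TournamentGrain(IEventStore eventStore) => _eventStore = eventStore;

    public override async Task OnActivateAsync(CancellationToken cancellationToken)
    {
        // Replay every stored event in order; the state ends up exactly where it was.
        foreach (var evt in await _eventStore.LoadEvents(this.GetPrimaryKey()))
            _state.Apply(evt);

        await base.OnActivateAsync(cancellationToken);
    }
}
```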
A Saga is a pattern for a distributed transaction that impacts more than one aggregate. In this case, let's look at the scenario where a Team wants to join a Tournament.
- `TeamGrain` receives the command `CreateTeam` and the `TeamState` is initialized.
- `TournamentGrain` receives the command `AddTeam` for the team created above, and validations kick in:
  - Does the team exist? (Note: here you are not supposed to query your read database, as it is not the source of truth, but actually send a message to the `TeamGrain`.)
  - Does the tournament contain fewer than 8 teams?
  - Has the tournament already started?
- If the validations pass, it publishes the `TeamAdded` event.
So far, the TournamentGrain is aware of the Team joining, but the TeamGrain is not aware of its participation in the Tournament. Enter the TeamAddedSaga Grain.
- `TeamAddedSaga` subscribes implicitly to an Orleans Stream, looking for `TeamAdded` events.
- When it receives an event, it gets a reference to the `TeamGrain` with the corresponding ID.
- It sends the `JoinTournament` command to the `TeamGrain`. There are no extra validations needed, as everything was already validated before.
- `TeamGrain` publishes the `TeamJoinedTournament` event.
In this case a rollback strategy is not implemented, but such capabilities can definitely be handled with an "intermediate" Grain such as the Saga.
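Roughly, such a saga grain could be wired like this (Orleans 7-style implicit stream subscription; the provider name, stream namespace and types are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;
using Orleans.Streams;

public record TeamAdded(Guid TournamentId, Guid TeamId);

public interface ITeamGrain : IGrainWithGuidKey
{
    Task JoinTournament(Guid tournamentId);
}

public interface ITeamAddedSaga : IGrainWithGuidKey { }

// Subscribes implicitly to streams in the "TeamAdded" namespace.
[ImplicitStreamSubscription("TeamAdded")]
public class TeamAddedSaga : Grain, ITeamAddedSaga
{
    public override async Task OnActivateAsync(CancellationToken cancellationToken)
    {
        var provider = this.GetStreamProvider("StreamProvider"); // provider name is an assumption
        var stream = provider.GetStream<TeamAdded>(StreamId.Create("TeamAdded", this.GetPrimaryKey()));

        await stream.SubscribeAsync(async (evt, _) =>
        {
            // Forward the command to the other aggregate; no extra validation is needed here.
            var team = GrainFactory.GetGrain<ITeamGrain>(evt.TeamId);
            await team.JoinTournament(evt.TournamentId);
        });

        await base.OnActivateAsync(cancellationToken);
    }
}
```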
Given the async nature of Orleans, the commands executed by a user should not return how the resource looks after the command succeeded, nor even whether the command succeeded at all.
Instead, the response contains a TraceId, a GUID representing the intent of the user. The user will receive a message via websockets indicating what happened for that TraceId, and the frontend can reflect the changes accordingly.
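On the API side this can be as small as the sketch below: generate a TraceId, hand the command to the grain, and answer immediately (the handler and contract shapes are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

// The only thing the caller gets back synchronously is the TraceId of the intent.
public record CommandAccepted(Guid TraceId);

public interface ITournamentGrain : IGrainWithGuidKey
{
    Task SetMatchResult(Guid traceId, Guid matchId, int localGoals, int awayGoals);
}

public class SetMatchResultHandler
{
    private readonly IClusterClient _client;

    public SetMatchResultHandler(IClusterClient client) => _client = client;

    public async Task<CommandAccepted> Handle(Guid tournamentId, Guid matchId, int localGoals, int awayGoals)
    {
        var traceId = Guid.NewGuid();
        var grain = _client.GetGrain<ITournamentGrain>(tournamentId);

        // The actual outcome arrives later over the websocket, correlated by this TraceId.
        await grain.SetMatchResult(traceId, matchId, localGoals, awayGoals);
        return new CommandAccepted(traceId);
    }
}
```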
For the flow above to work, the user must establish a websocket connection with the API. The user will only get the results for the commands he/she invoked, not for everything happening in the system.
As not all users should receive the results of all the commands, there is a need to distinguish users. For this purpose a super simple authentication and authorization mechanism is implemented. If you need to implement auth in a real-world app, please do not reinvent the wheel; check existing solutions such as AD or Okta.
As mentioned before, each executed command results in a TraceId, but internally there is also an InvokerUserId propagated, which can also serve as an audit log for each event.
The user will only get websocket messages for those events whose InvokerUserId matches the one in the JWT token.
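The filtering itself boils down to a check like this (the envelope and notifier abstractions are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative envelope carried with every published event.
public record EventEnvelope(Guid TraceId, Guid InvokerUserId, object Payload);

// Hypothetical abstraction over the websocket push mechanism.
public interface IWebsocketNotifier
{
    Task SendToUser(Guid userId, object message);
}

public class UserNotificationHandler
{
    private readonly IWebsocketNotifier _notifier;

    public UserNotificationHandler(IWebsocketNotifier notifier) => _notifier = notifier;

    public Task Handle(EventEnvelope envelope) =>
        // Only the connection whose user id (taken from the JWT at connection time)
        // matches the InvokerUserId receives the message for this TraceId.
        _notifier.SendToUser(envelope.InvokerUserId, new { envelope.TraceId, envelope.Payload });
}
```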
Fallback mechanism
It makes sense to have a fallback mechanism in case a websocket event does not arrive due to network issues. This could be a key-value database where a user can look up a TraceId to find the response to an executed command. This is not currently implemented but will be added later.
The solution consists of different projects that need to be executed at the same time:
- API.Identity: Creates users and generates tokens.
- API: Entrypoint for the user to invoke commands, connects to the Orleans cluster as a client.
- Silo: Orleans cluster, handles everything related to Orleans such as placement, streams, etc.
- Silo.Dashboard: Orleans client that displays metrics and information.
It also requires an instance of Postgres to be running and accepting connections.
Taking into consideration that the API could be split into smaller pieces and that the number of Silos is scalable, I decided to create Kubernetes manifests to host this application. As it is not common practice to have your own Kubernetes cluster locally, I also decided to use Kind, which uses the Docker engine to host a single-node Kubernetes cluster locally.
I am not going to go into detail on how this works; just know that Docker and Kind are required to run this solution. Of course, one can choose to run everything locally instead; bear in mind that in that scenario you need to be able to connect to a Postgres instance, as mentioned above.
All the Kubernetes configuration files can be found in the kubernetes folder. NOTE: There are also "secrets" in the mentioned folder; for a real-world application please do not store your secrets in plain text in your repository.
As mentioned before, Docker must be installed, as well as the kind command line tool and kubectl. It is also suggested to have make support, so you don't have to go through the commands manually one by one.
Initial bootstrap:
make run
Trigger rebuild of images and recreation of all the pods:
make restart
If you see something like this, everything should be up and running:
Use `kubectl` instead of `kb`; the latter is just an alias that I am used to.
Now you can access the following endpoints:
You can use the Postman collection below to try out the functionality shown in the demo video. This collection assigns values from previous responses, so you don't have to modify the requests manually; just execute them in order.
As this client is not containerized, the dotnet SDK is required. After creating a user and getting the token, you can execute the following in the utils/Websockets.Client directory:
dotnet run
Paste your user token and, when choosing an environment, type K and press the Enter key. A message saying "connected" should appear. Now you should be able to get the results of all the requests fired through the API using that user token.
Tests are not yet implemented, but you can see a simple example in the Domain.Tests project. I would like the unit tests to cover just the domain functionality (see the sketch after this list):
- Given a state, apply an event, assert the modified state.
- Given a value object, invoke a method, assert the returned record.
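A test along those lines could look like this (xUnit-style, reusing the illustrative Fixture record sketched earlier):

```csharp
using System;
using Xunit;

public class FixtureTests
{
    [Fact]
    public void WithTeam_returns_a_new_fixture_and_leaves_the_original_untouched()
    {
        // Given a value object...
        var original = new Fixture(Array.Empty<Guid>());

        // ...when a method is invoked...
        var updated = original.WithTeam(Guid.NewGuid());

        // ...then a new record is returned and the original is unchanged.
        Assert.Empty(original.QuarterFinalTeamIds);
        Assert.Single(updated.QuarterFinalTeamIds);
    }
}
```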


