39 changes: 20 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ A Sentinela Monitor is configured through 3 main parts, along some basic setting

These implementations are enough for Sentinela to autonomously execute monitoring logic and automatically manage the issues.

![sentinela example](./docs/images/example.gif)
![sentinela example](/docs/images/example.gif)

## Example scenario: Pending orders with completed shipments
Consider an online store where an order is expected to transition to `completed` as soon as its shipment is marked `completed`. Occasionally, inconsistencies arise: the shipment finishes but the order status remains stuck as `awaiting_delivery` or another intermediate state.
Expand Down Expand Up @@ -93,28 +93,29 @@ Sentinela provides a web dashboard, by default at port `8000`, with 2 sections:
2. a monitor editor, where you can create and edit monitors directly from the browser

**Overview**
![dashboard overview](./docs/images/dashboard_overview.png)
![dashboard overview](/docs/images/dashboard_overview.png)

**Editor**
![dashboard monitor editor](./docs/images/dashboard_editor.png)
![dashboard monitor editor](/docs/images/dashboard_editor.png)

# Documentation
1. [Overview](./docs/overview.md)
2. [Building a Monitor](./docs/monitor.md)
1. [Sample Monitor](./docs/sample_monitor.md)
3. [Querying data from databases](./docs/querying.md)
4. [Validating a monitor](./docs/monitor_validating.md)
5. [Registering a monitor](./docs/monitor_registering.md)
1. [Overview](/docs/overview.md)
2. [Building a Monitor](/docs/monitor.md)
1. [Monitor lifecycle](/docs/monitor_lifecycle.md)
2. [Example Monitors](/docs/example_monitors.md)
3. [Querying data from databases](/docs/querying.md)
4. [Validating a monitor](/docs/monitor_validating.md)
5. [Registering a monitor](/docs/monitor_registering.md)
6. Deployment
1. [Configuration](./docs/configuration.md)
2. [Configuration file](./docs/configuration_file.md)
3. [How to run](./docs/how_to_run.md)
7. [Monitoring Sentinela](./docs/monitoring_sentinela.md)
8. [Plugins](./docs/plugins/plugins.md)
1. [AWS](./docs/plugins/aws.md)
2. [Postgres](./docs/plugins/postgres.md)
3. [Slack](./docs/plugins/slack.md)
1. [Configuration](/docs/configuration.md)
2. [Configuration file](/docs/configuration_file.md)
3. [How to run](/docs/how_to_run.md)
7. [Monitoring Sentinela](/docs/monitoring_sentinela.md)
8. [Plugins](/docs/plugins/plugins.md)
1. [AWS](/docs/plugins/aws.md)
2. [Postgres](/docs/plugins/postgres.md)
3. [Slack](/docs/plugins/slack.md)
9. Interacting with Sentinela
1. [HTTP server](./docs/http_server.md)
1. [HTTP server](/docs/http_server.md)
10. Special cases
1. [Dropping issues](./docs/dropping_issues.md)
1. [Dropping issues](/docs/dropping_issues.md)
4 changes: 2 additions & 2 deletions configs/configs-scalable.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@ plugins:
- postgres
- slack

load_sample_monitors: true
sample_monitors_path: sample_monitors
load_example_monitors: true
example_monitors_path: example_monitors
internal_monitors_path: internal_monitors
internal_monitors_notification:
enabled: true
Expand Down
4 changes: 2 additions & 2 deletions configs/configs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@ plugins:
- postgres
- slack

load_sample_monitors: true
sample_monitors_path: sample_monitors
load_example_monitors: true
example_monitors_path: example_monitors
internal_monitors_path: internal_monitors
internal_monitors_notification:
enabled: true
Expand Down
2 changes: 1 addition & 1 deletion docker/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ COPY . /app/

RUN python3 -m venv $VIRTUAL_ENV \
&& pip install --no-cache-dir --upgrade pip \
&& pip install poetry --no-cache-dir \
&& pip install --no-cache-dir poetry \
&& sh tools/install_dependencies.sh


Expand Down
4 changes: 2 additions & 2 deletions docker/docker-compose-dev.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,8 @@ services:
- 8000:8000
environment:
CONFIGS_FILE: configs/configs.yaml
SAMPLE_SLACK_CHANNEL: C07NCL94SDT
SAMPLE_SLACK_MENTION: U07NFGGMB98
EXAMPLE_SLACK_CHANNEL: C07NCL94SDT
EXAMPLE_SLACK_MENTION: U07NFGGMB98
SLACK_WEBSOCKET_ENABLED: true
SLACK_MAIN_CHANNEL: C07NCL94SDT
SLACK_MAIN_MENTION: U07NFGGMB98
Expand Down
4 changes: 2 additions & 2 deletions docker/docker-compose-local.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,8 @@ services:
start_period: 2s
environment:
CONFIGS_FILE: configs/configs.yaml
SAMPLE_SLACK_CHANNEL: C07NCL94SDT
SAMPLE_SLACK_MENTION: U07NFGGMB98
EXAMPLE_SLACK_CHANNEL: C07NCL94SDT
EXAMPLE_SLACK_MENTION: U07NFGGMB98
SLACK_WEBSOCKET_ENABLED: true
SLACK_MAIN_CHANNEL: C07NCL94SDT
SLACK_MAIN_MENTION: U07NFGGMB98
Expand Down
8 changes: 4 additions & 4 deletions docker/docker-compose-scalable.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,8 @@ services:
start_period: 2s
environment:
CONFIGS_FILE: configs/configs-scalable.yaml
SAMPLE_SLACK_CHANNEL: C07NCL94SDT
SAMPLE_SLACK_MENTION: U07NFGGMB98
EXAMPLE_SLACK_CHANNEL: C07NCL94SDT
EXAMPLE_SLACK_MENTION: U07NFGGMB98
SLACK_WEBSOCKET_ENABLED: true
SLACK_MAIN_CHANNEL: C07NCL94SDT
SLACK_MAIN_MENTION: U07NFGGMB98
Expand Down Expand Up @@ -68,8 +68,8 @@ services:
start_period: 2s
environment:
CONFIGS_FILE: configs/configs-scalable.yaml
SAMPLE_SLACK_CHANNEL: C07NCL94SDT
SAMPLE_SLACK_MENTION: U07NFGGMB98
EXAMPLE_SLACK_CHANNEL: C07NCL94SDT
EXAMPLE_SLACK_MENTION: U07NFGGMB98
SLACK_WEBSOCKET_ENABLED: true
SLACK_MAIN_CHANNEL: C07NCL94SDT
SLACK_MAIN_MENTION: U07NFGGMB98
Expand Down
4 changes: 2 additions & 2 deletions docs/configuration.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Configuration
The basic configs are set through the `configs.yaml` file. This file is read when the application starts and all the settings will be loaded. The documentation found at [Configuration file](./configuration_file.md) provides an overview of the configuration parameters available.
The basic configs are set through the `configs.yaml` file. This file is read when the application starts and all the settings will be loaded. The documentation found at [Configuration file](configuration_file.md) provides an overview of the configuration parameters available.

The monitors path is also defined in the `configs.yaml` file. By default, it's set to the `sample_monitors` folder, but it can be changed to another folder if desired. The `configs.yaml` file also have other configurations that can be adjusted.
The monitors path is also defined in the `configs.yaml` file. By default, it's set to the `example_monitors` folder, but it can be changed to another folder if desired. The `configs.yaml` file also has other configurations that can be adjusted.

# Environment variables
> [!IMPORTANT]
Expand Down
4 changes: 2 additions & 2 deletions docs/configuration_file.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ This document provides an overview of the configuration parameters available in
- `plugins`: List of strings. Plugins to be used by Sentinela. Check each plugin documentation to learn how to enable them.

## Monitors
- `load_sample_monitors`: Boolean. Flag to enable the sample monitors.
- `sample_monitors_path`: String. Path relative to the project root, where the sample monitors are stored.
- `load_example_monitors`: Boolean. Flag to enable the example monitors.
- `example_monitors_path`: String. Path relative to the project root, where the example monitors are stored.
- `internal_monitors_path`: String. Path relative to the project root, where the internal monitors are stored.
- `internal_monitors_notification`: Map. Settings for the notification to be sent by the internal monitors.
- `enabled`: Boolean. Flag to enable the internal monitors notification.
Expand Down
2 changes: 1 addition & 1 deletion docs/diagrams/diagrams.drawio
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@
<mxCell id="FxgzXEkrF_b3o0yHyvXR-78" style="edgeStyle=orthogonalEdgeStyle;rounded=0;sketch=1;hachureGap=4;jiggle=2;curveFitting=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;fontFamily=Architects Daughter;fontSource=https%3A%2F%2Ffonts.googleapis.com%2Fcss%3Ffamily%3DArchitects%2BDaughter;" parent="1" source="FxgzXEkrF_b3o0yHyvXR-79" target="FxgzXEkrF_b3o0yHyvXR-81" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="FxgzXEkrF_b3o0yHyvXR-79" value="If controller is enabled, register internal and sample monitors" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;fillColor=#dae8fc;strokeColor=#6c8ebf;sketch=1;curveFitting=1;jiggle=2;fontFamily=Architects Daughter;" parent="1" vertex="1">
<mxCell id="FxgzXEkrF_b3o0yHyvXR-79" value="If controller is enabled, register internal and example monitors" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;fillColor=#dae8fc;strokeColor=#6c8ebf;sketch=1;curveFitting=1;jiggle=2;fontFamily=Architects Daughter;" parent="1" vertex="1">
<mxGeometry x="850" y="540" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="FxgzXEkrF_b3o0yHyvXR-80" style="edgeStyle=orthogonalEdgeStyle;rounded=0;sketch=1;hachureGap=4;jiggle=2;curveFitting=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;fontFamily=Architects Daughter;fontSource=https%3A%2F%2Ffonts.googleapis.com%2Fcss%3Ffamily%3DArchitects%2BDaughter;" parent="1" source="FxgzXEkrF_b3o0yHyvXR-81" target="FxgzXEkrF_b3o0yHyvXR-83" edge="1">
Expand Down
2 changes: 1 addition & 1 deletion docs/dropping_issues.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Dropping issues
In some cases, a monitor may encounter edge cases where certain issues cannot be resolved automatically. While these edge cases should be considered during monitor development, dropping issues manually can serve as a solution for unavoidable situations.

Issues can be dropped either via an [HTTP request](./http_server.md) or a [Slack message](./slack_commands.md).
Issues can be dropped either via an [HTTP request](http_server.md) or a [Slack message](slack_commands.md).

> [!IMPORTANT]
> Since dropping issues is intended only for specific scenarios, issue IDs should be manually retrieved by querying the Sentinela application database.
Expand Down
74 changes: 74 additions & 0 deletions docs/example_monitors.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# Example Monitors
This page describes all available example monitors that demonstrate different features and patterns.

The described behaviors can be visualized in the dashboard and are useful for learning how to implement various monitoring scenarios using Sentinela.

## Alert Options - Age Rule Monitor
Demonstrates the `AgeRule`. The alert priority is determined by the age of the oldest active issue. Issues age over time, and older issues trigger higher priority alerts.

**How it works**: The monitor creates a new issue every 5 minutes and measures its age in seconds. As issues get older, they trigger higher priority alerts according to the configured thresholds. Issues are automatically resolved after 5 minutes have passed since creation.

**Monitor code**: [Age Rule Monitor](/example_monitors/alert_options/age_rule_monitor/age_rule_monitor.py)
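
The age-based escalation described above can be sketched in plain Python. This is only an illustration and does not use Sentinela's `AgeRule` class: the threshold values, the priority labels, the strict "greater than" comparison and the function name `priority_for_oldest_issue` are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, most severe first: (priority label, age in seconds to exceed).
AGE_THRESHOLDS = [("critical", 240), ("high", 120), ("moderate", 60)]


def priority_for_oldest_issue(created_at_times, now):
    """Return the priority label triggered by the oldest active issue, or None."""
    if not created_at_times:
        return None
    # Only the oldest issue matters for an age-based rule.
    oldest_age = max((now - created_at).total_seconds() for created_at in created_at_times)
    for label, min_age in AGE_THRESHOLDS:
        if oldest_age > min_age:
            return label
    return None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
issues = [now - timedelta(seconds=30), now - timedelta(seconds=200)]
print(priority_for_oldest_issue(issues, now))
```

With these assumed thresholds, the 200-second-old issue decides the priority, and the newer issue has no effect.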

## Alert Options - Count Rule Monitor
Demonstrates the `CountRule`. The alert priority is determined by the number of active issues. More active issues trigger higher priority alerts.

**How it works**: The monitor creates 5 random issues every search cycle. The alert priority increases based on the total count of active issues linked to the alert. Issues can be automatically solved based on a severity field that fluctuates randomly.

**Monitor code**: [Count Rule Monitor](/example_monitors/alert_options/count_rule_monitor/count_rule_monitor.py)
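
As a rough sketch of count-based escalation, the snippet below adds 5 issues per cycle and maps the running total to a priority. It is not Sentinela's `CountRule` implementation: the thresholds, labels and comparison are assumptions, and no issue is solved during the simulation.

```python
# Hypothetical thresholds, most severe first: (priority label, issue count to exceed).
COUNT_THRESHOLDS = [("critical", 20), ("high", 10), ("moderate", 5)]


def priority_for_count(active_issue_count):
    """Return the priority label triggered by the number of active issues, or None."""
    for label, min_count in COUNT_THRESHOLDS:
        if active_issue_count > min_count:
            return label
    return None


active = 0
history = []
for cycle in range(1, 4):
    active += 5  # the example monitor creates 5 issues per search cycle
    history.append((cycle, active, priority_for_count(active)))

for entry in history:
    print(entry)
```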

## Alert Options - Value Rule Greater Than Monitor
Demonstrates the `ValueRule` with the `greater_than` operation. The alert priority is determined by a specific numerical value from the issue data.

**How it works**: The monitor tracks a single issue with an `error_rate` that oscillates from 0 to 100, back and forth. Alert priority increases when the error rate exceeds configured thresholds. The issue is never automatically solved, demonstrating continuous monitoring of a metric.

**Monitor code**: [Value Rule Greater Than Monitor](/example_monitors/alert_options/value_rule_greater_than_monitor/value_rule_greater_than_monitor.py)

## Alert Options - Value Rule Less Than Monitor
Demonstrates the `ValueRule` with the `less_than` operation. The alert priority is determined by a specific numerical value from the issue data.

**How it works**: Similar to the Greater Than Monitor but in reverse. This monitor tracks a single issue with a `success_rate` that oscillates from 0 to 100, back and forth. Alert priority increases when the success rate drops below thresholds, demonstrating monitoring for degraded performance.

**Monitor code**: [Value Rule Less Than Monitor](/example_monitors/alert_options/value_rule_lesser_than_monitor/value_rule_lesser_than_monitor.py)
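
The two Value Rule monitors differ only in the comparison applied to the tracked value. A minimal sketch, assuming hypothetical thresholds and a simple triangle wave for the oscillating metric (neither is taken from the example monitors, and this is not the `ValueRule` class itself):

```python
import operator

# The two operations named in the text, mapped to standard comparisons.
OPERATIONS = {"greater_than": operator.gt, "less_than": operator.lt}


def triangle_wave(step, peak=100):
    """Value that climbs from 0 to `peak` and back down, one unit per step."""
    position = step % (2 * peak)
    return position if position <= peak else 2 * peak - position


def priority_for_value(value, operation, thresholds):
    """`thresholds` is a list of (label, threshold) pairs, most severe first."""
    compare = OPERATIONS[operation]
    for label, threshold in thresholds:
        if compare(value, threshold):
            return label
    return None


# Hypothetical thresholds for an error rate (higher is worse)
# and for a success rate (lower is worse).
ERROR_RATE_THRESHOLDS = [("critical", 90), ("high", 70), ("moderate", 50)]
SUCCESS_RATE_THRESHOLDS = [("critical", 10), ("high", 30), ("moderate", 50)]

print(priority_for_value(triangle_wave(80), "greater_than", ERROR_RATE_THRESHOLDS))
print(priority_for_value(triangle_wave(180), "less_than", SUCCESS_RATE_THRESHOLDS))
```

Note that the threshold lists are ordered in opposite directions: for `greater_than` the most severe threshold is the largest, for `less_than` it is the smallest.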

## Blocking Operations Monitor
Demonstrates how to handle blocking operations in search and update functions without blocking the async event loop.

**How it works**: The monitor simulates a long blocking operation that would typically block the entire application. Using `asyncio.to_thread()`, the blocking call is executed in a separate thread, allowing the async event loop to remain responsive. Both `search()` and `update()` demonstrate this pattern, showing how to safely integrate synchronous blocking code into async monitor functions.

**Monitor code**: [Blocking Operations Monitor](/example_monitors/blocking_operations_monitor/blocking_operations_monitor.py)
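
The `asyncio.to_thread()` pattern can be tried outside of Sentinela with the standard library alone. In the sketch below, `blocking_operation` and `heartbeat` are hypothetical names; the heartbeat stands in for other coroutines that need the event loop while the blocking call runs.

```python
import asyncio
import time


def blocking_operation(seconds):
    """Synchronous call that would freeze the event loop if awaited directly."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval):
    """Stand-in for other coroutines that must keep running."""
    for _ in range(3):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def main():
    ticks = []
    # The blocking call runs in a worker thread, so the heartbeat
    # keeps ticking on the event loop in the meantime.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_operation, 0.2),
        heartbeat(ticks, 0.05),
    )
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_operation(0.2)` directly inside a coroutine would instead pause every other task for the full 0.2 seconds.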

## Non-Solvable Issues Monitor
Demonstrates configuring issues as non-solvable. Non-solvable issues require manual intervention to be solved and cannot be automatically resolved by the monitor logic.

**How it works**: The monitor simulates finding deactivated users and creates issues for them. With `solvable=False` and `unique=True`, only one issue per user is created. If the same user appears in subsequent searches, no new issue is generated. These issues can only be solved manually through the dashboard or notifications, when available.

**Monitor code**: [Non-Solvable Issues Monitor](/example_monitors/non_solvable_issues_monitor/non_solvable_issues_monitor.py)
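
The effect of `unique=True` on repeated searches can be pictured with a small sketch. It assumes issues are identified by a hypothetical `id` field and keeps the known identifiers in a plain set; Sentinela's own bookkeeping is not shown here.

```python
def create_unique_issues(known_ids, found_users):
    """Return only the users that do not already have an issue."""
    new_issues = []
    for user in found_users:
        if user["id"] in known_ids:
            continue  # an issue for this user already exists
        known_ids.add(user["id"])
        new_issues.append(user)
    return new_issues


known_ids = set()
first_search = create_unique_issues(known_ids, [{"id": 1}, {"id": 2}])
second_search = create_unique_issues(known_ids, [{"id": 2}, {"id": 3}])
print(first_search, second_search)
```

User `2` appears in both searches but produces an issue only the first time.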

## Plugin Slack Notification Monitor
Demonstrates how to configure Slack notifications for alerts.

**How it works**: This monitor is similar to the Count Rule Monitor but includes Slack notification configuration. It sends alerts to a configured Slack channel with customizable fields and optional mentions, showing how to integrate Sentinela alerts with Slack.

**Monitor code**: [Plugin Slack Notification Monitor](/example_monitors/plugin_slack_notification_monitor/plugin_slack_notification_monitor.py)

## Query Monitor
Demonstrates using the `query` function to fetch data from a database. Shows how to connect to and execute queries against configured databases.

**How it works**: The monitor executes a simple `SELECT current_timestamp;` query on the 'local' database. In `search()`, it creates a single non-solvable issue with the database timestamp. In `update()`, it refreshes the timestamp field with the latest database value. The actual query can be replaced with real data retrieval for production monitoring.

**Monitor code**: [Query Monitor](/example_monitors/query_monitor/query_monitor.py)

## Reactions Monitor
Demonstrates how to configure reactions. Reactions are async callbacks triggered by specific events during monitor execution.

**How it works**: Reactions are async functions that execute in response to specific monitor events (search completion, update completion, issue creation, etc.). They receive event payloads containing monitor and issue data. This example shows the available reactions with comments explaining when each runs and what data is available.

**Monitor code**: [Reactions Monitor](/example_monitors/reactions_monitor/reactions_monitor.py)
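
A generic sketch of event-driven async callbacks is shown below. The event names, the payload shape and the `dispatch` helper are hypothetical and only illustrate the idea; the actual reaction names and payloads are documented in the monitor code linked above.

```python
import asyncio

received = []


async def on_issue_created(payload):
    """Hypothetical reaction: record which issue was created."""
    received.append(("issue_created", payload["issue_id"]))


async def on_search_finished(payload):
    """Hypothetical reaction: record that a search cycle ended."""
    received.append(("search_finished", payload["monitor_id"]))


# Hypothetical registry mapping event names to the callbacks to run.
REACTIONS = {
    "issue_created": [on_issue_created],
    "search_finished": [on_search_finished],
}


async def dispatch(event_name, payload):
    """Run every reaction registered for the event, passing the payload."""
    callbacks = REACTIONS.get(event_name, [])
    await asyncio.gather(*(callback(payload) for callback in callbacks))


asyncio.run(dispatch("issue_created", {"issue_id": 7}))
print(received)
```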

## Variables Monitor
Demonstrates the variables feature for maintaining monitor-level state. Variables store information about the monitor's execution, not about individual issues.

**How it works**: The monitor uses a variable to bookmark the last timestamp processed. This prevents reprocessing the same events across multiple monitor executions and makes searches more efficient. Variables are persisted across monitor runs and can store any data needed for monitor-level state management.

**Monitor code**: [Variables Monitor](/example_monitors/variables_monitor/variables_monitor.py)
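
The bookmark pattern can be sketched with a dictionary standing in for the persisted variables. The key `last_timestamp` and the event shape are assumptions for the example; Sentinela's variables API is not used here.

```python
def search_new_events(events, variables):
    """Return events newer than the bookmark and advance the bookmark."""
    last_seen = variables.get("last_timestamp", 0)
    new_events = [event for event in events if event["timestamp"] > last_seen]
    if new_events:
        variables["last_timestamp"] = max(event["timestamp"] for event in new_events)
    return new_events


variables = {}  # stand-in for state persisted between monitor runs
first_run = search_new_events([{"timestamp": 1}, {"timestamp": 2}], variables)
second_run = search_new_events(
    [{"timestamp": 1}, {"timestamp": 2}, {"timestamp": 3}], variables
)
print(first_run, second_run, variables)
```

The second run returns only the event with timestamp `3`, because the earlier events are at or before the bookmark.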
25 changes: 19 additions & 6 deletions docs/how_to_run.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ Local execution is recommended for developing monitors. It should not be used in

When running the application locally, it is recommended to use the internal queue instead of the SQS queue for faster and smoother operation. However, it is also possible to use the AWS queue mock or a real SQS queue.

1. Set the secrets in the `.env.secrets` file and environment variables in the `docker/docker-compose-local.yml` file, as specified in the [Configuration](./configuration.md) documentation.
1. Set the secrets in the `.env.secrets` file and environment variables in the `docker/docker-compose-local.yaml` file, as specified in the [Configuration](configuration.md) documentation.
2. Migrate the database to the latest version. This is only necessary when running for the first time or after updates.
```shell
make migrate-local
Expand All @@ -81,7 +81,7 @@ For a more scalable deployment, it is recommended to use separate containers for

The `docker-compose` file for this setup includes a SQS queue mock, which is used by default. However, it is also possible to use the internal queue or a real SQS queue.

1. Set the secrets in the `.env.secrets` file and environment variables in the `docker/docker-compose-scalable.yml` file, as specified in the [Configuration](./configuration.md) documentation.
1. Set the secrets in the `.env.secrets` file and environment variables in the `docker/docker-compose-scalable.yaml` file, as specified in the [Configuration](configuration.md) documentation.
2. Set the `replicas` parameter in the `docker/docker-compose-scalable.yaml` file to the desired number of executors.
3. Migrate the database to the latest version. This is only necessary when running for the first time or after updates.
```shell
Expand All @@ -107,19 +107,25 @@ For production deployment, it is recommended to use a more complex setup with mu
- Requires an external database and message queue.

### Building the Image
The [Dockerfile](../Dockerfile) is a starting point for building the application image. This file implements the logic to install all dependencies for the enabled plugins.
The [Dockerfile](/docker/Dockerfile) is a starting point for building the application image. This file implements the logic to install all dependencies for the enabled plugins.

1. Install the dependencies for the application and enabled plugins.
```shell
poetry install --no-root --only $(get_plugins_list)
poetry install --only main

plugins=$(get_plugins_list)

if ! [ "x$plugins" = "x" ]; then
poetry install --only $plugins
fi
```

### Deploying the Application
In production deployment, it is recommended to deploy the controller and executors in separate containers or pods (in the case of a Kubernetes deployment). This method requires an external queue to allow communication between the controller and executors. A persistent database is also recommended to prevent data loss.

The files provided in the [Kubernetes template](../resources/kubernetes_template) directory can be used as a reference for a Kubernetes deployment.
The files provided in the [Kubernetes template](/resources/kubernetes_template) directory can be used as a reference for a Kubernetes deployment.

All services must have the environment variables set as specified in the [Configuration](./configuration.md) documentation.
All services must have the environment variables set as specified in the [Configuration](configuration.md) documentation.

Controllers and executors can be run by specifying them as parameters when starting the application:
1. Run the controller.
Expand All @@ -130,3 +136,10 @@ Controllers and executors can be run by specifying them as parameters when start
```shell
sentinela executor
```

# Gracefully Stopping Sentinela
Sentinela can be gracefully stopped by sending a `SIGINT` or `SIGTERM` signal to the process. This allows the application to finish processing any ongoing tasks before shutting down.

Avoid forcefully killing the application: a monitor execution might be in progress, and killing the process would interrupt it, potentially leaving the monitor in an inconsistent state.

Sentinela has its own internal process that checks for monitors in this inconsistent state and fixes them. However, it's still recommended to allow the application to stop gracefully.
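
For illustration, a generic asyncio program can implement this kind of graceful stop by turning `SIGINT` and `SIGTERM` into a stop flag that is checked between units of work. This is a sketch of the general technique, not Sentinela's implementation; `install_stop_handlers` and `worker` are hypothetical names, and `loop.add_signal_handler` is only available on Unix.

```python
import asyncio
import signal


def install_stop_handlers(loop, stop_requested):
    """Set the stop flag when SIGINT or SIGTERM is received (Unix only)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop_requested.set)


async def worker(stop_requested, work_seconds=0.01):
    """Process units of work until a stop is requested; never cut one short."""
    completed = 0
    while not stop_requested.is_set():
        await asyncio.sleep(work_seconds)  # one unit of work
        completed += 1
    return completed


async def main():
    stop_requested = asyncio.Event()
    install_stop_handlers(asyncio.get_running_loop(), stop_requested)
    # Runs until the process receives SIGINT (Ctrl+C) or SIGTERM.
    return await worker(stop_requested)


if __name__ == "__main__":
    print(f"units of work completed: {asyncio.run(main())}")
```

Because the flag is only checked between iterations, the unit of work in progress always finishes before the loop exits.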