Commit 14c977b

Misc cleanup
- Rename the struct Host -> Readyset to emphasize its specific nature and reflect the ProxySQL struct
- Similarly rename the module hosts -> readyset to emphasize its specific nature and reflect the proxysql module
- Rename ProxyStatus -> ProxySQLStatus to disambiguate the proxying we're talking about
- Remove some unused returns
- Use MIRROR_QUERY_TOKEN and DESTINATION_QUERY_TOKEN when possible instead of copy/paste
- Fix capitalization of Readyset/ProxySQL
- Fix conn typo
- Remove some trailing whitespace
- Remove some dead code
- Some minor comment changes
- Some minor README.md formatting changes
1 parent 080e71c commit 14c977b

9 files changed: +216 -254 lines changed

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ description = "Readyset ProxySQL Scheduler"
 [package.metadata.deb]
 extended-description = """\
 Readyset ProxySQL Scheduler"""
-copyright = "2024, ReadySet, Inc."
-maintainer = "ReadySet, Inc. <[email protected]>"
+copyright = "2024, Readyset, Inc."
+maintainer = "Readyset, Inc. <[email protected]>"
 assets = [
     ["target/release/readyset_proxysql_scheduler", "/usr/bin/readyset_proxysql_scheduler", "755"],
     ["./readyset_proxysql_scheduler.cnf", "/etc/readyset_proxysql_scheduler.cnf", "644"],

README.md

Lines changed: 25 additions & 25 deletions
@@ -1,16 +1,16 @@
 # Automatic Query Caching with Readyset ProxySQL Scheduler
-Unlock the full potential of your database integrating ReadySet and ProxySQL by automatically analyzing and caching inefficient queries in your workload. Experience optimized performance with zero code changes—your application runs faster, effortlessly.
+Unlock the full potential of your database integrating Readyset and ProxySQL by automatically analyzing and caching inefficient queries in your workload. Experience optimized performance with zero code changes—your application runs faster, effortlessly.
 
 
 # Workflow
 This scheduler executes the following steps:
 
 1. Locks an in disk file (configured by `lock_file`) to avoid multiple instances of the scheduler to overlap their execution.
-2. If `mode=(All|HealthCheck)` - Query `mysql_servers` and check all servers that have `comment='Readyset` (case insensitive) and `hostgroup=readyset_hostgroup`. For each server it checks if it can connect to Readyset and validate the output of `Status` and act as follow:
+2. If `operation_mode=("All"|"HealthCheck")` - Query `mysql_servers` and check all servers that have `comment='Readyset` (case insensitive) and `hostgroup=readyset_hostgroup`. For each server it checks if it can connect to Readyset and validate the output of `Status` and act as follow:
 * `Online` - Adjust the server status to `ONLINE` in ProxySQL.
 * `Maitenance Mode` - Adjust the server status to `OFFLINE_SOFT` in ProxySQL.
 * `Snapshot In Progress` - Adjust the server status to `SHUNNED` in ProxySQL.
-4. If `mode=(All|QueryDiscovery)` Query the table `stats_mysql_query_digest` finding queries executed at `source_hostgroup` by `readyset_user` and validates if each query is supported by Readyset. The rules to order queries are configured by [Query Discovery](#query-discovery) configurations.
+4. If `operation_mode=("All"|"QueryDiscovery")` Query the table `stats_mysql_query_digest` finding queries executed at `source_hostgroup` by `readyset_user` and validates if each query is supported by Readyset. The rules to order queries are configured by [Query Discovery](#query-discovery) configurations.
 3. If the query is supported it adds a cache in Readyset by executing `CREATE CACHE FROM __query__`.
 4. If `warmup_time_s` is NOT configure, a new query rule will be added redirecting this query to Readyset
 5. If `warmup_time_s` is configured, a new query rule will be added to mirror this query to Readyset. The query will still be redirected to the original hostgroup
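Editor's note: the status handling described in step 2 of the Workflow text in this hunk can be sketched as a small mapping. This is only an illustration under stated assumptions, not the scheduler's actual code: `ServerStatus` and `proxysql_status_for` are hypothetical names, the case-insensitive matching is an assumption, and the README's `Maitenance Mode` is read here as "Maintenance Mode".

```rust
/// ProxySQL server statuses named in the README workflow.
/// `ServerStatus` is a hypothetical type used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerStatus {
    Online,
    OfflineSoft,
    Shunned,
}

impl ServerStatus {
    /// The status string as the README spells it for ProxySQL.
    fn as_str(self) -> &'static str {
        match self {
            ServerStatus::Online => "ONLINE",
            ServerStatus::OfflineSoft => "OFFLINE_SOFT",
            ServerStatus::Shunned => "SHUNNED",
        }
    }
}

/// Hypothetical helper: map the status text reported by Readyset to the
/// ProxySQL status the README says the scheduler should set.
/// Returns `None` for any status the README does not list.
fn proxysql_status_for(readyset_status: &str) -> Option<ServerStatus> {
    // Assumption: compare case-insensitively and ignore surrounding whitespace.
    match readyset_status.trim().to_ascii_lowercase().as_str() {
        "online" => Some(ServerStatus::Online),
        "maintenance mode" => Some(ServerStatus::OfflineSoft),
        "snapshot in progress" => Some(ServerStatus::Shunned),
        _ => None,
    }
}
```

Returning `Option` keeps unlisted statuses explicit, so a caller has to decide what to do with a server whose state the README does not cover.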
@@ -39,64 +39,64 @@ SAVE SCHEDULER TO DISK;
 ```
 
 Configure `/etc/readyset_proxysql_scheduler.cnf` as follow:
-* `proxysql_user` - (Required) - Proxysql admin user
-* `proxysql_password` - (Required) - Proxysql admin password
-* `proxysql_host` - (Required) - Proxysql admin host
-* `proxysql_port` - (Required) - Proxysql admin port
+* `proxysql_user` - (Required) - ProxySQL admin user
+* `proxysql_password` - (Required) - ProxySQL admin password
+* `proxysql_host` - (Required) - ProxySQL admin host
+* `proxysql_port` - (Required) - ProxySQL admin port
 * `readyset_user` - (Required) - Readyset application user
 * `readyset_password` - (Required) - Readyset application password
 * `source_hostgroup` - (Required) - Hostgroup running your Read workload
 * `readyset_hostgroup` - (Required) - Hostgroup where Readyset is configure
-* `warmup_time_s` - (Optional) - Time in seconds to mirror a query supported before redirecting the query to Readyset (Default 0 - no mirror)
-* `lock_file` - (Optional) - Lock file to prevent two instances of the scheduler to run at the same time (Default '/etc/readyset_scheduler.lock')
-* `operation_mode` - (Optional) - Operation mode to run the scheduler. The options are described in [Operation Mode](#operation-mode) (Default All).
-* `number_of_queries` - (Optional) - Number of queries to cache in Readyset (Default 10).
-* `query_discovery_mode` / `query_discovery_min_execution` / `query_discovery_min_row_sent` - (Optional) - Query Discovery configurations. The options are described in [Query Discovery](#query-discovery) (Default CountStar / 0 / 0).
+* `warmup_time_s` - (Optional) - Time in seconds to mirror a query supported before redirecting the query to Readyset (Default `0` - no mirror)
+* `lock_file` - (Optional) - Lock file to prevent two instances of the scheduler to run at the same time (Default `"/etc/readyset_scheduler.lock"`)
+* `operation_mode` - (Optional) - Operation mode to run the scheduler. The options are described in [Operation Mode](#operation-mode) (Default `"All"`).
+* `number_of_queries` - (Optional) - Number of queries to cache in Readyset (Default `10`).
+* `query_discovery_mode` / `query_discovery_min_execution` / `query_discovery_min_row_sent` - (Optional) - Query Discovery configurations. The options are described in [Query Discovery](#query-discovery) (Default `"CountStar"` / `0` / `0`).
 
 
 # Query Discovery
 The Query Discovery is a set of configuration to find queries that are supported by Readyset. The configurations are defined by the following fields:
 
-* `query_discovery_mode`: (Optional) - Mode to discover queries to automatically cache in Readyset. The options are described in [Query Discovery Mode](#query-discovery-mode) (Default CountStar).
-* `query_discovery_min_execution`: (Optional) - Minimum number of executions of a query to be considered a candidate to be cached (Default 0).
-* `query_discovery_min_row_sent`: (Optional) - Minimum number of rows sent by a query to be considered a candidate to be cached (Default 0).
+* `query_discovery_mode`: (Optional) - Mode to discover queries to automatically cache in Readyset. The options are described in [Query Discovery Mode](#query-discovery-mode) (Default `"CountStar"`).
+* `query_discovery_min_execution`: (Optional) - Minimum number of executions of a query to be considered a candidate to be cached (Default `0`).
+* `query_discovery_min_row_sent`: (Optional) - Minimum number of rows sent by a query to be considered a candidate to be cached (Default `0`).
 
 # Query Discovery Mode
 The Query Discovery Mode is a set of possible rules to discover queries to automatically cache in Readyset. The options are:
 
-1. `CountStar` - Total Number of Query Executions
+1. `"CountStar"` - Total Number of Query Executions
 * Formula: `total_executions = count_star`
 * Description: This metric gives the total number of times the query has been executed. It is valuable for understanding how frequently the query runs. A high count_star value suggests that the query is executed often.
 
-2. `SumTime` - Total Time Spent Executing the Query
+2. `"SumTime"` - Total Time Spent Executing the Query
 * Formula: `total_execution_time = sum_time`
 * Description: This metric represents the total cumulative time spent (measured in microseconds) executing the query across all its executions. It provides a clear understanding of how much processing time the query is consuming over time. A high total execution time can indicate that the query is either frequently executed or is time-intensive to process.
 
-3. `SumRowsSent` - Total Number of Rows Sent by the Query (sum_rows_sent)
+3. `"SumRowsSent"` - Total Number of Rows Sent by the Query (sum_rows_sent)
 * Formula: `total_rows_sent = sum_rows_sent`
-* Description: This metric provides the total number of rows sent to the client across all executions of the query. It helps you understand the query’s output volume and the amount of data being transmitted.
+* Description: This metric provides the total number of rows sent to the client across all executions of the query. It helps you understand the query’s output volume and the amount of data being transmitted.
 
-4. `MeanTime` - Average Query Execution Time (Mean)
+4. `"MeanTime"` - Average Query Execution Time (Mean)
 * Formula: `mean_time = sum_time / count_star`
 * Description: The mean time gives you an idea of the typical performance (measured in microseconds) of the query over all executions. It provides a central tendency of how long the query generally takes to execute.
 
-5. `ExecutionTimeDistance` - Time Distance Between Query Executions
+5. `"ExecutionTimeDistance"` - Time Distance Between Query Executions
 * Formula: `execution_time_distance = max_time - min_time`
 * Description: This shows the spread between the fastest and slowest executions of the query (measured in microseconds). A large range might indicate variability in system load, input sizes, or external factors affecting performance.
 
-6. `QueryThroughput` - Query Throughput
+6. `"QueryThroughput"` - Query Throughput
 * Formula: `query_throughput = count_star / sum_time`
 * Description: This shows how many queries are processed per unit of time (measured in microseconds). It’s useful for understanding system capacity and how efficiently the database is handling the queries.
 
-7. `WorstBestCase` - Worst Best-Case Query Performance
+7. `"WorstBestCase"` - Worst Best-Case Query Performance
 * Formula: `worst_case = max(min_time)`
 * Description: The min_time metric gives the fastest time the query was ever executed (measured in microseconds). It reflects the best-case performance scenario, which could indicate the query’s performance under optimal conditions.
 
-8. `WorstWorstCase` - Worst Worst-Case Query Performance
+8. `"WorstWorstCase"` - Worst Worst-Case Query Performance
 * Formula: `worst_case = max(max_time)`
 * Description: The max_time shows the slowest time the query was executed (measured in microseconds). This can indicate potential bottlenecks or edge cases where the query underperforms, which could be due to larger data sets, locks, or high server load.
 
-9. `DistanceMeanMax` - Distance Between Mean Time and Max Time (mean_time vs max_time)
+9. `"DistanceMeanMax"` - Distance Between Mean Time and Max Time (mean_time vs max_time)
 * Formula: `distance_mean_max = max_time - mean_time`
 * Description: The distance between the mean execution time and the maximum execution time provides insight into how much slower the worst-case execution is compared to the average (measured in microseconds). A large gap indicates significant variability in query performance, which could be caused by certain executions encountering performance bottlenecks, such as large datasets, locking, or high system load.
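Editor's note: as a rough illustration of the formulas listed under Query Discovery Mode, the sketch below computes one ranking value per query digest and picks the top candidates. It is a sketch under assumptions, not the scheduler's implementation: `DigestStats`, `DiscoveryMode`, `score`, and `rank_candidates` are hypothetical names, the struct fields simply mirror the `stats_mysql_query_digest` columns used in the formulas, and the zero-division guards and the descending sort order are assumptions. The `External` variant that appears in `src/config.rs` is left out because this README excerpt does not describe it.

```rust
/// Per-digest counters; field names mirror the columns used in the README formulas.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DigestStats {
    count_star: u64,
    sum_time: u64, // microseconds
    min_time: u64, // microseconds
    max_time: u64, // microseconds
    sum_rows_sent: u64,
}

/// Hypothetical stand-in for the discovery modes documented in the README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DiscoveryMode {
    CountStar,
    SumTime,
    SumRowsSent,
    MeanTime,
    ExecutionTimeDistance,
    QueryThroughput,
    WorstBestCase,
    WorstWorstCase,
    DistanceMeanMax,
}

/// Ranking value for one digest under the given mode.
fn score(mode: DiscoveryMode, s: &DigestStats) -> f64 {
    // Assumption: treat a digest with no executions as having a mean of 0.
    let mean = if s.count_star == 0 {
        0.0
    } else {
        s.sum_time as f64 / s.count_star as f64
    };
    match mode {
        DiscoveryMode::CountStar => s.count_star as f64,
        DiscoveryMode::SumTime => s.sum_time as f64,
        DiscoveryMode::SumRowsSent => s.sum_rows_sent as f64,
        DiscoveryMode::MeanTime => mean,
        DiscoveryMode::ExecutionTimeDistance => s.max_time.saturating_sub(s.min_time) as f64,
        DiscoveryMode::QueryThroughput => {
            if s.sum_time == 0 {
                0.0
            } else {
                s.count_star as f64 / s.sum_time as f64
            }
        }
        // max(min_time) / max(max_time): the per-digest value is the column
        // itself; the "max" comes from sorting in descending order below.
        DiscoveryMode::WorstBestCase => s.min_time as f64,
        DiscoveryMode::WorstWorstCase => s.max_time as f64,
        DiscoveryMode::DistanceMeanMax => s.max_time as f64 - mean,
    }
}

/// Keep digests with at least `min_execution` executions, order them by
/// descending score, and return at most `limit` of them.
fn rank_candidates(
    mode: DiscoveryMode,
    stats: &[DigestStats],
    min_execution: u64,
    limit: usize,
) -> Vec<DigestStats> {
    let mut picked: Vec<DigestStats> = stats
        .iter()
        .copied()
        .filter(|s| s.count_star >= min_execution)
        .collect();
    picked.sort_by(|a, b| score(mode, b).total_cmp(&score(mode, a)));
    picked.truncate(limit);
    picked
}
```

Here `min_execution` and `limit` play the roles the README gives to `query_discovery_min_execution` and `number_of_queries`. How the real scheduler applies `query_discovery_min_row_sent` is not shown in this diff, so it is omitted.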

src/config.rs

Lines changed: 4 additions & 21 deletions
@@ -1,3 +1,4 @@
+use serde::Deserialize;
 use std::{
     fmt::{Display, Formatter},
     fs::File,
@@ -6,7 +7,7 @@ use std::{
 
 use crate::messages::MessageType;
 
-#[derive(serde::Deserialize, Clone, Copy, PartialEq, PartialOrd, Default, Debug)]
+#[derive(Deserialize, Clone, Copy, PartialEq, PartialOrd, Default, Debug)]
 pub enum OperationMode {
     HealthCheck,
     QueryDiscovery,
@@ -35,7 +36,7 @@ impl Display for OperationMode {
     }
 }
 
-#[derive(serde::Deserialize, Clone, Copy, PartialEq, PartialOrd, Default, Debug)]
+#[derive(Deserialize, Clone, Copy, PartialEq, PartialOrd, Default, Debug)]
 pub enum QueryDiscoveryMode {
     #[default]
     CountStar,
@@ -50,25 +51,7 @@ pub enum QueryDiscoveryMode {
     External,
 }
 
-impl From<String> for QueryDiscoveryMode {
-    fn from(s: String) -> Self {
-        match s.to_lowercase().as_str() {
-            "count_star" => QueryDiscoveryMode::CountStar,
-            "sum_time" => QueryDiscoveryMode::SumTime,
-            "sum_rows_sent" => QueryDiscoveryMode::SumRowsSent,
-            "mean_time" => QueryDiscoveryMode::MeanTime,
-            "execution_time_distance" => QueryDiscoveryMode::ExecutionTimeDistance,
-            "query_throughput" => QueryDiscoveryMode::QueryThroughput,
-            "worst_best_case" => QueryDiscoveryMode::WorstBestCase,
-            "worst_worst_case" => QueryDiscoveryMode::WorstWorstCase,
-            "distance_mean_max" => QueryDiscoveryMode::DistanceMeanMax,
-            "external" => QueryDiscoveryMode::External,
-            _ => QueryDiscoveryMode::CountStar,
-        }
-    }
-}
-
-#[derive(serde::Deserialize, Clone, Debug)]
+#[derive(Deserialize, Clone, Debug)]
 pub struct Config {
     pub proxysql_user: String,
     pub proxysql_password: String,
src/main.rs

Lines changed: 3 additions & 5 deletions
@@ -1,8 +1,8 @@
 mod config;
-mod hosts;
 mod messages;
 mod proxysql;
 mod queries;
+mod readyset;
 
 use clap::Parser;
 use config::read_config_file;
@@ -13,7 +13,7 @@ use proxysql::ProxySQL;
 use std::fs::OpenOptions;
 
 /// Readyset ProxySQL Scheduler
-/// This tool is used to query ProxySQL Stats tables to find queries that are not yet cached in Readyset and then cache them.
+/// This tool is used to query ProxySQL stats tables to find queries that are not yet cached in Readyset and then cache them.
 #[derive(Parser, Debug)]
 #[command(version, about, long_about = None)]
 struct Args {
@@ -79,8 +79,6 @@ fn main() {
         proxysql.health_check();
     }
 
-    // retain only healthy hosts
-    //hosts.retain_online();
     if running_mode == config::OperationMode::QueryDiscovery
         || running_mode == config::OperationMode::All
     {
@@ -93,7 +91,7 @@ fn main() {
                 .prefer_socket(false),
         )
         .expect("Failed to create ProxySQL connection");
-        let mut query_discovery = queries::QueryDiscovery::new(config);
+        let mut query_discovery = queries::QueryDiscovery::new(&config);
         query_discovery.run(&mut proxysql, &mut conn);
     }

src/messages.rs

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@ use std::process;
 
 use chrono::{DateTime, Local};
 use once_cell::sync::Lazy;
+use serde::Deserialize;
 use std::sync::Mutex;
 
-#[derive(Clone, Copy, serde::Deserialize, Debug, Default, PartialEq, PartialOrd)]
+#[derive(Clone, Copy, Deserialize, Debug, Default, PartialEq, PartialOrd)]
 pub enum MessageType {
     /// Information message, this will not result in any action
     Info,
