Commit 0111079

Make the auto compressor uploadable to pypi (#75)

1 parent bf57e81 commit 0111079

File tree

15 files changed: +118 −98 lines

Cargo.lock

+21 −21

Some generated files are not rendered by default.

Cargo.toml

+1 −1

```diff
@@ -1,5 +1,5 @@
 [workspace]
-members = ["auto_compressor", "compressor_integration_tests"]
+members = ["synapse_auto_compressor", "compressor_integration_tests"]
 
 [package]
 authors = ["Erik Johnston"]
```

README.md

+41 −40

````diff
@@ -3,15 +3,15 @@
 This workspace contains experimental tools that attempt to reduce the number of
 rows in the `state_groups_state` table inside of a Synapse Postgresql database.
 
-# Automated tool: auto_compressor
+# Automated tool: synapse_auto_compressor
 
 ## Introduction:
 
 This tool is significantly simpler to use than the manual tool (described below).
 It scans through all of the rows in the `state_groups` database table from the start. When
 it finds a group that hasn't been compressed, it runs the compressor for a while on that
 group's room, saving where it got up to. After compressing a number of these chunks it stops,
-saving where it got up to for the next run of the `auto_compressor`.
+saving where it got up to for the next run of the `synapse_auto_compressor`.
 
 It creates three extra tables in the database: `state_compressor_state`, which stores the
 information needed to stop and start the compressor for each room, `state_compressor_progress`,
@@ -21,41 +21,42 @@ which stores how far through the `state_groups` table the compressor has scanned
 The tool can be run manually when you are running out of space, or be scheduled to run
 periodically.
 
-## Building 
+## Building
 
 This tool requires `cargo` to be installed. See https://www.rust-lang.org/tools/install
 for instructions on how to do this.
 
-To build `auto_compressor`, clone this repository and navigate to the `autocompressor/`
-subdirectory. Then execute `cargo build`.
+To build `synapse_auto_compressor`, clone this repository and navigate to the
+`synapse_auto_compressor/` subdirectory. Then execute `cargo build`.
 
-This will create an executable and store it in `auto_compressor/target/debug/auto_compressor`.
+This will create an executable and store it in
+`synapse_auto_compressor/target/debug/synapse_auto_compressor`.
 
 ## Example usage
 ```
-$ auto_compressor -p postgresql://user:pass@localhost/synapse -c 500 -n 100
+$ synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 500 -n 100
 ```
 ## Running Options
 
-- -p [POSTGRES_LOCATION] **Required** 
+- -p [POSTGRES_LOCATION] **Required**
 The configuration for connecting to the Postgres database. This should be of the form
 `"postgresql://username:password@mydomain.com/database"` or a key-value pair
 string: `"user=username password=password dbname=database host=mydomain.com"`.
 See https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html
 for the full details.
 
-- -c [CHUNK_SIZE] **Required** 
+- -c [CHUNK_SIZE] **Required**
 The number of state groups to work on at once. All of the entries from state_groups_state are
 requested from the database for state groups that are worked on. Therefore small chunk
 sizes may be needed on machines with low memory. Note: if the compressor fails to find
 space savings on the chunk as a whole (which may well happen in rooms with lots of backfill
 in) then the entire chunk is skipped.
 
-- -n [CHUNKS_TO_COMPRESS] **Required** 
+- -n [CHUNKS_TO_COMPRESS] **Required**
 *CHUNKS_TO_COMPRESS* chunks of size *CHUNK_SIZE* will be compressed. The higher this
 number is set to, the longer the compressor will run for.
 
-- -d [LEVELS] 
+- -d [LEVELS]
 Sizes of each new level in the compression algorithm, as a comma-separated list.
 The first entry in the list is for the lowest, most granular level, with each
 subsequent entry being for the next highest level. The number of entries in the
@@ -67,14 +68,14 @@ given set of state. [defaults to "100,50,25"]
 ## Scheduling the compressor
 The automatic tool may put some strain on the database, so it might be best to schedule
 it to run at a quiet time for the server. This could be done by creating an executable
-script and scheduling it with something like 
+script and scheduling it with something like
 [cron](https://www.man7.org/linux/man-pages/man1/crontab.1.html).
 
 # Manual tool: synapse_compress_state
 
 ## Introduction
 
-A manual tool that reads in the rows from `state_groups_state` and `state_group_edges` 
+A manual tool that reads in the rows from `state_groups_state` and `state_group_edges`
 tables for a specified room and calculates the changes that could be made that
 (hopefully) will significantly reduce the number of rows.
@@ -85,7 +86,7 @@ that if `-t` is given then each change to a particular state group is wrapped
 in a transaction). If you do wish to send the changes to the database automatically
 then the `-c` flag can be set.
 
-The SQL generated is safe to apply against the database with Synapse running. 
+The SQL generated is safe to apply against the database with Synapse running.
 This is because the `state_groups` and `state_groups_state` tables are append-only:
 once written to the database, they are never modified. There is therefore no danger
 of a modification racing against a running Synapse. Further, this script makes its
@@ -95,7 +96,7 @@ from any of the queries that Synapse performs.
 The tool will also ensure that the generated state deltas do give the same state
 as the existing state deltas before generating any SQL.
 
-## Building 
+## Building
 
 This tool requires `cargo` to be installed. See https://www.rust-lang.org/tools/install
 for instructions on how to do this.
@@ -125,54 +126,54 @@ $ psql synapse < out.data
 
 ## Running Options
 
-- -p [POSTGRES_LOCATION] **Required** 
+- -p [POSTGRES_LOCATION] **Required**
 The configuration for connecting to the Postgres database. This should be of the form
 `"postgresql://username:password@mydomain.com/database"` or a key-value pair
 string: `"user=username password=password dbname=database host=mydomain.com"`.
 See https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html
 for the full details.
 
-- -r [ROOM_ID] **Required** 
+- -r [ROOM_ID] **Required**
 The room to process (this is the value found in the `rooms` table of the database,
 not the common name for the room - it should look like: "!wOlkWNmgkAZFxbTaqj:matrix.org").
 
-- -b [MIN_STATE_GROUP] 
+- -b [MIN_STATE_GROUP]
 The state group to start processing from (non-inclusive).
 
-- -n [GROUPS_TO_COMPRESS] 
+- -n [GROUPS_TO_COMPRESS]
 How many groups to load into memory to compress (starting
 from the 1st group in the room or the group specified by -b).
 
-- -l [LEVELS] 
+- -l [LEVELS]
 Sizes of each new level in the compression algorithm, as a comma-separated list.
-The first entry in the list is for the lowest, most granular level, with each 
+The first entry in the list is for the lowest, most granular level, with each
 subsequent entry being for the next highest level. The number of entries in the
 list determines the number of levels that will be used. The sum of the sizes of
 the levels affects the performance of fetching the state from the database, as the
 sum of the sizes is the upper bound on the number of iterations needed to fetch a
 given set of state. [defaults to "100,50,25"]
 
-- -m [COUNT] 
+- -m [COUNT]
 If the compressor cannot save this many rows from the database then it will stop early.
 
-- -s [MAX_STATE_GROUP] 
+- -s [MAX_STATE_GROUP]
 If a max_state_group is specified then only state groups with IDs lower than this
 number can be compressed.
 
-- -o [FILE] 
+- -o [FILE]
 File to output the SQL transactions to (for later running on the database).
 
-- -t 
+- -t
 If this flag is set then each change to a particular state group is wrapped in a
 transaction. This should be done if you wish to apply the changes while Synapse is
 still running.
 
-- -c 
+- -c
 If this flag is set then the changes the compressor makes will be committed to the
 database. This should be safe to use while Synapse is running, as it wraps the changes
 to every state group in its own transaction (as if the transaction flag was set).
 
-- -g 
+- -g
 If this flag is set then output the node and edge information for the state_group
 directed graph built up from the predecessor state_group links. These can be looked
 at in something like Gephi (https://gephi.org).
@@ -196,10 +197,10 @@ $ docker-compose down
 # Using the synapse_compress_state library
 
 If you want to use the compressor in another project, it is recommended that you
-use jemalloc `https://github.com/gnzlbg/jemallocator`. 
+use jemalloc `https://github.com/gnzlbg/jemallocator`.
 
 To prevent the progress bars from being shown, use the `no-progress-bars` feature.
-(See `auto_compressor/Cargo.toml` for an example)
+(See `synapse_auto_compressor/Cargo.toml` for an example)
 
 # Troubleshooting
@@ -216,29 +217,29 @@ from the machine where Postgres is running, the url will be the following:
 ### From remote machine
 
 If you wish to connect from a different machine, you'll need to edit your Postgres settings to allow
-remote connections. This requires updating the 
+remote connections. This requires updating the
 [`pg_hba.conf`](https://www.postgresql.org/docs/current/auth-pg-hba-conf.html) and the `listen_addresses`
 setting in [`postgresql.conf`](https://www.postgresql.org/docs/current/runtime-config-connection.html).
 
 ## Printing debugging logs
 
-The amount of output the tools produce can be altered by setting the RUST_LOG 
-environment variable to something.
+The amount of output the tools produce can be altered by setting the RUST_LOG
+environment variable.
 
-To get more logs when running the auto_compressor tool try the following:
+To get more logs when running the synapse_auto_compressor tool try the following:
 
 ```
-$ RUST_LOG=debug auto_compressor -p postgresql://user:pass@localhost/synapse -c 50 -n 100
+$ RUST_LOG=debug synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 50 -n 100
 ```
 
-If you want to suppress all the debugging info you are getting from the 
+If you want to suppress all the debugging info you are getting from the
 Postgres client then try:
 
 ```
-RUST_LOG=auto_compressor=debug,synapse_compress_state=debug auto_compressor [etc.]
+RUST_LOG=synapse_auto_compressor=debug,synapse_compress_state=debug synapse_auto_compressor [etc.]
 ```
 
-This will only print the debugging information from those two packages. For more info see 
+This will only print the debugging information from those two packages. For more info see
 https://docs.rs/env_logger/0.9.0/env_logger/.
 
 ## Building difficulties
@@ -248,7 +249,7 @@ and building on Linux will also require `pkg-config`
 
 This can be done on Ubuntu with: `$ apt-get install libssl-dev pkg-config`
 
-Note that building requires quite a lot of memory and out-of-memory errors might not be 
+Note that building requires quite a lot of memory and out-of-memory errors might not be
 obvious. It's recommended you only build these tools on machines with at least 2GB of RAM.
 
 ## Auto Compressor skips chunks when running on already compressed room
@@ -265,8 +266,8 @@ be a large problem.
 
 ## Compressor is trying to increase the number of rows
 
-Backfilling can lead to issues with compression. The auto_compressor will
-skip chunks it can't reduce the size of and so this should help jump over the backfilled
+Backfilling can lead to issues with compression. The synapse_auto_compressor will
+skip chunks it can't reduce the size of, and so this should help jump over the backfilled
 state_groups. Lots of state resolution might also impact the ability to use the compressor.
 
 To examine the state_group hierarchy run the manual tool on a room with the `-g` option
````

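As an aside on the `-p` option documented above: both accepted connection-string forms are parsed by tokio-postgres, whose `Config` docs the README links to. A minimal sketch for checking a value before pointing the compressor at a database, assuming a scratch project with `tokio-postgres = "0.7"` as a dependency and using placeholder credentials:

```rust
// Sketch: validating the two `-p` connection-string forms with
// tokio_postgres::Config (tokio-postgres 0.7, the crate the README's
// docs link points at). Credentials and hostname are placeholders.
use tokio_postgres::Config;

fn main() {
    // URL form, as passed to `synapse_auto_compressor -p ...`
    let url: Config = "postgresql://username:password@mydomain.com/database"
        .parse()
        .expect("invalid URL-style connection string");

    // Equivalent key-value form
    let kv: Config = "user=username password=password dbname=database host=mydomain.com"
        .parse()
        .expect("invalid key-value connection string");

    // Both forms should describe the same database and user.
    assert_eq!(url.get_dbname(), kv.get_dbname());
    assert_eq!(url.get_user(), kv.get_user());
    println!("both forms parse; dbname = {:?}", url.get_dbname());
}
```

Parsing up front like this is a cheap way to catch a malformed `-p` value before a long compression run starts.
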
compressor_integration_tests/Cargo.toml

+2 −2

```diff
@@ -13,9 +13,9 @@ postgres = "0.19.0"
 postgres-openssl = "0.5.0"
 rand = "0.8.0"
 synapse_compress_state = { path = "../", features = ["no-progress-bars"] }
-auto_compressor = { path = "../auto_compressor/" }
+synapse_auto_compressor = { path = "../synapse_auto_compressor/" }
 env_logger = "0.9.0"
 log = "0.4.14"
 
 [dependencies.state-map]
-git = "https://github.com/matrix-org/rust-matrix-state-map"
+git = "https://github.com/matrix-org/rust-matrix-state-map"
```

compressor_integration_tests/src/lib.rs

+4 −4

```diff
@@ -179,7 +179,7 @@ fn collapse_state_with_database(state_group: i64) -> StateMap<Atom> {
     // the predecessor (so have split this into a different query)
     let query_pred = r#"
         SELECT prev_state_group
-        FROM state_group_edges 
+        FROM state_group_edges
         WHERE state_group = $1
     "#;
 
@@ -243,7 +243,7 @@ pub fn database_structure_matches_map(state_group_map: &BTreeMap<i64, StateGroup
     // the predecessor (so have split this into a different query)
     let query_pred = r#"
         SELECT prev_state_group
-        FROM state_group_edges 
+        FROM state_group_edges
        WHERE state_group = $1
     "#;
 
@@ -356,7 +356,7 @@ fn functions_are_self_consistent() {
 }
 
 pub fn setup_logger() {
-    // setup the logger for the auto_compressor
+    // setup the logger for the synapse_auto_compressor
     // The default can be overwritten with RUST_LOG
     // see the README for more information
     if env::var("RUST_LOG").is_err() {
@@ -366,7 +366,7 @@ pub fn setup_logger() {
         // default to printing the debug information for both packages being tested
         // (Note that just setting the global level to debug will log every sql transaction)
         log_builder.filter_module("synapse_compress_state", LevelFilter::Debug);
-        log_builder.filter_module("auto_compressor", LevelFilter::Debug);
+        log_builder.filter_module("synapse_auto_compressor", LevelFilter::Debug);
         // use try_init() in case the logger has been set up by some previous test
         let _ = log_builder.try_init();
     } else {
```

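The `setup_logger` change above is the test-suite counterpart of the README's RUST_LOG advice. For reference, a standalone sketch of the same default-to-debug behaviour, assuming only the `env_logger = "0.9"` and `log = "0.4"` crates already listed in the integration tests' Cargo.toml (the `main` wrapper is illustrative, not repository code):

```rust
// Sketch: default both compressor crates to debug-level logging unless
// RUST_LOG is set, mirroring setup_logger() in the diff above.
use env_logger::Builder;
use log::LevelFilter;
use std::env;

fn main() {
    if env::var("RUST_LOG").is_err() {
        // No user preference: enable debug output for the two compressor
        // crates only, so e.g. the Postgres client stays quiet.
        let mut log_builder = Builder::new();
        log_builder.filter_module("synapse_compress_state", LevelFilter::Debug);
        log_builder.filter_module("synapse_auto_compressor", LevelFilter::Debug);
        let _ = log_builder.try_init();
    } else {
        // RUST_LOG is set: honour it, e.g.
        // RUST_LOG=synapse_auto_compressor=debug,synapse_compress_state=debug
        let _ = Builder::from_default_env().try_init();
    }
}
```

With this in place, a run with no environment set prints debug output from the two compressor crates only, while an explicit `RUST_LOG` value overrides the defaults exactly as the README describes.
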
compressor_integration_tests/tests/auto_compressor_manager_tests.rs

+4 −4

```diff
@@ -1,9 +1,5 @@
 use std::collections::BTreeMap;
 
-use auto_compressor::{
-    manager::{compress_chunks_of_database, run_compressor_on_room_chunk},
-    state_saving::{connect_to_database, create_tables_if_needed},
-};
 use compressor_integration_tests::{
     add_contents_to_database, clear_compressor_state, database_collapsed_states_match_map,
     database_structure_matches_map, empty_database,
@@ -14,6 +10,10 @@ use compressor_integration_tests::{
     setup_logger, DB_URL,
 };
 use serial_test::serial;
+use synapse_auto_compressor::{
+    manager::{compress_chunks_of_database, run_compressor_on_room_chunk},
+    state_saving::{connect_to_database, create_tables_if_needed},
+};
 use synapse_compress_state::Level;
 
 #[test]
```

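One note on the `use serial_test::serial;` import kept by this change: the integration tests share a single Postgres database, so they are annotated to run one at a time rather than in parallel. A hypothetical minimal example of that pattern (test names and bodies invented for illustration; the real tests live in the file above):

```rust
// Sketch: the #[serial] pattern used by the integration tests above.
use serial_test::serial;

#[test]
#[serial]
fn compresses_first_chunk() {
    // would set up the shared Postgres database, run the compressor on a
    // chunk, and assert on the resulting state
}

#[test]
#[serial]
fn compresses_second_chunk() {
    // runs only after compresses_first_chunk has finished, because both
    // tests are marked #[serial] and share one database
}
```
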