Following #7809, this ticket is intended to summarize what we currently know about memory-bound scaling of the SecureDrop server, what we don't know, and some possible next steps. With the exception of my first recommendation (1) below, this is mostly a menu of options for future consideration.
Background
The SecureDrop Journalist App serves submitted files, submitted messages, and replies as files, via Flask's streaming send_file(). So this data is disk-bound for persistence, not memory-bound at all.
The Journalist APIs are memory-bound in how they return metadata about (in order of decreasing quantity):
submissions + replies
sources
journalists
That is, the APIs load one or more of these datasets entirely from SQLite into memory in order to serialize them into JSON. But they do so differently, with different bounds.
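To make the memory bound concrete, here is a minimal sketch of the load-then-serialize pattern described above. It is not the SecureDrop code: the `dump_table()` helper and the two-column `sources` schema are hypothetical, and it uses only the standard library rather than the app's own database layer.

```python
import json
import sqlite3

KNOWN_TABLES = {"sources", "submissions", "replies", "journalists"}

def dump_table(conn: sqlite3.Connection, table: str) -> str:
    """Load every row of `table` into memory, then serialize it as one JSON string."""
    # Table names cannot be bound as SQL parameters, so accept only known names.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table: {table}")
    conn.row_factory = sqlite3.Row
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()  # whole table is now resident
    return json.dumps([dict(row) for row in rows])  # plus a second, serialized copy

# Toy in-memory database; the schema is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, uuid TEXT)")
conn.executemany("INSERT INTO sources (uuid) VALUES (?)", [(f"s-{i}",) for i in range(3)])
payload = dump_table(conn, "sources")
```

Note that the rows and their JSON serialization coexist in memory until the response has been sent, so peak usage is larger than either the table or the response alone.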
v1 Journalist API
The v1 API has individual bulk endpoints for these datasets:
graph BT
subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> /submissions
replies --> /replies
sources --> /sources
journalists --> /users
subgraph "endpoints (serialized in RAM)"
/submissions
/replies
/sources
/users
end
v2 Journalist API
The v2 API consolidates endpoints and offers a polling interface (/index) that queries the entire dataset in order to transmit as little of it as possible:
graph BT
index((index))
index --"always<br>(counts)"--> /token
index --"always"--> /index
subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> index
replies --> index
sources --> index
journalists --> index
submissions --worst-case--> /data
replies --worst-case--> /data
sources --worst-case--> /data
journalists --worst-case--> /data
subgraph "endpoints (serialized in RAM)"
/token
/index
/data
end
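The trade-off in /index (read everything, send almost nothing) can be sketched as follows. This is only an illustration under assumptions: the index format, the `build_index()` and `respond()` helpers, and the use of an ETag for validation are all hypothetical and not taken from the v2 API.

```python
import hashlib
import json
from typing import Optional, Tuple

def build_index(records: dict) -> dict:
    """Map each record ID to a short digest of its metadata (hypothetical format)."""
    return {
        rid: hashlib.sha256(json.dumps(meta, sort_keys=True).encode()).hexdigest()[:16]
        for rid, meta in records.items()
    }

def respond(index: dict, if_none_match: Optional[str]) -> Tuple[int, str, bytes]:
    """Return (status, etag, body); the body is empty when the client is current."""
    body = json.dumps(index, sort_keys=True).encode()
    etag = hashlib.sha256(body).hexdigest()
    if if_none_match == etag:
        return 304, etag, b""  # nothing to transmit, but the index was still computed
    return 200, etag, body

records = {"source-1": {"starred": False}, "reply-1": {"seen": True}}
status, etag, body = respond(build_index(records), None)             # first poll
status_again, _, body_again = respond(build_index(records), etag)    # unchanged data

records["reply-1"]["seen"] = False
status_changed, _, _ = respond(build_index(records), etag)           # data changed
```

Even the zero-length 304 path walks every record to compute the index, which is why the cost of a poll is bounded by the size of the dataset rather than the size of the response.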
In #7809 (review), I found that a database of about 4750 total records produced an index of about 0.5 MB:
vagrant@app-prod:~$ sudo -u www-data sqlite3 /var/lib/securedrop/db.sqlite
sqlite> SELECT COUNT() from sources;
906
sqlite> SELECT COUNT() from submissions;
2564
sqlite> SELECT COUNT() from replies;
1282
All the data going into that index, if resident in memory, left the server with very little headroom by default; much more with #7809's malloc_trim() intervention; and presumably more still if we were to run Apache, and therefore Python, under jemalloc.
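For readers unfamiliar with malloc_trim(), here is a minimal sketch of calling it from Python via ctypes. How #7809 actually invokes it is not shown in this ticket, so the `trim_heap()` helper is an assumption; `malloc_trim` itself is a glibc function with the signature `int malloc_trim(size_t pad)`.

```python
import ctypes
import ctypes.util

def trim_heap() -> bool:
    """Ask glibc to return free heap memory to the OS.

    Returns True only if glibc reports that memory was released; returns
    False if nothing was released or malloc_trim is unavailable (non-glibc).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    try:
        libc = ctypes.CDLL(libc_name)
        malloc_trim = libc.malloc_trim  # AttributeError if libc lacks the symbol
    except (OSError, AttributeError):
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(0) == 1

# Allocate and drop a large structure, then ask the allocator to give memory back.
big = [str(i) for i in range(200_000)]
del big
released = trim_heap()
```

The call only asks glibc to release memory it already considers free; it does not reduce what a request needs at its peak.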
Open questions and possible next steps
I think we should patch a production server to run with jemalloc, use it with the Inbox for a month, and see what it looks like under load. Unless it turns out to be somehow worse than malloc_trim() or introduces other problems (e.g., on a grsecurity kernel), it seems like it should be preferable for our stack and workload.
This will let us stop worrying about "Can a server survive long-lived Apache processes holding claims to the entire dataset resident in memory?" and focus on "How can we most efficiently interact with the dataset in memory?"
We don't actually know the relationship of either (a) dataset size (on disk) or (b) response size (on the wire) with (c) resident size (in memory). We could measure this more precisely than I was able to do in #7809 (review), to establish some kind of predictive estimate: e.g., you need $x$ GB of RAM to comfortably run a SecureDrop server with $y$ total records.
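One way such a measurement could be structured is sketched below, assuming synthetic records and a linear relationship. The record shape, `peak_bytes()`, and `fit_line()` are hypothetical, and tracemalloc only counts allocations made through Python's allocator, so a real estimate would need to measure the resident size of the Apache processes instead.

```python
import json
import tracemalloc

def peak_bytes(n_records: int) -> int:
    """Peak Python-level allocation while building and serializing n synthetic records."""
    tracemalloc.start()
    try:
        records = [
            {"id": i, "uuid": f"record-{i:08d}", "seen": False}
            for i in range(n_records)
        ]
        json.dumps(records)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

sizes = [1_000, 2_000, 4_000, 8_000]
peaks = [peak_bytes(n) for n in sizes]
slope, intercept = fit_line(sizes, peaks)
print(f"~{slope:.0f} bytes per record, ~{intercept:.0f} bytes fixed overhead")
```

With a slope and intercept measured on a real server, the estimate for $y$ records would be roughly slope × $y$ + intercept, plus whatever headroom we consider comfortable.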
Even if we want to support arbitrarily large servers, however, I would sooner spend engineering effort on these alternatives, in order of preference:
Most requests to the Journalist API are reads. Most requests to /index will return a zero-length HTTP 304 response; and after a given client's initial sync, most of its requests to /data will be very small. We could adopt a trivial caching pattern, e.g.:
Compute the index on start-up.
Recompute the index on every write via the v2 Journalist API, so that journalists' Inbox sessions propagate their changes quickly.
Recompute the index on some interval (even every 60 seconds!), so that changes from the Source and Journalist Interfaces propagate quickly enough.
In addition:
This process could be sharded, so that the server loads only one shard into memory at a time, without requiring any changes in the Inbox.
Since the index is global, we could "cache" it by writing it to disk, where it could be served via send_file() without loading it into memory at all.
The server has supported sharding the inbox since #7770 ("feat(index): let client request arbitrary metadata shards, hinted at login"). We just need to teach the Inbox to take advantage of it.
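The caching pattern proposed above (compute on start-up, recompute on write, recompute on an interval, keep the result on disk) could look roughly like this. It is a sketch under assumptions, not a design: `IndexCache` and its `compute` callback are hypothetical, and the interval is implemented as a lazy staleness check on read rather than a background timer.

```python
import json
import os
import tempfile
import threading
import time

class IndexCache:
    """Keep one serialized index on disk; rebuild at start-up, on writes, and when stale."""

    def __init__(self, compute, path, max_age=60.0):
        self._compute = compute      # hypothetical callable returning the index as a dict
        self._path = path
        self._max_age = max_age      # seconds before a read triggers a rebuild
        self._lock = threading.Lock()
        self._built_at = 0.0
        self.refresh()               # compute the index on start-up

    def refresh(self):
        """Recompute the index and atomically replace the on-disk copy."""
        with self._lock:
            body = json.dumps(self._compute(), sort_keys=True).encode()
            directory = os.path.dirname(self._path) or "."
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as f:
                f.write(body)
            os.replace(tmp, self._path)  # readers see the old or the new file, never a partial one
            self._built_at = time.monotonic()

    def path(self):
        """Return the file to serve, refreshing it first if it is older than max_age."""
        if time.monotonic() - self._built_at > self._max_age:
            self.refresh()
        return self._path

# Usage with a toy dataset standing in for the database.
state = {"source-1": "v1"}
workdir = tempfile.mkdtemp()
cache = IndexCache(lambda: dict(state), os.path.join(workdir, "index.json"))

state["source-2"] = "v1"   # a write through the API...
cache.refresh()            # ...triggers a recompute
```

The file returned by `path()` is what a streaming response such as send_file() would serve, so the serialized index only needs to be in memory while it is being rebuilt.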