APIv2: options for memory-bound scaling #7817

Following #7809, this ticket summarizes what we currently know about memory-bound scaling of the SecureDrop server, what we don't know, and some possible next steps. Apart from my first recommendation (1) below, it is mostly a menu of options for future consideration.


## Background

The SecureDrop Journalist App serves submitted files, submitted messages, and replies *as files*, [via Flask's streaming `send_file()`][send_file]. This data is therefore disk-bound for persistence and not memory-bound at all: it is streamed from disk rather than loaded into memory in full.
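For contrast with the metadata endpoints below, here is a stdlib-only sketch of the streaming pattern that keeps file downloads disk-bound. It is a simplification under assumptions: as we understand it, Flask's `send_file()` hands an open file to the WSGI layer (via `wsgi.file_wrapper` when the server provides one), which sends it in blocks. The hypothetical `stream_file()` generator mimics that by yielding fixed-size chunks, so only one chunk is resident at a time regardless of file size.

```python
import os
import tempfile
from typing import Iterator

CHUNK_SIZE = 8192  # hypothetical block size; real WSGI servers choose their own


def stream_file(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks.

    Only one chunk is held in memory at a time, so memory use is bounded
    by chunk_size rather than by the size of the file.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(100_000))
        path = tmp.name
    try:
        sizes = [len(c) for c in stream_file(path)]
        print(len(sizes), max(sizes))  # prints "13 8192"
    finally:
        os.remove(path)
```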

By contrast, the Journalist APIs are memory-bound in how they return metadata about the following (in order of decreasing quantity):
1. submissions + replies
2. sources
3. journalists

That is, the APIs load one or more of these datasets entirely from SQLite into memory in order to serialize them as JSON. The v1 and v2 APIs do so differently, however, and so have different memory bounds.
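A minimal sketch of the memory-bound pattern, using the stdlib `sqlite3` module rather than the SQLAlchemy models SecureDrop actually uses. The `serialize_table()` helper, the `ALLOWED_TABLES` allowlist, and the toy `sources` schema are all hypothetical. Note that the full result set is resident several times over before a single byte of the response is sent.

```python
import json
import sqlite3

# Hypothetical allowlist: table names cannot be bound as SQL parameters,
# so only known names are interpolated into the query.
ALLOWED_TABLES = {"submissions", "replies", "sources", "journalists"}


def serialize_table(conn: sqlite3.Connection, table: str) -> str:
    """Load an entire table into memory and serialize it as a JSON array."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()  # copy 1: every row as a tuple
    records = [dict(zip(columns, row)) for row in rows]  # copy 2: every row as a dict
    return json.dumps(records)  # copy 3: the whole response as one string


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (uuid TEXT, journalist_designation TEXT)")
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?)",
        [("a1", "hypothetical source"), ("b2", "another source")],
    )
    print(serialize_table(conn, "sources"))
```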


## v1 Journalist API

The v1 API has a separate bulk endpoint for each of these datasets:

```mermaid
graph BT

subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> /submissions
replies --> /replies
sources --> /sources
journalists --> /users

subgraph "endpoints (serialized in RAM)"
/submissions
/replies
/sources

/users
end
```
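As a rough model (hypothetical names, not SecureDrop code): each v1 request serializes exactly one table, so the largest resident set for any single v1 request is bounded by the largest table, while a client needs one request per endpoint to sync everything.

```python
# Hypothetical mapping of v1 endpoints to the tables they serialize,
# following the diagram above.
V1_ENDPOINTS = {
    "/submissions": "submissions",
    "/replies": "replies",
    "/sources": "sources",
    "/users": "journalists",
}


def v1_peak_rows(row_counts: dict[str, int]) -> tuple[str, int]:
    """Return the v1 endpoint whose single request loads the most rows.

    Tables missing from row_counts are treated as empty.
    """
    endpoint = max(V1_ENDPOINTS, key=lambda e: row_counts.get(V1_ENDPOINTS[e], 0))
    return endpoint, row_counts.get(V1_ENDPOINTS[endpoint], 0)


if __name__ == "__main__":
    # Row counts from the example database in the v2 section below.
    print(v1_peak_rows({"sources": 906, "submissions": 2564, "replies": 1282}))
```

With those counts, the worst single v1 request is `/submissions` at 2564 rows.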


## v2 Journalist API

The v2 API consolidates endpoints and offers a polling interface (`/index`) that *queries* the entire dataset in order to *transmit* [as little of it as possible][incremental-sync]:

```mermaid
graph BT

index((index))
index --"always<br>(counts)"--> /token
index --"always"--> /index

subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> index
replies --> index
sources --> index
journalists --> index

submissions --worst-case--> /data
replies --worst-case--> /data
sources --worst-case--> /data
journalists --worst-case--> /data

subgraph "endpoints (serialized in RAM)"
/token
/index
/data
end
```
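To illustrate why `/index` is memory-bound even when it transmits nothing, here is a sketch of an index served with an `ETag` and answered with `304 Not Modified` on a matching `If-None-Match`. The use of `ETag` specifically, the index shape (a per-table map of UUID to a SHA-256 version hash), and all function names are assumptions for illustration; the actual format is specified in API2.md.

```python
import hashlib
import json
from typing import Optional


def record_version(record: dict) -> str:
    """Hash a canonical serialization, so any change to a record changes its version."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def build_index(tables: dict[str, list[dict]]) -> dict[str, dict[str, str]]:
    """Visit every record in every table: this is the memory-bound part."""
    return {
        name: {rec["uuid"]: record_version(rec) for rec in records}
        for name, records in tables.items()
    }


def index_response(
    index: dict[str, dict[str, str]], if_none_match: Optional[str]
) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body); an unchanged index yields an empty 304."""
    body = json.dumps(index, sort_keys=True, separators=(",", ":")).encode()
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

Even when the client's `ETag` matches and the response is empty, `build_index()` has already touched every record, which is the asymmetry the diagram shows.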

In <https://github.com/freedomofpress/securedrop/pull/7809#pullrequestreview-4159940552>, I found that a database of about 4750 total records—

```sh-session
vagrant@app-prod:~$ sudo -u www-data sqlite3 /var/lib/securedrop/db.sqlite
sqlite> SELECT COUNT() from sources;
906
sqlite> SELECT COUNT() from submissions;
2564
sqlite> SELECT COUNT() from replies;
1282
```

—produced an index of about 0.5 MB. With all the data *going into* that index resident in memory, the server had very little headroom by default, much more with #7809's `malloc_trim()` intervention, and presumably more still if we were to run Apache (and therefore Python) under jemalloc.
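A quick back-of-the-envelope check on those figures, assuming "about 0.5 MB" means roughly 500,000 bytes (journalists were not counted): the index costs on the order of 100 bytes per record on the wire. This says nothing about resident size, which is the open question in (2) below.

```python
# Row counts from the example database above (journalists were not counted).
row_counts = {"sources": 906, "submissions": 2564, "replies": 1282}

# "About 0.5 MB", taken here as 500,000 bytes; the measured figure was approximate.
index_bytes = 500_000

total_records = sum(row_counts.values())
bytes_per_record = index_bytes / total_records

print(total_records)            # prints 4752
print(round(bytes_per_record))  # prints 105
```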


## Open questions and possible next steps

1. I think we should patch a production server to run with jemalloc, use it with the Inbox for a month, and see what it looks like under load. Unless it turns out to be somehow worse than `malloc_trim()` or introduces other problems (e.g., on a grsecurity kernel), it should be preferable for our stack and workload.

    This will let us stop worrying about "Can a server survive long-lived Apache processes holding claims to the entire dataset resident in memory?" and focus on "How can we most efficiently interact with the dataset in memory?"
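A sketch of two helpers for the trial in (1), assuming Linux and that jemalloc is loaded as a shared library (e.g. via `LD_PRELOAD`). The hypothetical `loaded_allocator()` checks `/proc/self/maps` to confirm which allocator a process is actually using, and the hypothetical `trim_heap()` calls glibc's `malloc_trim(0)` through `ctypes`. How #7809 actually invokes `malloc_trim()` may differ; under jemalloc, trimming glibc's heap should be unnecessary.

```python
import ctypes
import ctypes.util


def loaded_allocator(maps_path: str = "/proc/self/maps") -> str:
    """Report whether a jemalloc shared library is mapped into this process.

    Returns "unknown" if the maps file cannot be read (e.g. not Linux).
    """
    try:
        with open(maps_path) as f:
            maps = f.read()
    except OSError:
        return "unknown"
    return "jemalloc" if "libjemalloc" in maps else "default"


def trim_heap() -> bool:
    """Ask glibc to return free heap memory to the OS via malloc_trim(0).

    Returns False if libc or malloc_trim() is unavailable (e.g. non-glibc).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    libc = ctypes.CDLL(libc_name)
    malloc_trim = getattr(libc, "malloc_trim", None)  # missing symbol -> None
    if malloc_trim is None:
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    malloc_trim(0)
    return True


if __name__ == "__main__":
    print(loaded_allocator(), trim_heap())
```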

2. We don't actually know how either (a) dataset size (on disk) or (b) response size (on the wire) relates to (c) resident size (in memory). We could measure this more precisely than I was able to in <https://github.com/freedomofpress/securedrop/pull/7809#pullrequestreview-4159940552> and establish a predictive estimate, e.g., "you need $x$ GB of RAM to comfortably run a SecureDrop server with $y$ total records."
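One way to start on (2), sketched with the stdlib `tracemalloc` module: serialize a synthetic dataset of `n` records and compare the response size with the peak Python heap. The record shape and the `measure()` helper are hypothetical. `tracemalloc` sees only allocations made through Python's allocator, not allocator fragmentation or RSS (the very thing `malloc_trim()` and jemalloc address), so a real measurement would also need RSS, e.g. from `resource.getrusage()` or `/proc/self/status`.

```python
import json
import tracemalloc


def measure(n_records: int) -> tuple[int, int]:
    """Return (response_bytes, peak_python_heap_bytes) for n synthetic records."""
    tracemalloc.start()
    try:
        # Hypothetical record shape; real submissions and replies carry more fields.
        records = [
            {"uuid": f"{i:032x}", "source": f"{i % 100:032x}", "size": i}
            for i in range(n_records)
        ]
        body = json.dumps(records)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(body.encode()), peak


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        response_bytes, peak = measure(n)
        print(f"{n:>7} records: {response_bytes:>9} B on the wire, {peak:>10} B peak heap")
```

Repeating this against real exported metadata at a few sizes would give the (b)-to-(c) ratio needed for an estimate.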

Even if we want to support arbitrarily large servers, however, I would sooner spend engineering effort on these alternatives, in order of preference:

3. Most requests to the Journalist API are reads. Most requests to `/index` will return a zero-length HTTP `304 Not Modified` response, and after a given client's [initial sync][initial-sync], most of its requests to `/data` [will be very small][incremental-sync]. We could adopt a trivial caching pattern, e.g.:
    1. Compute the index on start-up.
    2. Recompute the index on every write via the v2 Journalist API, so that journalists' Inbox sessions propagate their changes quickly.
    3. Recompute the index on some interval (even every 60 seconds!), so that changes from the Source and Journalist Interfaces propagate quickly enough.

    In addition:
    - *This* process could be sharded, so that the server loads only one shard into memory at a time, without requiring any changes in the Inbox.
    - Since the index is global, we could "cache" it by writing it to disk, where it could be served via `send_file()` without loading it into memory at all.

4. The server has supported [sharding] metadata since #7770. We just need to teach the Inbox to take advantage of it.
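The caching pattern in (3) might look like the following sketch. `IndexCache` and its methods are hypothetical, and `compute` stands in for whatever produces the serialized index. The index is written atomically to disk (`tempfile.mkstemp()` plus `os.replace()`), so the returned path could be handed to `send_file()`; the `clock` parameter exists only to make the staleness logic testable.

```python
import os
import tempfile
import threading
import time
from typing import Callable, Optional


class IndexCache:
    """Hypothetical on-disk cache for a precomputed index."""

    def __init__(
        self,
        compute: Callable[[], bytes],
        path: str,
        max_age: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._path = path
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._computed_at: Optional[float] = None
        self.refresh()  # step 1: compute on start-up

    def refresh(self) -> None:
        """Recompute the index and atomically replace the file on disk."""
        with self._lock:
            data = self._compute()
            directory = os.path.dirname(os.path.abspath(self._path))
            fd, tmp = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(data)
                os.replace(tmp, self._path)  # readers never see a partial file
            except BaseException:
                os.unlink(tmp)
                raise
            self._computed_at = self._clock()

    def on_write(self) -> None:
        """Step 2: call after every write via the v2 Journalist API."""
        self.refresh()

    def path_for_serving(self) -> str:
        """Step 3: refresh if older than max_age, then return the file's path."""
        # Benign race: two threads may both see a stale index and both refresh.
        if self._computed_at is None or self._clock() - self._computed_at >= self._max_age:
            self.refresh()
        return self._path
```

Each Apache process would hold its own `IndexCache` in this sketch; coordinating recomputation across processes is left out.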


[incremental-sync]: https://github.com/freedomofpress/securedrop/blob/develop/API2.md#incremental-synchronization
[initial-sync]: https://github.com/freedomofpress/securedrop/blob/a4da8677f0290590c89fa104d80a440c50f84960/API2.md#initial-synchronization
[send_file]: https://github.com/freedomofpress/securedrop/blob/a4da8677f0290590c89fa104d80a440c50f84960/securedrop/journalist_app/utils.py#L568
[sharding]: https://github.com/freedomofpress/securedrop/blob/a4da8677f0290590c89fa104d80a440c50f84960/API2.md#sharding-metadata
