
APIv2: options for memory-bound scaling #7817

@cfm


Following #7809, this ticket is intended to summarize what we currently know about memory-bound scaling of the SecureDrop server, what we don't know, and some possible next steps. With the exception of my first recommendation (1) below, this is mostly a menu of options for future consideration.

Background

The SecureDrop Journalist App serves submitted files, submitted messages, and replies as files via Flask's streaming send_file(), so this data is disk-bound for persistence rather than memory-bound.
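To make "streaming" concrete, here is a minimal sketch of the general idea. It is not Flask's or Werkzeug's implementation, and `iter_file_chunks` and its 64 KiB default are hypothetical. A streamed file response only needs one fixed-size chunk in memory at a time, however large the file is on disk.

```python
from typing import Iterator


def iter_file_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks.

    The generator holds at most one chunk at a time, so its own memory use is
    bounded by chunk_size no matter how large the file is on disk.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b"" signals end of file
                break
            yield chunk
```

In Flask, a generator like this could be wrapped in a `Response` to stream it; send_file() handles the equivalent for us.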

The Journalist APIs are memory-bound in how they return metadata about (in order of decreasing quantity):

  1. submissions + replies
  2. sources
  3. journalists

That is, the APIs load one or more of these datasets entirely from SQLite into memory in order to serialize them into JSON. But they do so differently, with different bounds.
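A minimal sketch of that eager pattern, plus one way to start measuring it. The table, columns, and helper names are hypothetical stand-ins, not SecureDrop's actual schema or code. Note that `tracemalloc` only sees allocations made through Python's allocator (not SQLite's own, and not allocator overhead), so it understates resident size; measuring RSS itself would need something else, e.g. reading `/proc/self/status` on Linux.

```python
import json
import sqlite3
import tracemalloc


def make_db(n_records: int) -> sqlite3.Connection:
    """Build a throwaway in-memory table standing in for, e.g., `submissions`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submissions (uuid TEXT, filename TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO submissions VALUES (?, ?, ?)",
        ((f"uuid-{i:08d}", f"{i}-msg.gpg", 1024) for i in range(n_records)),
    )
    return conn


def serialize_all(conn: sqlite3.Connection) -> str:
    """Load every row into memory, then serialize the whole list to one JSON string."""
    rows = conn.execute("SELECT uuid, filename, size FROM submissions").fetchall()
    return json.dumps([{"uuid": u, "filename": f, "size": s} for u, f, s in rows])


def peak_python_bytes(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (peak bytes traced by tracemalloc, JSON length) for serialize_all()."""
    tracemalloc.start()
    try:
        body = serialize_all(conn)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, len(body)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        peak, length = peak_python_bytes(make_db(n))
        print(f"{n:>7} records: {length:>9} bytes of JSON, {peak:>10} bytes peak (Python heap)")
```

Because the rows, the intermediate dicts, and the final string are all alive at once, peak memory grows with the whole dataset rather than with any single record.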

v1 Journalist API

The v1 API has individual bulk endpoints for these datasets:

```mermaid
graph BT

subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> /submissions
replies --> /replies
sources --> /sources
journalists --> /users

subgraph "endpoints (serialized in RAM)"
/submissions
/replies
/sources
/users
end
```

v2 Journalist API

The v2 API consolidates endpoints and offers a polling interface (/index) that queries the entire dataset in order to transmit as little of it as possible:

```mermaid
graph BT

index((index))
index --"always<br>(counts)"--> /token
index --"always"--> /index

subgraph "tables (stored on disk)"
submissions
replies
sources
journalists
end
submissions --> index
replies --> index
sources --> index
journalists --> index

submissions --worst-case--> /data
replies --worst-case--> /data
sources --worst-case--> /data
journalists --worst-case--> /data

subgraph "endpoints (serialized in RAM)"
/token
/index
/data
end
```
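The polling trade-off can be sketched with a conditional GET. This assumes an index that maps each record's UUID to a version string and an ETag derived from it; those details, and every name below, are hypothetical rather than how /index is actually implemented. The point it shows: even when the answer is an empty 304, the server still had to build the index from the entire dataset.

```python
import hashlib
import json
from typing import Mapping, Optional, Tuple


def compute_index(records: Mapping[str, str]) -> str:
    """Serialize a hypothetical index: each record's UUID mapped to a version string.

    This touches every record, so its cost scales with the whole dataset,
    even when the resulting response is empty.
    """
    return json.dumps(dict(records), sort_keys=True, separators=(",", ":"))


def handle_index_request(
    records: Mapping[str, str], if_none_match: Optional[str]
) -> Tuple[int, str, Optional[str]]:
    """Return (status, ETag, body); body is None for 304 Not Modified."""
    body = compute_index(records)
    # A strong ETag is a quoted string; hashing canonical JSON makes equal indexes match.
    etag = '"' + hashlib.sha256(body.encode()).hexdigest() + '"'
    if if_none_match == etag:
        return 304, etag, None
    return 200, etag, body
```

For simplicity this compares a single ETag; a real If-None-Match header may carry a list of them.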

In #7809 (review), I found that a database of about 4750 total records, as counted below, produced an index of about 0.5 MB:

```
vagrant@app-prod:~$ sudo -u www-data sqlite3 /var/lib/securedrop/db.sqlite
sqlite> SELECT COUNT() from sources;
906
sqlite> SELECT COUNT() from submissions;
2564
sqlite> SELECT COUNT() from replies;
1282
```

With all the data going into that index resident in memory, the server had very little memory headroom by default. It had much more with #7809's malloc_trim() intervention, and it would presumably have more still if we ran Apache, and therefore Python, under jemalloc.
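For reference, here is one way to call glibc's malloc_trim() from Python via `ctypes`. It is a sketch of the general technique, not necessarily how #7809 does it, and `try_malloc_trim` is a hypothetical name. It degrades to a no-op on platforms whose C library lacks the symbol.

```python
import ctypes
import ctypes.util


def try_malloc_trim() -> bool:
    """Ask glibc to return free heap memory to the OS, if malloc_trim() exists.

    Returns True if glibc reports that memory was released, and False otherwise,
    including when the C library cannot be found or lacks malloc_trim().
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    try:
        libc = ctypes.CDLL(libc_name)
    except OSError:
        return False
    trim = getattr(libc, "malloc_trim", None)  # missing symbol -> AttributeError -> None
    if trim is None:
        return False
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    # pad=0: leave no free space untrimmed at the top of the heap.
    return trim(0) == 1
```

It only helps after large allocations have actually been freed, e.g. once a heavy response has been serialized and sent. jemalloc takes a different approach by replacing the allocator itself rather than trimming after the fact.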

Open questions and possible next steps

  1. I think we should patch a production server to run with jemalloc, use it with the Inbox for a month, and see what it looks like under load. Unless it turns out to be somehow worse than malloc_trim() or introduces other problems (e.g., on a grsecurity kernel), it should be preferable for our stack and workload.

     This will let us stop worrying about "Can a server survive long-lived Apache processes holding claims to the entire dataset resident in memory?" and focus on "How can we most efficiently interact with the dataset in memory?"

  2. We don't actually know the relationship of either (a) dataset size (on disk) or (b) response size (on the wire) with (c) resident size (in memory). We could measure this more precisely than I was able to do in [2.15.1] Forcibly free memory after heavy APIv2 requests #7809 (review), to establish some kind of predictive estimate: e.g., you need $x$ GB of RAM to comfortably run a SecureDrop server with $y$ total records.

     Even if we want to support arbitrarily large servers, however, I would sooner spend engineering effort on these alternatives, in order of preference:

     1. Most requests to the Journalist API are reads. Most requests to /index will return a zero-length HTTP 304 (Not Modified) response, and after a given client's initial sync, most of its requests to /data will be very small. We could adopt a trivial caching pattern, e.g.:

        1. Compute the index on start-up.
        2. Recompute the index on every write via the v2 Journalist API, so that journalists' Inbox sessions propagate their changes quickly.
        3. Recompute the index on some interval (even every 60 seconds!), so that changes from the Source and Journalist Interfaces propagate quickly enough.

        In addition:

        • This process could be sharded, so that the server loads only one shard into memory at a time, without requiring any changes in the Inbox.
        • Since the index is global, we could "cache" it by writing it to disk, where it could be served via send_file() without loading it into memory at all.
     2. The server has supported sharding the index since feat(index): let client request arbitrary metadata shards, hinted at login #7770. We just need to teach the Inbox to take advantage of it.
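The caching pattern in the first alternative might look roughly like the following minimal sketch. `IndexCache`, its methods, and the file layout are all hypothetical; `compute` stands in for whatever builds the index. It writes the index to disk atomically so a Flask view could hand the path to send_file(). It checks the interval lazily on each read rather than running a background timer, and it does not attempt sharding.

```python
import json
import os
import tempfile
import threading
import time
from typing import Callable


class IndexCache:
    """Hypothetical cache: compute on start-up, recompute on write and on an
    interval, and keep the result on disk so it can be served as a file."""

    def __init__(self, compute: Callable[[], dict], path: str, interval: float = 60.0):
        self._compute = compute
        self._path = path
        self._interval = interval
        self._lock = threading.Lock()
        self._computed_at = 0.0
        self.refresh()  # step 1: compute on start-up

    def refresh(self) -> None:
        """Recompute the index and atomically replace the on-disk copy."""
        with self._lock:
            index = self._compute()
            directory = os.path.dirname(os.path.abspath(self._path))
            fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as f:
                    json.dump(index, f, sort_keys=True)
                # os.replace() is atomic on POSIX: readers never see a partial file.
                os.replace(tmp, self._path)
            except BaseException:
                os.unlink(tmp)
                raise
            self._computed_at = time.monotonic()

    def on_write(self) -> None:
        """Step 2: call after every write through the v2 Journalist API."""
        self.refresh()

    def path_if_fresh(self) -> str:
        """Step 3: recompute if older than `interval`, then return the path to serve."""
        if time.monotonic() - self._computed_at >= self._interval:
            self.refresh()
        return self._path
```

The index is still built in memory at refresh time, so this bounds how often that happens rather than how much memory each build needs; sharding the computation would address the latter.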
