Persist sandbox information locally in orchestrator #376

tychoish · 2025-03-05T15:58:50Z

This sets up a SQLite database in the orchestrator to track
information about sandboxes locally, rather than in the API server,
in support of rolling deployments.

API server(s) can't be responsible for this information anymore
because they might restart (because of deploys), there might be more
than one of them at a time (for redundancy, high availability, or as a
side effect of deployment), and we don't want to add a load-bearing
system of record for sandbox information that the API servers would
have to access. Letting orchestrators be the system of record makes
sense because they already have the information, and the lifecycle of
a sandbox is (at the moment) tied to the lifecycle of the
orchestrator.

There are many implementation possibilities, but I/we went with SQLite
because:

- it's well understood and battle-tested. It's also fast for this
  kind of workload, and we can avoid writing filters/queries in Go and
  just use SQL.
- it can be used with the DB tooling we already have. (n.b. this is
  the first time I've really used this ORM tool, and it's pretty nice.)
- writing data to disk means that we can be less worried about a large
  number of short-running sandboxes filling up memory, and we can be
  less aggressive about removing data because there's (likely) plenty
  of disk space. We can rely on SQLite's caching mechanism (rather
  than Go or our own implementation) to keep or release data from
  memory.
- because (at the moment) orchestrators never restart or get
  redeployed, we don't have to worry about schema or data migration:
  realistically, every time the orchestrator starts, the database will
  be empty. In the future when we might be able to add
- if the API servers' view of what's running on the orchestrator is no
  longer strictly consistent (because there might be many of them,
  they might restart, etc.), then we need to keep a record of not just
  what is running but what has run recently, so we can make sure to
  bill correctly and so we can distinguish between "this sandbox
  doesn't exist anywhere" and "this sandbox used to exist."
- embedded in this implementation are version numbers for both
  sandboxes (as they change) and a global version number for all the
  data in the database/orchestrator. The idea here is that if we
  increment these numbers correctly when modifying the data in SQLite,
  we can provide an interface that the API servers can use to
  efficiently determine whether their cache is out of date.
  - because we store the global version number in the sandbox record,
    you can get all of the sandboxes that have been modified or
    created since your last view.
  - you can compare integers per record or per orchestrator to figure
    out if your data is stale, rather than needing a more complex
    algorithm.

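
A minimal sketch of the versioning idea described in the last bullet, assuming those semantics: every write bumps one store-wide counter and stamps it on the changed row, so "what changed since my last view" becomes a single integer comparison. To stay runnable without a database driver it keeps rows in a Go map behind one mutex instead of SQLite; the names (`versionedStore`, `upsert`, `changedSince`, `lookup`), the `global_version` column in the comment, and the `"running"`/`"stopped"` states are hypothetical, not the PR's actual schema or API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sandboxRecord is a hypothetical row: a per-sandbox version plus the
// store-wide version at which the row was last written.
type sandboxRecord struct {
	ID            string
	State         string // hypothetical states: "running", "stopped"
	Version       int64  // bumped every time this sandbox changes
	GlobalVersion int64  // store-wide counter value at the last write
}

// versionedStore stands in for the SQLite table (assumption: in SQLite the
// counter and the row would be updated in one transaction; the mutex plays
// that role here).
type versionedStore struct {
	mu            sync.Mutex
	globalVersion int64
	rows          map[string]*sandboxRecord
}

func newVersionedStore() *versionedStore {
	return &versionedStore{rows: make(map[string]*sandboxRecord)}
}

// upsert records a change; both counters move under the same lock.
func (s *versionedStore) upsert(id, state string) sandboxRecord {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.globalVersion++
	r, ok := s.rows[id]
	if !ok {
		r = &sandboxRecord{ID: id}
		s.rows[id] = r
	}
	r.State = state
	r.Version++
	r.GlobalVersion = s.globalVersion
	return *r
}

// changedSince is the in-memory analogue of a query such as
// SELECT ... WHERE global_version > ? ORDER BY global_version.
// It also returns the current global version for the caller's next call.
func (s *versionedStore) changedSince(v int64) ([]sandboxRecord, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []sandboxRecord
	for _, r := range s.rows {
		if r.GlobalVersion > v {
			out = append(out, *r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].GlobalVersion < out[j].GlobalVersion })
	return out, s.globalVersion
}

// lookup separates "never existed here" (ok == false) from
// "existed and has since stopped" (ok == true, State == "stopped").
func (s *versionedStore) lookup(id string) (sandboxRecord, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.rows[id]
	if !ok {
		return sandboxRecord{}, false
	}
	return *r, true
}

func main() {
	store := newVersionedStore()
	store.upsert("sbx-a", "running")
	store.upsert("sbx-b", "running")

	// An API server takes a full view and remembers the global version.
	all, seen := store.changedSince(0)
	fmt.Println(len(all), seen) // 2 2

	store.upsert("sbx-a", "stopped")

	// Later it asks only for what changed since that view.
	delta, seen := store.changedSince(seen)
	fmt.Println(len(delta), delta[0].ID, delta[0].State, delta[0].Version, seen) // 1 sbx-a stopped 2 3

	_, ok := store.lookup("sbx-zzz")
	fmt.Println(ok) // false
}
```

The scheme only holds if the counter bump and the row write are atomic with respect to readers; otherwise a reader could record a newer global version while an older write is still in flight and then miss it on the next `changedSince` call.
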
I did attempt to implement this using an in-memory cache rather than
SQLite, which I think would be possible, but our concurrent map is
sharded (to prevent lock contention for modification-heavy
workloads), and handling the version numbers (plus the extra level of
shard versioning) makes things much more complicated from a code
perspective. It's my assessment that SQLite will scale better,
require less code, and be easier to develop against today, in
addition to being something we'll want in the future.

This isn't quite done. Remaining work includes:

- We/I need to rehome the APIs to use data from the database rather
  than from the cache.
- We/I probably need to cache a version of the sandbox structure (with
  more information) in the database (possibly as binary protobuf?) to
  support the APIs.
- Testing, of course. At this point the PR doesn't change the behavior,
  because the old data storage/cache is still the system of record.
- I would like more feedback on this implementation or the use of the
  ORM system.

jakubno

I am really sorry, I didn't mention it, but we are moving away from entgo. Could you please use sqlc?

Here is the setup I currently have.

```yaml
version: "2"
sql:
  - engine: "postgresql"
    queries: "db/queries"
    schema: "db/migrations"
    gen:
      go:
        emit_pointers_for_null_types: true
        package: "database"
        out: "packages/shared/pkg/database"
        sql_package: "pgx/v5"
        overrides:
          - db_type: "uuid"
            go_type:
              import: "github.com/google/uuid"
              type: "UUID"
          - db_type: "uuid"
            nullable: true
            go_type:
              import: "github.com/google/uuid"
              type: "UUID"
              pointer: true

          - db_type: "pg_catalog.numeric"
            go_type: "github.com/shopspring/decimal.Decimal"
          - db_type: "pg_catalog.numeric"
            nullable: true
            go_type: "*github.com/shopspring/decimal.Decimal"

          - db_type: "timestamptz"
            go_type: "time.Time"
          - db_type: "timestamptz"
            go_type:
              import: "time"
              type: "Time"
              pointer: true
            nullable: true
```

dobrac

Looks good. One thing, though: if you already include a way to query the current state/data, you should be able to write (at least) integration tests (which are now merged) to verify the functionality/behavior.

(here is a PR adding the orchestrator client to the integration tests: #403)

packages/orchestrator/internal/db/db.go

packages/orchestrator/internal/sandbox/checks.go

packages/orchestrator/internal/server/main.go

packages/orchestrator/internal/server/sandboxes.go

ValentaTomas · 2025-03-21T01:21:16Z

What is the blocker for merging this?

packages/orchestrator/Makefile

jakubno

waiting for tests

feat: persist sandboxes locally in orchestrator

6d3fa6a

tychoish requested review from ValentaTomas and jakubno as code owners March 5, 2025 15:58

jakubno requested changes Mar 6, 2025

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

f554858

…rrently-cached-by-the-api-server-about-e2b-1394

jakubno added the improvement Improvement for current functionality label Mar 6, 2025

jakubno assigned tychoish and jakubno Mar 6, 2025

tychoish added 4 commits March 7, 2025 08:34

chore: switch to sqlc

07426af

(cherry picked from commit 2bc8680)

feat: track state machine for of orchestrator status

ac9cedc

(cherry picked from commit efefde4)

fix: remove unneeded imports

d17e5d4

(cherry picked from commit e0bdcf8)

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

bf4e94d

…rrently-cached-by-the-api-server-about-e2b-1394

dobrac reviewed Mar 7, 2025

tychoish added 5 commits March 27, 2025 15:21

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

d42f094

…rrently-cached-by-the-api-server-about-e2b-1394

fix compiles and add sandbox configs to orchestrator database

f554c70

code review feedback

0aade1e

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

0e1ea88

…rrently-cached-by-the-api-server-about-e2b-1394

clean up checks

a7088de

tychoish changed the title ~~feat: persist sandbox information locally in orchestrator~~ Persist sandbox information locally in orchestrator Mar 27, 2025

jakubno reviewed Mar 28, 2025

packages/orchestrator/Makefile (outdated, resolved)

jakubno approved these changes Mar 28, 2025

jakubno requested changes Apr 2, 2025

tychoish and others added 5 commits April 2, 2025 16:32

Update packages/orchestrator/Makefile

49759a4

Co-authored-by: Jakub Novák <[email protected]>

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

2bde94c

…rrently-cached-by-the-api-server-about-e2b-1394

feat: process should initialize the database schema

608adc6

orchestrator status report and test

3727a6c

Merge remote-tracking branch 'origin/main' into track-all-the-data-cu…

05d6f45

…rrently-cached-by-the-api-server-about-e2b-1394

ValentaTomas unassigned tychoish Apr 5, 2025

jakubno marked this pull request as draft May 6, 2025 11:31

jakubno closed this May 19, 2025
