feat(storage): backend for SQLite on job storage #125
Conversation
Pull request overview
Adds a SQLite-backed storage layer to persist jobs across process restarts/crashes, replacing the prior in-memory job map in the storage process.
Changes:
- Introduces a pluggable `Backend` trait with a `SqliteBackend` implementation plus SQL migrations.
- Updates the storage process initialization to accept a home directory and create/use `storage.sqlite`.
- Adds SQLx offline workflow support (`SQLX_OFFLINE`, `.sqlx` metadata, `sqlx-prepare`) and documents local dev prerequisites.
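For context, the pluggable abstraction described above might look roughly like the sketch below. The trait and method names (`Backend`, `create_job`, `claim_jobs`) come from this review; the stand-in `Job` struct, the synchronous signatures, and the in-memory `MemBackend` are simplifications for illustration — the real trait is presumably async over SQLx and also exposes retrieval/update methods.

```rust
use std::time::SystemTime;

// Minimal stand-in for the real Job type.
#[derive(Clone, Debug, PartialEq)]
pub struct Job {
    pub id: String,
    pub scheduled_at: SystemTime,
}

// Pluggable storage backend, simplified to two of the methods this review mentions.
pub trait Backend {
    fn create_job(&mut self, job: Job) -> Result<Job, String>;
    fn claim_jobs(&mut self, max: usize, start: SystemTime, end: SystemTime)
        -> Result<Vec<Job>, String>;
}

// Trivial in-memory implementation, standing in for SqliteBackend.
#[derive(Default)]
pub struct MemBackend {
    jobs: Vec<Job>,
}

impl Backend for MemBackend {
    fn create_job(&mut self, job: Job) -> Result<Job, String> {
        self.jobs.push(job.clone());
        Ok(job)
    }

    fn claim_jobs(&mut self, max: usize, start: SystemTime, end: SystemTime)
        -> Result<Vec<Job>, String>
    {
        // Claim up to `max` jobs whose scheduled time falls in [start, end],
        // removing them from the pending set.
        let claimed: Vec<Job> = self
            .jobs
            .iter()
            .filter(|j| j.scheduled_at >= start && j.scheduled_at <= end)
            .take(max)
            .cloned()
            .collect();
        self.jobs.retain(|j| !claimed.contains(j));
        Ok(claimed)
    }
}
```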
Reviewed changes
Copilot reviewed 13 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/storage/src/lib.rs` | Switches storage from in-memory state to a backend abstraction and wires IPC handlers to backend calls. |
| `src/storage/src/backend.rs` | Adds the `Backend` trait and exports the sqlite backend module. |
| `src/storage/src/backend/sqlite.rs` | Implements SQLite persistence using SQLx (migrations, CRUD-ish operations, claiming). |
| `src/storage/src/backend/sqlite/migrations/001_create_table_jobs.sql` | Creates the jobs table and supporting indexes. |
| `src/storage/Cargo.toml` | Adds dependencies needed for the SQLx/SQLite backend (sqlx, serde_json, async-trait, etc.). |
| `src/mate/src/proto/job.rs` | Centralizes job status/result strings into constants and uses them in `Display`/`FromStr`. |
| `src/cli/src/process/storage.rs` | Makes `StorageProcess::new` async and passes the home directory through. |
| `src/cli/src/cli/cmd/component/storage.rs` | Computes the storage home path and passes it into the storage process. |
| `README.md` | Documents dev prerequisites (Rust toolchain, SQLx CLI). |
| `Justfile` | Adds a `sqlx-prepare` helper and a `dkc` helper task. |
| `Cargo.toml` | Adds sqlx to workspace dependencies. |
| `Cargo.lock` | Locks new transitive dependencies from SQLx and sqlite support. |
| `.sqlx/query-ad36629d0f34f5d236818a2ad2a236cf5dc9c1c33cec41a5325cd954000f3e1a.json` | Adds SQLx offline metadata for the insert query. |
| `.gitignore` | Ignores `.env`. |
| `.env.example` | Provides a sample `DATABASE_URL` for SQLx tooling. |
| `.cargo/config.toml` | Enables SQLx offline mode for builds. |
Files not reviewed (1)
- .sqlx/query-ad36629d0f34f5d236818a2ad2a236cf5dc9c1c33cec41a5325cd954000f3e1a.json: Language not supported
Comments suppressed due to low confidence (7)
src/storage/src/lib.rs:94
`ClaimJobs` receives a `(start, end)` window from the scheduler, but storage discards `start` and uses `SystemTime::now()` instead. This can lead to inconsistent behavior if the scheduler and storage clocks differ or if the scheduler ever sends a non-"now" start. Consider passing the provided `start` through to `backend.claim_jobs(...)`.

```rust
MessagePayload::ClaimJobs((_, end)) => {
    match self
        .backend
        .claim_jobs(MAX_JOBS_PER_BATCH, SystemTime::now(), end)
        .await
```
src/storage/src/backend/sqlite.rs:174
`update_job_completed` ignores the case where the job id doesn't exist: `execute(...)` returning 0 affected rows still results in `Ok(())`. Consider checking `rows_affected()` and returning an error when no row was updated so callers get a reliable signal (this also matches the previous behavior, which surfaced "job not found").

```rust
sqlx::query("UPDATE jobs SET status = ?, result = ?, completed_at = ? WHERE id = ?")
    .bind(status)
    .bind(result_json)
    .bind(completed_at)
    .bind(id)
    .execute(&self.pool)
    .await?;
Ok(())
```
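A minimal sketch of the suggested check, with a hypothetical helper name and a plain `String` error standing in for whatever error type the crate uses:

```rust
// Hypothetical helper: turn "0 rows updated" into an explicit error so a
// missing job id surfaces instead of silently returning Ok(()).
fn ensure_row_updated(rows_affected: u64, id: &str) -> Result<(), String> {
    if rows_affected == 0 {
        Err(format!("job not found: {id}"))
    } else {
        Ok(())
    }
}
```

At the call site this would be something like `ensure_row_updated(result.rows_affected(), &id)?` on the value returned by `.execute(&self.pool).await?`.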
src/storage/src/backend/sqlite.rs:124
`create_job` only inserts a subset of `Job` fields, so persisted rows rely on DB defaults for `errors`, `attempts`, and `max_attempts`. This drops any non-default `max_attempts` set on the `Job` before storing (and would also ignore existing `errors`/`attempts` if those ever need to be persisted). Consider inserting these fields explicitly (and `started_at`/`completed_at`/`result` when present) to ensure round-tripping is lossless.

```rust
async fn create_job(&self, job: Job) -> Result<Job> {
    let id = job.id.to_string();
    let args = serde_json::to_string(&job.args)?;
    let task = job.task.to_string();
    let status = job.status.to_string();
    let scheduled_at = into_unix_timestamp(job.scheduled_at)?;
    let record = sqlx::query_as!(
        JobRecord,
        r#"
        INSERT INTO jobs (
            id,
            name,
            args,
            status,
            scheduled_at,
            task,
            started_at,
            completed_at
        ) VALUES (
            $1,
            $2,
            $3,
            $4,
            $5,
            $6,
            $7,
            $8
        ) RETURNING *"#,
        id,
        job.name,
        args,
        status,
        scheduled_at,
        task,
        Option::<i64>::None,
        Option::<i64>::None,
    )
```
src/storage/src/lib.rs:29
`Storage::new` builds the sqlite path via `home.to_str().unwrap()`, which can panic on non-UTF8 paths, and it doesn't ensure the parent directory exists (e.g. `~/.mate`). This can make first-run startup fail. Consider creating the directory with `std::fs::create_dir_all(&home_parent)` and passing a `&Path`/`PathBuf` into `SqliteBackend::new` instead of unwrapping a UTF-8 string.

```rust
pub async fn new(transport: Box<dyn Transport>, home: PathBuf) -> Result<Self> {
    let ipc = Arc::new(IpcServer::new(IPC_SENDER_STORAGE, transport));
    let home = home.join("storage.sqlite");
    let backend = Arc::new(SqliteBackend::new(home.to_str().unwrap()).await?);
```
src/storage/src/lib.rs:98
On backend errors, `QueryJobs` and `ClaimJobs` currently return `JobsResult(vec![])`, which makes storage/database failures indistinguishable from "no jobs" and can cause the scheduler/API to silently stop seeing jobs. At minimum, log the error; ideally propagate an error payload (or add a dedicated error response) so callers can surface/alert on storage failures.

```rust
MessagePayload::QueryJobs(query) => match self.backend.retrieve_jobs(query).await {
    Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
    Err(_err) => Some(MessagePayload::JobsResult(vec![])),
},
MessagePayload::ClaimJobs((_, end)) => {
    match self
        .backend
        .claim_jobs(MAX_JOBS_PER_BATCH, SystemTime::now(), end)
        .await
    {
        Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
        Err(_err) => Some(MessagePayload::JobsResult(vec![])),
    }
```
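One hedged option for the "propagate an error payload" suggestion, using stand-in types — the real `MessagePayload` and `Job` live in the mate proto, and the `JobsError` variant here is hypothetical:

```rust
// Stand-in reply type; JobsError is a hypothetical new variant.
#[derive(Debug, PartialEq)]
enum Reply {
    JobsResult(Vec<String>),
    JobsError(String),
}

// Map the backend result without swallowing failures: log the error, then
// return an error-carrying payload the caller can distinguish from "no jobs".
fn reply_for(res: Result<Vec<String>, String>) -> Reply {
    match res {
        Ok(jobs) => Reply::JobsResult(jobs),
        Err(err) => {
            eprintln!("storage backend error: {err}"); // or tracing::error!
            Reply::JobsError(err)
        }
    }
}
```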
src/cli/src/cli/cmd/component/storage.rs:25
`home_dir().unwrap()` can panic (and `std::env::home_dir` is deprecated). Since this path is used to place the sqlite DB, consider using a non-deprecated home resolution (or reading from config) and returning a clear error when the home directory can't be determined. Also ensure the `~/.mate` directory is created before starting storage so sqlite file creation doesn't fail due to a missing parent directory.

```rust
let config = Config::from_file(&self.config)?;
let mut home = home_dir().unwrap();
home.push(".mate");
let transport = make_transport(config.clone(), ProcessType::Storage).await?;
let mut storage = StorageProcess::new(transport, home).await?;
```
src/storage/src/backend/sqlite.rs:220
`into_system_time` casts `timestamp: i64` to `u64` before building a `Duration`. If a negative timestamp ever gets into the DB, this will wrap to a huge `u64` and can produce a far-future `SystemTime` instead of erroring. Consider validating `timestamp >= 0` before casting, and returning an error for negative values.

```rust
fn into_system_time(timestamp: i64) -> Result<SystemTime> {
    SystemTime::UNIX_EPOCH
        .checked_add(Duration::from_secs(timestamp as u64))
        .context("Invalid timestamp")
```
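The suggested guard could look like this self-contained sketch (the real code presumably returns an `anyhow::Result`; a `String` error stands in here):

```rust
use std::time::{Duration, SystemTime};

// Reject negative timestamps instead of letting `as u64` wrap to a huge value.
fn into_system_time(timestamp: i64) -> Result<SystemTime, String> {
    let secs = u64::try_from(timestamp)
        .map_err(|_| format!("invalid negative timestamp: {timestamp}"))?;
    SystemTime::UNIX_EPOCH
        .checked_add(Duration::from_secs(secs))
        .ok_or_else(|| "timestamp overflows SystemTime".to_string())
}
```

`u64::try_from` makes the sign check explicit and unskippable, rather than relying on a separate `if timestamp < 0` branch.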
Force-pushed from b3f4ea0 to 0d2a717
Pull request overview
Copilot reviewed 13 out of 16 changed files in this pull request and generated 4 comments.
Files not reviewed (1)
- .sqlx/query-ad36629d0f34f5d236818a2ad2a236cf5dc9c1c33cec41a5325cd954000f3e1a.json: Language not supported
Comments suppressed due to low confidence (5)
src/storage/src/lib.rs:29
`Storage::new` builds a filesystem path and then calls `home.to_str().unwrap()`. This can panic on non-UTF8 paths, and it also doesn't ensure the parent directory exists (SQLite will fail to create the DB file if `~/.mate/` hasn't been created). Prefer passing a `Path`/`PathBuf` through to `SqliteConnectOptions::filename` and `create_dir_all(home.parent())` before opening the DB.

```rust
pub async fn new(transport: Box<dyn Transport>, home: PathBuf) -> Result<Self> {
    let ipc = Arc::new(IpcServer::new(IPC_SENDER_STORAGE, transport));
    let home = home.join("storage.sqlite");
    let backend = Arc::new(SqliteBackend::new(home.to_str().unwrap()).await?);
```
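The path handling could be sketched as below. Assumptions: `SqliteConnectOptions::filename` accepts `impl AsRef<Path>`, so no UTF-8 unwrap is needed, and `prepare_db_path` is a hypothetical helper name:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Create the storage home (e.g. ~/.mate) if missing and return the DB path
// as a PathBuf, avoiding the lossy home.to_str().unwrap().
fn prepare_db_path(home: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(home)?;
    Ok(home.join("storage.sqlite"))
}
```

The returned `PathBuf` can then be handed through to `SqliteBackend::new` / `SqliteConnectOptions::filename` directly.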
src/storage/src/lib.rs:94
`ClaimJobs` discards the `start` timestamp provided by the scheduler (`ClaimJobs((_, end))`) and substitutes `SystemTime::now()`. This can shrink/shift the intended claim window and potentially miss jobs near the boundary. Use the `(start, end)` values from the message when calling `backend.claim_jobs`.

```rust
MessagePayload::ClaimJobs((_, end)) => {
    match self
        .backend
        .claim_jobs(MAX_JOBS_PER_BATCH, SystemTime::now(), end)
        .await
```
src/storage/src/lib.rs:103
`handle_message` no longer handles several `MessagePayload` variants that other components send to Storage (e.g. `UpdateJobStatus` from the scheduler, and `JobStarted`/`JobFailed` from executors). As a result, job state transitions won't be persisted/updated. Add match arms for these variants and implement the corresponding backend updates.

```rust
MessagePayload::Ping => Some(MessagePayload::Pong),
MessagePayload::Shutdown => Some(MessagePayload::ShutdownAck),
_ => None,
}
```
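A hedged sketch of the missing arms with stand-in types: the real `UpdateJobStatus`, `JobStarted`, and `JobFailed` variants and the backend methods they should call live in the mate proto and `Backend` trait; the payload fields and return type here are invented for illustration:

```rust
// Stand-in payload with the variants the review says are unhandled.
enum MessagePayload {
    UpdateJobStatus(String, String), // (job id, new status)
    JobStarted(String),              // job id
    JobFailed(String, String),       // (job id, error)
    Ping,
}

// Route every job-state transition to a persistence call instead of
// letting a catch-all `_ => None` drop it.
fn handle(msg: MessagePayload) -> Option<String> {
    match msg {
        MessagePayload::UpdateJobStatus(id, status) => Some(format!("persist {id} -> {status}")),
        MessagePayload::JobStarted(id) => Some(format!("persist {id} -> running")),
        MessagePayload::JobFailed(id, err) => Some(format!("persist {id} failed: {err}")),
        MessagePayload::Ping => Some("pong".to_string()),
    }
}
```

Enumerating the variants without a `_` arm also makes the compiler flag any future payload the storage process forgets to handle.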
src/storage/src/lib.rs:89
On backend errors, `QueryJobs` returns an empty list. This will make API callers think there are simply no jobs instead of surfacing a storage failure. At minimum, log the error; ideally introduce an error-carrying response variant so callers can distinguish failure from an empty result set.

```rust
MessagePayload::QueryJobs(query) => match self.backend.retrieve_jobs(query).await {
    Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
    Err(_err) => Some(MessagePayload::JobsResult(vec![])),
},
```
src/storage/src/lib.rs:98
In the `ClaimJobs` handler, backend errors are swallowed by returning an empty `JobsResult`. This can silently stall scheduling while hiding the underlying storage issue. Please at least log the error (and ideally propagate it in a way the scheduler can detect).

```rust
Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
Err(_err) => Some(MessagePayload::JobsResult(vec![])),
}
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 13 out of 16 changed files in this pull request and generated 2 comments.
Files not reviewed (1)
- .sqlx/query-111a167ea1da690f9b2fdb8112d0fecf80a6ff4d202adc5772bf195ea7ad3a05.json: Language not supported
Comments suppressed due to low confidence (5)
src/cli/src/cli/cmd/component/storage.rs:25
`home` is set to `~/.mate`, but the directory is not created before SQLite is opened. `SqliteConnectOptions::create_if_missing(true)` creates the DB file, but won't create missing parent directories, so storage can fail to start on a fresh machine. Consider creating the directory (e.g., `create_dir_all`) before launching `StorageProcess`.

```rust
let config = Config::from_file(&self.config)?;
let mut home = home_dir().context("Failed to get home directory")?;
home.push(".mate");
let transport = make_transport(config.clone(), ProcessType::Storage).await?;
let mut storage = StorageProcess::new(transport, home).await?;
```
src/storage/src/lib.rs:29
`home.to_str().unwrap()` can panic for non-UTF8 paths, and `SqliteConnectOptions::filename(...)` accepts a `Path`-like value. Prefer passing the `PathBuf`/`&Path` directly (or returning a proper error) instead of unwrapping.

```rust
let home = home.join("storage.sqlite");
let backend = Arc::new(SqliteBackend::new(home.to_str().unwrap()).await?);
```
src/storage/src/lib.rs:89
On backend errors, `QueryJobs` currently returns an empty list, which makes API callers interpret "storage failed" as "no jobs exist". Consider logging the error and propagating it to callers (e.g., change `JobsResult` to carry `Result<Vec<Job>, String>` or add a dedicated error payload).

```rust
MessagePayload::QueryJobs(query) => match self.backend.retrieve_jobs(query).await {
    Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
    Err(_err) => Some(MessagePayload::JobsResult(vec![])),
},
```
src/storage/src/lib.rs:98
Similarly, returning `JobsResult(vec![])` on `claim_jobs` failure hides the fact that the scheduler can't load work and may leave the system idle without any visible error. At minimum, log the error; ideally propagate an error to the scheduler so it can surface/handle it explicitly.

```rust
MessagePayload::ClaimJobs((_, end)) => {
    match self
        .backend
        .claim_jobs(MAX_JOBS_PER_BATCH, SystemTime::now(), end)
        .await
    {
        Ok(jobs) => Some(MessagePayload::JobsResult(jobs)),
        Err(_err) => Some(MessagePayload::JobsResult(vec![])),
    }
```
src/storage/src/backend/sqlite.rs:203
`update_job_completed` doesn't verify that a row was actually updated (e.g., when the job ID doesn't exist). This can silently acknowledge success while losing the completion update. Consider checking `rows_affected()` and returning an error when it's 0.

```rust
JobResult::Success(_) => {
    sqlx::query(
        r#"UPDATE jobs
        SET
            status = 'completed',
            result = ?,
            completed_at = ?,
            attempts = attempts + 1
        WHERE id = ?"#,
    )
    .bind(result_json)
    .bind(completed_at)
    .bind(id)
    .execute(&self.pool)
    .await?;
}
JobResult::Failure(error) => {
    sqlx::query(
        r#"UPDATE jobs
        SET
            status = 'failed',
            result = ?,
            completed_at = ?,
            attempts = attempts + 1,
            errors = json_insert(errors, '$[#]', ?)
        WHERE id = ?"#,
    )
    .bind(result_json)
    .bind(completed_at)
    .bind(error)
    .bind(id)
    .execute(&self.pool)
    .await?;
}
}
Ok(())
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Provides support for storing jobs in SQLite so they persist after the mate server shuts down or crashes.