docs: ADR for an upload API targeted towards the browser #1554

Merged
merged 1 commit into from
May 21, 2025
152 changes: 152 additions & 0 deletions docs/adrs/00005-ui-upload.md
@@ -0,0 +1,152 @@
# 00005. Upload API for UI

Date: 2025-04-11

## Status

ACCEPTED

## Context

When uploading a document, the request only returns once the document has been fully processed. This may take
several minutes.

With HTTP, there is not always a clear indication of whether the request is still being processed or the connection is
stuck. With ingress stacks such as OCP or AWS in between, it is easy to run into "request timeouts" because the HTTP
channel provides no reliable feedback.

To improve the situation for the UI, the idea is to create a stateful upload API that allows the requester to
initiate an upload, drop off the file on the backend side, and then check on progress and outcome.

While this may work with any command line tool or custom client too, the intention is to design this API for the
UI (console) use case.

## Caveats

* There is some kind of state attached to this flow. The client must not need to know which backend instance to
contact. So either this is irrelevant because the state is shared, or we implement something that routes the request
to the right instance.
* We should think about security upfront. Only the requester of an upload should be able to fetch information about
its progress.

## Proposal

* We store the upload progress in a table
* We add an API for querying that table
* The backend process monitoring the upload needs to perform periodic updates to that table to keep the entry "fresh"
* Stale entries will periodically be cleaned up
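
The periodic update that keeps an entry "fresh" could be sketched as a small heartbeat task. This is only an illustration under assumptions: `start_heartbeat` and the `touch` callback are hypothetical names, and a real backend would use its own task runtime and issue an `UPDATE` on the state table rather than appending to a list.

```python
import threading
import time


def start_heartbeat(touch, interval_seconds):
    """Call `touch()` every `interval_seconds` until the returned event is set.

    `touch` is a hypothetical callback that would refresh the `updated`
    column of the upload's state entry.
    """
    stop = threading.Event()

    def run():
        # Event.wait() returns False on timeout and True once `stop` is set,
        # so the loop ends promptly when processing has finished.
        while not stop.wait(interval_seconds):
            touch()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return stop, thread


# Usage: refresh while the (simulated) processing runs, then stop.
touches = []
stop, thread = start_heartbeat(lambda: touches.append(time.monotonic()), 0.01)
time.sleep(0.1)  # stands in for processing the document
stop.set()       # the final state has been written, stop refreshing
thread.join()
```

Once the heartbeat stops, the entry ages naturally and becomes eligible for the periodic cleanup.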

### Database

The state table looks like this:

| Column | Type | Description |
|---------|----------------------------------------|---------------------------------|
| id | UUID | Unique ID |
| updated | timestamp | Last update timestamp |
| state | enum { processing, failed, succeeded } | The state of the upload process |
| result | JSON | result response |
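
As a rough sketch of how this table might be created and updated, the following uses SQLite from the Python standard library as a stand-in for the real database. The table name `upload_state`, the helper names, and the mapping of `UUID`, `enum`, and `JSON` onto `TEXT` columns are assumptions made for the example only.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema mirroring the columns above. SQLite has no UUID, enum,
# or JSON column types, so TEXT plus a CHECK constraint stands in for them.
SCHEMA = """
CREATE TABLE upload_state (
    id      TEXT PRIMARY KEY,
    updated TEXT NOT NULL,
    state   TEXT NOT NULL CHECK (state IN ('processing', 'failed', 'succeeded')),
    result  TEXT
)
"""


def now_utc() -> str:
    """Current time as an ISO 8601 string, always UTC."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def create_entry(conn, upload_id):
    """Insert a new entry in the `processing` state, without a result."""
    conn.execute(
        "INSERT INTO upload_state (id, updated, state) VALUES (?, ?, 'processing')",
        (upload_id, now_utc()),
    )


def finish_entry(conn, upload_id, state, result):
    """Record the final state and the JSON result."""
    conn.execute(
        "UPDATE upload_state SET state = ?, result = ?, updated = ? WHERE id = ?",
        (state, json.dumps(result), now_utc(), upload_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upload_id = str(uuid.uuid4())
create_entry(conn, upload_id)
finish_entry(conn, upload_id, "succeeded", {"document": "example"})
row = conn.execute(
    "SELECT state, result FROM upload_state WHERE id = ?", (upload_id,)
).fetchone()
```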

### REST API

* `GET /api/v2/upload/{id}`: Get information about the upload

Response (`200 OK`):

```json5
{
  "id": "opaque-unique-id",
  "state": "processing", // or failed, succeeded
  "updated": "2025-05-07T10:13:27Z", // always UTC
  "result": {} // or absent for `processing`, `failed`
}
```

* `DELETE /api/v2/upload/{id}`: Delete the state record. The record will not receive further updates.

Response (`204 No Content`): Sent whether or not the record was found.

* `POST /api/v2/upload`: Start an upload

Request:
* `format`: Format of the document, defaults to "auto-detect". Can also be `sbom` or `advisory`.
* `watch` (default `false`): If present and `true`, state tracking is active. Otherwise, the request falls back to a
synchronous upload, like the existing upload API, but unified for SBOM and advisory.

Response (`202 Accepted`, `watch=true`):

```json5
{
  "id": "opaque-unique-id",
  "format": "concrete-format" // e.g. "spdx"
}
```

Response (`201 Created`, `watch=false`):

```json5
{
  "id": "document-id",
  "format": "evaluated format", // or provided
  "result": {}, // same as the state would have
}
```
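
To make the status codes and response bodies above concrete, here is a minimal in-memory model of the three endpoints. It is a sketch, not the backend implementation: the `UploadApi` class and its method names are hypothetical, the format is passed in instead of being detected, and the `404` for an unknown ID on `GET` is an assumption, since this ADR does not specify that case.

```python
from datetime import datetime, timezone


def now_utc() -> str:
    """Current time as an ISO 8601 string, always UTC."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


class UploadApi:
    """In-memory stand-in; each method returns (status code, body)."""

    def __init__(self):
        self.states = {}

    def post_upload(self, upload_id, detected_format, watch=False):
        if watch:
            # Tracked upload: record the state and return immediately.
            self.states[upload_id] = {
                "id": upload_id,
                "state": "processing",
                "updated": now_utc(),
            }
            return 202, {"id": upload_id, "format": detected_format}
        # Synchronous upload: processing would finish before the response.
        return 201, {"id": upload_id, "format": detected_format, "result": {}}

    def get_upload(self, upload_id):
        entry = self.states.get(upload_id)
        if entry is None:
            return 404, None  # assumption: not specified in this ADR
        return 200, dict(entry)

    def delete_upload(self, upload_id):
        # Idempotent: 204 whether or not the record existed.
        self.states.pop(upload_id, None)
        return 204, None


api = UploadApi()
accepted = api.post_upload("opaque-unique-id", "spdx", watch=True)
polled = api.get_upload("opaque-unique-id")
deleted_once = api.delete_upload("opaque-unique-id")
deleted_twice = api.delete_upload("opaque-unique-id")
```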

### Example flow: success

* Client initiates an upload on the specialized upload API (`POST /api/v2/upload?watch=true`)
* The backend stores the uploaded file in the storage
* The backend adds an entry in the state table, using the digest returned by the storage as `id`
* The backend spawns a task, periodically updating the `updated` timestamp
* The backend returns the `id` and keeps processing the upload
* The client periodically checks the state using the returned `id` (`GET /api/v2/upload/{id}`)
* The client can delete the state entry if it's no longer interested. Future updates will be discarded.
* When the backend has finished processing the upload:
  * It sets the final `state` (`failed` or `succeeded`) and the `result`
  * It stops updating the `updated` column
* The backend cleans up (deletes) all entries with a "stale" `updated` timestamp
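
From the client's side, the polling part of this flow might look like the sketch below. The `wait_for_upload` helper, its parameters, and the injected `fetch_state` callable (which would wrap `GET /api/v2/upload/{id}` and return the parsed body) are assumptions for illustration; the console would implement the equivalent in its own stack.

```python
import time


def wait_for_upload(fetch_state, upload_id, interval_seconds=2.0,
                    max_polls=300, sleep=time.sleep):
    """Poll until the upload reaches a final state, then return the body.

    `fetch_state` is a hypothetical callable wrapping
    `GET /api/v2/upload/{id}` and returning the decoded JSON body.
    """
    for _ in range(max_polls):
        body = fetch_state(upload_id)
        if body["state"] in ("succeeded", "failed"):
            return body
        sleep(interval_seconds)
    raise TimeoutError(f"upload {upload_id} still processing after {max_polls} polls")


# Usage with a fake backend that finishes on the third poll.
replies = iter([
    {"id": "abc", "state": "processing"},
    {"id": "abc", "state": "processing"},
    {"id": "abc", "state": "succeeded", "result": {}},
])
final = wait_for_upload(lambda _id: next(replies), "abc", sleep=lambda _s: None)
```

Each poll is a short request of its own, so no single connection has to stay open for the full processing time.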

## Considerations

### Multiple backend instances

* Any backend can answer questions about the upload state, as the state is stored in the database
* All backends can clean up the upload state table, it is not important which instance does this
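
One way to see why it does not matter which instance cleans up: the cleanup can be a single `DELETE` keyed on the `updated` timestamp, which is harmless to repeat. The sketch below uses SQLite as a stand-in database; the table name `upload_state`, the `cleanup_stale` helper, and the five-minute timeout are assumptions, not values from this ADR.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Fixed-width UTC format, so string comparison matches chronological order.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def cleanup_stale(conn, now, timeout):
    """Delete entries not refreshed within `timeout`; return how many were removed."""
    cutoff = (now - timeout).strftime(TIMESTAMP_FORMAT)
    cursor = conn.execute("DELETE FROM upload_state WHERE updated < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upload_state (id TEXT PRIMARY KEY, updated TEXT NOT NULL)")
conn.execute("INSERT INTO upload_state VALUES ('fresh', '2025-05-07T10:29:50Z')")
conn.execute("INSERT INTO upload_state VALUES ('stale', '2025-05-07T10:13:27Z')")

now = datetime(2025, 5, 7, 10, 30, 0, tzinfo=timezone.utc)
first = cleanup_stale(conn, now, timedelta(minutes=5))   # one instance cleans up
second = cleanup_stale(conn, now, timedelta(minutes=5))  # another finds nothing left
remaining = [row[0] for row in conn.execute("SELECT id FROM upload_state")]
```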

### Security

* The uploader receives an ID for the upload state, which is based on the file's content. Therefore, it can be
assumed that the sender knows the content of the file and may therefore also learn the state of the upload.
* The state will only be available during the time of the upload plus the timeout period for the entry
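
A content-based ID could be derived as in the following sketch. The ADR only says the ID is the digest returned by the storage; the choice of SHA-256, the hex encoding, and the `upload_id_for` name are assumptions made for the example.

```python
import hashlib


def upload_id_for(content: bytes) -> str:
    """Derive an ID from the file content (assumed here: hex SHA-256)."""
    return hashlib.sha256(content).hexdigest()


same_a = upload_id_for(b'{"spdxVersion": "SPDX-2.3"}')
same_b = upload_id_for(b'{"spdxVersion": "SPDX-2.3"}')
other = upload_id_for(b'{"spdxVersion": "SPDX-2.2"}')
```

Under this assumption, identical content always yields the same ID, so the ID is only as secret as the file itself, which matches the reasoning above.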

### Performance

* As the table only holds states for active uploads, the number of entries should be small. Queries go by primary
key (`id`) and should therefore be fast.
* The upload process stores the file first, so it's not necessary to keep an additional copy in memory

### Format detection

In the process of this, we could also try to do some "format detection", which would allow using the same endpoint for
uploading any kind of document. However, I would see this as a stretch goal.
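
Purely as an illustration of what such detection might involve for JSON documents, the sketch below looks for a marker field per format. The `detect_format` function, the returned labels, and the set of formats covered are assumptions; the marker fields used are `spdxVersion` for SPDX, `bomFormat` for CycloneDX, and `document.csaf_version` for CSAF.

```python
import json


def detect_format(raw: bytes) -> str:
    """Guess the document format from top-level JSON marker fields."""
    try:
        doc = json.loads(raw)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return "unknown"
    if not isinstance(doc, dict):
        return "unknown"
    if "spdxVersion" in doc:
        return "spdx"
    if doc.get("bomFormat") == "CycloneDX":
        return "cyclonedx"
    document = doc.get("document")
    if isinstance(document, dict) and "csaf_version" in document:
        return "csaf"
    return "unknown"


spdx = detect_format(b'{"spdxVersion": "SPDX-2.3"}')
cyclonedx = detect_format(b'{"bomFormat": "CycloneDX", "specVersion": "1.5"}')
csaf = detect_format(b'{"document": {"csaf_version": "2.0"}}')
garbage = detect_format(b"not json")
```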

## Alternatives

* Keep the current API and deal with this on the HTTP, Ingress, Load Balancer side

👎 Doesn't really solve the problem

* Find a way to not store the state in the database. One way to achieve this could be to use WebSockets as the upload
channel.

👎 The downside of this is that it might be quite complex, and it doesn't seem like a very common way of uploading
things from the browser.

* Use the existing upload APIs and trigger this behavior with a flag.

👎 The downside of this is that the response of the request varies based on the flag, making the whole request more
complex.

## Consequences

* Add a new upload state table
* Create REST API endpoints for initiating an upload and checking the state