docs: ADR for an upload API targeted towards the browser #1554

Open
wants to merge 2 commits into
base: main
141 changes: 141 additions & 0 deletions docs/adrs/00004-ui-upload.md
@@ -0,0 +1,141 @@
# 00004. Upload API for UI

Date: 2025-04-11

## Status

ACCEPTED

## Context

When uploading a document, the request only returns once the document has been fully processed. This may take
several minutes.

With plain HTTP, it is not always clear whether a request is still being processed or whether the connection is
stuck. Behind ingress stacks such as OCP or AWS, it is easy to run into "request timeouts" because the HTTP channel
provides no reliable feedback.

To improve the situation for the UI, the idea is to create a stateful upload API that allows the requester to
initiate an upload, drop off the file on the backend side, and then check for progress and outcome.

While this may work with any command line tool or custom client too, the intention is to design this API for the
UI (console) use case.

## Caveats

* There is some kind of state attached to this flow. The client must not need to know which backend instance to
contact. Either the instance is irrelevant because the state is shared, or we implement something that routes the
request to the right instance.
* We should think about security upfront. Only the requester of an upload should be able to fetch information about
its progress.

## Proposal

* We store the upload progress in a table
* We add an API for querying that table
* The backend process monitoring the upload needs to perform periodic updates to that table to keep the entry "fresh"
* Stale entries will periodically be cleaned up

### Database

The state table looks like this:

| Column | Type | Description |
|---------|----------------------------------------|---------------------------------|
| id | UUID | Unique ID |
| updated | timestamp | Last update timestamp |
| state | enum { processing, failed, succeeded } | The state of the upload process |
| result | JSON | result response |
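
As a rough illustration of the table above, the following sketch creates it in an in-memory SQLite database. The table name `upload_state`, the use of SQLite, and the `document_id` field in the result are assumptions made for the example; the ADR does not prescribe them.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical table name; SQLite stands in for the real database.
SCHEMA = """
CREATE TABLE upload_state (
    id      TEXT PRIMARY KEY,   -- unique ID
    updated TEXT NOT NULL,      -- last update, UTC, ISO 8601
    state   TEXT NOT NULL
            CHECK (state IN ('processing', 'failed', 'succeeded')),
    result  TEXT                -- JSON document, NULL while processing
)
"""


def now_utc() -> str:
    """Current time as a fixed-width UTC timestamp."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# A new upload starts in the `processing` state without a result.
upload_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO upload_state (id, updated, state) VALUES (?, ?, 'processing')",
    (upload_id, now_utc()),
)

# Finishing the upload sets the final state and the result.
conn.execute(
    "UPDATE upload_state SET state = 'succeeded', result = ?, updated = ? WHERE id = ?",
    (json.dumps({"document_id": "example"}), now_utc(), upload_id),
)

row = conn.execute(
    "SELECT state, result FROM upload_state WHERE id = ?", (upload_id,)
).fetchone()
```

The `CHECK` constraint plays the role of the enum, so an unknown state is rejected by the database rather than by application code.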

### REST API

* `GET /api/v2/upload/{id}`: Get information about the upload

Response (`200 OK`):

```json5
{
  "id": "opaque-unique-id",
  "state": "processing", // or failed, succeeded
  "updated": "2025-05-07T10:13:27Z", // always UTC
  "result": {} // or absent for `processing`, `failed`
}
```

* `DELETE /api/v2/upload/{id}`: Delete the state record. The record will not receive further updates.

Response (`204 No Content`): Sent whether or not the record was found.

* `POST /api/v2/upload`: Start an upload

  Request:
  * `format`: Format of the document, defaults to "auto-detect". Can also be `sbom` or `advisory`.

Response (`202 Accepted`):

```json5
{
  "id": "opaque-unique-id",
  "format": "concrete-format" // e.g. "spdx"
}
```
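
From the client's side, the three endpoints combine into a polling loop. The sketch below is one possible shape, not a prescribed client: `fetch_state` is a hypothetical callable that performs `GET /api/v2/upload/{id}` and returns the parsed JSON body, or `None` if the record is gone. The polling interval and the poll limit are assumptions as well.

```python
import time
from typing import Callable, Optional

FINAL_STATES = ("failed", "succeeded")


def wait_for_upload(
    fetch_state: Callable[[str], Optional[dict]],
    upload_id: str,
    interval: float = 5.0,   # assumed polling interval in seconds
    max_polls: int = 120,    # assumed upper bound before giving up
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the upload state until it is final and return the last response body."""
    for _ in range(max_polls):
        body = fetch_state(upload_id)
        if body is None:
            # The record was deleted or cleaned up as stale.
            raise LookupError(f"upload {upload_id} not found")
        if body["state"] in FINAL_STATES:
            return body
        sleep(interval)
    raise TimeoutError(f"upload {upload_id} still processing after {max_polls} polls")
```

Passing `fetch_state` and `sleep` as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running backend.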


Comment on lines +50 to +82
Member

I guess the flow will be:

  • POST /api/v2/upload, which generates the response (202 Accepted):

    {
      "id": "opaque-unique-id",
    }
  • The client then needs to watch the upload continuously using GET /api/v2/upload/{id}, where id is the id generated in the previous step. The response will be:

    {
      "id": "opaque-unique-id",
      "state": "processing", // or failed, succeeded
    }
  • Finally, once the client wants to stop monitoring the upload, it should call DELETE /api/v2/upload/{id}.

I think that should work and cover all issues reported by QE.

On a side note

A crazy idea came to me while reading this ADR:

Would it be crazy to have an endpoint GET /api/v2/upload that lists all uploads (with pagination in place)?

Given that we have the endpoint DELETE /api/v2/upload/{id}, I guess the client is in charge of deleting uploads. A list of all existing uploads would then help to know which uploads are still pending to be cleared.

Contributor Author

Yea, that's the idea.

The downside of the enumeration endpoint is that we'd need to somehow tie in authorization. Right now, we lack proper support for that anyway. The question is: why do we need it? Cleaning up is a responsibility of the backend. I don't want to make the API more complex than we really need. If we do need it, ok. But let's wait for this use case.

### Example flow: success

* Client initiates an upload on the specialized upload API (`POST /api/v2/upload`)
* The client stores the file in the storage
* The backend adds an entry in the state table, using the digest returned by the storage as `id`
* The backend spawns a task, periodically updating the `updated` timestamp
* The backend returns the `id` and keeps processing the upload
* The client periodically checks the state using the returned `id` (`GET /api/v2/upload/{id}`)
* The client can delete the state entry if it's no longer interested. Future updates will be discarded.
* When the backend has finished processing the upload:
  * It sets the final `state` (`failed` or `succeeded`) and the `result`
  * It stops updating the `updated` column
* The backend cleans up (deletes) all entries with a "stale" `updated` timestamp
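
The lifecycle of a state entry in this flow can be sketched with an in-memory stand-in for the state table. The class name, the method names, and the 60-second staleness threshold are all assumptions for illustration; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

STALE_AFTER = 60.0  # seconds without an update before an entry is stale (assumed value)


@dataclass
class Entry:
    state: str
    updated: float
    result: Optional[dict] = None


class UploadStates:
    """In-memory stand-in for the state table (hypothetical helper)."""

    def __init__(self) -> None:
        self.entries = {}  # maps upload id to Entry

    def start(self, upload_id: str, now: float) -> None:
        self.entries[upload_id] = Entry("processing", now)

    def heartbeat(self, upload_id: str, now: float) -> None:
        # Only entries that are still being processed are kept fresh.
        entry = self.entries.get(upload_id)
        if entry is not None and entry.state == "processing":
            entry.updated = now

    def finish(self, upload_id: str, now: float, state: str, result: Optional[dict]) -> None:
        entry = self.entries.get(upload_id)
        if entry is None:
            return  # the client deleted the entry, so the update is discarded
        entry.state, entry.result, entry.updated = state, result, now

    def delete(self, upload_id: str) -> None:
        self.entries.pop(upload_id, None)  # no error if the entry is already gone

    def cleanup(self, now: float) -> int:
        """Remove stale entries and return how many were removed."""
        stale = [key for key, entry in self.entries.items() if now - entry.updated > STALE_AFTER]
        for key in stale:
            del self.entries[key]
        return len(stale)
```

Because a finished entry no longer receives heartbeats, it ages out through the same cleanup as an abandoned one, which gives the client a bounded window to read the result.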
Member

I think the cleanup should be done by the client and not by the server. E.g.

  • Client uploads file
  • Client keeps monitoring /api/v2/upload/{id} every 5 seconds.
  • Once the client gets a valid response at /api/v2/upload/{id}, the client stops requesting /api/v2/upload/{id} and deletes it

If the server decides to delete the upload state, the client might keep trying to fetch /api/v2/upload/{id} and all of a sudden get a 404, because the server deleted it without the client knowing about it.

Contributor Author

I don't think the client should be in charge of cleaning it up. There's a bunch of cases where the client won't be able to. So we'd either have a growing table of stale entries, or we'd need to implement the cleanup anyway.


## Considerations

### Multiple backend instances

* Any backend can answer questions about the upload state, as the state is stored in the database
* All backends can clean up the upload state table, it is not important which instance does this
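
Cleanup can run on any instance because deleting stale rows is idempotent: a second instance running the same statement simply finds nothing left to delete. The sketch below assumes a table named `upload_state`, timestamps stored as fixed-width UTC strings, and SQLite as a stand-in for the real database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def cleanup_stale(conn: sqlite3.Connection, now: datetime, max_age: timedelta) -> int:
    """Delete entries not updated within max_age and return the number removed."""
    cutoff = (now - max_age).strftime(TIMESTAMP_FORMAT)
    # Fixed-width UTC timestamps sort correctly when compared as strings.
    with conn:
        cursor = conn.execute("DELETE FROM upload_state WHERE updated < ?", (cutoff,))
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upload_state ("
    "id TEXT PRIMARY KEY, updated TEXT NOT NULL, state TEXT NOT NULL, result TEXT)"
)
conn.execute(
    "INSERT INTO upload_state (id, updated, state) "
    "VALUES ('old', '2025-05-07T10:00:00Z', 'processing')"
)
conn.execute(
    "INSERT INTO upload_state (id, updated, state) "
    "VALUES ('fresh', '2025-05-07T10:13:00Z', 'processing')"
)

now = datetime(2025, 5, 7, 10, 13, 27, tzinfo=timezone.utc)
first = cleanup_stale(conn, now, timedelta(minutes=5))   # one instance removes the stale row
second = cleanup_stale(conn, now, timedelta(minutes=5))  # another instance finds nothing left
```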

### Security

* The uploader will receive an ID for the upload state, which is based on the file's content. Therefore, it can be
assumed that anyone presenting the ID knows the content of the file and may know about the state of the upload too.
* The state will only be available during the time of the upload plus the timeout period for the entry
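
To make the content-based ID concrete: if the ID is a digest of the file, only someone holding the same bytes can derive it. The sketch below assumes SHA-256 as the digest; the ADR only says the ID is the digest returned by the storage, so the algorithm and the function name are illustrative.

```python
import hashlib


def upload_id_for(content: bytes) -> str:
    """Derive an upload ID from the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


same_a = upload_id_for(b'{"spdxVersion": "SPDX-2.3"}')
same_b = upload_id_for(b'{"spdxVersion": "SPDX-2.3"}')
other = upload_id_for(b'{"spdxVersion": "SPDX-2.2"}')
```

One consequence of this design is that two clients uploading identical content would share the same ID, and therefore the same state entry.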

### Performance

* As the table only holds states for active uploads, the number of entries should be small. Queries happen by
primary key and should therefore be fast.
* The upload process stores the file first. So it's not necessary to keep an additional copy in memory
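
Storing the file first means the request body can be streamed to storage in chunks instead of being buffered. A minimal sketch, assuming a local directory as the storage and SHA-256 as the digest (both are stand-ins for whatever the real storage layer does):

```python
import hashlib
import os
import tempfile
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # assumed chunk size


def store_stream(source: BinaryIO, directory: str) -> Tuple[str, str]:
    """Copy source into directory chunk by chunk; return (digest, final path)."""
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)  # hash incrementally, never holding the whole file
            out.write(chunk)
    # Name the stored file after its digest once the content is complete.
    final_path = os.path.join(directory, digest.hexdigest())
    os.replace(tmp_path, final_path)
    return digest.hexdigest(), final_path
```

At most one chunk is held in memory at a time, regardless of the size of the uploaded document.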

### Format detection

As part of this work, we could also try to do some "format detection", allowing the same endpoint to be used for
uploading any kind of document. However, I would see this as a stretch goal.
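
One way such detection could work for JSON documents is to look at well-known top-level fields. This is only a sketch of the idea: the function name and the returned labels are assumptions, and a real implementation would likely need to handle non-JSON encodings and avoid parsing large documents in full.

```python
import json


def detect_format(raw: bytes) -> str:
    """Guess the document format from top-level JSON fields; "unknown" if unsure."""
    try:
        doc = json.loads(raw)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return "unknown"
    if not isinstance(doc, dict):
        return "unknown"
    if "spdxVersion" in doc:  # SPDX JSON documents carry this field
        return "spdx"
    if doc.get("bomFormat") == "CycloneDX":  # CycloneDX JSON marker
        return "cyclonedx"
    document = doc.get("document")
    if isinstance(document, dict) and "csaf_version" in document:  # CSAF advisory
        return "csaf"
    return "unknown"
```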

## Alternatives

* Keep the current API and deal with this on the HTTP, Ingress, Load Balancer side

👎 Doesn't really solve the problem

* Find a way to not store the state in the database. One way to achieve this could be to use WebSockets as the upload
channel.

👎 The downside of this is that it might be quite complex, and doesn't seem like a very common way of uploading things
from the browser.

* Use the existing upload APIs and trigger this behavior with a flag.

👎 The downside of this is that the response of the request varies based on the flag, making the whole request more
complex.

## Consequences

* Add a new upload state table
* Create REST API endpoints for initiating an upload and checking the state