docs: ADR for an upload API targeted towards the browser #1554
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
# 00004. Upload API for UI | ||
|
||
Date: 2025-04-11 | ||
|
||
## Status | ||
|
||
ACCEPTED | ||
|
||
## Context | ||
|
||
When uploading a document, the request only returns once the document has been fully processed. This may take | ||
several minutes. | ||
|
||
Over HTTP, there is not always a clear indication of whether the request is still being processed or whether the connection is | ||
stuck. With ingress stacks such as OCP or AWS in between, it is easy to run into "request timeouts" because the HTTP | ||
channel provides no reliable feedback. | ||
|
||
To improve the situation for the UI, the idea is to create a stateful upload API that allows the requester to | ||
initiate an upload, drop off the file on the backend side, and then check for progress and outcome. | ||
|
||
While this may work with any command line tool or custom client too, the intention is to design this API for the | ||
UI (console) use case. | ||
|
||
## Caveats | ||
|
||
* There is some kind of state attached to this flow. The client must not need to know which | ||
backend instance to contact. So either this is irrelevant because the state is shared, or we implement something that | ||
routes the request to the right instance. | ||
* We should think about security upfront. Only the requester of an upload should be able to fetch information about the | ||
progress. | ||
|
||
## Proposal | ||
|
||
* We store the upload progress in a table | ||
* We add an API to query that table | ||
* The backend process monitoring the upload needs to perform periodic updates to that table to keep the entry "fresh" | ||
* Stale entries will periodically be cleaned up | ||
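
The "keep fresh, then clean up" idea above can be sketched as follows. This is a minimal in-memory illustration, not the backend's implementation: the dictionary standing in for the table, the function names `touch` and `purge_stale`, and the five-minute `STALE_AFTER` value are all assumptions, since the ADR does not specify a timeout.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout: the ADR does not say how old an entry must be to count as stale.
STALE_AFTER = timedelta(minutes=5)


def touch(table: dict, upload_id: str, now: datetime) -> None:
    """Refresh `updated` for an entry that is still being processed."""
    entry = table.get(upload_id)
    if entry is not None and entry["state"] == "processing":
        entry["updated"] = now


def purge_stale(table: dict, now: datetime) -> list:
    """Delete every entry whose `updated` timestamp is older than STALE_AFTER."""
    stale = [uid for uid, entry in table.items() if now - entry["updated"] > STALE_AFTER]
    for uid in stale:
        del table[uid]
    return stale
```

Because finished uploads stop being refreshed, they age out after the timeout, while uploads that are still processing survive each cleanup pass.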
|
||
### Database | ||
|
||
The state table looks like this: | ||
|
||
| Column | Type | Description | | ||
|---------|----------------------------------------|---------------------------------| | ||
| id | UUID | Unique ID | | ||
| updated | timestamp | Last update timestamp | | ||
| state | enum { processing, failed, succeeded } | The state of the upload process | | ||
| result | JSON | result response | | ||
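
As a rough illustration of the table above, here is a sketch using SQLite from the Python standard library. The ADR does not name the database, so everything here is an assumption: the table name `upload_state`, the helper names, and the mapping of UUID, enum, and JSON columns onto `TEXT` (SQLite has no native types for them, so a `CHECK` constraint stands in for the enum).

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: column names follow the ADR, types are adapted to SQLite.
SCHEMA = """
CREATE TABLE upload_state (
    id      TEXT PRIMARY KEY,
    updated TEXT NOT NULL,
    state   TEXT NOT NULL CHECK (state IN ('processing', 'failed', 'succeeded')),
    result  TEXT
)
"""


def _now() -> str:
    # Timestamps are always stored in UTC.
    return datetime.now(timezone.utc).isoformat()


def create_entry(conn: sqlite3.Connection, upload_id: str) -> None:
    """Insert a fresh entry in the `processing` state."""
    conn.execute(
        "INSERT INTO upload_state (id, updated, state) VALUES (?, ?, 'processing')",
        (upload_id, _now()),
    )


def finish(conn: sqlite3.Connection, upload_id: str, state: str, result: dict) -> None:
    """Set the final state and store the result as serialized JSON."""
    conn.execute(
        "UPDATE upload_state SET state = ?, updated = ?, result = ? WHERE id = ?",
        (state, _now(), json.dumps(result), upload_id),
    )
```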
|
||
### REST API | ||
|
||
* `GET /api/v2/upload/{id}`: Get information about the upload | ||
|
||
Response (`200 OK`): | ||
|
||
```json5 | ||
{ | ||
"id": "opaque-unique-id", | ||
"state": "processing", // or failed, succeeded | ||
"updated": "2025-05-07T10:13:27Z", // always UTC, | ||
"result": {} // or absent for `processing`, `failed` | ||
} | ||
``` | ||
|
||
* `DELETE /api/v2/upload/{id}`: Delete the state record, will not receive further updates | ||
|
||
Response (`204 No Content`): Sent whether or not the record was found. | ||
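
The delete semantics can be sketched like this: deleting is idempotent, and a later update from the backend for a deleted record is simply dropped. The in-memory dictionary and the names `delete_upload` and `record_update` are hypothetical and only serve the illustration.

```python
from typing import Optional


def delete_upload(table: dict, upload_id: str) -> int:
    """Remove the state record if present; the status is 204 either way."""
    table.pop(upload_id, None)
    return 204


def record_update(table: dict, upload_id: str, state: str, result: Optional[dict] = None) -> bool:
    """Apply a backend update; return False if the record was deleted in the meantime."""
    entry = table.get(upload_id)
    if entry is None:
        return False  # the client lost interest, so the update is discarded
    entry["state"] = state
    if result is not None:
        entry["result"] = result
    return True
```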
|
||
* `POST /api/v2/upload`: Start an upload | ||
Request: | ||
* `format`: Format of the document, defaults to "auto-detect". Can also be `sbom` or `advisory`. | ||
|
||
Response (`202 Accepted`): | ||
|
||
```json5 | ||
{ | ||
"id": "opaque-unique-id", | ||
"format": "concrete-format" // e.g. "spdx" | ||
} | ||
``` | ||
|
||
|
||
### Example flow: success | ||
|
||
* Client initiates an upload on the specialized upload API (`POST /api/v2/upload`) | ||
* The client stores the file in the storage | ||
* The backend adds an entry in the state table, using the digest returned by the storage as `id` | ||
* The backend spawns a task, periodically updating the `updated` timestamp | ||
* The backend returns the `id` and keeps processing the upload | ||
* The client periodically checks the state using the returned `id` (`GET /api/v2/upload/{id}`) | ||
* The client can delete the state entry if it's no longer interested. Future updates will be discarded. | ||
* When the backend has finished processing the upload: | ||
  * It sets the final `state` (`failed` or `succeeded`) and the `result` | ||
  * It stops updating the `updated` column | ||
* The backend cleans up (deletes) all entries with a "stale" `updated` timestamp | ||
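
From the client's point of view, the flow above boils down to a polling loop. The sketch below keeps the HTTP call out of the picture: `fetch_state` is a hypothetical callable that would wrap `GET /api/v2/upload/{id}` and return the parsed JSON body. It is assumed here to return `None` once the entry no longer exists, which the ADR does not specify. The interval and poll limit are illustrative values.

```python
import time
from typing import Callable, Optional

FINAL_STATES = ("failed", "succeeded")


def wait_for_upload(
    fetch_state: Callable[[str], Optional[dict]],
    upload_id: str,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll the upload state until it is final, gone, or the poll limit is hit."""
    for _ in range(max_polls):
        status = fetch_state(upload_id)
        if status is None:
            return None  # entry was deleted or cleaned up as stale
        if status["state"] in FINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"upload {upload_id} did not finish after {max_polls} polls")
```

Treating a missing entry as a terminal outcome keeps the client from polling forever after the backend has cleaned up the record.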
I think the cleanup should be done by the client and not by the server. E.g. if the server decides to delete the upload state, then the client might keep trying to fetch

I don't think the client should be in charge of cleaning it up. There's a bunch of cases where the client won't be able to. So we'd either have a growing table of stale entries, or we'd need to implement it anyway.
||
|
||
## Considerations | ||
|
||
### Multiple backend instances | ||
|
||
* Any backend can answer questions about the upload state, as the state is stored in the database | ||
* All backends can clean up the upload state table, it is not important which instance does this | ||
|
||
### Security | ||
|
||
* The uploader will receive an ID for the upload state, which is based on the file's content. Therefore, it can be | ||
assumed that the sender knows the content of the file and may know about the state of the upload too. | ||
* The state will only be available during the time of the upload plus the timeout period for the entry | ||
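
To make the content-based ID concrete, here is a small sketch of deriving an ID from the uploaded bytes. The ADR only says the ID is the digest returned by the storage. The use of SHA-256, the hex encoding, and the function name `content_id` are assumptions for illustration.

```python
import hashlib
from typing import Iterable


def content_id(chunks: Iterable[bytes]) -> str:
    """Compute a hex digest over the streamed file content (SHA-256 assumed)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

With such a scheme, two uploads of identical content map to the same ID, and nobody can guess the ID without knowing the file's content.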
|
||
### Performance | ||
|
||
* As the table only holds states for active uploads, the number of entries should be small. Queries go by primary | ||
key and should therefore be fast. | ||
* The upload process stores the file first. So it's not necessary to keep an additional copy in memory | ||
|
||
### Format detection | ||
|
||
In the process of this, we could also try to do some "format detection", allowing the same endpoint to be used for | ||
uploading any kind of document. However, I would see this as a stretch goal. | ||
|
||
## Alternatives | ||
|
||
* Keep the current API and deal with this on the HTTP, Ingress, Load Balancer side | ||
|
||
👎 Doesn't really solve the problem | ||
|
||
* Find a way to not store the state in the database. One way to achieve this could be by using websockets as upload | ||
channel. | ||
|
||
👎 The downside of this is that it might be quite complex, and doesn't seem like a very common way of uploading things | ||
from the browser. | ||
|
||
* Use the existing upload APIs and trigger this behavior with a flag. | ||
|
||
👎 The downside of this is that the response of the request varies based on the flag, making the whole request more | ||
complex. | ||
|
||
## Consequences | ||
|
||
* Add a new upload state table | ||
* Create REST API endpoints for initiating an upload and checking the state |
I guess the flow will be:

`POST /api/v2/upload` generates a response (202 - Accepted). Then the client needs to watch the upload continuously using `GET /api/v2/upload/{id}`, where `id` is the `id` generated in the previous step, which returns the current upload state. Finally, once the client wants to stop monitoring the upload, the endpoint `DELETE /api/v2/upload/{id}` should be called.

I think that should work and cover all issues reported by QE.

On a side note

A crazy idea came to me while reading this ADR: would it be crazy to have an endpoint `GET /api/v2/upload` that lists all uploads (with pagination in place)?

Given that we have the endpoint `DELETE /api/v2/upload/{id}`, I guess the client is in charge of deleting uploads. Having a list of all existing uploads would then help to know which uploads are still pending to be cleared.
Yea, that's the idea.
The downside with the enumeration endpoint is that we'd need to somehow tie in authorization. Right now, we lack proper stuff anyway. The question is: why do we need it? Cleaning up is a responsibility of the backend. I don't want to make the API more complex than we really need. If we do, ok. But let's wait for this use case.