Skip to content

docs: ADR for an upload API targeted towards the browser #1554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ctron
Copy link
Contributor

@ctron ctron commented Apr 11, 2025

Copy link
Collaborator

@JimFuller-RedHat JimFuller-RedHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really care about the state of any specific sbom ? I would think the only time we care is if/when there was a problem which might remove the need for an upload API ... so gets a LGTM in terms of design but I wonder if its worth the effort.

Copy link
Member

@carlosthe19916 carlosthe19916 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I just added a minor observation about the state cleanup definition.

* When the backend finished processing the upload
* It sets the final `state` (`failed` or `succeeded`) and the `result`
* It stops updating the `updated` column
* The backend cleans up (deletes) all entries with a "stale" `updated` timestamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clean up should be done by the client and not by the server. E.g.

  • Client uploads file
  • Client keeps monitoring /api/v2/upload/{id} every 5 seconds.
  • Once the client gets a valid response at /api/v2/upload/{id} then the client stop requesting /api/v2/upload/{id} and deletes it

If the server decides to delete the upload state then the client might keep trying to fetch /api/v2/upload/{id} and all of a sudden get a 404 because the server deleted it without the client knowing about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the client should be in charge of cleaning it up. There's a bunch of cases where the client won't be able to. So we'd either have a growing table of stale entries. Or we need to implement it anyway.

Copy link
Contributor

@jcrossley3 jcrossley3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall one of the primary motivations for moving from V1 to V2 was that it was very confusing for the user to tell whether a document they uploaded ever got processed successfully.

This smells like backsliding into that same territory.

I'm not wholly against it, but I would like it to be designed from the user's perspective, and I would like real measurements and bug reports indicating that it's actually a problem we need to address.

For example, what's the size threshold beneath which the user has a perfectly satisfying experience uploading an SBOM today? How common is it that users attempt to upload files bigger than that threshold?

If we can determine that size, maybe the UX simply returns a helpful "exceeds size" message that instructs them how to use the dataset api instead.

Maybe all the suggestions in this ADR be applied to the dataset api, making it more robust and more user friendly?

I like our simple upload UX feature as it is. It's a 1000x better than V1, so I'd prefer to keep it simple for the common case, if possible.

@ctron
Copy link
Contributor Author

ctron commented Apr 14, 2025

I recall one of the primary motivations for moving from V1 to V2 was that it was very confusing for the user to tell whether a document they uploaded ever got processed successfully.

And that's still the case. We are able to track this. And this API allows you doing that.

I'm not wholly against it, but I would like it to be designed from the user's perspective, and I would like real measurements and bug reports indicating that it's actually a problem we need to address.

We got a bunch of JIRAs specifically for this already. That's the motivation.

If we can determine that size, maybe the UX simply returns a helpful "exceeds size" message that instructs them how to use the dataset api instead.

I don't think the user should be involved in that. The UI should deal with this. There would be no difference from a user's perspective. For the user, it just works. (Compared to right now, where it just fails).

Maybe all the suggestions in this ADR be applied to the dataset api, making it more robust and more user friendly?

The ADR defines a new API. That could including uploading datasets as well. But that would just be another format type.

I like our simple upload UX feature as it is. It's a 1000x better than V1, so I'd prefer to keep it simple for the common case, if possible.

And it creates a lot of JIRAs.

Copy link
Member

@carlosthe19916 carlosthe19916 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! We also have a REST proposal, which makes easier to understand the final output of data.

I see there is a proposed endpoint DELETE /api/v2/upload/{id}. Am I right interpreting it as the client being responsible of deleting Uploads? I think it makes sense but previously in another comment @ctron you gave me the impression that you didn't like that idea as it would generate a growing table of stale entries.

Comment on lines +50 to +82
### REST API

* `GET /api/v2/upload/{id}`: Get information about the upload

Response (`200 OK`):

```json5
{
"id": "opaque-unique-id",
"state": "processing", // or failed, succeeded
"updated": "2025-05-07T10:13:27Z", // always UTC,
"result": {} // or absent for `processing`, `failed`
}
```

* `DELETE /api/v2/upload/{id}`: Delete the state record, will not receive further updates

Response (`204 No Content`): Sent if found or if not found.

* `POST /api/v2/upload`: Start an upload
Request:
* `format`: Format of the document, defaults to "auto-detect". Can also be `sbom` or `advisory`.

Response (`202 Accepted`):

```json5
{
"id": "opaque-unique-id",
"format": "concrete-format" // e.g. "spdx"
}
```


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the flow will be:

  • POST /api/v2/upload . Generates response (202 - Accepted):
{
     "id": "opaque-unique-id",
}
  • Then the client need to watch continuously the upload using. GET /api/v2/upload/{id} where id is the id generated in the previous step. The response will be:

    {
      "id": "opaque-unique-id",
      "state": "processing", // or failed, succeeded
    }
  • Finally, once the client wants to stop monitoring the upload the endpoint DELETE /api/v2/upload/{id} should be called.

I think that should work and cover all issues reported by QE.

On a side note

A crazy idea came to me while reading this ADR:

Would it be crazy to have an endpoint GET /api/v2/upload that list all uploads (with pagination in place)?

Given the fact that we have the endpoint DELET /api/v2/upload/{id} I guess the client is in charge of deleting Uploads. Then having a list of all Existing uploads would help to know which are the uploads that are pending to be cleared

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's the idea.

The downside with the enumeration endpoint is, that we'd need to somehow tie in authorization. Right now, we lack proper stuff anyway. The question is: why do we need it? Clearing up is a responsibility of the backend. I don't want to make the API more complex than we really need. If we do, ok. But let's wait for this use case.

@ctron
Copy link
Contributor Author

ctron commented May 12, 2025

I see there is a proposed endpoint DELETE /api/v2/upload/{id}. Am I right interpreting it as the client being responsible of deleting Uploads? I think it makes sense but previously in another comment @ctron you gave me the impression that you didn't like that idea as it would generate a growing table of stale entries.

I still don't like it 😁 However, I think it makes sense allowing the client to perform this action anyway. Kind of a "best effort'. But not as a "responsibility", but more as an optimization.

Should the frontend/client be willing to clean up. Fine. But the backend would clean up in any case. Where the timeout for that should be configurable, and in the area of like 15mins or more.

@carlosthe19916
Copy link
Member

One other idea: when we initiate an upload, it can be optional/intentional the creation of an Upload instance GET /api/v2/upload/{id}. E.g.

  • I upload a file using my terminal (I won't expect to monitor/watch the process) so I will do POST /api/v2/upload. Just like the current upload process we have in the main branch.
  • I upload a file using the UI so I do expect to monitor the process so I will do POST /api/v2/upload?watch=true which should allow me to do GET /api/v2/upload/{id}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants