Skip to content

Improve compliance of FastAPIJobFiles with the HTTP spec (files endpoint) #20351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: dev
Choose a base branch
from

Conversation

kysrpex
Copy link
Contributor

@kysrpex kysrpex commented May 26, 2025

⚠️ This PR requires #20235, it will remain as a draft until the latter is merged. Have a look at f553524 to see a diff showing what truly changed. This is one of the tweaks that are needed for the development of an integration of Galaxy with ARC (Advanced Resource Connector) as a Pulsar job runner.

It introduces the following differences in behavior:

  • GET and HEAD requests to /api/jobs/{job_id}/files return HTTP status code 400 when the given path does not refer to a file or the input datasets for the job have been purged and 404 when the given path does not exist.
  • PROPFIND requests to /api/jobs/{job_id}/files are answered with HTTP status code 501 (read motivation for this change below).
  • POST requests to /api/jobs/{job_id}/files are answered with HTTP status code 400 when no file is provided.

The reason behind the code explicitly answering PROPFIND requests with status code 501 is an unfortunate interaction between the ARC remote job runner that is under development, the behavior of legacy API endpoints and how they are integrated within the FastAPI app.

The ARC remote job runner (which will be implemented as lib.galaxy.jobs.runners.pulsar.PulsarARCJobRunner) expects this endpoint to return HTTP codes other than 404 when PROPFIND requests are issued. They are not part of the HTTP spec, but they are used in the WebDAV protocol. The correct answer to such requests is likely 501 (not implemented).

FastAPI returns HTTP 405 (method not allowed) for PROPFIND, which maybe is not fully correct but tolerable because it is one less quirk to maintain.

However, because of the way legacy WSGI endpoints are injected into the FastAPI app (using app.mount("/", wsgi_handler)), the built-in support for returning HTTP 405 for PROPFIND breaks, because such requests are passed to the wsgi_handler sub-application. This means that the endpoint still needs to include some code to handle this behavior.

When ALL routes have been migrated to ASGI (no WSGI handler sub-application needed anymore), some lines of code can be removed, they are labeled using comments.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

kysrpex added 5 commits May 15, 2025 15:42
`FastAPIJobFiles` is the new, FastAPI version of `JobFilesAPIController`. The endpoints that have been migrated should exhibit exactly the same behavior as the old ones from `FastAPIJobFiles`. Something to keep in mind is that while FastAPI has some extra built-in features that the legacy WSGI system did not have, such as answering HEAD requests, those do not work because of the way legacy WSGI endpoints are injected into the FastAPI app (using `app.mount("/", wsgi_handler)`), meaning that for example, HEAD requests are passed to the `wsgi_handler` sub-application.

Endpoints dedicated to TUS uploads work in tandem with the WSGI middleware `TusMiddleware` from the `tuswsgi` package. As explained above, WSGI middlewares and endpoints are injected into the FastAPI app after FastAPI routes as a single sub-application `wsgi_handler` using `app.mount("/", wsgi_handler)`, meaning that requests are passed to the `wsgi_handler` sub-application (and thus to `TusMiddleware`) only if there was no FastAPI endpoint defined to handle them. Therefore, they cannot be migrated to FastAPI unless `TusMiddleware` is also migrated to ASGI.
Work around a bug in FastAPI (fastapi/fastapi#13175) that assigns the same operation id to both request methods GET and HEAD of the endpoint `/api/jobs/{job_id}/files` when using the `@router.api_route()` decorator with `methods=["GET", "HEAD"]` as keyword argument.
…T requests to `/api/jobs/{job_id}/files`

Pulsar formats the `path` and `job_key` parameters as query parameters when submitting POST requests to `/api/jobs/{job_id}/files`. However, many Galaxy tests format them as form parameters. The only way to keep the endpoint working as it should (as it worked before the migration to FastAPI) is to accept both query and form parameters.
… requests to `/api/jobs/{job_id}/files`

FastAPI will not use the parameter aliases of form parameters in the OpenAPI docs, but the name of their Python variables. Therefore, the API docs show `path_form` and `job_key_form`. Rename them so that the API docs show the correct parameter names.
…ndpoint)

This commit introduces the following differences in behavior:
- GET and HEAD requests to `/api/jobs/{job_id}/files` return HTTP status code 400 when the given path does not refer to a file or the input datasets for the job have been purged and 404 when the given path does not exist.
- PROPFIND requests to `/api/jobs/{job_id}/files` are answered with HTTP status code 501 (read motivation for this change below).
- POST requests to `/api/jobs/{job_id}/files` are answered with HTTP status code 400 when no file is provided.

The reason behind the code explicitly answering `PROPFIND` requests with status code 501 is an unfortunate interaction between the ARC remote job runner that is under development, the behavior of legacy API endpoints and how they are integrated within the FastAPI app.

The ARC remote job runner (which will be implemented as `lib.galaxy.jobs.runners.pulsar.PulsarARCJobRunner`) expects this endpoint to return HTTP codes other than 404 when `PROPFIND` requests are issued. They are not part of the HTTP spec, but they are used in the WebDAV protocol. The correct answer to such requests is likely 501 (not implemented).

FastAPI returns HTTP 405 (method not allowed) for `PROPFIND`, which maybe is not fully correct but tolerable because it is one less quirk to maintain.

However, because of the way legacy WSGI endpoints are injected into the FastAPI app (using `app.mount("/", wsgi_handler)`), the built-in support for returning HTTP 405 for `PROPFIND` breaks, because such requests are passed to the `wsgi_handler` sub-application. This means that the endpoint still needs to include some code to handle this behavior.

When ALL routes have been migrated to ASGI (no WSGI handler sub-application needed anymore), some lines of code can be removed, they are labeled using comments.
@kysrpex kysrpex self-assigned this May 26, 2025
@kysrpex kysrpex added area/jobs kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes labels May 26, 2025
@kysrpex kysrpex added this to the 25.1 milestone May 26, 2025
400: {
"description": "Path does not refer to a file, or input dataset(s) for job have been purged.",
"content": {
"application/json": {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
"example": {
"detail": (
"Path does not refer to a file, or input dataset(s) for job have been purged."
)

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This
regular expression
that depends on a
user-provided value
may run slow on strings starting with 'dataset_' and with many repetitions of 'dataset_a'.
@expose_api_anonymous_and_sessionless
def create(self, trans, job_id, payload, **kwargs):
raise exceptions.ObjectNotFound("File not found.")
elif not os.path.isfile(path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
@expose_api_anonymous_and_sessionless
def create(self, trans, job_id, payload, **kwargs):
raise exceptions.ObjectNotFound("File not found.")
elif not os.path.isfile(path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
responses={
200: {"description": "An okay message.", "content": {"application/json": {"example": {"message": "ok"}}}},
400: {"description": "Bad request (including no file provided)."},
},

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
responses={
200: {"description": "An okay message.", "content": {"application/json": {"example": {"message": "ok"}}}},
400: {"description": "Bad request (including no file provided)."},
},

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
400: {"description": "Bad request (including no file provided)."},
},
)
def create(

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
@kysrpex
Copy link
Contributor Author

kysrpex commented May 27, 2025

@maikenp FYI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant