
Migrate JobFilesAPIController to FastAPI (excluding TUS uploads) #20235


Status: Open (4 commits to be merged into base branch dev)

kysrpex (Contributor) commented May 15, 2025

As part of the development of an integration of Galaxy with ARC (Advanced Resource Connector) as a Pulsar job runner, tweaks to the JobFilesAPIController are needed. None of those tweaks are included in this PR, but it makes sense to build them upon a FastAPI endpoint rather than a legacy WSGI one, and the first step toward that is migrating the controller to FastAPI.

FastAPIJobFiles is the new, FastAPI version of JobFilesAPIController. The migrated endpoints should exhibit exactly the same behavior as the old ones from JobFilesAPIController.

Endpoints dedicated to TUS uploads work in tandem with the WSGI middleware TusMiddleware from the tuswsgi package. WSGI middlewares and endpoints are injected into the FastAPI app after the FastAPI routes as a single sub-application, wsgi_handler, using app.mount("/", wsgi_handler). Requests are therefore passed to wsgi_handler (and thus to TusMiddleware) only if no FastAPI endpoint is defined to handle them. As a result, the TUS endpoints cannot be migrated to FastAPI unless TusMiddleware is also migrated to ASGI. I am postponing that migration because the ARC integration needs to be delivered soonish.
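
The routing order described above can be modelled in a few lines of plain Python. This is only a simplified sketch, not Galaxy's or Starlette's actual code: `MiniRouter`, `add_route`, `mount`, `dispatch` and the `/hypothetical/tus/upload` path are made-up names, routes are matched by exact string comparison, and handlers are reduced to functions that return a label. It illustrates that a catch-all mounted last only sees a request when no earlier route matched both path and method.

```python
from typing import Callable, List, Optional, Tuple

# A handler takes (method, path) and returns a label saying who handled it.
Handler = Callable[[str, str], str]


class MiniRouter:
    """Toy first-match router (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, str, Handler]] = []
        self._fallback: Optional[Handler] = None

    def add_route(self, method: str, path: str, handler: Handler) -> None:
        # Explicit routes are checked in registration order.
        self._routes.append((method, path, handler))

    def mount(self, handler: Handler) -> None:
        # Stands in for app.mount("/", wsgi_handler): a catch-all tried last.
        self._fallback = handler

    def dispatch(self, method: str, path: str) -> str:
        for route_method, route_path, handler in self._routes:
            if route_method == method and route_path == path:
                return handler(method, path)
        if self._fallback is not None:
            return self._fallback(method, path)
        return "404"


def fastapi_endpoint(method: str, path: str) -> str:
    return "FastAPI route"


def wsgi_handler(method: str, path: str) -> str:
    return "wsgi_handler"


router = MiniRouter()
router.add_route("GET", "/api/jobs/{job_id}/files", fastapi_endpoint)
router.mount(wsgi_handler)

get_result = router.dispatch("GET", "/api/jobs/{job_id}/files")
head_result = router.dispatch("HEAD", "/api/jobs/{job_id}/files")
tus_result = router.dispatch("PATCH", "/hypothetical/tus/upload")
```

In this model a request that has no matching route, or whose method is not registered for the route, ends up in the mounted handler, which is why a FastAPI route for a TUS path would keep the request from ever reaching the WSGI middleware.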

I also included three new tests for existing functionality: writing from uploads done with the nginx_upload_module, from TUS uploads and using the parameter __file.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

kysrpex added 2 commits May 15, 2025 15:42
`FastAPIJobFiles` is the new, FastAPI version of `JobFilesAPIController`. The endpoints that have been migrated should exhibit exactly the same behavior as the old ones from `JobFilesAPIController`. Keep in mind that while FastAPI has some built-in features that the legacy WSGI system did not have, such as answering HEAD requests, those do not work here because of the way legacy WSGI endpoints are injected into the FastAPI app (using `app.mount("/", wsgi_handler)`): HEAD requests, for example, are passed to the `wsgi_handler` sub-application.

Endpoints dedicated to TUS uploads work in tandem with the WSGI middleware `TusMiddleware` from the `tuswsgi` package. As explained above, WSGI middlewares and endpoints are injected into the FastAPI app after FastAPI routes as a single sub-application `wsgi_handler` using `app.mount("/", wsgi_handler)`, meaning that requests are passed to the `wsgi_handler` sub-application (and thus to `TusMiddleware`) only if there was no FastAPI endpoint defined to handle them. Therefore, they cannot be migrated to FastAPI unless `TusMiddleware` is also migrated to ASGI.
Work around a bug in FastAPI (fastapi/fastapi#13175) that assigns the same operation id to both request methods GET and HEAD of the endpoint `/api/jobs/{job_id}/files` when using the `@router.api_route()` decorator with `methods=["GET", "HEAD"]` as keyword argument.

kysrpex (Contributor, Author) commented May 15, 2025

Pulsar uses these endpoints, so before merging this, it is critical that it passes all tests from test/integration/test_job_files.py and test/integration/test_job_files_tus.py (./run_tests.sh -integration test/integration/test_job_files.py --, ./run_tests.sh -integration test/integration/test_job_files_tus.py --).

@kysrpex kysrpex added kind/refactoring cleanup or refactoring of existing code, no functional changes area/jobs labels May 15, 2025
@kysrpex kysrpex self-assigned this May 15, 2025

kysrpex (Contributor, Author) commented May 15, 2025

Locally, test/integration/test_job_files_tus.py::test_tools[simple_constructs] and test/integration/test_job_files_tus.py::test_tools[composite_output_tests] are failing because the tools produce different outputs from what's expected.

@jmchilton I could use some help from you (I see that you wrote test/integration/test_job_files_tus.py). Do you think the failures are related to the changes from this PR? Do you have any clue of what's failing before I look further into it? I would be quite grateful if you could have a look after the CI finishes running the tests.


@router.post(
"/api/jobs/{job_id}/files",
summary="Populate an output file.",

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
responses={
200: {"description": "An okay message.", "content": {"application/json": {"example": {"message": "ok"}}}},
},
)

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on a user-provided value may run slow on strings starting with 'dataset_' and with many repetitions of 'dataset_a'.
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:
shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
with open(path, "ab") as destination:
shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:
if input_file_path:
with open(input_file_path, "rb") as input_file_handle:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value. This path depends on a user-provided value.
# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
shutil.move(input_file_path, path)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value. This path depends on a user-provided value.
# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
shutil.move(input_file_path, path)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
# tempfile has moved and Python wants to delete it.
pass
return {"message": "ok"}
with open(path, "wb") as destination:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

kysrpex (Contributor, Author) commented May 15, 2025

About the CodeQL issues: although self.__check_job_can_write_to_path(trans, job, path) does some permission checks (I did not check whether it completely eliminates the risk), I am migrating the endpoint, not improving it. Those issues were already there, and unfortunately I cannot spend time on fixing them at the moment.

One solution is to add an exception. Another is to delay the merge and see whether there is some time left to fix them after the ARC integration has been worked through.
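
For context, a common mitigation for this class of alert is to resolve the user-supplied path and confirm that it stays inside an allowed directory before touching the file system. The sketch below is not what `__check_job_can_write_to_path` does (its body is not shown in this thread); `is_within_directory` and the `/srv/jobs/...` directories are hypothetical names used only for illustration.

```python
import os


def is_within_directory(candidate: str, allowed_root: str) -> bool:
    """Return True if `candidate` resolves to a location inside `allowed_root`.

    Hypothetical helper: both paths are resolved first so that `..`
    segments and symlinks cannot be used to escape the allowed directory.
    """
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(candidate)
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised, for example, when the paths are on different drives.
        return False


# Hypothetical job working directory and two candidate paths.
job_directory = "/srv/jobs/42"
inside = is_within_directory("/srv/jobs/42/outputs/tool_stdout", job_directory)
escaped = is_within_directory("/srv/jobs/42/../43/tool_stdout", job_directory)
```

A check along these lines would run before any `open(path, ...)` or `shutil.move(..., path)` call; whether it would also satisfy CodeQL depends on how the analyzer tracks the sanitized value.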

@kysrpex kysrpex force-pushed the job_files_fastapi_migration branch from 66e0f95 to 1a8b10b Compare May 20, 2025 09:59
…T requests to `/api/jobs/{job_id}/files`

Pulsar formats the `path` and `job_key` parameters as query parameters when submitting POST requests to `/api/jobs/{job_id}/files`. However, many Galaxy tests format them as form parameters. The only way to keep the endpoint working as it should (as it worked before the migration to FastAPI) is to accept both query and form parameters.
@kysrpex kysrpex force-pushed the job_files_fastapi_migration branch from 1a8b10b to d17002d Compare May 20, 2025 10:04

job = self.__authorize_job_access(trans, job_id, path=path, job_key=job_key)

if not os.path.exists(path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:
shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).
with open(path, "ab") as destination:
shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
with open(path, "ab") as destination:
if input_file_path:
with open(input_file_path, "rb") as input_file_handle:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1). This path depends on a [user-provided value](2).
# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
shutil.move(input_file_path, path)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1). This path depends on a [user-provided value](2).
# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
shutil.move(input_file_path, path)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).

See more discussion of checking upload access, but we shouldn't need the
API key and session stuff the user upload tusd server should be configured with.
with open(path, "wb") as destination:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).

kysrpex (Contributor, Author) commented May 20, 2025

Locally, test/integration/test_job_files_tus.py::test_tools[simple_constructs] and test/integration/test_job_files_tus.py::test_tools[composite_output_tests] are failing because the tools produce different outputs from what's expected.

@jmchilton I could use some help from you (I see that you wrote test/integration/test_job_files_tus.py). Do you think the failures are related to the changes from this PR? Do you have any clue of what's failing before I look further into it? I would be quite grateful if you could have a look after the CI finishes running the tests.

It was because the endpoint /api/jobs/{job_id}/files needs to accept path and job_key both as query and as form parameters for POST requests: most Galaxy tests submit them as form parameters, but Pulsar submits them as query parameters. It now does so (see d17002d). There are no more relevant test failures.
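
A framework-independent sketch of that behavior, using only the standard library, might look as follows. `pick_param` is a hypothetical helper, the form body is modelled as URL-encoded text rather than the multipart payload a real upload would use, and giving the query string precedence over the form body is an assumption rather than something stated in this PR.

```python
from typing import Optional
from urllib.parse import parse_qs


def pick_param(name: str, query_string: str, form_body: str) -> Optional[str]:
    """Return the first value of `name` found in the query string or form body.

    Hypothetical helper: the query string is checked first, then the
    URL-encoded form body; None is returned if neither carries the parameter.
    """
    for source in (query_string, form_body):
        values = parse_qs(source).get(name)
        if values:
            return values[0]
    return None


# Pulsar-style request: parameters travel in the query string.
pulsar_path = pick_param("path", "path=%2Ftmp%2Fout&job_key=abc", "")

# Test-style request: the same parameters travel in the form body.
test_path = pick_param("path", "", "path=%2Ftmp%2Fout&job_key=abc")
```

Both calls resolve to the same value, which is the property the endpoint needs so that Pulsar and the Galaxy tests can keep sending the parameters the way they already do.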

… requests to `/api/jobs/{job_id}/files`

FastAPI will not use the parameter aliases of form parameters in the OpenAPI docs, but the name of their Python variables. Therefore, the API docs show `path_form` and `job_key_form`. Rename them so that the API docs show the correct parameter names.

kysrpex (Contributor, Author) commented May 27, 2025

@maikenp FYI

Labels
area/API area/jobs area/testing/integration area/testing kind/refactoring cleanup or refactoring of existing code, no functional changes