Migrate JobFilesAPIController to FastAPI (excluding TUS uploads) #20235
base: dev
`FastAPIJobFiles` is the new, FastAPI version of `JobFilesAPIController`. The endpoints that have been migrated should exhibit exactly the same behavior as the old ones from `JobFilesAPIController`.

Something to keep in mind is that while FastAPI has some extra built-in features that the legacy WSGI system did not have, such as answering HEAD requests, those do not work here because of the way legacy WSGI endpoints are injected into the FastAPI app (using `app.mount("/", wsgi_handler)`): HEAD requests, for example, are passed to the `wsgi_handler` sub-application.

Endpoints dedicated to TUS uploads work in tandem with the WSGI middleware `TusMiddleware` from the `tuswsgi` package. As explained above, WSGI middlewares and endpoints are injected into the FastAPI app after FastAPI routes as a single sub-application `wsgi_handler` using `app.mount("/", wsgi_handler)`, meaning that requests are passed to the `wsgi_handler` sub-application (and thus to `TusMiddleware`) only if no FastAPI endpoint is defined to handle them. Therefore, the TUS endpoints cannot be migrated to FastAPI unless `TusMiddleware` is also migrated to ASGI.
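The dispatch order described in this commit message can be pictured with a toy model. This is a pure-Python sketch, not FastAPI's or Starlette's router: `ToyApp`, its methods and the handler strings are all hypothetical names used only for illustration. It assumes that explicit routes are tried first, by method and path, and that the catch-all mount sees only what they did not handle.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyApp:
    """Hypothetical stand-in: explicit routes first, then one catch-all mount."""

    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Callable[[], str]] = {}
        self.mounted: Optional[Callable[[str, str], str]] = None

    def route(self, method: str, path: str, handler: Callable[[], str]) -> None:
        self.routes[(method, path)] = handler

    def mount(self, handler: Callable[[str, str], str]) -> None:
        self.mounted = handler

    def dispatch(self, method: str, path: str) -> str:
        # An explicit route wins when both method and path match.
        handler = self.routes.get((method, path))
        if handler is not None:
            return handler()
        # Everything else falls through to the mounted sub-application.
        if self.mounted is not None:
            return self.mounted(method, path)
        return "404"


def wsgi_handler(method: str, path: str) -> str:
    # Stand-in for the legacy WSGI stack (middlewares plus endpoints).
    return f"wsgi_handler handled {method} {path}"


app = ToyApp()
app.route("GET", "/api/jobs/1/files", lambda: "FastAPI route handled GET")
app.mount(wsgi_handler)

get_result = app.dispatch("GET", "/api/jobs/1/files")
head_result = app.dispatch("HEAD", "/api/jobs/1/files")
print(get_result)
print(head_result)
```

In this model a HEAD request never reaches the GET route, and a path claimed by an explicit route never reaches the mounted handler, which mirrors why the TUS endpoints and `TusMiddleware` have to move together.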
Work around a bug in FastAPI (fastapi/fastapi#13175) that assigns the same operation id to both request methods GET and HEAD of the endpoint `/api/jobs/{job_id}/files` when using the `@router.api_route()` decorator with `methods=["GET", "HEAD"]` as keyword argument.
Pulsar uses these endpoints, so before merging this, it is critical that it passes all tests from test/integration/test_job_files.py and test_job_files_tus.py (…
Locally, … @jmchilton, I could use some help from you (I see that you wrote test/integration/test_job_files_tus.py). Do you think the failures are related to the changes from this PR? Do you have any clue about what's failing before I look further into it? I would be quite grateful if you could have a look after the CI finishes running the tests.
@router.post(
    "/api/jobs/{job_id}/files",
    summary="Populate an output file.",
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value.

    responses={
        200: {"description": "An okay message.", "content": {"application/json": {"example": {"message": "ok"}}}},
    },
)
Check failure (Code scanning / CodeQL): Polynomial regular expression used on uncontrolled data, High (regular expression; user-provided value).

if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
        shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value.

with open(path, "ab") as destination:
    shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value.

if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
        if input_file_path:
            with open(input_file_path, "rb") as input_file_handle:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value. This path depends on a user-provided value.

# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
    shutil.move(input_file_path, path)
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value. This path depends on a user-provided value.

# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
    shutil.move(input_file_path, path)
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value.

# tempfile has moved and Python wants to delete it.
pass
return {"message": "ok"}
with open(path, "wb") as destination:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression, High. This path depends on a user-provided value.
About the CodeQL issues, although there is … One solution is to add an exception. Another is to delay the merge to see whether there is some time remaining to fix them after the ARC integration has been worked through.
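For context on what fixing a "path depends on a user-provided value" alert can involve, here is a minimal sketch of a containment check. It is a generic mitigation and not what this PR does: the function name `is_within_directory` and the idea of a single allowed root directory are assumptions made for illustration.

```python
import os
import tempfile


def is_within_directory(candidate: str, allowed_root: str) -> bool:
    """Return True if candidate resolves to allowed_root or a path below it."""
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(candidate)
    try:
        # commonpath equals root only when target lies at or under root.
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised, for example, for paths on different drives on Windows.
        return False


with tempfile.TemporaryDirectory() as job_dir:
    inside = os.path.join(job_dir, "outputs", "tool_stdout")
    outside = os.path.join(job_dir, "..", "etc", "passwd")
    inside_ok = is_within_directory(inside, job_dir)
    outside_ok = is_within_directory(outside, job_dir)

print(inside_ok, outside_ok)
```

Resolving both paths with `os.path.realpath` before comparing means that `..` segments and symbolic links cannot be used to step outside the allowed root.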
66e0f95 to 1a8b10b
…T requests to `/api/jobs/{job_id}/files`

Pulsar formats the `path` and `job_key` parameters as query parameters when submitting POST requests to `/api/jobs/{job_id}/files`. However, many Galaxy tests format them as form parameters. The only way to keep the endpoint working as it should (as it worked before the migration to FastAPI) is to accept both query and form parameters.
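A minimal sketch of the "accept both" logic follows, assuming the query value takes precedence when both are sent. That precedence, the helper name `pick_param` and the error type are assumptions and are not taken from this PR.

```python
from typing import Optional

# In a FastAPI endpoint the two candidates could be declared roughly as
#     path_query: Optional[str] = Query(None, alias="path")
#     path_form: Optional[str] = Form(None, alias="path")
# and then merged with a helper such as this one.


def pick_param(name: str, query_value: Optional[str], form_value: Optional[str]) -> str:
    """Return the query value if present, else the form value (assumed order)."""
    value = query_value if query_value is not None else form_value
    if value is None:
        raise ValueError(f"missing required parameter: {name}")
    return value


from_query = pick_param("path", "/data/out.txt", None)  # Pulsar-style request
from_form = pick_param("path", None, "/data/out.txt")   # test-style request
print(from_query, from_form)
```

Keeping the merge in one small function makes it easy to apply the same rule to both `path` and `job_key`.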
1a8b10b to d17002d
job = self.__authorize_job_access(trans, job_id, path=path, job_key=job_key)

if not os.path.exists(path):
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
        shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

with open(path, "ab") as destination:
    shutil.copyfileobj(open(input_file.name, "rb"), destination)
if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

if os.path.exists(path) and (path.endswith("tool_stdout") or path.endswith("tool_stderr")):
    with open(path, "ab") as destination:
        if input_file_path:
            with open(input_file_path, "rb") as input_file_handle:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
    shutil.move(input_file_path, path)
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

# (https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile), so now there is not even
# a path where uploaded files can be accessed on disk
if input_file_path:
    shutil.move(input_file_path, path)
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

See more discussion of checking upload access, but we shouldn't need the
API key and session stuff the user upload tusd server should be configured with.
with open(path, "wb") as destination:
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression
It was because the endpoint …
… requests to `/api/jobs/{job_id}/files`

FastAPI will not use the parameter aliases of form parameters in the OpenAPI docs, but the name of their Python variables. Therefore, the API docs show `path_form` and `job_key_form`. Rename them so that the API docs show the correct parameter names.
@maikenp FYI
As part of the development of an integration of Galaxy with ARC (Advanced Resource Connector) as a Pulsar job runner, tweaks to the `JobFilesAPIController` are needed. None of these tweaks are included in this PR, but it makes sense to implement them on top of a FastAPI endpoint rather than a legacy WSGI one, and the first step toward that is migrating the controller to FastAPI.

`FastAPIJobFiles` is the new, FastAPI version of `JobFilesAPIController`. The endpoints that have been migrated should exhibit exactly the same behavior as the old ones from `JobFilesAPIController`.

Endpoints dedicated to TUS uploads work in tandem with the WSGI middleware `TusMiddleware` from the `tuswsgi` package. WSGI middlewares and endpoints are injected into the FastAPI app after FastAPI routes as a single sub-application `wsgi_handler` using `app.mount("/", wsgi_handler)`, meaning that requests are passed to the `wsgi_handler` sub-application (and thus to `TusMiddleware`) only if no FastAPI endpoint is defined to handle them. Therefore, the TUS endpoints cannot be migrated to FastAPI unless `TusMiddleware` is also migrated to ASGI. I am postponing that migration, because the ARC integration needs to be delivered soonish.

I also included three new tests for existing functionality: writing from uploads done with the nginx_upload_module, from TUS uploads and using the parameter `__file`.

How to test the changes?
(Select all options that apply)