Skip to content

httpx from scratch #1823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

httpx from scratch #1823

wants to merge 7 commits into from

Conversation

technillogue
Copy link
Contributor

@technillogue technillogue commented Jul 24, 2024

as a prelude to #1813, we would like to cut an intermediate release that can be tested so we can discover bugs in a smaller surface area. here, we're postponing all of the async worker changes to only bring async runner and httpx, without supporting any kind of concurrency. this is slightly artificial but hopefully will reduce how many times we need to roll back #1813

this is perhaps an opportunity to fix the webhook and files related tests as well

@technillogue technillogue force-pushed the syl/httpx-from-scratch branch from 6fa0af9 to b1d4d18 Compare July 24, 2024 18:55
@technillogue technillogue requested review from a team, aron and mattt July 24, 2024 19:01
dependencies = [
# intentionally loose. perhaps these should be vendored to not collide with user code?
"attrs>=20.1,<24",
"fastapi>=0.75.2,<0.99.0",
# we may not need http2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We almost certainly don't. What's the cost of including it vs not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only real motivation for this when it was written is using http2 for file uploads which did not go through localhost at the time. h2 is not a big library so I don't think this is terribly consequential either way

@@ -86,4 +86,5 @@ def setup_logging(*, log_level: int = logging.NOTSET) -> None:

# Reconfigure log levels for some overly chatty libraries
logging.getLogger("uvicorn.access").setLevel(logging.WARNING)
# FIXME: no more urllib3(?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does httpx log to a system logger for normal requests? If not, you can just remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weights_url = os.environ.get("COG_WEIGHTS")
weights_path = "weights"
weights_path = "weights" # this is the source of a bug isn't it?
Copy link
Contributor

@nickstenning nickstenning Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment refer to, and does it need to be in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rationale for including this here was that the behavior of CogPath.validate was changed, in that URLPath aka URLTempFile's validate method is now async and takes httpx.AsyncClient as an argument.

probably more problematically for this PR, it changes the behavior of setup paths altogether

that said, most of the changes included here are running in the child worker, which is supposed to be out of scope for this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def webhook_headers() -> "dict[str, str]":
headers = common_headers()
auth_token = os.environ.get("WEBHOOK_AUTH_TOKEN")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return urlparse(final_url)._replace(query="").geturl()

# this previously lived in json.upload_files, but it's clearer here
# this is a great pattern that should be adopted for input files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What pattern is being referred to here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recursively iterating through the entire object rather than only the top level

# # it would be kind of cleaner to make the default file_url
# # instead of skipping entirely, we need to convert to datauri
# if url is None:
# return obj
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did this commented-out code do and why is it commented out?

yield chunk


# there's a case for splitting this apart or inlining parts of it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think HTTP clients and data structure manipulations should be smooshed together like this. We should extract these back into separate business logic and client management pieces.

The current approach makes it hard to test, hard to iterate on (e.g. TODO: upload concurrently would be far easier to achieve if we separated the processes of working out what we have to upload from the process of actually uploading it), and hard to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced, this is identical to json.upload_files. I am sympathetic to extracting some helper functions for data structure manipulations.

you could have a helper function yield each file that has to be uploaded, but then those have to go back into the data structure anyway, it doesn't seem like less complexity to me.

the previous setup had about five layers of indirection across four files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that said if you have a narrow proposal I would be open to that

error_callback=handle_error,
)

raise # we don't actually want to raise anymore but w/e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor

@nickstenning nickstenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really reviewable. The tests aren't passing, and I don't think the changeset is any fit state for us to merge to main in good conscience.

I appreciate that this probably isn't the review you'd like. As a reviewer, anything more than a few hundred lines is quite hard to review effectively unless a substantial portion of the changes are trivial. Reviewing a diff five times that size that's full of half-finished code (stuff commented out, comments suggesting that the code isn't ready for primetime) is just not something anyone is likely to be able to do effectively.

Let's take a step back here and rather than trying to adapt what we have on the async branch by cutting stuff away from it, let's take what we have learned from that branch and make a series of smaller and more incremental changes -- possibly starting afresh from current main -- to get us to where we want to be.

self, url: str, response: Dict[str, Any], event: WebhookEvent
) -> None:
if Status.is_terminal(response["status"]):
Copy link
Contributor

@mattt mattt Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event is unused. Should we drop it from the method signature entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right, this was moved up to make_webhook_sender

@technillogue technillogue force-pushed the syl/httpx-from-scratch branch 2 times, most recently from a48bc75 to 8896136 Compare July 25, 2024 20:05
* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server

(reverts commit 828eee9)

Signed-off-by: technillogue <[email protected]>
@technillogue technillogue force-pushed the syl/httpx-from-scratch branch from 8896136 to ce25ad5 Compare July 25, 2024 20:12
technillogue and others added 6 commits July 26, 2024 02:59
* input downloads, output uploads, and webhooks are now handled by ClientManager, which persists for the lifetime of runner, allowing us to reuse connections, which may significantly help with large uploads.
* although I was originally going to drop output_file_prefix, it's not actually hard to maintain. the behavior is changed now and objects are uploaded as soon as they're outputted rather than after the prediction is completed.
* there's an ugly hack with uploading an empty body to get the redirect instead of making api time out from trying to upload an 140GB file. that can be fixed by implemented an MPU endpoint and/or a "fetch upload url" endpoint.
* the behavior of the non-indempotent endpoint is changed; the id is now randomly generated if it's not provided in the body. this isn't strictly required for this change alone, but is hard to carve out.
* the behavior of Path is changed significantly. see https://www.notion.so/replicate/Cog-Setup-Path-Problem-2fc41d40bcaf47579ccd8b2f4c71ee24

Co-authored-by: Mattt <[email protected]>

* format
* stick a %s on line 190 clients.py (#1707)
* local upload server can be called cluster.local in addition to .internal (#1714)

Signed-off-by: technillogue <[email protected]>
Based on the implementation in #1698 for sync cog.

If the request to /predict contains headers `traceparent` and
`tracestate` defined by w3c Trace Context[^1] then these headers are
forwarded on to the webhook and upload calls.

This allows observability systems to link requests passing through cog.

[^1]: https://www.w3.org/TR/trace-context/

Signed-off-by: technillogue <[email protected]>
* Cast TraceContext into Mapping[str, str] to fix linter

* Include prediction id upload request

Based on #1667

This PR introduces two small changes to the file upload interface.

1. We now allow downstream services to include the destination of the
asset in a `Location` header, rather than assuming that it's the same as
the final upload url (either the one passed via `--upload-url` or the
result of a 307 redirect response.

2. We now include the `X-Prediction-Id` header in upload request, this
allows the downstream client to potentially do configuration/routing
based on the prediction ID. This ID should be considered unsafe and
needs to be validated by the downstream service.

* Extract ChunkFileReader into top-level class

---------

Co-authored-by: Mattt Zmuda <[email protected]>
Co-authored-by: Mattt <[email protected]>
Signed-off-by: technillogue <[email protected]>
* optimize webhook serialization and logging
* optimize logging by binding structlog proxies
* fix tests

---------

Signed-off-by: technillogue <[email protected]>
@technillogue technillogue force-pushed the syl/httpx-from-scratch branch from dc7aace to 72f8f05 Compare July 26, 2024 06:59
@technillogue technillogue requested a review from yorickvP July 29, 2024 18:45
@@ -10,11 +10,13 @@ authors = [{ name = "Replicate", email = "[email protected]" }]
license.file = "LICENSE"
urls."Source" = "https://github.com/replicate/cog"

requires-python = ">=3.7"
requires-python = ">=3.8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this officially drop 3.7?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@aron
Copy link
Contributor

aron commented Oct 16, 2024

@technillogue I think this is stale now right? Shall we close it and re-open when we pick it up again?

@aron aron closed this Oct 16, 2024
@nickstenning nickstenning deleted the syl/httpx-from-scratch branch February 11, 2025 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants