Description
First check
- I added a descriptive title to this issue.
- I used the GitHub search to find a similar issue and didn't find it.
- I refreshed the page and this issue still occurred.
- I checked if this issue was specific to the browser I was using by testing with a different browser.
Bug summary
TL;DR: The /api/flow_runs/filter endpoint returns the entirety of every flow run's parameters, even if they are ~gigabytes of file contents. This can make the UI very slow.
Hi Prefect friends. I'm running self-hosted Prefect, and the UI had become ungodly slow - sometimes taking 15-20 seconds or more for a page to resolve. No worries, still usable, new product, etc, etc.
I went in to see if it was something I could put in a bugfix for, and discovered the culprit is /api/flow_runs/filter. That API call contains all of the data files I processed that day, because they were parameters to a subflow.
I don't think I actually need a subflow here, so I can remove it and probably solve my own issue. But there should probably be an idiot-proof truncation limit on parameters? Or maybe they shouldn't be returned unless specifically requested?
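(Just to illustrate the kind of guard I mean, here's a sketch with made-up names; this isn't Prefect's actual code or API:)

import json

# Hypothetical guard, purely to illustrate the suggestion above.
# MAX_PARAM_CHARS and truncate_parameters are made-up names.
MAX_PARAM_CHARS = 10_000

def truncate_parameters(parameters: dict) -> dict:
    # Replace any parameter whose JSON form is huge with a small stub.
    truncated = {}
    for name, value in parameters.items():
        encoded = json.dumps(value, default=str)
        if len(encoded) > MAX_PARAM_CHARS:
            truncated[name] = f"<truncated: {len(encoded)} characters>"
        else:
            truncated[name] = value
    return truncated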
Reproduction
My setup looks something like this. There's a main orchestrator flow that first downloads all the data and persists it to cache. It then hands that list of cached files to a subflow to transform.
- main_flow() is the orchestrator.
- pages here ends up being a list of cached files, I think PersistedResult, since they're saved to a local filesystem.
- csv_transform(pages) is the subflow that gets handed the list of pages.
This is mostly pseudocode. I just wanted to give a flavor of what I'm doing, but I can sanitize real code if you want.
import requests
from datetime import timedelta

from prefect import flow, task
from prefect.serializers import CompressedJSONSerializer

@flow
def main_flow():
    pages = crawl_dumb_enterprise_api(endpoint)  # endpoint defined elsewhere
    csv_transform(pages)
    kickoff_warehouse_load()

def crawl_dumb_enterprise_api(endpoint):
    result = []
    for page_url in endpoint:
        result.append(fetch_url(page_url))
    return result

@task(
    retries=3,
    retry_delay_seconds=60,
    timeout_seconds=600,  # ten minutes
    cache_result_in_memory=False,
    result_serializer=CompressedJSONSerializer(),
    cache_key_fn=fetch_url_cache_key,  # defined elsewhere
    cache_expiration=timedelta(hours=6),
)
def fetch_url(url):
    return requests.get(url).text  # giant XML payload

@flow
def csv_transform(pages):
    for giant_xml_thing in pages:
        extract_and_put_in_csv_format(giant_xml_thing)
(If you're wondering why I have to crawl the entire API before starting anything: this is a terrible, slow, fragile API that uses unguessable IDs for pagination and embeds the URL of the next page in the giant result XML, so I have to wait until each page finishes downloading anyway before I can start loading the next one. Not that I'm bitter or anything.)
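Concretely, the real crawl has to do something like this, which is why it can't be parallelized (extract_next_url is a stand-in for my actual XML parsing):

def crawl_dumb_enterprise_api(start_url):
    # Each response embeds the unguessable URL of the next page,
    # so pages can only be fetched one at a time, in order.
    result = []
    url = start_url
    while url:
        page = fetch_url(url)
        result.append(page)
        url = extract_next_url(page)  # stand-in; returns None on the last page
    return result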
Error
It's specifically this UI call (copied out of devtools as curl & lightly edited for privacy)
curl 'https://werehaus.site/api/flow_runs/filter' \
-H 'authority: werehaus.site' \
-H 'accept: application/json, text/plain, */*' \
-H 'accept-language: en-US,en;q=0.9' \
-H 'content-type: application/json' \
-H 'cookie: logged_out_marketing_header_id=blahblah; _ga=blahblah; _ga_blahblah=blahblahblah' \
-H 'origin: https://werehaus.site/' \
-H 'referer: https://werehaus.site/flow-runs' \
-H 'sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36' \
-H 'x-prefect-ui: true' \
--data-raw '{"flow_runs":{"expected_start_time":{"before_":"2023-06-15T06:59:59.999Z","after_":"2023-06-06T07:00:00.000Z"}},"sort":"START_TIME_DESC"}' \
--compressed
If I run that, I get about a 1 GB download that has a few normal entries and then this thing:
{
  "id": "246bba3b-d01b-427a-8e1e-eb59e42d585b",
  "created": "2023-06-13T11:34:44.821804+00:00",
  "updated": "2023-06-13T11:43:46.647193+00:00",
  "name": "curly-impala",
  "flow_id": "0a5a23d3-3d0d-4a81-b114-ceb45fad54ca",
  "state_id": "99d54758-53f8-4dfb-af4c-5d51cfb9aeb5",
  "deployment_id": null,
  "work_queue_id": null,
  "work_queue_name": null,
  "flow_version": "2d50993280a293fe8a999910e662b216",
  "parameters": {
    "pages": [
      "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<!DOCTYPE 50O MB CHONKYTONK XML FILE...."
Browsers
- Chrome
- Firefox
- Safari
- Edge
Prefect version
Version: 2.10.13
API version: 0.8.4
Python version: 3.10.11
Git commit: 179edeac
Built: Thu, Jun 8, 2023 4:10 PM
OS/Arch: darwin/arm64
Profile: prod
Server type: server
Additional context
It's possible I'm doing a dumb. If there's a better way to do this, please feel free to point it out. I appreciate it.
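For example, I'm guessing the better pattern is to hand the subflow lightweight references (cache file paths) instead of the raw contents. Roughly, with CACHE_DIR and the page-writing step made up for illustration:

from pathlib import Path

from prefect import flow

CACHE_DIR = Path("/data/cache")  # made-up location for illustration

@flow
def main_flow():
    pages = crawl_dumb_enterprise_api(endpoint)
    # Persist each page to disk and pass paths, not contents, so the
    # flow run's parameters stay tiny.
    paths = []
    for i, page in enumerate(pages):
        path = CACHE_DIR / f"page_{i}.xml"
        path.write_text(page)
        paths.append(str(path))
    csv_transform(paths)  # subflow now receives small strings
    kickoff_warehouse_load()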
I didn't see an easy way to fix this without refactoring way too much, but I'm happy to take a crack at it if it's simple.
Thanks so much. I really do like working with Prefect, it's solid.