Best practice for webserver liveness probe check #54853
Unanswered
hanxdatadog asked this question in General
Replies: 2 comments, 7 replies
-
Please use discussions for questions, not issues (as instructed in the template). Converted it.
-
I guess our Helm chart is a good one to look at.
-
Description
👋 Dear Airflow community,
Recently we ran some stress tests on Airflow’s asset-based scheduling and noticed that the webserver was frequently restarting due to liveness probe failures. The liveness probe we were using was:
This was based on the guidance from the old health endpoint response:
airflow/airflow-core/src/airflow/api_fastapi/core_api/app.py (line 85 in 31f0eac)
From reading the source code, my understanding is that /api/v2/monitor/health checks the overall health of the metadata database, scheduler, and triggerer. If retrieving health information from any of these components slows down, the probe fails and the webserver gets restarted, which makes the UI unavailable. Ideally, we’d like the UI to remain available even if the metadata database or scheduler is under heavy load.
What would be the recommended alternative liveness check that doesn’t make the webserver’s health dependent on backend components? I see some options, such as the execution API health endpoint:
airflow/airflow-core/src/airflow/api_fastapi/execution_api/routes/health.py (line 30 in 31f0eac)
I also noticed that the official chart for the API server uses the version endpoint:
airflow/chart/templates/api-server/api-server-deployment.yaml (line 194 in 31f0eac)
Any suggestions or guidance would be much appreciated 🙏
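To make the question concrete, here is a rough sketch of the kind of check I have in mind: a probe that only asks whether the webserver process itself answers HTTP within a deadline, without looking at any component status. The function name is_alive, the 5-second default timeout, and the example URL in the comment (including the /api/v2/version path and port 8080) are my own assumptions for illustration, not something taken from the chart or the Airflow source.

```python
import http.client
import urllib.request


def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds.

    Hypothetical helper: it only proves that the server process accepts and
    answers a request. It says nothing about the database or the scheduler.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers connection refused, DNS errors, timeouts, and
        # urllib's URLError/HTTPError (raised for 4xx and 5xx responses).
        return False


# Example call (URL is an assumption, adjust host, port, and path):
# is_alive("http://localhost:8080/api/v2/version")
```

The point of the sketch is the design choice rather than the code: a liveness check like this fails only when the webserver itself is unresponsive, so a slow metadata database or scheduler would not trigger a restart, as long as the endpoint being probed does not query them.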
Use case/motivation
A liveness probe API endpoint for the webserver that does not depend on other components.
Related issues
No response