prevent WorkflowMonitor OOM by projecting only metric fields#1145
Open
mohammaddanishali-bit wants to merge 5 commits into
Open
prevent WorkflowMonitor OOM by projecting only metric fields#1145mohammaddanishali-bit wants to merge 5 commits into
mohammaddanishali-bit wants to merge 5 commits into
Conversation
4aa76be to
7abcda3
Compare
WorkflowMonitor.reportMetrics() loaded every version of every workflow/task definition and deserialized full json_data into WorkflowDef/TaskDef object graphs, only to read name, ownerApp and concurrencyLimit. At ~100K definitions this peaked at hundreds of MB of heap per refresh, causing a recurring OutOfMemoryError on every node (the monitor is enabled by default) that cascaded into sweeper, polling and Hikari pool failures. Add lightweight projections: WorkflowMetricInfo/TaskMetricInfo DTOs and getWorkflowMetricInfo()/getTaskMetricInfo() on MetadataDAO with a default that projects from getAllWorkflowDefsLatestVersions()/getAllTaskDefs(); Postgres and MySQL override with DB-side projection. WorkflowMonitor holds the DTOs and the per-name grouping helper is removed.
7abcda3 to
d9609e8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request type
NOTE: Please remember to run
./gradlew spotlessApplyto fix any format violations.Changes in this PR
WorkflowMonitor.reportMetrics()runs on every node (enabled by default) andloaded
every version of every workflow/task definition via
getWorkflowDefs()/getTaskDefs(),deserializing the full
json_dataintoWorkflowDef/TaskDefobjects just toread
name,ownerApp, andconcurrencyLimit. At ~100K definitions this peaked at hundredsof MB of heap
per refresh, causing a recurring
OutOfMemoryErrorthat cascaded into sweeper,polling, and
HikariPool failures.
This PR adds lightweight projections:
WorkflowMetricInfo(name, ownerApp)andTaskMetricInfo(name, ownerApp, concurrencyLimit).getWorkflowMetricInfo()/getTaskMetricInfo()onMetadataDAOwith adefault
implementation projecting from
getAllWorkflowDefsLatestVersions()/getAllTaskDefs()(Redis/Cassandra/SQLite unchanged); Postgres and MySQL override with
of catalog size,
with no change to the emitted metrics.
Issue # #1006
Alternatives considered
Set conductor.workflow-monitor.enabled=false on deployments that don't need these metrics and increase heap on deployments where it stays enabled.