@jsvd jsvd commented Jul 29, 2025

Instead of running Psych::Parse, which builds the entire Psych node tree before emitting the Ruby hash dictionary, the plugin should be able to construct the final hash dictionary incrementally as the parser identifies YAML elements (streaming parsing).

This changeset adds opt-in streaming parsing for YAML files, based on snakeyaml-engine.
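
To illustrate the idea of building the dictionary directly from parser events, here is a minimal sketch. It is not the code in this PR: it uses Psych's own event API (`Psych::Parser` with a `Psych::Handler` subclass) rather than snakeyaml-engine, and the class name `StreamingDictionaryHandler` is hypothetical. It assumes scalar keys, ignores anchors, aliases and tags, and leaves every scalar as a String.

```ruby
require "psych"

# Hypothetical handler: fills Hash/Array containers as events arrive,
# so no intermediate Psych node tree is ever built.
class StreamingDictionaryHandler < Psych::Handler
  NO_KEY = Object.new.freeze # sentinel: the open mapping is waiting for a key

  attr_reader :result

  def initialize
    super
    @stack = []   # one frame per container that is still open
    @result = nil
  end

  def start_mapping(_anchor, _tag, _implicit, _style)
    open_container({})
  end

  def start_sequence(_anchor, _tag, _implicit, _style)
    open_container([])
  end

  def end_mapping
    @stack.pop
  end

  def end_sequence
    @stack.pop
  end

  def scalar(value, _anchor, _tag, _plain, _quoted, _style)
    add(value) # assumption: no type resolution, scalars stay Strings
  end

  private

  # Attach the new container to its parent, then make it the current one.
  def open_container(container)
    add(container)
    @stack.push({ container: container, key: NO_KEY })
  end

  # Place a value into the innermost open container.
  def add(value)
    if @stack.empty?
      @result = value
      return
    end
    frame = @stack.last
    container = frame[:container]
    if container.is_a?(Array)
      container << value
    elsif frame[:key].equal?(NO_KEY)
      frame[:key] = value               # first scalar of a pair is the key
    else
      container[frame[:key]] = value    # second one is the value
      frame[:key] = NO_KEY
    end
  end
end

handler = StreamingDictionaryHandler.new
Psych::Parser.new(handler).parse("a: 1\nb:\n  - x\n  - y\nc:\n  d: e\n")
p handler.result
```

`Psych::Parser#parse` also accepts an IO object, so the same handler could be fed an open file instead of a String.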

When loading a 26MB YAML file containing 50,000 object entries, the difference in memory pressure is significant.

With the current Psych parser in non-streaming mode, the memory required to load the YAML and generate the final dictionary is about 1GB:

[Image: Screenshot 2025-07-30 at 11 10 05]

With this PR, using the streaming snakeyaml-engine parser, it is about 330MB:

[Image: Screenshot 2025-07-30 at 11 12 20]

It also reduces loading time from 6 seconds to 2 on my laptop.

fixes #107

@jsvd jsvd marked this pull request as ready for review July 30, 2025 18:48
@jsvd jsvd changed the title from "draft implementation of streaming yaml parsing" to "implementation of streaming yaml parsing" Jul 30, 2025
@jsvd jsvd closed this Aug 4, 2025
@jsvd jsvd reopened this Aug 4, 2025
@andsel andsel self-requested a review August 4, 2025 10:44
@andsel andsel left a comment

Left a couple of suggestions, then I'll LGTM

@andsel andsel left a comment

LGTM

@jsvd jsvd merged commit 38e4424 into logstash-plugins:main Aug 4, 2025
2 of 3 checks passed
@jsvd jsvd deleted the streaming_yaml_parsing branch August 4, 2025 14:04
@jsvd jsvd restored the streaming_yaml_parsing branch August 4, 2025 14:04
@jsvd jsvd deleted the streaming_yaml_parsing branch August 4, 2025 14:14
Linked issue: YAML file stream parsing