Skip to content

manifest: Use faster yaml CLoader if available#806

Merged
carlescufi merged 1 commit into
zephyrproject-rtos:mainfrom
pdgendt:yaml-cloader
Apr 30, 2025
Merged

manifest: Use faster yaml CLoader if available#806
carlescufi merged 1 commit into
zephyrproject-rtos:mainfrom
pdgendt:yaml-cloader

Conversation

@pdgendt
Copy link
Copy Markdown
Collaborator

@pdgendt pdgendt commented Apr 15, 2025

Speed up the yaml parsing used by west by using LibYAML if available, as specified in the docs.

On Zephyr main with optional modules enabled;

Before

$ hyperfine --warmup 3 'west -q forall -c "true"'
Benchmark 1: west -q forall -c "true"
  Time (mean ± σ):     218.6 ms ±   1.8 ms    [User: 184.8 ms, System: 17.1 ms]
  Range (min … max):   215.7 ms … 223.0 ms    13 runs

After

$ hyperfine --warmup 3 'west -q forall -c "true"'
Benchmark 1: west -q forall -c "true"
  Time (mean ± σ):     200.9 ms ±   1.6 ms    [User: 166.2 ms, System: 16.6 ms]
  Range (min … max):   198.9 ms … 204.4 ms    15 runs

An ~8% speedup for a simple command that loads the manifest and loops the modules.

Speed up the yaml parsing used by west by using LibYAML if available.

Signed-off-by: Pieter De Gendt <pieter.degendt@basalte.be>
Copy link
Copy Markdown
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an 8% improvement when doing practically nothing. On your test system, this saved 20ms. Doesn't 20ms become negligible when west actually performs something instead?

Also, this adds a dependency on additional C code, doesn't it? Not something desirable in 2024.

@pdgendt
Copy link
Copy Markdown
Collaborator Author

pdgendt commented Apr 15, 2025

This is an 8% speed when doing practically nothing. On your test system, this saved 20ms. Isn't 20ms negligible when west actually performs something?

Also, this adds a dependency on additional C code, doesn't it? Not something desirable in 2024.

This does not add a dependency, if the C library isn't available, it falls back to the previous behaviour.

It's a free optimisation, even documented by the authors, also used in Zephyr.

It's a small gain, but if you start adding up all the small bits of daily runs in CI or locally, why wouldn't we?

@marc-hb
Copy link
Copy Markdown
Collaborator

marc-hb commented Apr 15, 2025

This does not add a dependency, if the C library isn't available, it falls back to the previous behaviour.

I understand but this does run more C code than before when the library is available, doesn't it?

I bet this C library is valuable for programs that spend a lot of time parsing a lot of large YAML files but west does not seem to fall in this category.

It's a small gain, but if you start adding up all the small bits of daily runs in CI or locally, why wouldn't we?

If you add up 1% savings everywhere it's still 1%.

@pdgendt
Copy link
Copy Markdown
Collaborator Author

pdgendt commented Apr 15, 2025

I understand but this does run more C code than before when the library is available, doesn't it?

As is running Python itself? This just takes a shortcut.

@marc-hb
Copy link
Copy Markdown
Collaborator

marc-hb commented Apr 15, 2025

As is running Python itself? This just takes a shortcut.

I'm afraid you lost me. The current commit message says "Speed up the yaml parsing used by west by using LibYAML if available", but now you say Python itself leverages the C library if available even without this PR? How so?

@pdgendt
Copy link
Copy Markdown
Collaborator Author

pdgendt commented Apr 16, 2025

As is running Python itself? This just takes a shortcut.

I'm afraid you lost me. The current commit message says "Speed up the yaml parsing used by west by using LibYAML if available", but now you say Python itself leverages the C library if available even without this PR? How so?

I mean Python itself is running some compiled C code, how is that so much different than calling into a library?

@marc-hb
Copy link
Copy Markdown
Collaborator

marc-hb commented Apr 16, 2025

I mean Python itself is running some compiled C code, how is that so much different than calling into a library?

Does Python run _yaml.cpython-313-x86_64-linux-gnu.so (or similar) when using west right now? I just tried to rename this file and it looks like the answer is "no". Will this PR cause this previously unused .so file to run? I'm afraid the answer is "yes".

I don't see why the fact Python runs other C code matters. This is not about banning all C code overnight, this is about minimizing (and pinning) dependencies and especially new C dependencies (best avoided).

EDIT: moreover this is a... parser = the poster child of where NOT to use C.

I'm really confused by the current security "posture". On one hand we have PRs to pin every single, de-privileged, CI workflow (#802) that runs only inside an ephemeral github.com container. On the other hand, we have this PR which adds a brand new, run-time C dependency for every west user in order to save 20 milliseconds. So the current security priorities and threat models don't make sense to me.

PS: again, I'm sure _yaml.cpython-313-x86_64-linux-gnu.so makes a significant difference and is worth the extra C dependency and corresponding review time for some other, "YAML-intensive" programs. Not for west IMHO.

@pdgendt pdgendt requested a review from Copilot April 29, 2025 10:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR speeds up YAML parsing in west by leveraging LibYAML’s CSafeLoader when available.

  • Switches YAML loading in manifest.py from yaml.safe_load to yaml.load with Loader=SafeLoader.
  • Applies the same change in commands.py for processing command specifications.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/west/manifest.py Uses CSafeLoader (faster LibYAML loader) for manifest loading.
src/west/commands.py Uses CSafeLoader for loading extension command specs.

@pdgendt pdgendt added this to the v1.4.0 milestone Apr 30, 2025
@carlescufi carlescufi merged commit 07aa937 into zephyrproject-rtos:main Apr 30, 2025
@pdgendt pdgendt deleted the yaml-cloader branch April 30, 2025 13:10
@carlescufi
Copy link
Copy Markdown
Member

Does Python run _yaml.cpython-313-x86_64-linux-gnu.so (or similar) when using west right now? I just tried to rename this file and it looks like the answer is "no". Will this PR cause this previously unused .so file to run? I'm afraid the answer is "yes".

Sorry @marc-hb, I did not see you had an open comment here. But I have to disagree with the potential security concern, running precompiled C code is not inherently more dangerous as long as we pin the version carefully as it is done in #802? Feel free to revert this, but I think it's worthwhile.

@marc-hb
Copy link
Copy Markdown
Collaborator

marc-hb commented May 8, 2025

running precompiled C code is not inherently more dangerous as long as we pin the version carefully as it is done in #802?

So, pinning is the latest "silver bullet"? For years we were told to "update, update and update" to get fixes early and stay secure. Which is it?

Answer: neither is a silver bullet. It's complicated and it depends.

Pinning this particular library is more secure if and only if the current version has very thoroughly reviewed, fuzzed and tested. Has it?

Feel free to revert this, but I think it's worthwhile.

I really don't think increasing the attack surface with a parser in C is worth 20ms here. Other places with a bigger performance impact may want to spend time evaluating that dependency and risk but not here in west.

@pdgendt
Copy link
Copy Markdown
Collaborator Author

pdgendt commented May 9, 2025

So, pinning is the latest "silver bullet"? For years we were told to "update, update and update" to get fixes early and stay secure. Which is it?

Answer: neither is a silver bullet. It's complicated and it depends.

Pinning this particular library is more secure if and only if the current version has very thoroughly reviewed, fuzzed and tested. Has it?

Feel free to revert this, but I think it's worthwhile.

I really don't think increasing the attack surface with a parser in C is worth 20ms here. Other places with a bigger performance impact may want to spend time evaluating that dependency and risk but not here in west.

I'm not sure why we turned this back into a pinning thing? It's unrelated?

The reasons why I proposed this:

  • This mechanism has been used for years in Zephyr
  • It is documented by the authors of the library as a recommended approach
  • It's "free"

@marc-hb
Copy link
Copy Markdown
Collaborator

marc-hb commented May 9, 2025

I'm not sure why we turned this back into a pinning thing? It's unrelated?

This was brought by @carlescufi for one reason: security.

The reasons why I proposed this:

None of these seems concerned about security. Which is what the only controversy has been about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants