@Manishearth
Member

Baked data is set up to run rustfmt on generated files, if available.

In Google's build system, we don't do that, because the hermetic build won't have a rustfmt binary available. This might be fixable with some work.

Either way, this isn't usually a big deal: baked data isn't meant to be legible anyway.

However, the mod.rs file is useful: you might need to edit it when adding more markers. It's nice for it to always be formatted to make this easy.

Also, mod.rs is not that interesting a file from a codegen point of view anyway: the benefit of using token streams is slight to nonexistent.

@Manishearth Manishearth requested a review from sffc January 12, 2026 22:16
@Manishearth Manishearth requested a review from a team as a code owner January 12, 2026 22:16
Member

@robertbastian robertbastian left a comment

However, the mod.rs file is useful: you might need to edit it when adding more markers. It's nice for it to always be formatted to make this easy.

Baked data output is not meant to be editable like this. You can easily extend baked data output by include!-ing it, e.g. include!("bake1/mod.rs"); include!("bake2/mod.rs"); to merge two baked outputs.

I'd like to hear more about your use case for editing this file.

@Manishearth
Member Author

Manishearth commented Jan 13, 2026

You can easily extend baked data output by include!-ing it, e.g. include!("bake1/mod.rs"); include!("bake2/mod.rs"); to merge two baked outputs.

mod.rs does contain includes, but it also defines some macros, so no, this does not work. You need to edit the provider macro as well.
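One way the problem can show up, as a minimal sketch: assume both generated files define a `macro_rules!` macro with the same name (`impl_data_provider` here). The traits `HasA` and `HasB` and the struct `Provider` are hypothetical stand-ins, and the two macro definitions are written inline to simulate the two include!s. Under Rust's textual macro scoping the later definition shadows the earlier one, so the provider only gets the second set of impls.

```rust
// Hypothetical stand-ins for the impls that each baked output would provide.
#[allow(dead_code)]
trait HasA {
    fn a(&self) -> &'static str;
}
trait HasB {
    fn b(&self) -> &'static str;
}

// As if pasted in by include!("bake1/mod.rs"):
#[allow(unused_macros)]
macro_rules! impl_data_provider {
    ($p:ty) => {
        impl HasA for $p {
            fn a(&self) -> &'static str {
                "a"
            }
        }
    };
}

// As if pasted in by include!("bake2/mod.rs"): this shadows the macro above.
macro_rules! impl_data_provider {
    ($p:ty) => {
        impl HasB for $p {
            fn b(&self) -> &'static str {
                "b"
            }
        }
    };
}

struct Provider;

// Expands to the second definition only, so Provider implements HasB but not HasA.
impl_data_provider!(Provider);
```

Getting both sets of impls means editing the macro body by hand so that it covers the markers from both outputs.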

I'd like to hear more about your use case for editing this file.

Google has a single unified build for Rust crates, so you cannot easily "build without compiled_data" to do datagen (the way ICU4X make bakeddata does). This leads to a philosophical cyclic dependency when updating baked data: we need the new versions of the crates to compile, but we need baked data for that, but we need the crates to compile to be able to run datagen, and so on.

To help with this, we have "stubdata", empty data that can be copied in to serve as a temporary stand-in for real data while we run datagen. We also sometimes just copy real ICU4X data.

It is easy to copy in data when we modify an existing data marker: we just replace the file. It is harder to do this for new data markers, because we need to manually edit mod.rs.
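A rough sketch of what that manual edit amounts to, assuming a mod.rs-like layout with one `include!` line per marker and one `impl_...!($provider);` line per marker inside the provider macro. The helper `add_marker` and the layout itself are assumptions for illustration, not the actual generated file.

```rust
/// Hypothetical helper: adds a new marker to mod.rs-like text in the two
/// places that need it (the include list and the provider macro body).
fn add_marker(mod_rs: &str, stem: &str) -> String {
    let lines: Vec<&str> = mod_rs.lines().collect();
    // Index of the last existing include line, if any.
    let last_include = lines.iter().rposition(|l| l.starts_with("include!("));
    // Index of the last existing per-marker impl line inside the macro, if any.
    let last_impl = lines
        .iter()
        .rposition(|l| l.trim_start().starts_with("impl_"));

    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        out.push_str(line);
        out.push('\n');
        if Some(i) == last_include {
            out.push_str(&format!("include!(\"{}.rs.data\");\n", stem));
        }
        if Some(i) == last_impl {
            out.push_str(&format!("        impl_{}!($provider);\n", stem));
        }
    }
    out
}
```

A line-based edit like this only works when the file is formatted predictably, with one item per line.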

I've framed this in terms of Google's needs, but single-feature-set build systems aren't uncommon.

@Manishearth
Member Author

FWIW, I want to make this change at least in part because, in situations like this, I consider templated strings to be far better than token streams.
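To illustrate the templated-string approach, here is a minimal sketch using only the standard library. The function names, the `.rs.data` file naming, and the shape of the `impl_data_provider` macro are assumptions for illustration, not the actual exporter code. The output is written already formatted, so it does not depend on a rustfmt binary being available.

```rust
/// Converts a marker name such as "HelloWorldV1" into a snake_case file stem.
/// (Hypothetical naming scheme.)
fn file_stem(marker: &str) -> String {
    let mut out = String::new();
    for (i, ch) in marker.chars().enumerate() {
        if ch.is_ascii_uppercase() && i > 0 {
            out.push('_');
        }
        out.push(ch.to_ascii_lowercase());
    }
    out
}

/// Renders a mod.rs-like file as plain text with fixed indentation.
fn render_mod_rs(markers: &[&str]) -> String {
    let mut out = String::from("// @generated\n");
    // One include line per marker.
    for m in markers {
        out.push_str(&format!("include!(\"{}.rs.data\");\n", file_stem(m)));
    }
    // A provider macro that invokes one per-marker macro for each marker.
    out.push_str("\nmacro_rules! impl_data_provider {\n    ($provider:ty) => {\n");
    for m in markers {
        out.push_str(&format!("        impl_{}!($provider);\n", file_stem(m)));
    }
    out.push_str("    };\n}\n");
    out
}
```

Calling `render_mod_rs(&["HelloWorldV1"])` yields text with one include line and one macro invocation per marker, each on its own line, which keeps later hand edits straightforward.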

@sffc sffc removed their request for review January 13, 2026 18:57