Skip to content

build: keep ICU data out of the V8 static library#1999

Open
bartlomieju wants to merge 2 commits into
mainfrom
fix/external-icu-data-file
Open

build: keep ICU data out of the V8 static library#1999
bartlomieju wants to merge 2 commits into
mainfrom
fix/external-icu-data-file

Conversation

@bartlomieju
Copy link
Copy Markdown
Member

@bartlomieju bartlomieju commented Jun 1, 2026

The 14.9 bump (12d7ef1) added icu_use_data_file = false to .gn, which
embeds the ICU data blob (icudt*_dat, ~10MB) directly into the V8 static
library. That made ICU data an inseparable part of libv8 and added a
roughly 11MB regression to downstream binary sizes. It also makes it
impossible to experiment with alternative data sets (small-icu, no data)
without rebuilding V8.

This restores icu_use_data_file to ICU's default of true, so the data is
no longer compiled in and the embedder supplies it at runtime via
v8::icu::set_common_data (or by loading icudtl.dat). Keeping ICU data as
a separate, swappable artifact is what lets downstreams choose full-icu,
small-icu, or no data at all.

mksnapshot initializes ICU while generating the snapshot, so it needs
icudtl.dat present at build time even though the data is not embedded
into the final library. To satisfy that, v8_depend_on_icu_data_file and
icu_copy_icudata_to_root_build_dir are aligned with icu_use_data_file:
the copy_icudata target stages the file in root_out_dir and mksnapshot
depends on it. The staged file is a build-time artifact only and is not
linked into libv8. Without this the build fails with "mksnapshot: Failed
to initialize ICU".

Verified with a from-source build (V8_FROM_SOURCE=1) on macOS arm64: the
library builds, icudtl.dat is not embedded, and the icu_date, icu_format,
icu_collator and get_default_locale tests pass once set_common_data is
called in the test harness.

Downstream: denoland/deno#34681 re-enables deno_core's include_icu_data
to consume this (verified end-to-end against a from-source build of this
branch).

Set icu_use_data_file = true (ICU's default) so the icudt*_dat blob is
not embedded in libv8. The embedder must supply ICU data at runtime via
v8::icu::set_common_data. This reverts the embedding introduced in the
14.9 bump and keeps ICU data a separate, swappable artifact.
mksnapshot initializes ICU while generating the V8 snapshot, so with
icu_use_data_file = true it needs icudtl.dat present at build time even
though the data is not embedded into the final library. Align
v8_depend_on_icu_data_file and icu_copy_icudata_to_root_build_dir with
icu_use_data_file so the copy_icudata target stages the file in
root_out_dir and mksnapshot depends on it. The staged file is a
build-time artifact only and is not linked into libv8; consumers still
supply ICU data at runtime via set_common_data.

Without this, the build fails with "mksnapshot: Failed to initialize
ICU".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant