Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: A high-level mechanism for embedding raw data from binary files #86204

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hIAhmad
Copy link
Contributor

@hIAhmad hIAhmad commented Feb 24, 2025

Adding CMake functions generate_binary_embed_inc_* to introduce a portable
high-level build mechanism for embedding raw binary data.

These functions enable including raw data from one or more binary files,
along with desired symbol names for corresponding data being embedded.
Controlling aspects such as address alignment, linker section, inclusion
as read-only (default) or as writable, data compression and including only
a subset of data from the specified binary files, is supported through
named optional parameters.

These functions provide a higher-level embedding mechanism compared to
generate_inc_* functions, enabling advanced use cases like:

  • Embedding of data from more than one files, where the list and number
    of files being included is determined at build time, e.g. driven by
    some device-tree fields or config options. generate_inc_* functions
    cannot handle such use-cases without a fair amount of additional custom
    support code added to corresponding CMake build files.

  • generate_inc_* functions embed binary data by first converting it
    into C-source compilable form; which is fine for small binary files
    but for larger files this can increase build time and resources usage
    significantly. Whereas these new functions can transparently make use
    of GNU-style "incbin" directive, when available (and usable), for much
    faster inclusion of binary files, automatically falling back to using
    generate_inc_* functions internally, for other cases.

Testing:

  • Tested on following platforms:
    native_sim
    qemu_x86
    qemu_cortex_a9
    qemu_cortex_a53
    qemu_cortex_m3
    qemu_cortex_r5
    qemu_riscv32
    qemu_riscv64
    qemu_xtensa/dc233c
    qemu_arc/qemu_arc_hs
    
  • Tested both with and without Zephyr CMake linker generator feature (on qemu_cortex_m3).

Related PRs:

Note:
Changes to include/zephyr/toolchain/common.h and scripts/build/file2hex.py are not part of this PR - they are coming from the separate PRs that this PR depends on. I've added them here to get early feedback on rest of the feature changeset. These changes will be removed from this PR (through rebase) once the respective PRs get approved and merged to main.

Zephyr toolchain abstraction provides a macro __aligned(x) to add specified
alignment, in bytes, to a chosen symbol for C/C++ source files but there is
no portable counterpart available for symbols defined in assembly source
files. This change-set adds a new macro BALIGN(x) for this purpose.

Signed-off-by: Irfan Ahmad <[email protected]>
(cherry picked from commit e8411f4)
Added recently introduced optimizations - generation in string literal form
and faster generation for hexadecimal initializer list form - to gzip path
as well.

Signed-off-by: Irfan Ahmad <[email protected]>
(cherry picked from commit 8520edd)
*/

#if defined(_ASMLANGUAGE) && !defined(_LINKER)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spacing to look as it looks above (remove empty newlines)

Comment on lines +86 to +87
/* .align assembler directive is supposed by all ARC toolchains and it is
* implemented in the same way across ARC toolchains.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a toolchain maintainer but this looks like the wrong thing and looks to be doing something for a specific build tool by looking at the arch, when instead it should be looking at the build tool @LoveKarlsson can you check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are from #85758 that needs to get merged first and are being discussed and reviewed there (see: #85758 (comment)). I have added these here to get early feedback on the rest of the changeset. It will no longer be part of this PR, after that PR gets merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for future proofing toolchains not based on GCC it should be a macro. IAR for example does not use .align

" */\n")

if(USE_INCBIN)
file(WRITE ${GENERATED_FILE}.S "/*\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 space indent for cmake

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

# ALIGN - Alignment for data being embedded in bytes. Default is 4.
# SECTION - Name of the desired linker section. Data from all specified
# binary files will be placed in the given section name.
# ZIP - Compress the data with specified compression method: gzip or no (default).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gzip != zip, and don't think that has a place here, if you need a zip then the zip file should be present and provided to the function (or compressed manually with a user build step then included)

Copy link
Contributor Author

@hIAhmad hIAhmad Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ZIP as the name of the parameter is a bit arguable. While the word zip can sometimes be used as a generic word for compression it is also a name for a specific compression method. COMPRESS can be an alternative name here, that does not suffer from this ambiguity.

Now, regarding your other point - whether this parameter should even be provided in first place. It is true that a file can be compressed in a separate build step, prior to passing it on to generate_binary_embed_inc_file but its lower-level counterpart generate_inc_file already provides this capability so I think it is better to preserve the existing capability of that function in this higher-level wrapper (and it is certainly more convenient than having to do it in a separate build step).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the ZIP parameter to COMPRESS.

# specified files should be started.
# LENGTH - Length, in bytes, of the data subset to be included from
# the specified files.
# READONLY - yes (default) or no.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a flag and should be CONST not READONLY

Copy link
Contributor Author

@hIAhmad hIAhmad Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a flag and should be CONST not READONLY

I am a bit divided on the name. I had thought about naming it CONST but then felt that the parameter name should better reflect the eventual logical effect that normal/intended use of the parameter should achieve, rather than an implementation detail so chose READONLY instead. Let's wait for some more feedback on the naming and then consider renaming it accordingly.

I'll make it a flag right away, though. (see below: #86204 (comment)).

Copy link
Contributor Author

@hIAhmad hIAhmad Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nordicjm There is a downside to making this parameter a flag. By default, you'd typically include/embed binary data as read-only, and not as writable. So if we make this a flag, it will always have to be specified for the common and typical case. By making it take a value, you need to specify it only if you want to make the included data writable (which is relatively a rare case, currently, #82603 would be likely the only such practical use in the source tree).

I think it is better to have it take a value.

@hIAhmad hIAhmad force-pushed the gen_binary_embed_inc_file branch from 7e400f1 to ac3355f Compare February 26, 2025 09:55
Adding CMake functions generate_binary_embed_inc_* to introduce a portable
high-level build mechanism for embedding raw binary data.

These functions enable including raw data from one or more binary files,
along with desired symbol names for corresponding data being embedded.
Controlling aspects such as address alignment, linker section, inclusion
as read-only (default) or as writable, data compression and including only
a subset of data from the specified binary files, is supported through
named optional parameters.

These functions provide a higher-level embedding mechanism compared to
generate_inc_* functions, enabling advanced use cases like:

  - Embedding of data from more than one files, where the list and number
    of files being included is determined at build time, e.g. driven by
    some device-tree fields or config options. generate_inc_* functions
    cannot handle such use-cases without a fair amount of additional custom
    support code added to corresponding CMake build files.

  - generate_inc_* functions embed binary data by first converting it
    into C-source compilable form; which is fine for small binary files
    but for larger files this can increase build time and resources usage
    significantly. Whereas these new functions can transparently make use
    of GNU-style "incbin" directive, when available (and usable), for much
    faster inclusion of binary files, automatically falling back to using
    generate_inc_* functions internally, for other cases.

Signed-off-by: Irfan Ahmad <[email protected]>
Add a test that exercises generate_binary_embed_inc_* CMake functions to
embed raw binary data from two test binary files, into an application, in
a comprehensive set of permutations of supported parameters.

Signed-off-by: Irfan Ahmad <[email protected]>
@hIAhmad hIAhmad force-pushed the gen_binary_embed_inc_file branch from ac3355f to afd4afb Compare February 26, 2025 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants