Add -fhoist-locals, -finitialize-locals#721
Conversation
New pass HoistLocals.ml, controlled by -fhoist-locals, hoists all local variable declarations to the beginning of the function body: - Declarations in the "declaration zone" (before any real statements) keep their initializers. - Declarations after real statements get separated: the declaration (with EAny init) goes to the top, and an assignment replaces it at the original location. - Buffer creations (EBufCreate/EBufCreateL) are left in place since later passes require them under their let-bindings. - VLA (non-constant size stack arrays) that would need to cross real statements trigger warning 29 (fatal by default) and are left in place. The pass runs before mark_possibly_unused in the pipeline. Test suite in test/hoist-locals/ with 5 test modules: - HoistInt: machine integer locals - HoistArray: constant-size array locals - HoistScope: local scopes with if/else - HoistVlaNoWarn: VLAs already at the top (no warning) - HoistVlaWarn: VLAs after statements (exercises warning 29) Each test runs with and without -fhoist-locals and verifies identical output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cherry-picked from 40c7257 (without tests). Rewrite -finitialize-locals to take an argument: no (default), c23, c99, or c89 (case-insensitive). - c23: uses = {} (C23 empty initializer) - c99: uses designated initializers - c89: uses positional initializers Warning 30 (InitializerUnknownType) for unknown types, fatal by default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test suite in test/initialize-locals/ with standalone Makefile. InitLocals.fst tests scalar, shadowed, branching, boolean, and struct locals. Each test is compiled and executed with c23, c99, and c89 modes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In the -fhoist-locals phase, comments in the declaration zone are now hoisted alongside local variable declarations so that they maintain their position relative to initialised declarations. Previously, a standalone comment could be swapped past a declaration with an initializer because comments stayed in the residual body while declarations were collected into the hoisted list. Additionally, recurse into EIfThenElse, ESwitch, EWhile, EFor, and EMatch bodies to hoist local variable declarations from inside those blocks up to the function level, replacing each initialiser with an assignment at the original location. Buffer creations (EBufCreate) with constant size are split: the declaration is hoisted with EAny as the initializer, and an EBufFill is emitted at the original location (CStarToC11 renders it as memset or a for-loop). Likewise, EBufCreateL is split into a declaration with all-EAny elements and individual EBufWrite statements at the original location. Buffer creations with non-constant size (VLA) emit warning 29 and are left in place. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When -fhoist-locals is enabled, local variable declarations are hoisted to the function top. If a function contains #if/#else branches (from IfDef-flagged externals), some hoisted variables may only be used in one branch, triggering -Wunused-variable in the other. Add a C11-level pass (MarkMaybeUnused) that detects variables not used in all branches of an IfDef and inserts KRML_MAYBE_UNUSED_VAR() calls after the declarations. This uses the existing KRML_MAYBE_UNUSED_VAR macro which is available in both minimal and full krmllib headers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of inserting KRML_MAYBE_UNUSED_VAR calls outside the IfDef
(between declarations and the #if), insert them at the top of each
branch for the variables that branch does not reference.
Before:
uint32_t a; uint32_t b;
KRML_MAYBE_UNUSED_VAR(a);
KRML_MAYBE_UNUSED_VAR(b);
#if USE_FEATURE
a = ...; return a;
#else
b = ...; return b;
#endif
After:
uint32_t a; uint32_t b;
#if USE_FEATURE
KRML_MAYBE_UNUSED_VAR(b);
a = ...; return a;
#else
KRML_MAYBE_UNUSED_VAR(a);
b = ...; return b;
#endif
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks Tahina, I will take a look later this week. Please clarify how much AI was used for this PR to help me adjust my level of scrutiny. Thanks. |
|
Sorry for my late reply, I am just back from vacation. |
protz
left a comment
There was a problem hiding this comment.
Hi Tahina, thanks for the PR. These are high-level comments before I review this more in detail.
- Can you explain why you chose to duplicate the existing use analysis? There is already an infrastructure in place to perform use-analysis (lib/UseAnalysis.ml), and this already tries to handle the case of the if-then-else. There is also a MaybeUnused flag that already appears in places and emits the correct macro. It looks like you chose to circumvent all of this existing infrastructure and write fresh code that operates at the very end of the pipeline on a syntactic tree that is only intended for pretty-printing. Request: please minimize the issue and explain why the current code doesn't work for you. Furthermore, please fix the existing use-analysis.
- For the hoisting of locals, two remarks.
- Can we drop C89 compatibility mode? Frankly, C89 is nearly 30 years old and the C89 mode has at least one bad bug that I know of in krml. This mode is a liability and I would like to eliminate it. Can we ditch it instead of adding more support for it?
- You are duplicating (or perhaps, your LLM is) a lot of code in the existing hoisting code. Why not change the behavior of the existing hoisting phase with a flag that percolates the let-bindings further up? (in Simplify.ml I believe) Request: convince me that adding a whole lot of fresh code rather than work with the existing code makes sense, or tweak the existing hoist phase that is already written to operate the way you want.
- The
initialize_localsphase should operate at the level of CStar, at the very least. Like I wrote in the other PR, C11 is only a syntactic tree and manipulating it is a nightmare because of the spec-and-decl convoluted representation of C (not to mention the types...). Request: please revise this.
Generally, I am under the impression that your PR is either working around existing code rather than fixing it, and sometimes operating at the wrong level of abstraction. Let's revise this to make sure we don't add 700 lines of fresh code unless there's a very good reason to do so.
Perhaps a good way to proceed would be to break this down into individual PRs. It would be easier to review.
Thanks.
|
Edit: there is already |
tl;dr: This PR adds two passes, along with corresponding tests:
-fhoist-locals: hoist local variables to the beginning of functions-finitialize-locals: add initializers to uninitialized local variablesYou can find an example use of
-fhoist-localsin my EverParse fork: tahina-pro/quackyducky@61a825fPlease do not rebase this branch to a commit that is not on
everparse-fstar1, which is the Karamel branch (an ancestor ofmaster) that EverParsemasteris currently using.-fhoist-localsThis option hoists local variables to the beginning of functions:
-finitialize-localsThis option adds initializers to uninitialized local variables.
It takes an argument:
no: do not initialize uninitialized local variables. This is the default.c23: initialize uninitialized local variables with= {}. This is enough to initialize everything without descending into nested structs, etc.c99: deep-initialize structs, using a field designator (.field =) to initialize each struct field. This option descends into nested structs, and initializes each integer with 0 and each pointer with NULL.c89: same asc99, but without the field designator syntaxc99andc89initialize unions by initializing their first field, which is the only possible way to initialize a union in C89.This option emits a warning, by default an error, if it encounters an unknown (abstract) type.
Other than for integer fields, this option does not rely on the old GCC
= {0}behavior, which has been removed in recent (>= 14.1) versions of GCC.