Skip to content

Add -fhoist-locals, -finitialize-locals#721

Open
tahina-pro wants to merge 10 commits into
FStarLang:masterfrom
tahina-pro:_taramana_hoist_locals
Open

Add -fhoist-locals, -finitialize-locals#721
tahina-pro wants to merge 10 commits into
FStarLang:masterfrom
tahina-pro:_taramana_hoist_locals

Conversation

@tahina-pro
Copy link
Copy Markdown
Member

tl;dr: This PR adds two passes, along with corresponding tests:

  • -fhoist-locals: hoist local variables to the beginning of functions
  • -finitialize-locals: add initializers to uninitialized local variables

You can find an example use of -fhoist-locals in my EverParse fork: tahina-pro/quackyducky@61a825f

Please do not rebase this branch to a commit that is not on everparse-fstar1, which is the Karamel branch (an ancestor of master) that EverParse master is currently using.

-fhoist-locals

This option hoists local variables to the beginning of functions:

  • Declarations already present at the beginning of functions are unchanged. Elsewhere:
  • Declarations without initializers are hoisted to the beginning of functions
  • Declarations with initializers are split into two parts: a declaration without initializer hoisted at the beginning of functions, and initializers become assignments:
    • local arrays with non-constant size, or with per-element initializers make hoisting stop: Karamel emits a warning, by default as an error
    • local arrays with a single initializer (EBufCreate not Any) are split into a declaration without initializer and a EBufFill operation

-finitialize-locals

This option adds initializers to uninitialized local variables.

It takes an argument:

  • no: do not initialize uninitialized local variables. This is the default.
  • c23: initialize uninitialized local variables with = {}. This is enough to initialize everything without descending into nested structs, etc.
  • c99: deep-initialize structs, using a field designator (.field =) to initialize each struct field. This option descends into nested structs, and initializes each integer with 0 and each pointer with NULL.
  • c89: same as c99, but without the field designator syntax

c99 and c89 initialize unions by initializing their first field, which is the only possible way to initialize a union in C89.

This option emits a warning, by default an error, if it encounters an unknown (abstract) type.

Other than for integer fields, this option does not rely on the old GCC = {0} behavior, which has been removed in recent (>= 14.1) versions of GCC.

tahina-pro and others added 8 commits April 2, 2026 19:36
New pass HoistLocals.ml, controlled by -fhoist-locals, hoists all local
variable declarations to the beginning of the function body:

- Declarations in the "declaration zone" (before any real statements)
  keep their initializers.
- Declarations after real statements get separated: the declaration
  (with EAny init) goes to the top, and an assignment replaces it at
  the original location.
- Buffer creations (EBufCreate/EBufCreateL) are left in place since
  later passes require them under their let-bindings.
- VLA (non-constant size stack arrays) that would need to cross real
  statements trigger warning 29 (fatal by default) and are left in
  place.

The pass runs before mark_possibly_unused in the pipeline.

Test suite in test/hoist-locals/ with 5 test modules:
- HoistInt: machine integer locals
- HoistArray: constant-size array locals
- HoistScope: local scopes with if/else
- HoistVlaNoWarn: VLAs already at the top (no warning)
- HoistVlaWarn: VLAs after statements (exercises warning 29)

Each test runs with and without -fhoist-locals and verifies identical
output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cherry-picked from 40c7257 (without tests).

Rewrite -finitialize-locals to take an argument: no (default), c23,
c99, or c89 (case-insensitive).

- c23: uses = {} (C23 empty initializer)
- c99: uses designated initializers
- c89: uses positional initializers

Warning 30 (InitializerUnknownType) for unknown types, fatal by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test suite in test/initialize-locals/ with standalone Makefile.
InitLocals.fst tests scalar, shadowed, branching, boolean, and struct
locals. Each test is compiled and executed with c23, c99, and c89 modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In the -fhoist-locals phase, comments in the declaration zone are now
hoisted alongside local variable declarations so that they maintain
their position relative to initialised declarations.  Previously, a
standalone comment could be swapped past a declaration with an
initializer because comments stayed in the residual body while
declarations were collected into the hoisted list.

Additionally, recurse into EIfThenElse, ESwitch, EWhile, EFor, and
EMatch bodies to hoist local variable declarations from inside those
blocks up to the function level, replacing each initialiser with an
assignment at the original location.

Buffer creations (EBufCreate) with constant size are split: the
declaration is hoisted with EAny as the initializer, and an EBufFill
is emitted at the original location (CStarToC11 renders it as memset
or a for-loop).  Likewise, EBufCreateL is split into a declaration
with all-EAny elements and individual EBufWrite statements at the
original location.  Buffer creations with non-constant size (VLA)
emit warning 29 and are left in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When -fhoist-locals is enabled, local variable declarations are hoisted
to the function top. If a function contains #if/#else branches (from
IfDef-flagged externals), some hoisted variables may only be used in one
branch, triggering -Wunused-variable in the other.

Add a C11-level pass (MarkMaybeUnused) that detects variables not used
in all branches of an IfDef and inserts KRML_MAYBE_UNUSED_VAR() calls
after the declarations. This uses the existing KRML_MAYBE_UNUSED_VAR
macro which is available in both minimal and full krmllib headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of inserting KRML_MAYBE_UNUSED_VAR calls outside the IfDef
(between declarations and the #if), insert them at the top of each
branch for the variables that branch does not reference.

Before:
  uint32_t a; uint32_t b;
  KRML_MAYBE_UNUSED_VAR(a);
  KRML_MAYBE_UNUSED_VAR(b);
  #if USE_FEATURE
    a = ...; return a;
  #else
    b = ...; return b;
  #endif

After:
  uint32_t a; uint32_t b;
  #if USE_FEATURE
    KRML_MAYBE_UNUSED_VAR(b);
    a = ...; return a;
  #else
    KRML_MAYBE_UNUSED_VAR(a);
    b = ...; return b;
  #endif

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tahina-pro tahina-pro requested review from nikswamy and protz April 6, 2026 15:01
@protz
Copy link
Copy Markdown
Collaborator

protz commented Apr 6, 2026

Thanks Tahina, I will take a look later this week.

Please clarify how much AI was used for this PR to help me adjust my level of scrutiny. Thanks.

@tahina-pro
Copy link
Copy Markdown
Member Author

Sorry for my late reply, I am just back from vacation.
This PR is AI co-authored (the commit message is tagged as such).
I have been testing this PR with EverParse and it seems to work well.

Copy link
Copy Markdown
Collaborator

@protz protz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Tahina, thanks for the PR. These are high-level comments before I review this more in detail.

  • Can you explain why you chose to duplicate the existing use analysis? There is already an infrastructure in place to perform use-analysis (lib/UseAnalysis.ml), and this already tries to handle the case of the if-then-else. There is also a MaybeUnused flag that already appears in places and emits the correct macro. It looks like you chose to circumvent all of this existing infrastructure and write fresh code that operates at the very end of the pipeline on a syntactic tree that is only intended for pretty-printing. Request: please minimize the issue and explain why the current code doesn't work for you. Furthermore, please fix the existing use-analysis.
  • For the hoisting of locals, two remarks.
    1. Can we drop C89 compatibility mode? Frankly, C89 is nearly 30 years old and the C89 mode has at least one bad bug that I know of in krml. This mode is a liability and I would like to eliminate it. Can we ditch it instead of adding more support for it?
    2. You are duplicating (or perhaps, your LLM is) a lot of code in the existing hoisting code. Why not change the behavior of the existing hoisting phase with a flag that percolates the let-bindings further up? (in Simplify.ml I believe) Request: convince me that adding a whole lot of fresh code rather than work with the existing code makes sense, or tweak the existing hoist phase that is already written to operate the way you want.
  • The initialize_locals phase should operate at the level of CStar, at the very least. Like I wrote in the other PR, C11 is only a syntactic tree and manipulating it is a nightmare because of the spec-and-decl convoluted representation of C (not to mention the types...). Request: please revise this.

Generally, I am under the impression that your PR is either working around existing code rather than fixing it, and sometimes operating at the wrong level of abstraction. Let's revise this to make sure we don't add 700 lines of fresh code unless there's a very good reason to do so.

Perhaps a good way to proceed would be to break this down into individual PRs. It would be easier to review.

Thanks.

@protz
Copy link
Copy Markdown
Collaborator

protz commented Apr 24, 2026

Edit: there is already -fc89-scope and a corresponding implementation. Why did you choose to reimplement this logic from scratch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants