Skip to content

Fix typedef/variable name conflict parsing error#219

Open
Copilot wants to merge 10 commits into
developfrom
copilot/fix-typedef-variable-name-conflict
Open

Fix typedef/variable name conflict parsing error#219
Copilot wants to merge 10 commits into
developfrom
copilot/fix-typedef-variable-name-conflict

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 12, 2026

  • Create test file test/small1/typedef_varname.c for the typedef/variable name conflict
  • Add test to test/testcil.pl
  • Fix init_declarator rule in src/frontc/cparser.mly to call add_identifier via declarator_init_start helper rule between declarator and EQ
  • Fix comment style in cparser.mly to use /* (* TEXT *) */ as per file conventions
  • Fix add_identifier/add_type timing: declarator_no_init immediately registers each declarator's name — add_type for typedef declarators, add_identifier for non-typedef declarators (C11 6.2.1.7: scope begins just after completion of declarator); declarator_init_start registers initialized declarators before their =; doDeclaration now only creates AST nodes with no lexer management
  • Fix CI regression in test/scope2: declarator_no_init calls add_type (not add_identifier) for typedef declarators, so names are never added to the context tracking list and pop_context cannot remove the wrong binding
  • Add tests to typedef_varname.c: NO_INIT middle declarator (mytype a = 1, mytype, b = sizeof mytype) and chain initialization (T T = 1, U = T + 1) — both prove correct C11 6.2.1.7 scoping; sizeof mytype without parens is only valid when mytype is a variable
  • Move is_typedef_decl flag into src/frontc/lexerhack.ml alongside the other lexer hack state
Original prompt

This section details on the original issue you should resolve

<issue_title>typedef and variable name conflict parsing error</issue_title>
<issue_description>### Test case from goblint/bench#38 (comment)

#include <stddef.h>
#include <stdint.h>

typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

typedef struct rax {
    raxNode *head;
    uint64_t numele;
    uint64_t numnodes;
} rax;

void *malloc(size_t size);

int main() {
  rax *rax = malloc(sizeof(*rax)); // PARSING ERROR HERE
  return 0;
}

This is completely fine by GCC/Clang and rax inside the sizeof should refer to the local variable rax that has been declared and is being initialized.

Cause

The parser errors because that inner rax is a NAMED_TYPE token from the lexer instead of IDENT. This is due to the lexer hack being buggy: ours only adds a local variable to the lexer hack table after the entire declaration, but actually the initializer in the declaration may already refer to the local variable. Thus at this point, the lexer hack table considers rax to refer to the typedef instead.

Specification

It's extremely difficult to find specification for this, but I believe it might be 6.2.1.7 in N1570:

Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.

The declaractor is the *rax part of the declaration in the above example, so rax needs to be added to the lexer hack table as identifier between the declarator and initializer, not after the whole thing.</issue_description>

Comments on the Issue (you are @copilot in this section)


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

… register identifier before parsing initializer

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix typedef and variable name conflict parsing error Fix typedef/variable name conflict parsing error Mar 12, 2026
Comment thread src/frontc/cparser.mly Outdated
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz michael-schwarz marked this pull request as ready for review March 13, 2026 09:00
@michael-schwarz michael-schwarz requested a review from sim642 March 13, 2026 09:00
Comment thread src/frontc/cparser.mly
…initialized declarators

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… its declarator via helper rules

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…t_start; update test

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Comment thread test/small1/typedef_varname.c
… typedef guard

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz
Copy link
Copy Markdown
Member

@sim642: This looks reasonable now!

Comment thread src/frontc/cparser.mly Outdated
Comment on lines +1053 to +1054
of its declarator). For typedef declarations (is_typedef_decl = true), the name
is registered later via add_type in doDeclaration. *) */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at this in detail but introducing even more global state into the parser is a bit worrying.

Is typedef scoping actually defined to work differently like this by the standard? Or maybe it would also be the right thing to do the add_type stuff earlier than at the very end?
Perhaps one typedef could also define multiple types with commas (I haven't looked at the standard grammar) and then it would also matter there (and even be more correct)?

I might be optimistically hoping for more consistency and less special-casing than C actually has...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Look up the details of this in the standard and either implement Simmo's suggestion or argue why it is not a good idea.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in aa40855.

C11 6.2.1.7: "Any other identifier has scope that begins just after the completion of its declarator." This applies uniformly to all declarations — typedef names included. So moving add_type earlier (per-declarator) is both correct and more standard-conformant.

What changed: declarator_no_init now handles both cases with a single branch:

  • add_type n when is_typedef_decl = true (typedef declarators)
  • add_identifier n when is_typedef_decl = false (variable declarators)

doDeclaration now has zero lexer management — it only builds AST nodes (TYPEDEF / ONLYTYPEDEF / DECDEF).

On the is_typedef_decl global flag: it's still necessary. Yacc reductions are bottom-up — by the time declarator_no_init reduces, the TYPEDEF keyword has already been consumed (in decl_spec_list_common). There's no way to pass this context through grammar parameters in standard yacc without a mid-rule side effect, which is what is_typedef_decl provides. The flag is set by the TYPEDEF rule and cleared by doDeclaration, giving it a well-defined lifetime that spans exactly the declarator list of a typedef declaration.

On typedef commas: per the C grammar, typedef int T, *TP; uses the same init_declarator_listinit_declarator path as variables. With this change, T and TP each get add_type called immediately after their declarator — so a hypothetical typedef int T, T_alias; would see T as already registered when T_alias is parsed. This is C11-correct behaviour.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, nonsense like this is also allowed:

typedef int foo, bar[sizeof(foo)];

int main() {
    _Static_assert(sizeof(foo) == 4);
    _Static_assert(sizeof(bar) == 4*4);
    return 0;
}

…rator, none in doDeclaration

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/goblint/cil/sessions/459679d3-44f6-4f0b-8a30-92252ab7fa0d
Copilot AI requested a review from michael-schwarz March 26, 2026 03:55
@sim642 sim642 self-requested a review March 31, 2026 13:15
@sim642 sim642 added the bug label Mar 31, 2026
Comment thread test/small1/typedef_varname_noinit.c Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this test testing anything different from the other one. It's just the same situation three times.
Also the name noinit makes no sense: all of the variables are being initialized. It's the other test that also includes a variable without the initializer.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 66f8d59. The unique test (T T = 1, U = T + 1) has been merged into typedef_varname.c and typedef_varname_noinit.c has been deleted.

Comment thread src/frontc/cparser.mly
Comment on lines +1057 to +1069
declarator_no_init:
declarator { let (n, _, _, _) = $1 in
if !is_typedef_decl then !Lexerhack.add_type n
else !Lexerhack.add_identifier n;
$1 }
;

/* (* Parses "declarator =" and adds the declared name as a variable identifier
in the lexer hack, so that in the initializer the name shadows any typedef
with the same name (C11 6.2.1.7: scope begins just after the completion of the
declarator). *) */
declarator_init_start:
declarator EQ { let (n, _, _, _) = $1 in !Lexerhack.add_identifier n; $1 }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do the lexer hack in declarator, instead of duplicating it at both new rules?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the registration into declarator itself would fire in every context where declarator is used — including field_decl (struct members), parameter_decl (function parameters), function_def_start (function definitions), and old_pardef (K&R-style parameters). All of those must not trigger lexer registration. The two helper rules are the minimal way to isolate the registration to the init_declarator context, while also handling the semantic difference: declarator_no_init must conditionally call add_type or add_identifier (based on is_typedef_decl), whereas declarator_init_start always calls add_identifier (typedef declarations cannot have initializers).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting question here is: where exactly in the C standard is it stated that typedefs cannot have initializers?
Of course, I don't know what it would mean, but it'd also be strange if it isn't forbidden.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, the grammar doesn't explicitly ban it, but one can get some kind of semantic ban out of

6.7.9 (8) An initializer specifies the initial value stored in an object.

C11 draft

as typedefs do not define objects.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, there's no need for two new grammar rules. The declarator_no_init would suffice and could be reused also in the EQ case. It just needs a suitable name then.

Having declarator_init_start hardcode add_identifier is odd because the grammar doesn't forbid typedef initializers. As is, the parser accepts such a thing, but adds it to the lexer hack with the wrong kind which will screw up all the following parsing.
It seems more reasonable to me to just use the same logic for both and not try to be clever about it if the grammar was never intended to be so precise.

Instead, I would expect Cabs2cil to reject a typedef with initializer. I haven't checked if it does but surely it cannot use it for anything meaningful.
Either way, then the following lexer hack state wouldn't be affected by the strange but valid grammatical construct.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what exactly you suggest here. If you're happy with the rest of it feel free to push directly to this branch. I'll have a look at it and then we can merge.

Comment thread src/frontc/cparser.mly Outdated
Comment on lines +81 to +83
(* Set to true while parsing a typedef declaration's declarator list, so that
declarator_no_init calls add_type (not add_identifier) for typedef names. *)
let is_typedef_decl = ref false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a necessary part of the lexer hack, then it seems like the Lexerhack module would be the right place for this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 66f8d59is_typedef_decl is now in Lexerhack alongside the other lexer hack state (add_identifier, add_type, push_context, pop_context).

@michael-schwarz
Copy link
Copy Markdown
Member

@copilot address simmo's comments

…te tests, remove noinit file

Agent-Logs-Url: https://github.com/goblint/cil/sessions/e317c474-30b0-43d6-aec0-4a7a9377d477

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typedef and variable name conflict parsing error

3 participants