Fix typedef/variable name conflict parsing error#219
Conversation
… register identifier before parsing initializer Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…initialized declarators Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… its declarator via helper rules Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…t_start; update test Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… typedef guard Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
|
@sim642: This looks reasonable now! |
| of its declarator). For typedef declarations (is_typedef_decl = true), the name | ||
| is registered later via add_type in doDeclaration. *) */ |
There was a problem hiding this comment.
I haven't looked at this in detail but introducing even more global state into the parser is a bit worrying.
Is typedef scoping actually defined to work differently like this by the standard? Or maybe it would also be the right thing to do the add_type stuff earlier than at the very end?
Perhaps one typedef could also define multiple types with commas (I haven't looked at the standard grammar) and then it would also matter there (and even be more correct)?
I might be optimistically hoping for more consistency and less special-casing than C actually has...
There was a problem hiding this comment.
@copilot Look up the details of this in the standard and either implement Simmo's suggestion or argue why it is not a good idea.
There was a problem hiding this comment.
Implemented in aa40855.
C11 6.2.1.7: "Any other identifier has scope that begins just after the completion of its declarator." This applies uniformly to all declarations — typedef names included. So moving add_type earlier (per-declarator) is both correct and more standard-conformant.
What changed: declarator_no_init now handles both cases with a single branch:
add_type nwhenis_typedef_decl = true(typedef declarators)add_identifier nwhenis_typedef_decl = false(variable declarators)
doDeclaration now has zero lexer management — it only builds AST nodes (TYPEDEF / ONLYTYPEDEF / DECDEF).
On the is_typedef_decl global flag: it's still necessary. Yacc reductions are bottom-up — by the time declarator_no_init reduces, the TYPEDEF keyword has already been consumed (in decl_spec_list_common). There's no way to pass this context through grammar parameters in standard yacc without a mid-rule side effect, which is what is_typedef_decl provides. The flag is set by the TYPEDEF rule and cleared by doDeclaration, giving it a well-defined lifetime that spans exactly the declarator list of a typedef declaration.
On typedef commas: per the C grammar, typedef int T, *TP; uses the same init_declarator_list → init_declarator path as variables. With this change, T and TP each get add_type called immediately after their declarator — so a hypothetical typedef int T, T_alias; would see T as already registered when T_alias is parsed. This is C11-correct behaviour.
There was a problem hiding this comment.
By the way, nonsense like this is also allowed:
typedef int foo, bar[sizeof(foo)];
int main() {
_Static_assert(sizeof(foo) == 4);
_Static_assert(sizeof(bar) == 4*4);
return 0;
}…rator, none in doDeclaration Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com> Agent-Logs-Url: https://github.com/goblint/cil/sessions/459679d3-44f6-4f0b-8a30-92252ab7fa0d
There was a problem hiding this comment.
I don't see this test testing anything different from the other one. It's just the same situation three times.
Also the name noinit makes no sense: all of the variables are being initialized. It's the other test that also includes a variable without the initializer.
There was a problem hiding this comment.
Fixed in 66f8d59. The unique test (T T = 1, U = T + 1) has been merged into typedef_varname.c and typedef_varname_noinit.c has been deleted.
| declarator_no_init: | ||
| declarator { let (n, _, _, _) = $1 in | ||
| if !is_typedef_decl then !Lexerhack.add_type n | ||
| else !Lexerhack.add_identifier n; | ||
| $1 } | ||
| ; | ||
|
|
||
| /* (* Parses "declarator =" and adds the declared name as a variable identifier | ||
| in the lexer hack, so that in the initializer the name shadows any typedef | ||
| with the same name (C11 6.2.1.7: scope begins just after the completion of the | ||
| declarator). *) */ | ||
| declarator_init_start: | ||
| declarator EQ { let (n, _, _, _) = $1 in !Lexerhack.add_identifier n; $1 } |
There was a problem hiding this comment.
Why not do the lexer hack in declarator, instead of duplicating it at both new rules?
There was a problem hiding this comment.
Moving the registration into declarator itself would fire in every context where declarator is used — including field_decl (struct members), parameter_decl (function parameters), function_def_start (function definitions), and old_pardef (K&R-style parameters). All of those must not trigger lexer registration. The two helper rules are the minimal way to isolate the registration to the init_declarator context, while also handling the semantic difference: declarator_no_init must conditionally call add_type or add_identifier (based on is_typedef_decl), whereas declarator_init_start always calls add_identifier (typedef declarations cannot have initializers).
There was a problem hiding this comment.
An interesting question here is: where exactly in the C standard is it stated that typedefs cannot have initializers?
Of course, I don't know what it would mean, but it'd also be strange if it isn't forbidden.
There was a problem hiding this comment.
Apparently, the grammar doesn't explicitly ban it, but one can get some kind of semantic ban out of
6.7.9 (8) An initializer specifies the initial value stored in an object.
C11 draft
as typedefs do not define objects.
There was a problem hiding this comment.
Still, there's no need for two new grammar rules. The declarator_no_init would suffice and could be reused also in the EQ case. It just needs a suitable name then.
Having declarator_init_start hardcode add_identifier is odd because the grammar doesn't forbid typedef initializers. As is, the parser accepts such a thing, but adds it to the lexer hack with the wrong kind which will screw up all the following parsing.
It seems more reasonable to me to just use the same logic for both and not try to be clever about it if the grammar was never intended to be so precise.
Instead, I would expect Cabs2cil to reject a typedef with initializer. I haven't checked if it does but surely it cannot use it for anything meaningful.
Either way, then the following lexer hack state wouldn't be affected by the strange but valid grammatical construct.
There was a problem hiding this comment.
I'm not sure what exactly you suggest here. If you're happy with the rest of it feel free to push directly to this branch. I'll have a look at it and then we can merge.
| (* Set to true while parsing a typedef declaration's declarator list, so that | ||
| declarator_no_init calls add_type (not add_identifier) for typedef names. *) | ||
| let is_typedef_decl = ref false |
There was a problem hiding this comment.
If this is a necessary part of the lexer hack, then it seems like the Lexerhack module would be the right place for this.
There was a problem hiding this comment.
Fixed in 66f8d59 — is_typedef_decl is now in Lexerhack alongside the other lexer hack state (add_identifier, add_type, push_context, pop_context).
|
@copilot address simmo's comments |
…te tests, remove noinit file Agent-Logs-Url: https://github.com/goblint/cil/sessions/e317c474-30b0-43d6-aec0-4a7a9377d477 Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
test/small1/typedef_varname.cfor the typedef/variable name conflicttest/testcil.plinit_declaratorrule insrc/frontc/cparser.mlyto calladd_identifierviadeclarator_init_starthelper rule between declarator and EQcparser.mlyto use/* (* TEXT *) */as per file conventionsadd_identifier/add_typetiming:declarator_no_initimmediately registers each declarator's name —add_typefor typedef declarators,add_identifierfor non-typedef declarators (C11 6.2.1.7: scope begins just after completion of declarator);declarator_init_startregisters initialized declarators before their=;doDeclarationnow only creates AST nodes with no lexer managementtest/scope2:declarator_no_initcallsadd_type(notadd_identifier) for typedef declarators, so names are never added to the context tracking list andpop_contextcannot remove the wrong bindingtypedef_varname.c: NO_INIT middle declarator (mytype a = 1, mytype, b = sizeof mytype) and chain initialization (T T = 1, U = T + 1) — both prove correct C11 6.2.1.7 scoping;sizeof mytypewithout parens is only valid whenmytypeis a variableis_typedef_declflag intosrc/frontc/lexerhack.mlalongside the other lexer hack stateOriginal prompt
typedefand variable name conflict parsing error #114🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.