Description
In my IDE work, support for any given language frequently includes hundreds up to several thousand references to context objects in the parse tree. Regardless of whether manual walking methods (iterating over or using instanceof on the result of getChild(n), etc.) or implicit labels are used, this code can be extremely sensitive to minor changes in the grammar and is a frequent source of regression bugs. A "versioning mechanism" for rules could be extremely helpful in catching interface problems early when the grammar changes.
Rule annotations (current experimental implementation)
By adding annotations named @RuleVersion and @RuleDependency, a balance between the two alternatives described below could be achieved. The generated code for a rule method would be marked with an @RuleVersion(n) annotation giving the rule version. Code which depends on rules could be marked with one or more @RuleDependency(RULE_expr, 1) annotations declaring the dependency. Multiple dependencies can be wrapped in a @RuleDependencies annotation.
Benefits of this method include:
- Compile-time dependency checking is provided by an annotation processor, but would not force a failure to compile following a version change. Runtime checking could be provided by a utility method which could check all dependencies for a class and/or package at once.
- No overhead unless -ea is specified (runtime assertion checking).
- Clean, declarative syntax.
- User control over when the version of a rule is incremented. This allows cross-rule changes to be reflected in versioning, such as incrementing the versions of rules a, b, and c when rule a changes from a : b; to a : c;.
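As a rough illustration of the runtime check mentioned above, the following is a minimal sketch and not the experimental implementation itself. The annotation declarations, SketchParser, SketchListener, and RuleDependencyChecker are all hypothetical names introduced here. Because Java annotations cannot take positional arguments, the sketch assumes named elements (rule = ..., version = ...) rather than the shorthand @RuleDependency(RULE_expr, 1) used in the text.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation that generated rule methods would carry.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RuleVersion {
    int value();
}

// Hypothetical annotation for code that depends on the shape of a rule.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@Repeatable(RuleDependencies.class)
@interface RuleDependency {
    int rule();
    int version();
}

// Container that lets several @RuleDependency annotations sit on one method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RuleDependencies {
    RuleDependency[] value();
}

// Stand-in for a generated parser (assumed layout: RULE_ constants plus ruleNames).
class SketchParser {
    static final int RULE_expr = 0;
    static final int RULE_stmt = 1;
    static final String[] ruleNames = {"expr", "stmt"};

    @RuleVersion(2) public void expr() {}
    @RuleVersion(1) public void stmt() {}
}

// Stand-in for hand-written code that walks the parse tree.
class SketchListener {
    @RuleDependency(rule = SketchParser.RULE_expr, version = 1) // stale: expr is now version 2
    @RuleDependency(rule = SketchParser.RULE_stmt, version = 1) // up to date
    public void exitStmt() {}
}

final class RuleDependencyChecker {
    // Returns one message per declared dependency whose version does not match the parser.
    static List<String> check(Class<?> dependent, Class<?> parser, String[] ruleNames) {
        List<String> problems = new ArrayList<>();
        for (Method m : dependent.getDeclaredMethods()) {
            for (RuleDependency dep : m.getAnnotationsByType(RuleDependency.class)) {
                String ruleName = ruleNames[dep.rule()];
                int actual = declaredVersion(parser, ruleName);
                if (actual != dep.version()) {
                    problems.add(m.getName() + " depends on " + ruleName + " version "
                            + dep.version() + " but found " + actual);
                }
            }
        }
        return problems;
    }

    // Looks up the @RuleVersion on the parser method named after the rule; -1 if absent.
    static int declaredVersion(Class<?> parser, String ruleName) {
        for (Method m : parser.getDeclaredMethods()) {
            RuleVersion v = m.getAnnotation(RuleVersion.class);
            if (v != null && m.getName().equals(ruleName)) {
                return v.value();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        check(SketchListener.class, SketchParser.class, SketchParser.ruleNames)
                .forEach(System.out::println);
    }
}
```

A check like this could be wrapped in an assert in a static initializer so that it only runs when assertions are enabled.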
A possible syntax for this which is reasonably clean, minimizes changes to the tool, and would keep the grammar target language agnostic could be:
foo
@version{1}
: ...
;
Inline actions and runtime assertions (alternative 1, not in use)
This method does not introduce any code generation changes, so it can be used with the current versions of ANTLR 4. While better than unversioned rules, it does have a number of drawbacks. If the following code is added to an @members{} block:
private static int[] ruleVersions;
{
    if (ruleVersions == null) {
        ruleVersions = new int[_ATN.ruleToStartState.length];
    }
}
public static int getRuleVersion(int rule) {
    return ruleVersions[rule];
}
public static int getRuleVersion(ParserRuleContext<?> context) {
    return ruleVersions[context.ruleIndex];
}
private static void setRuleVersion(ParserRuleContext<?> context, int version) {
    ruleVersions[context.ruleIndex] = version;
}
Then the following @init{} action can be used to mark a rule version that can be incremented when the rule changes:
@init{setRuleVersion($ctx, 1);}
When a block/statement of code depends on a particular form of the rule, a statement like the following will allow quicker detection of potential problems.
// for code in a listener, or where a typed context object is used
ExprContext ctx = ...;
assert MyParser.getRuleVersion(ctx) == 1;
// for general dependencies
assert MyParser.getRuleVersion(MyParser.RULE_expr) == 1;
CRC constants and compile-time assertions (alternative 2, not in use)
If a block of code like the following could be automatically generated based on a CRC calculation of each rule's syntax (ignoring whitespace, actions, and unnecessary parentheses):
public static final boolean HASH_expr_2bc29fa4=true,
HASH_stmt_56ed0cbb=true, ...;
Then an assertion like the following will actually produce a compile-time error until dependent code is updated following a change to a rule:
assert HASH_expr_2bc29fa4;
While the fields would be hard to keep track of, any modern editor will allow updating the assertion after code verification by simply typing HASH_expr_ followed by a complete word action (Ctrl+Space in many IDEs).
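To make the naming scheme concrete, here is a minimal sketch of how such a constant name could be derived, assuming CRC-32 as the checksum and a hypothetical helper class RuleHashSketch. It only strips whitespace; ignoring actions and unnecessary parentheses, as proposed above, would require working from the parsed grammar rather than raw text, and whitespace inside string literals is not treated specially here.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: derives a HASH_<rule>_<crc> constant name from a rule's text.
final class RuleHashSketch {
    static String hashConstantName(String ruleName, String ruleText) {
        // Simplification: drop all whitespace so formatting changes do not alter the hash.
        String normalized = ruleText.replaceAll("\\s+", "");
        CRC32 crc = new CRC32();
        crc.update(normalized.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is an unsigned 32-bit value stored in a long, so 8 hex digits suffice.
        return String.format("HASH_%s_%08x", ruleName, crc.getValue());
    }

    public static void main(String[] args) {
        // Same rule with different spacing yields the same name.
        System.out.println(hashConstantName("a", "a : b ;"));
        System.out.println(hashConstantName("a", "a:b;"));
        // A changed rule body yields a different name, so old assertions stop compiling.
        System.out.println(hashConstantName("a", "a : c ;"));
    }
}
```

Since the old field name no longer exists once a rule's text changes, any assert that still references it fails to compile, which is the behavior described above.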