Investigate better backtracking optimizations. #516

Open
@bd82


Background

The Java parser is implemented in a style that resembles the official specification as closely as possible.
This is done to increase confidence in its correctness and to ease maintenance and upgrades when new versions of the spec
are released.

So the specification for CompilationUnit looks like:
[image: JLS grammar production for CompilationUnit]

The matching implementation in the parser is very similar:

```javascript
$.RULE("compilationUnit", () => {
  // custom optimized backtracking lookahead logic
  const isModule = $.BACKTRACK_LOOKAHEAD($.isModuleCompilationUnit);
  $.OR([
    {
      GATE: () => isModule === false,
      ALT: () => $.SUBRULE($.ordinaryCompilationUnit)
    },
    {
      ALT: () => $.SUBRULE($.modularCompilationUnit)
    }
  ]);
  // https://github.com/jhipster/prettier-java/pull/217
  $.CONSUME(EOF);
});

// https://docs.oracle.com/javase/specs/jls/se16/html/jls-7.html#jls-OrdinaryCompilationUnit
$.RULE("ordinaryCompilationUnit", () => {
  $.OPTION({
    GATE: $.BACKTRACK($.packageDeclaration),
    DEF: () => {
      $.SUBRULE($.packageDeclaration);
    }
  });
  $.MANY(() => {
    $.SUBRULE3($.importDeclaration);
  });
  $.MANY2(() => {
    $.SUBRULE($.typeDeclaration);
  });
});

// https://docs.oracle.com/javase/specs/jls/se16/html/jls-7.html#jls-ModularCompilationUnit
$.RULE("modularCompilationUnit", () => {
  $.MANY(() => {
    $.SUBRULE($.importDeclaration);
  });
  $.SUBRULE($.moduleDeclaration);
});
```

The Problem and current mitigation

The parser toolkit (Chevrotain) cannot be used to implement the grammar directly due to differences in parsing algorithms (LL vs. LR, if I recall correctly). So instead, the parser uses backtracking to figure out which alternative to pick in many "places" in the grammar.

Simple naïve backtracking is very inefficient, so the parser uses slightly more optimized backtracking in many rules,
where we try to limit how far forward in the token stream we look, instead of the most naïve approach in
which a backtracked rule is fully parsed twice.
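To make the difference concrete, here is a toy sketch in plain JavaScript (not the parser's actual code; all names and token types are hypothetical) contrasting full backtracking with a bounded lookahead check for the module vs. ordinary decision:

```javascript
// Naive backtracking: fully parse the alternative, then throw all the
// work away if it fails. `tryParseModularCompilationUnit` stands in for
// a real parser function that throws on mismatch.
function naiveIsModule(tokens, tryParseModularCompilationUnit) {
  try {
    tryParseModularCompilationUnit(tokens); // may parse the WHOLE unit
    return true;
  } catch (e) {
    return false;
  }
}

// Bounded lookahead: a modular compilation unit must eventually start
// with ["open"] "module" after any leading import declarations, so only
// a small token prefix ever needs to be inspected.
function boundedIsModule(tokens) {
  let i = 0;
  while (tokens[i] === "Import") {
    // skip "import <qualified name> ;" -- scan to the next semicolon
    while (i < tokens.length && tokens[i] !== "Semicolon") i++;
    i++; // step past the semicolon
  }
  if (tokens[i] === "Open") i++;
  return tokens[i] === "Module";
}
```

The bounded check only ever inspects a prefix of the token vector, while the naive version may parse an entire file before failing.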

Recent Discoveries

It seems that a large percentage of the parsing time is spent going "back and forth" due to backtracking.

Open Questions / More info needed

How bad is the performance impact of backtracking? In other words, how many times do
we re-parse the same input?

  • Given an input of N tokens, how many times does CONSUME get called (successfully/unsuccessfully)? (1.5N / 3N / 10N)?
  • Do the same parsing rules get called multiple times at the same idx in the token stream?
    • Could be counted by tracing/logging the data for all SUBRULE invocations and grouping the results, e.g. for each SUBRULE called, collect and later group by:
      • occurrenceIdx
      • tokenVector idx
      • subrule name
      • rule parameters (if any).
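One way to collect those numbers is sketched below as a standalone helper (hypothetical names; a real implementation would hook into Chevrotain's SUBRULE/CONSUME machinery rather than be called by hand):

```javascript
// Hypothetical instrumentation sketch: count how often each
// (rule name, token index) pair is visited during a parse.
class ParseTrace {
  constructor() {
    this.counts = new Map(); // "ruleName@tokenIdx" -> invocation count
    this.consumeCalls = 0;
  }
  recordSubrule(ruleName, tokenIdx) {
    const key = `${ruleName}@${tokenIdx}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }
  recordConsume() {
    this.consumeCalls++;
  }
  // Rules re-entered at the same token index: direct evidence of re-parsing.
  duplicates() {
    return [...this.counts.entries()].filter(([, n]) => n > 1);
  }
  // Ratio of CONSUME calls to input length; close to 1 means little re-parsing.
  consumeRatio(numTokens) {
    return this.consumeCalls / numTokens;
  }
}
```

Grouping the `duplicates()` output by rule name would also point at which part of the grammar (e.g. expressions) is re-parsed the most.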

More possible optimizations

Once we have answers to the questions above, we can consider options for further optimizations.
Basically, if in the extreme case we end up attempting to parse the same subsets of the input in (mostly) identical ways many times,
it may be more optimal to cache the intermediate results and re-use those instead of re-parsing.
Basically a kind of memoization.
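A minimal sketch of that idea, assuming a parser state object that carries a token index (packrat-style caching keyed by rule name and start index; none of these names exist in the actual codebase):

```javascript
// Hypothetical memoization wrapper: cache each rule's outcome per
// starting token index, so a backtracked attempt can be replayed from
// the cache instead of being re-parsed.
function memoizeRule(ruleName, ruleFn, cache) {
  return function (state) {
    const key = `${ruleName}@${state.idx}`;
    const hit = cache.get(key);
    if (hit !== undefined) {
      if (!hit.ok) throw new Error(`cached failure: ${key}`);
      state.idx = hit.endIdx; // fast-forward past the cached match
      return hit.result;
    }
    const startIdx = state.idx;
    try {
      const result = ruleFn(state); // ruleFn advances state.idx as it consumes
      cache.set(key, { ok: true, endIdx: state.idx, result });
      return result;
    } catch (e) {
      cache.set(key, { ok: false });
      state.idx = startIdx; // backtrack
      throw e;
    }
  };
}
```

The cache would have to be invalidated per parse run, and memory use grows with the number of (rule, index) pairs, which is the usual packrat trade-off.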

It is also possible that the performance problem is concentrated in a specific subset of the grammar (the expressions grammar is often a performance hog), so optimizations may not need to target the whole grammar.
