There is no useful universal "too many states" number. The main per-token runtime cost comes from the active state's ordered rules: Monarch applies only the rules of the current state, tries them in order, and stops at the first match. So 1,000 versus 10,000 states matters mostly for loading, compiling, memory, and how many distinct states your tokenizer actually enters. It does not mean every token checks every state. For a grammar shaped like yours, I would benchmark the generated tokenizer instead of guessing.
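To make that cost model concrete, here is a simplified, self-contained model of state-based tokenizing. It is not Monaco's implementation; the `Rule`/`Grammar` shapes, `makeGrammar`, and the `w0`, `w1`, ... words are hypothetical. It only follows the behavior described above (current state only, rules in order, first match wins) and counts regex attempts, so you can see that the count does not grow with the total number of states.

```typescript
// Simplified model of Monarch-style tokenizing, NOT Monaco's implementation:
// only the current state's rules are tried, in order, and the first match wins.

interface Rule {
  regex: RegExp; // anchored with ^ so it only matches at the current position
  token: string;
  next?: string; // state to switch to after this rule matches
}
type Grammar = { [state: string]: Rule[] };

function tokenizeLine(grammar: Grammar, line: string, start: string) {
  const tokens: string[] = [];
  let state = start;
  let pos = 0;
  let ruleAttempts = 0; // total regex attempts for the whole line
  while (pos < line.length) {
    const rest = line.slice(pos);
    let matched = false;
    for (const rule of grammar[state]) {
      ruleAttempts++;
      const m = rule.regex.exec(rest);
      if (m !== null && m[0].length > 0) {
        tokens.push(rule.token);
        pos += m[0].length;
        if (rule.next !== undefined) state = rule.next;
        matched = true;
        break; // first match wins; later rules in this state are never tried
      }
    }
    if (!matched) {
      tokens.push("invalid"); // no rule matched: consume one character
      pos++;
    }
  }
  return { tokens, ruleAttempts, endState: state };
}

// Hypothetical generated grammar: n small states; state sI recognizes the
// word wI and then moves on to state s(I+1).
function makeGrammar(n: number): Grammar {
  const grammar: Grammar = {};
  for (let i = 0; i < n; i++) {
    grammar["s" + i] = [
      { regex: /^\s+/, token: "white" },
      { regex: new RegExp("^w" + i + "\\b"), token: "word", next: "s" + ((i + 1) % n) },
      { regex: /^\w+/, token: "other" },
    ];
  }
  return grammar;
}

for (const n of [1000, 10000, 100000]) {
  const { ruleAttempts } = tokenizeLine(makeGrammar(n), "w0 w1 w2 foo", "s0");
  console.log(n, ruleAttempts); // 12 for every n
}
```

The attempt count is the same for 1,000, 10,000 and 100,000 states; what grows with the state count is building the grammar. To measure the real thing, register the generated definition with `monaco.languages.setMonarchTokensProvider` and time `monaco.editor.tokenize` on representative files.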
Keep the hot states small, put common/simple rules first, and avoid very large alternations in hot paths if a smaller state transition or keyword table can express the same thing. If you are considering 100,000 states, I would test startup/registration time first; that is more likely to become a problem than tokenizing a typical 300-line file.
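As a sketch of the keyword-table idea, assuming your exact strings look like identifiers: Monarch lets one rule match a generic word and then classify it with `cases` against a named array (here a hypothetical `keywords` list). The definition below uses that shape; `classifyLine` is a hypothetical stand-in that mimics the lookup with a `Set`, so the example runs without Monaco.

```typescript
// Monarch-shaped definition (hypothetical language): one identifier regex plus
// a keyword table instead of a long (string_A|string_AA|...) alternation.
const exampleLanguage = {
  keywords: ["string_A", "string_AA", "string_AAA"],
  tokenizer: {
    root: [
      [/\s+/, "white"],
      [/[A-Za-z_]\w*/, { cases: { "@keywords": "keyword", "@default": "identifier" } }],
    ],
  },
};

// Stand-in for what the `cases` lookup does: one hash lookup per word,
// regardless of how many entries the keyword table holds.
const keywordTable = new Set<string>(exampleLanguage.keywords);

function classifyLine(line: string): Array<[string, string]> {
  const out: Array<[string, string]> = [];
  const word = /[A-Za-z_]\w*/g;
  let m: RegExpExecArray | null;
  while ((m = word.exec(line)) !== null) {
    out.push([m[0], keywordTable.has(m[0]) ? "keyword" : "identifier"]);
  }
  return out;
}

console.log(classifyLine("string_AA foo string_AAA"));
// → [["string_AA", "keyword"], ["foo", "identifier"], ["string_AAA", "keyword"]]
```

Because the identifier regex always consumes the whole run of word characters, this shape also avoids alternation-ordering surprises. With Monaco loaded, the object would be passed to `monaco.languages.setMonarchTokensProvider(languageId, exampleLanguage)` after `monaco.languages.register({ id: languageId })` (it is left untyped here to avoid depending on Monaco's type declarations).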
Hi,
How would Monarch react to a JSON definition with:
. 1'000 states?
. 10'000 states?
. 100'000 states?
Each state would be:
. Small (2 to 10 rules for 99% of them)
. Without lookahead in the regexes
. Using extremely simple regexes: exact string matches or alternations like (string_A | string_AA | string_AAA)
The highlighted files should not exceed 300 lines on average.
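One detail worth checking in alternations like `(string_A | string_AA | string_AAA)`: Monarch rules are JavaScript regular expressions, and JavaScript alternation is ordered, not longest-match. On the input `string_AAA`, that pattern matches only `string_A`, leaving `AA` for the next rule. A minimal sketch of a fix when generating the grammar is to escape each string and try longer strings first; `escapeRegExp`, `buildAlternation` and `matchAtStart` are hypothetical helpers, not Monaco APIs.

```typescript
// JavaScript regex alternation is ordered: the first alternative that matches
// wins, even if a later alternative would match a longer string.

// Escapes regex metacharacters so each word is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds an alternation that tries longer words first, so a word that is a
// prefix of another (string_A vs string_AA) cannot shadow it.
function buildAlternation(words: string[]): RegExp {
  const sorted = words.slice().sort((a, b) => b.length - a.length);
  return new RegExp(sorted.map(escapeRegExp).join("|"));
}

// Returns what the pattern matches at the very start of `text`, or null.
function matchAtStart(pattern: RegExp, text: string): string | null {
  const m = new RegExp("^(?:" + pattern.source + ")").exec(text);
  return m ? m[0] : null;
}

const words = ["string_A", "string_AA", "string_AAA"];
console.log(matchAtStart(/string_A|string_AA|string_AAA/, "string_AAA")); // string_A
console.log(matchAtStart(buildAlternation(words), "string_AAA")); // string_AAA
```

If the strings are word-like, adding a `\b` after the group is another option, since the regex engine then backtracks into the longer alternatives.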