Modifying the token stream #4914
I work in C#. I've built a solid proof of concept, but I need some confidence that it carries no hidden risks.

As part of language preprocessing, I want to operate on the token stream (a list) and remove or insert tokens based on preprocessing metadata obtained from an initial parse. This is the phase 1 parse, in which preprocessing directives are just part of the grammar. Editing with AddRange/RemoveRange and similar methods works fine. Once all preprocessing has updated the token list, I wrap the updated list in a token stream class and reparse that stream to get my actual CST (macros expanded, preprocessor directives removed, etc.). This is the phase 2 parse.

This handles nested include-file expansion, for example: I read the header file, tokenize it, remove the original include directive's tokens, and insert the tokens produced from the header in their place. It works well, the code is easy to understand, and as a strategy it seems solid. I understand this could be done textually, but operating directly on the tokens seems much cleaner for preprocessing.

However, the TokenIndex values become discontinuous. Should I walk the list and reset TokenIndex on every token once preprocessing is complete, before the second parse? Is that important, i.e. does the parser care about the TokenIndex property? And is this approach reasonable overall?

Incidentally, I decided months ago not to use listeners/visitors; I walk the CST manually to build my AST, and that has been very successful.
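A minimal sketch of the splice-then-renumber step described above. It is written in Java because ANTLR's reference runtime is Java; the C# runtime is very close (as far as I know, CommonToken there exposes a settable TokenIndex). `Tok`, `IncludeSplice`, `spliceAndReindex`, `reindex` and `isContiguous` are hypothetical names, and `Tok` is a stand-in for CommonToken. One thing worth checking in your own setup: if the edited list reaches the parser through ListTokenSource and CommonTokenStream, BufferedTokenStream appears to assign TokenIndex itself as it fetches each writable token, so explicit renumbering may only matter for a custom stream class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ANTLR's CommonToken: only the fields needed here.
final class Tok {
    final String text;
    int tokenIndex; // mirrors getTokenIndex()/setTokenIndex() on a writable token

    Tok(String text, int tokenIndex) {
        this.text = text;
        this.tokenIndex = tokenIndex;
    }
}

final class IncludeSplice {
    // Replace tokens [start, start + count) with the header's tokens, then renumber.
    // 'tokens' must be modifiable (e.g. an ArrayList).
    static void spliceAndReindex(List<Tok> tokens, int start, int count, List<Tok> headerTokens) {
        tokens.subList(start, start + count).clear(); // like RemoveRange
        tokens.addAll(start, headerTokens);            // like InsertRange
        reindex(tokens);
    }

    // Make each token's TokenIndex equal to its position in the list again.
    static void reindex(List<Tok> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            tokens.get(i).tokenIndex = i;
        }
    }

    // True if TokenIndex runs 0, 1, 2, ... with no gaps or duplicates.
    static boolean isContiguous(List<Tok> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).tokenIndex != i) {
                return false;
            }
        }
        return true;
    }
}
```

For `#include "a.h"` followed by `int x ;`, splicing the header's `int y ;` over the first two tokens yields `int y ; int x ;` with TokenIndex values 0 through 5. Note that the header tokens keep the indices from their own lexer run (starting at 0) until `reindex` overwrites them, which is exactly where the discontinuity comes from.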
You can certainly reindex each token in the token stream after an edit. However, I know that tokens are referenced at least by the start and stop pointers of each parse tree node. If you remove a token from the token stream that a parse tree node references, you will need to update that node's start/stop pointers too. Whether this matters depends on what you plan to do with the tree edits. The data structure is not designed to support fast, extensive, independent tree-to-tree edits. You'll need your own tree representation if you cannot reconstruct the tree from a serialized representation, or cannot do so quickly.
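To make the start/stop concern concrete: a parse tree node holds references to token objects, so after an edit a node can point at a token that is no longer in the stream, or at one whose TokenIndex no longer matches its position. The sketch below checks for both cases. `TokenRef` and `RuleSpan` are hypothetical stand-ins for a token and for the start/stop fields of ANTLR's ParserRuleContext; `TreeTokenCheck` and its methods are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token: its text plus its TokenIndex.
final class TokenRef {
    final String text;
    int tokenIndex;

    TokenRef(String text, int tokenIndex) {
        this.text = text;
        this.tokenIndex = tokenIndex;
    }
}

// Hypothetical stand-in for a parse tree node's start/stop token pointers.
final class RuleSpan {
    final TokenRef start;
    final TokenRef stop;

    RuleSpan(TokenRef start, TokenRef stop) {
        this.start = start;
        this.stop = stop;
    }
}

final class TreeTokenCheck {
    // Position of this exact token object in the stream, or -1 if it was removed.
    static int positionOf(List<TokenRef> stream, TokenRef t) {
        for (int i = 0; i < stream.size(); i++) {
            if (stream.get(i) == t) { // identity, not text equality
                return i;
            }
        }
        return -1;
    }

    // A node's bounds are usable only if both tokens are still in the stream,
    // in order, and their TokenIndex agrees with their actual position.
    static boolean boundsValid(RuleSpan node, List<TokenRef> stream) {
        int s = positionOf(stream, node.start);
        int e = positionOf(stream, node.stop);
        return s >= 0 && e >= s
            && node.start.tokenIndex == s
            && node.stop.tokenIndex == e;
    }
}
```

For a stream `int x ; int y ;`, removing the first declaration invalidates the first node outright (its tokens are gone), while the second node's tokens are still present but carry stale indices (3 instead of 0) until the list is renumbered.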
By the way, would it be sensible then to convert the