[Proposal] Scanning Tokenizer with Improved String support #1174
This approach seems to have increased memory usage in the Parse phase:
  +-----------------+-------------------------+--------------------------+
  | Phase           | Parse                   | Render                   |
  +-----------------+-------------------------+--------------------------+
- | Total allocated | 4.53 MB (53197 objects) | 979.68 kB (8827 objects) |
+ | Total allocated | 7.23 MB (88561 objects) | 979.68 kB (8827 objects) |
  | Total retained  | 0 B (0 objects)         | 49.70 kB (276 objects)   |
  +-----------------+-------------------------+--------------------------+
Oh yeah, performance for this is likely not great at the moment. I was just focusing on getting it to pass. This has the potential to be highly efficient, though, so it just needs to be optimised.
Great to know!
This regex wouldn't exist either; it needs to be replaced by a scanner. It is just a quick hack to get this moving.
And the conditional statements are double-checking things, which should be optimised.
Plus, the point of all this is to make code valid that has curly brackets inside strings that are inside tags.
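To illustrate the kind of input at stake, here is a minimal sketch of how a non-greedy delimiter regex cuts a token short when the closing delimiter sits inside a string. The pattern `NAIVE_OUTPUT`, the helper `naive_output_token`, and the sample template are all hypothetical stand-ins chosen for illustration, not Liquid's actual tokenizer code.

```ruby
# Simplified stand-in for a non-greedy delimiter regex
# (an assumption for illustration, not Liquid's real pattern).
NAIVE_OUTPUT = /\{\{.*?\}\}/m

# Hypothetical helper: returns the first "{{ ... }}" match, ignoring quotes.
def naive_output_token(source)
  source[NAIVE_OUTPUT]
end

# Hypothetical template: the closing delimiter "}}" appears inside a string.
source = %q({{ "}}" | upcase }})

token = naive_output_token(source)
puts token            # prints: {{ "}}
puts token == source  # prints: false
```

The match ends at the first `}}`, which is inside the quoted string, so the rest of the expression would be treated as plain text. A tokenizer that tracks whether it is inside a string would need to return the whole template as one token here.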
I'm also starting to wonder if we should solve the following few issues, as these can be fixed with a few more tokens to scan for:
Allow escaping inside double-quoted strings, so that both a single quote and a double quote can be used in the same string, which is not currently possible.
The new liquid tag splits on newlines, which means strings will no longer be able to express newlines inside the new liquid tag.
Carriage returns and tabs, as these are also common: not so much on the web, but on Windows and in Tab-Separated Values.
Finally, because we support the above, we have to handle escaping escapes.
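A minimal sketch of how such escape handling might look once the scanner has extracted a string body. The `ESCAPES` table, the `unescape` helper, and the choice to leave unknown sequences untouched are assumptions made for illustration; the proposal does not fix these details.

```ruby
# Hypothetical escape table, keyed by the character after the backslash.
ESCAPES = {
  'n'  => "\n",  # newline
  'r'  => "\r",  # carriage return
  't'  => "\t",  # tab
  '"'  => '"',   # double quote
  "'"  => "'",   # single quote
  '\\' => '\\'   # an escaped escape becomes one backslash
}.freeze

# Replaces each backslash-plus-character pair in one left-to-right pass.
# Unknown sequences are left unchanged (an assumption, not part of the proposal).
def unescape(body)
  body.gsub(/\\(.)/m) { |pair| ESCAPES.fetch(pair[1]) { pair } }
end

puts unescape('say \"hi\"')  # prints: say "hi"
puts unescape('C:\\\\temp')  # prints: C:\temp
```

Because `gsub` consumes each pair as it goes, an escaped backslash followed by `n` stays a literal backslash and `n` rather than turning into a newline.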
An alternative proposal for the changes above is to bring in a third quoting method using backticks:
Allow escaping inside backtick-quoted strings, so that single quotes, double quotes, and backticks can all be used in the same string, which is not currently possible.
The new liquid tag splits on newlines, which means strings will no longer be able to express newlines inside the new liquid tag.
Carriage returns and tabs, as these are also common: not so much on the web, but on Windows and in Tab-Separated Values.
Finally, because we support the above, we have to handle escaping escapes.
The current commit isn't the final version, but it shows a working version. The proposal is to shift the Tokenizer to be a scanner that identifies the following tokens.
Because of this, the scanner only ever needs to look at two positions and can operate in a single pass.
It then has the following states to control flow
The pseudo-code is as follows
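A minimal Ruby sketch of a scanner along these lines, not the PR's actual implementation. It assumes the delimiters `{{ }}` and `{% %}`, single and double quotes as string markers, and hypothetical state names (`:text`, `:output`, `:tag`); the function name `tokenize` is also made up for illustration. It makes one pass and inspects only the current and next character.

```ruby
# Hypothetical single-pass tokenizer; token and state names are assumptions.
def tokenize(source)
  tokens = []
  state  = :text  # :text, :output ({{ ... }}) or :tag ({% ... %})
  quote  = nil    # the open quote character while inside a string
  start  = 0      # index where the current token began
  i      = 0

  while i < source.length
    pair = source[i, 2]  # the only two positions the scanner looks at

    if quote
      # Inside a string: ignore delimiters until the matching quote.
      quote = nil if source[i] == quote
      i += 1
    elsif state == :text
      if pair == '{{' || pair == '{%'
        tokens << source[start...i] if i > start  # flush preceding text
        state = pair == '{{' ? :output : :tag
        start = i
        i += 2
      else
        i += 1
      end
    elsif source[i] == '"' || source[i] == "'"
      quote = source[i]
      i += 1
    elsif (state == :output && pair == '}}') || (state == :tag && pair == '%}')
      i += 2
      tokens << source[start...i]
      state = :text
      start = i
    else
      i += 1
    end
  end

  # Emit whatever is left, including an unterminated tag, as a final token.
  tokens << source[start..-1] if start < source.length
  tokens
end

p tokenize(%q(Hi {{ "}}" }}!))
# => ["Hi ", "{{ \"}}\" }}", "!"]
```

Escape handling inside strings is left out to keep the sketch short; supporting it would mean skipping the character after a backslash while `quote` is set.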
This resolves, and includes the tests from, the following PRs and issues:
Closes #701
Closes #779
Closes #624
Closes #623
Closes #344
Closes #213
This will need a matching PR for liquid-c and improvements to this Ruby version, but the concept is easily implemented in both.
@Shopify/guardians-of-the-liquid @Shopify/liquid