Cache repeated string instances in the lexer (.NET 9) #38

@alexrp

When lexing a typical source file, there are going to be a lot of repeated strings: identifiers, literals, whitespace, and so on. We can't intern these, but it would make sense to cache token strings up to a certain length and return the same instance instead of building them up repeatedly.

To implement this, instead of building up the token string in a StringBuilder, we would keep track of where the token starts and ends. When creating the token, if the length is below our caching threshold, we first look it up in the token cache. For larger tokens, we shouldn't bother as the lookup will take too long to be worth it.
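A minimal sketch of what this could look like, assuming the lexer can expose the token's start and end as a `ReadOnlySpan<char>` over the source text. The `TokenTextCache` type, its `GetOrAdd` method, and the default threshold of 32 are hypothetical names and values chosen for illustration, not part of any existing code. It relies on the .NET 9 `HashSet<T>.GetAlternateLookup` API so that a cache hit does not allocate a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the type name, method name, and default threshold
// are illustrative assumptions, not taken from the project.
public sealed class TokenTextCache
{
    private readonly int _threshold;

    // Ordinal comparison, which supports span-based alternate lookups in .NET 9.
    private readonly HashSet<string> _strings = new(StringComparer.Ordinal);

    private readonly HashSet<string>.AlternateLookup<ReadOnlySpan<char>> _lookup;

    public TokenTextCache(int threshold = 32)
    {
        _threshold = threshold;
        _lookup = _strings.GetAlternateLookup<ReadOnlySpan<char>>();
    }

    public string GetOrAdd(ReadOnlySpan<char> text)
    {
        // Tokens at or above the threshold bypass the cache entirely.
        if (text.Length >= _threshold)
            return text.ToString();

        // Cache hit: return the existing instance without allocating.
        if (_lookup.TryGetValue(text, out var cached))
            return cached;

        // Cache miss: allocate once and remember the instance.
        var created = text.ToString();
        _strings.Add(created);
        return created;
    }
}
```

With this shape, the lexer would call something like `cache.GetOrAdd(source.AsSpan(start, length))` when it finishes a token, and two occurrences of the same short identifier would yield the same `string` instance. The cache in this sketch grows without bound and is not thread-safe; whether that is acceptable depends on how the lexer is used.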

Labels: area: analysis (issues related to language analyses), state: approved (enhancements and tasks that have been approved)
