Make the tokeniser more useful for other people

While working on https://github.com/Mathics3/mathics-scanner/pull/11 it became clear to me that the tokeniser is still tightly coupled to the internals of Mathics core. For instance, the whole messaging mechanism is useless to anyone other than us (the developers of Mathics), and it could likely be replaced entirely by raising errors. There are also several improvements that would make the public interface cleaner and more intuitive.

I propose the following changes:

* Entirely remove the messaging mechanism from `Tokeniser` and `LineFeed` (this will require some refactoring in core)
* Implement `__next__` for `Tokeniser` by simply calling the `next` method
* Add an option to control whether comments are skipped (this is useful for syntax-highlighting use cases)
* Remove the `tag` parameter of `Token(tag, text, pos)` and mark the type of the token by using subclasses of `Token` (e.g. `Token("Number", "3", 4)` becomes `NumberToken("3", 4)`)
* Rethink the usage of the `incomplete` method: I'd like to remove it (since it's more of an implementation detail than anything else), but it's used in core, so we'd have to deal with that too. Even if we don't remove it, we should rename it to something more descriptive
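
To make the list above more concrete, here is a rough sketch of what the public interface could look like. It is a toy and not the real scanner: the grammar only knows numbers, alphabetic symbols and `(* ... *)` comments, and the names `ScanError`, `SymbolToken`, `CommentToken` and the `skip_comments` parameter are placeholders I made up for illustration. Only `Tokeniser`, `Token`, `NumberToken`, `next` and `__next__` come from the proposal itself.

```python
import re
from dataclasses import dataclass
from typing import Iterator


class ScanError(Exception):
    """Hypothetical exception raised in place of the messaging mechanism."""


@dataclass(frozen=True)
class Token:
    text: str
    pos: int


# The token type is carried by the subclass, not by a `tag` string.
class NumberToken(Token):
    pass


class SymbolToken(Token):  # hypothetical name
    pass


class CommentToken(Token):  # hypothetical name
    pass


# Toy grammar for illustration only, NOT the real mathics-scanner grammar.
_PATTERNS = [
    (NumberToken, re.compile(r"\d+")),
    (SymbolToken, re.compile(r"[A-Za-z]+")),
    (CommentToken, re.compile(r"\(\*.*?\*\)", re.DOTALL)),
]
_WHITESPACE = re.compile(r"\s*")


class Tokeniser:
    def __init__(self, code: str, skip_comments: bool = True) -> None:
        self.code = code
        self.pos = 0
        self.skip_comments = skip_comments  # hypothetical parameter name

    def __iter__(self) -> Iterator[Token]:
        return self

    def next(self) -> Token:
        while True:
            # Skip whitespace, then stop cleanly at the end of the input.
            self.pos = _WHITESPACE.match(self.code, self.pos).end()
            if self.pos >= len(self.code):
                raise StopIteration
            for token_class, pattern in _PATTERNS:
                match = pattern.match(self.code, self.pos)
                if match:
                    token = token_class(match.group(), self.pos)
                    self.pos = match.end()
                    break
            else:
                # No pattern matched: raise instead of sending a message.
                raise ScanError(f"unexpected character at position {self.pos}")
            if not (self.skip_comments and isinstance(token, CommentToken)):
                return token

    def __next__(self) -> Token:
        # Iterator protocol support, implemented by calling `next`.
        return self.next()
```

With something like this, a syntax highlighter could write `for token in Tokeniser(source, skip_comments=False): ...` and dispatch on `isinstance(token, CommentToken)`, while core would wrap the call in `try`/`except ScanError` and produce its own messages there.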

Ideally, I'd like to take care of this before the release, since these are breaking changes and would therefore require a major version bump if we merged them after the first release. However, I understand that the required refactoring will take some time, so I'm OK with doing this after the first release.

I also take full responsibility for this. I can do most of this work on my own if the rest of the contributors aren't interested. @rocky @mmatera Thoughts?
