Description
Hello, and first of all, thank you very much for this project!
Is your feature request related to a problem? Please describe.
Yes, it is.
Some of our clients may be running an outdated version of our client application, which ships with outdated encodings.
We want these clients to have access to new encodings even if their application is not up to date, so we want to serve the encoder dictionary from a server endpoint.
The problem is that the Encoder property of TiktokenTokenizer is currently internal, so we cannot read the encoder dictionary from a tokenizer instance in order to serve it:
machinelearning/src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs, lines 998 to 1001 (commit 5090327)
Describe the solution you'd like
I would like the Encoder property to be made public.
There already seems to be an intent to expose this property at some point, judging by this test:
machinelearning/test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs, lines 732 to 740 (commit 5090327)
Maybe now is the time to do it. What do you think?
Describe alternatives you've considered
A separate method that does what the test above does, using reflection.
That seems like overkill, though, and it adds reflection overhead on every access.
Exposing the property is probably the best way to deal with this.
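For comparison, the reflection-based alternative would look roughly like the sketch below. It is not the code from the linked test: `ReflectionHelpers.GetNonPublicProperty` is a hypothetical helper, and it only assumes that `Encoder` is a non-public instance property of `TiktokenTokenizer`. It returns `object?` because the property's concrete type is an internal detail that callers would otherwise have to hard-code.

```csharp
#nullable enable
using System;
using System.Reflection;

public static class ReflectionHelpers
{
    // Hypothetical helper: reads a non-public instance property by name.
    // Assumed usage: ReflectionHelpers.GetNonPublicProperty(tokenizer, "Encoder"),
    // where tokenizer is a Microsoft.ML.Tokenizers.TiktokenTokenizer instance.
    public static object? GetNonPublicProperty(object target, string propertyName)
    {
        if (target is null) throw new ArgumentNullException(nameof(target));

        // Look the property up on the runtime type; BindingFlags.NonPublic
        // matches internal/private members and deliberately excludes public ones.
        PropertyInfo? property = target.GetType().GetProperty(
            propertyName,
            BindingFlags.Instance | BindingFlags.NonPublic);

        if (property is null)
        {
            // Internal members can be renamed or removed in any release,
            // so fail loudly instead of returning null.
            throw new InvalidOperationException(
                $"No non-public instance property '{propertyName}' on {target.GetType().FullName}.");
        }

        // Value types are boxed; the caller must cast to the property's type.
        return property.GetValue(target);
    }
}
```

This works, but it is stringly typed, breaks without a compile error if the internal member changes, and pays a reflection lookup on each call, which is why exposing the property directly seems preferable.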
Additional context
I'm sending a PR your way with the changes. Feel free to request or make any modifications you think are necessary.