Do we support generation constrained to multi-token phrases in a custom vocabulary? #1502
Unanswered
kuailehaha asked this question in Q&A
Replies: 1 comment 2 replies
-
If I understand your question correctly, you may want something like:
# `model` is assumed to be an already-loaded Outlines model
generator = outlines.generate.choice(model, ["New York", "once upon a time"])
# Must return either "New York" or "once upon a time"
generator("Choose new york or once upon a time")
-
Description:
Hi! I’d like to use Outlines to generate text constrained to sequences from a predefined vocabulary that includes multi-token phrases (e.g., ["New York", "once upon a time"]).
The current choice-like functionality (via logits restriction) works for single tokens, but for phrases spanning multiple tokens, this approach isn't sufficient.
For example:
If my vocabulary contains the phrase "New York", I need the generation to commit to the full sequence once "New" is chosen, rather than treating "York" as an independent choice.
Arbitrary combinations (e.g., "New Paris") should be disallowed unless explicitly included in the vocabulary.
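To make the "commit to the full phrase" requirement concrete, here is a minimal sketch of a prefix trie over the phrase list. It is not how Outlines implements this internally. It assumes a toy whitespace tokenizer instead of a real model tokenizer, and `build_trie`, `allowed_next`, and the `"<end>"` marker are hypothetical names used only for illustration.

```python
from typing import Dict, List

END = "<end>"  # hypothetical marker meaning "a complete phrase ends here"


def build_trie(phrases: List[str]) -> Dict:
    """Build a nested-dict trie keyed by whitespace-separated tokens."""
    root: Dict = {}
    for phrase in phrases:
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node[END] = {}
    return root


def allowed_next(trie: Dict, prefix: List[str]) -> List[str]:
    """Return the tokens that may follow `prefix` inside a single phrase.

    An empty list means the prefix is not part of any phrase.
    """
    node = trie
    for token in prefix:
        if token not in node:
            return []
        node = node[token]
    return sorted(node.keys())


trie = build_trie(["New York", "once upon a time", "Paris"])
print(allowed_next(trie, []))                # tokens that can start a phrase
print(allowed_next(trie, ["New"]))           # only "York" may follow "New"
print(allowed_next(trie, ["New", "Paris"]))  # empty: "New Paris" is not allowed
```

With a real tokenizer the same idea would apply to token IDs rather than words: at each step, only the logits for tokens returned by something like `allowed_next` would be left unmasked.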
Is there a way to achieve this with Outlines, perhaps by extending the regex/FSM-guided generation to handle predefined phrase choices?
Desired Behavior:
A method like model.generate(vocab=my_phrases) where my_phrases is a list of strings (potentially multi-token), ensuring the output is a valid concatenation of these phrases.
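One possible workaround, sketched under assumptions: the "valid concatenation of phrases" constraint can be written as a regular expression built from the phrase list, which could then be handed to a regex-guided generator. The helper `phrases_to_regex` and the single-space separator are hypothetical choices, not part of Outlines. The snippet below only checks the pattern with Python's standard `re` module; whether a given Outlines version accepts this exact pattern syntax is untested here.

```python
import re
from typing import List


def phrases_to_regex(phrases: List[str], sep: str = " ") -> str:
    """Build a regex matching one or more phrases joined by `sep`."""
    # Escape regex metacharacters and put longer phrases first in the alternation.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return f"(?:{alternatives})(?:{re.escape(sep)}(?:{alternatives}))*"


my_phrases = ["New York", "once upon a time", "Paris"]
pattern = phrases_to_regex(my_phrases)

# Valid: a concatenation of phrases from the vocabulary.
print(bool(re.fullmatch(pattern, "New York once upon a time")))
# Invalid: "New" alone is not a phrase, so "New Paris" is rejected.
print(bool(re.fullmatch(pattern, "New Paris")))

# Assumption, not executed here: with an Outlines 0.x model loaded as `model`,
# the pattern might be used as
#     generator = outlines.generate.regex(model, pattern)
```

Because the constraint is compiled into a finite-state machine at the character level, the model cannot stop after "New" or jump to a different phrase mid-way, regardless of how the tokenizer splits the text.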
Any guidance or workarounds would be appreciated!