BREAKING FEAT: introduce word-level converter #847
… `select_word_indices`
…ation requirements
…_ratio` parameters
Thanks for the reviews! I'll make the requested changes and let you know once they’re all ready (might be a few days 😃)
Zalgo is merged now so you can add it as well 🙂
@paulinek13 there are still two open comment threads as far as I can see. Let me know if I should elaborate on anything!
@romanlutz Thanks! Yes, I'm aware of the remaining threads. I haven't resolved them yet because I'm not quite satisfied with my original approach to initializing the word-level converters 🙂

After giving it some more thought, I believe it might be cleaner to move the selection configuration into separate methods, for example:

    converter = CharSwapConverter(max_iterations=1).select_random(proportion=0.5)

This seems to reduce repetition in both docs and code, and could offer some other maintainability benefits as well (like making it easier to add a new converter based on WordLevelConverter without having to copy-paste the args part of the docstring every time).

So this should address the following: #847 (comment) and #847 (comment)

Seems like a nicer pattern overall. I have to say I like it 😄 No mode_kwargs, just methods that change which words are selected:

    class WordLevelConverter(PromptConverter):
        # ...
        @final
        def select_keywords(self: T, keywords: List[str]) -> T:
            """Configure the converter to only process words matching specific keywords."""
            self._selection_mode = "keywords"
            self._selection_keywords = keywords
            return self
        # ...

I hope it makes sense. I'll push this change shortly for you to review (if it isn't good, we can always revert it 😃)

BTW, this PR’s taking a bit more time than planned, so thanks for bearing with me 😅

Edit:
Description
This PR introduces a new base class called `WordLevelConverter`, which simplifies the creation of word-level converters by providing a reusable foundation that standardizes word selection for transformation and reduces code duplication across similar converters.
The key benefit is that one only needs to implement the specific word transformation logic (`convert_word_async`) while the base class handles word selection, iteration, and final result.
Word selection strategies/modes
The base class supports various word selection modes through the `select_word_indices` util function.
List of refactored prompt converters
The following converters have been refactored to use the new base class:
BinaryConverter
CharSwapGenerator
EmojiConverter
LeetspeakConverter
ROT13Converter
StringJoinConverter
TextToHexConverter
UnicodeReplacementConverter
Note: I'm not sure if all the prompt converters that I've refactored should be word-level based, or if there are other converters that haven't been refactored that would benefit from this base class.
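To make the division of labor concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the code from this PR: `WordLevelConverterSketch`, `UppercaseSketchConverter`, the `seed` argument, and the private `_select_word_indices` helper are hypothetical names used only for illustration, and `convert_async` is simplified to return a plain string. A subclass implements only `convert_word_async`; the base class decides which words get transformed and reassembles the result.

```python
import asyncio
import random
from abc import ABC, abstractmethod
from typing import List, Optional, TypeVar

T = TypeVar("T", bound="WordLevelConverterSketch")


class WordLevelConverterSketch(ABC):
    """Hypothetical stand-in for the base class; not the actual implementation."""

    def __init__(self) -> None:
        # By default every word is selected for conversion.
        self._selection_mode = "all"
        self._selection_keywords: List[str] = []
        self._selection_proportion = 1.0
        self._rng = random.Random()

    def select_keywords(self: T, keywords: List[str]) -> T:
        """Only convert words that exactly match one of the keywords."""
        self._selection_mode = "keywords"
        self._selection_keywords = keywords
        return self

    def select_random(self: T, proportion: float, seed: Optional[int] = None) -> T:
        """Convert a random subset of words (`seed` is an assumption of this sketch)."""
        self._selection_mode = "random"
        self._selection_proportion = proportion
        self._rng = random.Random(seed)
        return self

    def _select_word_indices(self, words: List[str]) -> List[int]:
        # Simplified selection logic covering three illustrative modes.
        if self._selection_mode == "keywords":
            return [i for i, w in enumerate(words) if w in self._selection_keywords]
        if self._selection_mode == "random":
            count = int(len(words) * self._selection_proportion)
            return sorted(self._rng.sample(range(len(words)), count))
        return list(range(len(words)))

    @abstractmethod
    async def convert_word_async(self, word: str) -> str:
        """The only method a concrete word-level converter has to implement."""

    async def convert_async(self, prompt: str) -> str:
        # Split, convert only the selected words, then join back together.
        words = prompt.split(" ")
        for i in self._select_word_indices(words):
            words[i] = await self.convert_word_async(words[i])
        return " ".join(words)


class UppercaseSketchConverter(WordLevelConverterSketch):
    """Toy converter: the per-word transformation is just upper-casing."""

    async def convert_word_async(self, word: str) -> str:
        return word.upper()


if __name__ == "__main__":
    converter = UppercaseSketchConverter().select_keywords(["secret"])
    print(asyncio.run(converter.convert_async("tell me the secret word")))
```

Because each `select_*` method returns `self`, selection can be chained directly onto construction, which keeps per-converter constructors free of selection arguments.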
Related: #818 (comment)
Tests and Documentation
Updated docs and tests