-
Notifications
You must be signed in to change notification settings - Fork 522
BREAKING FEAT: introduce word-level converter #847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BREAKING FEAT: introduce word-level converter #847
Conversation
863fc37
to
8d88759
Compare
… `select_word_indices`
…ation requirements
…_ratio` parameters
8d88759
to
12b79e1
Compare
@romanlutz Thanks! Yes, I'm aware of the remaining threads. I haven't resolved them yet because I'm just not quite satisfied with my original approach to initializing the word-level converters 🙂 After giving it some more thought, I believe it might be cleaner to move the selection configuration into separate methods, example: converter = CharSwapConverter(max_iterations=1).select_random(proportion=0.5) This seems to reduce repeating things in both docs and code, and could offer some other maintainability benefits as well (like making it easier to add a new converter based on WordLevelConverter and not having to copy-paste the docstring part with the args every time). So this should address the following: #847 (comment) and #847 (comment) Seems like a nicer pattern overall. I have to say I like it 😄 No mode_kwargs, just methods allowing to change the selection of words: class WordLevelConverter(PromptConverter):
# ...
@final
def select_keywords(self: T, keywords: List[str]) -> T:
"""Configure the converter to only process words matching specific keywords."""
self._selection_mode = "keywords"
self._selection_keywords = keywords
return self
# ... I hope it makes sense. I'll push this change shortly for you to review (if it won't be good we can always revert it 😃) BTW, This PR’s taking a bit more time than planned, so thanks for bearing with me 😅 Edit: |
The PR is touching a lot of pieces, so it taking a little bit is by no means your fault 🙂 Sorry for taking a while to get back to you. It's been a busy few weeks with our Build conference this week. Regarding the update, I'd rather have all the parameters set in the constructor. It's just what we do everywhere else and keeps it consistent. The duplicated documentation (docstring) is expected and mainly an issue caused by the documentation generation process. We can't really solve that here. Maybe the way to get rid of mode_kwargs is to put all the args (keywords, regex, etc.) explicitly (instead of as kwargs that are not explicitly mentioned) and thereby avoid the kwargs one?
would become
Wdyt? |
…ove unused select_word_indices function
Thanks for the feedback and no worries about the delay -- I hope the conference went well 😃
It makes sense for me now and I agree with it (consistency with the rest of the codebase). |
No problem! Thank you for thinking through ways to make it intuitive. If it wasn't for consistency I might agree that your suggestion is preferable 🙂 The conference went great, thank you 🙂 |
@romanlutz |
@paulinek13 fantastic! I'll review today and will let you know if anything is still missing. Last time, it looked like it's essentially ready, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I had one nitpicky change that I just made myself to shorten an import, otherwise this is exactly what I was looking for.
Co-authored-by: jsong468 <[email protected]> Co-authored-by: Roman Lutz <[email protected]>
Description
This PR introduces a new base class called
WordLevelConverter
, which simplifies the creation of word-level converters by providing a reusable foundation that standardizes word selection for transformation and reduces code duplication across similar converters.The key benefit is that one only needs to implement the specific word transformation logic (
convert_word_async
) while the base class handles word selection, iteration, and final result.Word selection strategies/modes
The base class supports various word selection modes through the
select_word_indices
util function:List of refactored prompt converters
The following converters have been refactored to use the new base class:
BinaryConverter
CharSwapGenerator
EmojiConverter
LeetspeakConverter
ROT13Converter
StringJoinConverter
TextToHexConverter
UnicodeReplacementConverter
Note: I'm not sure if all the prompt converters that I've refactored should be word-level based, or if there are other converters that haven't been refactored that would benefit from this base class.
Related: #818 (comment)
Tests and Documentation
Updated docs and tests