-
Notifications
You must be signed in to change notification settings - Fork 135
Use LibCST instead of parso #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Ooh! Very cool. I looked at this briefly but gave up due to time constraints and API differences. I would love this! |
|
If you have time, I would have a few questions for the further steps, @boxed :
Currently it clones a function for every mutation. I'd tend to instead insert if-else and if-then-else statements. For instance, Maybe this would make it hard to change method signatures defined and definitions in the global scope, not sure about these yet.
The trampoline setup is not perfectly covered by the unit tests (e.g. I removed some code and no test complained). Do you have a suggestion on how to test it? I would default to going through the source code and writing some more tests, which seems reasonable but takes some time 😅
I've tried to run it but it puts all mutants into the |
|
And don't worry if you have no time to review it currently :) |
Global symbol mutation is something I dropped in mutmut 3. Mutmut 2 could do this because it modified the source in place and restarted the execution every time.
|
Previously, mutmut would fail on a function call that contains the kwarg `orig` or `mutants`, e.g. `foo(orig=123)`. The trampoline setup would try to call `_mutmut_trampoline(foo_orig, foo_mutants, *args, **kwargs)` which now has the argument `orig` twice (once as a positional arg and once as a kwarg) and thus raise a TypeError.
Previously, mutated class methods were always killed, because no `self` arg was passed to their calls and thus raised a TypeError for invalid number of arguments. The class methods in the mutants dict are stored without reference to any class instance and therefore not bound to any instance. We need to pass the `self` arg explicitly in those cases. Note that the call to the original method is a bound method (the method is looked up via the `self` parameter), thus we do not pass the `self` arg in this case.
|
It's ready for review now :) If you need some explanations / changes to make it easier to review, feel free to ask. It may be reasonable to look at the diffs of each commit individually. CommitsTwo commits for fixes (identical to #375 , you can close the other one). Not necessary for the libcst implementation, I could remove them and adapt the test cases. Two commits that add tests for more edge cases and E2E testing. One commit that removes unused code from the code base (which made it easier for me to understand the logic; not necessary for this PR). Then a big commit that replaces parso with LibCST. And finally a commit that adds multiprocessing to the mutant generation. E2E snapshot testingThe E2E test runs the function used by This is helpful to see if the whole trampoline setup works as expected, or is accidentally broken (which is how I found the class method bug fixed in the 2nd commit). When code changes should influence the mutants of the E2E test, one can delete Parso -> LibCSTI've kept the trampoline setup as before, including the generated function names. I did changed the mutations implementation and the code creation implementation. Node mutationsThe mutations from Each mutation function takes a node and returns an iterable of mutated nodes. The list File mutationSome code from
The main function here is The class The The code in Known differencesPerformanceThe LibCST implementation is significantly slower at creating mutated source code. Without multiprocessing on a project of mine
EDIT: With multiprocessing, the LibCST version gets down to 2s. The LibCST implementation seems hard to optimize, I don't think it will get much better than this. A lot of the time is spent in traversing the CST and deep-replacing nodes with mutants, which both seem necessary and are implemented within LibCST. I've already improved some other parts compared to a previous version. MutationsAdded a mutation No decorators removal. Previously it removed decorators on inner functions (e.g. a No slice replacement with None. Previously it would replace A more frequent replacement of args with None ( No mutation of complex default params. For instance, Removed dict synonyms, because they were not used anyway. Whitespace differences in the mutated code. Function hashesPreviously it created a |
|
Amazing work! I've invited you to be a maintainer. After this change you will be the foremost expert on the mutmut code base I think :P |
|
LibCST being slower is a bit sad. Could we maybe do something like mocking/proxying to avoid the deep copying? The LibCST API was what made me not pursue it the first time around too, as it felt clunky... and I see that some of that reaction I had was warranted. But the ability to support modern python is well worth it of course. |
Thank you, looking forward to work on this project! How would you prefer contributions from my side? Would you prefer if I ask you for all changes in advance? And to create PRs rather than directly pushing to main? I could also push changes without asking, if it seems straightforward (small bug fixes or features; for instance I noticed that there is a |
I've created a separate issue for this discussion. |
|
I think you can just commit straight in. You've already shown that you have more time/energy right now than I have :) I'm of course here for discussions if you want a sounding board. Btw, maybe you should join the mutation testing discord where there is a mutmut channel? |
Why
LibCST seems well maintained and currently parses all Python 3 versions (3.0 up to 3.13). This would fix #281 and allow new mutations (e.g. for
match case).Additionally, the LibCST abstractions seems more suitable for source code modifications. For instance, I didn't need to handle any whitespace while implementing the mutators. This makes the code simpler.
In the future, this change could also allow to use type analysis for mutations with the LibCST type inference provider.
Status
# pragmacomments