Description
Currently, `hypothesis-jsonschema` basically just gives up if the schema has two overlapping regular expressions - at best, we'll try to randomly pick one to generate examples from, and use the other to filter. The status quo is mostly fine, but we can do better!
- https://github.com/qntm/greenery implements all the key logic to translate between (truly) regular expressions and finite automata, which means we can do set operations like intersection and difference, but has a custom not-quite-Python-compatible pattern syntax.
- https://github.com/doni69/regex_transformer supports Python `re` syntax for `greenery`.
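
To make "set operations on finite automata" concrete, here is a minimal sketch of the product construction for intersection. It is hand-rolled for illustration and does not use `greenery`'s API; the `DFA` class, the `intersect` function, and the two example automata are all hypothetical names, and the alphabet is restricted to `a` and `b` to keep it short.

```python
from itertools import product


class DFA:
    """A tiny complete DFA over a fixed alphabet (illustrative only)."""

    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = frozenset(alphabet)
        self.transitions = transitions  # maps (state, char) -> next state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, string):
        state = self.start
        for ch in string:
            if ch not in self.alphabet:
                return False
            state = self.transitions[(state, ch)]
        return state in self.accepting


def intersect(a, b):
    """Product construction: run both DFAs in lockstep on the same input."""
    assert a.alphabet == b.alphabet
    start = (a.start, b.start)
    transitions, seen, todo = {}, {start}, [start]
    while todo:
        sa, sb = todo.pop()
        for ch in a.alphabet:
            nxt = (a.transitions[(sa, ch)], b.transitions[(sb, ch)])
            transitions[((sa, sb), ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # A paired state accepts only if both component states accept.
    accepting = {s for s in seen if s[0] in a.accepting and s[1] in b.accepting}
    return DFA(a.alphabet, transitions, start, accepting)


# Equivalent to ^a[ab]*$ : state 2 is a dead state.
starts_with_a = DFA(
    "ab",
    {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2},
    start=0,
    accepting={1},
)
# Equivalent to ^[ab]*b$
ends_with_b = DFA(
    "ab",
    {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
    start=0,
    accepting={1},
)

both = intersect(starts_with_a, ends_with_b)
# Enumerate every string of length 0..3 that satisfies both patterns.
matches = [
    "".join(chars)
    for n in range(4)
    for chars in product("ab", repeat=n)
    if both.accepts("".join(chars))
]
```

With an automaton for the intersection in hand, every generated string satisfies both patterns by construction, so no filtering step is needed. Difference works the same way with a different acceptance condition (accept in `a` but not in `b`).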
(Not all "Python regular expressions" are truly regular in the sense of being equivalent to finite automata. Also, JSON Schema regexes actually follow ECMA 262 syntax, which is neither truly regular nor entirely compatible with the Python syntax. Fortunately the recommended subset is both compatible (except for Python allowing a trailing newline with `$`) and - with some special handling for lookahead - regular, so we'll continue our approach of handling what we can and gracefully degrading on the rest.)
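
The trailing-newline difference is easy to demonstrate with the standard `re` module. The sketch below also includes a hypothetical helper, `strict_end_anchor`, which rewrites only a single unescaped `$` at the very end of a pattern to `\Z`; it is an assumption for illustration, not something `hypothesis-jsonschema` is known to do, and it deliberately ignores `$` anchors elsewhere in the pattern (for example inside alternations).

```python
import re


def ends_with_unescaped_dollar(pattern):
    """True if the pattern ends in `$` acting as an anchor, not a literal."""
    if not pattern.endswith("$"):
        return False
    body = pattern[:-1]
    # An odd number of backslashes before `$` means the dollar is escaped.
    backslashes = len(body) - len(body.rstrip("\\"))
    return backslashes % 2 == 0


def strict_end_anchor(pattern):
    # Hypothetical helper: swap a trailing `$` for `\Z`, which in Python
    # matches only at the very end of the string.
    if ends_with_unescaped_dollar(pattern):
        return pattern[:-1] + r"\Z"
    return pattern


# Python's `$` also matches just before a trailing newline...
loose = re.search(r"^abc$", "abc\n")
# ...whereas `\Z` does not.
strict = re.search(strict_end_anchor(r"^abc$"), "abc\n")
```

Here `loose` is a match object and `strict` is `None`, so a string generated or validated with the rewritten pattern cannot carry a stray trailing newline.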
This is a medium-to-large feature to develop, since `regex_transformer` exists but isn't packaged or particularly mature, and it is of unknown (neg-medium to medium) benefit. `greenery` might also need some patches to make Unicode handling more efficient. However, I'd also like better regex handling in upstream Hypothesis, and it should only get easier over time!