[experimental] Run crosshair in CI #4034
Conversation
@Zac-HD your triage above is SO great. I am investigating.

Knocked out a few of these in 0.0.60. More soon.

Ah - the
Most, if not all, of the "expected x, got symbolic" errors are symptoms of an underlying error, in my experience (often an operation on a symbolic while not tracing). In this case, running with
Ah-ha - seems like we might want some #4029-style "don't cache on backends with avoid_realize=True" logic.
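A minimal sketch of that idea, assuming a hypothetical `avoid_realize` flag on the provider (the real attribute name and call sites in Hypothesis may differ):

```python
# Hypothetical sketch, not Hypothesis' actual internals: only cache results
# when the active backend is happy for its values to be realized.
def should_cache_result(provider) -> bool:
    # `avoid_realize` is the flag name used in the discussion above; treat a
    # missing attribute as "realizing is fine".
    return not getattr(provider, "avoid_realize", False)

# Usage sketch: guard the cache write.
# if should_cache_result(provider):
#     cache[key] = result
```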
Still here and excited about this! I am on a detour of doing a real symbolic implementation of the
Triaging a pile of the

So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; when CI finishes we'll see how much it helps on crosshair 🤞
Looks like that worked, and we now have a very slow test in the a-d range (5+ hours before I gave up and pushed, which canceled it). We should rerun with verbosity to narrow it down; I'll cancel it for now to avoid wasting minutes.
I think the biggest cause of the slow ones is that the Hypothesis time patching confounds CrossHair's timeout mechanisms. In my runs, I did this to work around that, but I'm not sure what's actually appropriate. It might be better to just skip the slow ones.
We've skipped them so far, but maybe we should instead disable our monkeypatching-in-selftests for the Crosshair tests?
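If we do go the skip route, here is a minimal sketch, assuming the experimental `backend` setting is available and the crosshair profile is active when these tests run (the real test-suite helper may look quite different):

```python
# Hypothetical sketch of "skip the slow ones": skip a test whenever the
# currently active Hypothesis profile selects the crosshair backend.
import pytest
from hypothesis import settings

def skip_if_crosshair(test):
    if settings().backend == "crosshair":
        return pytest.mark.skip(reason="too slow under the crosshair backend")(test)
    return test
```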
@Zac-HD what's the
Latest run looks great. I found the test that was hanging in the

Here's a failure that is probably our fault for not realizing somewhere (CI run):

```
File "/home/runner/work/hypothesis/hypothesis/hypothesis-python/.tox/crosshair-custom/lib/python3.10/site-packages/hypothesis/database.py", line 1068, in choices_to_bytes
    assert isinstance(elem, str)
AssertionError: assert False
 + where False = isinstance(<[CrossHairInternal('Numeric operation on symbolic while not tracing') raised in repr()] SymbolicInt object at 0x7f7f20f61420>, str)
```
Oh, that was a very, very temporary hack - I think we've fixed the underlying issue now and should delete it.
Yup - I think we're accessing
Oh, beautiful. That case is for fatal engine errors and takes a different save path than normal; we should realize there. There's a second comment, which is that this fatal path was being taken for
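A minimal sketch of "we should realize there", using hypothetical helper names around the real `provider.realize(...)` hook and the `choices_to_bytes` serializer from the traceback above:

```python
# Hypothetical sketch: realize any still-symbolic choices via the backend
# before the fatal-error save path serializes them for the database.
from hypothesis.database import choices_to_bytes  # the function from the traceback above

def save_fatal_failure(provider, choices, db, key):
    concrete = [provider.realize(choice) for choice in choices]
    db.save(key, choices_to_bytes(concrete))  # serializer asserts concrete types
```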
I think the remaining failures are split roughly evenly between Hypothesis' fault and tests that need to be skipped. They all deserve more thorough investigation to determine which is which, and what the cause is; I haven't looked too deeply at them yet. We're getting really close to a clean run!
Posting some early diagnostics here: I think we're interacting with the symbolics when observability mode is enabled (i.e. we have a testcase callback) in a way that causes the symbolic to pick up a path constraint (or maybe be fully realized?), because the following is very fast:

```python
from hypothesis import *
from hypothesis import strategies as st

@given(st.integers(), st.floats(), st.data())
@settings(backend="crosshair")
def f(v1, v2, data):
    print("call")
    data.draw(st.booleans())

f()
```

but when you add a testcase callback, it gets much slower, and crosshair abandons some test cases with BackendCannotProceed, which shouldn't be happening for a test that doesn't interact with its args at all:

```python
import hypothesis.internal.observability
from hypothesis import *
from hypothesis import strategies as st

def f(x):
    pass

hypothesis.internal.observability.TESTCASE_CALLBACKS.append(f)

@given(st.integers(), st.floats(), st.data())
@settings(backend="crosshair")
def f(v1, v2, data):
    print("call")
    data.draw(st.booleans())

f()
```

edit: yup, repr_call and to_jsonable are adding constraints or straight-up realizing. I'm looking into a fix.
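One possible direction for that fix, sketched with hypothetical helper names (the real `repr_call`/`to_jsonable` call sites differ): realize the arguments through the backend before building observability output, so that computing reprs never touches a live symbolic.

```python
# Hypothetical sketch: realize observability inputs up front so that building
# the test-case report adds no path constraints to the backend's symbolics.
def observability_arguments(provider, kwargs):
    concrete = {name: provider.realize(value) for name, value in kwargs.items()}
    return {name: repr(value) for name, value in concrete.items()}
```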
Latest push is an improvement, but I'm pretty confident we're still adding path constraints somewhere, because I still see a speed difference.
See #3914

To reproduce this locally, you can run `make check-crosshair-cover/nocover/niche` for the same command as in CI, but I'd recommend `pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail` to select and run only the xfailed tests.
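For reference, a minimal sketch of how such a profile can be registered in a `conftest.py` (the repository's actual crosshair profile may set more options than this):

```python
# Sketch: register a "crosshair" settings profile so that
# `pytest --hypothesis-profile=crosshair ...` runs tests on the experimental backend.
from hypothesis import settings

settings.register_profile("crosshair", backend="crosshair", deadline=None)
```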
Hypothesis' problems

- `Flaky: Inconsistent results from replaying a failing test...` - mostly backend-specific failures; we've both
- `"hypothesis/internal/conjecture/data.py", line 2277, in draw_boolean` / `assert p > 2 ** (-64)`, fixed in 1f845e0 (#4049)
- `@given`, fixed in 3315be6
- `target()`, fixed in 85712ad (#4049)
- `typing_extensions` when crosshair depends on it
- `@xfail_on_crosshair(...)`
- `too_slow` and `filter_too_much` health checks, and skip remaining affected tests under crosshair
- `-k 'not decimal'` once we're closer
- `PathTimeout`; see "Rare `PathTimeout` errors in `provider.realize(...)`" (pschanely/hypothesis-crosshair#21) and "Further improve support for symbolic execution" #3914 (comment)
- Add `BackendCannotProceed` to improve integration (#4092)

Probably Crosshair's problems
- `Duplicate type "<class 'array.array'>" registered` - from repeated imports? (pschanely/hypothesis-crosshair#17)
- `RecursionError`, see "`RecursionError` in `_issubclass`" (pschanely/CrossHair#294)
- `unsupported operand type(s) for -: 'float' and 'SymbolicFloat'` in `test_float_clamper`
- `TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object` (or `'values'` or `'items'`). Fixed in "Implement various fixes for hypothesis integration" (pschanely/CrossHair#269)
- `TypeError: _int() got an unexpected keyword argument 'base'`
- `hashlib` requires the buffer protocol, which symbolic bytes don't provide (pschanely/CrossHair#272)
- `typing.get_type_hints()` raises `ValueError`, see "`typing.get_type_hints()` raises `ValueError` when used inside Crosshair" (pschanely/CrossHair#275)
- `TypeError` in bytes regex, see "`TypeError` in bytes regex" (pschanely/CrossHair#276)
- `provider.draw_boolean()` inside `FeatureStrategy`, see "Invalid combination of arguments to `draw_boolean(...)`" (pschanely/hypothesis-crosshair#18)
- `dict(name=value)`, see "Support named `dict` init syntax" (pschanely/CrossHair#279)
- `PurePath` constructor, see "`PurePath(LazyIntSymbolicStr)` error" (pschanely/CrossHair#280)
- `zlib.compress()` not symbolic, see "a bytes-like object is required, not `SymbolicBytes`, when calling `zlib.compress(b'')`" (pschanely/CrossHair#286)
- `int.from_bytes(map(...), ...)`, see "Accept `map()` object - or any iterable - in `int.from_bytes()`" (pschanely/CrossHair#291)
- `base64.b64encode()` and friends (pschanely/CrossHair#293)
- `TypeError: conversion from SymbolicInt to Decimal is not supported`; see also sNaN below
- `TypeVar` problem, see "`z3.z3types.Z3Exception: b'parser error'` from interaction with `TypeVar`" (pschanely/CrossHair#292)
- `RecursionError` inside Lark, see "Weird failures using sets" (pschanely/CrossHair#297)
- Error in `operator.eq(Decimal('sNaN'), an_int)`
Cases where crosshair doesn't find a failing example but Hypothesis does
Seems fine; there are plenty of cases in the other direction. Tracked with `@xfail_on_crosshair(Why.undiscovered)` in case we want to dig in later.

Nested use of the Hypothesis engine (e.g. given-inside-given)
This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.
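A sketch of what that declaration could look like, with entirely hypothetical names (no such API exists today):

```python
# Hypothetical sketch: let a backend declare that nested engine use is
# unsupported, and raise a clear error instead of failing obscurely.
class SomeBackendProvider:
    supports_nested_given = False  # hypothetical capability flag

def check_nested_given(provider, engine_already_running: bool) -> None:
    if engine_already_running and not getattr(provider, "supports_nested_given", True):
        raise RuntimeError(
            "This backend does not support nested use of the Hypothesis engine "
            "(e.g. @given inside @given)."
        )
```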