-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[experimental] Run crosshair in CI #4034
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
175b347
to
424943f
Compare
@Zac-HD your triage above is SO great. I am investigating. |
Knocked out a few of these in 0.0.60.
More soon. |
Ah - the
|
This comment was marked as outdated.
This comment was marked as outdated.
Most/all of the "expected x, got symbolic" errors are symptoms of an underlying error in my experience (often operation on symbolic while not tracing). In this case running with |
ah-ha, seems like we might want some #4029 - style 'don't cache on backends with avoid_realize=True' logic. |
1d2345d
to
7bf8983
Compare
Still here and excited about this! I am on a detour of doing a real symbolic implementation of the |
cc07927
to
018ccab
Compare
Triaging a pile of the So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; and when CI finishes we'll see how much it helps on crosshair 🤞 |
This comment was marked as outdated.
This comment was marked as outdated.
OK, we have very many |
102058b
to
262dba3
Compare
0e4d99d
to
95e734c
Compare
(sorry, my local was out of date and force pushes above were me rebasing and subsequently realizing that had already been done in the up-to-date branch. nothing should have changed) |
|
Yup, Phillip beat us to it 😅 pschanely@fd6958f Underlying reason is crosshair raising This is also in part because we decided to give alternative backends control over |
I've addressed the zero_pin issue discussed above by tracking the @pschanely after doing this the CI run shows crosshair returning values of the wrong type from Unfortunately this is flaky locally, possibly due to speed-related time outs. |
Thank you!
SGTM; I will investigate tomorrow! One change that we made a while back is possibly relevant here: to avoid false-positives from approximations like real-based floats, we don't actually report the error right away; instead we'll abort the failing pass and try to recreate the pass next time with concrete draws. This can go haywire though if the requested draw types aren't the same on the next pass. |
Ok, @tybug so I've confirmed that our first and second passes have inconsistent draws, but only got through one layer of diagnostics. Maybe it's enough to spark an idea? Fortunately, at least for me, the issue happens consistently. 🤷 The first draw of the first pass is for a boolean, here: (_execute_once_for_engine core.py:1091) (execute_once core.py:1028) (default_executor core.py:729) (run core.py:1002) (run_state_machine stateful.py:115) (test core.py:894) (run_state_machine stateful.py:160) (draw_boolean data.py:929) (_draw data.py:749) (_pop_choice data.py:972) (draw_boolean crosshair_provider.py:231) But the second pass skips over this draw. In order for Inconsistent draws aside, I should change the provider to ensure it yields SOME value of the requested type. After all, I'm running the concrete execution under the premise that the behavior might actually differ, so I shouldn't be quite so surprised when it does. 😆 |
ah, this is great, thanks. I think Hypothesis requests a value from the backend for |
Looks like that worked, and we now have a very slow test in the a-d range (5+ hours before I gave up and pushed, which canceled it). We should rerun with verbosity to narrow it down; I'll cancel it for now to avoid wasting minutes. |
I think the biggest cause of the slow ones is that the hypothesis time patching confounds CrossHair's timeout mechanisms. In my runs, I did this to work around that, but not sure what's actually appropriate. It might be better to just skip the slow ones. |
We've skipped them so far, but maybe we should instead disable our monkeypatching-in-selftests for the Crosshair tests? |
See #3914
To reproduce this locally, you can run
make check-crosshair-cover/nocover/niche
for the same command as in CI, but I'd recommendpytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail
to select and run only the xfailed tests.Hypothesis' problems
Flaky: Inconsistent results from replaying a failing test...
- mostly backend-specific failures; we've both"hypothesis/internal/conjecture/data.py", line 2277, in draw_boolean
assert p > 2 ** (-64)
, fixed in1f845e0
(#4049)@given
, fixed in 3315be6target()
, fixed in85712ad
(#4049)typing_extensions
when crosshair depends on it@xfail_on_crosshair(...)
..too_slow
and.filter_too_much
, and skip remaining affected tests under crosshair.-k 'not decimal'
once we're closerPathTimeout
; see RarePathTimeout
errors inprovider.realize(...)
pschanely/hypothesis-crosshair#21 and Stable support for symbolic execution #3914 (comment)Add
BackendCannotProceed
to improve integration #4092Probably Crosshair's problems
Duplicate type "<class 'array.array'>" registered
from repeated imports? pschanely/hypothesis-crosshair#17RecursionError
, seeRecursionError
in_issubclass
pschanely/CrossHair#294unsupported operand type(s) for -: 'float' and 'SymbolicFloat'
intest_float_clamper
TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object
(or'values'
or'items'
). Fixed in Implement various fixes for hypothesis integration pschanely/CrossHair#269TypeError: _int() got an unexpected keyword argument 'base'
hashlib
requires the buffer protocol, which symbolics bytes don't provide pschanely/CrossHair#272typing.get_type_hints()
raisesValueError
, seetyping.get_type_hints()
raisesValueError
when used inside Crosshair pschanely/CrossHair#275TypeError
in bytes regex, seeTypeError
in bytes regex pschanely/CrossHair#276provider.draw_boolean()
insideFeatureStrategy
, see Invalid combination of arguments todraw_boolean(...)
pschanely/hypothesis-crosshair#18dict(name=value)
, see Support nameddict
init syntax pschanely/CrossHair#279PurePath
constructor, seePurePath(LazyIntSymbolicStr)
error pschanely/CrossHair#280zlib.compress()
not symbolic, see a bytes-like object is required, notSymbolicBytes
when callingzlib.compress(b'')
pschanely/CrossHair#286int.from_bytes(map(...), ...)
, see Acceptmap()
object - or any iterable - inint.from_bytes()
pschanely/CrossHair#291base64.b64encode()
and friends pschanely/CrossHair#293TypeError: conversion from SymbolicInt to Decimal is not supported
; see also snan belowTypeVar
problem, seez3.z3types.Z3Exception: b'parser error'
from interaction withTypeVar
pschanely/CrossHair#292RecursionError
inside Lark, see Weird failures using sets pschanely/CrossHair#297Error in
operator.eq(Decimal('sNaN'), an_int)
Cases where crosshair doesn't find a failing example but Hypothesis does
Seems fine, there are plenty of cases in the other direction. Tracked with
@xfail_on_crosshair(Why.undiscovered)
in case we want to dig in later.Nested use of the Hypothesis engine (e.g. given-inside-given)
This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.