[experimental] Run crosshair in CI #4034

Zac-HD · 2024-07-07T18:06:24Z

To reproduce this locally, you can run make check-crosshair-cover/nocover/niche for the same command as in CI, but I'd recommend pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail to select and run only the xfailed tests.

Hypothesis' problems

Probably Crosshair's problems

Error in `operator.eq(Decimal('sNaN'), an_int)`

____ test_rewriting_does_not_compare_decimal_snan ____
  File "hypothesis/strategies/_internal/strategies.py", line 1017, in do_filtered_draw
    if self.condition(value):
TypeError: argument must be an integer
while generating 's' from integers(min_value=1, max_value=5).filter(functools.partial(eq, Decimal('sNaN')))

Cases where crosshair doesn't find a failing example but Hypothesis does

Seems fine, there are plenty of cases in the other direction. Tracked with @xfail_on_crosshair(Why.undiscovered) in case we want to dig in later.

Nested use of the Hypothesis engine (e.g. given-inside-given)

This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.

pschanely · 2024-07-07T21:03:45Z

@Zac-HD your triage above is SO great. I am investigating.

pschanely · 2024-07-08T17:47:56Z

Knocked out a few of these in 0.0.60.
I think that means current status on my end is:

TypeError: conversion from SymbolicInt to Decimal is not supported
Unsupported operand type(s) for -: 'float' and 'SymbolicFloat' in test_float_clamper
TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object (or 'values' or 'items').
TypeError: _int() got an unexpected keyword argument 'base'
Symbolic not realized (in e.g. test_suppressing_filtering_health_check)
Error in operator.eq(Decimal('sNaN'), an_int)
Zac's cursed example below!

More soon.

Zac-HD · 2024-07-12T07:17:48Z

Ah - the Flaky failures are of course because we had some failure under the Crosshair backend, which did not reproduce under the Hypothesis backend. This is presumably going to point to a range of integration bugs, but is also something that we'll want to clearly explain to users because integration bugs are definitely going to happen in future and users will need to respond (by e.g. using a different backend, ignoring the problem, whatever).

improve the reporting around Flaky failures where the differing or missing errors are related to a change of backend while shrinking. See also Change Flaky to be an ExceptionGroup #4040.
triage all the current failures so we can fix them

tybug · 2024-07-12T15:33:33Z

Most/all of the "expected x, got symbolic" errors are symptoms of an underlying error in my experience (often operation on symbolic while not tracing). In this case running with export HYPOTHESIS_NO_TRACEBACK_TRIM=1 reveals limited_category_index_cache in cm.query is at fault.

Zac-HD · 2024-07-12T18:43:06Z

ah-ha, seems like we might want some #4029 - style 'don't cache on backends with avoid_realize=True' logic.

pschanely · 2024-07-13T00:39:13Z

Still here and excited about this! I am on a detour of doing a real symbolic implementation of the decimal module - should get that out this weekend.

Zac-HD · 2024-07-13T08:18:25Z

Triaging a pile of the Flaky erorrs, most were due to getting a RecursionError under crosshair and then passing under Hypothesis - and it looks like most of those were in turn because of all our nested-@given() test helpers.

So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; and when CI finishes we'll see how much it helps on crosshair 🤞

tybug · 2025-03-18T08:08:01Z

Looks like that worked, and we now have a very slow test in the a-d range (5+ hours before I gave up and pushed, which canceled it). We should rerun with verbosity to narrow it down; I'll cancel it for now to avoid wasting minutes.

pschanely · 2025-03-18T12:54:08Z

Looks like that worked, and we now have a very slow test in the a-d range

I think the biggest cause of the slow ones is that the hypothesis time patching confounds CrossHair's timeout mechanisms. In my runs, I did this to work around that, but not sure what's actually appropriate. It might be better to just skip the slow ones.

Zac-HD · 2025-03-18T16:04:15Z

We've skipped them so far, but maybe we should instead disable our monkeypatching-in-selftests for the Crosshair tests?

tybug · 2025-03-20T05:30:47Z

@Zac-HD what's the _hack_xfail_crosshair_error fixture for? is it related to the time monkeypatching? I didn't actually realize we were skipping monkeypatched-time tests currently, I thought we had fixed that by importing crosshair before patching it.

tybug · 2025-03-20T06:19:29Z

Latest run looks great. I found the test that was hanging in the a-d range by running locally, and skipped that one. Remaining failures are about 20% "probably a real problem" and 80% tests that need to be skipped or adjusted.

Here's a failure that is probably our fault for not realizing somewhere (CI run):

  File "/home/runner/work/hypothesis/hypothesis/hypothesis-python/.tox/crosshair-custom/lib/python3.10/site-packages/hypothesis/database.py", line 1068, in choices_to_bytes
    assert isinstance(elem, str)
AssertionError: assert False
 +  where False = isinstance(<[CrossHairInternal('Numeric operation on symbolic while not tracing') raised in repr()] SymbolicInt object at 0x7f7f20f61420>, str)

Zac-HD · 2025-03-20T19:25:30Z

@Zac-HD what's the _hack_xfail_crosshair_error fixture for?

Oh, that was a very very temporary hack, I think we've fixed the underlying now and should delete it.

pschanely · 2025-03-21T20:40:36Z

Latest run looks great. I found the test that was hanging in the a-d range by running locally, and skipped that one. Remaining failures are about 20% "probably a real problem" and 80% tests that need to be skipped or adjusted.

Here's a failure that is probably our fault for not realizing somewhere (CI run):
  File "/home/runner/work/hypothesis/hypothesis/hypothesis-python/.tox/crosshair-custom/lib/python3.10/site-packages/hypothesis/database.py", line 1068, in choices_to_bytes
    assert isinstance(elem, str)
AssertionError: assert False
 +  where False = isinstance(<[CrossHairInternal('Numeric operation on symbolic while not tracing') raised in repr()] SymbolicInt object at 0x7f7f20f61420>, str)

Yup - I think we're accessing data.choices here, just prior to the finally block that will do the realization we need.

tybug · 2025-03-22T03:59:04Z

oh, beautiful. That case is for fatal engine errors and takes a different save path than normal. We should realize there.

There's a second comment, which is that this fatal path was being taken for _pytest.outcomes.Skipped! I don't think this is correct, I think we should be except skip_exceptions_to_reraise(): and just reraising those for control flow rather than treating it as a failure that needs to be saved to the db.

tybug · 2025-03-22T20:52:04Z

I think the remaining failures are split roughly evenly between Hypothesis' fault, and tests that need to be skipped. They all deserve more thorough investigation to determine which one, and what the cause is. I haven't looked too deeply at them yet. We're getting really close to a clean run!

tybug · 2025-03-22T23:04:34Z

Posting some early diagnostics here: I think we're interacting with the symbolics when observability mode is enabled (ie we have a testcase callback) in such a way that causes the symbolic to pick up a path constraint (or maybe be fully realized?), because the following is very fast:

from hypothesis import *
from hypothesis import strategies as st

@given(st.integers(), st.floats(), st.data())
@settings(backend="crosshair")
def f(v1, v2, data):
    print("call")
    data.draw(st.booleans())
f()

but when you add a testcase callback, it gets much slower, and crosshair abandons some test cases with BackendCannotProceed, which shouldn't be happening for a test that doesn't interact with its args at all:

import hypothesis.internal.observability
from hypothesis import *
from hypothesis import strategies as st

def f(x):
    pass

hypothesis.internal.observability.TESTCASE_CALLBACKS.append(f)

@given(st.integers(), st.floats(), st.data())
@settings(backend="crosshair")
def f(v1, v2, data):
    print("call")
    data.draw(st.booleans())
f()

e: yup, repr_call and to_jsonable are adding constraints or straight up realizing. I'm looking into a fix

tybug · 2025-03-23T18:16:20Z

Latest push is an improvement, but I'm pretty confident we're still adding path constraints somewhere, because I still see a speed difference

Zac-HD added tests/build/CI about testing or deployment *of* Hypothesis interop how to play nicely with other packages labels Jul 7, 2024

This comment was marked as outdated.

Sign in to view

Zac-HD force-pushed the crosshair-in-ci branch 3 times, most recently from 175b347 to 424943f Compare July 7, 2024 20:26

Zac-HD mentioned this pull request Jul 7, 2024

Further improve support for symbolic execution #3914

Open

27 tasks

Zac-HD force-pushed the crosshair-in-ci branch from 424943f to b2d11c7 Compare July 7, 2024 20:56

Zac-HD force-pushed the crosshair-in-ci branch from b2d11c7 to 98ccf44 Compare July 11, 2024 07:23

This comment was marked as outdated.

Sign in to view

Zac-HD force-pushed the crosshair-in-ci branch from 98ccf44 to 4bd7e45 Compare July 12, 2024 07:48

Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from 1d2345d to 7bf8983 Compare July 12, 2024 20:15

Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from cc07927 to 018ccab Compare July 13, 2024 07:23

This comment was marked as outdated.

Sign in to view

pschanely mentioned this pull request Jul 13, 2024

Auto-realize encode/decode arguments for unsupported codecs pschanely/CrossHair#271

Closed

This was referenced Jul 13, 2024

Make Flaky into an ExceptionGroup #4043

Merged

Test-only improvements #4047

Merged

Zac-HD force-pushed the crosshair-in-ci branch from 5d77e32 to 00a9931 Compare July 13, 2024 21:35

This was referenced Jul 13, 2024

Duplicate type "<class 'array.array'>" registered from repeated imports? pschanely/hypothesis-crosshair#17

Closed

hashlib requires the buffer protocol, which symbolics bytes don't provide pschanely/CrossHair#272

Closed

pschanely mentioned this pull request Mar 18, 2025

Ensure concrete re-executions are resilient to behavior variation pschanely/hypothesis-crosshair#25

Closed

tybug mentioned this pull request Mar 20, 2025

Don't pin zero data if backend cannot proceed #4312

Merged

tybug mentioned this pull request Mar 22, 2025

Fix symbolic realization on fatal error #4316

Merged

Zac-HD mentioned this pull request Mar 23, 2025

Run crosshair in CI [final] [v2] [this time for sure] #4319

Closed

Zac-HD and others added 9 commits March 23, 2025 19:43

tweak some stale docstrings

050b7fc

Ensure crosshair gets real monotonic()

75b290f

run crosshair in CI

34d36d3

skip hanging or very slow tests

86e94ae

Mark expected failures under crosshair

c8a5cea

Mark failures for crosshair to fix?

755486a

Improve pprint/to_jsonable symbolics

77e3d4a

Skip crosshair-incompatible tests

2b50425

update extras link

159c3b8

Zac-HD force-pushed the crosshair-in-ci branch from ad701d8 to 159c3b8 Compare March 24, 2025 02:43

Zac-HD marked this pull request as ready for review March 24, 2025 02:50

Zac-HD requested a review from DRMacIver as a code owner March 24, 2025 02:50

Zac-HD merged commit e66f7fb into HypothesisWorks:master Mar 24, 2025
59 of 60 checks passed

Zac-HD deleted the crosshair-in-ci branch March 24, 2025 03:03

Zac-HD mentioned this pull request Mar 24, 2025

Update extras link #4320

Closed

[experimental] Run crosshair in CI #4034

[experimental] Run crosshair in CI #4034

Uh oh!

Conversation

Zac-HD commented Jul 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hypothesis' problems

Probably Crosshair's problems

Error in operator.eq(Decimal('sNaN'), an_int)

Cases where crosshair doesn't find a failing example but Hypothesis does

Nested use of the Hypothesis engine (e.g. given-inside-given)

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

pschanely commented Jul 7, 2024

Uh oh!

pschanely commented Jul 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Zac-HD commented Jul 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

tybug commented Jul 12, 2024

Uh oh!

Zac-HD commented Jul 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pschanely commented Jul 13, 2024

Uh oh!

Zac-HD commented Jul 13, 2024

Uh oh!

This comment was marked as outdated.

tybug commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pschanely commented Mar 18, 2025

Uh oh!

Zac-HD commented Mar 18, 2025

Uh oh!

tybug commented Mar 20, 2025

Uh oh!

tybug commented Mar 20, 2025

Uh oh!

Zac-HD commented Mar 20, 2025

Uh oh!

pschanely commented Mar 21, 2025

Uh oh!

tybug commented Mar 22, 2025

Uh oh!

tybug commented Mar 22, 2025

Uh oh!

tybug commented Mar 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tybug commented Mar 23, 2025

Uh oh!

Uh oh!

Uh oh!

Zac-HD commented Jul 7, 2024 •

edited

Loading

Error in `operator.eq(Decimal('sNaN'), an_int)`

pschanely commented Jul 8, 2024 •

edited

Loading

Zac-HD commented Jul 12, 2024 •

edited

Loading

Zac-HD commented Jul 12, 2024 •

edited

Loading

tybug commented Mar 18, 2025 •

edited

Loading

tybug commented Mar 22, 2025 •

edited

Loading