
Stop can cause race detector errors in testing #59

@eli-darkly

The unit tests for our project are normally run with the -race option, because we do a lot of concurrency stuff and want to avoid subtle unsafe usages. When I integrated ccache into the project, I started getting race detector errors in a test scenario where the cache is shut down with Stop() at the end of the test.

It seems that this is due to what the race detector considers to be unsafe usage of the promotables channel, where there is the potential for a race between close and a previous channel send, as documented here. The race detector isn't saying that a race really did happen during the test run, but it can tell, based on the pattern of accesses to the channel, that one could happen, so it considers that to be an automatic fail.
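To illustrate the pattern, here is a minimal sketch of one common way to avoid a close/send race: the channel that receives sends is never closed, and shutdown is signalled through a separate channel instead. This is not ccache's actual code or a proposed patch; the `store` type, `newStore`, `promote`, and the `done` channel are hypothetical names made up for the example, and only the `promotables` name is borrowed from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a cache with a background worker.
// It is NOT ccache's real type; it only mirrors the shape of the problem.
type store struct {
	promotables chan int      // items handed to the worker; never closed
	done        chan struct{} // closed by Stop to signal shutdown
	stopOnce    sync.Once
	wg          sync.WaitGroup
}

func newStore() *store {
	s := &store{
		promotables: make(chan int, 16),
		done:        make(chan struct{}),
	}
	s.wg.Add(1)
	go s.worker()
	return s
}

// worker receives from promotables until done is closed.
func (s *store) worker() {
	defer s.wg.Done()
	for {
		select {
		case <-s.promotables:
			// A real worker would process the item here.
		case <-s.done:
			return
		}
	}
}

// promote reports whether the item was handed to the worker.
// Nothing ever closes promotables, so this send cannot race with a close.
func (s *store) promote(v int) bool {
	// Check for shutdown first so calls made after Stop always return false.
	select {
	case <-s.done:
		return false
	default:
	}
	select {
	case s.promotables <- v:
		return true
	case <-s.done:
		return false
	}
}

// Stop signals shutdown and waits for the worker to exit.
// Only done is closed, and nothing sends on done.
func (s *store) Stop() {
	s.stopOnce.Do(func() { close(s.done) })
	s.wg.Wait()
}

func main() {
	s := newStore()
	fmt.Println(s.promote(1)) // true
	s.Stop()
	fmt.Println(s.promote(2)) // false
}
```

In this sketch, closing `done` is only ever paired with receives on `done`, never with sends, so there is no send for the close to race against. Whether a similar approach fits ccache's internals is an open question.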

I wondered why such an issue wouldn't have shown up in ccache's own unit tests, but that's because:

  1. Those tests aren't being run in race detection mode.
  2. The tests are not calling Stop at all. There's no defer cache.Stop() after creating a store (so I imagine a lot of orphaned goroutines are created during test runs), and there also doesn't seem to be any test coverage of Stop itself.

When I added a single defer cache.Stop() to a test, and then ran go test -race ./... instead of go test ./..., I immediately got the same kind of error. In any codebase where concurrency is very important, like this one, this is a bit concerning. Even if this particular kind of race condition might not have significant consequences in itself, the fact that it's not possible to run tests with race detection means we can't use that tool to detect other kinds of concurrency problems.
