Auto-generate unit tests & benchmarks #145

@regexident

tl;dr

We've gone way past the point where writing and maintaining highly redundant manual unit tests is any fun. When writing unit tests becomes tedious and a maintenance hell, people start neglecting them instead. So let's make use of the fact that our APIs (and, as such, our tests) almost all follow the same pattern, and automatically generate the tests for us, allowing us to increase test coverage even further at far less overall cost.

What?

A quick look at the /Tests directory reveals a suite of tests that pretty much all share the same pattern.

Our tests look something like this:

func test_<something>_float() {
    // Define a type-alias for convenience:
    typealias Scalar = Float

    // Create some dummy data:
    let lhs: [Scalar] = .monotonicNormalized()
    let rhs: [Scalar] = .monotonicNormalized()

    // Create a working copy of the dummy data:
    var actual: [Scalar] = lhs
    // Operate on the working copy:
    Surge.eladdInPlace(&actual, rhs)

    // Provide a ground-truth implementation to compare against:
    let expected = zip(lhs, rhs).map { $0 + $1 }

    // Compare the result:
    XCTAssertEqual(actual, expected, accuracy: 1e-8)
}

… differing from one another only in this line:

Surge.eladdInPlace(&actual, rhs)

… and this line:

let expected = zip(lhs, rhs).map { $0 + $1 }
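
(As an aside: .monotonicNormalized() above is one of the dummy-data factories in the test sources. It could be shaped roughly like the following sketch; this is an assumption about its shape, not the actual implementation:)

// Sketch of a `.monotonicNormalized()`-style dummy-data factory:
// monotonically increasing values normalized into 0...1, so results stay
// well-conditioned for the accuracy-based assertion above.
extension Array where Element: BinaryFloatingPoint {
    static func monotonicNormalized(count: Int = 1_000) -> [Element] {
        return (1...count).map { Element($0) / Element(count) }
    }
}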

And our benchmarks look something like this:

// benchmarks:
func test_add_in_place_array_array_float() {
    // Call convenience function:
    measure_inout_array_array(of: Float.self) { measure in
        // Call XCTest's measurement method:
        measureMetrics([.wallClockTime], automaticallyStartMeasuring: false) {
            // Perform the actual operations to be measured:
            measure(Surge.eladdInPlace)
        }
    }
}

… which is semantically equivalent to the more verbose:

func test_add_in_place_array_array_float() {
    typealias Scalar = Float

    let lhs: [Scalar] = produceLhs()
    let rhs: [Scalar] = produceRhs()

    // Call XCTest's measurement method:
    measureMetrics([.wallClockTime], automaticallyStartMeasuring: false) {
        var lhs = lhs
        
        startMeasuring()
        let _ = Surge.eladdInPlace(&lhs, rhs)
        stopMeasuring()
    }
}

… again, differing from one another only in this line:

let _ = Surge.eladdInPlace(&lhs, rhs)
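
For context, the measure_inout_array_array convenience used above could be shaped roughly as follows. This is a sketch under the assumption that the helper supplies the dummy data and wraps startMeasuring()/stopMeasuring(); Surge's actual helper lives in the benchmark sources and may differ:

import XCTest

extension XCTestCase {
    // Hypothetical sketch: supply dummy data for the requested scalar type and
    // hand the caller a `measure` closure that copies the data, runs the given
    // in-place operation between startMeasuring()/stopMeasuring(), and discards
    // the result. The caller invokes `measure` inside a
    // measureMetrics(_:automaticallyStartMeasuring:for:) block with
    // automaticallyStartMeasuring: false, as in the benchmark above.
    func measure_inout_array_array<Scalar: BinaryFloatingPoint>(
        of type: Scalar.Type,
        _ body: (_ measure: ((inout [Scalar], [Scalar]) -> Void) -> Void) -> Void
    ) {
        let lhs: [Scalar] = (0..<1_000).map { Scalar($0) / 1_000 }
        let rhs: [Scalar] = (0..<1_000).map { Scalar($0) / 1_000 }

        body { operation in
            var workingCopy = lhs
            self.startMeasuring()
            operation(&workingCopy, rhs)
            self.stopMeasuring()
        }
    }
}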

Why?

At around 200 tests and over 60 benchmarks, maintaining our test and benchmark suites has become quite a chore. 😣

So this got me thinking: what if, instead of writing and maintaining hundreds of highly redundant test functions (for lack of macros in Swift), we had a way to have the tests, and even the benchmarks, generated auto-magically for us?

With this we could easily increase test coverage from "just the functions containing non-trivial logic" to "basically every public function, regardless of complexity", allowing us to catch regressions even in the most trivial wrapper functions (currently not covered) at hardly any additional maintenance burden.

How?

The basic idea is to get rid of all the existing unit tests and replace them with mere Sourcery annotations, like this:

// sourcery: test, floatAccuracy = 1e-5, expected = "add(array:array)"
public func add<L, R>(_ lhs: L, _ rhs: R) -> [Float] where L: UnsafeMemoryAccessible, R: UnsafeMemoryAccessible, L.Element == Float, R.Element == Float {
    // …
}

… given a fixture like this:

enum Fixture {
    enum Argument {
        static func `default`<Scalar>() -> Scalar { /* … */ }
        static func `default`<Scalar>() -> [Scalar] { /* … */ }
        static func `default`<Scalar>() -> Vector<Scalar> { /* … */ }
        static func `default`<Scalar>() -> Matrix<Scalar> { /* … */ }
    }
    enum Accuracy {
        static func `default`() -> Float { /* … */ }
        static func `default`() -> Double { /* … */ }
    }
    enum Expected {}
}

extension Fixture.Expected {
    static func add<Scalar: Numeric>(array lhs: [Scalar], array rhs: [Scalar]) -> [Scalar] {
        return zip(lhs, rhs).map { $0 + $1 }
    }
}
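
For illustration, given the annotation and fixture above, the generated test for add could come out looking something like this. This is only a sketch: the suite name is made up, and the exact shape of the generated code (including the array-aware XCTAssertEqual(_:_:accuracy:) overload the hand-written test above relies on) would be up to the template:

import XCTest
import Surge

// Hypothetical Sourcery output; names follow the fixture sketched above.
class ArithmeticTests_Generated: XCTestCase {
    func test_add_array_array_float() {
        typealias Scalar = Float

        // Arguments come from the fixture's `default()` factories
        // (or from an explicit `arg<N>` annotation, if present):
        let lhs: [Scalar] = Fixture.Argument.default()
        let rhs: [Scalar] = Fixture.Argument.default()

        // Call the annotated function:
        let actual: [Scalar] = Surge.add(lhs, rhs)

        // Ground truth comes from the `expected = "add(array:array)"` annotation:
        let expected: [Scalar] = Fixture.Expected.add(array: lhs, array: rhs)

        // Accuracy comes from the `floatAccuracy = 1e-5` annotation:
        XCTAssertEqual(actual, expected, accuracy: 1e-5)
    }
}
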
| Function Annotation | Description |
| --- | --- |
| test | Generate test function (Optional) |
| bench | Generate benchmark function (Optional) |
| expected = <function name> | The fixture function to use as ground truth (Required by test) |
| accuracy = <float literal> | A custom testing accuracy (Optional, used by test) |
| floatAccuracy = <float literal> | A custom Float-specific testing accuracy (Optional, used by test) |
| doubleAccuracy = <float literal> | A custom Double-specific testing accuracy (Optional, used by test) |
| arg<N> = <function name> | The fixture factory function for the nth argument (Optional, used by test) |
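
For completeness, a function combining several of these annotations might look like the following. The operation, its signature, the accuracy value, and the fixture names here are purely illustrative:

// sourcery: test, bench, expected = "elmul(array:array)", floatAccuracy = 1e-5, arg1 = "monotonicNonZero(array:)"
public func elmul<L, R>(_ lhs: L, _ rhs: R) -> [Float] where L: UnsafeMemoryAccessible, R: UnsafeMemoryAccessible, L.Element == Float, R.Element == Float {
    // …
}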

One would have Sourcery parse the source code and generate a test suite per source file (or, preferably, per type extension), looking for test and bench annotations.

The current unit tests make only minimal use of customized lhs/rhs dummy values, so arg<N> will rarely be needed; still, a few tests do require custom data to test against.

Also, given that Surge expects a rather restricted set of types as function arguments (Scalar, Collection where Element == Scalar, Vector<Scalar>, Matrix<Scalar>), we should be able to match against them rather naïvely, allowing us to elide most of the data we would otherwise have to specify explicitly, as sketched below.
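
To sketch that idea (an assumption about how the generator might be built, e.g. inside a Sourcery Swift template, rather than a finished design): each parameter's resolved type would be mapped to a default fixture expression, falling back to an explicit arg<N> annotation when no match is found.

// Hypothetical template helper: map a parameter's resolved type name to a
// default fixture expression. Everything here is an assumption about the
// template's internals.
func defaultFixture(forParameterType typeName: String, scalar: String) -> String? {
    switch typeName {
    case scalar:                            // e.g. "Float" or "Double"
        return "Fixture.Argument.default() as \(scalar)"
    case "[\(scalar)]", "Array<\(scalar)>": // Collection where Element == Scalar
        return "Fixture.Argument.default() as [\(scalar)]"
    case "Vector<\(scalar)>":
        return "Fixture.Argument.default() as Vector<\(scalar)>"
    case "Matrix<\(scalar)>":
        return "Fixture.Argument.default() as Matrix<\(scalar)>"
    default:
        return nil // caller must fall back to an explicit `arg<N>` annotation
    }
}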
