Skip to content

Improve CommentedString.validString performance #1066

@ChrisBenua

Description

@ChrisBenua

Current Performance

For example, on my benchmark, validString takes about 25 seconds in total.

Benchmark code
@inline(never)
func blackHole<T>(_ value: T) {
    withExtendedLifetime(value) {}
}

func benchmark(_ label: String, iterations: Int = 100_000, _ body: () -> Void) {
    // warmup
    for _ in 0..<1000 { body() }

    let start = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iterations { body() }
    let elapsed = CFAbsoluteTimeGetCurrent() - start

    let perCall = elapsed / Double(iterations) * 1_000_000 // µs
    print(String(format: "%-40@ %8.2f µs/call (%.3f s total, %d iters)", label as NSString, perCall, elapsed, iterations))}

// MARK: - Test Data

let simpleAlpha = CommentedString("SomeIdentifier123")
let withUnderscore = CommentedString("path_to/file_name")
let withDoubleSlash = CommentedString("path//to___file")
let needsEscaping = CommentedString("Hello \"World\"\nLine\ttab\\end")
let withSpaces = CommentedString("some value here")
let empty = CommentedString("")
let boolTrue = CommentedString("true")
let boolFalse = CommentedString("false")
let longClean = CommentedString(String(repeating: "abcDEF_123.", count: 200))
let longDirty = CommentedString(String(repeating: "hello \"world\" \n", count: 100))

// MARK: - Run

print("=== CommentedString.validString Benchmark ===\n")

benchmark("simple alphanumeric (fast path)") { blackHole(simpleAlpha.validString) }
benchmark("underscore+slash (valid, special)") { blackHole(withUnderscore.validString) }
benchmark("// and ___ (needs quoting)")        { blackHole(withDoubleSlash.validString) }
benchmark("needs escaping (\\\" \\n \\t \\\\)") { blackHole(needsEscaping.validString) }
benchmark("spaces (invalid chars, no escape)")  { blackHole(withSpaces.validString) }
benchmark("empty string")                       { blackHole(empty.validString) }
benchmark("\"true\"")                           { blackHole(boolTrue.validString) }
benchmark("\"false\"")                          { blackHole(boolFalse.validString) }
benchmark("long clean (2200 chars)")            { blackHole(longClean.validString) }
benchmark("long dirty (1500 chars, escaping)")  { blackHole(longDirty.validString) }

// MARK: - Batch (realistic mix)

let mixedBatch: [CommentedString] = [
    simpleAlpha, withUnderscore, withDoubleSlash, needsEscaping,
    withSpaces, empty, boolTrue, boolFalse, longClean, longDirty
]

benchmark("mixed batch (10 strings)", iterations: 50_000) {
    for s in mixedBatch { blackHole(s.validString) }
}

These functions on screenshot consume the most amount of processing time
Image

Optimization points

  • Replace StringProtocol.contains with checking that one UInt8 array (specifically UnsafeBufferPointer) is subarray of another. Why this approach is faster? Because StringProtocol.contains spends most of its time on string indexing (which requires much logic to preserve grapheme clusters).
  • Replace rangeOfCharacter with checking that UInt8 array (specifically UnsafeBufferPointer) contains specific value.
  • Instead of calling append for each string character in reduce function - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

Performance Comparison

Measured with M2 Max 32 GB RAM, MacOS 26.2, release build. If I run these benchmarks many times, average deviation is about 3-4%

Current validString performance:

simple alphanumeric (fast path)       0.47 µs/call (0.047 s total, 100000 iters)
underscore+slash (valid, special)     1.30 µs/call (0.130 s total, 100000 iters)
// and ___ (needs quoting)            0.77 µs/call (0.077 s total, 100000 iters)
needs escaping (\" \n \t \\)          0.74 µs/call (0.074 s total, 100000 iters)
spaces (invalid chars, no escape)     0.39 µs/call (0.039 s total, 100000 iters)
"true"                                0.01 µs/call (0.001 s total, 100000 iters)
"false"                               0.01 µs/call (0.001 s total, 100000 iters)
long clean (2200 chars)               119.46 µs/call (11.946 s total, 100000 iters)
long dirty (1500 chars, escaping)     36.50 µs/call (3.650 s total, 100000 iters)
mixed batch (10 strings)              161.31 µs/call (8.065 s total, 50000 iters)

Improved validString performance:

simple alphanumeric (fast path)       0.03 µs/call (0.003 s total, 100000 iters) - 15x boost
underscore+slash (valid, special)     0.06 µs/call (0.006 s total, 100000 iters) - 20x boost
// and ___ (needs quoting)            0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
needs escaping (\" \n \t \\)          0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
spaces (invalid chars, no escape)     0.07 µs/call (0.007 s total, 100000 iters) - 5x boost
"true"                                0.01 µs/call (0.001 s total, 100000 iters) - same
"false"                               0.00 µs/call (0.000 s total, 100000 iters) - same
long clean (2200 chars)               5.50 µs/call (0.550 s total, 100000 iters) - 20x boost
long dirty (1500 chars, escaping)     2.30 µs/call (0.230 s total, 100000 iters) - 16x boost
mixed batch (10 strings)              8.12 µs/call (0.406 s total, 50000 iters) - 20x boost

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions