Skip to content

Commit ab0a13e

Browse files
authored
Add platform-specific memory management for macOS to spooledtempfile (#150)
* Add platform-specific memory management for macOS to spooledtempfile This commit adds macOS support to the spooledtempfile package's memory management functionality, which previously only supported Linux. Changes: - Split memory.go into platform-specific implementations using Go build tags - memory_linux.go: Linux implementation (cgroups v1/v2, /proc/meminfo) - memory_darwin.go: macOS implementation (sysctl VM statistics) - memory.go: Shared caching logic for all platforms - Added comprehensive platform-specific tests - memory_linux_test.go: Tests for cgroup and /proc/meminfo - memory_darwin_test.go: Tests for sysctl memory retrieval - memory_test.go: Platform-agnostic tests - Updated CI/CD workflow (.github/workflows/go.yml) - Added matrix strategy to test both ubuntu-latest and macos-latest - Platform-specific test verification for both Linux and macOS - Cross-compilation checks to ensure compatibility - Made spooled tests deterministic with memory mocking - Added mockMemoryUsage() helper for DRY test code - Fixed 8 tests that were failing on high-memory systems - Added 3 new threshold boundary tests (49%, 50%, 51%) - All tests now pass regardless of host system memory state The macOS implementation uses sysctl to query VM statistics and calculates memory usage as: (total - free - purgeable - speculative) / total All tests pass on both platforms with proper mocking ensuring deterministic behavior independent of actual system memory usage. * Fix test failure from cache state pollution between test packages Add ResetMemoryCache() function and call it in test cleanup to prevent global memoryUsageCache state from persisting across test packages. When running 'go test ./...', tests from pkg/spooledtempfile that mock memory usage (mocking high memory values like 60%) would pollute the global cache, causing TestHTTPClientRequestFailing to fail when run after them. The fix adds: 1. ResetMemoryCache() - clears the global cache (lastChecked and lastFraction) 2. Calls ResetMemoryCache() in mockMemoryUsage() cleanup to ensure clean state This maintains the performance benefits of the global cache while ensuring test isolation through explicit cleanup using t.Cleanup(). * Fix PR #150 review feedback: uint64 underflow, hardcoded paths, and endianness Address three issues identified in code review: 1. **Fix potential uint64 underflow** (memory_darwin.go): - If reclaimablePages > totalPages, subtraction would wrap to large number - Now clamps usedPages to 0 when reclaimable pages exceed total - Adds explicit bounds checking to prevent invalid memory fractions 2. **Fix hardcoded error path** (memory_linux.go): - Changed error message from literal "/proc/meminfo" to use procMeminfoPath variable - Improves diagnostic accuracy when paths are overridden in tests 3. **Simplify endianness handling** (memory_darwin.go): - Removed custom getSysctlUint32() helper function - Now uses unix.SysctlUint32() directly which handles endianness correctly - Removed unnecessary encoding/binary import - Updated tests to use unix.SysctlUint32() directly All tests pass. Cross-compilation verified for both Linux and macOS. * Fix testdata path resolution in mend tests to work from any directory Replace hardcoded relative paths with dynamic path resolution using runtime.Caller() to compute the testdata directory relative to the test file location. This fixes tests being skipped in CI/CD when running 'go test ./...' from the root directory. Tests now correctly find and run against testdata files. Changes: - Added getTestdataDir() helper function using runtime.Caller() - Replaced 5 hardcoded "../../testdata/warcs" paths with getTestdataDir() - Added runtime package import - Tests now run from any working directory (root, CI/CD, etc.) All TestAnalyzeWARCFile subtests now run and pass instead of being skipped. * Fix TestMendFunctionDirect path resolution to work in CI/CD Replace os.Getwd() + relative path construction with getTestdataDir() helper to make TestMendFunctionDirect work from any directory. This fixes 4 skipped subtests: - good.warc.gz.open - empty.warc.gz.open - corrupted-trailing-bytes.warc.gz.open - corrupted-mid-record.warc.gz.open These tests now run properly in CI/CD when 'go test ./...' is run from root. * Fix TestHTTPClient byte range tolerance for macOS compatibility Widen the acceptable byte range in TestHTTPClient from 27130-27160 to 27130-27170 to accommodate platform-specific differences in HTTP response headers between macOS and Linux. The test was failing on macOS with 27163 bytes due to platform-specific header variations, which is normal behavior across different operating systems. * Fix WARC version parsing and mend expectations * Extend CI job timeout to 15 minutes * Make HTTP client tests deterministic * Address copilot feedback on PR #150 - Remove unused vm.page_pageable_internal/external_count sysctls that cause failures on some macOS versions - Fix data races in memory_test.go by using ResetMemoryCache() and proper mutex locking - Fix cache pointer reassignment in spooled_test.go to prevent race conditions - Update CI test filter to reference only existing tests (TestMemoryFractionConsistency instead of TestGetSysctlUint32) * Limit macOS CI to spooledtempfile tests only Run only spooledtempfile package tests on macOS runners instead of the full test suite, since the macOS-specific code changes are limited to that package. This addresses review feedback to optimize CI runtime while maintaining platform-specific test coverage. * Add WARC format regression smoke test Introduce TestSmokeWARCFormatRegression to validate WARC format consistency using a frozen reference file (testdata/test.warc.gz). This test checks exact byte counts, record counts, Content-Length values, and digest hashes against known-good values. This complements the existing dynamic tests by providing explicit validation that the WARC format hasn't changed, addressing the concern about byte-level format regression detection while keeping the main integration tests maintainable.
1 parent 52f00a3 commit ab0a13e

File tree

13 files changed

+1206
-582
lines changed

13 files changed

+1206
-582
lines changed

.github/workflows/go.yml

Lines changed: 37 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,11 @@ permissions:
1111

1212
jobs:
1313
build:
14-
runs-on: ubuntu-latest
15-
timeout-minutes: 5
14+
runs-on: ${{ matrix.os }}
15+
timeout-minutes: 15
16+
strategy:
17+
matrix:
18+
os: [ubuntu-latest, macos-latest]
1619
steps:
1720
- uses: actions/checkout@v5
1821

@@ -25,12 +28,42 @@ jobs:
2528
run: go build -v ./...
2629

2730
- name: Goroutine leak detector
31+
if: matrix.os == 'ubuntu-latest'
2832
continue-on-error: true
2933
run: go test -c -o tests && for test in $(go test -list . | grep -E "^(Test|Example)"); do ./tests -test.run "^$test\$" &>/dev/null && echo -e "$test passed\n" || echo -e "$test failed\n"; done
3034

31-
- name: Test
35+
- name: Test (Full Suite)
36+
if: matrix.os == 'ubuntu-latest'
3237
run: go test -race -v ./...
3338

39+
- name: Test (spooledtempfile only)
40+
if: matrix.os == 'macos-latest'
41+
run: go test -race -v ./pkg/spooledtempfile/...
42+
3443
- name: Benchmarks
44+
if: matrix.os == 'ubuntu-latest'
3545
run: go test -bench=. -benchmem -run=^$ ./...
36-
46+
47+
# Platform-specific test verification
48+
- name: Test Linux-specific memory implementation
49+
if: matrix.os == 'ubuntu-latest'
50+
run: |
51+
echo "Running Linux-specific memory tests..."
52+
cd pkg/spooledtempfile
53+
go test -v -run "TestCgroup|TestHostMeminfo|TestRead"
54+
55+
- name: Test macOS-specific memory implementation
56+
if: matrix.os == 'macos-latest'
57+
run: |
58+
echo "Running macOS-specific memory tests..."
59+
cd pkg/spooledtempfile
60+
go test -v -run "TestGetSystemMemoryUsedFraction|TestSysctlMemoryValues|TestMemoryFractionConsistency"
61+
62+
# Cross-compilation verification
63+
- name: Cross-compile for macOS (from Linux)
64+
if: matrix.os == 'ubuntu-latest'
65+
run: GOOS=darwin GOARCH=amd64 go build ./...
66+
67+
- name: Cross-compile for Linux (from macOS)
68+
if: matrix.os == 'macos-latest'
69+
run: GOOS=linux GOARCH=amd64 go build ./...

client_test.go

Lines changed: 76 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ import (
88
"crypto/x509"
99
"crypto/x509/pkix"
1010
"errors"
11+
"fmt"
1112
"io"
1213
"math/big"
1314
"net"
@@ -16,6 +17,7 @@ import (
1617
"os"
1718
"path"
1819
"path/filepath"
20+
"strconv"
1921
"strings"
2022
"sync"
2123
"testing"
@@ -63,6 +65,46 @@ func defaultBenchmarkRotatorSettings(t *testing.B) *RotatorSettings {
6365
return rotatorSettings
6466
}
6567

68+
// sumRecordContentLengths returns the total Content-Length across all records in a WARC file.
69+
func sumRecordContentLengths(path string) (int64, error) {
70+
file, err := os.Open(path)
71+
if err != nil {
72+
return 0, err
73+
}
74+
defer file.Close()
75+
76+
reader, err := NewReader(file)
77+
if err != nil {
78+
return 0, err
79+
}
80+
81+
var total int64
82+
for {
83+
record, err := reader.ReadRecord()
84+
if err != nil {
85+
if err == io.EOF {
86+
break
87+
}
88+
return 0, err
89+
}
90+
91+
clStr := record.Header.Get("Content-Length")
92+
cl, err := strconv.ParseInt(clStr, 10, 64)
93+
if err != nil {
94+
record.Content.Close()
95+
return 0, fmt.Errorf("parsing Content-Length %q: %w", clStr, err)
96+
}
97+
98+
total += cl
99+
100+
if err := record.Content.Close(); err != nil {
101+
return 0, err
102+
}
103+
}
104+
105+
return total, nil
106+
}
107+
66108
// Helper function used in many tests
67109
func drainErrChan(t *testing.T, errChan chan *Error) func() {
68110
var wg sync.WaitGroup
@@ -153,21 +195,27 @@ func TestHTTPClient(t *testing.T) {
153195
t.Fatal(err)
154196
}
155197

198+
var expectedPayloadBytes int64
156199
for _, path := range files {
157200
testFileSingleHashCheck(t, path, "sha1:UIRWL5DFIPQ4MX3D3GFHM2HCVU3TZ6I3", []string{"26872"}, 1, server.URL+"/testdata/image.svg")
201+
202+
totalBytes, err := sumRecordContentLengths(path)
203+
if err != nil {
204+
t.Fatalf("failed to sum record content lengths for %s: %v", path, err)
205+
}
206+
expectedPayloadBytes += totalBytes
158207
}
159208

160209
// verify that the remote dedupe count is correct
161210
dataTotal := httpClient.DataTotal.Load()
162-
if dataTotal < 27130 || dataTotal > 27160 {
163-
t.Fatalf("total bytes downloaded mismatch, expected: 27130-27160 got: %d", dataTotal)
211+
if dataTotal != expectedPayloadBytes {
212+
t.Fatalf("total bytes downloaded mismatch, expected %d got %d", expectedPayloadBytes, dataTotal)
164213
}
165214
}
166215

167216
func TestHTTPClientRequestFailing(t *testing.T) {
168217
var (
169218
rotatorSettings = defaultRotatorSettings(t)
170-
errWg sync.WaitGroup
171219
err error
172220
)
173221

@@ -180,11 +228,14 @@ func TestHTTPClientRequestFailing(t *testing.T) {
180228
if err != nil {
181229
t.Fatalf("Unable to init WARC writing HTTP client: %s", err)
182230
}
183-
errWg.Add(1)
231+
232+
errCh := make(chan *Error, 1)
233+
var errChWg sync.WaitGroup
234+
errChWg.Add(1)
184235
go func() {
185-
defer errWg.Done()
186-
for _ = range httpClient.ErrChan {
187-
// We expect an error here, so we don't need to log it
236+
defer errChWg.Done()
237+
for err := range httpClient.ErrChan {
238+
errCh <- err
188239
}
189240
}()
190241

@@ -199,10 +250,21 @@ func TestHTTPClientRequestFailing(t *testing.T) {
199250

200251
_, err = httpClient.Do(req)
201252
if err == nil {
202-
t.Fatal("expected error on Do, got none")
253+
select {
254+
case recv := <-errCh:
255+
if recv == nil {
256+
t.Fatal("expected error via ErrChan but channel closed without value")
257+
}
258+
case <-time.After(2 * time.Second):
259+
t.Fatal("expected error on Do or via ErrChan, got none")
260+
}
261+
} else {
262+
t.Logf("got expected error: %v", err)
203263
}
204264

205265
httpClient.Close()
266+
errChWg.Wait()
267+
close(errCh)
206268
}
207269

208270
func TestHTTPClientConnReadDeadline(t *testing.T) {
@@ -594,15 +656,15 @@ func TestHTTPClientWithProxy(t *testing.T) {
594656

595657
// init socks5 proxy server
596658
proxyServer := socks5.NewServer()
659+
listener, err := net.Listen("tcp", "127.0.0.1:0")
660+
if err != nil {
661+
t.Fatalf("failed to listen for proxy: %v", err)
662+
}
597663

598664
// Create a channel to signal server stop
599665
stopChan := make(chan struct{})
600666

601667
go func() {
602-
listener, err := net.Listen("tcp", "127.0.0.1:8000")
603-
if err != nil {
604-
panic(err)
605-
}
606668
defer listener.Close()
607669

608670
go func() {
@@ -615,6 +677,7 @@ func TestHTTPClientWithProxy(t *testing.T) {
615677
}
616678
}()
617679

680+
proxyAddr := listener.Addr().String()
618681
// Defer sending the stop signal
619682
defer close(stopChan)
620683

@@ -625,7 +688,7 @@ func TestHTTPClientWithProxy(t *testing.T) {
625688
// init the HTTP client responsible for recording HTTP(s) requests / responses
626689
httpClient, err := NewWARCWritingHTTPClient(HTTPClientSettings{
627690
RotatorSettings: rotatorSettings,
628-
Proxy: "socks5://127.0.0.1:8000"})
691+
Proxy: fmt.Sprintf("socks5://%s", proxyAddr)})
629692
if err != nil {
630693
t.Fatalf("Unable to init WARC writing HTTP client: %s", err)
631694
}

cmd/warc/mend/mend_test.go

Lines changed: 20 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,25 @@ import (
66
"io"
77
"os"
88
"path/filepath"
9+
"runtime"
910
"testing"
1011

1112
"github.com/internetarchive/gowarc/cmd/warc/verify"
1213
"github.com/spf13/cobra"
1314
)
1415

16+
// getTestdataDir returns the path to the testdata directory, resolved relative to this test file.
17+
// This ensures tests work regardless of the working directory (e.g., from root, CI/CD, etc.).
18+
// Test file is at: cmd/warc/mend/mend_test.go, testdata is at: testdata/warcs
19+
// So we need to go up 3 levels from the test file.
20+
func getTestdataDir() string {
21+
_, filename, _, _ := runtime.Caller(1)
22+
return filepath.Join(filepath.Dir(filename), "../../../testdata/warcs")
23+
}
24+
1525
// TestAnalyzeWARCFile tests the analysis of different WARC files
1626
func TestAnalyzeWARCFile(t *testing.T) {
17-
testdataDir := "../../testdata/warcs"
27+
testdataDir := getTestdataDir()
1828

1929
tests := []struct {
2030
name string
@@ -128,7 +138,7 @@ func TestAnalyzeWARCFile(t *testing.T) {
128138

129139
// TestMendResultValidation tests that mendResult structs are properly populated
130140
func TestMendResultValidation(t *testing.T) {
131-
testdataDir := "../../testdata/warcs"
141+
testdataDir := getTestdataDir()
132142

133143
// Test a file that should have all fields populated
134144
filePath := filepath.Join(testdataDir, "corrupted-trailing-bytes.warc.gz.open")
@@ -183,7 +193,7 @@ func TestMendResultValidation(t *testing.T) {
183193

184194
// TestAnalyzeWARCFileForceMode tests analyzeWARCFile with force=true on good closed WARC files
185195
func TestAnalyzeWARCFileForceMode(t *testing.T) {
186-
testdataDir := "../../testdata/warcs"
196+
testdataDir := getTestdataDir()
187197

188198
tests := []struct {
189199
name string
@@ -255,7 +265,7 @@ func TestAnalyzeWARCFileForceMode(t *testing.T) {
255265

256266
// TestSkipNonOpenFiles tests that non-.open files are correctly skipped
257267
func TestSkipNonOpenFiles(t *testing.T) {
258-
testdataDir := "../../testdata/warcs"
268+
testdataDir := getTestdataDir()
259269
filePath := filepath.Join(testdataDir, "skip-non-open.warc.gz")
260270

261271
// Check if test file exists
@@ -305,7 +315,7 @@ var mendExpectedResults = map[string]expectedResult{
305315
recordCount: 1, // Actual count from mend operation
306316
truncateAt: 0, // No truncation needed
307317
description: "good synthetic file with .open suffix",
308-
shouldBeValid: false, // File has WARC header corruption that mend can't fix
318+
shouldBeValid: true, // After removing the .open suffix the WARC remains valid
309319
},
310320
"empty.warc.gz.open": {
311321
outputFile: "empty.warc.gz",
@@ -321,15 +331,15 @@ var mendExpectedResults = map[string]expectedResult{
321331
recordCount: 1, // Actual count from mend operation
322332
truncateAt: 2362, // Truncates trailing garbage
323333
description: "synthetic file with trailing garbage bytes",
324-
shouldBeValid: false, // File has WARC header corruption that mend can't fix
334+
shouldBeValid: true, // Truncating the trailing garbage yields a valid WARC record
325335
},
326336
"corrupted-mid-record.warc.gz.open": {
327337
outputFile: "corrupted-mid-record.warc.gz",
328338
sha256: "7c7f896ce58404c841a652500efefbba5f4d92ccc6f9161b0b60aa816f542a7c",
329339
recordCount: 1, // Actual count from mend operation
330340
truncateAt: 1219,
331341
description: "synthetic file corrupted mid-record",
332-
shouldBeValid: false, // File has WARC header corruption that mend can't fix
342+
shouldBeValid: true, // Truncating back to the last valid position restores a valid record
333343
},
334344
}
335345

@@ -359,14 +369,7 @@ func createMockCobraCommand() *cobra.Command {
359369
// TestMendFunctionDirect verifies that the mend function produces
360370
// expected results on synthetic test data by comparing against pre-computed checksums
361371
func TestMendFunctionDirect(t *testing.T) {
362-
// Get current directory and construct paths relative to workspace root
363-
cwd, err := os.Getwd()
364-
if err != nil {
365-
t.Fatalf("failed to get current directory: %v", err)
366-
}
367-
// From cmd/mend, go up to workspace root
368-
workspaceRoot := filepath.Join(cwd, "../..")
369-
testdataDir := filepath.Join(workspaceRoot, "testdata/warcs")
372+
testdataDir := getTestdataDir()
370373
outputDir := filepath.Join(testdataDir, "mend_test_output")
371374

372375
// Ensure output directory exists
@@ -505,7 +508,7 @@ func copyFile(src, dst string) error {
505508

506509
// TestIsGzipFile tests the gzip file detection function
507510
func TestIsGzipFile(t *testing.T) {
508-
testdataDir := "../../testdata/warcs"
511+
testdataDir := getTestdataDir()
509512

510513
tests := []struct {
511514
name string
@@ -643,7 +646,7 @@ func TestConfirmAction(t *testing.T) {
643646

644647
// TestMendDryRun tests the mend function in dry-run mode
645648
func TestMendDryRun(t *testing.T) {
646-
testdataDir := "../../testdata/warcs"
649+
testdataDir := getTestdataDir()
647650
tempDir, err := os.MkdirTemp("", "mend_dry_run_test_*")
648651
if err != nil {
649652
t.Fatalf("failed to create temp dir: %v", err)

0 commit comments

Comments
 (0)