Rewrite Mimecast adapter with multi-API support #246
Conversation
**Code Review Findings**

**Critical Issues**

**1. Token Refresh Race Condition (client.go:198-215)**

Multiple goroutines can simultaneously call `refreshOAuthToken`:

```go
// Multiple goroutines pass this check simultaneously
if a.oauthToken != "" && time.Now().Before(a.tokenExpiry) {
	token := a.oauthToken
	a.tokenMu.Unlock() // <-- Unlocked here
	return map[string]string{...}, nil
}
a.tokenMu.Unlock() // <-- All goroutines unlock and call refresh
return a.refreshOAuthToken(ctx) // <-- Multiple simultaneous calls
```

Fix: Use double-checked locking or sync.Once per refresh cycle to ensure only one goroutine refreshes the token.

**2. Negative Token Expiry Duration (client.go:261)**

If `tokenResp.ExpiresIn` is less than 60, the computed expiry is already in the past:

```go
a.tokenExpiry = time.Now().Add(time.Duration(tokenResp.ExpiresIn-60) * time.Second)
```

Fix: Validate `ExpiresIn` and clamp the refresh skew so the expiry is never in the past.

**3. Unstable Hash-Based Deduplication (client.go:890-900)**

```go
jsonBytes, err := json.Marshal(logMap) // Map order is random in Go
hash := sha256.Sum256(jsonBytes)
```

Fix: Sort keys before marshaling or use a deterministic serialization method.

**4. Context Mismatch (client.go:150)**

The USP client uses the parent context while the adapter uses a child context. If the parent cancels, the USP client stops but the adapter continues:

```go
ctxChild, cancel := context.WithCancel(ctx)
a.uspClient, err = uspclient.NewClient(ctx, conf.ClientOptions) // Should use ctxChild
```

Fix: Use `ctxChild` when constructing the USP client.

**5. Negative Retry-After Duration (client.go:652)**

If the Retry-After time is already in the past, the computed duration is negative:

```go
retryUntilTime := time.Until(retryAfterTime).Seconds() // Can be negative
if err := sleepContext(a.ctx, time.Duration(retryUntilTime)*time.Second)
```

Fix: Add validation and skip the sleep when the duration is not positive.

**High Priority Issues**

**6. No Validation for MaxConcurrentWorkers (client.go:130-132)**

Accepts any positive value. Setting it to 100,000 creates 100,000 goroutines:

```go
if c.MaxConcurrentWorkers == 0 {
	c.MaxConcurrentWorkers = 10
}
// No upper bound check
```

Fix: Add a reasonable upper limit (e.g., 100).

**7. Retryable Errors Not Retried (client.go:732-740)**

The Mimecast API flags individual errors as retryable, but the response handler ignores the flag:

```go
errorMessages = append(errorMessages, fmt.Sprintf("%s: %s (retryable: %v)",
	errDetail.Code, errDetail.Message, errDetail.Retryable))
}
// Returns error without checking Retryable field
return nil, fmt.Errorf("mimecast api errors: %v", errorMessages)
```

Fix: Check the `Retryable` field and retry retryable errors with backoff.

**8. Inconsistent 5XX Error Handling (client.go:704-712 vs 714-720)**

5XX errors return silently while other non-200 statuses propagate an error:

```go
if status >= 500 && status < 600 {
	return nil, nil // Silent failure, no error propagated
}
if status != http.StatusOK {
	return allItems, err // Error propagated
}
```

Fix: Return an error for 5XX to ensure proper error tracking, or document this design choice.

**Medium Priority Issues**

**11. Semaphore Hardcoded (client.go:370)**

```go
shipperSem := make(chan struct{}, 2) // Always 2, regardless of config
```

Fix: Make this configurable or document why it's separate from worker concurrency.

**12. Unused Variable (client.go:418, 438-439)**

```go
count := 0
// ...
count += len(events)
```

Fix: Remove it or use it for metrics/logging.

**13. Redundant querySucceeded Flag (client.go:531, 823, 828)**

Only set to true at line 823 and checked at line 828, so it is always true at the check point:

```go
var querySucceeded bool
// ... lots of code ...
querySucceeded = true // Line 823
if querySucceeded { // Line 828 - always true here
```

Fix: Remove the flag and simplify the logic.
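For finding 1, the suggested double-checked locking fix can be sketched as follows. This is a standalone sketch with hypothetical names (`tokenSource`, `getToken`); the slow token fetch is simulated, and the real adapter's fields and methods may differ:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenSource is a hypothetical stand-in for the adapter's client struct.
type tokenSource struct {
	mu        sync.Mutex // guards token, expiry, refreshes
	refreshMu sync.Mutex // serializes refresh attempts
	token     string
	expiry    time.Time
	refreshes int
}

// valid reports whether the cached token is usable; caller must hold mu.
func (t *tokenSource) valid() bool {
	return t.token != "" && time.Now().Before(t.expiry)
}

// getToken uses double-checked locking: the expiry is re-checked after
// acquiring refreshMu, so concurrent callers that all saw an expired token
// still trigger only one refresh, and mu is never held during the slow fetch.
func (t *tokenSource) getToken() string {
	// First check: fast path under the state lock.
	t.mu.Lock()
	if t.valid() {
		tok := t.token
		t.mu.Unlock()
		return tok
	}
	t.mu.Unlock()

	// Only one goroutine at a time may attempt a refresh.
	t.refreshMu.Lock()
	defer t.refreshMu.Unlock()

	// Second check: another goroutine may have refreshed while we waited.
	t.mu.Lock()
	if t.valid() {
		tok := t.token
		t.mu.Unlock()
		return tok
	}
	t.mu.Unlock()

	// Simulated slow token fetch (e.g. an HTTP call), done WITHOUT holding mu.
	newTok := fmt.Sprintf("token-%d", time.Now().UnixNano())

	t.mu.Lock()
	t.token = newTok
	t.expiry = time.Now().Add(time.Hour)
	t.refreshes++
	t.mu.Unlock()
	return newTok
}

// refreshCount hammers getToken from many goroutines and reports how many
// refreshes actually happened.
func refreshCount(goroutines int) int {
	t := &tokenSource{}
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); t.getToken() }()
	}
	wg.Wait()
	return t.refreshes
}

func main() {
	fmt.Println("refreshes:", refreshCount(50)) // refreshes: 1
}
```

With the naive version in the finding, many of those 50 goroutines would each call the refresh; here exactly one does.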
maximelb left a comment:
Also getting the robot to post some relevant findings from its review as comments to the PR.
@RagingRedRiot Let me know if you prefer we pick up the PR from here and make mods vs you doing it.

@maximelb I'm capable of making the updates, but I'm not protective of being the one to do them.
This isn't actually true.
MaxConcurrentWorkers is only an upper bound on concurrent goroutines, enforced with a semaphore; it doesn't actually spawn that many goroutines. The impact is a small amount of memory from allocating a large channel.
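The channel-as-semaphore pattern being described can be sketched as follows (illustrative names, not the adapter's code): one goroutine per task is launched, but a buffered channel caps how many run at once, and a large limit only costs channel capacity:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peakConcurrency launches one goroutine per task, but a buffered channel used
// as a semaphore caps how many run their work at once. A huge limit only costs
// channel capacity (memory); it never spawns extra goroutines by itself.
func peakConcurrency(tasks, limit int) int64 {
	sem := make(chan struct{}, limit)
	var cur, peak int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			n := atomic.AddInt64(&cur, 1)
			for { // record the highest concurrency we ever observe
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	p := peakConcurrency(200, 4)
	fmt.Println("peak within limit:", p >= 1 && p <= 4)
}
```

The observed peak never exceeds the semaphore's capacity regardless of how many tasks are launched.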
The code does log 5XX errors (usp-adapters/mimecast/client.go, lines 712 to 721 at 9d1555d).
Co-authored-by: Maxime Lamothe-Brassard <[email protected]>
@maximelb I believe I have addressed all code review findings.
/gcbrun |
- Fix error variable shadowing bug where err from io.ReadAll was being shadowed by err from strconv.Atoi/http.ParseTime
- Fix mutex contention by not holding tokenMu during HTTP calls in refreshOAuthToken
- Fix silent error ignore in submitEvents for non-ErrorBufferFull errors
- Fix potential deadlock by using context cancellation instead of calling Close() from within fetch loop goroutines
- Fix tight loop when Retry-After time has already passed by adding a minimum 1-second sleep
- Fix 5xx errors being swallowed - now properly returns an error so api.since won't be updated and data won't be lost
- Fix struct tag alignment inconsistencies in MimecastConfig
- Fix generateLogHash to use JSON marshaling for deterministic hashing of complex/nested values
- Add shutdown check in submitEvents loop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This PR is being reviewed by Cursor Bugbot.
When there was no Retry-After header or it couldn't be parsed, retryAfterTime remained the zero value. The condition retryAfterTime.Before(time.Now()) was always true for the zero value (year 0001 is before current time), causing the code to incorrectly enter the "time already passed" branch (1s wait) instead of the "no header" branch (60s wait). Fix by checking !retryAfterTime.IsZero() before the Before check and restructure the conditions for clarity. Also added comment documenting that InitialLookback defaults to zero. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
/gcbrun |
- Replace complex nested goroutine structure with errgroup for cleaner concurrency control and automatic cancellation propagation
- Fix data race in shouldShutdown() by using api.IsActive() instead of direct field access
- Fix token refresh race condition with double-checked locking
- Fix Retry-After duration truncation by using time.Duration directly

The refactored RunFetchLoop is ~75 lines shorter and eliminates:

- 3 levels of nested goroutines
- 4 coordination channels (cycleSem, shipperSem, shipCh, shipDone)
- Multiple early exit paths that could leak goroutines

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove unused MaxConcurrentShippers config field
- Remove unused AuditLog type
- Add 1-hour retry deadline for 429 rate limiting
- Add 5XX retry with exponential backoff (30s-5m), 1h max
- Use singleflight for token refresh to prevent thundering herd
- Extend dedupe cleanup window from 60s to 1 hour
- Fix minor style issues (indentation, blank lines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Increase Close() timeout from 10s to 2min to allow in-flight HTTP requests (60s timeout) and Ship() calls to complete gracefully
- Reset retry counters after each successful page fetch so each page gets a fresh retry budget

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
```go
}
// Handle non-ErrorBufferFull errors
a.conf.ClientOptions.OnError(fmt.Errorf("Ship(): %v", err))
}
```
Bug: Non-buffer-full shipping errors don't trigger shutdown
When uspClient.Ship returns an error that isn't ErrorBufferFull, the error is logged on line 926 but the loop continues processing subsequent events. Other adapters in the codebase stop and signal shutdown when any non-recoverable ship error occurs. This allows the adapter to silently drop messages that fail to ship due to unexpected errors, continuing operation in a potentially broken state instead of failing cleanly.
Rewrite Mimecast adapter with multi-API support
Type of change
Note
Replaces the Mimecast adapter with an OAuth-backed, concurrent multi-API fetcher, adds new config, robust retry/rate-limit handling, deduping, and graceful shutdown.
- `getAuthToken` and `baseURL` const; `errgroup` across APIs: `auditEvents`, `attachment`, `impersonation`, `url`, `dlp`.
- Polling (`queryInterval`) with per-API state (`since`, `active`, per-API `dedupe`).
- New config: `base_url`, `initial_lookback`, `max_concurrent_workers`; validate and set sane defaults.
- Shared `http.Client` and header handling; page size increased to `500`.
- Handles `Retry-After` and 5xx with capped backoff; 1h retry deadline.
- `ApiResponse.Data` now `[]map[string]interface{}`; support nested log arrays (e.g., `attachmentLogs`, `urlLogs`).
- `closeOnce`/`fetchOnce`, `chFetchLoop`; improved `Close()` waiting.
- `submitEvents` streams with backpressure handling and cancellation on prolonged buffer full.

Written by Cursor Bugbot for commit 7407d8a. This will update automatically on new commits.