Commit eb5f468

konard and claude committed
fix: Add serialized installation with retry logic to prevent cache race conditions
Fixes #72 - The root cause was race conditions when multiple provider packages are installed in parallel, causing Bun cache corruption ("FileNotFound: failed copying files from cache").

Changes:
- Add write lock to serialize package installations
- Add retry logic (up to 3 attempts) for cache-related errors
- Improve error detection for ENOENT, EACCES, EBUSY errors
- Add 500ms delay between retry attempts
- Remove fallback behavior - opencode/grok-code remains the default

This ensures the first run after installation works reliably.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent e13da35 commit eb5f468

4 files changed

Lines changed: 171 additions & 109 deletions

.changeset/graceful-provider-fallback.md

Lines changed: 14 additions & 9 deletions
@@ -2,19 +2,24 @@
 '@link-assistant/agent': patch
 ---

-fix: Add graceful fallback when provider initialization fails
+fix: Add retry logic and serialized installation for reliable provider initialization

-Fixes issue #72 where version 0.3.0 appeared "completely broken" due to Bun package cache corruption preventing opencode provider initialization. The agent now gracefully falls back to alternative providers when initialization fails, improving resilience and user experience.
+Fixes issue #72 where version 0.3.0 appeared "completely broken" due to race conditions in parallel package installations causing Bun cache corruption.
+
+Root Cause:
+
+- When multiple provider packages (e.g., @ai-sdk/openai-compatible, @ai-sdk/openai) are installed concurrently, they can cause race conditions in Bun's package cache
+- This leads to "FileNotFound: failed copying files from cache" errors on first run after update

 Changes:

-- Test provider initialization before selecting it as default
-- Fall back to alternative providers if opencode provider fails to initialize
-- Add helpful error messages when Bun cache corruption is detected
-- Log warnings with detailed error information for troubleshooting
+- Add write lock to serialize package installations (prevents concurrent bun add commands)
+- Add retry logic with up to 3 attempts for cache-related errors
+- Improve error detection to catch ENOENT, EACCES, EBUSY errors
+- Add delay between retries to allow filesystem operations to complete

 Impact:

-- Agent no longer crashes when provider initialization fails
-- Better error messages guide users to recovery steps
-- Improved stability in production environments
+- opencode/grok-code remains the default provider and works reliably
+- Agent handles transient cache issues gracefully with automatic retries
+- Better stability during first run after installation/update
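
The changeset describes the write lock only in words. As a rough illustration of what serializing installs means, here is a minimal in-process sketch. The `SerialQueue` class and the `fakeInstall` stand-in are hypothetical names invented for this example; they are not the project's `Lock` utility, whose implementation is not shown in this commit.

```typescript
// Hypothetical illustration: not the project's Lock implementation.
// Runs async tasks one at a time, in the order they were submitted.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after the previously queued task has settled.
    const result = this.tail.then(() => task());
    // Swallow rejections on the chain so one failure does not block later tasks.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const installQueue = new SerialQueue();
const events: string[] = [];

// Stand-in for `bun add`: records when it starts and when it finishes.
async function fakeInstall(pkg: string): Promise<string> {
  events.push(`start ${pkg}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  events.push(`end ${pkg}`);
  return pkg;
}

// Both calls are issued at once, but the second waits for the first to finish.
const installs = Promise.all([
  installQueue.run(() => fakeInstall('@ai-sdk/openai-compatible')),
  installQueue.run(() => fakeInstall('@ai-sdk/openai')),
]);
```

Without the queue, the two `start` events would be recorded before either `end` event, which is the overlap the changeset identifies as the trigger for the cache errors. A promise chain like this only serializes work inside one process.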

docs/case-studies/issue-72/README.md

Lines changed: 43 additions & 48 deletions
@@ -199,66 +199,57 @@ FileNotFound: failed copying files from cache to destination for package zod

 ## Proposed Solutions

-### Solution 1: Add Graceful Fallback (Recommended)
+### Solution 1: Serialized Installation with Retry Logic (Implemented)

 **Priority:** High
-**Effort:** Low
-**Impact:** Fixes the issue for all users
+**Effort:** Medium
+**Impact:** Fixes the issue for all users while keeping opencode as default

-Modify provider initialization to:
+The root cause was identified as race conditions when multiple packages are installed in parallel. The fix:

-1. Catch provider init failures
-2. Log warning instead of crashing
-3. Try next available provider
-4. Only crash if NO providers can be initialized
+1. Serialize package installations using a write lock
+2. Add retry logic for cache-related errors
+3. Improve error detection for various cache corruption symptoms

-**Implementation location:** `src/provider/provider.ts:781-790`
+**Implementation location:** `src/bun/index.ts:68-220`

 **Benefits:**

-- Resilient to transient installation failures
-- Better user experience
-- Maintains backward compatibility
-- Users can still use the agent with other providers
+- opencode/grok-code remains the default provider
+- Resilient to transient cache issues
+- Automatic retry handles temporary failures
+- No fallback to other providers needed

 **Code change:**

 ```typescript
-// In defaultModel() function
-try {
-  const opencodeProvider = providers.find((p) => p.info.id === 'opencode');
-  if (opencodeProvider) {
-    const [model] = sort(Object.values(opencodeProvider.info.models));
-    if (model) {
-      try {
-        // Verify provider can be initialized
-        await getSDK(opencodeProvider.info, model);
-        return {
-          providerID: opencodeProvider.info.id,
-          modelID: model.id,
-        };
-      } catch (initError) {
-        log.warn(
-          'Failed to initialize preferred opencode provider, falling back',
-          {
-            error:
-              initError instanceof Error
-                ? initError.message
-                : String(initError),
-          }
-        );
-      }
+// Use a write lock to serialize all package installations
+using _ = await Lock.write(INSTALL_LOCK_KEY);
+
+// Retry logic for cache-related errors
+let lastError: Error | undefined;
+for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
+  try {
+    await BunProc.run(args, { cwd: Global.Path.cache });
+    log.info('package installed successfully', { pkg, version, attempt });
+    return mod;
+  } catch (e) {
+    const errorMsg = e instanceof Error ? e.message : String(e);
+    const isCacheError = isCacheRelatedError(errorMsg);
+
+    if (isCacheError && attempt < MAX_RETRIES) {
+      log.info('retrying installation after cache-related error', {
+        pkg,
+        version,
+        attempt,
+        nextAttempt: attempt + 1,
+      });
+      await delay(RETRY_DELAY_MS);
+      continue;
     }
+    throw new InstallFailedError({ pkg, version, details: errorMsg });
   }
-} catch (e) {
-  log.warn('Error checking opencode provider, continuing with fallback');
 }
-
-// Fall back to any available provider
-const provider = providers.find(
-  (p) => !cfg.provider || Object.keys(cfg.provider).includes(p.info.id)
-);
-// ... rest of existing fallback logic
 ```

 ### Solution 2: Provide Cache Clear Instructions
@@ -348,8 +339,12 @@ bun install -g @link-assistant/agent@0.2.1

 ## Conclusion

-Version 0.3.0 is NOT fundamentally broken in code, but **fails due to Bun runtime cache corruption** when trying to initialize the new default opencode provider. The issue is **environmental** rather than a code defect.
+Version 0.3.0 is NOT fundamentally broken in code, but **fails due to race conditions in parallel package installations** causing Bun cache corruption when trying to initialize the new default opencode provider. The issue is **environmental** rather than a code defect.
+
+**Implemented Fix:**

-**Implemented Fix:** Added graceful fallback in `Provider.defaultModel()` to try opencode provider first, and if it fails, skip it and fall back to other available providers. This makes the agent resilient to provider initialization failures.
+1. **Serialized package installations** - Added a write lock to ensure only one `bun add` command runs at a time, preventing race conditions
+2. **Retry logic for cache errors** - Added automatic retry (up to 3 attempts) for cache-related errors with a 500ms delay between attempts
+3. **Improved error detection** - Enhanced detection of cache-related errors (FileNotFound, ENOENT, EACCES, EBUSY)

-**Status:** Fix implemented and tested. Issue should be resolved for users with working alternative providers.
+**Status:** Fix implemented and tested. The opencode/grok-code provider remains the default and will work reliably even with transient cache issues.
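
The retry behavior described in the case study (up to 3 attempts, 500ms between attempts, retry only on cache-related errors) can be sketched as a standalone helper. `retryTransient` and its parameters are hypothetical names for this example, not code from the commit. With the stated settings, a failing install waits at most twice, so retries add roughly one second before the final error is raised.

```typescript
// Hypothetical helper: a generic version of the retry loop, not code from the commit.
async function retryTransient<T>(
  operation: (attempt: number) => Promise<T>,
  isTransient: (message: string) => boolean,
  maxRetries = 3,
  delayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      // Give up on non-transient errors, or once the attempt budget is spent.
      if (!isTransient(message) || attempt >= maxRetries) throw e;
      // Pause before the next attempt so pending filesystem work can settle.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Fails twice with a cache-style message, then succeeds on the third attempt.
const outcome = retryTransient(
  async (attempt) => {
    if (attempt < 3) {
      throw new Error('FileNotFound: failed copying files from cache');
    }
    return `installed on attempt ${attempt}`;
  },
  (message) => message.includes('FileNotFound'),
  3,
  5 // short delay so the example finishes quickly
);
```

Errors that the predicate does not recognize are rethrown on the first attempt, which matches the case study's intent of retrying only cache-related failures.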

src/bun/index.ts

Lines changed: 103 additions & 30 deletions
@@ -5,10 +5,14 @@ import path from 'path';
 import { NamedError } from '../util/error';
 import { readableStreamToText } from 'bun';
 import { Flag } from '../flag/flag';
+import { Lock } from '../util/lock';

 export namespace BunProc {
   const log = Log.create({ service: 'bun' });

+  // Lock key for serializing package installations to prevent race conditions
+  const INSTALL_LOCK_KEY = 'bun-install';
+
   export async function run(
     cmd: string[],
     options?: Bun.SpawnOptions.OptionsObject<any, any, any>
@@ -65,8 +69,38 @@ export namespace BunProc {
     })
   );

+  // Maximum number of retry attempts for cache-related errors
+  const MAX_RETRIES = 3;
+  // Delay between retries in milliseconds
+  const RETRY_DELAY_MS = 500;
+
+  /**
+   * Check if an error is related to Bun cache issues
+   */
+  function isCacheRelatedError(errorMsg: string): boolean {
+    return (
+      errorMsg.includes('failed copying files from cache') ||
+      errorMsg.includes('FileNotFound') ||
+      errorMsg.includes('ENOENT') ||
+      errorMsg.includes('EACCES') ||
+      errorMsg.includes('EBUSY')
+    );
+  }
+
+  /**
+   * Wait for a specified duration
+   */
+  function delay(ms: number): Promise<void> {
+    return new Promise((resolve) => setTimeout(resolve, ms));
+  }
+
   export async function install(pkg: string, version = 'latest') {
     const mod = path.join(Global.Path.cache, 'node_modules', pkg);
+
+    // Use a write lock to serialize all package installations
+    // This prevents race conditions when multiple packages are installed concurrently
+    using _ = await Lock.write(INSTALL_LOCK_KEY);
+
     const pkgjson = Bun.file(path.join(Global.Path.cache, 'package.json'));
     const parsed = await pkgjson.json().catch(async () => {
       const result = { dependencies: {} };
@@ -108,39 +142,78 @@ export namespace BunProc {
       version,
     });

-    await BunProc.run(args, {
-      cwd: Global.Path.cache,
-    }).catch((e) => {
-      const errorMsg = e instanceof Error ? e.message : String(e);
-      const isCacheError =
-        errorMsg.includes('failed copying files from cache') ||
-        errorMsg.includes('FileNotFound');
+    // Retry logic for cache-related errors
+    let lastError: Error | undefined;
+    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
+      try {
+        await BunProc.run(args, {
+          cwd: Global.Path.cache,
+        });

-      log.error('package installation failed', {
-        pkg,
-        version,
-        error: errorMsg,
-        stack: e instanceof Error ? e.stack : undefined,
-        possibleCacheCorruption: isCacheError,
-      });
-
-      // Provide helpful recovery instructions for cache-related errors
-      if (isCacheError) {
-        log.error(
-          'Bun package cache may be corrupted. Try clearing the cache with: bun pm cache rm'
+        log.info('package installed successfully', { pkg, version, attempt });
+        parsed.dependencies[pkg] = version;
+        await Bun.write(pkgjson.name!, JSON.stringify(parsed, null, 2));
+        return mod;
+      } catch (e) {
+        const errorMsg = e instanceof Error ? e.message : String(e);
+        const isCacheError = isCacheRelatedError(errorMsg);
+
+        log.warn('package installation attempt failed', {
+          pkg,
+          version,
+          attempt,
+          maxRetries: MAX_RETRIES,
+          error: errorMsg,
+          isCacheError,
+        });
+
+        if (isCacheError && attempt < MAX_RETRIES) {
+          log.info('retrying installation after cache-related error', {
+            pkg,
+            version,
+            attempt,
+            nextAttempt: attempt + 1,
+            delayMs: RETRY_DELAY_MS,
+          });
+          await delay(RETRY_DELAY_MS);
+          lastError = e instanceof Error ? e : new Error(errorMsg);
+          continue;
+        }
+
+        // Non-cache error or final attempt - log and throw
+        log.error('package installation failed', {
+          pkg,
+          version,
+          error: errorMsg,
+          stack: e instanceof Error ? e.stack : undefined,
+          possibleCacheCorruption: isCacheError,
+          attempts: attempt,
+        });
+
+        // Provide helpful recovery instructions for cache-related errors
+        if (isCacheError) {
+          log.error(
+            'Bun package cache may be corrupted. Try clearing the cache with: bun pm cache rm'
+          );
+        }
+
+        throw new InstallFailedError(
+          { pkg, version, details: errorMsg },
+          {
+            cause: e,
+          }
         );
       }
+    }

-      throw new InstallFailedError(
-        { pkg, version, details: errorMsg },
-        {
-          cause: e,
-        }
-      );
-    });
-    log.info('package installed successfully', { pkg, version });
-    parsed.dependencies[pkg] = version;
-    await Bun.write(pkgjson.name!, JSON.stringify(parsed, null, 2));
-    return mod;
+    // This should not be reached, but handle it just in case
+    throw new InstallFailedError(
+      {
+        pkg,
+        version,
+        details: lastError?.message ?? 'Installation failed after all retries',
+      },
+      { cause: lastError }
+    );
   }
 }
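
To see how the substring-based classifier from this file behaves, the sketch below copies `isCacheRelatedError` from the diff and runs it on a few messages. Only the first message is quoted from the case study; the other three are invented for illustration. The last sample shows a limitation worth keeping in mind: any message containing `ENOENT` is treated as cache-related, even when the missing file has nothing to do with Bun's cache.

```typescript
// Copied from the diff so the example is self-contained.
function isCacheRelatedError(errorMsg: string): boolean {
  return (
    errorMsg.includes('failed copying files from cache') ||
    errorMsg.includes('FileNotFound') ||
    errorMsg.includes('ENOENT') ||
    errorMsg.includes('EACCES') ||
    errorMsg.includes('EBUSY')
  );
}

const samples: string[] = [
  // Quoted in the case study: classified as cache-related, so it is retried.
  'FileNotFound: failed copying files from cache to destination for package zod',
  // Invented sample: classified as cache-related, so it is retried.
  'EBUSY: resource busy or locked',
  // Invented sample: matches no pattern, so it fails on the first attempt.
  'error: package not found in registry',
  // Invented sample: unrelated to the cache, but still matches on ENOENT.
  'ENOENT: no such file or directory, open settings.json',
];

const verdicts = samples.map((message) => isCacheRelatedError(message));
console.log(verdicts);
```

Because a false positive costs only a few extra attempts and about a second of delay, the broad match is a reasonable trade-off here, though it would not suit a classifier that drives destructive recovery steps.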

src/provider/provider.ts

Lines changed: 11 additions & 22 deletions
@@ -899,37 +899,26 @@ export namespace Provider {
     const cfg = await Config.get();
     if (cfg.model) return parseModel(cfg.model);

-    // Prefer opencode provider if available, but verify it can be initialized
+    // Prefer opencode provider if available
     const providers = await list().then((val) => Object.values(val));
     const opencodeProvider = providers.find((p) => p.info.id === 'opencode');
     if (opencodeProvider) {
       const [model] = sort(Object.values(opencodeProvider.info.models));
       if (model) {
-        try {
-          // Try to initialize the opencode provider to ensure it works
-          await getSDK(opencodeProvider.info, model);
-          log.info('using preferred opencode provider as default');
-          return {
-            providerID: opencodeProvider.info.id,
-            modelID: model.id,
-          };
-        } catch (error) {
-          // If opencode provider fails to initialize, log warning and fall back
-          log.warn(
-            'failed to initialize preferred opencode provider, falling back to alternative',
-            {
-              error: error instanceof Error ? error.message : String(error),
-              provider: opencodeProvider.info.id,
-              model: model.id,
-            }
-          );
-        }
+        log.info('using opencode provider as default', {
+          provider: opencodeProvider.info.id,
+          model: model.id,
+        });
+        return {
+          providerID: opencodeProvider.info.id,
+          modelID: model.id,
+        };
       }
     }

-    // Fall back to any available provider (skip opencode since it failed)
+    // Fall back to any available provider if opencode is not available
     const provider = providers.find(
-      (p) => p.info.id !== 'opencode' && (!cfg.provider || Object.keys(cfg.provider).includes(p.info.id))
+      (p) => !cfg.provider || Object.keys(cfg.provider).includes(p.info.id)
     );
     if (!provider) throw new Error('no providers found');
     const [model] = sort(Object.values(provider.info.models));
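
The selection order after this change can be summarized in a small pure function. This is a simplified sketch under stated assumptions: `ProviderEntry`, `ModelChoice`, and `pickDefaultModel` are hypothetical names, the model lists are assumed to be sorted by preference already (the real code calls `sort`), and the `no models found` error is an assumption because the visible hunk ends before that case.

```typescript
// Simplified, hypothetical shapes; the real provider objects carry more fields.
interface ProviderEntry {
  id: string;
  modelIDs: string[]; // assumed to be sorted by preference already
}

interface ModelChoice {
  providerID: string;
  modelID: string;
}

function pickDefaultModel(
  providers: ProviderEntry[],
  configured?: string[]
): ModelChoice {
  // Prefer opencode whenever it is listed and has at least one model.
  const opencode = providers.find((p) => p.id === 'opencode');
  const preferred = opencode?.modelIDs[0];
  if (opencode && preferred !== undefined) {
    return { providerID: opencode.id, modelID: preferred };
  }

  // Otherwise take the first provider allowed by the configuration.
  const fallback = providers.find(
    (p) => !configured || configured.includes(p.id)
  );
  if (!fallback) throw new Error('no providers found');

  // Assumption: the diff does not show how an empty model list is handled.
  const modelID = fallback.modelIDs[0];
  if (modelID === undefined) throw new Error('no models found');
  return { providerID: fallback.id, modelID };
}

const withOpencode = pickDefaultModel([
  { id: 'example', modelIDs: ['example-model'] },
  { id: 'opencode', modelIDs: ['grok-code'] },
]);
const withoutOpencode = pickDefaultModel([
  { id: 'example', modelIDs: ['example-model'] },
]);
```

Note that selection no longer calls `getSDK`, so an installation failure for the opencode provider would presumably surface later as an error rather than silently switching providers. That appears to be the intent of the "Remove fallback behavior" item in the commit message.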
