Problem
Models fail to load across multiple surfaces (chat.webllm.ai, JSFiddle examples, Chrome extensions) as reported in #85. Current error handling lacks:
- Structured error classification
- Automatic retry mechanisms
- Cache recovery logic
- User-actionable error messages
- Self-hosting capabilities
Root Causes Identified
- Insufficient Error Diagnostics - Generic error messages without classification codes
- No Retry Logic - Transient network/CDN failures cause hard stops
- Cache Corruption - No automatic cache clearing and retry
- No Self-Hosting Support - Users locked into default CDN with no override option
Proposed Solution
Phase 1: Enhanced Error Diagnostics (High Priority)
- Add `ModelLoadErrorCode` enum (manifest_fetch_failed, artifact_fetch_failed, worker_init_failed, webgpu_init_failed, cache_invalid)
- Implement error classification in `webllm.ts`
- Add structured error display with actionable guidance
- Include "Copy Diagnostics" feature for bug reports
Files: app/client/api.ts, app/client/webllm.ts, app/store/chat.ts
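A minimal sketch of what the Phase 1 enum and classifier could look like. The error codes come from this issue; the `ModelLoadError` shape, the string-matching heuristics, and the guidance text are hypothetical illustrations, not the final design:

```typescript
// Error codes proposed in Phase 1 of this issue.
enum ModelLoadErrorCode {
  ManifestFetchFailed = "manifest_fetch_failed",
  ArtifactFetchFailed = "artifact_fetch_failed",
  WorkerInitFailed = "worker_init_failed",
  WebGPUInitFailed = "webgpu_init_failed",
  CacheInvalid = "cache_invalid",
}

// Hypothetical structured error: the retryable flag drives Phase 2,
// and guidance gives the user an actionable next step.
interface ModelLoadError {
  code: ModelLoadErrorCode;
  retryable: boolean;
  guidance: string;
}

// Hypothetical classifier: maps a raw error to a structured one.
// Real classification in webllm.ts would key off error types, not substrings.
function classifyModelLoadError(raw: Error): ModelLoadError {
  const msg = raw.message.toLowerCase();
  if (msg.includes("manifest")) {
    return {
      code: ModelLoadErrorCode.ManifestFetchFailed,
      retryable: true,
      guidance: "Check your network connection, or configure a custom artifact source in Settings.",
    };
  }
  if (msg.includes("webgpu") || msg.includes("gpu adapter")) {
    return {
      code: ModelLoadErrorCode.WebGPUInitFailed,
      retryable: false,
      guidance: "Enable WebGPU in your browser or update your GPU drivers.",
    };
  }
  if (msg.includes("cache")) {
    return {
      code: ModelLoadErrorCode.CacheInvalid,
      retryable: true,
      guidance: "Cached artifacts appear corrupted; clearing the cache and retrying.",
    };
  }
  return {
    code: ModelLoadErrorCode.ArtifactFetchFailed,
    retryable: true,
    guidance: "Artifact download failed; retrying with backoff.",
  };
}
```

The "Copy Diagnostics" feature could then serialize the structured error (code, guidance, user agent, model id) to the clipboard for bug reports.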
Phase 2: Retry Logic & Self-Recovery
- Automatic retry with exponential backoff (max 3 attempts, 1s → 2s → 4s)
- Automatic cache clearing on cache_invalid errors
- Progress indication during retries
- Only retry on retryable error types
Files: app/client/webllm.ts
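The retry behavior above can be sketched as a small wrapper. This is an illustration under the parameters stated in this issue (max 3 attempts, 1s → 2s → 4s); the `isRetryable` and `onRetry` hooks are hypothetical, not existing WebLLM API:

```typescript
// Retry a load operation with exponential backoff.
// Defaults follow this issue: 3 attempts, delays of 1s, 2s, 4s.
async function loadWithRetry<T>(
  load: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  onRetry: (attempt: number, delayMs: number) => void = () => {},
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await load();
    } catch (err) {
      lastErr = err;
      // Hard-stop on non-retryable errors (e.g. WebGPU unsupported)
      // or once the attempt budget is exhausted.
      if (!isRetryable(err) || attempt === maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 1s, 2s, 4s
      onRetry(attempt, delayMs); // surface retry progress in the UI
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastErr;
}
```

On a `cache_invalid` classification, the `load` callback could clear the corrupted cache before re-fetching, giving the automatic clear-and-retry behavior described above.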
Phase 3: Custom Artifact Source Support
- Add `customModelBaseUrl` config option
- UI setting in model configuration panel
- Allow users to self-host model artifacts
- Addresses the request from the reporter of [Bug] chat.weblm.ai not loading LLMs #85
Files: app/store/config.ts, app/components/model-config.tsx, app/client/webllm.ts
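One way the override could work: resolve every artifact path against `customModelBaseUrl` when set, otherwise fall back to the default CDN. The config shape and helper below are a sketch; the default URL is a placeholder assumption, not the project's actual CDN constant:

```typescript
// Placeholder default; the real project defines its own CDN base.
const DEFAULT_ARTIFACT_BASE = "https://huggingface.co";

// Proposed setting from Phase 3 of this issue.
interface ModelConfig {
  customModelBaseUrl?: string;
}

// Hypothetical helper: join the configured base with an artifact path,
// normalizing slashes so "base/" + "/path" does not produce "base//path".
function resolveArtifactUrl(cfg: ModelConfig, artifactPath: string): string {
  const base = (cfg.customModelBaseUrl ?? DEFAULT_ARTIFACT_BASE).replace(/\/+$/, "");
  const path = artifactPath.replace(/^\/+/, "");
  return `${base}/${path}`;
}
```

With this in place, a user behind a firewall (or hit by CDN outages) could mirror the model artifacts on their own server and point the setting at it.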
Phase 4: Documentation
- Troubleshooting guide with error code explanations
- Self-hosting setup instructions
- Updated issue templates with diagnostic fields
Files: docs/TROUBLESHOOTING.md, docs/SELF_HOSTING.md, .github/ISSUE_TEMPLATE/bug_report.md
Acceptance Criteria
- All model load errors map to defined error codes
- Retryable errors trigger automatic retry (max 3)
- Cache corruption triggers automatic clear + retry
- Custom base URL configurable in Settings
- Error messages include actionable guidance
- "Copy Diagnostics" provides complete debug info
- Documentation covers all error codes and self-hosting
Implementation Details
Full implementation plan available in plan-85.md with:
- Detailed code examples for each phase
- Testing strategy (unit, integration, manual)
- Rollout strategy with risk assessment
- Success metrics and monitoring approach
Related Issues
- Addresses [Bug] chat.weblm.ai not loading LLMs #85 - Original bug report
- Improves overall error handling infrastructure
Estimated Effort
Time: 3-4 weeks (1 developer)
Priority: High (affects user experience across all surfaces)
Risk: Low-Medium (Phase 1-2), Low (Phase 3-4)