This file provides guidelines for AI assistants (Claude, GPT, etc.) working on the Video Caption Suite project. Any changes to the codebase must be reflected in the documentation.
Video Caption Suite is a video captioning application using the Qwen3-VL-8B vision-language model. It consists of:
- Backend: Python/FastAPI server (backend/)
- Frontend: Vue 3/TypeScript application (frontend/src/)
- Documentation: Comprehensive docs (documentation/)
CRITICAL: When making any code changes, you MUST update the corresponding documentation.
| Change Type | Files to Update |
|---|---|
| New API endpoint | documentation/API.md |
| API endpoint changes | documentation/API.md |
| New Vue component | documentation/FRONTEND.md |
| Component prop/event changes | documentation/FRONTEND.md |
| New/changed settings | documentation/CONFIGURATION.md |
| Architecture changes | documentation/ARCHITECTURE.md |
| Setup/build changes | documentation/DEVELOPMENT.md |
| Major features | documentation/README.md |
Before completing any task, verify:
- Code changes compile/run without errors
- TypeScript types match backend schemas
- API documentation matches actual endpoints
- Frontend documentation matches actual components
- Configuration documentation matches actual options
- Line number references in docs are still accurate
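The last item in this checklist is easy to miss by eye. Below is a minimal sketch of an automated check, assuming the docs cite code as `path/to/file.py:123`; that reference format and the helper name `stale_line_refs` are assumptions for illustration, not something the project defines.

```python
import re
from pathlib import Path

# Assumed reference format: path/to/file.ext:123 (adjust to the docs' real style).
REF_PATTERN = re.compile(r"([\w./-]+\.(?:py|ts|vue)):(\d+)")


def stale_line_refs(doc_text: str, repo_root: Path) -> list[str]:
    """Return references whose file is missing or shorter than the cited line."""
    stale = []
    for rel_path, line_no in REF_PATTERN.findall(doc_text):
        target = repo_root / rel_path
        if not target.is_file():
            stale.append(f"{rel_path}:{line_no} (file not found)")
            continue
        line_count = len(target.read_text(encoding="utf-8").splitlines())
        if int(line_no) > line_count:
            stale.append(f"{rel_path}:{line_no} (file has {line_count} lines)")
    return stale
```

This only confirms that a cited line exists. Whether it still points at the intended code needs a manual look.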
| Purpose | File | Key Sections |
|---|---|---|
| API Server | backend/api.py | Endpoints, WebSocket, CORS |
| Data Models | backend/schemas.py | Settings, Progress, Video/Caption, Analytics |
| Processing | backend/processing.py | ProcessingManager, multi-GPU |
| Analytics | backend/analytics.py | Word frequency, n-grams, correlations |
| Resource Monitor | backend/resource_monitor.py | ResourceMonitor, CPU/RAM/GPU metrics |
| Model Loading | backend/model_loader.py | load_model, generate_caption, clear_cache |
| Video Processing | backend/video_processor.py | extract_frames, process_video |
| Configuration | backend/config.py | All defaults |
| Purpose | File |
|---|---|
| Root Component | frontend/src/App.vue |
| Video State | frontend/src/stores/videoStore.ts |
| Progress State | frontend/src/stores/progressStore.ts |
| Settings State | frontend/src/stores/settingsStore.ts |
| Analytics State | frontend/src/stores/analyticsStore.ts |
| Resource State | frontend/src/stores/resourceStore.ts |
| API Calls | frontend/src/composables/useApi.ts |
| WebSocket | frontend/src/composables/useWebSocket.ts |
| Resource WebSocket | frontend/src/composables/useResourceWebSocket.ts |
| Types | frontend/src/types/*.ts |
| Analytics Components | frontend/src/components/analytics/*.vue |
| Resource Monitor | frontend/src/components/layout/ResourceMonitor.vue |
Adding an API endpoint:
# backend/api.py
@app.post("/api/my-endpoint", response_model=MyResponse)
async def my_endpoint(request: MyRequest):
    """Docstring describing the endpoint"""
    # Implementation
    return MyResponse(...)

Updating schemas:
# backend/schemas.py
from pydantic import BaseModel, Field

class MyModel(BaseModel):
    field: str = Field(default="value", description="Description")

Vue component structure:
<script setup lang="ts">
import { computed } from 'vue'

// Props and emits
interface Props { ... }
const props = defineProps<Props>()
const emit = defineEmits<{ ... }>()

// Store usage
const store = useMyStore()

// Computed and methods (do not name a variable `computed`; it would shadow the import)
const derived = computed(() => ...)
</script>

<template>
  <!-- Template -->
</template>

Store pattern:
import { defineStore } from 'pinia'

export const useMyStore = defineStore('my', {
  state: () => ({ ... }),
  getters: { ... },
  actions: { ... }
})

To add a new setting:
- Add to backend/schemas.py (Settings class)
- Add default to backend/config.py
- Add to frontend/src/types/settings.ts
- Add UI in appropriate settings component
- Update documentation/CONFIGURATION.md
To add a new API endpoint:
- Define request/response in backend/schemas.py
- Add endpoint in backend/api.py
- Add frontend function in frontend/src/composables/useApi.ts
- Update documentation/API.md
Memory management is critical. When modifying model loading/unloading:
- Clear all references before gc.collect()
- Call torch.cuda.synchronize() before empty_cache()
- Update clear_cache() in backend/model_loader.py
- Update unload_model() in backend/processing.py
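The ordering in the first two points can be sketched as follows. This is an illustration only, not the project's actual `clear_cache()` or `unload_model()`: `ModelHolder` and `release_model` are hypothetical names, and the torch import is optional so the sketch also runs on a machine without PyTorch.

```python
import gc

try:
    import torch  # optional so the sketch also runs without PyTorch installed
except ImportError:
    torch = None


class ModelHolder:
    """Hypothetical container; the real project keeps its own references."""

    def __init__(self, model=None, processor=None):
        self.model = model
        self.processor = processor


def release_model(holder: ModelHolder) -> bool:
    """Drop references, collect garbage, then release cached CUDA memory.

    Returns True if a CUDA cache flush was attempted.
    """
    # 1. Clear every reference first; gc.collect() cannot free objects
    #    that are still reachable.
    holder.model = None
    holder.processor = None

    # 2. Collect unreachable objects, including reference cycles.
    gc.collect()

    # 3. Wait for pending GPU work, then release cached blocks.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        return True
    return False
```

Any other variable that still points at the model (a local in the caller, a closure, a cached attribute) keeps its memory alive, so all of them need clearing before step 2.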
Before marking a task complete:
# Backend syntax check
python -m py_compile backend/api.py backend/processing.py backend/model_loader.py
# Frontend build check
cd frontend && npm run build:check
# Full test suite (if applicable)
pytest backend/tests/
cd frontend && npm run test

Backend and frontend types must stay synchronized:
| Backend (Pydantic) | Frontend (TypeScript) |
|---|---|
| backend/schemas.py:Settings | frontend/src/types/settings.ts:Settings |
| backend/schemas.py:ProgressUpdate | frontend/src/types/progress.ts:ProgressState |
| backend/schemas.py:VideoInfo | frontend/src/types/video.ts:VideoInfo |
| backend/schemas.py:ProcessingStage | frontend/src/types/progress.ts:ProcessingStage |
| backend/schemas.py:WordFrequency* | frontend/src/types/analytics.ts:WordFrequency* |
| backend/schemas.py:Ngram* | frontend/src/types/analytics.ts:Ngram* |
| backend/schemas.py:Correlation* | frontend/src/types/analytics.ts:Correlation* |
| backend/schemas.py:GPUResourceMetrics | frontend/src/types/resources.ts:GPUMetrics |
| backend/schemas.py:ResourceUpdate | frontend/src/types/resources.ts:ResourceSnapshot |
When changing one, change the other.
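One way to catch drift between the two sides is to compare field names. The sketch below is a rough check under stated assumptions: the TypeScript interface is flat (no nested braces), both sides use the same field naming, and the example interface and field set are made up for illustration. For a Pydantic v2 model, the backend field names could be taken from `set(Model.model_fields)`.

```python
import re


def ts_interface_fields(ts_source: str, interface_name: str) -> set[str]:
    """Return the field names of a flat (non-nested) TypeScript interface."""
    match = re.search(
        r"interface\s+" + re.escape(interface_name) + r"\s*\{(.*?)\}",
        ts_source,
        re.DOTALL,
    )
    if match is None:
        raise ValueError(f"interface {interface_name} not found")
    # Matches `name: type` and optional `name?: type` at the start of a line.
    return set(re.findall(r"^\s*(\w+)\??\s*:", match.group(1), re.MULTILINE))


def field_drift(backend_fields: set[str], frontend_fields: set[str]) -> dict[str, set[str]]:
    """Report fields present on one side but not the other."""
    return {
        "missing_in_frontend": backend_fields - frontend_fields,
        "missing_in_backend": frontend_fields - backend_fields,
    }


# Hypothetical example data, not the project's real VideoInfo definition.
EXAMPLE_TS = """
export interface VideoInfo {
  name: string
  path: string
  duration?: number
}
"""
backend = {"name", "path", "size_bytes"}
drift = field_drift(backend, ts_interface_fields(EXAMPLE_TS, "VideoInfo"))
```

This compares names only. Type mismatches (for example `int` against `string`) still need manual review or a schema generator.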
Why WebSocket for progress updates:
- Real-time updates without polling
- Server can push updates as they happen
- Automatic reconnection handles network issues

Why progressive loading for the video list:
- Large video libraries (1000+ files) need progressive loading
- Reduces memory usage vs single large response
- Better perceived performance

Why sequential model loading:
- Parallel loading can cause OOM
- Sequential is slower but reliable
- Each GPU needs ~16GB for the 8B model
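The trade-off behind sequential loading (parallel loading risks OOM, and each GPU needs roughly 16GB) can be sketched as a simple loop. This is not the project's ProcessingManager: `load_sequentially`, `free_bytes`, and `load_on` are hypothetical names, and the memory query and the loader are passed in as callables so the sketch stays independent of any GPU library.

```python
from typing import Callable, Dict, List

# Roughly 16 GB per GPU for the 8B model, per the note above.
REQUIRED_BYTES = 16 * 1024**3


def load_sequentially(
    device_ids: List[int],
    free_bytes: Callable[[int], int],
    load_on: Callable[[int], object],
) -> Dict[int, object]:
    """Load one model copy per GPU, one device at a time."""
    loaded: Dict[int, object] = {}
    for device_id in device_ids:
        # Skip devices that cannot hold a full copy instead of risking OOM.
        if free_bytes(device_id) < REQUIRED_BYTES:
            continue
        # Blocking call: the next device starts only after this one finishes,
        # so peak host memory stays at one in-flight load.
        loaded[device_id] = load_on(device_id)
    return loaded
```

With PyTorch, `free_bytes` could be `lambda i: torch.cuda.mem_get_info(i)[0]`, which returns the free memory on device `i`.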
Known limitations:
- SageAttention: Disabled for Qwen3-VL due to non-standard head dimensions (80 vs 64/96/128)
- GGUF Support: Not implemented - requires llama.cpp server approach
- OpenRouter: Not implemented - would require base64 frame encoding
- Authentication: None - designed for local use only
The following features were removed and should NOT be re-implemented without discussion:
- TensorRT Compilation: Removed due to complexity with VLMs (torch.compile doesn't serialize, TensorRT-LLM required)
When updating documentation:
- Use tables for structured data
- Include code examples
- Reference file paths with line numbers when helpful
- Keep the table of contents updated
- Use consistent heading levels
If unclear about:
- Where code should go → Check ARCHITECTURE.md
- How an API works → Check API.md
- How the frontend is structured → Check FRONTEND.md
- What settings exist → Check CONFIGURATION.md
- How to set up/test → Check DEVELOPMENT.md