This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The Database Intelligence Collector is an OpenTelemetry-based monitoring solution with 4 sophisticated custom processors (3,242 lines of production code). It follows an OTEL-first architecture, using standard components where possible and custom processors only to fill gaps.
The critical issues identified have been resolved:
- ✅ State Management Fixed: All processors now use in-memory state only (no Redis dependency)
- ✅ Single-Instance Deployment: Removed HA configurations requiring Redis
- ✅ Safe Plan Collection: Plan extractor works with existing data, no unsafe dependencies
- ✅ Resilient Pipeline: Processors gracefully handle missing dependencies
- ✅ Enhanced PII Protection: Comprehensive sanitization beyond basic regex
RECOMMENDED: Use `config/collector-resilient.yaml` for production deployments
- ✅ No Redis dependency - All state management is in-memory only
- ✅ No external dependencies - Uses standard PostgreSQL pg_stat_statements
- ✅ Graceful degradation - Components work independently
- ✅ Enhanced PII protection - Credit cards, SSNs, emails, phones sanitized
WARNING: The project has module path inconsistencies that prevent building:

- `go.mod`: `github.com/database-intelligence-mvp`
- `ocb-config.yaml`: `github.com/database-intelligence/*`
- `otelcol-builder.yaml`: `github.com/newrelic/database-intelligence-mvp/*`

Fix before any build attempts:

```bash
# Standardize all module paths
sed -i 's|github.com/newrelic/database-intelligence-mvp|github.com/database-intelligence-mvp|g' otelcol-builder.yaml
sed -i 's|github.com/database-intelligence/|github.com/database-intelligence-mvp/|g' ocb-config.yaml
```

The custom OTLP exporter in `exporters/otlpexporter/` has TODO placeholders in critical functions. Either complete the implementation or remove it and use the standard OTLP exporter.
```bash
# Install required tools (OCB, linters, etc.)
make install-tools

# Build the collector (after fixing module paths)
make build

# Run tests
make test              # Unit tests
make test-integration  # Integration tests

# Run a single test
go test -v -run TestAdaptiveSamplerRuleEvaluation ./processors/adaptivesampler/

# Validate configuration
make validate-config

# Run collector
make run             # With default config
make collector-debug # With debug logging

# Code quality
make lint # Run golangci-lint
make fmt  # Format code with gofmt and goimports
make vet  # Run go vet

# Dependencies
make deps         # Download and tidy
make deps-upgrade # Upgrade all dependencies

# Docker operations
make docker-build  # Build Docker image
make docker-simple # Start simple dev setup
make docker-prod   # Start production setup
```
- **Adaptive Sampler** (`processors/adaptivesampler/`, 576 lines) ✅ FIXED
  - Rule-based sampling with expression evaluation
  - ✅ In-memory state management only (no file persistence)
  - LRU cache with TTL for deduplication
  - ✅ Graceful handling of missing plan attributes
  - Configuration: `in_memory_only: true`, `rules`, `default_sampling_rate`
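A hypothetical configuration fragment built from the documented keys. The shape of a `rules` entry is an assumption for illustration; see `docs/CONFIGURATION.md` for the authoritative schema:

```yaml
processors:
  adaptive_sampler:
    in_memory_only: true
    default_sampling_rate: 0.1
    rules:
      - name: slow_queries              # hypothetical rule shape
        condition: "duration_ms > 1000"
        sampling_rate: 1.0
```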
- **Circuit Breaker** (`processors/circuitbreaker/`, 922 lines) ✅ READY
  - Per-database protection with 3-state FSM
  - Adaptive timeouts and self-healing
  - New Relic error detection and cardinality protection
  - ✅ Already uses in-memory state management
  - Configuration: `failure_threshold`, `timeout`, `half_open_requests`
- **Plan Attribute Extractor** (`processors/planattributeextractor/`, 391 lines) ✅ SAFE
  - PostgreSQL/MySQL query plan parsing from existing data
  - Plan hash generation for deduplication
  - ✅ Safe mode enforced (no direct database EXPLAIN calls)
  - ✅ Graceful degradation when plan data unavailable
  - Configuration: `safe_mode: true`, `timeout`, `error_mode: ignore`
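Plan hash generation for deduplication can be sketched like this: normalize the plan text so volatile details (case, whitespace, cost and row estimates) don't produce distinct keys, then hash the result. The normalization rules and the FNV-1a choice are assumptions for illustration, not necessarily the extractor's actual logic:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strings"
)

var (
	wsRe  = regexp.MustCompile(`\s+`)           // collapse runs of whitespace
	numRe = regexp.MustCompile(`\b\d+(\.\d+)?\b`) // mask numeric literals (costs, row counts)
)

// PlanHash returns a stable key for a plan: two plans that differ only in
// estimates or formatting collapse to the same hash.
func PlanHash(plan string) string {
	norm := strings.ToLower(plan)
	norm = numRe.ReplaceAllString(norm, "N")
	norm = wsRe.ReplaceAllString(norm, " ")
	norm = strings.TrimSpace(norm)
	h := fnv.New64a()
	h.Write([]byte(norm))
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	a := PlanHash("Seq Scan on users  (cost=0.00..15.00 rows=500)")
	b := PlanHash("seq scan on users (cost=0.00..99.00 rows=873)")
	fmt.Println(a == b) // equal after normalization
}
```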
- **Verification Processor** (`processors/verification/`, 1,353 lines) ✅ ENHANCED
  - ✅ Enhanced PII detection (credit cards, SSNs, emails, phones)
  - Data quality validation and cardinality protection
  - Auto-tuning and self-healing capabilities
  - Configuration: `pii_detection`, `quality_checks`, `auto_tuning`
Standard OpenTelemetry components used:

- Receivers: `postgresql`, `mysql`, `sqlquery`
- Processors: `memory_limiter`, `batch`, `transform`, `resource`
- Exporters: `otlp`, `prometheus`, `debug`
Minimal pipeline (standard components only):

```yaml
# config/collector-simplified.yaml
receivers: [postgresql, mysql, sqlquery]
processors: [memory_limiter, batch, transform]
exporters: [otlp, prometheus]
```

Full pipeline (includes custom processors):

```yaml
processors: [memory_limiter, adaptive_sampler, circuit_breaker,
             plan_extractor, verification, batch]
```
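A sketch of how these component lists wire together under `service.pipelines`; the exact file layout is assumed from the lists above, not copied from the repo:

```yaml
service:
  pipelines:
    metrics:
      receivers: [postgresql, mysql, sqlquery]
      processors: [memory_limiter, adaptive_sampler, circuit_breaker,
                   plan_extractor, verification, batch]
      exporters: [otlp, prometheus]
```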
```bash
# Test adaptive sampler
go test -v ./processors/adaptivesampler/

# Test with specific rule evaluation
go test -v -run TestRuleEvaluation ./processors/adaptivesampler/

# Test circuit breaker state transitions
go test -v -run TestCircuitBreakerStates ./processors/circuitbreaker/

# Benchmark processing performance
go test -bench=. ./processors/...
```

Required for runtime:

- `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_USER`, `POSTGRES_PASSWORD`
- `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`
- `NEW_RELIC_LICENSE_KEY`
- `ENVIRONMENT` (production/staging/development)
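For a local run, these variables can be exported before starting the collector; every value below is a placeholder:

```shell
# Placeholder values for a local development run; adjust for your environment.
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5432
export POSTGRES_USER=otel
export POSTGRES_PASSWORD=changeme
export MYSQL_HOST=localhost
export MYSQL_PORT=3306
export MYSQL_USER=otel
export MYSQL_PASSWORD=changeme
export NEW_RELIC_LICENSE_KEY=your-license-key
export ENVIRONMENT=development
```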
- `docs/ARCHITECTURE.md` - Validated architecture guide
- `docs/CONFIGURATION.md` - Working configuration examples
- `docs/DEPLOYMENT.md` - Deployment blockers and fixes
- `docs/TECHNICAL_IMPLEMENTATION_DEEPDIVE.md` - Detailed code analysis
- **Build fails with module not found**
  - Fix module path inconsistencies (see Critical Context above)
- **OTLP exporter panic with TODO**
  - Remove the custom OTLP exporter or complete its implementation
- **Processor not found error**
  - Ensure custom processors are registered in `main.go`
  - Check that TypeStr constants are exported
- **State file permissions** (applies only to legacy file-based configurations; current processors keep state in memory)
  - Adaptive sampler needs write access to the state file directory
  - Default: `/var/lib/otel/adaptive_sampler.state`
- Memory: 256-512MB with all processors
- CPU: 10-20% with active processing
- Startup time: 3-4s with custom processors
- Processing latency: 1-5ms added by custom processors
When making changes to the codebase, maintain documentation accuracy by following these guidelines:
When modifying any custom processor:
- Update `docs/ARCHITECTURE.md` with new features/capabilities
- Update `docs/CONFIGURATION.md` with new configuration options
- Update line counts in documentation if significant code is added/removed
- Mark features as [DONE], [PARTIALLY DONE], or [NOT DONE]
When fixing build issues or changing module structure:
- Update this CLAUDE.md file's "Critical Context" section
- Update `docs/DEPLOYMENT.md` with new deployment procedures
- Remove warnings once issues are resolved
When adding/modifying configuration options:
- Update `docs/CONFIGURATION.md` with working examples
- Update relevant processor sections in `docs/ARCHITECTURE.md`
- Ensure all examples are validated against the actual implementation
When optimizing or changing resource usage:
- Update performance metrics in this file
- Update the `docs/ARCHITECTURE.md` resource requirements table
- Document any new caching or optimization strategies
When adding new processors or major features:
- Create or update relevant sections in `docs/TECHNICAL_IMPLEMENTATION_DEEPDIVE.md`
- Update the `docs/UNIFIED_IMPLEMENTATION_OVERVIEW.md` component inventory
- Add to `docs/FINAL_COMPREHENSIVE_SUMMARY.md` if it's a significant addition
Before completing any feature:
- Verify Claims: Test that all documented features actually work
- Update Status: Mark implementations as [DONE], [PARTIALLY DONE], or [NOT DONE]
- Code Examples: Ensure all code snippets in docs match actual implementation
- Configuration: Test all configuration examples in documentation
- Remove Outdated: Delete or archive documentation for removed features
- **Primary References** (always update these):
  - `docs/ARCHITECTURE.md` - Overall system design and components
  - `docs/CONFIGURATION.md` - All configuration options and examples
  - `docs/DEPLOYMENT.md` - Current deployment status and procedures
- **Comprehensive Guides** (update for major changes):
  - `docs/UNIFIED_IMPLEMENTATION_OVERVIEW.md` - Complete project status
  - `docs/TECHNICAL_IMPLEMENTATION_DEEPDIVE.md` - Detailed implementation
  - `docs/FINAL_COMPREHENSIVE_SUMMARY.md` - Executive summary
- **This File (CLAUDE.md)**:
- Update build commands when build system changes
- Update common issues as new ones are discovered/resolved
- Keep performance characteristics current
When fixing the module path issue:
```bash
# After fixing in code, update documentation:
# 1. Remove warning from CLAUDE.md "Critical Context"
# 2. Update docs/DEPLOYMENT.md to show issue as resolved
# 3. Update docs/FINAL_COMPREHENSIVE_SUMMARY.md status from "NEAR PRODUCTION READY" to "PRODUCTION READY"
```

Remember: Documentation accuracy is critical. It's better to mark something as [NOT DONE] than to document features that don't exist.
Before making any changes, always analyze the complete data flow:
1. **Data Collection** (Database → Receiver)
   - How is data queried from PostgreSQL/MySQL?
   - What metrics are collected?
   - Collection intervals and resource impact?
2. **Processing Pipeline** (Receiver → Processors → Exporter)
   - Which processors touch the data?
   - What transformations occur?
   - How do processors interact (order matters)?
   - Performance implications of each step?
3. **Data Export** (Exporter → New Relic)
   - OTLP format requirements
   - Batching and compression settings
   - Error handling and retries
   - New Relic specific attributes needed?
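The export-side concerns above map onto standard collector exporter options. A hedged example (the endpoint and `api-key` header shown are the conventional New Relic OTLP/gRPC ingest settings; verify the endpoint for your account's region):

```yaml
exporters:
  otlp:
    endpoint: otlp.nr-data.net:4317
    headers:
      api-key: ${NEW_RELIC_LICENSE_KEY}
    compression: gzip
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      queue_size: 5000
```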
When implementing changes, ensure compliance with OpenTelemetry standards:
- Resource Detection: Proper service.name, environment attributes
- Semantic Conventions: Use standard attribute names (db.system, db.name, etc.)
- Context Propagation: Maintain trace/span relationships
- Error Handling: Non-blocking failures, graceful degradation
- Batching: Efficient use of batch processor
- Memory Management: Proper use of memory_limiter
- Observability: Expose internal metrics for monitoring
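For example, resource detection and semantic-convention attributes can be enforced with the standard `resource` processor; the attribute values here are illustrative:

```yaml
processors:
  resource:
    attributes:
      - key: service.name
        value: database-intelligence-collector  # illustrative value
        action: upsert
      - key: deployment.environment
        value: ${ENVIRONMENT}
        action: upsert
```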
IMPORTANT: Always use the TodoWrite tool to manage development tasks. Maintain a minimum of 7 todos at all times to ensure comprehensive planning and tracking.
When starting any feature or fix:
1. Analyze requirements and impacts
2. Create initial todo list with 7+ items covering:
- Code changes needed
- Test updates required
- Documentation updates
- Configuration changes
- Performance validation
- Integration testing
- Production readiness checks
Throughout development:

- Mark todos as in_progress when starting work
- Mark as completed immediately upon finishing
- Add new todos as you discover additional work
- Revisit the todo list after completing each task
- Maintain the minimum of 7 todos by breaking down large tasks
Example todo list for the module path fix:

- Fix module path in go.mod [pending]
- Update import statements in processors [pending]
- Fix ocb-config.yaml module references [pending]
- Test build process after fixes [pending]
- Update CLAUDE.md Critical Context section [pending]
- Update docs/DEPLOYMENT.md with fix confirmation [pending]
- Validate all processor imports [pending]
- Run integration tests [pending]
- Update FINAL_COMPREHENSIVE_SUMMARY.md status [pending]
Before implementing any change, consider:
1. **Upstream Impact**
   - Will this affect data collection?
   - Database query performance implications?
   - Resource usage on source databases?
2. **Pipeline Impact**
   - Effects on other processors in the chain?
   - Memory/CPU usage changes?
   - Latency additions?
3. **Downstream Impact**
   - New Relic data format compatibility?
   - Metric cardinality changes?
   - Dashboard/alert implications?
4. **Operational Impact**
   - Configuration complexity?
   - Backward compatibility?
   - Migration requirements?
# WRONG: Just writing code
❌ Create processor file and start coding
# RIGHT: Full flow analysis with todos
✅ 1. Use TodoWrite to create comprehensive task list:
- Analyze where in pipeline the processor fits
- Design processor interface and configuration
- Implement core processing logic
- Add comprehensive error handling
- Create unit tests with 80%+ coverage
- Add integration tests
- Update docs/ARCHITECTURE.md
- Update docs/CONFIGURATION.md
- Add processor to ocb-config.yaml
- Test end-to-end flow
- Validate New Relic data appears correctly
- Update performance characteristics
- Create troubleshooting guide

During development, regularly check:

- `TodoRead` - Review current task status
- `make test` - Ensure nothing breaks
- `make collector-debug` - Test with real data
- Check the metrics endpoint for processor health
- Validate data appears correctly in New Relic