49 commits
4d8dda0
added run form structs and implementation
tsebastiani Oct 8, 2025
f58987b
feat: Add comprehensive test suite for forms package
tsebastiani Oct 8, 2025
a19c814
minor nit
tsebastiani Oct 9, 2025
b7f0fb2
flag parsing rebased
tsebastiani Oct 9, 2025
95a8ddb
feat: Implement platform-based GPU detection system for Lightspeed
tsebastiani Sep 12, 2025
5050168
feat: Add RAG model deployment and multi-platform containers
tsebastiani Sep 12, 2025
34b67f6
feat: Complete Lightspeed integration and documentation
tsebastiani Sep 12, 2025
406d61f
feat: Integrate krkn-lightspeed repository into Lightspeed containers
tsebastiani Sep 16, 2025
ccbcaca
feat: Pre-download embedding models and improve container environment
tsebastiani Sep 16, 2025
9921a72
fix: Remove trailing spaces from Containerfile.apple-silicon header
tsebastiani Sep 16, 2025
047fa7a
fix: Fix Python multiline code in Containerfiles causing FROM syntax …
tsebastiani Sep 16, 2025
7bcb329
fix: Fix malformed Python command in NVIDIA Containerfile
tsebastiani Sep 17, 2025
112b769
feat: Remove offline mode and fix container build issues
tsebastiani Sep 17, 2025
4f10d27
fix: Update RAG query endpoint to OpenAI-compatible API
tsebastiani Sep 17, 2025
8eca069
feat: refactoring + added unit tests for lightspeed package
tsebastiani Sep 17, 2025
1868ed5
fix cache invalidation for quicker builds
tsebastiani Sep 17, 2025
c1a9ffa
fix: adding explicit paths on pip and python
tsebastiani Sep 17, 2025
4bc00bc
fix: missing modules
tsebastiani Sep 17, 2025
a05a70e
feat: accelerating pytorch
tsebastiani Sep 17, 2025
39874fc
feat: preloading chromadb
tsebastiani Sep 17, 2025
dbf979b
feat: Add automatic scenario description for Lightspeed AI responses
tsebastiani Sep 18, 2025
cb613c6
fix: Remove existing krkn-lightspeed directory before fresh clone in …
tsebastiani Sep 18, 2025
6f4498b
debug: print json
tsebastiani Sep 18, 2025
a2656b8
feat: Implement complete scenario detail display for Lightspeed AI re…
tsebastiani Sep 18, 2025
1466c27
updated model to 3B
tsebastiani Sep 22, 2025
5395bfa
etrypoint updated
tsebastiani Sep 22, 2025
8783a24
feat: Optimize NVIDIA container size with aggressive space reduction
tsebastiani Sep 22, 2025
ee6dc02
fix: Remove --depth=1 from git clones to access krknctl_lightspeed br…
tsebastiani Sep 22, 2025
f5b7cea
fix: Add missing build dependencies and allow essential package deps
tsebastiani Sep 22, 2025
454d16a
fix: Simplify llama-cpp-python installation with precompiled wheels
tsebastiani Sep 22, 2025
e67522f
fix: Add missing langchain dependencies for document processing
tsebastiani Sep 22, 2025
88c52cc
fix: Add opentelemetry dependencies to resolve context loading issues
tsebastiani Sep 22, 2025
2d13dd6
feat: new model containerfile
tsebastiani Sep 23, 2025
6dbca16
fix: branch in containerfile
tsebastiani Sep 23, 2025
ae60d73
fix: updated entrypoint.sh
tsebastiani Sep 23, 2025
eb8d62b
fix: updated containerfile.apple-silicon
tsebastiani Sep 23, 2025
abc83ba
code refresh
tsebastiani Sep 23, 2025
a92df4b
added query timer
tsebastiani Sep 23, 2025
52865d2
Containerfiles updates
tsebastiani Sep 23, 2025
b1cbb9f
Containerfile optimizations
tsebastiani Sep 23, 2025
febf277
Eyecandies
tsebastiani Sep 23, 2025
faff535
cleanup containers folder and entrypoint.sh
tsebastiani Sep 24, 2025
f2d93c0
cleanup Containerfiles
tsebastiani Sep 24, 2025
6f9965a
fixing gosec
tsebastiani Sep 24, 2025
58e177c
renaming lightspeed to assist
tsebastiani Oct 30, 2025
40c7cf8
changed registry pointer
tsebastiani Oct 30, 2025
9fd1dc5
file rename
tsebastiani Oct 31, 2025
7a14c70
scenario runs correctly from assist form
tsebastiani Oct 31, 2025
1e47a35
removed LLM
tsebastiani Nov 19, 2025
14 changes: 13 additions & 1 deletion .gitignore
@@ -30,6 +30,18 @@ krknctl-*
.env
.idea
bin/
bin-linux/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
*.egg-info/

.DS_Store
node_modules
node_modules/
227 changes: 227 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,227 @@
# CLAUDE.md
<!-- Generated by Claude Sonnet 4 -->

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Building
```bash
# Build for current platform
go build -tags containers_image_openpgp -ldflags="-w -s" ./...

# Build for specific platforms (as used in CI)
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -tags containers_image_openpgp -ldflags="-w -s" -o linux-amd64/ ./...
GOOS=darwin GOARCH=arm64 CGO_ENABLED=0 go build -tags containers_image_openpgp -ldflags="-w -s" -o darwin-apple-silicon/ ./...
```

### Testing
```bash
# Run full test suite (requires podman/docker and Kubernetes cluster)
go test -tags containers_image_openpgp -race -json -v -coverprofile=coverage.out ./...

# Generate coverage report
go tool cover -func coverage.out
```

### Code Quality
```bash
# Run security scanner
gosec --exclude G402 ./...

# Run static code analyzer
staticcheck -checks all ./...
```

### Dependencies
- Requires Go 1.23.3+
- Requires either Podman or Docker runtime installed
- Tests require a Kubernetes cluster (kind is used in CI)
- On Ubuntu: `sudo apt-get install podman libbtrfs-dev nodejs wamerican libgpgme-dev`

## Architecture Overview

### Core Components

**Entry Point (main.go)**
- Initializes configuration from `pkg/config/config.json`
- Detects container runtime (Podman/Docker) via `utils.DetectContainerRuntime`
- Creates scenario orchestrator and provider factory instances
- Delegates to Cobra CLI commands in `cmd/`

**Configuration System (pkg/config/)**
- `config.json`: Central configuration with container registries, API endpoints, paths
- `config.go`: Configuration struct and loading logic with embedded JSON file

**CLI Commands (cmd/)**
- `root.go`: Main command structure with subcommands and global flags
- Individual command files: `run.go`, `list.go`, `describe.go`, `clean.go`, etc.
- Support for private registry authentication via flags or environment variables

**Provider System (pkg/provider/)**
- `factory/`: Factory pattern for different container registry providers
- `quay/`: Quay.io registry implementation
- `registryv2/`: Generic Docker Registry v2 API support
- `models/`: Data structures for registry interactions

**Scenario Orchestrator (pkg/scenarioorchestrator/)**
- Abstracts container runtime operations (Podman/Docker)
- `podman/`: Podman-specific implementation
- Manages chaos scenario container lifecycle

**Utility Packages**
- `pkg/utils/`: Common utilities and helpers
- `pkg/typing/`: Type definitions and validation
- `pkg/dependencygraph/`: Dependency graph management for scenario workflows
- `pkg/randomgraph/`: Random scenario generation

### Key Features

**Scenario Management**
- List available chaos scenarios from container registries
- Describe scenario details and input requirements
- Run individual scenarios or orchestrated workflows
- Support for detached execution mode

**Graph Workflows**
- Define dependency graphs of chaos scenarios in JSON format
- Execute scenarios in dependency order
- Support for parallel execution where dependencies allow
- Scaffold new workflow templates
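
The dependency-ordered, partly parallel execution described above can be sketched as a layered topological sort (Kahn's algorithm). This is only an illustration: the function name `executionBatches` and the map-based input are hypothetical, and the actual JSON graph format and the code in `pkg/dependencygraph/` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// executionBatches groups scenario IDs into ordered batches. Every scenario
// in a batch has all of its dependencies in earlier batches, so the members
// of one batch could run in parallel. The input maps each scenario ID to the
// IDs it depends on (a hypothetical shape, not krknctl's JSON schema).
func executionBatches(deps map[string][]string) ([][]string, error) {
	unmet := make(map[string]int, len(deps)) // unmet dependency count per scenario
	dependents := make(map[string][]string)  // reverse edges: dependency -> scenarios waiting on it
	for node, needs := range deps {
		unmet[node] = len(needs)
		for _, need := range needs {
			if _, known := deps[need]; !known {
				return nil, fmt.Errorf("scenario %q depends on unknown scenario %q", node, need)
			}
			dependents[need] = append(dependents[need], node)
		}
	}

	// Scenarios without dependencies form the first batch.
	var ready []string
	for node, count := range unmet {
		if count == 0 {
			ready = append(ready, node)
		}
	}

	var batches [][]string
	scheduled := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order inside a batch
		batches = append(batches, ready)
		scheduled += len(ready)

		var next []string
		for _, node := range ready {
			for _, child := range dependents[node] {
				unmet[child]--
				if unmet[child] == 0 {
					next = append(next, child)
				}
			}
		}
		ready = next
	}

	// Anything left unscheduled sits on a cycle.
	if scheduled != len(deps) {
		return nil, errors.New("dependency cycle detected")
	}
	return batches, nil
}
```

For a graph where `b` and `c` depend on `a`, and `d` depends on both, this yields three batches: `a`, then `b` and `c` together, then `d`.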

**Random Testing**
- Generate random scenario execution plans
- Control parallelism and scenario count
- Use seed files for template-based random generation

**Private Registry Support**
- Basic authentication and token-based authentication
- Custom domain support beyond quay.io
- TLS configuration options

### Container Runtime Integration

The tool auto-detects and supports both Podman and Docker:
- Podman: Uses socket communication via `unix://` sockets
- Docker: Standard Docker socket integration
- Platform detection for Darwin vs Linux socket paths
- Graceful fallback between runtimes
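
As a rough sketch of the fallback behaviour listed above, the function below probes an ordered list of socket paths and returns the first runtime whose socket exists. The names `runtimeCandidate` and `selectRuntime` are hypothetical, and the socket paths are supplied by the caller because the paths krknctl actually checks are not shown here; `utils.DetectContainerRuntime` may work differently.

```go
package main

import "errors"

// runtimeCandidate pairs a runtime name with a socket path to probe.
// Hypothetical type, used only for this illustration.
type runtimeCandidate struct {
	Name       string
	SocketPath string
}

// selectRuntime walks the candidates in order and returns the name and
// unix:// URI of the first one whose socket exists. The exists callback is
// injected so the logic can be exercised without touching the filesystem.
func selectRuntime(candidates []runtimeCandidate, exists func(path string) bool) (name, socketURI string, err error) {
	for _, c := range candidates {
		if exists(c.SocketPath) {
			return c.Name, "unix://" + c.SocketPath, nil
		}
	}
	return "", "", errors.New("no container runtime socket found")
}
```

Ordering the candidates with Podman first and Docker second gives a Podman preference with a Docker fallback; the platform-specific part is reduced to building a different candidate list on Darwin and Linux.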

### Configuration Patterns

- Global configuration embedded in binary via `go:embed`
- Runtime configuration via CLI flags and environment variables
- Kubeconfig path resolution for Kubernetes integration
- Custom alerts and metrics profile support
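
A minimal sketch of these patterns follows, assuming a config struct that carries only the `rag_model_tag` field mentioned later in this file. In the real binary the bytes would come from a `go:embed` directive on `config.json`; a literal stands in here so the example compiles on its own. The `firstNonEmpty` helper and the flag-over-environment-over-default precedence are assumptions for illustration, not confirmed behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors a small slice of config.json. Only rag_model_tag is named
// in this document; the real struct has more fields.
type Config struct {
	RAGModelTag string `json:"rag_model_tag"`
}

// rawConfig stands in for the embedded file contents (in the real code, a
// []byte variable populated by a go:embed directive). The value is made up.
var rawConfig = []byte(`{"rag_model_tag": "example-tag"}`)

// loadConfig decodes the embedded JSON into a Config.
func loadConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, fmt.Errorf("parsing embedded config: %w", err)
	}
	return cfg, nil
}

// firstNonEmpty returns the first non-empty value. Called as
// firstNonEmpty(flagValue, envValue, cfgDefault), it models an assumed
// precedence of CLI flag, then environment variable, then embedded default.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}
```

Embedding the defaults keeps the binary self-contained, while the override helper leaves room for per-invocation changes without editing the JSON.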

## Lightspeed AI-Powered Assistance

### Overview
Lightspeed is krknctl's AI-powered chaos engineering assistant. It provides command suggestions and documentation search using Retrieval-Augmented Generation (RAG), with GPU acceleration where the platform supports it.

### Major Implementation Tasks Completed

#### 1. GPU Detection System Redesign
**Previous System**: Complex container-based GPU detection using test images
- Removed complex GPU check implementation using container images
- Eliminated `GetSupportedGPUTypes()` and container-based testing approach

**New System**: Platform-based automatic detection
- **macOS arm64**: Automatically assumes Apple Silicon GPU support (Metal via libkrun)
- **Linux with NVIDIA devices**: Detects physical NVIDIA devices (`/dev/nvidia0`, `/dev/nvidiactl`, `/dev/nvidia-uvm`)
- **Generic fallback**: CPU-only mode for all other platforms
- Added `--no-gpu` flag to force CPU-only mode without device mounting
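
The platform rules above can be sketched as a single pure function. This is not the code in `pkg/gpucheck/gpucheck.go`: the `GPUType` values and the function names are hypothetical, and requiring all three NVIDIA device nodes (rather than any one of them) is an assumption.

```go
package main

import "os"

// GPUType is a hypothetical enum; the real type in pkg/gpucheck may differ.
type GPUType string

const (
	GPUAppleSilicon GPUType = "apple-silicon"
	GPUNvidia       GPUType = "nvidia"
	GPUGeneric      GPUType = "generic"
)

// nvidiaDevices lists the device nodes named in the text above.
var nvidiaDevices = []string{"/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"}

// detectGPU applies the platform rules. goos and goarch are parameters
// (instead of reading runtime.GOOS and runtime.GOARCH) and the exists
// callback is injected, so every branch can be tested on any machine.
func detectGPU(goos, goarch string, noGPU bool, exists func(path string) bool) GPUType {
	if noGPU {
		return GPUGeneric // --no-gpu forces CPU-only mode
	}
	if goos == "darwin" && goarch == "arm64" {
		return GPUAppleSilicon // assumed available, no probing
	}
	if goos == "linux" {
		for _, dev := range nvidiaDevices {
			if !exists(dev) {
				return GPUGeneric // assumption: all three nodes are required
			}
		}
		return GPUNvidia
	}
	return GPUGeneric
}

// pathExists reports whether a filesystem path is present.
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}
```

A caller on a real machine would pass `runtime.GOOS`, `runtime.GOARCH`, the value of the `--no-gpu` flag, and `pathExists`.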

#### 2. Container Runtime Support
- **Podman Only**: Lightspeed exclusively supports Podman container runtime
- **Docker Blocking**: Commands fail gracefully with helpful error messages when Docker is detected
- **Error Handling**: Provides links to Podman GPU documentation (https://podman-desktop.io/docs/podman/gpu)

#### 3. Container Architecture
**Three Specialized Containers**:
- **Apple Silicon** (`rag-model-apple-silicon`): Vulkan backend inside the container for Apple M1/M2/M3/M4 GPUs (libkrun maps it to Metal on the host)
- **NVIDIA** (`rag-model-nvidia`): CUDA backend for NVIDIA GPUs
- **Generic** (`rag-model-generic`): CPU-only fallback for all other platforms

**Container Selection Logic**:
- Uses `PlatformGPUDetector.GetLightspeedImageURI()` to select appropriate container
- Tag construction follows `{rag_model_tag}-{architecture}` pattern from config
- Device mounting handled by `PlatformGPUDetector.GetDeviceMounts()`
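
To make the tag pattern concrete, here is a small sketch of how an image reference could be assembled. The helper names, the registry and repository arguments, and the use of `apple-silicon`, `nvidia`, and `generic` as architecture suffixes are assumptions for illustration; `PlatformGPUDetector.GetLightspeedImageURI()` may build the URI differently.

```go
package main

import "fmt"

// ragImageTag joins the configured tag and the architecture suffix,
// following the {rag_model_tag}-{architecture} pattern.
func ragImageTag(ragModelTag, architecture string) string {
	return ragModelTag + "-" + architecture
}

// ragImageURI builds a full image reference. The registry/repository layout
// is hypothetical; only the tag pattern comes from the text above.
func ragImageURI(registry, repository, ragModelTag, architecture string) string {
	return fmt.Sprintf("%s/%s:%s", registry, repository, ragImageTag(ragModelTag, architecture))
}
```

Keeping the base tag in configuration means a single `rag_model_tag` bump moves all three container variants together.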

#### 4. Configuration Integration
- **Config-Based Tags**: Uses `rag_model_tag` from `pkg/config/config.json` to construct container tags
- **Centralized Settings**: All RAG service parameters (ports, endpoints, timeouts) in configuration
- **Private Registry Support**: Full integration with existing private registry authentication

#### 5. Multi-Stage Container Build Fix
**Problem**: Documentation indexing failed in builder stage of multi-stage builds
- **Root Cause**: Git and Python dependencies not fully available during builder stage
- **Solution**: Moved documentation indexing from builder stage to runtime stage
- **Impact**: Fixed the NVIDIA and Generic containers (the single-stage Apple Silicon build already worked)

**Fixed Containers**:
- **NVIDIA** (`Containerfile.nvidia`): Multi-stage build with runtime indexing
- **Generic** (`Containerfile.generic`): Multi-stage build with runtime indexing
- **Apple Silicon** (`Containerfile.apple-silicon`): Single-stage build (already working)

#### 6. Documentation Indexing System
**Sources Indexed**:
- Local krknctl help documentation
- Live krkn-chaos/website repository (chaos engineering guides)
- Live krkn-chaos/krkn-hub repository (scenario definitions)

**Indexing Process**:
- **Build Time**: Creates cached indices for offline/airgapped environments
- **Runtime**: Can rebuild indices with fresh documentation or use cached versions
- **Verification**: Automatic validation of indexed document sources and counts

#### 7. User Experience Improvements
**Progress Feedback**:
- Spinner with dynamic progress messages during container image pulls
- Real-time feedback during RAG model deployment
- Health checking with automatic retry and timeout handling
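
Health checking with retry and timeout can be sketched as a polling loop bounded by a context deadline. The names `waitHealthy` and `errNotReady` are hypothetical and the probe itself is left to the caller; this is not the implementation used by the RAG deployment code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotReady is a sentinel a probe can return while the service is starting.
var errNotReady = errors.New("service not ready")

// waitHealthy calls check immediately and then once per interval until it
// succeeds or the timeout elapses. The last probe error is wrapped into the
// returned error so the caller can report why the service never came up.
func waitHealthy(timeout, interval time.Duration, check func() error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		err := check()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("health check timed out after %s: %w", timeout, err)
		case <-ticker.C:
			// Interval elapsed: loop around and probe again.
		}
	}
}
```

In practice `check` would wrap an HTTP request to the service's health endpoint and return `errNotReady` (or the transport error) until the service responds successfully.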

**Error Handling**:
- Platform-specific error messages with actionable solutions
- Automatic fallback from live indexing to cached documentation
- Container cleanup on deployment failures

### Technical Implementation

#### Core Components
- **`pkg/gpucheck/gpucheck.go`**: Platform-based GPU detection logic
- **`cmd/lightspeed_check.go`**: Lightspeed commands with Docker runtime blocking
- **`cmd/lightspeed.go`**: RAG model deployment with GPU-specific container selection
- **`pkg/config/config.go`**: Enhanced with Lightspeed-specific configuration methods

#### Container Files
- **`containers/lightspeed-rag/Containerfile.apple-silicon`**: Single-stage Vulkan build
- **`containers/lightspeed-rag/Containerfile.nvidia`**: Multi-stage CUDA build
- **`containers/lightspeed-rag/Containerfile.generic`**: Multi-stage CPU-only build

#### Key Functions
- **`DetectGPUAcceleration()`**: Platform-based GPU type detection
- **`deployRAGModelWithGPUType()`**: GPU-aware container deployment
- **`HandleContainerError()`**: Enhanced error reporting with helpful suggestions

### Usage Examples

```bash
# Automatic GPU detection and deployment
krknctl assist check

# AI-powered assistance with auto-detected GPU
krknctl assist run

# Force CPU-only mode (no GPU acceleration)
krknctl assist run --no-gpu

# Offline mode for airgapped environments
krknctl assist run --offline
```

### Development Notes
- **Testing**: Updated test suite to use new `PlatformGPUDetector` API
- **Backwards Compatibility**: Maintains existing CLI interface while simplifying internals
- **Build System**: All containers build successfully with proper documentation indexing
- **Error Recovery**: Graceful degradation when GPU features are unavailable