This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This repository contains two interconnected projects:
- BudSimulator: Full-stack web application for LLM analysis (main directory)
- llm-memory-calculator: Core Python package for memory calculation and performance modeling
BudSimulator is a comprehensive AI model simulation and benchmarking platform built on top of the GenZ-LLM Analyzer framework. It provides analytical performance estimates for Large Language Model (LLM) inference on various hardware platforms through a full-stack application with React frontend and FastAPI backend.
Key components:
- llm-memory-calculator: Core engine with GenZ framework for LLM performance modeling (installed from llm-memory-calculator/)
- FastAPI Backend: REST API with hardware recommendations and model analysis (BudSimulator/apis/)
- React Frontend: Interactive UI for hardware selection and use case management (BudSimulator/frontend/)
- Streamlit Interface: Alternative web dashboard for model comparison (BudSimulator/Website/)
# Automated setup (recommended) - installs everything and starts servers
python BudSimulator/setup.py
# Manual installation of llm-memory-calculator package
pip install -e llm-memory-calculator/
# Manual installation of BudSimulator in development mode
pip install -e BudSimulator/
# Or install from PyPI
pip install genz-llm
# Backend API (stable mode)
cd BudSimulator
python run_api.py
# Backend API (with hot-reload for development)
cd BudSimulator
RELOAD=true python run_api.py # Unix/Linux/macOS
set "RELOAD=true" && python run_api.py # Windows (quotes keep a trailing space out of the value)
# Frontend (in frontend directory)
cd BudSimulator/frontend
npm install # First time only
npm start
# Alternative Streamlit interface
streamlit run BudSimulator/Website/Home.py
# Run all tests
pytest BudSimulator/tests/ -v
# Run specific test file
pytest BudSimulator/comprehensive_api_test.py
# Test CPU functionality
python BudSimulator/cpu_test.py
python BudSimulator/cpu_test_detailed.py
# Test model parameters
python BudSimulator/model_param_test.py
# Run a single test function
pytest BudSimulator/tests/test_file.py::test_function_name -v
# Python linting (if ruff is installed)
ruff check BudSimulator/
# Frontend linting
cd BudSimulator/frontend
npm run lint # Note: no explicit lint script is defined yet; ESLint currently runs via react-scripts
# Python type checking (if mypy is installed)
mypy BudSimulator/
The simulator repository contains two main projects that work together:
simulator/
├── BudSimulator/ # Full-stack web application
│ ├── apis/ # FastAPI backend with routers
│ ├── frontend/ # React TypeScript UI
│ ├── src/ # Core business logic and services
│ ├── GenZ/ # Legacy GenZ framework (deprecated, use llm-memory-calculator)
│ └── Website/ # Streamlit dashboard
│
└── llm-memory-calculator/ # Core performance modeling engine
└── src/llm_memory_calculator/
├── genz/ # GenZ framework implementation
│ ├── system.py # Hardware abstraction
│ ├── operators.py # Computational operators
│ ├── parallelism.py # Parallelism strategies
│ └── LLM_inference/ # Prefill/decode modeling
└── performance_api.py # High-level API
- System Abstraction (llm-memory-calculator/src/llm_memory_calculator/genz/system.py)
  - Represents hardware accelerators with compute/memory capabilities
  - Handles different precision formats (fp32, bf16, int8) with compute/memory multipliers
  - Integrates with compute engines (GenZ, Scale-sim) and collective strategies (GenZ, ASTRA-SIM)
- Operator Framework (llm-memory-calculator/src/llm_memory_calculator/genz/operator_base.py, operators.py)
  - Base class for all computational operations with roofline analysis
  - Concrete operators: FC, GEMM, Logit, Attend, CONV2D, Einsum
  - Special operators for synchronization and layer repetition
- Parallelism Management (llm-memory-calculator/src/llm_memory_calculator/genz/parallelism.py)
  - ParallelismConfig handles: tensor_parallel, pipeline_parallel, data_parallel, expert_parallel
  - Supports hierarchical parallelism (e.g., "TP{4}_EP{2}_PP{1}")
  - Collective operations: AllReduce, All2All, AllGather for inter-accelerator communication
- Performance Modeling (llm-memory-calculator/src/llm_memory_calculator/genz/LLM_inference/)
  - Models prefill and decode phases separately (llm_prefill.py, llm_decode.py)
  - Calculates memory requirements (weights + KV cache)
  - Handles memory offloading when capacity exceeded
  - Returns latency, throughput, and runtime breakdown
- Model Management (llm-memory-calculator/src/llm_memory_calculator/genz/Models/)
  - Registry of pre-configured models (Meta, Google, Microsoft, etc.)
  - Creates model definitions with specified parallelism configurations
  - Supports MHA (Multi-Head Attention) and Mamba architectures
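The hierarchical parallelism string mentioned above (e.g., "TP{4}_EP{2}_PP{1}") can be read as a list of named degrees. The following is a minimal sketch of how such a string might be parsed. It is not the ParallelismConfig implementation: `parse_hierarchy` and `device_count` are hypothetical helper names, and the assumption that the degrees multiply to give the device count is ours.

```python
import re
from math import prod

# One token of the hierarchy string, e.g. "TP{4}":
# a name followed by an integer degree in literal braces.
_TOKEN = re.compile(r"([A-Za-z]+)\{(\d+)\}")


def parse_hierarchy(spec: str) -> dict:
    """Parse a string such as 'TP{4}_EP{2}_PP{1}' into {'TP': 4, 'EP': 2, 'PP': 1}."""
    degrees = {}
    for token in spec.split("_"):
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized parallelism token: {token!r}")
        name, degree = match.group(1).upper(), int(match.group(2))
        if degree < 1:
            raise ValueError(f"degree must be at least 1 in {token!r}")
        if name in degrees:
            raise ValueError(f"dimension {name!r} appears more than once")
        degrees[name] = degree
    return degrees


def device_count(degrees: dict) -> int:
    """Devices needed, ASSUMING the degrees of all dimensions multiply."""
    return prod(degrees.values())


layout = parse_hierarchy("TP{4}_EP{2}_PP{1}")
print(layout, device_count(layout))
```

Rejecting malformed or repeated tokens early keeps a typo in the string from silently producing a different device layout.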
- FastAPI application with automatic OpenAPI documentation (/docs)
- Routers for models, hardware, and usecases
- CORS-enabled for frontend integration
- Static file serving for logos
- SQLAlchemy models for hardware and model data
- Pre-populated SQLite database at BudSimulator/data/prepopulated.db
- Alembic for migrations
- React 18.2 with TypeScript
- Tailwind CSS for styling
- Components for hardware browsing, usecase management, and AI memory calculation
- Proxy configuration to backend API (port 8000)
- Model Definition → CSV representation of operator sequences
- Performance Analysis → Per-operator roofline analysis
- Aggregation → Runtime breakdown by component (MHA, FFN, Embedding, Collective)
- Visualization → Web interface for interactive exploration
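The per-operator roofline step and the aggregation step in this flow can be illustrated with a short sketch. It assumes the textbook roofline model, in which an operator's time is the larger of its compute time and its memory-transfer time. The `Op` class, the function names, and all numbers are hypothetical and are not taken from the GenZ code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    component: str      # e.g. "MHA", "FFN", "Embedding", "Collective"
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes read from and written to memory


def roofline(op: Op, peak_flops: float, mem_bandwidth: float):
    """Return (seconds, bound) for one operator under the roofline model."""
    compute_s = op.flops / peak_flops
    memory_s = op.bytes_moved / mem_bandwidth
    bound = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s), bound


def breakdown(ops, peak_flops: float, mem_bandwidth: float) -> dict:
    """Sum per-operator times into a runtime breakdown by component."""
    totals = defaultdict(float)
    for op in ops:
        seconds, _ = roofline(op, peak_flops, mem_bandwidth)
        totals[op.component] += seconds
    return dict(totals)


# Made-up hardware and operator figures, chosen only to show both regimes.
PEAK_FLOPS = 1e12  # FLOP/s
MEM_BW = 1e9       # bytes/s
ops = [
    Op("qkv_proj", "MHA", flops=2e12, bytes_moved=1e9),  # compute-bound
    Op("ffn_up", "FFN", flops=1e12, bytes_moved=4e9),    # memory-bound
]
print(breakdown(ops, PEAK_FLOPS, MEM_BW))
```

Summing the per-operator maxima treats operators as running one after another, which is the simplest aggregation; overlap between compute and communication would need a more detailed model.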
From llm-memory-calculator package:
- calculate_memory(): Calculate memory from model ID or config
- estimate_prefill_performance() / estimate_decode_performance(): Performance estimation
- get_best_parallelization_strategy(): Find optimal parallelism configuration
- get_hardware_config(): Get predefined hardware configurations
Legacy GenZ interfaces (in BudSimulator/GenZ/):
- get_model_df(): Load model and compute performance metrics
- prefill_moddeling() / decode_moddeling(): End-to-end performance estimation
- System(): Hardware platform definition
- ParallelismConfig(): Distributed execution configuration
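For orientation, the "weights + KV cache" memory figure that these APIs report can be approximated by hand. The sketch below uses the common back-of-the-envelope formulas and is not the implementation of calculate_memory(), whose signature is not shown in this file. The function names and the model dimensions are hypothetical.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory for model weights at a given precision (2 bytes for bf16)."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_value: int) -> int:
    """Memory for the KV cache of a standard attention model.

    The factor of 2 accounts for one key tensor and one value tensor per layer.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical 7B-parameter model in bf16: 32 layers, 32 KV heads, head_dim 128.
weights = weight_bytes(7_000_000_000, 2)
kv = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                    seq_len=4096, batch_size=1, bytes_per_value=2)
total_gib = (weights + kv) / 2**30
print(f"weights={weights} B, kv_cache={kv} B, total={total_gib:.2f} GiB")
```

This ignores activations, framework overhead, and any sharding across devices, so it gives a rough lower bound and should not be used as a capacity guarantee.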
- Create model definition in BudSimulator/GenZ/Models/Model_sets/
- Register in model registry
- Update database if needed via migration
- Add frontend display logic if required
- Define system specs in BudSimulator/GenZ/system.py format
- Add to database via migration or seed script
- Update hardware recommendation logic in BudSimulator/src/services/
- Test with existing models
- Update router in BudSimulator/apis/routers/
- Modify corresponding service in BudSimulator/src/
- Update frontend API service and TypeScript types
- Run tests to ensure compatibility
- Create new Alembic migration: alembic revision -m "description"
- Apply migrations: alembic upgrade head
- Database is at BudSimulator/data/prepopulated.db
- BudSimulator/run_api.py - Main API entry point
- BudSimulator/setup.py - Automated setup script
- BudSimulator/apis/ - FastAPI application and routers
- BudSimulator/src/ - Core business logic and services
- BudSimulator/GenZ/ - Performance modeling framework
- BudSimulator/frontend/ - React application
- BudSimulator/Website/ - Streamlit alternative UI
- BudSimulator/tests/ - Test suite
- BudSimulator/data/prepopulated.db - SQLite database
- BudSimulator/config/env.template - Environment configuration template
- Python: Follow PEP 8, use type hints where appropriate
- TypeScript: Use strict mode, define interfaces for all API responses
- React: Functional components with hooks
- API: RESTful conventions, consistent error handling
- Document complex algorithms with inline comments
- Keep functions focused and single-purpose
The codebase uses unique_id (string) for usecases, not numeric id. This is critical for proper functionality:
# Correct
usecase = db.query(Usecase).filter(Usecase.unique_id == unique_id).first()
# Incorrect (will fail)
usecase = db.query(Usecase).filter(Usecase.id == id).first()
Create a .env file based on BudSimulator/config/env.template:
- LLM_PROVIDER: openai, anthropic, ollama, or custom
- LLM_API_KEY: Your provider API key
- LLM_MODEL: Model name (e.g., gpt-4, claude-3-opus-20240229)
- LLM_API_URL: Provider endpoint URL
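As an illustration of how these four variables might be consumed, here is a minimal sketch that reads and validates them. It assumes the values have already been exported into the process environment; how the project itself loads the .env file is not shown here. `LLMSettings` and `load_llm_settings` are hypothetical names, and treating everything except LLM_PROVIDER as optional is an assumption.

```python
import os
from dataclasses import dataclass

# Provider names listed in the configuration notes above.
ALLOWED_PROVIDERS = {"openai", "anthropic", "ollama", "custom"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    api_key: str
    model: str
    api_url: str


def load_llm_settings(env=None) -> LLMSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(ALLOWED_PROVIDERS)}, "
            f"got {provider!r}"
        )
    # ASSUMPTION: the remaining values are optional and default to "".
    return LLMSettings(
        provider=provider,
        api_key=env.get("LLM_API_KEY", ""),
        model=env.get("LLM_MODEL", ""),
        api_url=env.get("LLM_API_URL", ""),
    )


# Passing an explicit mapping makes the loader easy to exercise in tests.
settings = load_llm_settings({
    "LLM_PROVIDER": "anthropic",
    "LLM_MODEL": "claude-3-opus-20240229",
})
print(settings.provider, settings.model)
```

Failing fast on an unknown provider surfaces a misconfigured .env at startup instead of at the first LLM call.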
The project uses:
- Python dependencies in BudSimulator/requirements.txt
- Frontend dependencies in BudSimulator/frontend/package.json (React 18.2, TypeScript)
- Core package configuration in BudSimulator/pyproject.toml (genz_llm v0.0.16)
BudSimulator/pyproject.toml(genz_llm v0.0.16) - llm-memory-calculator package from local directory or PyPI
- Backend: pytest with test files in various locations
- Frontend: React Testing Library and Jest via react-scripts
- Integration: API testing with comprehensive_api_test.py
- CPU Testing: Specialized tests in cpu_test.py and cpu_test_detailed.py
- SQLite database at BudSimulator/data/prepopulated.db
- SQLAlchemy for ORM
- Alembic for migrations (though migration files may need setup)
- Database initialization script: BudSimulator/scripts/setup_database.py
- FastAPI with automatic docs at /docs
- CORS enabled for frontend integration
- Routers organized by domain (models, hardware, usecases)
- Pydantic schemas for request/response validation
- React 18.2 with TypeScript
- Tailwind CSS for styling
- Proxy to backend API configured in package.json
- API service layer in frontend/src/services/
- Repository is at /home/budadmin/simulator
- Main project directory is BudSimulator/
- Currently on branch main
- Uses virtual environment at BudSimulator/env/
BudSimulator/env/ - Two main packages: BudSimulator (web app) and llm-memory-calculator (core engine)