
docs(building_applications): add Claude Code integration guide #5571

Open
cdoern wants to merge 5 commits into ogx-ai:main from cdoern:docs/claude-code-usage

Conversation

@cdoern
Collaborator

@cdoern cdoern commented Apr 16, 2026

Summary

Adds comprehensive documentation for using Claude Code CLI with Llama Stack.

Changes

  • New guide: docs/docs/building_applications/claude_code_integration.mdx
  • Updated sidebar to include both Claude Code and Codex CLI integration docs

Documentation Covers

Quick Start

  • Step-by-step setup with different providers (OpenAI, vLLM, Ollama)
  • Configuration via ANTHROPIC_BASE_URL environment variable
  • Basic testing examples
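
A minimal setup sketch for the steps above. The host and port are assumptions about a local Llama Stack deployment; substitute your server's address.

```shell
# Point Claude Code at a Llama Stack server instead of api.anthropic.com.
# The host/port below are illustrative, not a documented default.
export ANTHROPIC_BASE_URL="http://localhost:8321"
```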

How It Works

  • Messages API translation flow (clarified that /v1/messages is Llama Stack's endpoint)
  • When translation happens vs. native passthrough
  • What gets translated (messages, tool calls, streaming, thinking blocks)

Model Configuration

  • Updated to reflect model aliasing now available in starter distro
  • Users always specify real Llama Stack models with the --model flag (e.g., vllm/Qwen/Qwen3-8B)
  • Explains that provider_id: "all" handles Claude Code's internal requests automatically
  • Removed outdated environment variable approach
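
A sketch of the invocation described above. The model ID vllm/Qwen/Qwen3-8B is the example from this PR; the guard around the `claude` call is only so the snippet is safe to run when the CLI is not installed.

```shell
# Always pass a real Llama Stack model ID via --model; per the PR
# description, the starter distro's aliasing (provider_id: "all")
# covers Claude Code's internal requests automatically.
LLAMA_STACK_MODEL="vllm/Qwen/Qwen3-8B"
if command -v claude >/dev/null 2>&1; then
  claude --model "$LLAMA_STACK_MODEL" "Quick task"
fi
```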

Supported Features

  • Core capabilities checklist
  • Provider comparison table showing native Messages API support, thinking, caching
  • vLLM and Ollama show ✅ for native Messages API support
  • Bedrock correctly shown as translated (not native passthrough)

Configuration Examples

  • Using OpenAI models
  • Using vLLM with Qwen models
  • Using Ollama with Llama models
  • Multi-provider setup (different tiers → different providers)
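
A tiered-routing sketch for the multi-provider setup above. ANTHROPIC_DEFAULT_OPUS_MODEL appears in the review thread; the haiku-tier variable name and both model IDs are assumptions, not verified Claude Code settings.

```shell
# Map Claude Code's model tiers to different Llama Stack providers.
# Variable names and model IDs here are illustrative assumptions.
export ANTHROPIC_DEFAULT_OPUS_MODEL="openai/o1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="vllm/Qwen/Qwen3-8B"
```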

Troubleshooting

  • "Model not found" errors → use --model flag with Llama Stack models
  • "API key not valid" → set fake key for local providers
  • Slow responses → use local providers to reduce latency
  • Tool use issues → explained that tools run in Claude Code, not llama-stack
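
The "API key not valid" fix above can be sketched as follows; the placeholder value is arbitrary.

```shell
# Local providers (vLLM, Ollama) don't validate the Anthropic key,
# but Claude Code requires one to be set, so any placeholder works.
export ANTHROPIC_API_KEY="not-a-real-key"
```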

Performance Considerations

  • Latency breakdown
  • Optimization tips (local providers, prompt caching, streaming, native Messages API)

Differences from Anthropic Claude

  • Feature comparison table (thinking, caching, context, tool format, quality)

Advanced Configuration

  • Custom model mappings via API or config.yaml
  • Using with Claude Agent SDK

Recent Updates

Addressed review feedback from @leseb.

Related

  • Complements existing Anthropic Messages API docs
  • Also fixed sidebar to include the existing Codex CLI integration doc which was written but not in navigation

Testing

  • Pre-commit passes
  • Markdown formatting correct
  • All links valid
  • Sidebar navigation updated

🤖 Generated with Claude Code

Added comprehensive documentation for using Claude Code CLI with Llama Stack,
covering:

- Quick start instructions for different providers (OpenAI, vLLM, Ollama)
- How the Messages API translation works
- Model configuration and aliasing (including upcoming PR ogx-ai#5471 feature)
- Provider-specific features and compatibility matrix
- Configuration examples for multiple use cases
- Troubleshooting common issues
- Performance considerations
- Differences from native Anthropic Claude

Also added both claude_code_integration and codex_cli_integration to the
sidebar navigation (codex was missing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 16, 2026

@leseb leseb left a comment


nits

Comment thread on docs/docs/building_applications/claude_code_integration.mdx (outdated):
Claude Code sends requests to the Anthropic Messages API (`/v1/messages`). Llama Stack implements this API with full compatibility, translating between formats as needed:

```
Claude Code → /v1/messages → Llama Stack → Provider
```


This reads as if /v1/messages were not part of Llama Stack, but it is. Can you clarify?

Comment thread docs/docs/building_applications/claude_code_integration.mdx Outdated
```
export ANTHROPIC_DEFAULT_OPUS_MODEL="openai/o1"

# Claude Code will route based on which model name it sends
claude "Quick task" # Uses haiku → vLLM
```
Contributor


May I ask how Claude Code determines which default model to use here?

@ogx-ai ogx-ai deleted a comment from nidhishgajjar Apr 23, 2026
cdoern and others added 3 commits April 23, 2026 11:39
Address review feedback from leseb on PR ogx-ai#5571:
- Clarify that /v1/messages is Llama Stack's endpoint in the flow diagram
- Remove Bedrock from native passthrough list (uses OpenAI translation)
- Update model configuration section to reflect that model aliasing is now available
- Emphasize that users always specify real Llama Stack models with --model flag
- Explain that provider_id: "all" handles Claude Code's internal requests automatically

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Guangya Liu <gyliu513@gmail.com>
@cdoern cdoern requested a review from leseb April 23, 2026 18:21
@cdoern cdoern marked this pull request as ready for review April 23, 2026 18:24
@cdoern
Collaborator Author

cdoern commented Apr 24, 2026

@leseb PTAL

