Ollama Copilot integrates local LLMs from Ollama directly into VS Code, providing AI-powered code completion and an interactive chat experience with your own locally-running models.
- 🚀 Improved inline suggestions with expanded context (up to 1000 lines)
- 🔄 Fixed Tab key acceptance for multi-line suggestions
- 🎯 Better code completion accuracy with enhanced context awareness
- 💡 Added support for more Ollama models including Qwen and Mixtral
- 🛠️ Improved error handling and connection stability
- 📝 Enhanced documentation with visual guides
Get contextual code suggestions as you type, powered by your local Ollama models:
- Smart context awareness (up to 1000 lines of surrounding code)
- Multi-line code suggestions
- Language-specific completions
- Variable and function name awareness
- Tab completion support
Engage with your code through:
- Dedicated sidebar chat panel
- Real-time streaming responses
- Context-aware code discussions
- File and workspace context integration
- All processing happens locally through Ollama
- No data sent to external servers
- Complete control over your models and data
- Choose from any installed Ollama model
- Configure API host settings
- Adjust workspace context settings
- Install Ollama on your system
- Pull at least one model in Ollama (see model recommendations)
- Make sure Ollama is running (`ollama serve`); see the quick check below
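If you prefer to verify these prerequisites from a terminal, a minimal check might look like the following; the model name is only an example, and any installed model works:

```bash
# Confirm the Ollama CLI is installed
ollama --version

# Pull at least one model (qwen:14b is just an example)
ollama pull qwen:14b

# Start the Ollama server (listens on http://localhost:11434 by default)
ollama serve
```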
- Install the extension from VS Code marketplace
- Run Ollama in the background (`ollama serve`)
- Select a default model when prompted
- Start coding to see inline suggestions
- Use the sidebar chat for more complex queries
Choose your model through:
- Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`)
- Type "Ollama Copilot: Select Default Model"
- Pick from your installed models
For the best experience, we recommend:
For code completion:
- `qwen:14b` - Excellent for general code completion
- `codellama:13b` - Strong at understanding context
- `deepseek-coder:6.7b` - Fast and efficient
- `phind-codellama:34b` - Great for complex completions

For chat:
- `mixtral:8x7b` - Strong reasoning and explanation
- `llama2:13b` - Good balance of speed and capability
- `neural-chat:7b` - Fast responses for simple queries
```bash
# Qwen - Powerful 14B model with strong coding capabilities
ollama pull qwen:14b

# CodeLlama - Meta's specialized coding model
ollama pull codellama:13b

# Mixtral - High-performance 8x7B model
ollama pull mixtral:8x7b

# List all installed models
ollama list
```
- Type normally and wait for suggestions
- Press Tab to accept full suggestions
- Use → (right arrow) to accept word by word
- Clear completion cache if suggestions seem stale
- Click the Ollama icon in the sidebar
- Use @ to reference files
- Select code before asking questions
- Toggle workspace context for broader awareness
Access via the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`):
- `Ollama Copilot: Select Default Model` - Change your model
- `Ollama Copilot: Clear Completion Cache` - Reset suggestions
- `Ollama Copilot: Open Chat Panel` - Open chat interface
- `Ollama Copilot: Search Available Models` - View installed models
Settings available in VS Code:
- `ollama.defaultModel`: Your preferred model
- `ollama.apiHost`: Ollama API endpoint (default: `http://localhost:11434`)
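As a rough example, both settings can be placed in your VS Code `settings.json`; the values below are illustrative, so substitute whichever model and host you actually use:

```json
{
  // Example values only - use any model you have pulled locally
  "ollama.defaultModel": "qwen:14b",
  // Change this if Ollama runs on a different host or port
  "ollama.apiHost": "http://localhost:11434"
}
```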
- Verify Ollama is running (`ollama serve`)
- Check that a model is selected (Command Palette > Select Default Model)
- Clear completion cache
- Ensure cursor is at a valid completion point
- Try a smaller model
- Clear completion cache
- Check system resources
- Reduce context size if needed
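One way to check resource usage is with the Ollama CLI itself; recent versions include `ollama ps`, which lists loaded models and their memory footprint (shown here as a sketch, exact output varies by version):

```bash
# Show models currently loaded into memory
ollama ps

# Pull a smaller, faster model if the current one is too heavy
ollama pull deepseek-coder:6.7b
```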
- Confirm Ollama is running
- Check the `ollama.apiHost` setting
- Verify port 11434 is accessible (see the quick check below)
- Restart VS Code if needed
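A quick way to confirm the API endpoint is reachable (assuming `curl` is available) is to query the model listing endpoint:

```bash
# Returns a JSON list of installed models if Ollama is reachable
curl http://localhost:11434/api/tags
```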
We welcome contributions! Please check our GitHub repository for:
- Bug reports
- Feature requests
- Pull requests
- Documentation improvements