Welcome to langchain-oci! This sample will get you up and running with Oracle Cloud Infrastructure's Generative AI service integrated with LangChain.
By the end of this sample, you'll be able to:
- Configure authentication for OCI Generative AI
- Create your first chat conversation
- Understand providers and model selection
- Use streaming responses for real-time output
To follow along, you'll need:
- An OCI account with access to the Generative AI service
- Python 3.9 or higher
- OCI CLI configured (for API key authentication)
| Concept | Description |
|---|---|
| `ChatOCIGenAI` | Main chat model class |
| Authentication | 4 methods: API Key, Instance Principal, Resource Principal, Session Token |
| Providers | Meta, Cohere, Google (Gemini), xAI (Grok) |
| `invoke()` | Send a message and get a response |
| `stream()` | Get streaming responses |
Install the packages:

```bash
pip install langchain-oci oci
```

If you haven't already, set up the OCI CLI:

```bash
oci setup config
```

This creates `~/.oci/config` with your credentials. The default profile is named `DEFAULT`.
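After setup, `~/.oci/config` contains a profile section along these lines (all values below are placeholders):

```ini
[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..example
region=us-chicago-1
```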
Let's start with the simplest possible example:
```python
from langchain_oci import ChatOCIGenAI

# Create a chat model
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
)

# Send a message
response = llm.invoke("What is the capital of France?")
print(response.content)
```

Output:

```
The capital of France is Paris.
```
| Parameter | Required | Description |
|---|---|---|
| `model_id` | Yes | The model to use (e.g., `meta.llama-3.3-70b-instruct`) |
| `service_endpoint` | Yes | Regional endpoint for the GenAI service |
| `compartment_id` | Yes | Your OCI compartment OCID |
| `auth_type` | No | Authentication method (default: `API_KEY`) |
| `auth_profile` | No | Profile name in `~/.oci/config` (default: `DEFAULT`) |
| Region | Endpoint |
|---|---|
| Chicago | https://inference.generativeai.us-chicago-1.oci.oraclecloud.com |
| Frankfurt | https://inference.generativeai.eu-frankfurt-1.oci.oraclecloud.com |
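Both endpoints follow a region-keyed URL pattern. If you assume other regions follow the same pattern (verify against the OCI documentation before relying on this), the endpoint can be derived from the region identifier:

```python
# Build a GenAI inference endpoint from an OCI region identifier.
# Assumption: all regional endpoints follow the pattern seen in the
# table above; confirm in the OCI docs for your region.
def genai_endpoint(region: str) -> str:
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"

print(genai_endpoint("us-chicago-1"))
```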
Uses credentials from `~/.oci/config`. This is the default and simplest method:

```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
    auth_type="API_KEY",      # Optional, this is the default
    auth_profile="DEFAULT",   # Optional, uses the DEFAULT profile
)
```

For interactive sessions with temporary credentials:
```bash
# First, authenticate
oci session authenticate --profile-name MY_PROFILE
```

```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
    auth_type="SECURITY_TOKEN",
    auth_profile="MY_PROFILE",
)
```

For applications running on OCI compute instances:
```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
    auth_type="INSTANCE_PRINCIPAL",
)
```

For OCI Functions and other resources:
```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
    auth_type="RESOURCE_PRINCIPAL",
)
```

Note: This is not a comprehensive list. See the OCI Generative AI documentation for all available models.
| Provider | Example Models | Strengths |
|---|---|---|
| Meta | Llama 3.2, 3.3, 4 | Excellent general-purpose, tool calling |
| Cohere | Command R+, Command A | RAG, document processing |
| Google | Gemini 2.0 Flash, 2.5 | Multimodal (PDF, video, audio) |
| xAI | Grok 4 | Fast reasoning, vision |
OpenAI Models: For OpenAI models (GPT-4.1, o1, etc.), use `ChatOCIOpenAI` instead of `ChatOCIGenAI`. See Sample 08: OpenAI Responses API.
```python
# Meta Llama models
"meta.llama-3.3-70b-instruct"          # Latest text model
"meta.llama-3.2-90b-vision-instruct"   # Vision-capable
"meta.llama-4-scout-17b-16e-instruct"  # Llama 4 with parallel tools

# Cohere models
"cohere.command-r-plus"      # Powerful reasoning
"cohere.command-a-03-2025"   # Latest, with vision

# Google Gemini models
"google.gemini-2.5-flash"    # Fast, multimodal

# xAI Grok models
"xai.grok-4"                 # Reasoning and vision
```

The provider is automatically detected from the model ID:
```python
# Auto-detects "meta" provider
llm = ChatOCIGenAI(model_id="meta.llama-3.3-70b-instruct", ...)

# Or specify explicitly
llm = ChatOCIGenAI(model_id="my-custom-model", provider="meta", ...)
```

For multi-turn conversations, use LangChain message types:
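As a mental model for the auto-detection (a sketch of our own, not the library's actual code), you can think of it as matching the prefix before the first dot in the model ID:

```python
from typing import Optional

# Hypothetical prefix-based provider detection. The provider is taken
# as the segment before the first "." in the model ID; IDs without a
# known prefix need an explicit provider= argument.
KNOWN_PROVIDERS = {"meta", "cohere", "google", "xai"}

def detect_provider(model_id: str) -> Optional[str]:
    prefix = model_id.split(".", 1)[0]
    return prefix if prefix in KNOWN_PROVIDERS else None

print(detect_provider("meta.llama-3.3-70b-instruct"))  # meta
print(detect_provider("my-custom-model"))              # None -> specify provider=
```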
```python
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
)

# Multi-turn conversation
messages = [
    SystemMessage(content="You are a helpful cooking assistant."),
    HumanMessage(content="I have chicken, rice, and vegetables."),
    AIMessage(content="Great! You could make a stir-fry or a chicken rice bowl."),
    HumanMessage(content="How do I make a stir-fry?"),
]

response = llm.invoke(messages)
print(response.content)
```

For real-time output, use streaming:
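To keep a conversation going, append each reply and the next user turn to the history before calling `invoke()` again. Here is a runnable sketch of that loop; `FakeLLM` is a stand-in for a configured `ChatOCIGenAI` so the pattern can be shown without OCI credentials, and `(role, content)` tuples are used as a lightweight message form:

```python
# Conversation-loop pattern: the model always receives the full history.
class FakeReply:
    def __init__(self, content):
        self.content = content

class FakeLLM:
    def invoke(self, messages):
        # A real model would generate text; this stub just echoes the turn count.
        return FakeReply(f"(reply to {len(messages)} messages)")

llm = FakeLLM()  # in real code: llm = ChatOCIGenAI(...)
history = [("system", "You are a helpful cooking assistant.")]

for user_turn in ["I have chicken and rice.", "How do I make a stir-fry?"]:
    history.append(("human", user_turn))
    reply = llm.invoke(history)
    history.append(("ai", reply.content))

print(len(history))  # system + 2 human + 2 ai turns
```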
```python
from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
)

# Stream the response
for chunk in llm.stream("Tell me a short story about a robot."):
    print(chunk.content, end="", flush=True)
```

Fine-tune model behavior with `model_kwargs`:
```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
    model_kwargs={
        "temperature": 0.7,   # Creativity (0.0 = deterministic, 1.0 = creative)
        "max_tokens": 500,    # Maximum response length
        "top_p": 0.9,         # Nucleus sampling
        "top_k": 50,          # Top-k sampling
    },
)
```

| Parameter | Range | Effect |
|---|---|---|
| `temperature` | 0.0 - 1.0 | Higher = more creative, lower = more focused |
| `max_tokens` | Model-dependent | Maximum tokens in the response (see OCI docs) |
| `top_p` | 0.0 - 1.0 | Nucleus sampling cutoff |
| `top_k` | Model-dependent | Number of top tokens to consider (typically 1-500, varies by model) |
Note: Maximum token limits vary by model. Check the OCI Generative AI documentation for specific model limits.
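To build intuition for what `temperature` does, here is a toy, self-contained illustration of temperature-scaled softmax over next-token scores (standard sampling math, not OCI internals):

```python
import math

# softmax(logits / T): lower T sharpens the distribution toward the
# top token (near-deterministic); higher T flattens it (more varied).
def softmax_t(logits, t):
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # toy scores for three candidate tokens
low = softmax_t(logits, 0.2)      # sharply peaked on the top token
high = softmax_t(logits, 1.0)     # flatter, more exploratory
print(round(low[0], 3), round(high[0], 3))
```

The top token's probability under low temperature dominates, which is why low-temperature output is more focused and repeatable.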
In this sample, you learned:
- Installation - `pip install langchain-oci oci`
- Basic chat - Using `ChatOCIGenAI` with `invoke()`
- Authentication - 4 methods (API Key, Session Token, Instance Principal, Resource Principal)
- Providers - Meta, Cohere, Google, xAI and their model families
- Conversations - Multi-turn chat with message types
- Streaming - Real-time responses with `stream()`
- Parameters - Fine-tuning with `model_kwargs`
Continue with:
- Sample 02: Vision & Multimodal - Analyze images, PDFs, and videos
- Sample 03: Building AI Agents - Create autonomous agents with tools
| Class/Function | Description |
|---|---|
| `ChatOCIGenAI` | Main chat model class |
| `invoke(input)` | Send messages, get a response |
| `stream(input)` | Stream response chunks |
| `batch(inputs)` | Process multiple inputs |
If authentication fails:
- Verify `~/.oci/config` exists and contains valid credentials
- Check that your profile name matches `auth_profile`
- Ensure your API key hasn't expired

If you get permission or compartment errors:
- Verify `compartment_id` is correct
- Check that you have permissions for the GenAI service in that compartment

If a model isn't found:
- Ensure the model ID is spelled correctly
- Check that the model is available in your region
For text completion (non-chat) use cases, the legacy `OCIGenAI` class is available:

```python
from langchain_oci import OCIGenAI

llm = OCIGenAI(
    model_id="cohere.command-r-plus",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..xxx",
)

# Text completion (not chat)
response = llm.invoke("Complete this sentence: The quick brown fox")
```

Note: For most use cases, prefer `ChatOCIGenAI` over `OCIGenAI`.