---
title: ChatOCIGenAI integration
description: Integrate with ChatOCIGenAI chat model using LangChain Python.
---

This doc will help you get started with Oracle Cloud Infrastructure (OCI) Generative AI chat models. OCI Generative AI is a fully managed service providing state-of-the-art, customizable large language models covering a wide range of use cases through a single API. Access ready-to-use pretrained models or create and host fine-tuned custom models on dedicated AI clusters.

For detailed documentation, see the OCI Generative AI documentation and API reference.

## Overview

### Integration details

| Class | Package | Serializable | JS support | Downloads | Version |
| :--- | :--- | :---: | :---: | :---: | :---: |
| ChatOCIGenAI | langchain-oci | beta | | PyPI - Downloads | PyPI - Version |

### Model features

| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ✅ | ✅ | ✅ | ✅ (Gemini) | ✅ (Gemini) | ✅ | ✅ | | |

## Setup

### Installation

```bash
pip install -qU langchain-oci oci
```

Or with uv:

```bash
uv add langchain-oci oci
```

### Credentials

Set up authentication with the OCI CLI (creates `~/.oci/config`):

```bash
oci setup config
```

For other auth methods (session tokens, instance principals), see OCI SDK authentication.
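For reference, a generated `~/.oci/config` has roughly this shape (placeholder values shown; a real file contains your user and tenancy OCIDs, key fingerprint, and key path):

```ini
[DEFAULT]
user=ocid1.user.oc1..<unique_ID>
fingerprint=<your_key_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique_ID>
region=us-chicago-1
```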

## Instantiation

```python
from langchain_oci import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.compartment.oc1..your-compartment-id",
    model_kwargs={"temperature": 0.7, "max_tokens": 500},  # Optional
)
```

Key parameters:

- `model_id` - The model to use (see available models)
- `service_endpoint` - Regional endpoint (`us-chicago-1`, `eu-frankfurt-1`, etc.)
- `compartment_id` - Your OCI compartment OCID
- `model_kwargs` - Model settings like `temperature`, `max_tokens`

## Invocation

```python
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)
```

```text
J'adore la programmation.
```

## Multi-turn Conversations

```python
from langchain.messages import HumanMessage, AIMessage

messages = [
    HumanMessage(content="Hi, I'm Alice."),
    AIMessage(content="Hello Alice! How can I help you today?"),
    HumanMessage(content="What's my name?"),
]

response = llm.invoke(messages)
print(response.content)
```

```text
Your name is Alice.
```

## Streaming

Get responses as they're generated:

```python
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
```

## Async

Use async for concurrent requests or non-blocking applications:

```python
import asyncio

# Run these inside an async context (e.g. a notebook or an `async def`)

# Async generation
response = await llm.ainvoke("What is 2+2?")

# Async streaming
async for chunk in llm.astream("Tell me a story"):
    print(chunk.content, end="")

# Run multiple requests concurrently
results = await asyncio.gather(
    llm.ainvoke("What is 2+2?"),
    llm.ainvoke("What is 3+3?"),
)
```

## Tool Calling

Give models access to external functions (APIs, databases, etc.):

```python
from langchain.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the weather for a city."""
    # In production, call a weather API
    return f"Weather in {city}: 72°F, sunny"

llm_with_tools = llm.bind_tools([get_weather])
response = llm_with_tools.invoke("What's the weather in Chicago?")

# Model decides to call the tool
print(response.tool_calls)
# [{'name': 'get_weather', 'args': {'city': 'Chicago'}, 'id': 'call_1'}]
```
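Inspecting `tool_calls` is only half the loop: your code must execute each call and send the results back to the model (in LangChain, as `ToolMessage`s appended to the conversation). A minimal stdlib sketch of the dispatch step, using plain dicts in the same shape as the `tool_calls` printed above:

```python
def get_weather(city: str) -> str:
    """Stubbed tool; in production, call a weather API."""
    return f"Weather in {city}: 72°F, sunny"

# Registry mapping tool names (as the model emits them) to functions
TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls):
    """Execute each tool call and pair its result with the call id,
    so the results can be routed back to the model."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]
        results.append({"tool_call_id": call["id"],
                        "content": fn(**call["args"])})
    return results

calls = [{"name": "get_weather", "args": {"city": "Chicago"}, "id": "call_1"}]
print(run_tool_calls(calls))
# [{'tool_call_id': 'call_1', 'content': 'Weather in Chicago: 72°F, sunny'}]
```

In a real agent loop you would wrap each result in a `ToolMessage(content=..., tool_call_id=...)`, append it to `messages`, and invoke the model again for the final answer.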

Parallel tools (Llama 4+ only) execute multiple tools simultaneously:

```python
llm = ChatOCIGenAI(model_id="meta.llama-4-scout-17b-16e-instruct", ...)
llm_with_tools = llm.bind_tools(
    [get_weather, get_time],  # get_time is another @tool, defined like get_weather
    parallel_tool_calls=True,
)
# "Weather in Chicago and time in NYC?" → calls both tools at once
```

## Structured Output

Extract data into Pydantic models for type-safe parsing:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

structured_llm = llm.with_structured_output(Person)
result = structured_llm.invoke("John is 30 years old, email john@example.com")

print(result.name)   # "John"
print(result.age)    # 30
print(result.email)  # "john@example.com"
```

## Vision & Multimodal

Analyze images with vision-capable models:

```python
from langchain.messages import HumanMessage
from langchain_oci import ChatOCIGenAI, load_image

llm = ChatOCIGenAI(model_id="meta.llama-3.2-90b-vision-instruct", ...)

message = HumanMessage(content=[
    {"type": "text", "text": "What's in this image?"},
    load_image("./photo.jpg"),  # Or use a URL
])

response = llm.invoke([message])
```

Vision-capable models: Llama 3.2 Vision, Gemini 2.0/2.5, Grok 4, Command A Vision

### Gemini Multimodal (PDF, Video, Audio)

Gemini models process PDFs, videos, and audio:

```python
import base64
from langchain.messages import HumanMessage

llm = ChatOCIGenAI(model_id="google.gemini-2.5-flash", ...)

# Load file as base64
with open("document.pdf", "rb") as f:
    data = base64.b64encode(f.read()).decode()

message = HumanMessage(content=[
    {"type": "text", "text": "Summarize this document"},
    {"type": "media", "data": data, "mime_type": "application/pdf"},
])

response = llm.invoke([message])
```

Supported formats: PDF, MP4/MOV video, MP3/WAV audio (Gemini 2.0/2.5 only)
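Since each media content part needs an explicit `mime_type`, a small helper can build the part from a local file, guessing the type with the stdlib `mimetypes` module. The `media_part` name is illustrative, not part of `langchain-oci`; the dict shape mirrors the example above:

```python
import base64
import mimetypes

def media_part(path: str) -> dict:
    """Build a Gemini media content part: base64 data + guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot guess MIME type for {path}")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    return {"type": "media", "data": data, "mime_type": mime}

print(mimetypes.guess_type("report.pdf")[0])  # application/pdf
print(mimetypes.guess_type("clip.mp4")[0])    # video/mp4
```

A `HumanMessage` content list can then combine a text part with `media_part("document.pdf")` exactly as in the example above.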

## Configuration

Control model behavior with model_kwargs:

```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    model_kwargs={
        "temperature": 0.7,  # Creativity (0-1)
        "max_tokens": 500,   # Response length limit
        "top_p": 0.9,        # Nucleus sampling
    },
    # ... other params
)
```

## Available Models

| Provider | Example Models | Key Features |
| :--- | :--- | :--- |
| Meta | Llama 3.2/3.3/4 (Scout, Maverick) | Vision, parallel tools |
| Google | Gemini 2.0/2.5 Flash, Pro | PDF, video, audio |
| xAI | Grok 3, Grok 4 | Vision, reasoning |
| Cohere | Command R+, Command A | RAG, vision |

See the OCI model catalog for the complete list and regional availability.

## API Reference

For detailed documentation of all ChatOCIGenAI features and configurations, see the API reference.

## Related