Commits (77)
f4c8197
feat: added gemma 27b
blefo Aug 25, 2025
617a590
feat: implement multimodal content support with image URL validation
blefo Aug 25, 2025
941f784
refactor: added multimodal parameter + web search with image in query…
blefo Aug 25, 2025
a74acbc
refactor: update chat completion message structure and change model t…
blefo Aug 26, 2025
7fe8325
feat: add Docker Compose configuration for gemma-4b in ci pipeline fo…
blefo Aug 26, 2025
ae4422a
fix: ruff format
blefo Aug 26, 2025
7c8b61e
refactor: remove unused import in e2e and unit tests
blefo Aug 26, 2025
92e8ece
test: add rate limit checks to multimodal chat completion tests
blefo Aug 26, 2025
27a91b9
fix: ruff format
blefo Aug 26, 2025
2fcce35
refactor: update message model structure
blefo Aug 26, 2025
3e3cc56
fix: ruff format
blefo Aug 26, 2025
2a9669c
test: enhance chat completion tests with multimodal model integration
blefo Aug 26, 2025
3c62952
fix: web search + multimodal with 3 sources
blefo Aug 26, 2025
a52ec27
chore: stop tracking docker/compose/docker-compose.gemma-4b-gpu.ci.ym…
blefo Aug 26, 2025
ad4fa11
refactor: clean up imports in tests
blefo Aug 26, 2025
f0e5848
fix: add type ignore for role in Message class
blefo Aug 26, 2025
338a5ec
refactor: update Message class content type to use new ChatCompletion…
blefo Aug 26, 2025
e85a711
feat: add content extractor utility for processing text and image con…
blefo Aug 26, 2025
758d809
refactor: improve web search handling and enhance message context wit…
blefo Aug 26, 2025
fc9ea5d
refactor: integrate content extraction into user query handling and e…
blefo Aug 26, 2025
758dc0d
feat: add functions to handle multimodal content and extract the last…
blefo Aug 26, 2025
eaf2e96
refactor: remove deprecated image support handler and enhance multimo…
blefo Aug 26, 2025
7973642
refactor: clean up unused imports in web search and content extractor…
blefo Aug 26, 2025
f9b71cd
feat: implement chat completion tests with image support and error ha…
blefo Aug 26, 2025
746edbf
feat: enhance chat completion tests with rate limit configurations an…
blefo Aug 27, 2025
52e3ad9
refactor: streamline rate limiting logic by removing unused wait_for_…
blefo Aug 27, 2025
b688bc0
refactor: remove unused import of asyncio in rate limiting module
blefo Aug 27, 2025
49515d0
refactor: simplify rate limiting tests by consolidating success and r…
blefo Aug 27, 2025
9d40015
refactor: remove redundant blank line in web search test file
blefo Aug 27, 2025
bbb3d4b
fix: unused imports
blefo Aug 27, 2025
a93a72b
refactor: update type annotations in Message class
blefo Aug 27, 2025
83cd373
fix: ruff
blefo Aug 27, 2025
0b3d622
fix: handle None content in message processing and update content ext…
blefo Aug 27, 2025
3288a36
fix: improve multimodal content handling and simplify logic in chat c…
blefo Aug 27, 2025
d2f24c9
fix: ruff format
blefo Aug 27, 2025
ed8f179
fix: enhance multimodal content detection to specifically check for i…
blefo Aug 27, 2025
af31527
fix: ruff format
blefo Aug 27, 2025
bd462a7
refactor: streamline user query extraction and enhance web search
blefo Aug 27, 2025
d5a11a8
chore: remove gemma model entry from E2E config
blefo Aug 27, 2025
2817955
feat: add gemma-4b-gpu model support and update CI workflow
blefo Aug 29, 2025
3fe98f2
refactor: remove unused import
blefo Aug 29, 2025
2eb2549
test#1: gemma-4 test
blefo Aug 29, 2025
d6979a1
fix: ci yml
blefo Aug 29, 2025
0119c3c
Merge branch 'main' into feat-multimodal-endpoint
blefo Aug 29, 2025
9d60103
fix: ruff check
blefo Aug 29, 2025
328fd64
Merge branch 'feat-multimodal-endpoint' of https://github.com/Nillion…
blefo Aug 29, 2025
ed4735d
fix: ci flag
blefo Aug 29, 2025
37e5709
fix: ci model
blefo Aug 29, 2025
023dec0
fix: ci gemma configuration
blefo Aug 29, 2025
da0602f
test#2: remove llama-1b
blefo Aug 29, 2025
fccbd1d
fix: update the script for gemma
blefo Aug 29, 2025
03ca9eb
fix#2
blefo Aug 29, 2025
359f518
fix: add service startup logs
blefo Aug 29, 2025
967eace
fix: update gemma ci config
blefo Aug 29, 2025
8b5a073
fix: added logs for services
blefo Aug 29, 2025
e9cc0da
fix: gemma config
blefo Aug 29, 2025
3dcd4e5
fix: gemma config
blefo Aug 29, 2025
86c8e0a
fix: gemma config
blefo Aug 29, 2025
cfc2e07
fix: gemma config
blefo Aug 29, 2025
96f78af
fix: gemma config
blefo Aug 29, 2025
f1c7b4d
fix: gemma config
blefo Aug 29, 2025
f4451ca
fix: gemma config
blefo Sep 1, 2025
25bea10
fix: update gemma config
blefo Sep 1, 2025
bd8ba99
fix: gemma config
blefo Sep 1, 2025
eb4dbe0
fix: gemma config
blefo Sep 1, 2025
eb3f3de
fix: use qwen-2b instead of gemma-4b for ci pipeline
blefo Sep 1, 2025
b0f36c6
fix: update qwen config
blefo Sep 1, 2025
7c2b140
fix: qwen config
blefo Sep 1, 2025
4375c03
fix: update qwen config
blefo Sep 1, 2025
471c7cb
fix: config as list
blefo Sep 1, 2025
a842da8
fix: qwen config
blefo Sep 1, 2025
9115580
fix: avoid parsing error
blefo Sep 1, 2025
30fb2c9
fix: qwen config format
blefo Sep 1, 2025
aa43f24
fix: update config
blefo Sep 1, 2025
8f6b06e
fix: update config
blefo Sep 1, 2025
244d14d
fix: enforce eager
blefo Sep 1, 2025
fd9ab43
fix: api model fixes
jcabrero Sep 2, 2025
46 changes: 46 additions & 0 deletions docker/compose/docker-compose.gemma-27b-gpu.yml
@@ -0,0 +1,46 @@
services:
  gemma_27b_gpu:
    image: nillion/nilai-vllm:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    ulimits:
      memlock: -1
      stack: 67108864
    env_file:
      - .env
    restart: unless-stopped
    depends_on:
      etcd:
        condition: service_healthy
    command: >
      --model google/gemma-3-27b-it
      --gpu-memory-utilization 0.95
      --max-model-len 60000
      --max-num-batched-tokens 60000
      --tensor-parallel-size 1
      --enable-auto-tool-choice
      --tool-call-parser llama3_json
      --uvicorn-log-level warning
    environment:
      - SVC_HOST=gemma_27b_gpu
      - SVC_PORT=8000
      - ETCD_HOST=etcd
      - ETCD_PORT=2379
      - TOOL_SUPPORT=true
      - MULTIMODAL_SUPPORT=true
    volumes:
      - hugging_face_models:/root/.cache/huggingface
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      retries: 3
      start_period: 60s
      timeout: 10s
volumes:
  hugging_face_models:
92 changes: 92 additions & 0 deletions nilai-api/src/nilai_api/handlers/image_support.py
@@ -0,0 +1,92 @@
from dataclasses import dataclass
from typing import List, Optional, Any
from fastapi import HTTPException
from nilai_common import Message


@dataclass(frozen=True)
class MultimodalCheck:
    has_multimodal: bool
    error: Optional[str] = None


def _extract_url(image_url_field: Any) -> Optional[str]:
    """
    Support both object-with-attr and dict-like shapes.
    Returns the URL string or None.
    """
    if image_url_field is None:
        return None

    url = getattr(image_url_field, "url", None)
    if url is not None:
        return url
    if isinstance(image_url_field, dict):
        return image_url_field.get("url")
    return None


def multimodal_check(messages: List[Message]) -> MultimodalCheck:
    """
    Single-pass check:
    - detect if any part is type=='image_url'
    - validate that image_url.url exists and is a base64 data URL
    Returns:
        MultimodalCheck(has_multimodal: bool, error: Optional[str])
    """
    has_mm = False

    for m in messages:
        content = getattr(m, "content", None) or []
        for item in content:
            if getattr(item, "type", None) == "image_url":
                has_mm = True
                iu = getattr(item, "image_url", None)
                url = _extract_url(iu)
                if not url:
                    return MultimodalCheck(
                        True, "image_url.url is required for image_url parts"
                    )
                if not (url.startswith("data:image/") and ";base64," in url):
                    return MultimodalCheck(
                        True,
                        "Only base64 data URLs are allowed for images (data:image/...;base64,...)",
                    )

    return MultimodalCheck(has_mm, None)


def has_multimodal_content(
    messages: List[Message], precomputed: Optional[MultimodalCheck] = None
) -> bool:
    """
    Check if any message contains multimodal content (image_url parts).

    Args:
        messages: List of messages to check
        precomputed: Optional precomputed result from multimodal_check() to avoid re-iterating

    Returns:
        True if any message contains image_url parts, False otherwise
    """
    res = precomputed or multimodal_check(messages)
    return res.has_multimodal


def validate_multimodal_content(
    messages: List[Message], precomputed: Optional[MultimodalCheck] = None
) -> None:
    """
    Validate that multimodal content (image_url parts) follows the required format.

    Args:
        messages: List of messages to validate
        precomputed: Optional precomputed result from multimodal_check() to avoid re-iterating

    Raises:
        HTTPException(400): When image_url parts don't have required URL or use invalid format
            (only base64 data URLs are allowed: data:image/...;base64,...)
    """
    res = precomputed or multimodal_check(messages)
    if res.error:
        raise HTTPException(status_code=400, detail=res.error)
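A minimal usage sketch of the new helpers (not part of the diff). It assumes MessageContentItem and ImageUrl are exported from nilai_common alongside Message, as defined in api_model.py below; the base64 payload is a truncated placeholder.

# Illustrative only: exercises multimodal_check / validate_multimodal_content in memory.
from nilai_api.handlers.image_support import multimodal_check, validate_multimodal_content
from nilai_common import ImageUrl, Message, MessageContentItem

messages = [
    Message(
        role="user",
        content=[
            MessageContentItem(type="text", text="What is in this picture?"),
            MessageContentItem(
                type="image_url",
                # Only base64 data URLs pass validation; plain http(s) URLs are rejected.
                image_url=ImageUrl(url="data:image/png;base64,iVBORw0KGgo..."),
            ),
        ],
    )
]

result = multimodal_check(messages)  # single pass: detect image parts and validate their URLs
assert result.has_multimodal and result.error is None
validate_multimodal_content(messages, precomputed=result)  # would raise HTTPException(400) on bad input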
34 changes: 21 additions & 13 deletions nilai-api/src/nilai_api/handlers/web_search.py
@@ -11,6 +11,7 @@
    Source,
    WebSearchEnhancedMessages,
    WebSearchContext,
    MessageContentItem,
)
from nilai_common import Message

@@ -152,22 +153,29 @@ async def perform_web_search_async(query: str) -> WebSearchContext:
async def enhance_messages_with_web_search(
    messages: List[Message], query: str
) -> WebSearchEnhancedMessages:
    """Enhance a list of messages with web search context.

    Args:
        messages: List of conversation messages to enhance
        query: Search query to retrieve web search results for

    Returns:
        WebSearchEnhancedMessages containing the original messages with web search
        context prepended as a system message, along with source information
    """
    ctx = await perform_web_search_async(query)
    enhanced = [Message(role="system", content=ctx.prompt)] + messages
    query_source = Source(source="search_query", content=query)

    if not messages or messages[-1].role != "user":
        return WebSearchEnhancedMessages(
            messages=messages, sources=[query_source] + ctx.sources
        )

    web_search_context = f"\n\nWeb search results:\n{ctx.prompt}"

    last = messages[-1]
    items = (
        [MessageContentItem(type="text", text=last.content)]
        if isinstance(last.content, str)
        else list(last.content)
    )
    items.append(MessageContentItem(type="text", text=web_search_context))

    enhanced_messages = list(messages)
    enhanced_messages[-1] = Message(role="user", content=items)

    return WebSearchEnhancedMessages(
        messages=enhanced,
        sources=[query_source] + ctx.sources,
        messages=enhanced_messages, sources=[query_source] + ctx.sources
    )


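A hedged sketch of the new behaviour for reference (not part of the diff): the search context is now appended as an extra text part on the last user message instead of being prepended as a system message. The web-search backend must be reachable for this to run.

# Illustrative only; names are taken from the diff above.
import asyncio

from nilai_api.handlers.web_search import enhance_messages_with_web_search
from nilai_common import Message


async def demo() -> None:
    enhanced = await enhance_messages_with_web_search(
        messages=[Message(role="user", content="What changed in Gemma 3?")],
        query="Gemma 3 release notes",
    )
    # enhanced.messages[-1].content is now a list of MessageContentItem parts:
    # the original user text followed by "\n\nWeb search results:\n...".
    # enhanced.sources starts with Source(source="search_query", content=<query>).
    print(enhanced.sources[0])


asyncio.run(demo())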
12 changes: 12 additions & 0 deletions nilai-api/src/nilai_api/routers/private.py
@@ -6,6 +6,7 @@
from nilai_api.attestation import get_attestation_report
from nilai_api.handlers.nilrag import handle_nilrag
from nilai_api.handlers.web_search import handle_web_search
from nilai_api.handlers.image_support import multimodal_check

from fastapi import APIRouter, Body, Depends, HTTPException, status, Request
from fastapi.responses import StreamingResponse
@@ -211,6 +212,17 @@ async def chat_completion(
            status_code=400,
            detail="Model does not support tool usage, remove tools from request",
        )

    multimodal_result = multimodal_check(req.messages)
    if multimodal_result.has_multimodal:
        if not endpoint.metadata.multimodal_support:
            raise HTTPException(
                status_code=400,
                detail="Model does not support multimodal content, remove image inputs from request",
            )
        if multimodal_result.error:
            raise HTTPException(status_code=400, detail=multimodal_result.error)

    model_url = endpoint.url + "/v1/"

    logger.info(
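From a client's perspective the new guard surfaces as a 400 response. A hypothetical sketch using the OpenAI SDK; the base URL, token, model name, and the assumption of an OpenAI-compatible /v1/chat/completions route are not taken from this diff.

# Hypothetical client call: a plain https image URL (not a base64 data URL) triggers the
# 400 from multimodal_check, and a model without multimodal_support also returns 400.
from openai import OpenAI

client = OpenAI(base_url="https://nilai.example.com/v1", api_key="<token>")

try:
    client.chat.completions.create(
        model="google/gemma-3-27b-it",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                ],
            }
        ],
    )
except Exception as exc:  # openai.BadRequestError in practice
    print(exc)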
12 changes: 10 additions & 2 deletions nilai-models/src/nilai_models/daemon.py
@@ -28,18 +28,26 @@ async def get_metadata(num_retries=30):
            response.raise_for_status()
            response_data = response.json()
            model_name = response_data["data"][0]["id"]
            return ModelMetadata(

            supported_features = ["chat_completion"]
            if SETTINGS.multimodal_support:
                supported_features.append("multimodal")

            metadata = ModelMetadata(
                id=model_name, # Unique identifier
                name=model_name, # Human-readable name
                version="1.0", # Model version
                description="",
                author="", # Model creators
                license="Apache 2.0", # Usage license
                source=f"https://huggingface.co/{model_name}", # Model source
                supported_features=["chat_completion"], # Capabilities
                supported_features=supported_features, # Capabilities
                tool_support=SETTINGS.tool_support, # Tool support
                multimodal_support=SETTINGS.multimodal_support, # Multimodal support
            )

            return metadata

        except Exception as e:
            if not url:
                logger.warning(f"Failed to build url: {e}")
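For orientation, a sketch of the metadata record a multimodal-enabled daemon would now register; the values are placeholders and ModelMetadata is assumed to be importable from nilai_common as used in daemon.py. Only the supported_features / multimodal_support behaviour comes from the diff.

# Illustrative only: with MULTIMODAL_SUPPORT set, "multimodal" is appended to
# supported_features and multimodal_support is advertised on the record.
from nilai_common import ModelMetadata

metadata = ModelMetadata(
    id="google/gemma-3-27b-it",
    name="google/gemma-3-27b-it",
    version="1.0",
    description="",
    author="",
    license="Apache 2.0",
    source="https://huggingface.co/google/gemma-3-27b-it",
    supported_features=["chat_completion", "multimodal"],
    tool_support=True,
    multimodal_support=True,
)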
23 changes: 18 additions & 5 deletions packages/nilai-common/src/nilai_common/api_model.py
@@ -1,15 +1,27 @@
import uuid

from typing import Annotated, Iterable, List, Literal, Optional
from typing import Annotated, Iterable, List, Literal, Optional, Union

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice as OpenaAIChoice
from openai.types.chat import ChatCompletion
from openai.types.chat import ChatCompletionToolParam
from openai.types.chat.chat_completion import Choice as OpenaAIChoice
from pydantic import BaseModel, Field


class Message(ChatCompletionMessage):
    role: Literal["system", "user", "assistant", "tool"] # type: ignore
Review comment from a repository member on lines -11 to -12:

class Message(ChatCompletionMessageParam):
    pass

class ImageUrl(BaseModel):
    url: str
    detail: Optional[str] = "auto"


class MessageContentItem(BaseModel):
    type: Literal["text", "image_url"]
    text: Optional[str] = None
    image_url: Optional[ImageUrl] = None


class Message(BaseModel):
    role: Literal["system", "user", "assistant", "tool"]
    content: Union[str, List[MessageContentItem]]


class Choice(OpenaAIChoice):
@@ -71,6 +83,7 @@ class ModelMetadata(BaseModel):
    source: str
    supported_features: List[str]
    tool_support: bool
    multimodal_support: bool = False


class ModelEndpoint(BaseModel):
2 changes: 2 additions & 0 deletions packages/nilai-common/src/nilai_common/config.py
@@ -8,6 +8,7 @@ class HostSettings(BaseModel):
    etcd_host: str = "localhost"
    etcd_port: int = 2379
    tool_support: bool = False
    multimodal_support: bool = False
    gunicorn_workers: int = 10
    attestation_host: str = "localhost"
    attestation_port: int = 8081
@@ -19,6 +20,7 @@ class HostSettings(BaseModel):
        etcd_host=str(os.getenv("ETCD_HOST", "localhost")),
        etcd_port=int(os.getenv("ETCD_PORT", 2379)),
        tool_support=bool(os.getenv("TOOL_SUPPORT", False)),
        multimodal_support=bool(os.getenv("MULTIMODAL_SUPPORT", False)),
        gunicorn_workers=int(os.getenv("NILAI_GUNICORN_WORKERS", 10)),
        attestation_host=str(os.getenv("ATTESTATION_HOST", "localhost")),
        attestation_port=int(os.getenv("ATTESTATION_PORT", 8081)),
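One behavioural note on the env parsing, illustrated below (not a change made in this PR): bool() on an environment string is truthy for any non-empty value, so MULTIMODAL_SUPPORT=false still evaluates as enabled; the flag is effectively set-or-unset.

# Illustrative only: bool() does not parse "false"/"0"; any non-empty string enables the flag.
import os

os.environ["MULTIMODAL_SUPPORT"] = "false"
print(bool(os.getenv("MULTIMODAL_SUPPORT", False)))  # True

del os.environ["MULTIMODAL_SUPPORT"]
print(bool(os.getenv("MULTIMODAL_SUPPORT", False)))  # False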