-
Notifications
You must be signed in to change notification settings - Fork 10
feat: multimodal endpoint for image to text #144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
blefo
wants to merge
77
commits into
main
Choose a base branch
from
feat-multimodal-endpoint
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 9 commits
Commits
Show all changes
77 commits
Select commit
Hold shift + click to select a range
f4c8197
feat: added gema 27b
blefo 617a590
feat: implement multimodal content support with image URL validation
blefo 941f784
refactor: added multimodal parameter + web search with image in query…
blefo a74acbc
refactor: update chat completion message structure and change model t…
blefo 7fe8325
feat: add Docker Compose configuration for gemma-4b in ci pipeline fo…
blefo ae4422a
fix: ruff format
blefo 7c8b61e
refactor: remove unused import in e2e and unit tests
blefo 92e8ece
test: add rate limit checks to multimodal chat completion tests
blefo 27a91b9
fix: ruff format
blefo 2fcce35
refactor: update message model structure
blefo 3e3cc56
fix: ruff format
blefo 2a9669c
test: enhance chat completion tests with multimodal model integration
blefo 3c62952
fix: web search + multimodal with 3 sources
blefo a52ec27
chore: stop tracking docker/compose/docker-compose.gemma-4b-gpu.ci.ym…
blefo ad4fa11
refactor: clean up imports in tests
blefo f0e5848
fix: add type ignore for role in Message class
blefo 338a5ec
refactor: update Message class content type to use new ChatCompletion…
blefo e85a711
feat: add content extractor utility for processing text and image con…
blefo 758d809
refactor: improve web search handling and enhance message context wit…
blefo fc9ea5d
refactor: integrate content extraction into user query handling and e…
blefo 758dc0d
feat: add functions to handle multimodal content and extract the last…
blefo eaf2e96
refactor: remove deprecated image support handler and enhance multimo…
blefo 7973642
refactor: clean up unused imports in web search and content extractor…
blefo f9b71cd
feat: implement chat completion tests with image support and error ha…
blefo 746edbf
feat: enhance chat completion tests with rate limit configurations an…
blefo 52e3ad9
refactor: streamline rate limiting logic by removing unused wait_for_…
blefo b688bc0
refactor: remove unused import of asyncio in rate limiting module
blefo 49515d0
refactor: simplify rate limiting tests by consolidating success and r…
blefo 9d40015
refactor: remove redundant blank line in web search test file
blefo bbb3d4b
fix: unused imports
blefo a93a72b
refactor: update type annotations in Message class
blefo 83cd373
fix: ruff
blefo 0b3d622
fix: handle None content in message processing and update content ext…
blefo 3288a36
fix: improve multimodal content handling and simplify logic in chat c…
blefo d2f24c9
fix: ruff format
blefo ed8f179
fix: enhance multimodal content detection to specifically check for i…
blefo af31527
fix: ruff format
blefo bd462a7
refactor: streamline user query extraction and enhance web search
blefo d5a11a8
chore: remove gemma model entry from E2E config
blefo 2817955
feat: add gemma-4b-gpu model support and update CI workflow
blefo 3fe98f2
refactor: remove unused import
blefo 2eb2549
test#1: gemma-4 test
blefo d6979a1
fix: ci yml
blefo 0119c3c
Merge branch 'main' into feat-multimodal-endpoint
blefo 9d60103
fix: ruff check
blefo 328fd64
Merge branch 'feat-multimodal-endpoint' of https://github.com/Nillion…
blefo ed4735d
fix: ci flag
blefo 37e5709
fix: ci model
blefo 023dec0
fix: ci gemma configuration
blefo da0602f
test#2: remove llama-1b
blefo fccbd1d
fix: update the script for gemma
blefo 03ca9eb
fix#2
blefo 359f518
fix: add service startup logs
blefo 967eace
fix: update gemma ci config
blefo 8b5a073
fix: added logs for services
blefo e9cc0da
fix: gemma config
blefo 3dcd4e5
fix: gemma config
blefo 86c8e0a
fix: gemma config
blefo cfc2e07
fix: gemma config
blefo 96f78af
fix: gemma config
blefo f1c7b4d
fix: gemma config
blefo f4451ca
fix: gemma config
blefo 25bea10
fix: update gemma config
blefo bd8ba99
fix: gemma config
blefo eb4dbe0
fix: gemma config
blefo eb3f3de
fix: use qwen-2b instead of gemma-4b for ci pipeline
blefo b0f36c6
fix: update qwen config
blefo 7c2b140
fix: qwen config
blefo 4375c03
fix: update qwen config
blefo 471c7cb
fix: config as list
blefo a842da8
fix: qwen config
blefo 9115580
fix: avoid parsing error
blefo 30fb2c9
fix: qwen config format
blefo aa43f24
fix: update config
blefo 8f6b06e
fix: update config
blefo 244d14d
fix: enfore eager
blefo fd9ab43
fix: api model fixes
jcabrero File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| services: | ||
| gemma_27b_gpu: | ||
| image: nillion/nilai-vllm:latest | ||
| deploy: | ||
| resources: | ||
| reservations: | ||
| devices: | ||
| - driver: nvidia | ||
| count: all | ||
| capabilities: [gpu] | ||
| ipc: host | ||
| ulimits: | ||
| memlock: -1 | ||
| stack: 67108864 | ||
| env_file: | ||
| - .env | ||
| restart: unless-stopped | ||
| depends_on: | ||
| etcd: | ||
| condition: service_healthy | ||
| command: > | ||
| --model google/gemma-3-27b-it | ||
| --gpu-memory-utilization 0.95 | ||
| --max-model-len 60000 | ||
| --max-num-batched-tokens 60000 | ||
| --tensor-parallel-size 1 | ||
| --enable-auto-tool-choice | ||
| --tool-call-parser llama3_json | ||
| --uvicorn-log-level warning | ||
| environment: | ||
| - SVC_HOST=gemma_27b_gpu | ||
| - SVC_PORT=8000 | ||
| - ETCD_HOST=etcd | ||
| - ETCD_PORT=2379 | ||
| - TOOL_SUPPORT=true | ||
| - MULTIMODAL_SUPPORT=true | ||
| volumes: | ||
| - hugging_face_models:/root/.cache/huggingface | ||
| healthcheck: | ||
| test: ["CMD", "curl", "-f", "http://localhost:8000/health"] | ||
| interval: 30s | ||
| retries: 3 | ||
| start_period: 60s | ||
| timeout: 10s | ||
| volumes: | ||
| hugging_face_models: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| services: | ||
| gemma_4b_gpu: | ||
| image: nillion/nilai-vllm:latest | ||
| container_name: nilai-gemma_4b_gpu | ||
| deploy: | ||
| resources: | ||
| reservations: | ||
| devices: | ||
| - driver: nvidia | ||
| count: all | ||
| capabilities: [gpu] | ||
| ipc: host | ||
| ulimits: | ||
| memlock: -1 | ||
| stack: 67108864 | ||
| env_file: | ||
| - .env | ||
| restart: unless-stopped | ||
| depends_on: | ||
| etcd: | ||
| condition: service_healthy | ||
| command: > | ||
| --model google/gemma-3-4b-it | ||
| --gpu-memory-utilization 0.7 | ||
| --max-model-len 8192 | ||
| --max-num-batched-tokens 8192 | ||
| --tensor-parallel-size 1 | ||
| --enable-auto-tool-choice | ||
| --tool-call-parser llama3_json | ||
| --uvicorn-log-level warning | ||
| --dtype half | ||
| environment: | ||
| - SVC_HOST=gemma_4b_gpu | ||
| - SVC_PORT=8000 | ||
| - ETCD_HOST=etcd | ||
| - ETCD_PORT=2379 | ||
| - TOOL_SUPPORT=true | ||
| - MULTIMODAL_SUPPORT=true | ||
| - VLLM_USE_V1=1 | ||
| - VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 | ||
| - CUDA_LAUNCH_BLOCKING=1 | ||
| volumes: | ||
| - hugging_face_models:/root/.cache/huggingface | ||
| healthcheck: | ||
| test: ["CMD", "curl", "-f", "http://localhost:8000/health"] | ||
| interval: 30s | ||
| retries: 3 | ||
| start_period: 60s | ||
| timeout: 10s | ||
| volumes: | ||
| hugging_face_models: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| from dataclasses import dataclass | ||
| from typing import List, Optional, Any | ||
| from fastapi import HTTPException | ||
| from nilai_common import Message | ||
|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class MultimodalCheck: | ||
| has_multimodal: bool | ||
| error: Optional[str] = None | ||
|
|
||
|
|
||
| def _extract_url(image_url_field: Any) -> Optional[str]: | ||
| """ | ||
| Support both object-with-attr and dict-like shapes. | ||
| Returns the URL string or None. | ||
| """ | ||
| if image_url_field is None: | ||
| return None | ||
|
|
||
| url = getattr(image_url_field, "url", None) | ||
| if url is not None: | ||
| return url | ||
| if isinstance(image_url_field, dict): | ||
| return image_url_field.get("url") | ||
| return None | ||
|
|
||
|
|
||
| def multimodal_check(messages: List[Message]) -> MultimodalCheck: | ||
| """ | ||
| Single-pass check: | ||
| - detect if any part is type=='image_url' | ||
| - validate that image_url.url exists and is a base64 data URL | ||
| Returns: | ||
| MultimodalCheck(has_multimodal: bool, error: Optional[str]) | ||
| """ | ||
| has_mm = False | ||
|
|
||
| for m in messages: | ||
| content = getattr(m, "content", None) or [] | ||
| for item in content: | ||
| if getattr(item, "type", None) == "image_url": | ||
| has_mm = True | ||
| iu = getattr(item, "image_url", None) | ||
| url = _extract_url(iu) | ||
| if not url: | ||
| return MultimodalCheck( | ||
| True, "image_url.url is required for image_url parts" | ||
| ) | ||
| if not (url.startswith("data:image/") and ";base64," in url): | ||
| return MultimodalCheck( | ||
| True, | ||
| "Only base64 data URLs are allowed for images (data:image/...;base64,...)", | ||
| ) | ||
|
|
||
| return MultimodalCheck(has_mm, None) | ||
|
|
||
|
|
||
| def has_multimodal_content( | ||
| messages: List[Message], precomputed: Optional[MultimodalCheck] = None | ||
| ) -> bool: | ||
| """ | ||
| Check if any message contains multimodal content (image_url parts). | ||
|
|
||
| Args: | ||
| messages: List of messages to check | ||
| precomputed: Optional precomputed result from multimodal_check() to avoid re-iterating | ||
|
|
||
| Returns: | ||
| True if any message contains image_url parts, False otherwise | ||
| """ | ||
| res = precomputed or multimodal_check(messages) | ||
| return res.has_multimodal | ||
|
|
||
|
|
||
| def validate_multimodal_content( | ||
| messages: List[Message], precomputed: Optional[MultimodalCheck] = None | ||
| ) -> None: | ||
| """ | ||
| Validate that multimodal content (image_url parts) follows the required format. | ||
|
|
||
| Args: | ||
| messages: List of messages to validate | ||
| precomputed: Optional precomputed result from multimodal_check() to avoid re-iterating | ||
|
|
||
| Raises: | ||
| HTTPException(400): When image_url parts don't have required URL or use invalid format | ||
| (only base64 data URLs are allowed: data:image/...;base64,...) | ||
| """ | ||
| res = precomputed or multimodal_check(messages) | ||
| if res.error: | ||
| raise HTTPException(status_code=400, detail=res.error) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,15 +1,27 @@ | ||
| import uuid | ||
|
|
||
| from typing import Annotated, Iterable, List, Literal, Optional | ||
| from typing import Annotated, Iterable, List, Literal, Optional, Union | ||
|
|
||
| from openai.types.chat import ChatCompletion, ChatCompletionMessage | ||
| from openai.types.chat.chat_completion import Choice as OpenaAIChoice | ||
| from openai.types.chat import ChatCompletion | ||
| from openai.types.chat import ChatCompletionToolParam | ||
| from openai.types.chat.chat_completion import Choice as OpenaAIChoice | ||
| from pydantic import BaseModel, Field | ||
|
|
||
|
|
||
| class Message(ChatCompletionMessage): | ||
| role: Literal["system", "user", "assistant", "tool"] # type: ignore | ||
|
Comment on lines
-11
to
-12
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. class Message (ChatCompletionMessageParam):
pass |
||
| class ImageUrl(BaseModel): | ||
| url: str | ||
| detail: Optional[str] = "auto" | ||
|
|
||
|
|
||
| class MessageContentItem(BaseModel): | ||
| type: Literal["text", "image_url"] | ||
| text: Optional[str] = None | ||
| image_url: Optional[ImageUrl] = None | ||
|
|
||
|
|
||
| class Message(BaseModel): | ||
| role: Literal["system", "user", "assistant", "tool"] | ||
| content: Union[str, List[MessageContentItem]] | ||
jcabrero marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
|
|
||
| class Choice(OpenaAIChoice): | ||
|
|
@@ -71,6 +83,7 @@ class ModelMetadata(BaseModel): | |
| source: str | ||
| supported_features: List[str] | ||
| tool_support: bool | ||
| multimodal_support: bool = False | ||
|
|
||
|
|
||
| class ModelEndpoint(BaseModel): | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.