Add dynamic model discovery with file-based caching #8
base: master
Conversation
Implement dynamic model fetching from Vertex AI's Model Garden API with multi-layer caching to minimize API calls and improve performance.

## Features

- **Dynamic Model Discovery**: Automatically fetch available Gemini models from the Vertex AI Model Garden API
- **Multi-Layer Caching**:
  - In-memory cache for the duration of the process
  - File-based cache (~/.cache/llm-vertex/models.json) with configurable TTL (default 24 hours)
- **Smart Fallback**: Falls back to the hardcoded model list if the API is unavailable or credentials are not configured
- **Configuration Options**:
  - VERTEX_DISABLE_DYNAMIC_MODELS: Disable dynamic fetching
  - VERTEX_CACHE_TTL: Configure cache expiry time in seconds

## Implementation

- Use ModelGardenServiceClient from google.cloud.aiplatform_v1beta1 to list publisher models
- Filter for Gemini models only
- Cache results to avoid repeated API calls across CLI executions
- DRY refactoring with a FALLBACK_MODELS constant and a _cache_and_return() helper

## Benefits

- Always up to date with the latest Gemini models without plugin updates
- Fast CLI startup (no API calls within the cache TTL)
- Reduced API costs
- Works offline if a cache exists
- Graceful degradation on failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
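For readers following along, here is a minimal sketch of what a file-based model cache with a TTL might look like. The helper names (`_read_cached_models`, `_write_cached_models`) and the constants are illustrative assumptions based on the description above, not a verbatim excerpt of llm_vertex.py:

```python
import json
import os
import time
from pathlib import Path

# Illustrative defaults; the plugin's actual constants may differ.
CACHE_PATH = Path.home() / ".cache" / "llm-vertex" / "models.json"
DEFAULT_TTL = 24 * 60 * 60  # 24 hours, overridable via VERTEX_CACHE_TTL


def _read_cached_models(cache_ttl=None):
    """Return the cached model list, or None if missing, stale, or unreadable."""
    ttl = cache_ttl if cache_ttl is not None else int(os.environ.get("VERTEX_CACHE_TTL", DEFAULT_TTL))
    try:
        with open(CACHE_PATH) as f:
            cache_data = json.load(f)
        if time.time() - cache_data["timestamp"] > ttl:
            return None  # cache is older than the TTL
        return cache_data.get("models")
    except Exception:
        return None  # any read/parse error falls back to a fresh fetch


def _write_cached_models(models):
    """Persist the model list with a timestamp; failures are non-fatal."""
    try:
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        with open(CACHE_PATH, "w") as f:
            json.dump({"timestamp": time.time(), "models": models}, f)
    except Exception:
        pass
```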
Pull Request Overview
This PR implements dynamic model discovery from Vertex AI's Model Garden API with multi-layer caching to reduce API calls and improve performance.
Key changes:
- Dynamic fetching of available Gemini models from Vertex AI API with smart fallback to hardcoded list
- Two-tier caching system: in-memory cache for process duration and file-based cache with configurable TTL
- Configuration options to disable dynamic fetching and customize cache expiration
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| llm_vertex.py | Adds dynamic model discovery logic with caching functions and updated model registration |
| tests/test_llm_vertex.py | Adds comprehensive test coverage for caching behavior, API fallbacks, and configuration options |
| README.md | Updates documentation to describe dynamic model discovery and configuration options |
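As a rough illustration of the fetch-with-fallback flow described above, the sketch below lists Gemini publisher models via the Model Garden API (assuming the google-cloud-aiplatform v1beta1 client) and degrades to a hardcoded list on any failure. The fallback entries and the function name are assumptions, not the plugin's exact code:

```python
import os

from google.cloud import aiplatform_v1beta1

# Assumed fallback list; the real FALLBACK_MODELS constant in llm_vertex.py may differ.
FALLBACK_MODELS = ["gemini-1.5-pro", "gemini-1.5-flash"]


def fetch_gemini_models():
    """Return Gemini model IDs from the Model Garden API, or FALLBACK_MODELS on any error."""
    if os.environ.get("VERTEX_DISABLE_DYNAMIC_MODELS"):
        return FALLBACK_MODELS
    try:
        client = aiplatform_v1beta1.ModelGardenServiceClient()
        request = aiplatform_v1beta1.ListPublisherModelsRequest(parent="publishers/google")
        models = []
        for model in client.list_publisher_models(request=request):
            # model.name looks like publishers/google/models/gemini-xxx
            model_id = model.name.split("/")[-1]
            if model_id.startswith("gemini"):
                models.append(model_id)
        return models or FALLBACK_MODELS
    except Exception:
        return FALLBACK_MODELS
```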
```python
        if time.time() - cached_time > cache_ttl:
            return None

        return cache_data.get('models') or None
```
Copilot AI · Oct 21, 2025
The expression cache_data.get('models') or None is redundant. If 'models' key is missing, get() already returns None. If it exists but contains an empty list, this will incorrectly convert it to None. Either return cache_data.get('models') directly, or use cache_data.get('models', None) for clarity.
Suggested change:

```diff
-        return cache_data.get('models') or None
+        return cache_data.get('models')
```
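To illustrate the point, a hypothetical snippet (not from the plugin) showing how `or None` collapses an empty cached list:

```python
cache_data = {"models": []}

# `or None` turns an empty cached list into None, forcing a refetch:
print(cache_data.get("models") or None)  # -> None
# Returning the value directly preserves the distinction:
print(cache_data.get("models"))          # -> []
```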
```python
    except Exception as e:
        # If any error occurs reading cache, just return None
        print(f"Warning: Could not read cache file: {e}")
```
Copilot AI · Oct 21, 2025
Using print() for warnings in library code is not ideal. Consider using Python's logging module or the warnings module to allow users to control log output and severity levels.
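For example, a module-level logger along these lines would let callers control verbosity; this is a sketch, and `_write_cache` is a hypothetical helper, not the plugin's current code:

```python
import logging
from pathlib import Path

logger = logging.getLogger("llm_vertex")


def _write_cache(path: Path, payload: str) -> None:
    """Write the cache file, logging (rather than printing) on failure."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(payload)
    except Exception as e:
        # Emitted at WARNING level; silent unless the host application configures logging.
        logger.warning("Could not write cache file %s: %s", path, e)
```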
```python
    except Exception as e:
        # If we can't write cache, just log and continue
        print(f"Warning: Could not write cache file: {e}")
```
Copilot AI · Oct 21, 2025
Using print() for warnings in library code is not ideal. Consider using Python's logging module or the warnings module to allow users to control log output and severity levels.
```python
            # Extract model name from the full path (publishers/google/models/gemini-xxx)
            models.append(model.name.split('/')[-1])
    except Exception as e:
        print(f"Warning: Could not fetch models from Vertex AI: {e}")
```
Copilot AI · Oct 21, 2025
Using print() for warnings in library code is not ideal. Consider using Python's logging module or the warnings module to allow users to control log output and severity levels.
```python
        return _cache_and_return(models if models else FALLBACK_MODELS)

    except Exception as e:
        print(f"Warning: Could not fetch models dynamically: {e}")
```
Copilot AI · Oct 21, 2025
Using print() for warnings in library code is not ideal. Consider using Python's logging module or the warnings module to allow users to control log output and severity levels.
Testing out Claude Code's GitHub app for this.