@justyns commented on Oct 21, 2025

Testing out Claude Code's GitHub app for this.


Implement dynamic model fetching from Vertex AI's Model Garden API with multi-layer caching to minimize API calls and improve performance.

## Features

- **Dynamic Model Discovery**: Automatically fetch available Gemini models from the Vertex AI Model Garden API
- **Multi-Layer Caching**:
  - In-memory cache for the duration of the process
  - File-based cache (`~/.cache/llm-vertex/models.json`) with a configurable TTL (default 24 hours)
- **Smart Fallback**: Falls back to a hardcoded model list if the API is unavailable or credentials are not configured
- **Configuration Options** (see the sketch below):
  - `VERTEX_DISABLE_DYNAMIC_MODELS`: Disable dynamic fetching
  - `VERTEX_CACHE_TTL`: Configure cache expiry time in seconds
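
A minimal sketch of how these options and the file cache could fit together, assuming the environment variable names above and a cache file containing a timestamp plus a model list (the exact JSON keys and truthiness rules are inferred from the review snippets below and may differ from the merged code):

```python
import json
import os
import time
from pathlib import Path

# Cache location and default TTL from the feature list above.
CACHE_PATH = Path.home() / ".cache" / "llm-vertex" / "models.json"
DEFAULT_TTL = 24 * 60 * 60  # 24 hours

def dynamic_models_disabled() -> bool:
    """True if VERTEX_DISABLE_DYNAMIC_MODELS is set to a truthy value."""
    return os.environ.get("VERTEX_DISABLE_DYNAMIC_MODELS", "").lower() in ("1", "true", "yes")

def cache_ttl() -> int:
    """Cache expiry in seconds, overridable via VERTEX_CACHE_TTL."""
    try:
        return int(os.environ.get("VERTEX_CACHE_TTL", DEFAULT_TTL))
    except ValueError:
        return DEFAULT_TTL

def read_cached_models():
    """Return the cached model list, or None if the cache is missing or stale."""
    try:
        cache_data = json.loads(CACHE_PATH.read_text())
    except (OSError, ValueError):
        return None
    if time.time() - cache_data.get("cached_time", 0) > cache_ttl():
        return None
    return cache_data.get("models")
```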

## Implementation

- Use `ModelGardenServiceClient` from the `google-cloud-aiplatform` `v1beta1` API to list publisher models (see the sketch after this list)
- Filter for Gemini models only
- Cache results to avoid repeated API calls across CLI executions
- DRY refactoring with a `FALLBACK_MODELS` constant and a `_cache_and_return()` helper
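
A sketch of the listing step, assuming the `aiplatform_v1beta1` client from the `google-cloud-aiplatform` package; caching, the `FALLBACK_MODELS` fallback, and error handling are omitted, and the function name is illustrative rather than the plugin's:

```python
from google.cloud import aiplatform_v1beta1

def fetch_gemini_models() -> list[str]:
    """List Model Garden publisher models and keep only Gemini model IDs."""
    client = aiplatform_v1beta1.ModelGardenServiceClient()
    models: list[str] = []
    # ListPublisherModels takes the publisher resource name as its parent.
    for model in client.list_publisher_models(parent="publishers/google"):
        # model.name looks like "publishers/google/models/gemini-1.5-pro".
        model_id = model.name.split("/")[-1]
        if model_id.startswith("gemini"):
            models.append(model_id)
    return models
```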

## Benefits

- Always up-to-date with the latest Gemini models without plugin updates
- Fast CLI startup (no API calls within cache TTL)
- Reduced API costs
- Works offline if cache exists
- Graceful degradation on failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@justyns requested a review from Copilot on October 21, 2025 at 07:14

Copilot AI left a comment


Pull Request Overview

This PR implements dynamic model discovery from Vertex AI's Model Garden API with multi-layer caching to reduce API calls and improve performance.

Key changes:

  • Dynamic fetching of available Gemini models from Vertex AI API with smart fallback to hardcoded list
  • Two-tier caching system: in-memory cache for process duration and file-based cache with configurable TTL
  • Configuration options to disable dynamic fetching and customize cache expiration

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `llm_vertex.py` | Adds dynamic model discovery logic with caching functions and updated model registration |
| `tests/test_llm_vertex.py` | Adds comprehensive test coverage for caching behavior, API fallbacks, and configuration options |
| `README.md` | Updates documentation to describe dynamic model discovery and configuration options |


```python
if time.time() - cached_time > cache_ttl:
    return None

return cache_data.get('models') or None
```

Copilot AI Oct 21, 2025


The expression `cache_data.get('models') or None` is redundant. If the `'models'` key is missing, `get()` already returns `None`. If it exists but contains an empty list, this will incorrectly convert it to `None`. Either return `cache_data.get('models')` directly, or use `cache_data.get('models', None)` for clarity.

Suggested change:

```diff
- return cache_data.get('models') or None
+ return cache_data.get('models')
```


```python
except Exception as e:
    # If any error occurs reading cache, just return None
    print(f"Warning: Could not read cache file: {e}")
```

Copilot AI Oct 21, 2025


Using `print()` for warnings in library code is not ideal. Consider using Python's `logging` module or the `warnings` module to allow users to control log output and severity levels.
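
For example, a module-level logger would let callers filter or silence these warnings with standard logging configuration (a sketch, not the plugin's code):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def read_cache(cache_path: Path):
    try:
        return json.loads(cache_path.read_text())
    except Exception as e:
        # Routed through the logging framework instead of stdout, so users
        # can control the output destination and severity.
        logger.warning("Could not read cache file: %s", e)
        return None
```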


```python
except Exception as e:
    # If we can't write cache, just log and continue
    print(f"Warning: Could not write cache file: {e}")
```

Copilot AI Oct 21, 2025


Using `print()` for warnings in library code is not ideal. Consider using Python's `logging` module or the `warnings` module to allow users to control log output and severity levels.

```python
    # Extract model name from the full path (publishers/google/models/gemini-xxx)
    models.append(model.name.split('/')[-1])
except Exception as e:
    print(f"Warning: Could not fetch models from Vertex AI: {e}")
```

Copilot AI Oct 21, 2025


Using `print()` for warnings in library code is not ideal. Consider using Python's `logging` module or the `warnings` module to allow users to control log output and severity levels.

```python
    return _cache_and_return(models if models else FALLBACK_MODELS)

except Exception as e:
    print(f"Warning: Could not fetch models dynamically: {e}")
```

Copilot AI Oct 21, 2025


Using `print()` for warnings in library code is not ideal. Consider using Python's `logging` module or the `warnings` module to allow users to control log output and severity levels.
