feat(sdk): speculative cloud loader decision layer#250
Conversation
Add set_speculative_cloud / is_speculative_cloud_enabled global toggle and per-load ModelLoader::with_speculative_cloud override. will_speculate gates on enabled + resolvable cloud key + model not already cached locally, via a pure speculative_gate helper. Override beats the process-global default; non-registry sources never speculate.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Code Review
This pull request introduces a speculative cloud fallback feature to the SDK, allowing models to be served from the cloud gateway while downloading locally. The reviewer provided valuable feedback on optimizing performance: short-circuiting evaluation in will_speculate to avoid unnecessary disk I/O and environment lookups, using CacheManager directly instead of instantiating a heavy RegistryClient in is_extracted_locally, and leveraging crate::has_api_key() instead of constructing a full CloudConfig in cloud_api_key_present.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| pub fn will_speculate(&self) -> bool { | ||
| speculative_gate( | ||
| self.speculative_enabled(), | ||
| cloud_api_key_present(), | ||
| self.is_extracted_locally(), | ||
| ) | ||
| } |
There was a problem hiding this comment.
The current implementation of will_speculate passes the results of self.speculative_enabled(), cloud_api_key_present(), and self.is_extracted_locally() to speculative_gate. Because Rust uses eager evaluation for function arguments, this means all three checks are always executed, even when speculation is disabled (which is the default).
This results in unnecessary environment variable lookups and disk I/O (via is_extracted_locally checking the local cache directory) on every call to will_speculate.
Using short-circuiting boolean operators (&&) ensures that we only perform the API key check and disk I/O when speculation is actually enabled.
pub fn will_speculate(&self) -> bool {
self.speculative_enabled()
&& cloud_api_key_present()
&& !self.is_extracted_locally()
}There was a problem hiding this comment.
Done in 2a3e3ef — will_speculate now short-circuits with &&, so the key lookup and the local-cache disk check only run when speculation is enabled. Inlined the gate and dropped the speculative_gate helper.
| fn is_extracted_locally(&self) -> bool { | ||
| match &self.source { | ||
| ModelSource::Registry { id, .. } => { | ||
| RegistryClient::from_env().is_ok_and(|client| client.is_extracted(id)) | ||
| } | ||
| _ => true, | ||
| } | ||
| } |
There was a problem hiding this comment.
Instantiating RegistryClient::from_env() just to check if a model is extracted locally is extremely heavy. RegistryClient::from_env() creates a new HTTP agent (ureq::Agent) with connection/thread pools, allocates circuit breakers, and queries environment variables, none of which are needed for a local cache check.
Since RegistryClient::is_extracted simply forwards the call to CacheManager::is_extracted, we can instantiate CacheManager directly to avoid the heavy overhead of creating an HTTP client.
| fn is_extracted_locally(&self) -> bool { | |
| match &self.source { | |
| ModelSource::Registry { id, .. } => { | |
| RegistryClient::from_env().is_ok_and(|client| client.is_extracted(id)) | |
| } | |
| _ => true, | |
| } | |
| } | |
| fn is_extracted_locally(&self) -> bool { | |
| match &self.source { | |
| ModelSource::Registry { id, .. } => { | |
| crate::cache::CacheManager::new().is_ok_and(|cache| cache.is_extracted(id)) | |
| } | |
| _ => true, | |
| } | |
| } |
There was a problem hiding this comment.
Done in 2a3e3ef — is_extracted_locally now uses CacheManager::new() directly instead of RegistryClient::from_env(), avoiding the HTTP agent and circuit-breaker allocation for a pure local cache check.
| fn cloud_api_key_present() -> bool { | ||
| xybrid_core::cloud::CloudConfig::default() | ||
| .resolve_api_key() | ||
| .is_some() | ||
| } |
There was a problem hiding this comment.
Using xybrid_core::cloud::CloudConfig::default() to resolve the API key is inefficient because CloudConfig::default() queries unrelated environment variables (like XYBRID_GATEWAY_URL and XYBRID_PLATFORM_URL) to build the default gateway URL.
Since crate::has_api_key() is already defined in the SDK and performs the exact same check (checking the in-memory key and falling back to the XYBRID_API_KEY environment variable) without constructing a CloudConfig, we can use it here to avoid unnecessary environment lookups.
fn cloud_api_key_present() -> bool {
crate::has_api_key()
}There was a problem hiding this comment.
Done in 2a3e3ef — switched to crate::has_api_key(), which checks the in-memory key and XYBRID_API_KEY without constructing a CloudConfig or resolving the gateway URL.
Address review: resolve the cloud key via has_api_key() instead of constructing a CloudConfig; check the local cache with CacheManager directly instead of spinning up a RegistryClient HTTP agent; and short-circuit will_speculate so the key lookup and disk check are skipped when speculation is disabled. Inline the gate and drop the now-redundant speculative_gate helper.
Summary
Adds the loader-side decision layer for speculative cloud fallback: a process-global
set_speculative_cloud/is_speculative_cloud_enabledtoggle, a per-loadModelLoader::with_speculative_cloudoverride, andwill_speculate, which gates on speculation being enabled, a resolvable cloud API key, and the model not already being cached locally (via the purespeculative_gatehelper). A per-load override beats the global default, and non-registry sources (bundle/directory/HuggingFace) never speculate. This is the loader half of the feature; the cloud-execution routing and background download are tracked separately.Type of Change
Checklist
cargo test)