feat: on-demand keyed data sources for Breeze Buddy by cmd-err · Pull Request #839 · juspay/clairvoyance

cmd-err · 2026-06-16T22:49:19Z

Adds a pluggable connector layer on top of the inline Google Sheets data source so the agent loads one keyed slice (e.g. a single sheet tab) on demand mid-call, instead of injecting every source up front.

connector registry (type -> connector) with lazy bootstrap so the builtin connector is always registered, even in a cold process
GoogleSheetConnector: keyed fetch / list_keys / cache_key (the only module that knows "tab"/"spreadsheet")
DataSourceRef gains type + mode (eager|on_demand) + connector_config()
load_data_source global builtin: cache-first (prefetched), fail-open keyed load; global fn so it never transitions the node
loader skips on_demand sources in eager injection (keeps context small)
prefetch warms every tab for on_demand sources (first load is a hit)
24 tests (registry, connector, model, handler, prefetch, cold-process wiring)

Summary by CodeRabbit

New Features
- Voice templates now support Google Sheets data sources with eager (preloaded) or on-demand (loaded dynamically) modes.
- Admin users can explore sheets, discover tabs and columns, and preview data via new discovery endpoints.
- Fetched data is automatically cached and formatted as Markdown tables, CSV, or JSON for AI context.

Adds a pluggable connector layer on top of the inline Google Sheets data source so the agent loads one keyed slice (e.g. a single sheet tab) on demand mid-call, instead of injecting every source up front. - connector registry (type -> connector) with lazy bootstrap so the builtin connector is always registered, even in a cold process - GoogleSheetConnector: keyed fetch / list_keys / cache_key (the only module that knows "tab"/"spreadsheet") - DataSourceRef gains `type` + `mode` (eager|on_demand) + connector_config() - load_data_source global builtin: cache-first (prefetched), fail-open keyed load; global fn so it never transitions the node - loader skips on_demand sources in eager injection (keeps context small) - prefetch warms every tab for on_demand sources (first load is a hit) - 24 tests (registry, connector, model, handler, prefetch, cold-process wiring) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-16T22:49:33Z

Walkthrough

Adds Google Sheets as an inline data source for Breeze Buddy templates. Introduces DataSourceRef types, a DataSourceConnector protocol with a registry, a Google Sheets service module, eager loading at template-load time, an LLM-callable load_data_source on-demand handler, a Redis prefetch manager triggered at dispatch, three admin sheet-discovery API endpoints, and full database persistence via a new data_sources JSONB column.

Changes

Google Sheets Data Source Integration

Layer / File(s)	Summary
Data source types and connector protocol `app/ai/voice/agents/breeze_buddy/template/types.py`, `app/services/data_sources.py`	Adds `DataSourceFormat`, `DataSourceMode`, `DataSourceRef` Pydantic models and extends `TemplateModel`, `CreateTemplateRequest`, `ReplaceTemplateRequest` with `data_sources`. Defines the `DataSourceConnector` `Protocol`, `DATA_SOURCE_UNAVAILABLE` sentinel, connector registry, lazy bootstrap, `register_connector`, and `get_connector`.
Google Sheets service and connector `app/services/google/sheets.py`, `app/services/google/sheets_connector.py`, `pyproject.toml`	Adds full Google Sheets API service (auth session, `extract_spreadsheet_id`, `list_tabs`, `get_column_headers`, `fetch_sheet_data`, row formatters, `fetch_formatted`). Implements `GoogleSheetConnector` mapping the `DataSourceConnector` protocol to the Sheets service with deterministic Redis cache keys and self-registration at import time. Adds `google-api-python-client>=2.86.0` dependency.
Eager data source injection in template loader `app/ai/voice/agents/breeze_buddy/template/loader.py`	Adds `eager_data_sources` filter and `_fetch_one_data_source` method to `FlowConfigLoader`. During `load_template`, eager sources are fetched concurrently and injected into `template_vars` under `datasource_<name>` keys with Redis read-through caching and a 5s timeout.
On-demand handler, prefetch manager, and dispatch wiring `app/ai/voice/agents/breeze_buddy/handlers/internal/load_data_source.py`, `app/ai/voice/agents/breeze_buddy/handlers/internal/builtin_dispatcher.py`, `app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py`, `app/ai/voice/agents/breeze_buddy/dispatch/worker.py`	Implements `load_data_source` LLM-callable handler (Redis fast path, connector fetch with 5s timeout, fail-open) registered in `BUILTIN_HANDLERS`. Adds `data_source_prefetch` manager pre-warming Redis for all sources concurrently at dispatch time. `worker.py` schedules `prefetch_data_sources` as an asyncio task when a template has `data_sources`.
Sheet discovery admin API endpoints `app/schemas/breeze_buddy/data_source.py`, `app/api/routers/breeze_buddy/data_sources/__init__.py`, `app/api/routers/breeze_buddy/data_sources/handlers.py`, `app/api/routers/breeze_buddy/__init__.py`	Adds `TabsResponse`, `ColumnsResponse`, `PreviewResponse` schemas. Implements three async handlers with URL validation (HTTP 400 on invalid URL) and admin RBAC. Wires GET `/data-sources/sheets/tabs`, `/columns`, and `/preview` into the Breeze Buddy router.
Database persistence and template API propagation `app/database/migrations/034_add_data_sources_to_template.sql`, `app/database/queries/breeze_buddy/template.py`, `app/database/accessor/breeze_buddy/template.py`, `app/database/decoder/breeze_buddy/template.py`, `app/api/routers/breeze_buddy/templates/handlers.py`, `app/ai/voice/agents/breeze_buddy/template/field_reference.json`	Adds nullable `data_sources` JSONB column to the `template` table. Extends all SELECT/INSERT/UPDATE queries to include `data_sources`. Updates `create_template` and `replace_template` accessor signatures with JSON serialization. Adds `data_sources` decoding and `TemplateModel` construction. Forwards `data_sources` via `model_dump()` in template API handlers. Updates field reference docs.
Tests `tests/test_data_source_connectors.py`, `tests/test_google_sheets.py`	`test_data_source_connectors.py` covers registry behavior, `GoogleSheetConnector` paths, `DataSourceRef` defaults, `load_data_source` cache hit/miss/fail-open, dispatcher registration, eager filter, and prefetch routing. `test_google_sheets.py` covers URL parsing, row formatters with escaping, variable namespacing, and async Sheets API functions via mocked session.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Admin Client
    participant API as FastAPI Router
    participant SheetsHandler as data_sources/handlers.py
    participant SheetsService as sheets.py
    participant GCP as Google Sheets API

    rect rgba(100, 149, 237, 0.5)
        note over Client,GCP: Template Configuration Discovery
        Client->>API: GET /data-sources/sheets/tabs?spreadsheet_url=...
        API->>SheetsHandler: list_tabs_handler(url, user)
        SheetsHandler->>SheetsService: extract_spreadsheet_id(url)
        SheetsService-->>SheetsHandler: spreadsheet_id (or HTTP 400)
        SheetsHandler->>SheetsService: list_tabs(spreadsheet_id)
        SheetsService->>GCP: GET spreadsheets/{id} via thread pool
        GCP-->>SheetsService: sheet metadata
        SheetsService-->>SheetsHandler: [tab names]
        SheetsHandler-->>Client: TabsResponse
    end

    rect rgba(144, 238, 144, 0.5)
        note over Client,GCP: Runtime Voice Call Dispatch
        Client->>API: POST /calls (template with data_sources)
        API->>Worker: dispatch call
        Worker->>Worker: asyncio.create_task(prefetch_data_sources(template))
        Worker->>FlowConfigLoader: load_template()
        FlowConfigLoader->>SheetsService: fetch_formatted() for EAGER sources
        SheetsService->>GCP: fetch via thread pool
        GCP-->>SheetsService: row data
        SheetsService-->>FlowConfigLoader: formatted string
        FlowConfigLoader->>Redis: setex(cache_key, TTL, content)
        FlowConfigLoader-->>Worker: template_vars injected with datasource_<name>
    end

    rect rgba(255, 165, 0, 0.5)
        note over Client,GCP: LLM On-Demand Data Load
        Worker->>LLM: conversation with template context
        LLM->>load_data_source: call(source="prices", key="Sheet1")
        load_data_source->>Redis: get(cache_key)
        alt cache hit
            Redis-->>load_data_source: cached content
        else cache miss
            load_data_source->>GoogleSheetConnector: fetch(config, key)
            GoogleSheetConnector->>SheetsService: fetch_formatted()
            SheetsService->>GCP: GET values/{range}
            GCP-->>SheetsService: values
            SheetsService-->>load_data_source: formatted string
            load_data_source->>Redis: setex(cache_key, 60, content)
        end
        load_data_source-->>LLM: {status, source, key, content}
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 A bunny hops through spreadsheets wide,
Fetching tabs and columns with pride!
Redis caches warm the cache key trail,
Eager loads and on-demand prevail.
The LLM asks, the connector replies—
🌿 Google Sheets data, no surprise!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.03% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: introducing on-demand keyed data sources for Breeze Buddy, which is the core feature described throughout the PR objectives and file changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

coderabbitai

Actionable comments posted: 16

🧹 Nitpick comments (4)

app/api/routers/breeze_buddy/data_sources/__init__.py (1)

31-34: ⚡ Quick win

Add explicit return type hints to route handlers.

get_sheet_tabs, get_sheet_columns, and preview_sheet are missing return annotations. Add -> TabsResponse, -> ColumnsResponse, and -> PreviewResponse for guideline compliance and clearer contracts.

Proposed fix

 async def get_sheet_tabs(
     spreadsheet_url: str = Query(..., description="Full Google Sheets URL"),
     current_user: UserInfo = Depends(get_current_user_with_rbac),
-):
+) -> TabsResponse:
@@
 async def get_sheet_columns(
@@
     current_user: UserInfo = Depends(get_current_user_with_rbac),
-):
+) -> ColumnsResponse:
@@
 async def preview_sheet(
@@
     current_user: UserInfo = Depends(get_current_user_with_rbac),
-):
+) -> PreviewResponse:

As per coding guidelines, “Add type hints on all function signatures.”

Also applies to: 41-47, 54-62

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/routers/breeze_buddy/data_sources/__init__.py` around lines 31 - 34,
Add explicit return type annotations to the three async route handler functions
to comply with coding guidelines. For the function get_sheet_tabs, add the
return type hint TabsResponse. For get_sheet_columns (located around lines
41-47), add the return type hint ColumnsResponse. For preview_sheet (located
around lines 54-62), add the return type hint PreviewResponse. Each return type
annotation should be placed after the closing parenthesis of the function
parameters and before the colon that precedes the function body.

Source: Coding guidelines

app/ai/voice/agents/breeze_buddy/template/loader.py (1)

341-342: ⚡ Quick win

Silent exception swallowing hinders debugging.

The try-except-pass blocks suppress all Redis errors without any logging. While fail-open semantics are correct here, adding a debug-level log would help operators diagnose intermittent cache issues.

Proposed fix

         try:
             redis = await get_redis_service()
             cached = await redis.get(cache_key)
             if cached:
                 return cached
-        except Exception:
-            pass
+        except Exception as exc:
+            logger.debug("Cache read failed for '%s': %s", ref.name, exc)

             try:
                 redis = await get_redis_service()
                 await redis.setex(cache_key, content, ttl_seconds=60)
-            except Exception:
-                pass
+            except Exception as exc:
+                logger.debug("Cache write failed for '%s': %s", ref.name, exc)

Also applies to: 367-368

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 341 - 342,
The except Exception blocks with only pass statements suppress all Redis errors
without any logging, making it difficult to debug intermittent cache issues.
Replace the pass statements in both exception handlers (at lines 341-342 and
367-368) with a debug-level log statement that captures the actual exception
details. This allows operators to diagnose cache-related problems while
maintaining the fail-open semantics.

app/database/queries/breeze_buddy/template.py (1)

183-187: ⚡ Quick win

Metadata list query is over-fetching data_sources.

Line 186 pulls data_sources, but the accessor maps list results into TemplateMetadata without this field. This adds unnecessary JSONB I/O on a hot listing path.

Suggested patch

-               name, is_active, supported_channels, data_sources, created_at, updated_at
+               name, is_active, supported_channels, created_at, updated_at

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/database/queries/breeze_buddy/template.py` around lines 183 - 187, The
SELECT statement in the template metadata list query is unnecessarily fetching
the data_sources field, which is a JSONB column that adds I/O overhead. Remove
data_sources from the SELECT clause on line 186 since the TemplateMetadata
object that receives these query results does not have a corresponding field for
it. This will improve performance on this frequently-used listing endpoint by
eliminating unnecessary JSONB data retrieval.

pyproject.toml (1)

26-26: Remove the unused google-api-python-client dependency.

No imports of googleapiclient exist in the codebase. Since the Sheets implementation uses google-auth + direct REST calls instead, this dependency adds unnecessary supply-chain and startup overhead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 26, Remove the unused google-api-python-client
dependency from the pyproject.toml file. Since the codebase does not import
googleapiclient and the Sheets implementation uses google-auth with direct REST
calls instead, delete the line containing the google-api-python-client>=2.86.0
dependency declaration to reduce supply-chain and startup overhead.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/ai/voice/agents/breeze_buddy/handlers/internal/load_data_source.py`:
- Around line 79-91: The Redis method calls in the load_data_source function are
missing the required namespace parameter according to coding guidelines. Update
the redis.get() call that retrieves the cached value using cache_key to include
the namespace parameter as specified in the guidelines. Additionally, update the
redis.setex() call around line 113 to also include the namespace parameter in
both locations where Redis methods are invoked.

In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py`:
- Around line 79-82: The redis.setex() method calls in the file are missing the
required namespace parameter according to coding guidelines. Locate all
redis.setex() calls (at line 82 in the try block and at lines 111-114 in the
exception handler) and add the namespace parameter to each call to comply with
the coding standards for Redis operations.
- Around line 37-45: The _cache_key() function duplicates cache key generation
logic and causes inconsistency between _prefetch_one() and _prefetch_on_demand()
methods, where _prefetch_one uses _cache_key(ref) and _prefetch_on_demand uses
connector.cache_key(config, key). Remove the _cache_key() function entirely and
refactor _prefetch_one() to use connector.cache_key() with the same parameters
passed to _prefetch_on_demand() to ensure both methods generate identical cache
keys for the same data source.

In `@app/ai/voice/agents/breeze_buddy/template/field_reference.json`:
- Line 21: The description field for the data_sources object in
field_reference.json incorrectly documents how template variables are injected
at runtime. Update the description to accurately reflect the actual runtime
behavior: eager data sources are injected as {datasource_<name>} template
variables (not {name}), and on-demand sources are not injected wholesale into
the LLM context. Correct the documentation to match the actual template variable
naming convention and clarify the distinction between eager and on-demand source
injection behavior.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py`:
- Around line 336-342: The Redis calls within the try block and the subsequent
write operations are missing the required namespace parameter in their method
invocations. Add the namespace parameter to both the redis.get() call (which
retrieves the cached value using cache_key) and the redis.setex() calls that
write to the cache, ensuring all Redis operations follow the coding guidelines
by specifying a namespace to prevent key collisions across services.
- Around line 329-334: The cache key computation in the loader.py file (the code
block computing cache_key with spreadsheet_id, sheet, cols_digest, and
format.value) duplicates the implementation from GoogleSheetConnector.cache_key
in sheets_connector.py. To fix this, refactor the code to call the connector's
cache_key method instead of recomputing the key locally, ensuring both
eager-loaded and on-demand handlers use the same key generation logic and
preventing cache misses or stale data when implementations diverge.

In `@app/api/routers/breeze_buddy/data_sources/handlers.py`:
- Around line 90-95: The issue is that when the preview returns zero rows, the
columns field in the PreviewResponse becomes an empty list even if the caller
explicitly requested specific columns. To fix this, capture the requested
columns from the function parameters or request object, and modify the logic
that determines col_names so that when rows is empty, it uses the requested
columns instead of defaulting to an empty list. This ensures the API returns
deterministic column information regardless of whether data rows are present.
- Around line 60-64: The ColumnsResponse at the get_column_headers call returns
an empty string for sheet_name when it is None, but the columns are actually
fetched from a specific default tab. Instead of returning an empty string on
line 63, determine the actual resolved tab name that was used by
get_column_headers (the default first tab when sheet_name is not provided) and
return that resolved name in the ColumnsResponse construction. This ensures the
response accurately reflects which tab the columns were retrieved from and
eliminates ambiguity for clients consuming this endpoint.

In `@app/api/routers/breeze_buddy/templates/handlers.py`:
- Around line 543-547: In the data dictionary being constructed around the
data_sources assignment, replace the conditional that sets data_sources to None
when omitted with logic that only includes the data_sources key if
template_data.data_sources is actually provided. Instead of using else None,
conditionally add the data_sources field to the dictionary only when the field
has a value, so omitted fields do not overwrite existing database values with
NULL.

In `@app/database/accessor/breeze_buddy/template.py`:
- Line 134: The current code uses a truthiness check when converting
data_sources to JSON, which causes empty lists to be treated as falsy and
converted to None (SQL NULL). Replace the truthiness check with an explicit `is
not None` comparison in both occurrences (line 134 and line 428) to preserve
empty lists as empty JSON arrays instead of coercing them to NULL. This ensures
the distinction between an empty list and a null value is maintained in the
database.

In `@app/services/data_sources.py`:
- Around line 49-60: The _bootstrapped flag is being set to True before the
connector imports execute, so if an import fails, the flag remains True and
subsequent calls to _bootstrap_builtin_connectors() will return early without
retrying the failed import. Move the _bootstrapped = True assignment to after
all the import statements (both the google.sheets_connector import and any other
connectors mentioned in the "Also applies to" comment) have completed
successfully, so that if any import fails, the flag remains False and the
bootstrap can be retried on the next call.

In `@app/services/google/sheets_connector.py`:
- Around line 30-31: The issue is that `config.get("spreadsheet_url", "")` can
return None or non-string values even though a default empty string is provided,
and passing these directly to `extract_spreadsheet_id()` causes a TypeError
instead of gracefully handling the missing/invalid ID. Fix this at all three
locations (around lines 30, 48, and 55) by normalizing the spreadsheet_url value
to ensure it is always a string type before passing it to
`extract_spreadsheet_id()`. Convert the config value to a string or empty string
to preserve the fail-open behavior where the `if not spreadsheet_id:` checks
properly trigger the DATA_SOURCE_UNAVAILABLE flow without exceptions.
- Around line 32-35: The logger.warning call in the GoogleSheetConnector class
is using printf-style formatting with %s, but Loguru only supports curly-brace
style formatting with {}. Replace the %s placeholder in the format string with
{} to ensure the spreadsheet_url value from config.get("spreadsheet_url") is
properly interpolated and logged.

In `@app/services/google/sheets.py`:
- Around line 247-269: Rename the `format` parameter to avoid shadowing Python's
builtin (e.g., change it to `format_type` or `output_format`). This requires
updating the parameter name in the function definition in sheets.py (around line
247), and then updating all call sites that pass the `format=` keyword argument:
in sheets_connector.py, the three calls in tests/test_google_sheets.py, the mock
function signature in tests/test_data_source_connectors.py, the call in
app/ai/voice/agents/breeze_buddy/template/loader.py, and the call in
app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py. Ensure all
five call sites use the new parameter name consistently to avoid TypeError for
unexpected keyword arguments.

In `@tests/test_data_source_connectors.py`:
- Around line 102-108: The _fake_fetch_formatted function uses `format` as a
parameter name, which shadows Python's builtin function and triggers Ruff A002.
Refactor the function signature to accept **kwargs instead of named parameters
(spreadsheet_id, sheet_name, columns, format), then update the captured.update()
call to read values from kwargs using keys like kwargs["format"],
kwargs["spreadsheet_id"], etc. Apply the same fix to the other occurrence
mentioned at lines 131-133.
- Around line 66-72: The subprocess.run call in the test lacks a timeout
parameter which can cause the test suite to hang indefinitely, and it also
triggers the Ruff S603 security lint rule. Add an explicit timeout parameter to
the subprocess.run call to prevent hangs, and add a noqa comment (# noqa: S603)
to the same line to annotate that this subprocess use is intentional and
suppress the lint gate failure.

---

Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/template/loader.py`:
- Around line 341-342: The except Exception blocks with only pass statements
suppress all Redis errors without any logging, making it difficult to debug
intermittent cache issues. Replace the pass statements in both exception
handlers (at lines 341-342 and 367-368) with a debug-level log statement that
captures the actual exception details. This allows operators to diagnose
cache-related problems while maintaining the fail-open semantics.

In `@app/api/routers/breeze_buddy/data_sources/__init__.py`:
- Around line 31-34: Add explicit return type annotations to the three async
route handler functions to comply with coding guidelines. For the function
get_sheet_tabs, add the return type hint TabsResponse. For get_sheet_columns
(located around lines 41-47), add the return type hint ColumnsResponse. For
preview_sheet (located around lines 54-62), add the return type hint
PreviewResponse. Each return type annotation should be placed after the closing
parenthesis of the function parameters and before the colon that precedes the
function body.

In `@app/database/queries/breeze_buddy/template.py`:
- Around line 183-187: The SELECT statement in the template metadata list query
is unnecessarily fetching the data_sources field, which is a JSONB column that
adds I/O overhead. Remove data_sources from the SELECT clause on line 186 since
the TemplateMetadata object that receives these query results does not have a
corresponding field for it. This will improve performance on this
frequently-used listing endpoint by eliminating unnecessary JSONB data
retrieval.

In `@pyproject.toml`:
- Line 26: Remove the unused google-api-python-client dependency from the
pyproject.toml file. Since the codebase does not import googleapiclient and the
Sheets implementation uses google-auth with direct REST calls instead, delete
the line containing the google-api-python-client>=2.86.0 dependency declaration
to reduce supply-chain and startup overhead.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2b580490-0c23-4446-90a2-c0feb4c10073

📥 Commits

Reviewing files that changed from the base of the PR and between 58d3fbc and 6f90dcc.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (23)

app/ai/voice/agents/breeze_buddy/dispatch/worker.py
app/ai/voice/agents/breeze_buddy/handlers/internal/builtin_dispatcher.py
app/ai/voice/agents/breeze_buddy/handlers/internal/load_data_source.py
app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py
app/ai/voice/agents/breeze_buddy/template/field_reference.json
app/ai/voice/agents/breeze_buddy/template/loader.py
app/ai/voice/agents/breeze_buddy/template/types.py
app/api/routers/breeze_buddy/__init__.py
app/api/routers/breeze_buddy/data_sources/__init__.py
app/api/routers/breeze_buddy/data_sources/handlers.py
app/api/routers/breeze_buddy/templates/handlers.py
app/database/accessor/breeze_buddy/template.py
app/database/decoder/breeze_buddy/template.py
app/database/migrations/034_add_data_sources_to_template.sql
app/database/queries/breeze_buddy/template.py
app/schemas/breeze_buddy/data_source.py
app/services/data_sources.py
app/services/google/__init__.py
app/services/google/sheets.py
app/services/google/sheets_connector.py
pyproject.toml
tests/test_data_source_connectors.py
tests/test_google_sheets.py

coderabbitai · 2026-06-16T23:01:44Z

+    try:
+        redis = await get_redis_service()
+        cached = await redis.get(cache_key)
+        if cached:
+            logger.info("[load_data_source] cache hit '%s' key=%s", source, key)
+            return {
+                "status": "success",
+                "source": source,
+                "key": key,
+                "content": cached,
+            }
+    except Exception as exc:  # never let cache issues break the call
+        logger.warning("[load_data_source] cache read failed: %s", exc)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing namespace parameter in Redis calls.

Same issue as in loader.py: the Redis get (line 81) and setex (line 113) calls lack the namespace parameter required by coding guidelines.

Proposed fix

# Fast path — prefetched slice already in Redis. try: redis = await get_redis_service() - cached = await redis.get(cache_key) + cached = await redis.get(cache_key, namespace="datasource") if cached:

# Warm the cache for subsequent loads of the same slice. try: redis = await get_redis_service() - await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL) + await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL, namespace="datasource") except Exception as exc:

Also applies to: 110-115

🧰 Tools

🪛 Ruff (0.15.17)

[warning] 90-90: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/handlers/internal/load_data_source.py` around lines 79 - 91, The Redis method calls in the load_data_source function are missing the required namespace parameter according to coding guidelines. Update the redis.get() call that retrieves the cached value using cache_key to include the namespace parameter as specified in the guidelines. Additionally, update the redis.setex() call around line 113 to also include the namespace parameter in both locations where Redis methods are invoked.

Source: Coding guidelines

coderabbitai · 2026-06-16T23:01:44Z

+def _cache_key(ref: DataSourceRef) -> str:
+    """Derive a stable cache key from the sheet config."""
+    spreadsheet_id = extract_spreadsheet_id(ref.spreadsheet_url)
+    if not spreadsheet_id:
+        spreadsheet_id = "invalid_url"
+    sheet = ref.sheet_name or "_first_"
+    cols = json.dumps(sorted(ref.columns or []))
+    cols_digest = hashlib.md5(cols.encode()).hexdigest()[:8]
+    return f"datasource:{spreadsheet_id}:{sheet}:{cols_digest}:{ref.format.value}"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cache key logic duplicated; _prefetch_one vs _prefetch_on_demand use different key sources.

_cache_key() is the third copy of this logic (also in loader.py and sheets_connector.py). More critically, _prefetch_one() uses _cache_key(ref) while _prefetch_on_demand() uses connector.cache_key(config, key). If these produce different formats, the eager-prefetched content won't be found by the on-demand loader.

Consider unifying all cache key generation through connector.cache_key() to guarantee consistency.

Proposed refactor for _prefetch_one

async def _prefetch_one(ref: DataSourceRef) -> None: if not ref.is_active: return spreadsheet_id = extract_spreadsheet_id(ref.spreadsheet_url) if not spreadsheet_id: logger.warning("Prefetch: invalid spreadsheet_url for '%s'", ref.name) return + connector = get_connector(ref.type) + if connector is None: + logger.warning("Prefetch: no connector for type '%s'", ref.type) + return + try: content = await asyncio.wait_for( fetch_formatted( spreadsheet_id=spreadsheet_id, sheet_name=ref.sheet_name, columns=ref.columns, format=ref.format.value, ), timeout=_FETCH_TIMEOUT, ) # ... error handling ... if not content or content == DATA_SOURCE_UNAVAILABLE: return - cache_key = _cache_key(ref) + config = ref.connector_config() + cache_key = connector.cache_key(config, ref.sheet_name) try: redis = await get_redis_service()

🧰 Tools

🪛 ast-grep (0.43.0)

[info] 42-42: use jsonify instead of json.dumps for JSON output
Context: json.dumps(sorted(ref.columns or []))
Note: Security best practice.

(use-jsonify)

[warning] 43-43: Do not use insecure functions
Context: hashlib.md5(cols.encode())
Note: [CWE-327] [CWE-328].

(insecure-hash-functions)

🪛 Ruff (0.15.17)

[error] 44-44: Probable use of insecure hash functions in hashlib: md5

(S324)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py` around lines 37 - 45, The _cache_key() function duplicates cache key generation logic and causes inconsistency between _prefetch_one() and _prefetch_on_demand() methods, where _prefetch_one uses _cache_key(ref) and _prefetch_on_demand uses connector.cache_key(config, key). Remove the _cache_key() function entirely and refactor _prefetch_one() to use connector.cache_key() with the same parameters passed to _prefetch_on_demand() to ensure both methods generate identical cache keys for the same data source.

coderabbitai · 2026-06-16T23:01:44Z

+    cache_key = _cache_key(ref)
+    try:
+        redis = await get_redis_service()
+        await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing namespace parameter in Redis setex calls.

Lines 82 and 113-114 use redis.setex() without the namespace parameter required by coding guidelines.

Proposed fix

cache_key = _cache_key(ref) try: redis = await get_redis_service() - await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL) + await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL, namespace="datasource")

try: redis = await get_redis_service() await redis.setex( - connector.cache_key(config, key), content, ttl_seconds=_CACHE_TTL + connector.cache_key(config, key), content, ttl_seconds=_CACHE_TTL, namespace="datasource" )

Also applies to: 111-115

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py` around lines 79 - 82, The redis.setex() method calls in the file are missing the required namespace parameter according to coding guidelines. Locate all redis.setex() calls (at line 82 in the try block and at lines 111-114 in the exception handler) and add the namespace parameter to each call to comply with the coding standards for Redis operations.

Source: Coding guidelines

coderabbitai · 2026-06-16T23:01:44Z

    "is_active":                         { "description": "When false, the template is soft-disabled: dispatch will not enqueue new leads for it. Set to false (not delete) when retiring a template.", "example": true },
-    "supported_channels":                { "description": "Channels this template may be served on. Default ['voice']. Add 'chat' to enable the Buddy Assist web-chat mode. Must have at least one entry.", "example": ["voice"] }
+    "supported_channels":                { "description": "Channels this template may be served on. Default ['voice']. Add 'chat' to enable the Buddy Assist web-chat mode. Must have at least one entry.", "example": ["voice"] },
+    "data_sources":                      { "description": "Google Sheets data sources attached to this template. Each ref has name, spreadsheet_url, sheet_name, columns filter, output format (markdown_table/csv/json), and is_active flag. Fetched at call dispatch and injected as {name} template vars into LLM context.", "example": [{ "name": "customer_list", "spreadsheet_url": "https://docs.google.com/spreadsheets/d/abc123", "sheet_name": "Sheet1", "columns": ["Name", "Status"], "format": "markdown_table", "is_active": true }] }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

data_sources docs are out of sync with runtime contract.

Line 21 says values are injected as {name} at dispatch, but runtime uses {datasource_<name>} for eager sources and does not inject on-demand sources wholesale.

Suggested patch

- "data_sources": { "description": "Google Sheets data sources attached to this template. Each ref has name, spreadsheet_url, sheet_name, columns filter, output format (markdown_table/csv/json), and is_active flag. Fetched at call dispatch and injected as {name} template vars into LLM context.", "example": [{ "name": "customer_list", "spreadsheet_url": "https://docs.google.com/spreadsheets/d/abc123", "sheet_name": "Sheet1", "columns": ["Name", "Status"], "format": "markdown_table", "is_active": true }] } + "data_sources": { "description": "Inline data sources attached to this template. Each ref includes connector `type` and loading `mode` (`eager`/`on_demand`). Eager sources are fetched up front and injected as {datasource_<name>} template vars; on_demand sources are loaded slice-by-slice via load_data_source(source=..., key=...).", "example": [{ "name": "customer_list", "type": "google_sheet", "mode": "eager", "spreadsheet_url": "https://docs.google.com/spreadsheets/d/abc123", "sheet_name": "Sheet1", "columns": ["Name", "Status"], "format": "markdown_table", "is_active": true }] }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"data_sources": { "description": "Google Sheets data sources attached to this template. Each ref has name, spreadsheet_url, sheet_name, columns filter, output format (markdown_table/csv/json), and is_active flag. Fetched at call dispatch and injected as {name} template vars into LLM context.", "example": [{ "name": "customer_list", "spreadsheet_url": "https://docs.google.com/spreadsheets/d/abc123", "sheet_name": "Sheet1", "columns": ["Name", "Status"], "format": "markdown_table", "is_active": true }] }

"data_sources": { "description": "Inline data sources attached to this template. Each ref includes connector `type` and loading `mode` (`eager`/`on_demand`). Eager sources are fetched up front and injected as {datasource_<name>} template vars; on_demand sources are loaded slice-by-slice via load_data_source(source=..., key=...).", "example": [{ "name": "customer_list", "type": "google_sheet", "mode": "eager", "spreadsheet_url": "https://docs.google.com/spreadsheets/d/abc123", "sheet_name": "Sheet1", "columns": ["Name", "Status"], "format": "markdown_table", "is_active": true }] }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/template/field_reference.json` at line 21, The description field for the data_sources object in field_reference.json incorrectly documents how template variables are injected at runtime. Update the description to accurately reflect the actual runtime behavior: eager data sources are injected as {datasource_<name>} template variables (not {name}), and on-demand sources are not injected wholesale into the LLM context. Correct the documentation to match the actual template variable naming convention and clarify the distinction between eager and on-demand source injection behavior.

coderabbitai · 2026-06-16T23:01:44Z

+        sheet = ref.sheet_name or "_first_"
+        cols = json.dumps(sorted(ref.columns or []))
+        cols_digest = hashlib.md5(cols.encode()).hexdigest()[:8]
+        cache_key = (
+            f"datasource:{spreadsheet_id}:{sheet}:{cols_digest}:{ref.format.value}"
+        )


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Cache key computation is duplicated from GoogleSheetConnector.cache_key.

The cache key logic here mirrors the implementation in app/services/google/sheets_connector.py:53-61. If the key format diverges, eager-loaded content won't be found by the on-demand handler (or vice versa), causing redundant fetches or stale data.

Consider delegating to the connector's cache_key method or extracting a shared helper to ensure consistency.

Proposed refactor using connector

+from app.services.data_sources import get_connector async def _fetch_one_data_source(self, ref: "DataSourceRef") -> str: if not ref.is_active: return DATA_SOURCE_UNAVAILABLE spreadsheet_id = extract_spreadsheet_id(ref.spreadsheet_url) if not spreadsheet_id: logger.warning( "Data source '%s' has malformed spreadsheet_url: %s", ref.name, ref.spreadsheet_url, ) return DATA_SOURCE_UNAVAILABLE - sheet = ref.sheet_name or "_first_" - cols = json.dumps(sorted(ref.columns or [])) - cols_digest = hashlib.md5(cols.encode()).hexdigest()[:8] - cache_key = ( - f"datasource:{spreadsheet_id}:{sheet}:{cols_digest}:{ref.format.value}" - ) + connector = get_connector(ref.type) + if connector is None: + logger.warning("Data source '%s' has unknown type '%s'", ref.name, ref.type) + return DATA_SOURCE_UNAVAILABLE + config = ref.connector_config() + cache_key = connector.cache_key(config, ref.sheet_name)

🧰 Tools

🪛 ast-grep (0.43.0)

[warning] 330-330: Do not use insecure functions
Context: hashlib.md5(cols.encode())
Note: [CWE-327] [CWE-328].

(insecure-hash-functions)

[info] 329-329: use jsonify instead of json.dumps for JSON output
Context: json.dumps(sorted(ref.columns or []))
Note: Security best practice.

(use-jsonify)

🪛 Ruff (0.15.17)

[error] 331-331: Probable use of insecure hash functions in hashlib: md5

(S324)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 329 - 334, The cache key computation in the loader.py file (the code block computing cache_key with spreadsheet_id, sheet, cols_digest, and format.value) duplicates the implementation from GoogleSheetConnector.cache_key in sheets_connector.py. To fix this, refactor the code to call the connector's cache_key method instead of recomputing the key locally, ensuring both eager-loaded and on-demand handlers use the same key generation logic and preventing cache misses or stale data when implementations diverge.

coderabbitai · 2026-06-16T23:01:44Z

+        spreadsheet_id = extract_spreadsheet_id(config.get("spreadsheet_url", ""))
+        if not spreadsheet_id:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize spreadsheet_url type before ID extraction to preserve fail-open behavior.

Lines 30, 48, and 55 pass config.get("spreadsheet_url", "") directly. If the key exists with None/non-string, extract_spreadsheet_id() can raise TypeError, bypassing your intended DATA_SOURCE_UNAVAILABLE flow.

Suggested diff

class GoogleSheetConnector: @@ async def fetch(self, config: Dict[str, Any], key: Optional[str]) -> str: - spreadsheet_id = extract_spreadsheet_id(config.get("spreadsheet_url", "")) + spreadsheet_url = config.get("spreadsheet_url") + spreadsheet_id = extract_spreadsheet_id( + spreadsheet_url if isinstance(spreadsheet_url, str) else "" + ) @@ async def list_keys(self, config: Dict[str, Any]) -> List[str]: - spreadsheet_id = extract_spreadsheet_id(config.get("spreadsheet_url", "")) + spreadsheet_url = config.get("spreadsheet_url") + spreadsheet_id = extract_spreadsheet_id( + spreadsheet_url if isinstance(spreadsheet_url, str) else "" + ) @@ def cache_key(self, config: Dict[str, Any], key: Optional[str]) -> str: + spreadsheet_url = config.get("spreadsheet_url") spreadsheet_id = ( - extract_spreadsheet_id(config.get("spreadsheet_url", "")) or "invalid_url" + extract_spreadsheet_id( + spreadsheet_url if isinstance(spreadsheet_url, str) else "" + ) + or "invalid_url" )

Also applies to: 48-49, 55-56

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/services/google/sheets_connector.py` around lines 30 - 31, The issue is that `config.get("spreadsheet_url", "")` can return None or non-string values even though a default empty string is provided, and passing these directly to `extract_spreadsheet_id()` causes a TypeError instead of gracefully handling the missing/invalid ID. Fix this at all three locations (around lines 30, 48, and 55) by normalizing the spreadsheet_url value to ensure it is always a string type before passing it to `extract_spreadsheet_id()`. Convert the config value to a string or empty string to preserve the fail-open behavior where the `if not spreadsheet_id:` checks properly trigger the DATA_SOURCE_UNAVAILABLE flow without exceptions.

coderabbitai · 2026-06-16T23:01:45Z

+            logger.warning(
+                "GoogleSheetConnector: malformed spreadsheet_url: %s",
+                config.get("spreadsheet_url"),
+            )


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Inspect logger implementation and placeholder usage patterns. fd -i 'logger.py' app/core --exec sed -n '1,220p' {} rg -n --iglob '*.py' 'logger\.(debug|info|warning|error|exception|critical)\(".*%s' app

Repository: juspay/clairvoyance

Length of output: 2367

🏁 Script executed:

cat -n app/services/google/sheets_connector.py | head -50

Repository: juspay/clairvoyance

Length of output: 2312

🏁 Script executed:

rg -n "import.*logger\|from.*logger" app/services/google/sheets_connector.py

Repository: juspay/clairvoyance

Length of output: 45

🏁 Script executed:

rg -n "logger.*=" app/core --type py | head -20

Repository: juspay/clairvoyance

Length of output: 969

🌐 Web query:

Loguru logging library printf style formatting %s interpolation Python

💡 Result:

The Loguru library does not support printf-style formatting (e.g., %s interpolation) and does not plan to add support for it [1][2][3]. Loguru is designed to prioritize modern Python practices and exclusively uses curly-brace {} style formatting, which is implemented via str.format [4][5][6]. To migrate from printf-style logging to Loguru, you must replace your log calls as follows: Standard printf-style: logger.debug("Some variable: %s", var) Loguru brace-style: logger.debug("Some variable: {}", var) This approach offers several advantages, including better performance through lazy evaluation: if the logging level is configured to ignore the message, the formatting operation is never performed [1][7]. If you have existing codebases or external libraries that rely heavily on printf-style formatting, the recommended approach is to use Loguru's InterceptHandler to redirect logs from the standard logging library to Loguru, which preserves the standard library's formatting capabilities for those specific logs [2][6]. Alternatively, you could implement a small wrapper around Loguru if strict project-wide consistency is required [6].

Citations:

1: https://github.com/Delgan/loguru/blob/master/docs/resources/migration.rst

2: Support printf-style formatting? Delgan/loguru#1151

3: Feature request: Format log messages when using % and a comma Delgan/loguru#1367

4: https://loguru.readthedocs.io/en/stable/api/logger.html

5: https://github.com/Delgan/loguru/

6: Handle %s formatted strings? Delgan/loguru#991

7: Are f-strings supported? Delgan/loguru#289

Use Loguru-style formatting here so the URL is actually logged.

Line 33 uses %s; Loguru does not support printf-style formatting and exclusively uses curly-brace {} style. With %s, the malformed URL value will not be interpolated in logs.

Suggested diff

logger.warning( - "GoogleSheetConnector: malformed spreadsheet_url: %s", + "GoogleSheetConnector: malformed spreadsheet_url: {}", config.get("spreadsheet_url"), )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger.warning(

"GoogleSheetConnector: malformed spreadsheet_url: %s",

config.get("spreadsheet_url"),

)

logger.warning(

"GoogleSheetConnector: malformed spreadsheet_url: {}",

config.get("spreadsheet_url"),

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/services/google/sheets_connector.py` around lines 32 - 35, The logger.warning call in the GoogleSheetConnector class is using printf-style formatting with %s, but Loguru only supports curly-brace style formatting with {}. Replace the %s placeholder in the format string with {} to ensure the spreadsheet_url value from config.get("spreadsheet_url") is properly interpolated and logged.

Source: Coding guidelines

coderabbitai · 2026-06-16T23:01:45Z

+    format: str = "markdown_table",
+    max_rows: int = 500,
+) -> str:
+    """
+    Fetch sheet data and return as a formatted string for LLM injection.
+
+    Returns DATA_SOURCE_UNAVAILABLE on any error or empty sheet.
+    """
+    rows = await fetch_sheet_data(spreadsheet_id, sheet_name, columns, max_rows)
+    if not rows:
+        logger.warning(
+            f"No data fetched from spreadsheet={spreadsheet_id}, sheet={sheet_name}"
+        )
+        return DATA_SOURCE_UNAVAILABLE
+
+    headers = list(rows[0].keys()) if rows else []
+
+    if format == "csv":
+        return _rows_to_csv(headers, rows)
+    elif format == "json":
+        return _rows_to_json(rows)
+    else:
+        return _rows_to_markdown_table(headers, rows)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for function parameters named `format` in Python code. rg -n --iglob '*.py' 'def\s+\w+\([^)]*\bformat\s*:'

Repository: juspay/clairvoyance

Length of output: 45

🏁 Script executed:

# Find files related to sheets fd -e py sheets

Repository: juspay/clairvoyance

Length of output: 161

🏁 Script executed:

# Check if the files exist and examine the relevant code if [ -f "app/services/google/sheets.py" ]; then echo "=== File: app/services/google/sheets.py ===" wc -l "app/services/google/sheets.py" fi if [ -f "app/services/google/sheets_connector.py" ]; then echo "=== File: app/services/google/sheets_connector.py ===" wc -l "app/services/google/sheets_connector.py" fi

Repository: juspay/clairvoyance

Length of output: 238

🏁 Script executed:

# Read lines 240-275 from sheets.py to see the function definition if [ -f "app/services/google/sheets.py" ]; then sed -n '240,275p' "app/services/google/sheets.py" fi

Repository: juspay/clairvoyance

Length of output: 972

🏁 Script executed:

# Read lines 40-50 from sheets_connector.py to find the caller if [ -f "app/services/google/sheets_connector.py" ]; then sed -n '40,50p' "app/services/google/sheets_connector.py" fi

Repository: juspay/clairvoyance

Length of output: 496

🏁 Script executed:

# Check for all calls to fetch_formatted function rg -n 'fetch_formatted' --iglob '*.py'

Repository: juspay/clairvoyance

Length of output: 2855

🏁 Script executed:

# Verify there are no other parameters named 'format' in sheets.py grep -n 'format:' app/services/google/sheets.py

Repository: juspay/clairvoyance

Length of output: 103

🏁 Script executed:

# Check the other callers in breeze_buddy modules sed -n '340,355p' app/ai/voice/agents/breeze_buddy/template/loader.py

Repository: juspay/clairvoyance

Length of output: 585

🏁 Script executed:

# Check the caller in data_source_prefetch.py sed -n '55,70p' app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py

Repository: juspay/clairvoyance

Length of output: 539

🏁 Script executed:

# Verify exact line numbers in test files grep -n 'format=' tests/test_google_sheets.py | head -10

Repository: juspay/clairvoyance

Length of output: 299

🏁 Script executed:

# Double check exact lines in breeze_buddy files grep -n 'format=' app/ai/voice/agents/breeze_buddy/template/loader.py

Repository: juspay/clairvoyance

Length of output: 112

🏁 Script executed:

# Check test_data_source_connectors.py for the mock function parameters sed -n '100,115p' tests/test_data_source_connectors.py

Repository: juspay/clairvoyance

Length of output: 509

🏁 Script executed:

# Verify all the call sites comprehensively - count all format= usages rg -n 'format=' --iglob '*.py' -B 1 -A 1

Repository: juspay/clairvoyance

Length of output: 4449

Rename format parameter to avoid builtin shadowing, but update all 5 callers (not just sheets_connector.py).

The parameter shadowing issue is valid (Ruff A002), but the review is incomplete. Beyond the function definition in sheets.py (line 247) and sheets_connector.py (line 44), the format= keyword argument is used in:

tests/test_google_sheets.py (lines 432, 449, 464)

tests/test_data_source_connectors.py (line 102, mock function signature)

app/ai/voice/agents/breeze_buddy/template/loader.py (line 350)

app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py (line 63)

All 5 call sites must be updated when renaming the parameter, or the code will break with a TypeError for unexpected keyword arguments.

🧰 Tools

🪛 Ruff (0.15.17)

[error] 247-247: Function argument format is shadowing a Python builtin

(A002)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/services/google/sheets.py` around lines 247 - 269, Rename the `format` parameter to avoid shadowing Python's builtin (e.g., change it to `format_type` or `output_format`). This requires updating the parameter name in the function definition in sheets.py (around line 247), and then updating all call sites that pass the `format=` keyword argument: in sheets_connector.py, the three calls in tests/test_google_sheets.py, the mock function signature in tests/test_data_source_connectors.py, the call in app/ai/voice/agents/breeze_buddy/template/loader.py, and the call in app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py. Ensure all five call sites use the new parameter name consistently to avoid TypeError for unexpected keyword arguments.

Source: Linters/SAST tools

coderabbitai · 2026-06-16T23:01:45Z

+    proc = subprocess.run(
+        [sys.executable, "-c", code],
+        cwd=_REPO_ROOT,
+        env={**os.environ, "PYTHONPATH": _REPO_ROOT},
+        capture_output=True,
+        text=True,
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden this subprocess smoke test against hangs and lint gate failures.

The command is test-controlled, but this call can block the suite without a timeout and currently trips Ruff S603. Add an explicit timeout and annotate the intentional subprocess use.

Suggested patch

- proc = subprocess.run( + proc = subprocess.run( # noqa: S603 - test uses constant, non-user-controlled command [sys.executable, "-c", code], cwd=_REPO_ROOT, env={**os.environ, "PYTHONPATH": _REPO_ROOT}, capture_output=True, text=True, + timeout=15, )

🧰 Tools

🪛 Ruff (0.15.17)

[error] 66-66: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_data_source_connectors.py` around lines 66 - 72, The subprocess.run call in the test lacks a timeout parameter which can cause the test suite to hang indefinitely, and it also triggers the Ruff S603 security lint rule. Add an explicit timeout parameter to the subprocess.run call to prevent hangs, and add a noqa comment (# noqa: S603) to the same line to annotate that this subprocess use is intentional and suppress the lint gate failure.

Source: Linters/SAST tools

coderabbitai · 2026-06-16T23:01:45Z

+    async def _fake_fetch_formatted(*, spreadsheet_id, sheet_name, columns, format):
+        captured.update(
+            spreadsheet_id=spreadsheet_id,
+            sheet_name=sheet_name,
+            columns=columns,
+            format=format,
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid builtin shadowing in test doubles for fetch_formatted.

Using format as a parameter name triggers Ruff A002. Keep the same call contract via **kwargs and read kwargs["format"].

Suggested patch

- async def _fake_fetch_formatted(*, spreadsheet_id, sheet_name, columns, format): + async def _fake_fetch_formatted(*, spreadsheet_id, sheet_name, columns, **kwargs): + fmt = kwargs["format"] captured.update( spreadsheet_id=spreadsheet_id, sheet_name=sheet_name, columns=columns, - format=format, + format=fmt, ) return "TAB-CONTENT" @@ - async def _fake_fetch_formatted(*, spreadsheet_id, sheet_name, columns, format): + async def _fake_fetch_formatted(*, spreadsheet_id, sheet_name, columns, **kwargs): captured["sheet_name"] = sheet_name return "X"

Also applies to: 131-133

🧰 Tools

🪛 Ruff (0.15.17)

[error] 102-102: Function argument format is shadowing a Python builtin

(A002)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_data_source_connectors.py` around lines 102 - 108, The _fake_fetch_formatted function uses `format` as a parameter name, which shadows Python's builtin function and triggers Ruff A002. Refactor the function signature to accept **kwargs instead of named parameters (spreadsheet_id, sheet_name, columns, format), then update the captured.update() call to read values from kwargs using keys like kwargs["format"], kwargs["spreadsheet_id"], etc. Apply the same fix to the other occurrence mentioned at lines 131-133.

Source: Linters/SAST tools

Copilot AI review requested due to automatic review settings June 16, 2026 22:49

Copilot AI reviewed Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

		spreadsheet_id = extract_spreadsheet_id(config.get("spreadsheet_url", ""))
		if not spreadsheet_id:

Conversation

cmd-err commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cmd-err commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading