feat: implemented datasources for google sheet#829
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR introduces a Data Sources feature for Breeze Buddy: a new ChangesData Sources Feature — Breeze Buddy
Sequence Diagram(s)sequenceDiagram
rect rgba(100, 150, 255, 0.5)
Note over dispatch_worker,Redis: Dispatch time (background)
dispatch_worker->>data_source_prefetch: create_task(prefetch_data_sources(lead_id, template))
data_source_prefetch->>DataSourceAccessor: get_data_source_by_id(ref.data_source_id)
DataSourceAccessor-->>data_source_prefetch: DataSourceResponse
data_source_prefetch->>sheets_service: fetch_formatted(spreadsheet_id, ...) [5s timeout]
sheets_service-->>data_source_prefetch: formatted content
data_source_prefetch->>Redis: setex("ds:{lead_id}:{name}", 300, content)
end
rect rgba(100, 200, 150, 0.5)
Note over FlowConfigLoader,agent_flow: Call setup time
agent_flow->>FlowConfigLoader: load_template(lead_id=lead.id)
FlowConfigLoader->>Redis: GET "ds:{lead_id}:{ref.name}"
Redis-->>FlowConfigLoader: cached content (or miss → fetch_formatted [800ms timeout])
FlowConfigLoader->>FlowConfigLoader: inject into template_vars or _data_source_messages
FlowConfigLoader-->>agent_flow: template with _data_source_messages
agent_flow->>agent_flow: build_flow_config → copies _data_source_messages
agent_flow->>agent_flow: prepare_initial_node → prepends to task_messages
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (6)
app/ai/voice/agents/breeze_buddy/template/loader.py (1)
295-322: ⚡ Quick winAdd
strict=Truetozip()for defensive iteration.Both
template_obj.data_sourcesandcontentsshould always have the same length sincecontentsis produced byasyncio.gatherover the same list. Addingstrict=Truemakes this contract explicit and will raise aValueErrorif the lengths ever mismatch due to a bug, rather than silently dropping items.♻️ Proposed fix
- for ref, result in zip(template_obj.data_sources, contents): + for ref, result in zip(template_obj.data_sources, contents, strict=True):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 295 - 322, In the data source injection section, the zip() call that iterates over template_obj.data_sources and contents should include strict=True as a parameter to make the length contract explicit and prevent silent dropping of items if the lists ever mismatch. Add strict=True to the zip() call in the for loop where ref and result are unpacked.Source: Linters/SAST tools
app/services/google/sheets.py (2)
72-90: 💤 Low valueConsider caching the authenticated session to reduce overhead.
Each call to
list_tabs,get_column_headers, andfetch_sheet_datacreates a newAuthorizedSessionvia_get_sheets_session(). Since the service account credentials don't change at runtime, a module-level cached session (with lazy initialization) would reduce credential parsing and object creation overhead on repeated calls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/services/google/sheets.py` around lines 72 - 90, Implement module-level caching for the authenticated session to avoid creating a new session on every call. Create a module-level variable to store a cached AuthorizedSession instance initialized to None, then modify the session retrieval logic (likely in or around _get_sheets_session) to implement lazy initialization: check if the cached session exists, and if not, create it once and store it; otherwise return the cached session. Apply this cached session across all functions that currently call _get_sheets_session() such as list_tabs, get_column_headers, and fetch_sheet_data, ensuring they all reuse the same authenticated session instance rather than creating new ones on each call.
217-244: ⚡ Quick winRename
formatparameter to avoid shadowing the Python builtin.The
formatparameter shadows Python's built-informat()function. While this doesn't cause a bug here, it's poor practice and trips static analysis.♻️ Proposed fix
async def fetch_formatted( spreadsheet_id: str, sheet_name: Optional[str] = None, columns: Optional[List[str]] = None, - format: str = "markdown_table", + output_format: str = "markdown_table", max_rows: int = 500, ) -> str: """ Fetch sheet data and return as a formatted string for LLM injection. Returns DATA_SOURCE_UNAVAILABLE on any error or empty sheet. """ rows = await fetch_sheet_data(spreadsheet_id, sheet_name, columns, max_rows) if not rows: logger.warning( f"No data fetched from spreadsheet={spreadsheet_id}, sheet={sheet_name}" ) return DATA_SOURCE_UNAVAILABLE headers = list(rows[0].keys()) if rows else [] - if format == "csv": + if output_format == "csv": return _rows_to_csv(headers, rows) - elif format == "json": + elif output_format == "json": return _rows_to_json(rows) else: return _rows_to_markdown_table(headers, rows)Note: Update callers (e.g.,
loader.pyline 156) to useoutput_format=accordingly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/services/google/sheets.py` around lines 217 - 244, The `format` parameter in the `fetch_formatted` function shadows Python's built-in `format()` function, which is poor practice and triggers static analysis warnings. Rename the `format` parameter to `output_format` in the function signature, then update all references to this parameter within the function body (the conditionals checking `format == "csv"`, `format == "json"`, etc.) to use `output_format` instead. Additionally, update all callers of the `fetch_formatted` function to pass the parameter as `output_format=` instead of `format=`.Source: Linters/SAST tools
app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py (1)
22-23: ⚡ Quick winMove prefetch TTL/timeout constants to static config.
These are environment-level tuning knobs and should be sourced from
app/core/config/static.pyinstead of hardcoded file constants.As per coding guidelines, “Load ALL configuration from
app/core/config/static.pyusingget_required_env()for mandatory variables; never import directly fromos.environelsewhere.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py` around lines 22 - 23, The constants _CACHE_TTL and _FETCH_TIMEOUT are hardcoded in the data_source_prefetch.py file instead of being sourced from centralized configuration. Move these constants to app/core/config/static.py using the get_required_env() pattern (or appropriate config loading for tuning parameters), then import and use those configuration values in data_source_prefetch.py instead of the hardcoded definitions. This ensures all environment-level configuration follows the standard pattern of loading from app/core/config/static.py.Source: Coding guidelines
app/api/routers/breeze_buddy/templates/handlers.py (1)
45-45: ⚡ Quick winTighten the helper type annotation for data source refs.
Use a parameterized container type instead of
Optional[List](for exampleOptional[List[DataSourceRef]]) so static checks enforce the expected ref shape.As per coding guidelines, use
Optional[T], List[T], Dict[str, Any], Unionfor type hints.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/api/routers/breeze_buddy/templates/handlers.py` at line 45, The `data_sources` parameter on line 45 uses a loose `Optional[List]` type annotation that does not specify the element type, which prevents static type checkers from enforcing the expected shape. Replace `Optional[List]` with `Optional[List[DataSourceRef]]` to explicitly specify that the list contains DataSourceRef objects, enabling proper type checking and enforcement as per the coding guidelines.Source: Coding guidelines
app/ai/voice/agents/breeze_buddy/template/types.py (1)
2012-2013: ⚡ Quick winUse UUID type for
data_source_idat the schema boundary.
data_source_idis documented as UUID but typed asstr, so malformed IDs can slip through request validation and fail later in DB/accessor paths.Proposed change
+from uuid import UUID ... class DataSourceRef(BaseModel): ... - data_source_id: str = Field(description="UUID of the data_source entity") + data_source_id: UUID = Field(description="UUID of the data_source entity")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/template/types.py` around lines 2012 - 2013, The data_source_id field in the schema is typed as str but documented as UUID, which allows invalid ID formats to pass validation. Change the type annotation of the data_source_id field from str to UUID to enforce proper validation at the schema boundary, ensuring only valid UUIDs are accepted during request validation rather than failing later in database or accessor operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/ai/voice/agents/breeze_buddy/agent/flow.py`:
- Around line 149-151: The `_data_source_messages` variable retrieved from the
untyped runtime data `template.flow` is not validated for its shape before being
assigned to `flow_config["_data_source_messages"]`. Since this data is later
used in a concatenation operation at line 207 (ds_messages + task_messages), it
must be a list of message dicts. Add validation after retrieving `ds_messages`
to ensure it is a list before propagating it to `flow_config`, otherwise the
code will fail when attempting to concatenate it with task_messages.
In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py`:
- Around line 63-70: The `redis.setex()` call in the prefetch operation does not
validate its return value before logging success. Since `RedisService.setex()`
returns `False` on Redis write failures, the current code logs a success message
even when the cache write failed, which can mask prefetch issues. Capture the
boolean return value from the `await redis.setex(cache_key, content,
ttl_seconds=_CACHE_TTL)` call and only proceed with the logger.info success
message if the operation returned `True`. If the operation returns `False`, log
an error message instead to properly reflect the failure and enable accurate
operational debugging.
In `@app/ai/voice/agents/breeze_buddy/template/loader.py`:
- Around line 99-127: The `redis.get(cache_key)` call in the
`_fetch_data_source_content` method is missing the `namespace` parameter
required by coding guidelines to prevent key collisions across services. Add the
`namespace` parameter to the `redis.get()` call to include it alongside the
cache_key argument. Additionally, verify that the prefetch manager in the
dependent layer (data_source_prefetch.py) uses the same namespace value when
writing to Redis to ensure consistency across both read and write operations.
In `@app/api/routers/breeze_buddy/data_sources/handlers.py`:
- Around line 226-283: The three handler functions `list_tabs_handler`,
`list_columns_handler`, and `preview_handler` lack tenant-scoped authorization
checks, allowing authenticated users to potentially access Google Sheets outside
their tenant. Add caller context (user/tenant information) as a parameter to
each of these handler functions, then enforce reseller-scoped authorization
checks before calling the underlying sheet access functions (`list_tabs`,
`get_column_headers`, and `fetch_sheet_data` respectively). Verify that the
spreadsheet belongs to the user's tenant or is accessible to them before
proceeding with the sheet data operations.
In `@app/database/queries/breeze_buddy/data_source.py`:
- Line 14: The parameter name `id` shadows Python's built-in `id` function,
triggering Ruff A002 linting errors. Rename all occurrences of the `id`
parameter to `data_source_id` in the function signatures and update all
corresponding call sites throughout the file. This applies to the parameter at
line 14 and all other affected locations (lines 23 and 105 as mentioned) where
`id` is used as a parameter name. Ensure both the function/method signatures and
any internal references to this parameter are updated consistently.
- Around line 116-126: The current `is not None` guards in the update builder
prevent clients from explicitly resetting nullable fields like `sheet_name` and
`columns` to NULL because there is no way to distinguish between "parameter was
not provided" and "parameter was explicitly set to None". Use a sentinel value
(e.g., a module-level `_UNSET` object) as the default for all optional
parameters instead of None, then check against that sentinel value in the
conditional guards (e.g., `if sheet_name is not _UNSET` instead of `if
sheet_name is not None`). This allows clients to pass None explicitly to clear a
field to NULL while still omitting the parameter to leave it unchanged.
---
Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py`:
- Around line 22-23: The constants _CACHE_TTL and _FETCH_TIMEOUT are hardcoded
in the data_source_prefetch.py file instead of being sourced from centralized
configuration. Move these constants to app/core/config/static.py using the
get_required_env() pattern (or appropriate config loading for tuning
parameters), then import and use those configuration values in
data_source_prefetch.py instead of the hardcoded definitions. This ensures all
environment-level configuration follows the standard pattern of loading from
app/core/config/static.py.
In `@app/ai/voice/agents/breeze_buddy/template/loader.py`:
- Around line 295-322: In the data source injection section, the zip() call that
iterates over template_obj.data_sources and contents should include strict=True
as a parameter to make the length contract explicit and prevent silent dropping
of items if the lists ever mismatch. Add strict=True to the zip() call in the
for loop where ref and result are unpacked.
In `@app/ai/voice/agents/breeze_buddy/template/types.py`:
- Around line 2012-2013: The data_source_id field in the schema is typed as str
but documented as UUID, which allows invalid ID formats to pass validation.
Change the type annotation of the data_source_id field from str to UUID to
enforce proper validation at the schema boundary, ensuring only valid UUIDs are
accepted during request validation rather than failing later in database or
accessor operations.
In `@app/api/routers/breeze_buddy/templates/handlers.py`:
- Line 45: The `data_sources` parameter on line 45 uses a loose `Optional[List]`
type annotation that does not specify the element type, which prevents static
type checkers from enforcing the expected shape. Replace `Optional[List]` with
`Optional[List[DataSourceRef]]` to explicitly specify that the list contains
DataSourceRef objects, enabling proper type checking and enforcement as per the
coding guidelines.
In `@app/services/google/sheets.py`:
- Around line 72-90: Implement module-level caching for the authenticated
session to avoid creating a new session on every call. Create a module-level
variable to store a cached AuthorizedSession instance initialized to None, then
modify the session retrieval logic (likely in or around _get_sheets_session) to
implement lazy initialization: check if the cached session exists, and if not,
create it once and store it; otherwise return the cached session. Apply this
cached session across all functions that currently call _get_sheets_session()
such as list_tabs, get_column_headers, and fetch_sheet_data, ensuring they all
reuse the same authenticated session instance rather than creating new ones on
each call.
- Around line 217-244: The `format` parameter in the `fetch_formatted` function
shadows Python's built-in `format()` function, which is poor practice and
triggers static analysis warnings. Rename the `format` parameter to
`output_format` in the function signature, then update all references to this
parameter within the function body (the conditionals checking `format == "csv"`,
`format == "json"`, etc.) to use `output_format` instead. Additionally, update
all callers of the `fetch_formatted` function to pass the parameter as
`output_format=` instead of `format=`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3f3ba705-d8c2-414a-b20f-00c678b7eeed
📒 Files selected for processing (21)
app/ai/voice/agents/breeze_buddy/agent/flow.pyapp/ai/voice/agents/breeze_buddy/dispatch/worker.pyapp/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.pyapp/ai/voice/agents/breeze_buddy/template/loader.pyapp/ai/voice/agents/breeze_buddy/template/types.pyapp/api/routers/breeze_buddy/__init__.pyapp/api/routers/breeze_buddy/data_sources/__init__.pyapp/api/routers/breeze_buddy/data_sources/handlers.pyapp/api/routers/breeze_buddy/templates/handlers.pyapp/database/accessor/breeze_buddy/data_source.pyapp/database/accessor/breeze_buddy/template.pyapp/database/decoder/breeze_buddy/data_source.pyapp/database/decoder/breeze_buddy/template.pyapp/database/migrations/033_create_data_source_table.sqlapp/database/migrations/034_add_data_sources_column_to_template.sqlapp/database/queries/breeze_buddy/data_source.pyapp/database/queries/breeze_buddy/template.pyapp/schemas/breeze_buddy/data_source.pyapp/services/data_sources.pyapp/services/google/__init__.pyapp/services/google/sheets.py
| ds_messages = template.flow.get("_data_source_messages") | ||
| if ds_messages: | ||
| flow_config["_data_source_messages"] = ds_messages |
There was a problem hiding this comment.
Validate _data_source_messages shape before propagating it.
template.flow is untyped runtime data. If _data_source_messages is not a list of message dicts, it will reach Line 207 and raise at ds_messages + task_messages, failing node preparation.
💡 Suggested patch
- ds_messages = template.flow.get("_data_source_messages")
- if ds_messages:
- flow_config["_data_source_messages"] = ds_messages
+ raw_ds_messages = template.flow.get("_data_source_messages")
+ if isinstance(raw_ds_messages, list):
+ ds_messages = [
+ msg
+ for msg in raw_ds_messages
+ if isinstance(msg, dict)
+ and msg.get("role") == "system"
+ and isinstance(msg.get("content"), str)
+ ]
+ if ds_messages:
+ flow_config["_data_source_messages"] = ds_messages🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/ai/voice/agents/breeze_buddy/agent/flow.py` around lines 149 - 151, The
`_data_source_messages` variable retrieved from the untyped runtime data
`template.flow` is not validated for its shape before being assigned to
`flow_config["_data_source_messages"]`. Since this data is later used in a
concatenation operation at line 207 (ds_messages + task_messages), it must be a
list of message dicts. Add validation after retrieving `ds_messages` to ensure
it is a list before propagating it to `flow_config`, otherwise the code will
fail when attempting to concatenate it with task_messages.
| await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL) | ||
| logger.info( | ||
| "Prefetched data source '%s' for lead=%s (%d chars, TTL=%ds)", | ||
| ref.name, | ||
| lead_id, | ||
| len(content), | ||
| _CACHE_TTL, | ||
| ) |
There was a problem hiding this comment.
Check Redis setex outcome before logging prefetch success.
RedisService.setex() returns False on Redis write failures; the current path logs success even when the cache write did not persist. This can mask prefetch misses and mislead operational debugging.
💡 Suggested patch
redis = await get_redis_service()
- await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)
+ written = await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)
+ if not written:
+ logger.warning(
+ "Prefetch cache write failed for data source '%s', lead=%s",
+ ref.name,
+ lead_id,
+ )
+ return
logger.info(
"Prefetched data source '%s' for lead=%s (%d chars, TTL=%ds)",
ref.name,
lead_id,
len(content),
_CACHE_TTL,
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py` around
lines 63 - 70, The `redis.setex()` call in the prefetch operation does not
validate its return value before logging success. Since `RedisService.setex()`
returns `False` on Redis write failures, the current code logs a success message
even when the cache write failed, which can mask prefetch issues. Capture the
boolean return value from the `await redis.setex(cache_key, content,
ttl_seconds=_CACHE_TTL)` call and only proceed with the logger.info success
message if the operation returned `True`. If the operation returns `False`, log
an error message instead to properly reflect the failure and enable accurate
operational debugging.
| async def _fetch_data_source_content( | ||
| self, | ||
| lead_id: Optional[str], | ||
| ref: DataSourceRef, | ||
| template_obj: TemplateModel, | ||
| ) -> str: | ||
| """ | ||
| Fetch formatted sheet content for a DataSourceRef. | ||
|
|
||
| Priority: | ||
| 1. Redis cache (key ``datasource:{lead_id}:{ref.name}``, pre-warmed by prefetch manager) | ||
| 2. Live Google Sheets fetch with 800 ms timeout | ||
| 3. Fallback: DATA_SOURCE_UNAVAILABLE | ||
| """ | ||
| # 1. Redis cache check (requires lead_id) | ||
| if lead_id: | ||
| cache_key = f"datasource:{lead_id}:{ref.name}" | ||
| try: | ||
| redis = await get_redis_service() | ||
| cached = await redis.get(cache_key) | ||
| if cached: | ||
| logger.info( | ||
| "Data source cache hit: lead=%s name=%s", lead_id, ref.name | ||
| ) | ||
| return cached | ||
| except Exception as exc: | ||
| logger.warning( | ||
| "Redis cache check failed for datasource '%s': %s", ref.name, exc | ||
| ) |
There was a problem hiding this comment.
Missing namespace parameter in Redis get() call.
The coding guidelines require using the namespace parameter in redis_get/redis_set calls to prevent key collisions across services. The cache key datasource:{lead_id}:{ref.name} should use a namespace.
🐛 Proposed fix
if lead_id:
cache_key = f"datasource:{lead_id}:{ref.name}"
try:
redis = await get_redis_service()
- cached = await redis.get(cache_key)
+ cached = await redis.get(cache_key, namespace="breeze_buddy")
if cached:
logger.info(
"Data source cache hit: lead=%s name=%s", lead_id, ref.name
)
return cachedAlso ensure the prefetch manager (in the dependent layer data_source_prefetch.py) uses the same namespace when writing to Redis.
Based on coding guidelines: "Always use namespace parameter in redis_get/redis_set calls to prevent key collisions across services"
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 124-124: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 99 - 127,
The `redis.get(cache_key)` call in the `_fetch_data_source_content` method is
missing the `namespace` parameter required by coding guidelines to prevent key
collisions across services. Add the `namespace` parameter to the `redis.get()`
call to include it alongside the cache_key argument. Additionally, verify that
the prefetch manager in the dependent layer (data_source_prefetch.py) uses the
same namespace value when writing to Redis to ensure consistency across both
read and write operations.
Source: Coding guidelines
| async def list_tabs_handler(spreadsheet_url: str) -> TabsResponse: | ||
| """List tabs in a Google Sheet.""" | ||
| spreadsheet_id = extract_spreadsheet_id(spreadsheet_url) | ||
| if not spreadsheet_id: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Invalid Google Sheets URL", | ||
| ) | ||
|
|
||
| tabs = await list_tabs(spreadsheet_id) | ||
| return TabsResponse(spreadsheet_id=spreadsheet_id, tabs=tabs) | ||
|
|
||
|
|
||
| async def list_columns_handler( | ||
| spreadsheet_url: str, sheet_name: Optional[str] = None | ||
| ) -> ColumnsResponse: | ||
| """List column headers for a sheet tab.""" | ||
| spreadsheet_id = extract_spreadsheet_id(spreadsheet_url) | ||
| if not spreadsheet_id: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Invalid Google Sheets URL", | ||
| ) | ||
|
|
||
| columns = await get_column_headers(spreadsheet_id, sheet_name) | ||
| return ColumnsResponse( | ||
| spreadsheet_id=spreadsheet_id, | ||
| sheet_name=sheet_name or "", | ||
| columns=columns, | ||
| ) | ||
|
|
||
|
|
||
| async def preview_handler( | ||
| spreadsheet_url: str, | ||
| sheet_name: Optional[str] = None, | ||
| columns: Optional[List[str]] = None, | ||
| max_rows: int = 10, | ||
| ) -> PreviewResponse: | ||
| """Preview sheet data (first N rows).""" | ||
| spreadsheet_id = extract_spreadsheet_id(spreadsheet_url) | ||
| if not spreadsheet_id: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Invalid Google Sheets URL", | ||
| ) | ||
|
|
||
| rows = await fetch_sheet_data( | ||
| spreadsheet_id, sheet_name, columns, max_rows=max_rows | ||
| ) | ||
| col_names = list(rows[0].keys()) if rows else [] | ||
|
|
||
| return PreviewResponse( | ||
| spreadsheet_id=spreadsheet_id, | ||
| sheet_name=sheet_name, | ||
| columns=col_names, | ||
| rows=rows, | ||
| total_rows=len(rows), | ||
| ) |
There was a problem hiding this comment.
Enforce tenant-scoped authorization on discovery handlers.
list_tabs_handler, list_columns_handler, and preview_handler fetch Google Sheet content from user-supplied URLs without any reseller/tenant scope check. With shared backend Google credentials, this can let an authenticated user read sheets outside their tenant if they know a valid URL.
Please require caller context in these handlers and enforce reseller-scoped authorization (or temporarily admin-only access) before performing sheet reads.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/api/routers/breeze_buddy/data_sources/handlers.py` around lines 226 -
283, The three handler functions `list_tabs_handler`, `list_columns_handler`,
and `preview_handler` lack tenant-scoped authorization checks, allowing
authenticated users to potentially access Google Sheets outside their tenant.
Add caller context (user/tenant information) as a parameter to each of these
handler functions, then enforce reseller-scoped authorization checks before
calling the underlying sheet access functions (`list_tabs`,
`get_column_headers`, and `fetch_sheet_data` respectively). Verify that the
spreadsheet belongs to the user's tenant or is accessible to them before
proceeding with the sheet data operations.
|
|
||
|
|
||
| def insert_data_source_query( | ||
| id: str, |
There was a problem hiding this comment.
Rename parameters that shadow Python builtins (id, format).
These names trigger Ruff A002 and will keep lint noisy/blocking depending on CI policy. Prefer data_source_id / output_format in signatures and call sites.
Based on learnings from static analysis, Ruff is already flagging these exact lines.
Also applies to: 23-23, 105-105
🧰 Tools
🪛 Ruff (0.15.15)
[error] 14-14: Function argument id is shadowing a Python builtin
(A002)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/database/queries/breeze_buddy/data_source.py` at line 14, The parameter
name `id` shadows Python's built-in `id` function, triggering Ruff A002 linting
errors. Rename all occurrences of the `id` parameter to `data_source_id` in the
function signatures and update all corresponding call sites throughout the file.
This applies to the parameter at line 14 and all other affected locations (lines
23 and 105 as mentioned) where `id` is used as a parameter name. Ensure both the
function/method signatures and any internal references to this parameter are
updated consistently.
Source: Linters/SAST tools
| if name is not None: | ||
| _add("name", name) | ||
| if spreadsheet_url is not None: | ||
| _add("spreadsheet_url", spreadsheet_url) | ||
| if spreadsheet_id is not None: | ||
| _add("spreadsheet_id", spreadsheet_id) | ||
| if sheet_name is not None: | ||
| _add("sheet_name", sheet_name) | ||
| if columns_json is not None: | ||
| _add("columns", columns_json, "::jsonb") | ||
| if format is not None: |
There was a problem hiding this comment.
Update builder cannot clear nullable fields to NULL.
The current is not None guards mean clients cannot reset sheet_name/columns back to NULL (even when explicitly requested), which breaks the API contract for reverting to “first tab/all columns” behavior.
Suggested direction
def update_data_source_query(
data_source_id: str,
name: Optional[str],
spreadsheet_url: Optional[str],
spreadsheet_id: Optional[str],
sheet_name: Optional[str],
columns_json: Optional[str],
format: Optional[str],
is_active: Optional[bool],
now: datetime,
+ fields_to_update: set[str],
) -> Tuple[str, List[Any]]:
...
- if sheet_name is not None:
+ if "sheet_name" in fields_to_update:
_add("sheet_name", sheet_name)
- if columns_json is not None:
+ if "columns" in fields_to_update:
_add("columns", columns_json, "::jsonb")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/database/queries/breeze_buddy/data_source.py` around lines 116 - 126, The
current `is not None` guards in the update builder prevent clients from
explicitly resetting nullable fields like `sheet_name` and `columns` to NULL
because there is no way to distinguish between "parameter was not provided" and
"parameter was explicitly set to None". Use a sentinel value (e.g., a
module-level `_UNSET` object) as the default for all optional parameters instead
of None, then check against that sentinel value in the conditional guards (e.g.,
`if sheet_name is not _UNSET` instead of `if sheet_name is not None`). This
allows clients to pass None explicitly to clear a field to NULL while still
omitting the parameter to leave it unchanged.
…s_messages isolation
Summary by CodeRabbit
Release Notes
New Features