Add caching to Google Sheets API calls to prevent rate limiting#1002
Conversation
When updating multiple rows in Google Sheets, the find_sheet_id_by_name() method was being called repeatedly for the same spreadsheet and sheet, causing excessive API calls and hitting the Google Sheets API rate limit of 60 read requests per minute. This commit adds an in-memory cache to store sheet IDs, significantly reducing the number of API calls. The cache is session-scoped and automatically invalidated when new sheets are created. Also fixes a typo in error logging (logger.into -> logger.info). Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughAdds an in-memory cache for Google Sheets sheetIds keyed by ChangesSheet ID Caching and Retry Behavior
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cloud_governance/common/google_drive/google_drive_operations.py (1)
48-52: ⚡ Quick winPopulate cache from create response instead of deleting key.
After
addSheet, you already have authoritative response data; deleting the cache key forces an extraspreadsheets().get(...sheets.properties)read on next operation. Cache the newsheetIddirectly frombatchUpdate(...).execute()replies to reduce read pressure further.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cloud_governance/common/google_drive/google_drive_operations.py` around lines 48 - 52, The code deletes a cached sheet id after calling self.__service.spreadsheets().batchUpdate(...).execute(), causing an extra read; instead parse the batchUpdate response (the result of self.__service.spreadsheets().batchUpdate(...).execute()) to extract the new sheetId for the created sheet and store it in self.__sheet_id_cache under the cache_key (f"{gsheet_id}:{sheet_name}") so subsequent operations use the authoritative id; update the logic around batchUpdate call in google_drive_operations.py to inspect the execute() response for the added sheet's properties and assign that sheetId into __sheet_id_cache rather than deleting the key.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 112-115: find_sheet_id_by_name currently returns cached sheet IDs
from self.__sheet_id_cache unconditionally which can serve stale sheetId values;
modify it to detect and recover from "sheet not found" failures by evicting the
cache entry and retrying the lookup once (or implement a bounded TTL on
entries), and apply the same change to the sibling cache access at the other
location referenced (lines ~123-125) so callers like delete_rows,
update_row_in_gsheet, and paste/update flows will retry with a fresh lookup when
a cached id proves invalid.
---
Nitpick comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 48-52: The code deletes a cached sheet id after calling
self.__service.spreadsheets().batchUpdate(...).execute(), causing an extra read;
instead parse the batchUpdate response (the result of
self.__service.spreadsheets().batchUpdate(...).execute()) to extract the new
sheetId for the created sheet and store it in self.__sheet_id_cache under the
cache_key (f"{gsheet_id}:{sheet_name}") so subsequent operations use the
authoritative id; update the logic around batchUpdate call in
google_drive_operations.py to inspect the execute() response for the added
sheet's properties and assign that sheetId into __sheet_id_cache rather than
deleting the key.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 16fa2d47-e838-44f4-a015-1e436f19af35
📒 Files selected for processing (1)
cloud_governance/common/google_drive/google_drive_operations.py
Addresses potential issue where cached sheet IDs can become stale if sheets are renamed/deleted/recreated by external processes during a job run. This would cause operations to fail with "sheet not found" errors with no recovery mechanism. Changes: - Added _evict_sheet_cache() method to remove stale cache entries - Added use_cache parameter to find_sheet_id_by_name() to bypass cache - Wrapped delete_rows(), update_row_in_gsheet(), and paste_csv_to_gsheet() with retry logic that: 1. Attempts operation with cached sheet ID 2. On "not found" errors, evicts cache and retries once with fresh lookup 3. Fails gracefully if second attempt also fails This defensive approach prevents complete job failures when sheet metadata changes during execution. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
cloud_governance/common/google_drive/google_drive_operations.py (1)
173-174:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStale-sheet retry matching is too narrow and can miss real stale
sheetIdfailures.These guards only retry on
"Unable to parse range"/"not found". Stale cached IDs from Sheets often fail as"No grid with id", which would skip eviction+retry and defeat the cache-recovery path.Suggested fix
+ `@staticmethod` + def _is_stale_sheet_error(error: HttpError) -> bool: + msg = str(error).lower() + return any(token in msg for token in ('not found', 'unable to parse range', 'no grid with id')) + def delete_rows(self, spreadsheet_id: str, sheet_name: str, row_number: int): @@ - except HttpError as error: - if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()): + except HttpError as error: + if attempt == 0 and self._is_stale_sheet_error(error): @@ def paste_csv_to_gsheet(self, csv_path, spreadsheet_id: str, sheet_name: str): @@ - except HttpError as error: - if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()): + except HttpError as error: + if attempt == 0 and self._is_stale_sheet_error(error): @@ def update_row_in_gsheet(self, data: list, gsheet_id: str, row: int, sheet_name: str): @@ - except HttpError as error: - if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()): + except HttpError as error: + if attempt == 0 and self._is_stale_sheet_error(error):#!/bin/bash # Verify stale-id error matching logic and whether "No grid with id" is currently handled. rg -n -C2 "Unable to parse range|not found|No grid with id|_is_stale_sheet_error" cloud_governance/common/google_drive/google_drive_operations.pyAlso applies to: 219-220, 253-254
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cloud_governance/common/google_drive/google_drive_operations.py` around lines 173 - 174, The stale-sheet retry guard is too narrow—update the checks around self._evict_sheet_cache (used where you currently match 'Unable to parse range' or 'not found') to also detect "No grid with id" and other sheet-id stale messages by adding a small helper like _is_stale_sheet_error(self, error) (case-insensitive, matches 'unable to parse range', 'not found', 'no grid with id', etc.), then call that helper in the existing conditionals (the blocks that call self._evict_sheet_cache at the current occurrences) so eviction+retry will run for all stale-sheet error variants; apply the same change to the other two occurrences noted in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 177-178: The delete_rows function currently catches a second
failure, logs it with logger.info and uses break which suppresses the error and
lets callers assume deletion succeeded; change this to log at an appropriate
level (error) and re-raise the exception (or raise a new descriptive exception)
instead of breaking so failures propagate to callers. Locate delete_rows and
replace the logger.info/following break behavior with code that logs the error
(including the caught exception) and then raise the original exception (or a
wrapped one) so downstream code cannot silently continue after a failed delete.
- Around line 205-206: The Sheets batchUpdate payload is using string values for
"rowIndex" and "columnIndex" (e.g., "0" / '0'); change these to integer values
(0) where the payload/gridRange is constructed so the JSON sends numbers not
strings; locate the places in google_drive_operations.py that build the
batchUpdate/gridRange objects (look for "rowIndex" and "columnIndex" keys around
the shown diff) and replace the quoted string values with unquoted numeric
literals (0) so the API receives numeric indexes.
---
Duplicate comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 173-174: The stale-sheet retry guard is too narrow—update the
checks around self._evict_sheet_cache (used where you currently match 'Unable to
parse range' or 'not found') to also detect "No grid with id" and other sheet-id
stale messages by adding a small helper like _is_stale_sheet_error(self, error)
(case-insensitive, matches 'unable to parse range', 'not found', 'no grid with
id', etc.), then call that helper in the existing conditionals (the blocks that
call self._evict_sheet_cache at the current occurrences) so eviction+retry will
run for all stale-sheet error variants; apply the same change to the other two
occurrences noted in the comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: cc8b99a0-610a-45c6-bc42-eff530dbed33
📒 Files selected for processing (1)
cloud_governance/common/google_drive/google_drive_operations.py
Type of change
Note: Fill x in []
Description
When updating multiple rows in Google Sheets, the find_sheet_id_by_name() method was being called repeatedly for the same spreadsheet and sheet, causing excessive API calls and hitting the Google Sheets API rate limit of 60 read requests per minute.
This commit adds an in-memory cache to store sheet IDs, significantly reducing the number of API calls. The cache is session-scoped and automatically invalidated when new sheets are created.
Also fixes a typo in error logging (logger.into -> logger.info).
For security reasons, all pull requests need to be approved first before running any automated CI
Assisted-by: Cursor