Skip to content

Add caching to Google Sheets API calls to prevent rate limiting#1002

Merged
pragya811 merged 3 commits into
mainfrom
jenkins_fix3
Jun 10, 2026
Merged

Add caching to Google Sheets API calls to prevent rate limiting#1002
pragya811 merged 3 commits into
mainfrom
jenkins_fix3

Conversation

@pragya811

@pragya811 pragya811 commented Jun 10, 2026

Copy link
Copy Markdown
Member

Type of change

Note: Fill x in []

  • bug
  • enhancement
  • documentation
  • dependencies

Description

When updating multiple rows in Google Sheets, the find_sheet_id_by_name() method was being called repeatedly for the same spreadsheet and sheet, causing excessive API calls and hitting the Google Sheets API rate limit of 60 read requests per minute.

This commit adds an in-memory cache to store sheet IDs, significantly reducing the number of API calls. The cache is session-scoped and automatically invalidated when new sheets are created.

Also fixes a typo in error logging (logger.into -> logger.info).

For security reasons, all pull requests need to be approved first before running any automated CI

Assisted-by: Cursor

When updating multiple rows in Google Sheets, the find_sheet_id_by_name()
method was being called repeatedly for the same spreadsheet and sheet,
causing excessive API calls and hitting the Google Sheets API rate limit
of 60 read requests per minute.

This commit adds an in-memory cache to store sheet IDs, significantly
reducing the number of API calls. The cache is session-scoped and
automatically invalidated when new sheets are created.

Also fixes a typo in error logging (logger.into -> logger.info).

Co-authored-by: Cursor <cursoragent@cursor.com>
@pragya811 pragya811 self-assigned this Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds an in-memory cache for Google Sheets sheetIds keyed by spreadsheet_id:sheet_name, a cache-eviction helper, invalidates cache after creating worksheets, and applies retry-with-eviction logic (and a logging fix) to several sheet-modifying methods.

Changes

Sheet ID Caching and Retry Behavior

Layer / File(s) Summary
Cache initialization and lookup
cloud_governance/common/google_drive/google_drive_operations.py
Initializes an in-memory sheet ID cache in __init__; find_sheet_id_by_name(..., use_cache: bool = True) returns cached IDs when present, otherwise fetches sheet metadata and stores the sheetId in the cache.
Cache eviction helper
cloud_governance/common/google_drive/google_drive_operations.py
Adds _evict_sheet_cache(sheet_name, spreadsheet_id) to remove a cached sheet ID entry and log the eviction.
Invalidate cache after worksheet creation
cloud_governance/common/google_drive/google_drive_operations.py
After batchUpdate creates a worksheet, computes the spreadsheet_id:sheet_name cache key and deletes any existing cached sheet ID for that key.
delete_rows: retry with cache eviction
cloud_governance/common/google_drive/google_drive_operations.py
Refactors delete_rows to attempt the operation twice: use cached sheetId on first attempt, evict cache and retry on specific HttpError messages; non-matching failures are logged and re-raised.
paste_csv_to_gsheet: retry with cache eviction
cloud_governance/common/google_drive/google_drive_operations.py
Adds two-attempt loop that uses cache on first attempt and bypasses cache on retry; evicts cached sheetId and retries when certain HttpError cases occur.
update_row_in_gsheet: retry with cache eviction
cloud_governance/common/google_drive/google_drive_operations.py
Adds two-attempt loop with cache-on-first-attempt / bypass-on-retry behavior; evicts cached sheetId and retries on specific HttpError cases; request uses numeric columnIndex (0) for update start.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly summarizes the main change: adding caching to Google Sheets API calls to prevent rate limiting, which aligns with the changeset's primary objective.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description clearly explains the problem (excessive API calls hitting rate limits), the solution (in-memory caching), and includes additional fixes (typo in logging).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cloud_governance/common/google_drive/google_drive_operations.py (1)

48-52: ⚡ Quick win

Populate cache from create response instead of deleting key.

After addSheet, you already have authoritative response data; deleting the cache key forces an extra spreadsheets().get(...sheets.properties) read on next operation. Cache the new sheetId directly from batchUpdate(...).execute() replies to reduce read pressure further.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud_governance/common/google_drive/google_drive_operations.py` around lines
48 - 52, The code deletes a cached sheet id after calling
self.__service.spreadsheets().batchUpdate(...).execute(), causing an extra read;
instead parse the batchUpdate response (the result of
self.__service.spreadsheets().batchUpdate(...).execute()) to extract the new
sheetId for the created sheet and store it in self.__sheet_id_cache under the
cache_key (f"{gsheet_id}:{sheet_name}") so subsequent operations use the
authoritative id; update the logic around batchUpdate call in
google_drive_operations.py to inspect the execute() response for the added
sheet's properties and assign that sheetId into __sheet_id_cache rather than
deleting the key.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 112-115: find_sheet_id_by_name currently returns cached sheet IDs
from self.__sheet_id_cache unconditionally which can serve stale sheetId values;
modify it to detect and recover from "sheet not found" failures by evicting the
cache entry and retrying the lookup once (or implement a bounded TTL on
entries), and apply the same change to the sibling cache access at the other
location referenced (lines ~123-125) so callers like delete_rows,
update_row_in_gsheet, and paste/update flows will retry with a fresh lookup when
a cached id proves invalid.

---

Nitpick comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 48-52: The code deletes a cached sheet id after calling
self.__service.spreadsheets().batchUpdate(...).execute(), causing an extra read;
instead parse the batchUpdate response (the result of
self.__service.spreadsheets().batchUpdate(...).execute()) to extract the new
sheetId for the created sheet and store it in self.__sheet_id_cache under the
cache_key (f"{gsheet_id}:{sheet_name}") so subsequent operations use the
authoritative id; update the logic around batchUpdate call in
google_drive_operations.py to inspect the execute() response for the added
sheet's properties and assign that sheetId into __sheet_id_cache rather than
deleting the key.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 16fa2d47-e838-44f4-a015-1e436f19af35

📥 Commits

Reviewing files that changed from the base of the PR and between 9be8c00 and 6bbc2ce.

📒 Files selected for processing (1)
  • cloud_governance/common/google_drive/google_drive_operations.py

Comment thread cloud_governance/common/google_drive/google_drive_operations.py
Addresses potential issue where cached sheet IDs can become stale if
sheets are renamed/deleted/recreated by external processes during a
job run. This would cause operations to fail with "sheet not found"
errors with no recovery mechanism.

Changes:
- Added _evict_sheet_cache() method to remove stale cache entries
- Added use_cache parameter to find_sheet_id_by_name() to bypass cache
- Wrapped delete_rows(), update_row_in_gsheet(), and paste_csv_to_gsheet()
  with retry logic that:
  1. Attempts operation with cached sheet ID
  2. On "not found" errors, evicts cache and retries once with fresh lookup
  3. Fails gracefully if second attempt also fails

This defensive approach prevents complete job failures when sheet
metadata changes during execution.

Co-authored-by: Cursor <cursoragent@cursor.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
cloud_governance/common/google_drive/google_drive_operations.py (1)

173-174: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stale-sheet retry matching is too narrow and can miss real stale sheetId failures.

These guards only retry on "Unable to parse range" / "not found". Stale cached IDs from Sheets often fail as "No grid with id", which would skip eviction+retry and defeat the cache-recovery path.

Suggested fix
+    `@staticmethod`
+    def _is_stale_sheet_error(error: HttpError) -> bool:
+        msg = str(error).lower()
+        return any(token in msg for token in ('not found', 'unable to parse range', 'no grid with id'))
+
     def delete_rows(self, spreadsheet_id: str, sheet_name: str, row_number: int):
@@
-                except HttpError as error:
-                    if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()):
+                except HttpError as error:
+                    if attempt == 0 and self._is_stale_sheet_error(error):
@@
     def paste_csv_to_gsheet(self, csv_path, spreadsheet_id: str, sheet_name: str):
@@
-                    except HttpError as error:
-                        if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()):
+                    except HttpError as error:
+                        if attempt == 0 and self._is_stale_sheet_error(error):
@@
     def update_row_in_gsheet(self, data: list, gsheet_id: str, row: int, sheet_name: str):
@@
-                except HttpError as error:
-                    if attempt == 0 and ('Unable to parse range' in str(error) or 'not found' in str(error).lower()):
+                except HttpError as error:
+                    if attempt == 0 and self._is_stale_sheet_error(error):
#!/bin/bash
# Verify stale-id error matching logic and whether "No grid with id" is currently handled.
rg -n -C2 "Unable to parse range|not found|No grid with id|_is_stale_sheet_error" cloud_governance/common/google_drive/google_drive_operations.py

Also applies to: 219-220, 253-254

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud_governance/common/google_drive/google_drive_operations.py` around lines
173 - 174, The stale-sheet retry guard is too narrow—update the checks around
self._evict_sheet_cache (used where you currently match 'Unable to parse range'
or 'not found') to also detect "No grid with id" and other sheet-id stale
messages by adding a small helper like _is_stale_sheet_error(self, error)
(case-insensitive, matches 'unable to parse range', 'not found', 'no grid with
id', etc.), then call that helper in the existing conditionals (the blocks that
call self._evict_sheet_cache at the current occurrences) so eviction+retry will
run for all stale-sheet error variants; apply the same change to the other two
occurrences noted in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 177-178: The delete_rows function currently catches a second
failure, logs it with logger.info and uses break which suppresses the error and
lets callers assume deletion succeeded; change this to log at an appropriate
level (error) and re-raise the exception (or raise a new descriptive exception)
instead of breaking so failures propagate to callers. Locate delete_rows and
replace the logger.info/following break behavior with code that logs the error
(including the caught exception) and then raise the original exception (or a
wrapped one) so downstream code cannot silently continue after a failed delete.
- Around line 205-206: The Sheets batchUpdate payload is using string values for
"rowIndex" and "columnIndex" (e.g., "0" / '0'); change these to integer values
(0) where the payload/gridRange is constructed so the JSON sends numbers not
strings; locate the places in google_drive_operations.py that build the
batchUpdate/gridRange objects (look for "rowIndex" and "columnIndex" keys around
the shown diff) and replace the quoted string values with unquoted numeric
literals (0) so the API receives numeric indexes.

---

Duplicate comments:
In `@cloud_governance/common/google_drive/google_drive_operations.py`:
- Around line 173-174: The stale-sheet retry guard is too narrow—update the
checks around self._evict_sheet_cache (used where you currently match 'Unable to
parse range' or 'not found') to also detect "No grid with id" and other sheet-id
stale messages by adding a small helper like _is_stale_sheet_error(self, error)
(case-insensitive, matches 'unable to parse range', 'not found', 'no grid with
id', etc.), then call that helper in the existing conditionals (the blocks that
call self._evict_sheet_cache at the current occurrences) so eviction+retry will
run for all stale-sheet error variants; apply the same change to the other two
occurrences noted in the comment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cc8b99a0-610a-45c6-bc42-eff530dbed33

📥 Commits

Reviewing files that changed from the base of the PR and between 6bbc2ce and db9b141.

📒 Files selected for processing (1)
  • cloud_governance/common/google_drive/google_drive_operations.py

Comment thread cloud_governance/common/google_drive/google_drive_operations.py Outdated
Comment thread cloud_governance/common/google_drive/google_drive_operations.py Outdated

@ebattat ebattat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@pragya811 pragya811 merged commit d775cb7 into main Jun 10, 2026
19 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Cloud-Governance project Jun 10, 2026
@pragya811 pragya811 deleted the jenkins_fix3 branch June 10, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants