Feat: add Google Cloud Storage Connector #109
base: dev
Conversation
Signed-off-by: SyedZawwarAhmed <[email protected]>
Walkthrough

Adds a new Google Cloud Storage (GCS) storage connector, exports and registers it in core, documents GCS usage, updates README references, adds a GCS example script, and includes the @google-cloud/storage dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App
    participant StorageService
    participant GCSStorage
    participant GCS as Google Cloud Storage

    App->>StorageService: write(resourceId, data, acl?, metadata?, ttl?)
    StorageService->>GCSStorage: resolve('GCS').write(...)
    GCSStorage->>GCSStorage: ensureInitialized()
    GCSStorage->>GCS: putObject(data, metadata+ACL, contentType, customTime?)
    GCS-->>GCSStorage: ok
    GCSStorage-->>StorageService: void
    StorageService-->>App: done

    App->>StorageService: read(resourceId)
    StorageService->>GCSStorage: read(...)
    GCSStorage->>GCS: getObject(resourceId)
    alt found
        GCS-->>GCSStorage: content+metadata
        GCSStorage-->>StorageService: Buffer
        StorageService-->>App: Buffer
    else missing
        GCS-->>GCSStorage: not found
        GCSStorage-->>StorageService: undefined
        StorageService-->>App: undefined
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 9
🧹 Nitpick comments (12)
packages/core/package.json (1)

`64-64`: **Add GCS dep: verify bundling/externalization and consider making it optional**

- Ensure rollup marks `@google-cloud/storage` as external to avoid bundling a large server-only lib into `dist`.
- Consider `optionalDependencies` for connectors users may not need, to keep installs lean.

If helpful, I can open a follow-up to externalize all cloud SDKs in the rollup config.
packages/sdk/docs/07-services.md (1)

`46-46`: **Doc/API consistency: confirm `agent.storage.GCS()` exists on Agent and in types**

Double-check that the SDK's Agent typing and factory actually expose `GCS()` alongside `S3()` and `LocalStorage()`.

I can scan/export the SDK typings and open a docs PR if needed.
README.md (2)

`317-323`: **Update components list to include GCS**

The Storage components list still shows `LocalStorage, S3`. Add `GCS` for consistency.

```diff
- - **Storage**: `LocalStorage`, `S3`
+ - **Storage**: `LocalStorage`, `S3`, `GCS`
```

`292-299`: **Optional: Add a GCS example in Production SRE setup**

Consider adding a second snippet showing `Storage: { Connector: 'GCS', ... }` to mirror S3.

packages/core/docs/connectors/storage.md (4)
`87-91`: **Section looks good; small copy edit**

Consider "on Google Cloud" instead of "on Google Cloud Platform" for brevity/consistency with other docs.

`92-98`: **Consider documenting ADC as a credential option**

Many deployments prefer Application Default Credentials or Workload Identity over embedding keys. Add a note that if ADC is configured, `projectId`/`clientEmail`/`privateKey` may be omitted.

I can draft a short ADC subsection.

`125-131`: **Security notes: naming and stronger guidance**

- Use "Secret Manager" (official name), not "Google Secret Manager".
- Recommend Workload Identity Federation / ADC where possible instead of long-lived keys.

```diff
- - Store credentials securely using environment variables or Google Secret Manager
+ - Store credentials securely using environment variables or Secret Manager
+ - Prefer Application Default Credentials or Workload Identity (no static keys) when possible
```

`85-115`: **Document TTL/lifecycle behavior if supported**

If the connector installs lifecycle rules or maps TTL to object expiry (per helper utilities), add a short subsection so users know how to enable/use it.

Happy to draft this doc block.
examples/06-Storage-no-agent/03-GCS.ts (3)
`4-9`: **Validate required env vars early**

Fail fast with a helpful error if any are missing.

```diff
 async function main() {
-    const gcsStorage = Storage.GCS({
+    const { GCP_PROJECT_ID, GCP_CLIENT_EMAIL, GCP_PRIVATE_KEY, GCP_BUCKET_NAME } = process.env;
+    if (!GCP_PROJECT_ID || !GCP_CLIENT_EMAIL || !GCP_PRIVATE_KEY || !GCP_BUCKET_NAME) {
+        throw new Error('Missing one or more env vars: GCP_PROJECT_ID, GCP_CLIENT_EMAIL, GCP_PRIVATE_KEY, GCP_BUCKET_NAME');
+    }
+    const gcsStorage = Storage.GCS({
-        projectId: process.env.GCP_PROJECT_ID,
-        clientEmail: process.env.GCP_CLIENT_EMAIL,
-        privateKey: process.env.GCP_PRIVATE_KEY,
-        bucket: process.env.GCP_BUCKET_NAME,
+        projectId: GCP_PROJECT_ID,
+        clientEmail: GCP_CLIENT_EMAIL,
+        privateKey: GCP_PRIVATE_KEY.replace(/\\n/g, '\n'),
+        bucket: GCP_BUCKET_NAME,
     });
```

`11-15`: **Use the URI returned by write for provider-agnostic reads**

Staying URI-based matches docs and avoids key/namespace assumptions.

```diff
-    await gcsStorage.write('test.txt', 'Hello, world!');
-
-    const data = await gcsStorage.read('test.txt');
+    const uri = await gcsStorage.write('test.txt', 'Hello, world!');
+    const data = await gcsStorage.read(uri);
```

`20-20`: **Catch top-level rejection**

Ensure failures surface clearly in examples.

```diff
-main();
+main().catch((err) => { console.error(err); process.exit(1); });
```

packages/core/src/helpers/GCSCache.helper.ts (1)
`72-79`: **Consider using a more efficient approach for rule comparison**

Using `JSON.stringify` for array comparison is fragile and could fail if array element order differs. Consider a more robust comparison method.

Apply this diff for a more reliable comparison:

```diff
 export function getNonExistingRules(existingRules: LifecycleRule[], newRules: LifecycleRule[]): LifecycleRule[] {
     return newRules.filter(
         (newRule) =>
             !existingRules.some(
                 (existingRule) =>
                     existingRule.condition.age === newRule.condition.age &&
-                    JSON.stringify(existingRule.condition.matchesSuffix) === JSON.stringify(newRule.condition.matchesSuffix)
+                    existingRule.condition.matchesSuffix?.length === newRule.condition.matchesSuffix?.length &&
+                    existingRule.condition.matchesSuffix?.every((suffix) => newRule.condition.matchesSuffix?.includes(suffix))
             )
     );
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)

- `packages/sdk/src/types/generated/Storage.types.ts` is excluded by `!**/generated/**`
- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (9)

- `README.md` (1 hunks)
- `examples/06-Storage-no-agent/03-GCS.ts` (1 hunks)
- `packages/core/docs/connectors/storage.md` (1 hunks)
- `packages/core/package.json` (1 hunks)
- `packages/core/src/helpers/GCSCache.helper.ts` (1 hunks)
- `packages/core/src/index.ts` (1 hunks)
- `packages/core/src/subsystems/IO/Storage.service/connectors/GCSStorage.class.ts` (1 hunks)
- `packages/core/src/subsystems/IO/Storage.service/index.ts` (1 hunks)
- `packages/sdk/docs/07-services.md` (1 hunks)
🧰 Additional context used

🪛 LanguageTool: packages/core/docs/connectors/storage.md

- [grammar] ~89: possible issue near "...Role: Google Cloud Storage connector Summary: Provides scalable cloud stora..." (QB_NEW_EN)
- [grammar] ~92-~96: possible issues flagged in the configuration table rows for `projectId`, `clientEmail`, `privateKey`, and `bucket` (QB_NEW_EN)
- [grammar] ~119-~122: possible issues flagged in the use-case bullets (GCP scalability, multi-region deployments, high availability, Google Cloud ecosystem integration, large-scale data storage and processing) (QB_NEW_EN)
🔇 Additional comments (4)

README.md (1)

`28-28`: **Nice: GCS mentioned in unified abstraction**

Reads well and aligns with the new connector.

packages/core/src/subsystems/IO/Storage.service/connectors/GCSStorage.class.ts (1)

`134-145`: **Missing timezone handling in expiry date comparison**

The expiry date comparison uses `new Date()`, which could have timezone inconsistencies between when the expiry was set and when it's checked.

Consider using UTC dates consistently:

```diff
         const expirationDate = new Date(expiryDateValue);
         const currentDate = new Date();
-        if (currentDate > expirationDate) {
+        // Ensure both dates are compared in UTC
+        if (currentDate.getTime() > expirationDate.getTime()) {
```

Likely an incorrect or invalid review comment.

packages/core/src/subsystems/IO/Storage.service/index.ts (1)

`7-13`: **LGTM! Clean integration of GCS storage connector**

The GCS storage connector is properly imported and registered following the same pattern as existing storage connectors.

packages/core/src/index.ts (1)

`162-162`: **LGTM** — `GCSStorage.class.ts` uses named exports; `export *` correctly re-exports `GCSStorage` and `GCSConfig`.
```typescript
        projectId: process.env.GCP_PROJECT_ID,
        clientEmail: process.env.GCP_CLIENT_EMAIL,
        privateKey: process.env.GCP_PRIVATE_KEY,
        bucket: process.env.GCP_BUCKET_NAME,
```
**Env private key likely breaks without newline restoration**

Most env-injected SA keys require `replace(/\\n/g, '\n')`. Without it, auth fails.

Apply:

```diff
-        privateKey: process.env.GCP_PRIVATE_KEY,
+        privateKey: process.env.GCP_PRIVATE_KEY?.replace(/\\n/g, '\n'),
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```typescript
        projectId: process.env.GCP_PROJECT_ID,
        clientEmail: process.env.GCP_CLIENT_EMAIL,
        privateKey: process.env.GCP_PRIVATE_KEY?.replace(/\\n/g, '\n'),
        bucket: process.env.GCP_BUCKET_NAME,
```
🤖 Prompt for AI Agents
In examples/06-Storage-no-agent/03-GCS.ts around lines 5 to 8 the service
account privateKey pulled from process.env.GCP_PRIVATE_KEY will fail auth if
newline characters are escaped; update the assignment to restore real newlines
by replacing escaped sequences (e.g. call replace(/\\n/g, '\n') on
process.env.GCP_PRIVATE_KEY) before passing it to the client constructor,
ensuring you handle undefined safely (e.g. fallback or throw a clear error if
the env var is missing).
```typescript
import { SRE } from '@smythos/sre';

SRE.init({
    Storage: {
        Connector: 'GCS',
        Settings: {
            projectId: 'my-project-id',
            clientEmail: process.env.GCP_CLIENT_EMAIL,
            privateKey: process.env.GCP_PRIVATE_KEY,
            bucket: 'my-app-storage',
        },
    },
});
```
🛠️ Refactor suggestion

**Private key newline gotcha (common failure in env vars)**

Service account keys stored in env vars often need newline restoration. Add a tip:

```diff
-            privateKey: process.env.GCP_PRIVATE_KEY,
+            // If stored as single-line env var, restore newlines:
+            privateKey: process.env.GCP_PRIVATE_KEY?.replace(/\\n/g, '\n'),
```

I can update the example + add a warning callout.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```typescript
import { SRE } from '@smythos/sre';

SRE.init({
    Storage: {
        Connector: 'GCS',
        Settings: {
            projectId: 'my-project-id',
            clientEmail: process.env.GCP_CLIENT_EMAIL,
            // If stored as single-line env var, restore newlines:
            privateKey: process.env.GCP_PRIVATE_KEY?.replace(/\\n/g, '\n'),
            bucket: 'my-app-storage',
        },
    },
});
```
🤖 Prompt for AI Agents
In packages/core/docs/connectors/storage.md around lines 101 to 115, the example
uses process.env.GCP_PRIVATE_KEY directly which commonly fails because
environment variables lose literal newlines; update the docs to include a short
tip and example showing restoration of newlines (e.g., replace escaped "\\n"
with actual "\n" before use) and add a brief warning callout about storing
service account keys in env vars and needing to restore newlines so the private
key is parsed correctly by GCP clients.
```typescript
        const clientConfig: any = {
            projectId: _settings.projectId,
        };

        if (_settings.clientEmail && _settings.privateKey) {
            clientConfig.credentials = {
                client_email: _settings.clientEmail,
                private_key: _settings.privateKey.replace(/\\n/g, '\n'), // Handle escaped newlines
                type: 'service_account'
            };
        }
```
💡 Verification agent

🧩 Analysis chain

**Potential security concern with credential structure**

Creating a credentials object with `type: 'service_account'` but without other required service account fields might cause authentication issues. A complete service account credential typically requires additional fields.
🏁 Script executed:

```bash
#!/bin/bash
# Check if there are examples or tests showing the expected credential format
rg -A 10 -B 2 "credentials.*=.*\{" --type ts -g "*.test.ts" -g "*.spec.ts" -g "*example*"

# Check Google Cloud Storage SDK usage patterns in the codebase
ast-grep --pattern 'new Storage({
  $$$
  credentials: {
    $$$
  }
  $$$
})'
```
Length of output: 172
🌐 Web query:
What fields are required in the credentials object when instantiating a new Storage client with @google-cloud/storage?
💡 Result:
When passing credentials by value the docs show the service-account JSON shape. Typical fields included are:
- type
- project_id
- private_key_id
- private_key
- client_email
- client_id
- auth_uri
- token_uri
- auth_provider_x509_cert_url
- client_x509_cert_url
(You can also supply keyFilename/projectId or an authClient instead.) [1][2]
Sources: Google Cloud Storage Node.js reference and package docs. [1] (cloud.google.com, jsdocs.io)
Citations:
- 1: https://cloud.google.com/nodejs/docs/reference/storage/latest/storage/storage?utm_source=openai
- 2: https://www.jsdocs.io/package/%40google-cloud/storage?utm_source=openai
**Include full service account credentials or switch to keyFilename**

The `credentials` object currently only includes `type`, `client_email`, and `private_key`, but the Storage client expects the full service-account JSON shape (e.g. `project_id`, `private_key_id`, `client_id`, `auth_uri`, `token_uri`, `auth_provider_x509_cert_url`, `client_x509_cert_url`) or a `keyFilename` reference. Update lines 60–70 to provide the complete credential fields or use `keyFilename`; a sketch follows the AI-agent prompt below.
🤖 Prompt for AI Agents
In
packages/core/src/subsystems/IO/Storage.service/connectors/GCSStorage.class.ts
around lines 60 to 70, the credentials object only sets type, client_email and
private_key but the Google Storage client expects a full service-account JSON
shape or a keyFilename path; update this block to either (A) accept and assign
the full service account JSON fields (project_id, private_key_id, client_id,
auth_uri, token_uri, auth_provider_x509_cert_url, client_x509_cert_url, etc.)
into clientConfig.credentials (ensuring private_key newline handling) or (B)
accept a filesystem path and set clientConfig.keyFilename to that path instead
of building partial credentials; validate presence of required fields and prefer
keyFilename if provided.
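For illustration, a minimal sketch of option B; the settings shape and helper name here are hypothetical, not the connector's actual code:

```typescript
import { Storage } from '@google-cloud/storage';

// Hypothetical settings shape for this sketch only.
interface GCSSettings {
    projectId: string;
    keyFilename?: string; // path to a complete service-account JSON key file
}

function createClient(settings: GCSSettings): Storage {
    if (settings.keyFilename) {
        // keyFilename hands the full service-account JSON to the client,
        // so no partial credentials object is assembled by hand.
        return new Storage({ projectId: settings.projectId, keyFilename: settings.keyFilename });
    }
    // Without an explicit key, the client falls back to
    // Application Default Credentials (ADC).
    return new Storage({ projectId: settings.projectId });
}
```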
```typescript
            // Reset the initialization promise so it can be retried
            this.initializationPromise = null;
            throw error;
```
🛠️ Refactor suggestion

**Initialization failure doesn't prevent connector usage**

When initialization fails, the `initializationPromise` is reset to allow retries, but `isInitialized` remains false. However, there's no mechanism to prevent methods from continuing to work with an uninitialized connector, potentially causing repeated failures.

Consider tracking initialization failures and implementing a circuit-breaker pattern or maximum retry limit to prevent continuous failed initialization attempts; see the sketch after the AI-agent prompt below.
🤖 Prompt for AI Agents
In
packages/core/src/subsystems/IO/Storage.service/connectors/GCSStorage.class.ts
around lines 106-108, the initialization failure currently resets
initializationPromise to null and re-throws but leaves isInitialized false with
no protection against subsequent operations; update the class to record
initialization failures (e.g. failureCount and lastFailureAt or a failedUntil
timestamp) and enforce a circuit-breaker style guard: increment failureCount and
set a short lockout/backoff period when initialization fails, prevent public
methods from attempting work while locked by checking the failed state and
throwing a clear "connector not initialized" or "temporarily disabled after
repeated init failures" error, allow retry only when failureCount is below a
configurable maxRetries or when backoff/failedUntil has expired, and reset
failureCount and failed state on a successful initialization.
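As a rough illustration of that guard (all names here, such as `InitGuard`, `MAX_RETRIES`, and `BACKOFF_MS`, are hypothetical, not part of the connector):

```typescript
// Minimal circuit-breaker sketch for the retry guard described above.
class InitGuard {
    private failureCount = 0;
    private failedUntil = 0; // epoch ms; 0 means "not locked out"
    private static readonly MAX_RETRIES = 3;
    private static readonly BACKOFF_MS = 30_000;

    /** May an initialization attempt proceed right now? */
    canAttempt(): boolean {
        if (this.failureCount >= InitGuard.MAX_RETRIES) return false;
        return Date.now() >= this.failedUntil;
    }

    /** Record a failed attempt and start a backoff window. */
    recordFailure(): void {
        this.failureCount++;
        this.failedUntil = Date.now() + InitGuard.BACKOFF_MS;
    }

    /** Reset state after a successful initialization. */
    recordSuccess(): void {
        this.failureCount = 0;
        this.failedUntil = 0;
    }
}
```

Public methods would check `canAttempt()` before touching GCS and throw a clear "temporarily disabled after repeated init failures" error while locked out.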
```typescript
        if (gcsMetadata['gcs-acl']) {
            if (gcsMetadata['gcs-acl']) {
                metadata['gcs-acl'] =
                    typeof gcsMetadata['gcs-acl'] == 'string'
                        ? gcsMetadata['gcs-acl']
                        : ACL.from(gcsMetadata['gcs-acl']).serializedACL;
            }

            delete gcsMetadata['gcs-acl'];
        }
```
**Duplicate condition check and potential data loss**

There's a duplicate check for `gcsMetadata['gcs-acl']` on lines 363 and 364, and the ACL is deleted from the original object, which could cause data loss if the function fails later.

Apply this diff to fix the duplicate check and prevent data mutation:

```diff
 private serializeGCSMetadata(gcsMetadata: Record<string, any>): Record<string, string | number | boolean | null> {
     let metadata: Record<string, string | number | boolean | null> = {};
+    // Create a shallow copy to avoid mutating the original
+    const metadataCopy = { ...gcsMetadata };
+
-    if (gcsMetadata['gcs-acl']) {
-        if (gcsMetadata['gcs-acl']) {
-            metadata['gcs-acl'] =
-                typeof gcsMetadata['gcs-acl'] == 'string'
-                    ? gcsMetadata['gcs-acl']
-                    : ACL.from(gcsMetadata['gcs-acl']).serializedACL;
-        }
-
-        delete gcsMetadata['gcs-acl'];
+    if (metadataCopy['gcs-acl']) {
+        metadata['gcs-acl'] =
+            typeof metadataCopy['gcs-acl'] == 'string'
+                ? metadataCopy['gcs-acl']
+                : ACL.from(metadataCopy['gcs-acl']).serializedACL;
+        delete metadataCopy['gcs-acl'];
     }

-    for (let key in gcsMetadata) {
+    for (let key in metadataCopy) {
         if (key == 'ContentType') continue; //skip ContentType as it's handled separately
-        const value = gcsMetadata[key];
+        const value = metadataCopy[key];
```
🤖 Prompt for AI Agents
In
packages/core/src/subsystems/IO/Storage.service/connectors/GCSStorage.class.ts
around lines 363 to 372, remove the duplicated if check for
gcsMetadata['gcs-acl'] and avoid mutating gcsMetadata by not deleting the
property; instead, read gcsMetadata['gcs-acl'] into a local variable, set
metadata['gcs-acl'] to either the string or ACL.from(...).serializedACL based on
its type, and do not call delete on gcsMetadata (or only delete after any
subsequent operations succeed).
Hey @SyedZawwarAhmed Thank you for this contribution, this will be very useful for SmythOS users! One quick note: I'm not sure that `GCSCache.helper.ts` is relevant for GCS Storage. We added the cache helper with lifecycle rules to S3 because S3 does not support object expiration natively. GCS supports a more robust expiration logic that can be set per object, so instead of setting up complex expiration rules like in S3, you need to set up one rule per bucket (check if it's not already there and add it; see the sketch below).

This makes GCS automatically delete each object when its `customTime` is reached. Then, when you write any object, calculate the deletion date and add it as the `customTime` metadata; if no TTL is present, do not set the `customTime` field. e.g.

```javascript
// Compute expiration time (current time + TTL)
const deleteAt = new Date(Date.now() + ttlSeconds * 1000);

// Upload file with customTime metadata
await storage.bucket(bucketName).upload(filename, {
    destination,
    metadata: {
        // customTime is a top-level metadata field, in RFC 3339 format
        customTime: deleteAt.toISOString(),
    },
});
```
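For the bucket-level rule mentioned above, a minimal sketch (not the PR's actual code; the helper name is illustrative, and it assumes the `@google-cloud/storage` client). A `Delete` action with `daysSinceCustomTime: 0` removes objects once their `customTime` has passed:

```typescript
import { Storage } from '@google-cloud/storage';

const storage = new Storage();

// Ensure the bucket has a "delete once customTime is reached" lifecycle rule.
async function ensureCustomTimeExpiryRule(bucketName: string): Promise<void> {
    const bucket = storage.bucket(bucketName);
    const [metadata] = await bucket.getMetadata();
    const rules = metadata.lifecycle?.rule ?? [];

    // Add the rule only if an equivalent one is not already present.
    const exists = rules.some(
        (r: any) => r.action?.type === 'Delete' && r.condition?.daysSinceCustomTime === 0
    );
    if (!exists) {
        await bucket.setMetadata({
            lifecycle: {
                rule: [...rules, { action: { type: 'Delete' }, condition: { daysSinceCustomTime: 0 } }],
            },
        });
    }
}
```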
@alaa-eddine-k Sure, I will look into it and make the appropriate changes.
… native customTime expiration Signed-off-by: SyedZawwarAhmed <[email protected]>
@alaa-eddine-k You're right. I have removed the `GCSCache.helper.ts` file and handled the expiration using the `customTime` field.
📝 Description
This PR adds a Google Cloud Storage (GCS) connector to SmythOS, providing equivalent functionality to the existing S3 connector. The implementation uses Service Account authentication with clientEmail and privateKey credentials.
🔗 Related Issues
🔧 Type of Change
✅ Checklist
Summary by CodeRabbit
New Features
Documentation
Examples
Chores