Commit 45ffb4e

Cursor files submodules s3 (#10)
* Added guidelines for s3 submodules
* Submodules model improvements
* Description update
1 parent f8ac7e0 commit 45ffb4e

1 file changed: +211 -0 lines changed

.cursor/rules/guidelines.mdc

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
---
description: Guidelines for S3 storage operations and connection handling
globs: ["controller.py", "connections/**/*.py", "enums.py"]
alwaysApply: true
---

# S3 Submodule Guidelines

This file contains guidelines and conventions for the `submodules/s3` module. They ensure consistency, maintainability, and correctness across all S3 storage operations, connection handling, and related code.

## Overview

The `submodules/s3` module provides a unified interface for S3-compatible storage operations, supporting both AWS S3 and Minio. The module follows an abstraction pattern where:

- **Controller** (`controller.py`): Provides a unified API that routes operations to the appropriate connection backend
- **Connections** (`connections/`): Contains backend-specific implementations (`aws.py`, `minio.py`)
- **Enums** (`enums.py`): Defines connection target types (`ConnectionTarget`)

## Architecture Principles

### 1. Connection Abstraction

- All public functions should be defined in `controller.py`
- Controller functions MUST route to the appropriate connection based on `get_current_target()`
- Connection-specific implementations MUST be in `connections/aws.py` or `connections/minio.py`
- Never import connection modules directly from outside the submodule; always go through `controller.py`

### 2. Connection Target Detection

- Use `get_current_target()` to determine the active connection target
- Connection target is determined by the `S3_TARGET` environment variable:
  - `"AWS"` → `ConnectionTarget.AWS`
  - Any other value or unset → `ConnectionTarget.MINIO`
- Always handle `ConnectionTarget.UNKNOWN` case explicitly

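
The mapping described in the Connection Target Detection list can be sketched as follows. This is a minimal illustration, not the module's actual code: the enum values are assumptions (the real `ConnectionTarget` lives in `enums.py`), and the real `get_current_target()` in `controller.py` may differ in detail.

```python
import os
from enum import Enum


class ConnectionTarget(Enum):
    # Hypothetical values; the real enum is defined in enums.py.
    AWS = "AWS"
    MINIO = "MINIO"
    UNKNOWN = "UNKNOWN"


def get_current_target() -> ConnectionTarget:
    """Map the S3_TARGET environment variable to a ConnectionTarget."""
    # Only the exact value "AWS" selects AWS; any other value, including an
    # unset variable, falls back to Minio.
    if os.getenv("S3_TARGET") == "AWS":
        return ConnectionTarget.AWS
    return ConnectionTarget.MINIO
```

Callers would still branch on `ConnectionTarget.UNKNOWN` explicitly, so that a later change to the detection logic cannot silently fall through.
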
### 3. Client Initialization Pattern

- Connection modules use lazy initialization with global client variables
- Client initialization functions should be private (prefixed with `__`)
- Use `__get_client()` pattern for accessing the client
- Initialize client only when needed, not at module import time
- Handle connection failures gracefully with appropriate exceptions

### 4. Function Signatures and Return Types

- All controller functions MUST accept the same parameters regardless of backend
- Return types should be consistent:
  - Boolean operations return `bool`
  - Data retrieval returns `str`, `bytes`, or `Dict[str, Any]` as appropriate
  - File operations return file paths as `str` or `None`
- Use type hints consistently for all function parameters and return values

### 5. Error Handling

- Check for bucket/object existence before operations when appropriate
- Return `False` or `None` for failed operations (don't raise exceptions unless critical)
- Raise exceptions only for:
  - Missing required environment variables
  - Connection failures
  - `FileNotFoundError` for missing objects when creating presigned URLs
  - `ValueError` for invalid operations (e.g., overwriting without `force=True`)

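
The split between "return a failure value" and "raise" in the Error Handling list can be illustrated with an in-memory stand-in. Everything here is a sketch: `_store` replaces a real backend, the signatures are simplified relative to the Quick Reference, and the returned URL is a placeholder rather than a real presigned URL.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in: bucket name -> {object name -> content}.
_store: Dict[str, Dict[str, str]] = {}


def put_object(bucket: str, object_name: str, data: str, force: bool = False) -> bool:
    objects = _store.setdefault(bucket, {})
    if object_name in objects and not force:
        # Invalid operation: overwriting must be requested explicitly.
        raise ValueError(f"{object_name} already exists; pass force=True to overwrite")
    objects[object_name] = data
    return True


def get_object(bucket: str, object_name: str) -> Optional[str]:
    # Ordinary failure: signal with None instead of raising.
    return _store.get(bucket, {}).get(object_name)


def delete_object(bucket: str, object_name: str) -> bool:
    # Ordinary failure: signal with False instead of raising.
    return _store.get(bucket, {}).pop(object_name, None) is not None


def create_access_link(bucket: str, object_name: str) -> str:
    if object_name not in _store.get(bucket, {}):
        # Missing objects are an error when a presigned URL is requested.
        raise FileNotFoundError(f"{bucket}/{object_name}")
    return f"https://example.invalid/{bucket}/{object_name}"  # placeholder URL
```
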
### 6. Environment Variables

#### Minio Configuration
- `S3_ENDPOINT`: External address (e.g., `http://$HOST_IP:7053`)
- `S3_ENDPOINT_LOCAL`: Local address (e.g., `object-storage:9000`)
- `S3_ACCESS_KEY`: S3 username
- `S3_SECRET_KEY`: S3 password
- `S3_USE_SSL`: Set to `"1"` to use SSL

#### AWS Configuration
- `S3_TARGET`: Set to `"AWS"` to use AWS (defaults to Minio otherwise)
- `S3_AWS_ENDPOINT`: AWS endpoint address
- `S3_AWS_REGION`: AWS region (e.g., `eu-west-1`)
- `S3_AWS_ACCESS_KEY`: AWS access key
- `S3_AWS_SECRET_KEY`: AWS secret key
- `STS_ENDPOINT`: Security Token Service endpoint

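
As a sketch of how the Minio variables listed above might be read and validated: `load_minio_config()` and `REQUIRED_MINIO_VARS` are hypothetical names, which variables count as required is an assumption, and so is the use of `ValueError`.

```python
import os
from typing import Dict, Union

# Assumption: these three variables are treated as required for Minio.
REQUIRED_MINIO_VARS = ("S3_ENDPOINT_LOCAL", "S3_ACCESS_KEY", "S3_SECRET_KEY")


def load_minio_config() -> Dict[str, Union[str, bool]]:
    """Collect Minio settings from the environment, failing fast if incomplete."""
    missing = [name for name in REQUIRED_MINIO_VARS if not os.getenv(name)]
    if missing:
        # Report only the variable names, never their values.
        raise ValueError("Missing required environment variables: " + ", ".join(missing))
    return {
        "endpoint": os.environ["S3_ENDPOINT_LOCAL"],
        "access_key": os.environ["S3_ACCESS_KEY"],
        "secret_key": os.environ["S3_SECRET_KEY"],
        # SSL is enabled only when S3_USE_SSL is exactly "1".
        "secure": os.getenv("S3_USE_SSL") == "1",
    }
```
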
### 7. Bucket Operations

- Bucket names typically represent `organization_id` (UUID format)
- Always check `bucket_exists()` before operations that require existing buckets
- Create buckets automatically when needed for write operations (unless explicitly documented otherwise)
- Use `ARCHIVE_BUCKET = "archive"` constant for archiving operations
- When removing buckets, handle recursive deletion of objects explicitly

### 8. Object Operations

- Object names follow pattern: `project_id + "/" + object_name` (e.g., `project_id/docbin_full`)
- Use `put_object()` for string data (JSON, text)
- Use `upload_object()` for file uploads from local filesystem
- Use `get_object()` for string data retrieval
- Use `get_object_bytes()` for binary data (PDFs, images, etc.)
- Use `download_object()` to save objects to local filesystem

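
The object naming pattern from the list above can be captured in two small helpers. Both `build_object_name()` and `split_object_name()` are hypothetical names introduced for illustration; the module itself may simply concatenate strings inline.

```python
from typing import Tuple


def build_object_name(project_id: str, object_name: str) -> str:
    """Join a project ID and an object name into a full object key."""
    return project_id + "/" + object_name


def split_object_name(full_name: str) -> Tuple[str, str]:
    """Split a full object key at the first slash into (project_id, object_name)."""
    project_id, _, object_name = full_name.partition("/")
    return project_id, object_name
```

Splitting at the first slash keeps any further slashes inside the object name, so keys with nested paths still round-trip.
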
### 9. Presigned URLs and Credentials

- `create_access_link()`: Creates GET presigned URLs (1 hour expiry)
- `create_file_upload_link()`: Creates PUT presigned URLs (12 hours expiry)
- `create_data_upload_link()`: Creates POST presigned URLs (12 hours expiry)
- `get_upload_credentials_and_id()`: Returns STS credentials for direct client uploads
- `get_download_credentials()`: Returns STS credentials for direct client downloads
- Always verify object/bucket existence before creating presigned URLs

### 10. File Path Handling

- Use relative paths for temporary files (e.g., `tmpfile.{file_type}`)
- Clean up temporary files after operations when possible
- Use `os.path.exists()` to verify file existence before upload operations
- Handle file name conflicts appropriately (use `force` parameter)

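
A minimal sketch of the temporary-file handling described in the File Path Handling list. `upload_bytes_via_tmpfile()` is a hypothetical helper, and the `upload` callable stands in for whatever backend function actually sends the file.

```python
import os
from typing import Callable


def upload_bytes_via_tmpfile(data: bytes, file_type: str, upload: Callable[[str], bool]) -> bool:
    """Write data to a relative temp file, hand it to `upload`, then clean up."""
    tmp_path = f"tmpfile.{file_type}"  # relative path, following the naming above
    try:
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        # Verify the file is really there before attempting the upload.
        if not os.path.exists(tmp_path):
            return False
        return upload(tmp_path)
    finally:
        # Remove the temp file whether the upload succeeded, failed, or raised.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Putting the cleanup in `finally` keeps stray `tmpfile.*` files from accumulating when the upload raises.
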
### 11. Migration and Transfer Operations

- `transfer_bucket_from_minio_to_aws()`: Downloads from Minio and uploads to AWS
- Always handle cleanup of temporary files during transfer
- Support `remove_from_minio` flag for one-way migrations
- Support `force_overwrite` flag for overwriting existing objects

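
The interaction of the two flags can be sketched with plain dictionaries standing in for the Minio and AWS buckets. `transfer_objects()` is a hypothetical simplification: the real `transfer_bucket_from_minio_to_aws()` works through downloads, uploads, and temporary files, and its exact semantics for skipped objects are not stated here.

```python
from typing import Dict


def transfer_objects(
    source: Dict[str, bytes],
    target: Dict[str, bytes],
    remove_from_source: bool = False,
    force_overwrite: bool = False,
) -> int:
    """Copy objects from source to target and return how many were transferred."""
    transferred = 0
    # Iterate over a snapshot of the keys so deleting from source is safe.
    for name in list(source):
        if name in target and not force_overwrite:
            # Assumption: existing objects are skipped and left in the source.
            continue
        target[name] = source[name]
        transferred += 1
        if remove_from_source:
            # One-way migration: drop the object only after it was copied.
            del source[name]
    return transferred
```
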
### 12. Code Organization

```
submodules/s3/
├── controller.py          # Main API - routes to connections
├── enums.py               # ConnectionTarget enum
├── connections/
│   ├── aws.py             # AWS S3 implementation
│   └── minio.py           # Minio implementation
└── .cursor/
    └── rules/
        └── guidelines.mdc # This file
```

### 13. Import Patterns

- Controller imports: `from .connections import minio` and `from .connections import aws`
- Use relative imports within the submodule (e.g., `from .enums import ConnectionTarget`)
- External code should import from `controller.py` only, never from `connections/` directly

### 14. Testing Considerations

- Functions should be testable by mocking connection modules
- Support both Minio and AWS backends in tests
- Test connection target switching behavior
- Test error handling for missing environment variables

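
One way to test target switching with mocked connection modules is sketched below. It is self-contained for illustration: the `minio` and `aws` names are `MagicMock` objects rather than the real modules, and `get_current_target()` and `bucket_exists()` are simplified stand-ins for the controller functions. In the real package the mocks would be patched into `controller.py` instead.

```python
import os
from enum import Enum
from unittest import mock


class ConnectionTarget(Enum):
    # Hypothetical values; the real enum is defined in enums.py.
    AWS = "AWS"
    MINIO = "MINIO"


# Mocks standing in for the connections.minio and connections.aws modules.
minio = mock.MagicMock(name="minio")
aws = mock.MagicMock(name="aws")


def get_current_target() -> ConnectionTarget:
    if os.getenv("S3_TARGET") == "AWS":
        return ConnectionTarget.AWS
    return ConnectionTarget.MINIO


def bucket_exists(bucket: str) -> bool:
    if get_current_target() == ConnectionTarget.AWS:
        return aws.bucket_exists(bucket)
    return minio.bucket_exists(bucket)


def test_routes_to_aws() -> None:
    aws.reset_mock()
    minio.reset_mock()
    aws.bucket_exists.return_value = True
    # patch.dict restores the original environment when the block exits.
    with mock.patch.dict(os.environ, {"S3_TARGET": "AWS"}):
        assert bucket_exists("org") is True
    aws.bucket_exists.assert_called_once_with("org")
    minio.bucket_exists.assert_not_called()


def test_defaults_to_minio() -> None:
    aws.reset_mock()
    minio.reset_mock()
    minio.bucket_exists.return_value = False
    with mock.patch.dict(os.environ):
        os.environ.pop("S3_TARGET", None)  # simulate an unset variable
        assert bucket_exists("org") is False
    minio.bucket_exists.assert_called_once_with("org")
    aws.bucket_exists.assert_not_called()
```
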
### 15. Common Patterns

#### Adding a New Operation

1. Add function to both `connections/aws.py` and `connections/minio.py` with identical signatures
2. Add routing function to `controller.py`:

```python
# Imports as described under "Import Patterns"; get_current_target() is
# assumed to be defined in controller.py itself.
from .connections import aws, minio
from .enums import ConnectionTarget


def new_operation(bucket: str, param: str) -> bool:
    # Route to the backend selected via S3_TARGET.
    target = get_current_target()
    if target == ConnectionTarget.MINIO:
        return minio.new_operation(bucket, param)
    elif target == ConnectionTarget.AWS:
        return aws.new_operation(bucket, param)
    elif target == ConnectionTarget.UNKNOWN:
        # Handle the unknown target explicitly instead of raising.
        return False
    return False
```

### 16. Documentation

- All public functions MUST have docstrings
- Docstrings should include:
  - Brief description
  - Args section with types and descriptions
  - Returns section with return type and description
  - Raises section if exceptions are raised
- Use clear, descriptive function and variable names

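
A docstring with the sections listed above might look like the following. The function body is a hypothetical in-memory stand-in so the example runs on its own; only the docstring layout is the point, and the wording is illustrative rather than copied from the module.

```python
from typing import Set, Tuple

# Hypothetical in-memory index of (bucket, object_name) pairs, for illustration only.
_objects: Set[Tuple[str, str]] = {("archive", "project_id/docbin_full")}


def create_access_link(bucket: str, object_name: str) -> str:
    """Create a presigned GET URL for an object.

    Args:
        bucket (str): Name of the bucket that holds the object.
        object_name (str): Object key, e.g. "project_id/docbin_full".

    Returns:
        str: URL that grants temporary read access to the object.

    Raises:
        FileNotFoundError: If the object does not exist in the bucket.
    """
    if (bucket, object_name) not in _objects:
        raise FileNotFoundError(f"{bucket}/{object_name}")
    # Placeholder URL; a real backend would ask the S3 client to sign it.
    return f"https://example.invalid/{bucket}/{object_name}"
```
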
### 17. Security & Performance

**Security:**
- Never log or expose credentials
- Use environment variables for all sensitive configuration
- Presigned URLs should have appropriate expiration times
- Validate bucket and object names to prevent path traversal

**Performance:**
- Use lazy client initialization to avoid unnecessary connections
- Batch operations when possible (e.g., `get_bucket_objects()`)
- Use appropriate part sizes for multipart uploads

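
For the Security point about validating names to prevent path traversal, one possible check is sketched here. `is_safe_object_name()` is a hypothetical helper, and the specific rules (no leading slash, no backslashes, no empty, `.` or `..` segments) are assumptions rather than the module's documented policy.

```python
def is_safe_object_name(object_name: str) -> bool:
    """Return True if the object name has no traversal-style path segments."""
    if not object_name or object_name.startswith("/") or "\\" in object_name:
        return False
    # Reject empty segments ("a//b", trailing "/") and relative segments.
    return all(part not in ("", ".", "..") for part in object_name.split("/"))
```

A check like this matters most where object names end up in local file paths, for example in download helpers that write to disk.
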
## Quick Reference

### Common Operations

- **Check bucket exists**: `bucket_exists(bucket: str) -> bool`
- **Create bucket**: `create_bucket(bucket: str) -> bool`
- **Remove bucket**: `remove_bucket(bucket: str, recursive: bool = False) -> bool`
- **Put string data**: `put_object(bucket: str, object_name: str, data: str, ...) -> bool`
- **Get string data**: `get_object(bucket: str, object_name: str) -> str`
- **Upload file**: `upload_object(bucket: str, object_name: str, file_path: str, force: bool = False) -> bool`
- **Download file**: `download_object(bucket: str, object_name: str, file_type: str, ...) -> str`
- **Delete object**: `delete_object(bucket: str, object_name: str) -> bool`
- **Check object exists**: `object_exists(bucket: str, object_name: str) -> bool`
- **Create presigned URL**: `create_access_link(bucket: str, object_name: str) -> str`
- **List objects**: `get_bucket_objects(bucket: str, prefix: str = None) -> Dict[str, Any]`
- **Copy object**: `copy_object(source_bucket: str, source_object: str, target_bucket: str, target_object: str) -> bool`

### Constants

- `ARCHIVE_BUCKET = "archive"`: Default archive bucket name
- `ESSENTIAL_CREDENTIAL_KEYS = {"bucket", "Credentials", "uploadTaskId"}`: Required credential keys

For detailed implementation examples, refer to the existing code in `controller.py` and `connections/` modules.
