WIP: Add storage infra module + working document upload #37

sean-navapbc wants to merge 4 commits into main from …
- Create `infra/modules/storage/` with an Azure Blob Storage account and container, following the security hardening pattern from the terraform-backend-azure module (StorageV2, HTTPS-only, TLS 1.2, no shared keys, OAuth auth, blob versioning, delete retention)
- Add a Storage Blob Data Contributor RBAC role assignment for the app's managed identity in the service module
- Wire storage environment variables (`AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_CONTAINER_NAME`) through the service module
- Add private endpoint support for blob storage with DNS zone integration in the network layer
- Implement `storage.py` with the Azure Blob Storage SDK, using `DefaultAzureCredential` for upload, download, and `create_upload_url`
- Update the document-upload route to server-side upload (Flask receives the file, then uploads to blob storage)
- Add a document-upload endpoint test in Terratest

Closes #4
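As an illustration of the environment-variable wiring described above, here is a minimal sketch of how `storage.py` might resolve its configuration. The helper name and returned dict shape are hypothetical (not from this PR); the account URL format is the standard Azure blob endpoint.

```python
import os

def storage_config() -> dict:
    """Resolve blob storage settings from the env vars wired by the service module."""
    account = os.environ["AZURE_STORAGE_ACCOUNT_NAME"]
    container = os.environ["AZURE_STORAGE_CONTAINER_NAME"]
    return {
        # Standard Azure Blob endpoint format for a storage account
        "account_url": f"https://{account}.blob.core.windows.net",
        "container": container,
    }
```

In the real module, this config would be handed to the SDK client along with `DefaultAzureCredential`, so no keys or connection strings ever appear in the environment.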
### Testing Status

**Completed**

**Remaining**

**Manual Verification Checklist (for reviewer)**
## Manual Verification & Code Review

### Test Results (Local Testing)

I cloned the branch, installed dependencies, and ran manual tests against the Flask app and storage module. Here are the results:
### CI Status
### Code Review Findings

#### Security Issues (Should Fix Before Merge)

**1. Path Traversal Vulnerability** in:

```python
path = f"uploads/{datetime.now().date()}/{file.filename}"
```

The user-supplied filename is used directly in the blob path, so a name containing directory components (e.g. `../../etc/passwd`) could escape the `uploads/` prefix. Suggested fix: use `werkzeug.utils.secure_filename`:

```python
from werkzeug.utils import secure_filename

path = f"uploads/{datetime.now().date()}/{secure_filename(file.filename)}"
```

**2. Reflected XSS** in:

```python
return f"<p>File uploaded successfully to {path}</p>"
```

The filename is embedded directly into the HTML response without escaping, so a filename containing markup would be reflected back to the browser. Suggested fix: use Flask's `markupsafe.escape`:

```python
from markupsafe import escape

return f"<p>File uploaded successfully to {escape(path)}</p>"
```

**3. No Upload Size Limit.** Suggested fix:

```python
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16 MB limit
```

#### Non-Security Issues

**4. Pre-existing bug** in:

```python
row = cur.fetchone  # Missing parentheses — assigns method, not result
```

Then line 49 calls `cur.fetchone()` a second time (`last_migration_date = cur.fetchone()[0]`) instead of reusing `row`.

**5.** …

**6.**

```hcl
resource_id = module.app_config.has_blob_storage ? module.storage[0].storage_account_id : ""
```

Passing an empty string to …

**7.**

```hcl
skip_service_principal_aad_check = var.is_temporary
```

This is fine for test environments, but make sure this flag is well-documented. Skipping AAD checks in production would be a security concern.

#### Infrastructure Review (Looks Good)
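Taken together, security issues 1–3 above can be sketched with the standard library alone. The helper names here are hypothetical; in the app itself, `werkzeug.utils.secure_filename` and `markupsafe.escape` do this work.

```python
import re
from datetime import date
from html import escape  # stdlib analogue of markupsafe.escape

MAX_CONTENT_LENGTH = 16 * 1024 * 1024  # 16 MB cap, mirrors the Flask config fix

def sanitize_filename(name: str) -> str:
    """Drop directory components and non-portable characters (cf. secure_filename)."""
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return name.strip("._")

def upload_response(filename: str) -> str:
    """Build the blob path and success message for a sanitized upload."""
    safe = sanitize_filename(filename)
    if not safe:
        return "Invalid filename"
    path = f"uploads/{date.today()}/{safe}"
    # Escape before embedding in HTML to prevent reflected XSS
    return f"<p>File uploaded successfully to {escape(path)}</p>"
```

With this, `upload_response("../../etc/passwd")` stores under `uploads/<date>/passwd` rather than traversing out, and a filename containing markup is rendered inert by the escaping.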
### Recommendations
## Fixes Applied & Updated Test Results

Following up on my earlier review, I've applied fixes for all security issues identified. Here's a summary of the changes and the updated test results.

### Changes Made
### Full Test Results (24 tests)
Summary: 23 passed, 0 failed, 1 warning

### Diff

```diff
# app.py changes:
+from markupsafe import escape
+from werkzeug.utils import secure_filename
+app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024  # 16 MB upload limit
- row = cur.fetchone
+ row = cur.fetchone()
- last_migration_date = cur.fetchone()[0]
+ last_migration_date = row[0]
- path = f"uploads/{datetime.now().date()}/{file.filename}"
+ filename = secure_filename(file.filename)
+ if not filename:
+     return "Invalid filename", 400
+ path = f"uploads/{datetime.now().date()}/{filename}"
- f"<p>File uploaded successfully to {path}</p>"
+ f"<p>File uploaded successfully to {escape(path)}</p>"
```
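The `fetchone` fix above can be demonstrated with stdlib `sqlite3` (table name and column are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE migrations (applied_at TEXT)")
cur.execute("INSERT INTO migrations VALUES ('2024-01-01')")
cur.execute("SELECT applied_at FROM migrations")

buggy = cur.fetchone      # bug: binds the method object itself, not a row
assert callable(buggy)    # so buggy[0] would raise TypeError, not return a date

row = cur.fetchone()      # fix: call the method to actually fetch the row
last_migration_date = row[0]
print(last_migration_date)  # → 2024-01-01
```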
```diff
# storage.py changes:
-def create_upload_url(path):
-    """Return the local upload endpoint for server-side upload."""
-    return "/document-upload", {}
```

### Remaining Items (from Testing Status)
- Add `secure_filename()` to sanitize user-supplied filenames (path traversal fix)
- Add `escape()` to prevent reflected XSS in the upload success response
- Add `MAX_CONTENT_LENGTH` (16 MB) to reject oversized uploads
- Add a guard for empty filenames after sanitization
- Fix pre-existing `fetchone` bug in the migrations endpoint (missing parens)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dead code cleanup — this function is no longer called after the server-side upload refactor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Remaining Test Status Update

### CI — tfsec compliance scan: ✅ RESOLVED

The tfsec failure on commit … was environmental: the tfsec action tried to download the latest tfsec binary from the GitHub Releases API, hit a 403 rate limit, got an empty download URL, and failed. After our two fix commits (…), the scan passes.

### Terratest: ⏳ Requires Azure environment

The Go test in … cannot be run from this environment — it requires an Azure subscription with credentials configured. It should be run in a CI/CD pipeline or from a developer workstation with …

### Manual verification: ⏳ Requires deployed environment

Steps to verify manually:
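For scripting the manual upload check, here is a stdlib-only sketch of posting a file to the endpoint. The deployed URL and the form field name `file` are assumptions, not confirmed by this PR.

```python
import urllib.request
import uuid

def build_multipart_body(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body and its Content-Type header by hand."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def upload(url: str, filename: str, data: bytes) -> bytes:
    """POST a file to the document-upload route of a deployed app."""
    body, content_type = build_multipart_body("file", filename, data)
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

`upload("https://<app-host>/document-upload", "test.pdf", b"...")` would then exercise the server-side upload path end to end.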
### Integration test: ⏳ Requires platform-test-azure

Steps:
### Summary
## Summary
- `infra/modules/storage/` — Azure Blob Storage module with security hardening (StorageV2, HTTPS-only, TLS 1.2, no shared keys, OAuth auth, blob versioning, 30-day delete retention, Azure Monitor integration)
- `Storage Blob Data Contributor` RBAC role assignment on the app's managed identity
- `storage.py` with the Azure Blob Storage SDK using `DefaultAzureCredential`
- `/document-upload` route updated to server-side upload (Flask receives the file, then uploads to blob)

### Design decisions
- `storage_vars` pattern — follows the existing `db_vars` nullable object pattern for consistency. The service module owns the env var names.

### Test plan
- `terraform fmt` passes
- `terraform validate` passes on the storage module

Closes #4