feat: clica command for a full backup/restore #3042
base: main
Conversation
Walkthrough
Adds end-to-end attachment export/import utilities, new serdes API endpoints and views (ZIP and streaming batch formats), CLI streaming backup/restore commands, attachment SHA-256 support with migration/backfill, and supporting CLI env/gitignore changes.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant CLI as CLI (clica)
participant ExportAPI as ExportAttachmentsView
participant Exporter as AttachmentExporter
participant DB as Database (EvidenceRevision)
participant Storage as ObjectStorage
participant RestoreAPI as FullRestoreView
participant Importer as AttachmentImporter
CLI->>ExportAPI: GET /serdes/export-attachments/
ExportAPI->>Exporter: create_attachments_zip()
Exporter->>DB: collect_all_attachments()
DB-->>Exporter: queryset of revisions
Exporter->>Storage: read attachment files
Storage-->>Exporter: file bytes
Exporter-->>ExportAPI: ZIP BytesIO
ExportAPI-->>CLI: ZIP response
CLI->>RestoreAPI: POST /serdes/full-restore/ (DB dump + attachments stream/ZIP)
RestoreAPI->>DB: perform DB restore (existing load logic)
DB-->>RestoreAPI: DB restore result
RestoreAPI->>Importer: extract_attachments_from_zip(stream or zip, dry_run=false)
Importer->>Storage: write attachment files (evidence-revisions/...)
Storage-->>Importer: stored confirmation
Importer->>DB: assign attachment to EvidenceRevision (save -> attachment_hash computed)
DB-->>Importer: save confirmations
Importer-->>RestoreAPI: stats (processed/restored/errors)
RestoreAPI-->>CLI: consolidated restore response (207 if partial)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 9
🧹 Nitpick comments (1)
cli/README.md (1)
404-419: Examples: consider adding a note about secure storage/permissions for backup dirs.
Given backups can contain sensitive evidence, it’s worth explicitly recommending restrictive permissions (e.g., chmod 700 on the backup directory).
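For illustration, a minimal sketch of how the CLI side could enforce this when creating the backup directory; the helper name is hypothetical, and os.chmod only takes effect on POSIX filesystems (Windows would need NTFS ACLs instead):

```python
import os
from pathlib import Path


def make_private_backup_dir(dest_dir: str) -> Path:
    """Create the backup directory and restrict it to the current user."""
    dest_path = Path(dest_dir)
    dest_path.mkdir(parents=True, exist_ok=True)
    # chmod explicitly: mkdir's mode argument is filtered by the process umask
    os.chmod(dest_path, 0o700)
    return dest_path
```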
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/serdes/export_utils.py (1 hunks)
- backend/serdes/urls.py (1 hunks)
- backend/serdes/views.py (4 hunks)
- cli/README.md (2 hunks)
- cli/clica.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
backend/serdes/urls.py (1)
backend/serdes/views.py (2)
ExportAttachmentsView (245-298), FullRestoreView (376-532)
backend/serdes/views.py (3)
backend/serdes/export_utils.py (4)
AttachmentExporter (15-82), AttachmentImporter (85-185), create_attachments_zip (61-82), extract_attachments_from_zip (86-185)
backend/iam/models.py (2)
has_backup_permission (729-734), username (742-743)
backend/core/utils.py (1)
compare_schema_versions (180-266)
cli/clica.py (1)
backend/serdes/views.py (5)
get (34-68), get (246-298), post (171-242), post (304-373), post (377-532)
backend/serdes/export_utils.py (2)
backend/core/models.py (2)
Evidence (3304-3384), EvidenceRevision (3387-3450)
backend/serdes/views.py (2)
get (34-68), get (246-298)
🪛 GitHub Check: CodeQL
backend/serdes/views.py
[warning] 345-350: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
[warning] 355-359: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
[warning] 503-507: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
🪛 markdownlint-cli2 (0.18.1)
cli/README.md
288-288: Blank line inside blockquote
(MD028, no-blanks-blockquote)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build_community_frontend
- GitHub Check: test (3.12)
- GitHub Check: enterprise-startup-functional-test (3.12)
- GitHub Check: startup-functional-test (3.12)
- GitHub Check: enterprise-startup-docker-compose-test
- GitHub Check: startup-docker-compose-test
- GitHub Check: build_enterprise_frontend
- GitHub Check: build (3.12)
🔇 Additional comments (3)
backend/serdes/urls.py (1)
12-21: Routes look correct; confirm whether LoadAttachmentsView should also be exposed.
If LoadAttachmentsView is intended for standalone attachment restores (outside /full-restore/), it likely needs a path("load-attachments/", ...) entry; otherwise consider removing the view to avoid dead code.
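If the view is kept, the extra route could look like this minimal sketch (the route name and import style are assumptions, not code from the PR):

```python
from django.urls import path

from .views import LoadAttachmentsView

urlpatterns = [
    # ... existing serdes routes ...
    path(
        "load-attachments/",
        LoadAttachmentsView.as_view(),
        name="load-attachments",
    ),
]
```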
cli/clica.py (1)
611-612: Command registration looks good.
backend/serdes/views.py (1)
51-63: Excluding iam.personalaccesstoken from dump/load is the right direction for restore robustness.
This aligns with the “regenerate PAT after restore” workflow and should help with token-related import/restore failures. Also applies to: 127-136.
```python
def package_attachments_to_zip(
    self, revisions: QuerySet, zipf: zipfile.ZipFile
) -> int:
    count = 0

    for revision in revisions:
        if revision.attachment and default_storage.exists(revision.attachment.name):
            try:
                with default_storage.open(revision.attachment.name, "rb") as file:
                    file_content = file.read()

                    filename = (
                        f"{revision.evidence_id}_v{revision.version}_"
                        f"{os.path.basename(revision.attachment.name)}"
                    )

                    zip_path = os.path.join(
                        "attachments", "evidence-revisions", filename
                    )

                    zipf.writestr(zip_path, file_content)
                    count += 1
```
Exporter: avoid full in-RAM reads + use POSIX zip paths (Windows-safe).
import io
import os
+import posixpath
+import shutil
import zipfile
@@
- zip_path = os.path.join(
- "attachments", "evidence-revisions", filename
- )
-
- zipf.writestr(zip_path, file_content)
+ zip_path = posixpath.join(
+ "attachments", "evidence-revisions", filename
+ )
+ with zipf.open(zip_path, "w") as zf:
+ shutil.copyfileobj(file, zf, length=1024 * 1024)
                    count += 1

Also applies to: 61-82
🤖 Prompt for AI Agents
In backend/serdes/export_utils.py around lines 26-48 (and similarly 61-82), the
exporter currently reads entire attachment files into memory and builds zip
paths with os.path.join (which yields backslashes on Windows); change to stream
the file into the zip to avoid full in-RAM reads and use POSIX-style zip paths.
Specifically, open the storage file as a binary file-like object and create the
entry in the ZipFile via zipfile.ZipFile.open(zip_path, "w") and copy the file
contents into that entry in chunks (e.g., shutil.copyfileobj) instead of
file.read(); build zip_path using posixpath.join("attachments",
"evidence-revisions", filename) so the stored path uses forward slashes and
apply the same changes to the other block at lines 61-82.
```python
for file_path in attachment_files:
    try:
        filename = os.path.basename(file_path)
        parts = filename.split("_", 2)

        if len(parts) < 3:
            stats["errors"].append(
                f"Invalid filename format: {filename}"
            )
            continue

        evidence_id = parts[0]
        version_str = parts[1]
        original_filename = parts[2]

        if not version_str.startswith("v"):
            stats["errors"].append(
                f"Invalid version format in: {filename}"
            )
            continue

        version = int(version_str[1:])

        if not dry_run:
            # Find the corresponding EvidenceRevision
            try:
                revision = EvidenceRevision.objects.get(
                    evidence_id=evidence_id, version=version
                )

                file_content = zipf.read(file_path)

                storage_path = (
                    f"evidence-revisions/{evidence_id}/"
                    f"v{version}/{original_filename}"
                )

                saved_path = default_storage.save(
                    storage_path, io.BytesIO(file_content)
                )

                revision.attachment = saved_path
                revision.save(update_fields=["attachment"])

                stats["restored"] += 1

            except EvidenceRevision.DoesNotExist:
                stats["errors"].append(
                    f"EvidenceRevision not found: "
                    f"evidence_id={evidence_id}, version={version}"
                )
            except Exception as e:
                stats["errors"].append(
                    f"Failed to restore {filename}: {str(e)}"
                )
        else:
            stats["restored"] += 1

    except Exception as e:
        stats["errors"].append(
            f"Error processing {file_path}: {str(e)}"
        )
        continue
```
Importer: make restores idempotent (avoid default_storage.save auto-renames + orphaned files).
If storage_path already exists, many storage backends will rename on save(), leaving old files behind and changing paths each restore run.
if not dry_run:
@@
- saved_path = default_storage.save(
- storage_path, io.BytesIO(file_content)
- )
+ # Prefer overwrite semantics for idempotent restores
+ try:
+ if revision.attachment:
+ default_storage.delete(revision.attachment.name)
+ except Exception:
+ logger.warning(
+ "Failed to delete previous attachment",
+ revision_id=revision.id,
+ attachment_name=getattr(revision.attachment, "name", None),
+ )
+ try:
+ if default_storage.exists(storage_path):
+ default_storage.delete(storage_path)
+ except Exception:
+ logger.warning(
+ "Failed to delete target path before restore",
+ storage_path=storage_path,
+ )
+
+ saved_path = default_storage.save(
+ storage_path, io.BytesIO(file_content)
+                        )

Committable suggestion skipped: line range outside the PR's diff.
```python
class ExportAttachmentsView(APIView):
    def get(self, request, *args, **kwargs):
        if not request.user.has_backup_permission:
            logger.warning(
                "Unauthorized attachment export attempt",
                user=request.user.username,
            )
            return Response(status=status.HTTP_403_FORBIDDEN)

        logger.info(
            "Starting attachment export",
            user=request.user.username,
        )

        try:
            exporter = AttachmentExporter()
            zip_buffer, count = exporter.create_attachments_zip()

            if count == 0:
                logger.info("No attachments found to export")
                return Response(
                    {"message": "No attachments found"},
                    status=status.HTTP_404_NOT_FOUND,
                )

            # Create response with ZIP file
            timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            filename = f"ciso-assistant-attachments-{timestamp}.zip"

            response = HttpResponse(
                zip_buffer.getvalue(), content_type="application/zip"
            )
            response["Content-Disposition"] = f'attachment; filename="{filename}"'

            logger.info(
                "Attachment export completed successfully",
                attachments_count=count,
                filename=filename,
                zip_size=len(zip_buffer.getvalue()),
            )

            return response

        except Exception as e:
            logger.error(
                "Attachment export failed",
                user=request.user.username,
                error=str(e),
                exc_info=True,
            )
            return Response(
                {"error": "AttachmentExportFailed"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )
```
ExportAttachmentsView: watch out for large in-memory ZIP responses.
This returns zip_buffer.getvalue() (full ZIP in RAM). Consider streaming/FileResponse (or at least spooling to a temp file) to prevent server OOM on large evidence sets.
🤖 Prompt for AI Agents
In backend/serdes/views.py around lines 245 to 299, the current implementation
builds the entire ZIP in memory with zip_buffer.getvalue(), risking OOM for
large exports; change to stream the ZIP to the client by writing the archive to
a spooled temporary file (e.g., tempfile.SpooledTemporaryFile with a sensible
max_size) or directly to a StreamingHttpResponse/FileResponse that yields
chunks, set the Content-Disposition and Content-Type headers from the temp
file/stream, ensure the temp file is rewound before streaming, and close/delete
the temp file after the response is sent to avoid retaining large buffers in
memory.
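As a rough illustration of the spooling approach suggested above, here is a hedged sketch; the wrapper function is hypothetical, it reuses the exporter's collect_all_attachments() and package_attachments_to_zip() shown earlier, and the 50 MB spool threshold is arbitrary:

```python
import tempfile
import zipfile
from datetime import datetime

from django.http import FileResponse


def build_zip_response(exporter):
    """Spool the ZIP to disk past ~50 MB instead of holding it all in RAM."""
    spool = tempfile.SpooledTemporaryFile(max_size=50 * 1024 * 1024)
    with zipfile.ZipFile(spool, "w", zipfile.ZIP_DEFLATED) as zipf:
        count = exporter.package_attachments_to_zip(
            exporter.collect_all_attachments(), zipf
        )
    spool.seek(0)  # rewind before streaming
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    response = FileResponse(
        spool,
        content_type="application/zip",
        as_attachment=True,
        filename=f"ciso-assistant-attachments-{timestamp}.zip",
    )
    return response, count
```

FileResponse closes the spooled file once the response has been streamed, so no explicit cleanup is needed on the happy path.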
```python
class FullRestoreView(APIView):
    def post(self, request, *args, **kwargs):
        if not request.user.has_backup_permission:
            logger.error("Unauthorized full restore attempt", user=request.user)
            return Response(status=status.HTTP_403_FORBIDDEN)

        logger.info(
            "Starting full restore (database + attachments)",
            user=request.user.username,
        )

        # Get uploaded files from multipart form data
        backup_file = request.FILES.get("backup")
        attachments_file = request.FILES.get("attachments")

        if not backup_file:
            return Response(
                {"error": "NoBackupFileProvided"}, status=status.HTTP_400_BAD_REQUEST
            )

        # Step 1: Restore database backup
        logger.info("Step 1/2: Restoring database backup")

        try:
            data = backup_file.read()
            is_gzip = data.startswith(GZIP_MAGIC_NUMBER)
            full_decompressed_data = gzip.decompress(data) if is_gzip else data

            full_decompressed_data = json.loads(full_decompressed_data)
            metadata, decompressed_data = full_decompressed_data
            metadata = metadata["meta"]

            current_version = VERSION.split("-")[0]
            backup_version = None
            schema_version = 0

            for metadata_part in metadata:
                backup_version = metadata_part.get("media_version")
                schema_version = metadata_part.get("schema_version")
                if backup_version is not None or schema_version is not None:
                    break

            try:
                schema_version_int = int(schema_version)
                compare_schema_versions(schema_version_int, backup_version)
                if backup_version != VERSION:
                    raise ValueError(
                        "The version of the current instance and the one that generated the backup are not the same."
                    )
            except (ValueError, TypeError) as e:
                logger.error(
                    "Invalid schema version format",
                    schema_version=schema_version,
                    exc_info=e,
                )
                return Response(
                    {"error": "InvalidSchemaVersion"},
                    status=status.HTTP_400_BAD_REQUEST,
                )

            is_enterprise = apps.is_installed("enterprise_core")
            if not is_enterprise:
                for obj in decompressed_data:
                    if obj["model"] != "iam.role":
                        continue
                    permissions = obj["fields"]["permissions"]
                    enterprise_perms_indices = [
                        i
                        for i, perm in enumerate(permissions)
                        if perm[1] == "enterprise_core"
                    ]
                    for perm_index in reversed(enterprise_perms_indices):
                        permissions.pop(perm_index)

                decompressed_data = [
                    obj
                    for obj in decompressed_data
                    if obj["model"].split(".", 1)[0] != "enterprise_core"
                ]

            decompressed_data = json.dumps(decompressed_data)

            # Reuse existing load_backup logic
            load_backup_view = LoadBackupView()
            db_response = load_backup_view.load_backup(
                request, decompressed_data, backup_version, current_version
            )

            if db_response.status_code != 200:
                return db_response

            logger.info("Database backup restored successfully")

        except Exception as e:
            logger.error("Database restore failed", exc_info=e)
            return Response(
                {"error": "DatabaseRestoreFailed"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )
```
FullRestoreView: catch ValidationError from compare_schema_versions and avoid returning exception details.
As written, a schema/version incompatibility can bubble into the outer except Exception and become a 500 (DatabaseRestoreFailed) instead of a 400. Also, returning details: str(e) is a client info-leak.
+from django.core.exceptions import ValidationError
@@
- except (ValueError, TypeError) as e:
+ except (ValidationError, ValueError, TypeError) as e:
logger.error(
"Invalid schema version format",
schema_version=schema_version,
exc_info=e,
)
+ if isinstance(e, ValidationError):
+ # compare_schema_versions raises ValidationError({"error": "<code>"})
+ code = None
+ try:
+ code = e.message_dict.get("error")
+ if isinstance(code, list):
+ code = code[0]
+ except Exception:
+ code = None
return Response(
- {"error": "InvalidSchemaVersion"},
+ {"error": code or "InvalidSchemaVersion"},
status=status.HTTP_400_BAD_REQUEST,
)
@@
- return Response(
- {
- "error": "AttachmentRestoreFailed",
- "message": "Database restored but attachments failed",
- "details": str(e),
- },
- status=status.HTTP_500_INTERNAL_SERVER_ERROR,
- )
+ return Response(
+ {
+ "error": "AttachmentRestoreFailed",
+ "message": "Database restored but attachments failed",
+ },
+ status=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            )

Also applies to: 500-509
🤖 Prompt for AI Agents
backend/serdes/views.py around lines 376-474 (and similarly handle 500-509): the
compare_schema_versions call can raise a ValidationError that is currently
falling through to the broad except and resulting in a 500 and/or leaking
exception details; update the code to catch ValidationError (and similar
schema/version errors) separately after compare_schema_versions, log a concise
message without including sensitive exception details, and return a 400 response
like {"error":"InvalidSchemaVersion"}; ensure the outer broad except still
returns a 500 but does not include exception strings in the response body (only
log them if needed).
```python
@click.command()
@click.option(
    "--dest-dir",
    required=True,
    help="Destination directory to save backup files",
    type=click.Path(file_okay=False, dir_okay=True),
)
def backup_full(dest_dir):
    """Create a full backup including database and attachments"""
    if not TOKEN:
        print(
            "No authentication token available. Please set PAT token in .clica.env.",
            file=sys.stderr,
        )
        sys.exit(1)

    headers = {"Authorization": f"Token {TOKEN}"}
    dest_path = Path(dest_dir)
    dest_path.mkdir(parents=True, exist_ok=True)

    rprint("[bold blue]Exporting database backup...[/bold blue]")
    url = f"{API_URL}/serdes/dump-db/"
    res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE)

    if res.status_code != 200:
        rprint(
            f"[bold red]Error exporting database: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        rprint(res.text, file=sys.stderr)
        sys.exit(1)

    backup_file = dest_path / "backup.json.gz"
    with open(backup_file, "wb") as f:
        f.write(res.content)
    rprint(f"[green]✓ Database backup saved to {backup_file}[/green]")

    rprint("[bold blue]Exporting attachments...[/bold blue]")
    url = f"{API_URL}/serdes/export-attachments/"
    res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE)

    if res.status_code == 404:
        rprint("[yellow]No attachments found, skipping attachments backup[/yellow]")
    elif res.status_code != 200:
        rprint(
            f"[bold red]Error exporting attachments: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        rprint(res.text, file=sys.stderr)
        sys.exit(1)
    else:
        attachments_file = dest_path / "attachments.zip"
        with open(attachments_file, "wb") as f:
            f.write(res.content)
        rprint(f"[green]✓ Attachments backup saved to {attachments_file}[/green]")

    rprint("[bold green]Full backup completed successfully![/bold green]")
```
backup_full: add timeouts + stream downloads to avoid OOM on large backups.
Current res.content buffers the full DB/ZIP in memory.
def backup_full(dest_dir):
@@
- res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE)
+ res = requests.get(
+ url,
+ headers=headers,
+ verify=VERIFY_CERTIFICATE,
+ stream=True,
+ timeout=(5, 300),
+ )
@@
- with open(backup_file, "wb") as f:
- f.write(res.content)
+ with open(backup_file, "wb") as f:
+ for chunk in res.iter_content(chunk_size=1024 * 1024):
+ if chunk:
+ f.write(chunk)
@@
- res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE)
+ res = requests.get(
+ url,
+ headers=headers,
+ verify=VERIFY_CERTIFICATE,
+ stream=True,
+ timeout=(5, 300),
+ )
@@
- with open(attachments_file, "wb") as f:
- f.write(res.content)
+ with open(attachments_file, "wb") as f:
+ for chunk in res.iter_content(chunk_size=1024 * 1024):
+ if chunk:
+                f.write(chunk)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@click.command()
@click.option(
    "--dest-dir",
    required=True,
    help="Destination directory to save backup files",
    type=click.Path(file_okay=False, dir_okay=True),
)
def backup_full(dest_dir):
    """Create a full backup including database and attachments"""
    if not TOKEN:
        print(
            "No authentication token available. Please set PAT token in .clica.env.",
            file=sys.stderr,
        )
        sys.exit(1)
    headers = {"Authorization": f"Token {TOKEN}"}
    dest_path = Path(dest_dir)
    dest_path.mkdir(parents=True, exist_ok=True)
    rprint("[bold blue]Exporting database backup...[/bold blue]")
    url = f"{API_URL}/serdes/dump-db/"
    res = requests.get(
        url,
        headers=headers,
        verify=VERIFY_CERTIFICATE,
        stream=True,
        timeout=(5, 300),
    )
    if res.status_code != 200:
        rprint(
            f"[bold red]Error exporting database: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        rprint(res.text, file=sys.stderr)
        sys.exit(1)
    backup_file = dest_path / "backup.json.gz"
    with open(backup_file, "wb") as f:
        for chunk in res.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)
    rprint(f"[green]✓ Database backup saved to {backup_file}[/green]")
    rprint("[bold blue]Exporting attachments...[/bold blue]")
    url = f"{API_URL}/serdes/export-attachments/"
    res = requests.get(
        url,
        headers=headers,
        verify=VERIFY_CERTIFICATE,
        stream=True,
        timeout=(5, 300),
    )
    if res.status_code == 404:
        rprint("[yellow]No attachments found, skipping attachments backup[/yellow]")
    elif res.status_code != 200:
        rprint(
            f"[bold red]Error exporting attachments: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        rprint(res.text, file=sys.stderr)
        sys.exit(1)
    else:
        attachments_file = dest_path / "attachments.zip"
        with open(attachments_file, "wb") as f:
            for chunk in res.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)
        rprint(f"[green]✓ Attachments backup saved to {attachments_file}[/green]")
    rprint("[bold green]Full backup completed successfully![/bold green]")
```
🤖 Prompt for AI Agents
In cli/clica.py around lines 450 to 507, replace the current requests.get(...);
writing of res.content with streaming + timeouts: call requests.get with a
timeout (e.g. timeout=(connect_timeout, read_timeout)) and stream=True for both
the DB and attachments endpoints, check the response status code before
streaming, and write the response to file using
res.iter_content(chunk_size=<reasonable size>) inside a with-open, so you never
access res.content; ensure you close the response (or use it as a context
manager) and keep existing headers/verify args and existing 404 and error
handling logic.
```python
@click.command()
@click.option(
    "--src-dir",
    required=True,
    help="Source directory containing backup files",
    type=click.Path(exists=True, file_okay=False, dir_okay=True),
)
def restore_full(src_dir):
    """Restore a full backup including database and attachments"""
    if not TOKEN:
        print(
            "No authentication token available. Please set PAT token in .clica.env.",
            file=sys.stderr,
        )
        sys.exit(1)

    headers = {"Authorization": f"Token {TOKEN}"}
    src_path = Path(src_dir)

    # Check for required backup file
    backup_file = src_path / "backup.json.gz"
    if not backup_file.exists():
        rprint(
            f"[bold red]Error: backup.json.gz not found in {src_dir}[/bold red]",
            file=sys.stderr,
        )
        sys.exit(1)

    attachments_file = src_path / "attachments.zip"
    has_attachments = attachments_file.exists()

    rprint("[bold blue]Starting full restore (database + attachments)...[/bold blue]")

    # Prepare multipart form data
    files = {
        "backup": ("backup.json.gz", open(backup_file, "rb"), "application/gzip"),
    }

    if has_attachments:
        files["attachments"] = (
            "attachments.zip",
            open(attachments_file, "rb"),
            "application/zip",
        )
        rprint("[dim]Including attachments in restore[/dim]")
    else:
        rprint("[yellow]No attachments.zip found, restoring database only[/yellow]")

    url = f"{API_URL}/serdes/full-restore/"
    try:
        res = requests.post(
            url, headers=headers, files=files, verify=VERIFY_CERTIFICATE
        )
    finally:
        for file_tuple in files.values():
            file_tuple[1].close()

    if res.status_code != 200:
        rprint(
            f"[bold red]Error during restore: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        try:
            error_data = res.json()
            rprint(f"[red]{error_data}[/red]", file=sys.stderr)
        except:
            rprint(res.text, file=sys.stderr)
        sys.exit(1)

    result = res.json()

    rprint(f"[green]✓ Database restored successfully[/green]")

    if has_attachments:
        if "attachments_restored" in result:
            restored = result["attachments_restored"]
            processed = result.get("attachments_processed", restored)
            rprint(
                f"[green]✓ Attachments restored: {restored} of {processed} processed[/green]"
            )

            if result.get("attachment_errors"):
                rprint("[yellow]Errors encountered:[/yellow]")
                for error in result["attachment_errors"]:
                    rprint(f" - {error}")

    if result.get("status") == "partial_success":
        rprint("[bold yellow]Full restore completed with warnings[/bold yellow]")
    else:
        rprint("[bold green]Full restore completed successfully![/bold green]")

    rprint("[dim]Note: You may need to regenerate your Personal Access Token[/dim]")
```
restore_full: add timeout; avoid bare except: when parsing error responses.
- res = requests.post(
- url, headers=headers, files=files, verify=VERIFY_CERTIFICATE
- )
+ res = requests.post(
+ url,
+ headers=headers,
+ files=files,
+ verify=VERIFY_CERTIFICATE,
+ timeout=(5, 600),
+ )
@@
- except:
+ except ValueError:
             rprint(res.text, file=sys.stderr)

🤖 Prompt for AI Agents
In cli/clica.py around lines 509 to 601, the restore_full HTTP call lacks a
timeout and the error parsing uses a bare except; change the requests.post call
to include a timeout (e.g. timeout=REQUEST_TIMEOUT or timeout=30) and keep
the existing verify/headers arguments as-is, and replace the bare except when calling res.json() with a
specific exception handler (except ValueError as e:) to handle JSON decode
errors and fall back to printing res.text; ensure files are still closed in the
finally block.
eric-intuitem
left a comment
Nice job. To finish it:
- unzip the attachments
- the default directory shall be db (create it if missing)
- the db should be restored in sqlite3 format (ciso-assistant.sqlite3)
Thanks!
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/core/models.py (1)
3381-3385: Use stored attachment_hash in the Evidence property and avoid redundant logger initialization.
Evidence.attachment_hash (lines 3381–3385) currently reads the entire attachment file in a single operation using .read(), ignoring the new EvidenceRevision.attachment_hash stored field that is now computed efficiently with chunked I/O. Update the property to:
- Return the stored hash from last_revision.attachment_hash when available
- Fall back to chunked computation only for legacy rows without a stored hash

```diff
 @property
 def attachment_hash(self):
-    if not self.last_revision or not self.last_revision.attachment:
-        return None
-    return hashlib.sha256(self.last_revision.attachment.read()).hexdigest()
+    last = self.last_revision
+    if not last:
+        return None
+
+    # Prefer stored hash on the revision if present
+    if getattr(last, "attachment_hash", None):
+        return last.attachment_hash
+
+    # Fallback for legacy rows: compute on the fly (streamed)
+    if not last.attachment:
+        return None
+
+    hash_obj = hashlib.sha256()
+    for chunk in last.attachment.chunks():
+        hash_obj.update(chunk)
+    return hash_obj.hexdigest()
```

Also, in EvidenceRevision.save() around line 3487: remove the logger re-instantiation in the except block. A module-level logger is already imported; reuse it instead of calling get_logger(__name__) inside the exception handler.
♻️ Duplicate comments (4)
cli/clica.py (2)
488-505: backup_full still buffers DB in memory and lacks timeouts on heavy HTTP calls.
- requests.get for /serdes/dump-db/ (Line 491) downloads the entire DB backup into memory via res.content, and has no timeout.
- requests.get for /serdes/attachment-metadata/ (Line 527) and requests.post for /serdes/batch-download-attachments/ (Line 580) also have no timeouts.
- For large instances this can hang indefinitely and/or OOM the CLI process.
Consider:
- Adding stream=True + a reasonable timeout to the DB export and attachments batch calls.
- Writing the DB backup to disk via iter_content instead of res.content.
- Adding timeouts to metadata pagination requests.
Example for the DB export section:
- url = f"{API_URL}/serdes/dump-db/" - res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) - - if res.status_code != 200: - rprint( - f"[bold red]Error exporting database: {res.status_code}[/bold red]", - file=sys.stderr, - ) - rprint(res.text, file=sys.stderr) - sys.exit(1) - - backup_file = dest_path / "backup.json.gz" - with open(backup_file, "wb") as f: - f.write(res.content) + url = f"{API_URL}/serdes/dump-db/" + with requests.get( + url, + headers=headers, + verify=VERIFY_CERTIFICATE, + stream=True, + timeout=(5, 300), + ) as res: + if res.status_code != 200: + rprint( + f"[bold red]Error exporting database: {res.status_code}[/bold red]", + file=sys.stderr, + ) + rprint(res.text, file=sys.stderr) + sys.exit(1) + + backup_file = dest_path / "backup.json.gz" + with open(backup_file, "wb") as f: + for chunk in res.iter_content(chunk_size=1024 * 1024): + if chunk: + f.write(chunk)And for metadata + batch download:
- res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) + res = requests.get( + url, + headers=headers, + verify=VERIFY_CERTIFICATE, + timeout=(5, 60), + ) @@ - res = requests.post( + res = requests.post( url, headers=headers, json={"revision_ids": batch_ids}, verify=VERIFY_CERTIFICATE, - stream=True, + stream=True, + timeout=(5, 600), )Also applies to: 521-545, 570-588
710-801: restore_full: high memory usage when building attachments.dat, missing timeout, and bareexcepton JSON errors.
Memory/OOM risk
restore_fullreads all attachment files into memory and concatenates them before writing to the temp file:body_parts = [] ... file_bytes = f.read() ... body_parts.append(struct.pack(">I", total_size)) body_parts.append(header_bytes) body_parts.append(file_bytes) ... temp_attachments.write(b"".join(body_parts))For large backups this is effectively
O(total_attachments_size)RAM and can easily OOM.You can avoid this by streaming per file and using
Path.stat()(oros.path.getsize) to compute the length prefix:
# Build streaming data for ALL attachmentsbody_parts = []for file_info in to_upload:# Read filewith open(file_info["path"], "rb") as f:file_bytes = f.read()# Build JSON headerheader = {"id": file_info["id"],"evidence_id": file_info["evidence_id"],"version": file_info["version"],"filename": file_info["filename"],"hash": file_info["hash"],"size": len(file_bytes),}header_bytes = json.dumps(header).encode("utf-8")# Calculate total size: header + filetotal_size = len(header_bytes) + len(file_bytes)# Add: [4-byte length][header][file]body_parts.append(struct.pack(">I", total_size))body_parts.append(header_bytes)body_parts.append(file_bytes)# Create temporary file for attachments binary datatemp_attachments = tempfile.NamedTemporaryFile(suffix=".dat", delete=False)temp_attachments.write(b"".join(body_parts))temp_attachments.close()attachments_data = temp_attachments.namerprint(f"[dim]Prepared {len(to_upload)} attachments for upload ({sum(len(p) for p in body_parts) / 1024 / 1024:.1f} MB)[/dim]")
temp_attachments = tempfile.NamedTemporaryFile(suffix=".dat", delete=False)total_bytes = 0for file_info in to_upload:file_path = file_info["path"]file_size = file_path.stat().st_sizeheader = {"id": file_info["id"],"evidence_id": file_info["evidence_id"],"version": file_info["version"],"filename": file_info["filename"],"hash": file_info["hash"],"size": file_size,}header_bytes = json.dumps(header).encode("utf-8")total_size = len(header_bytes) + file_size# [4-byte length][header][file]temp_attachments.write(struct.pack(">I", total_size))temp_attachments.write(header_bytes)with open(file_path, "rb") as f:for chunk in iter(lambda: f.read(1024 * 1024), b""):temp_attachments.write(chunk)total_bytes += total_sizetemp_attachments.flush()temp_attachments.close()attachments_data = temp_attachments.namerprint(f"[dim]Prepared {len(to_upload)} attachments for upload ({total_bytes / 1024 / 1024:.1f} MB)[/dim]")
No timeout on restore request
Therequests.postto/serdes/full-restore/(Line 821) has no timeout; a stuck backend could hang the CLI indefinitely. Add a reasonable timeout:
res = requests.post(url,headers=headers,files=files,verify=VERIFY_CERTIFICATE,)
res = requests.post(url,headers=headers,files=files,verify=VERIFY_CERTIFICATE,timeout=(5, 600),)
Bare
exceptwhen parsing error response
The error path still uses a bareexcept:try: error_data = res.json() rprint(f"[red]{error_data}[/red]", file=sys.stderr) except: rprint(res.text, file=sys.stderr)This can mask unrelated errors and makes debugging harder. Narrow it to JSON decode errors:
try:error_data = res.json()rprint(f"[red]{error_data}[/red]", file=sys.stderr)except:rprint(res.text, file=sys.stderr)
try:error_data = res.json()rprint(f"[red]{error_data}[/red]", file=sys.stderr)except ValueError:# Not JSON, fallback to raw textrprint(res.text, file=sys.stderr)These changes will make full restores more robust on large datasets and avoid CLI hangs or silent masking of non‑JSON errors.
Also applies to: 802-872
backend/serdes/views.py (2)
279-281: Consider streaming for large ZIP exports.
Returning zip_buffer.getvalue() loads the entire ZIP into memory. For instances with many/large attachments, consider using StreamingHttpResponse or a spooled temp file to avoid OOM.
This was flagged in a previous review. The current implementation works but may not scale for very large attachment sets.
439-455: Catch ValidationError from compare_schema_versions.
compare_schema_versionsraisesValidationErrorfor schema version mismatches (perbackend/core/utils.pylines 179-265), but onlyValueErrorandTypeErrorare caught here. This causes version mismatches to bubble up to the generic exception handler at line 490, returning a 500 instead of a proper 400 with the error code.+from django.core.exceptions import ValidationError ... try: schema_version_int = int(schema_version) compare_schema_versions(schema_version_int, backup_version) if backup_version != VERSION: raise ValueError( "The version of the current instance and the one that generated the backup are not the same." ) - except (ValueError, TypeError) as e: + except (ValidationError, ValueError, TypeError) as e: logger.error( "Invalid schema version format", schema_version=schema_version, exc_info=e, ) + error_code = "InvalidSchemaVersion" + if isinstance(e, ValidationError): + try: + error_code = e.message_dict.get("error", error_code) + if isinstance(error_code, list): + error_code = error_code[0] + except Exception: + pass return Response( - {"error": "InvalidSchemaVersion"}, + {"error": error_code}, status=status.HTTP_400_BAD_REQUEST, )
🧹 Nitpick comments (6)
cli/.clica.env.template (1)
1-3: VERIFY_CERTIFICATE default here weakens TLS verification compared to code default.The CLI defaults
VERIFY_CERTIFICATEtotruewhen unset, but this template sets it tofalse, which will disable TLS verification for anyone copying it as‑is. Consider either:
- defaulting to
truein the template and documenting how to turn it off for local/self‑signed setups, or- adding a strong warning comment in this template about using
falseonly for trusted local environments.cli/README.md (1)
319-322: Fix blank line between blockquotes (MD028).There's a blank line between the WARNING and TIP blockquotes which triggers markdownlint MD028. Remove the blank line at line 320.
> [!WARNING] > Restoring a backup will **replace all existing data** in your CISO Assistant instance. Make sure you have a current backup before performing a restore operation. - > [!TIP]backend/serdes/export_utils.py (3)
34-46: Use POSIX paths for ZIP entries and consider streaming writes.
os.path.joinproduces backslashes on Windows, which violates the ZIP specification requiring forward slashes. Also,file.read()loads the entire file into memory.+import posixpath +import shutil ... try: with default_storage.open(revision.attachment.name, "rb") as file: - file_content = file.read() - filename = ( f"{revision.evidence_id}_v{revision.version}_" f"{os.path.basename(revision.attachment.name)}" ) - zip_path = os.path.join( + zip_path = posixpath.join( "attachments", "evidence-revisions", filename ) - zipf.writestr(zip_path, file_content) + with zipf.open(zip_path, "w") as zf: + shutil.copyfileobj(file, zf, length=1024 * 1024) count += 1
145-150: Make restore idempotent by deleting existing files before save.
default_storage.save()may auto-rename files if the path exists, causing orphaned files and path drift on repeated restores.storage_path = ( f"evidence-revisions/{evidence_id}/" f"v{version}/{original_filename}" ) + # Delete existing file for idempotent restore + if revision.attachment: + try: + default_storage.delete(revision.attachment.name) + except Exception: + pass + if default_storage.exists(storage_path): + try: + default_storage.delete(storage_path) + except Exception: + pass + saved_path = default_storage.save( storage_path, io.BytesIO(file_content) )
159-162: Avoid exposing raw exception details in API responses.
str(e)may contain sensitive internal information. Log the full exception server-side and return a sanitized message.except Exception as e: + logger.error( + "Failed to restore attachment", + filename=filename, + evidence_id=evidence_id, + version=version, + error=str(e), + exc_info=True, + ) stats["errors"].append( - f"Failed to restore {filename}: {str(e)}" + f"Failed to restore {filename}" )backend/serdes/views.py (1)
870-884: File is fully loaded into memory before streaming.Despite the comment "Read file in chunks", the entire file is buffered in
file_bufferbefore yielding. For large files, this defeats the streaming benefit. However, fixing this requires knowing total_size upfront for the length prefix, which complicates true streaming.One approach: compute size via
default_storage.size()first, then stream chunks directly:file_size = default_storage.size(revision.attachment.name) total_size = len(header_bytes) + file_size yield struct.pack(">I", total_size) yield header_bytes with default_storage.open(revision.attachment.name, "rb") as f: for chunk in iter(lambda: f.read(1024 * 1024), b""): yield chunk
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- backend/core/migrations/0119_add_attachment_hash_to_evidencerevision.py (1 hunks)
- backend/core/models.py (3 hunks)
- backend/serdes/export_utils.py (1 hunks)
- backend/serdes/urls.py (1 hunks)
- backend/serdes/views.py (4 hunks)
- cli/.clica.env.template (1 hunks)
- cli/.gitignore (1 hunks)
- cli/README.md (2 hunks)
- cli/clica.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- cli/.gitignore
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-19T08:46:34.495Z
Learnt from: Mohamed-Hacene
Repo: intuitem/ciso-assistant-community PR: 2550
File: backend/core/serializers.py:1277-1292
Timestamp: 2025-09-19T08:46:34.495Z
Learning: In EvidenceWriteSerializer, the link and attachment fields are read-only, so updates to these fields should be done through EvidenceRevisionWriteSerializer to maintain proper versioning.
Applied to files:
backend/core/models.py
📚 Learning: 2025-09-19T09:08:52.306Z
Learnt from: Mohamed-Hacene
Repo: intuitem/ciso-assistant-community PR: 2550
File: backend/core/migrations/0098_evidence_lifecycle_data_migration.py:21-28
Timestamp: 2025-09-19T09:08:52.306Z
Learning: EvidenceRevision.save() method automatically sets is_published from the parent Evidence object and handles folder assignment from the evidence's folder with proper safety checks.
Applied to files:
backend/core/models.py
🧬 Code graph analysis (4)
backend/core/models.py (1)
backend/core/utils.py (1)
sha256(37-41)
backend/serdes/views.py (4)
backend/serdes/export_utils.py (4)
AttachmentExporter(15-82)AttachmentImporter(85-196)create_attachments_zip(61-82)extract_attachments_from_zip(86-196)backend/core/views.py (5)
get(1005-1012)get(6597-6598)attachment(7156-7186)attachment(7203-7228)filter(190-216)backend/core/models.py (21)
EvidenceRevision(3388-3508)filename(3356-3361)filename(3489-3490)attachment_hash(3382-3385)delete(971-984)delete(5189-5196)save(1654-1662)save(1778-1781)save(2779-2781)save(3340-3342)save(3431-3487)save(3662-3666)save(3876-3900)save(4261-4263)save(4398-4401)save(4469-4471)save(5198-5228)save(5351-5357)save(6295-6315)save(6850-6863)save(7023-7039)backend/core/utils.py (2)
compare_schema_versions(180-266)sha256(37-41)
backend/core/migrations/0119_add_attachment_hash_to_evidencerevision.py (2)
backend/core/models.py (18)
EvidenceRevision(3388-3508)update(874-886)attachment_hash(3382-3385)save(1654-1662)save(1778-1781)save(2779-2781)save(3340-3342)save(3431-3487)save(3662-3666)save(3876-3900)save(4261-4263)save(4398-4401)save(4469-4471)save(5198-5228)save(5351-5357)save(6295-6315)save(6850-6863)save(7023-7039)backend/core/utils.py (1)
sha256(37-41)
backend/serdes/urls.py (1)
backend/serdes/views.py (5)
ExportAttachmentsView(250-303)FullRestoreView(387-703)AttachmentMetadataView(706-793)BatchDownloadAttachmentsView(796-917)BatchUploadAttachmentsView(920-1131)
🪛 markdownlint-cli2 (0.18.1)
cli/README.md
321-321: Blank line inside blockquote
(MD028, no-blanks-blockquote)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build_community_frontend
- GitHub Check: enterprise-startup-docker-compose-test
- GitHub Check: test (3.12)
- GitHub Check: startup-functional-test (3.12)
- GitHub Check: startup-docker-compose-test
- GitHub Check: enterprise-startup-functional-test (3.12)
- GitHub Check: build_enterprise_frontend
- GitHub Check: build (3.12)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
backend/core/migrations/0119_add_attachment_hash_to_evidencerevision.py (1)
8-43: Backfill migration looks well-designed.The chunked reading (1MB), iterator with chunk_size=100, and per-record error handling ensure the migration runs safely on large datasets without OOM. Progress logging every 100 records is helpful for monitoring.
One minor consideration: if a storage backend is slow or unavailable during migration, this could take a long time or fail silently for many records. Consider adding a summary warning if
errors > 0at the end.backend/serdes/urls.py (1)
backend/serdes/urls.py (1)
12-37: URL patterns are well-structured.
The new endpoints follow REST conventions and are logically grouped. The comment on line 22 helps clarify the streaming batch endpoints section.
cli/README.md (1)
225-268: Documentation for backup_full is comprehensive and clear.
The explanation of the streaming approach, resume capability, and hash verification is well-documented. The technical details section provides useful context for understanding the protocol format.
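To make the length-prefixed batch format concrete, here is a hedged consumer-side sketch of the [4-byte length][JSON header][file bytes] framing used by the streaming commands. This is not the backend's actual parser, and it assumes each record fits in memory:

```python
import json
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_attachment_records(stream: BinaryIO) -> Iterator[Tuple[dict, bytes]]:
    """Yield (header, file_bytes) pairs from a [4-byte length][JSON header][file bytes] stream."""
    decoder = json.JSONDecoder()
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # clean end of stream
        (total_size,) = struct.unpack(">I", prefix)
        record = stream.read(total_size)
        # latin-1 maps every byte to one character, so raw_decode's end index
        # is also the byte offset where the JSON header stops
        _, header_end = decoder.raw_decode(record.decode("latin-1"))
        header = json.loads(record[:header_end].decode("utf-8"))
        yield header, record[header_end:]
```

A consumer could then compare a SHA-256 of the yielded bytes against header["hash"] before writing the file to storage.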
backend/serdes/views.py (2)
60-64: Good fix: Excluding PAT from backup/restore.
Adding "iam.personalaccesstoken" to the exclude list addresses issue #2700 where restoring a backup with personal authentication tokens caused IntegrityError: NOT NULL constraint failed.
306-372: LoadAttachmentsView implementation looks correct.
Proper permission check, file type validation, and partial success handling with 207 status for mixed results. Error details are logged server-side while the response provides a sanitized count-only summary.
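A hedged sketch of the 207 partial-success pattern this comment describes: count successes and failures, keep per-file details in the server logs, and only return aggregate numbers. The helper and the attachments_failed key are illustrative; attachments_restored matches the key the CLI reads:

```python
from rest_framework import status
from rest_framework.response import Response


def build_restore_response(restored: int, errors: list) -> Response:
    """Summarize an attachment restore without leaking per-file error details."""
    payload = {
        "attachments_restored": restored,
        "attachments_failed": len(errors),
    }
    if errors and restored:
        # Mixed outcome: some attachments restored, some failed
        return Response(payload, status=status.HTTP_207_MULTI_STATUS)
    if errors:
        return Response(payload, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
    return Response(payload, status=status.HTTP_200_OK)
```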
```python
# Fetch all attachment metadata from API (paginated)
rprint("[dim]Fetching attachment metadata from server...[/dim]")
all_metadata = []
url = f"{API_URL}/serdes/attachment-metadata/"

while url:
    res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE)

    if res.status_code != 200:
        rprint(
            f"[bold red]Error fetching metadata: {res.status_code}[/bold red]",
            file=sys.stderr,
        )
        rprint(res.text, file=sys.stderr)
        sys.exit(1)

    data = res.json()
    all_metadata.extend(data["results"])
    url = data.get("next")
    if url and not url.startswith("http"):
        # Convert relative URL to absolute
        url = f"{API_URL}/serdes/attachment-metadata/{url}"

rprint(f"[cyan]Found {len(all_metadata)} total attachments[/cyan]")
```
🧩 Analysis chain
🏁 Script executed:
# First, locate and read the relevant code section
fd -e py -p "cli/clica.py" | head -1 | xargs -I {} sh -c 'wc -l {} && echo "---" && sed -n "515,550p" {}'Repository: intuitem/ciso-assistant-community
Length of output: 1434
🏁 Script executed:
# Check for API_URL definition
rg "API_URL\s*=" cli/clica.py -B 2 -A 2Repository: intuitem/ciso-assistant-community
Length of output: 289
🏁 Script executed:
# Look for other pagination patterns in the codebase
rg "\.get\(.*next" --type py -A 3 -B 1Repository: intuitem/ciso-assistant-community
Length of output: 1388
🏁 Script executed:
# Check if there are tests for this pagination logic
fd -e py -p "test" | xargs rg "attachment-metadata|pagination" 2>/dev/nullRepository: intuitem/ciso-assistant-community
Length of output: 61
🏁 Script executed:
# Find the attachment-metadata endpoint in the backend
rg "attachment-metadata" --type py | head -20Repository: intuitem/ciso-assistant-community
Length of output: 331
🏁 Script executed:
# Look for DRF pagination configuration
rg "DEFAULT_PAGINATION_CLASS|pagination" backend/ --type py -B 2 -A 2 | head -40Repository: intuitem/ciso-assistant-community
Length of output: 1067
🏁 Script executed:
# Search for the serializer or view handling attachment-metadata
fd -e py -p "backend" | xargs rg "attachment.metadata|AttachmentMetadata" | head -20Repository: intuitem/ciso-assistant-community
Length of output: 476
🏁 Script executed:
# Check Django REST Framework settings
fd -e py -p "backend.*settings" | xargs cat 2>/dev/null | grep -A 10 -B 2 "REST_FRAMEWORK"Repository: intuitem/ciso-assistant-community
Length of output: 1957
🏁 Script executed:
# Read the custom pagination class
cat -n backend/core/pagination.pyRepository: intuitem/ciso-assistant-community
Length of output: 919
🏁 Script executed:
# Check the AttachmentMetadataView implementation
rg "class AttachmentMetadataView" backend/serdes/views.py -A 50Repository: intuitem/ciso-assistant-community
Length of output: 1890
🏁 Script executed:
# Get the complete AttachmentMetadataView to see how it returns paginated data
rg "class AttachmentMetadataView" backend/serdes/views.py -A 100 | head -80Repository: intuitem/ciso-assistant-community
Length of output: 3012
🏁 Script executed:
# Check what the paginator.get_paginated_response returns
rg "get_paginated_response" backend/serdes/views.py -B 3 -A 3Repository: intuitem/ciso-assistant-community
Length of output: 59
URL reconstruction logic doesn't safely handle absolute-path pagination links.
The code assumes next is either an absolute URL or query-only (?page=2), but if the backend ever returns an absolute path like /serdes/attachment-metadata/?page=2, the concatenation creates a malformed URL:
url = f"{API_URL}/serdes/attachment-metadata/{url}" # → {API_URL}/serdes/attachment-metadata//serdes/attachment-metadata/?page=2Use urllib.parse.urljoin or check if the URL starts with / to handle both cases correctly.
🤖 Prompt for AI Agents
In cli/clica.py around lines 521 to 545, the pagination URL reconstruction
blindly prepends API_URL plus the path, which breaks when the server returns an
absolute-path next value (e.g. "/serdes/attachment-metadata/?page=2"); update
the logic to build the next URL safely by using urllib.parse.urljoin(API_URL,
next) or by checking for a leading slash and concatenating appropriately (i.e.,
if next startswith "http" use it as-is, elif next.startswith("/") use
API_URL.rstrip("/") + next, else join the base path and next), so both
relative-query and absolute-path pagination links produce valid absolute URLs.
Actionable comments posted: 1
♻️ Duplicate comments (4)
backend/serdes/export_utils.py (2)
26-59: [Duplicate] Exporter: still reading full files into memory and using OS-specific path separators.The past review comment on lines 26-48 identified that this code reads entire attachment files into memory (line 35) and uses
os.path.join(lines 42-44) which produces backslashes on Windows. These issues remain unaddressed and can cause out-of-memory errors for large attachments and cross-platform compatibility issues.Apply the previously suggested fix:
import io import os +import posixpath +import shutil import zipfiletry: with default_storage.open(revision.attachment.name, "rb") as file: - file_content = file.read() - filename = ( f"{revision.evidence_id}_v{revision.version}_" f"{os.path.basename(revision.attachment.name)}" ) - zip_path = os.path.join( + zip_path = posixpath.join( "attachments", "evidence-revisions", filename ) - zipf.writestr(zip_path, file_content) + with zipf.open(zip_path, "w") as zf: + shutil.copyfileobj(file, zf, length=1024 * 1024) count += 1
108-174: [Duplicate] Importer: restore is not idempotent and reads full files into memory.The past review comment on lines 108-174 identified two issues that remain unaddressed:
- Non-idempotent restores: At line 145,
default_storage.save()will auto-rename if the path exists, leaving orphaned files and changing paths on each restore run.- Full in-memory reads: Line 138 reads the entire file content into memory with
zipf.read().Apply the previously suggested fix for idempotency:
if not dry_run: # Find the corresponding EvidenceRevision try: revision = EvidenceRevision.objects.get( evidence_id=evidence_id, version=version ) file_content = zipf.read(file_path) storage_path = ( f"evidence-revisions/{evidence_id}/" f"v{version}/{original_filename}" ) + # Prefer overwrite semantics for idempotent restores + try: + if revision.attachment: + default_storage.delete(revision.attachment.name) + except Exception: + logger.warning( + "Failed to delete previous attachment", + revision_id=revision.id, + attachment_name=getattr(revision.attachment, "name", None), + ) + try: + if default_storage.exists(storage_path): + default_storage.delete(storage_path) + except Exception: + logger.warning( + "Failed to delete target path before restore", + storage_path=storage_path, + ) + saved_path = default_storage.save( storage_path, io.BytesIO(file_content) )For the memory issue, consider streaming from the ZIP directly to storage to avoid loading entire files into RAM.
backend/serdes/views.py (2)
250-303: [Duplicate] ExportAttachmentsView: large in-memory ZIP can cause OOM.
The past review comment identified that returning zip_buffer.getvalue() (line 280) loads the entire ZIP into memory, risking out-of-memory errors for large evidence sets. This issue remains unaddressed. Consider using a SpooledTemporaryFile or a streaming response:

```diff
+import tempfile
+from django.http import FileResponse

         try:
             exporter = AttachmentExporter()
-            zip_buffer, count = exporter.create_attachments_zip()
+            revisions = exporter.collect_all_attachments()

-            if count == 0:
+            if revisions.count() == 0:
                 logger.info("No attachments found to export")
                 return Response(
                     {"message": "No attachments found"},
                     status=status.HTTP_404_NOT_FOUND,
                 )

             # Create response with ZIP file
             timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
             filename = f"ciso-assistant-attachments-{timestamp}.zip"

-            response = HttpResponse(
-                zip_buffer.getvalue(), content_type="application/zip"
-            )
+            # Use temp file to avoid OOM
+            temp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024)
+            with zipfile.ZipFile(temp_file, "w", zipfile.ZIP_DEFLATED) as zipf:
+                count = exporter.package_attachments_to_zip(revisions, zipf)
+            temp_file.seek(0)
+
+            response = FileResponse(
+                temp_file, content_type="application/zip", as_attachment=True
+            )
             response["Content-Disposition"] = f'attachment; filename="{filename}"'
```
439-455: [Duplicate] FullRestoreView: catch ValidationError from compare_schema_versions.
The past review comment noted that compare_schema_versions (line 441) can raise ValidationError on schema incompatibility, but the current code only catches ValueError and TypeError (line 446). This means validation errors will bubble up to the outer exception handler and return an HTTP 500 instead of the more appropriate HTTP 400. Apply the suggested fix:

```diff
+from django.core.exceptions import ValidationError

         try:
             schema_version_int = int(schema_version)
             compare_schema_versions(schema_version_int, backup_version)
             if backup_version != VERSION:
                 raise ValueError(
                     "The version of the current instance and the one that generated the backup are not the same."
                 )
-        except (ValueError, TypeError) as e:
+        except (ValidationError, ValueError, TypeError) as e:
             logger.error(
                 "Invalid schema version format",
                 schema_version=schema_version,
                 exc_info=e,
             )
             return Response(
                 {"error": "InvalidSchemaVersion"},
                 status=status.HTTP_400_BAD_REQUEST,
             )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/serdes/export_utils.py(1 hunks)backend/serdes/views.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/serdes/export_utils.py (2)
backend/core/models.py (19)
Evidence(3305-3385)EvidenceRevision(3388-3508)filename(3356-3361)filename(3489-3490)save(1654-1662)save(1778-1781)save(2779-2781)save(3340-3342)save(3431-3487)save(3662-3666)save(3876-3900)save(4261-4263)save(4398-4401)save(4469-4471)save(5198-5228)save(5351-5357)save(6295-6315)save(6850-6863)save(7023-7039)backend/serdes/views.py (3)
get(39-73)get(251-303)get(712-793)
backend/serdes/views.py (5)
backend/serdes/export_utils.py (4)
AttachmentExporter(15-82)AttachmentImporter(85-194)create_attachments_zip(61-82)extract_attachments_from_zip(86-194)backend/core/models.py (19)
EvidenceRevision(3388-3508)filename(3356-3361)filename(3489-3490)attachment_hash(3382-3385)save(1654-1662)save(1778-1781)save(2779-2781)save(3340-3342)save(3431-3487)save(3662-3666)save(3876-3900)save(4261-4263)save(4398-4401)save(4469-4471)save(5198-5228)save(5351-5357)save(6295-6315)save(6850-6863)save(7023-7039)backend/core/utils.py (2)
compare_schema_versions(180-266)sha256(37-41)backend/serdes/serializers.py (1)
LoadBackupSerializer(20-24)backend/iam/models.py (1)
has_backup_permission(729-734)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build_community_frontend
- GitHub Check: build_enterprise_frontend
- GitHub Check: enterprise-startup-docker-compose-test
- GitHub Check: startup-docker-compose-test
- GitHub Check: build (3.12)
- GitHub Check: enterprise-startup-functional-test (3.12)
- GitHub Check: startup-functional-test (3.12)
- GitHub Check: test (3.12)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
backend/serdes/views.py (3)
60-60: Good fix for personal authentication token issue.
Excluding iam.personalaccesstoken from the backup export addresses issue #2700, where importing a database with personal authentication tokens caused integrity constraint failures.
Based on PR objectives: this change resolves the reported issue where exports containing users with personal authentication tokens would fail to import with an IntegrityError.
306-384: LoadAttachmentsView: good error handling and sanitization.
The error handling properly avoids exposing raw exception details to clients (lines 360, 382) while logging full details server-side with exc_info=True. The use of HTTP 207 for partial success is appropriate.
1098-1118: BatchUploadAttachmentsView: proper error sanitization.
The error handling properly sanitizes exception details before returning to clients (lines 1100, 1115) while logging full details with exc_info=True. This addresses the previous security concerns about information exposure.
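The sanitization pattern being endorsed here is roughly the following, shown as a standalone sketch rather than the exact view code (the helper name is hypothetical; the logger is assumed to be the structlog-style logger used elsewhere in the backend):

```python
import structlog
from rest_framework import status
from rest_framework.response import Response

logger = structlog.get_logger(__name__)


def handle_block_error(exc: Exception, block_index: int) -> Response:
    """Log full details server-side, return only a generic message to the client."""
    # Full traceback and context go to the server logs only
    logger.error("Error processing block", block_index=block_index, exc_info=True)
    # The client never sees str(exc), which could leak paths or internals
    return Response(
        {"error": "An internal error has occurred. Check server logs for details."},
        status=status.HTTP_500_INTERNAL_SERVER_ERROR,
    )
```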
```python
# Read file in chunks
file_buffer = io.BytesIO()
with default_storage.open(revision.attachment.name, "rb") as f:
    for chunk in iter(lambda: f.read(10240 * 1024), b""):
        file_buffer.write(chunk)

file_bytes = file_buffer.getvalue()

# Calculate total size: header_length + header + file
total_size = len(header_bytes) + len(file_bytes)

# Yield: [4-byte total size][header bytes][file bytes]
yield struct.pack(">I", total_size)
yield header_bytes
yield file_bytes
```
BatchDownloadAttachmentsView: avoid buffering entire file before streaming.
Lines 870-876 read the entire attachment into an io.BytesIO buffer before yielding, which defeats the purpose of streaming and can cause memory issues for large files.
Stream the file content directly:
- # Read file in chunks
- file_buffer = io.BytesIO()
+ # Calculate file size first
+ file_size = default_storage.size(revision.attachment.name)
+
+ # Calculate total size: header + file
+ total_size = len(header_bytes) + file_size
+
+ # Yield: [4-byte total size][header bytes]
+ yield struct.pack(">I", total_size)
+ yield header_bytes
+
+ # Stream file in chunks
with default_storage.open(revision.attachment.name, "rb") as f:
- for chunk in iter(lambda: f.read(10240 * 1024), b""):
- file_buffer.write(chunk)
-
- file_bytes = file_buffer.getvalue()
-
- # Calculate total size: header_length + header + file
- total_size = len(header_bytes) + len(file_bytes)
-
- # Yield: [4-byte total size][header bytes][file bytes]
- yield struct.pack(">I", total_size)
- yield header_bytes
- yield file_bytes
+ for chunk in iter(lambda: f.read(1024 * 1024), b""):
+ yield chunk
🤖 Prompt for AI Agents
In backend/serdes/views.py around lines 870 to 884, the code currently reads the
entire attachment into an io.BytesIO before yielding which uses excessive
memory; instead obtain the remote file size via
default_storage.size(revision.attachment.name), compute total_size =
len(header_bytes) + file_size, yield the 4-byte total size and header_bytes,
then open the file with default_storage.open(revision.attachment.name, "rb") and
stream its contents by iterating over f.read(CHUNK_SIZE) and yielding each chunk
(no in-memory buffering); remove the temporary BytesIO and ensure CHUNK_SIZE
matches previous value (e.g., 10*1024*1024).
Actionable comments posted: 3
♻️ Duplicate comments (6)
cli/clica.py (3)
488-504: Database backup still buffers entire response in memory and lacks timeout.
The res.content approach loads the entire database backup into memory before writing. For large instances this can cause OOM. Additionally, no timeout is specified. Apply streaming and timeout as suggested in the previous review:
- res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) + res = requests.get( + url, + headers=headers, + verify=VERIFY_CERTIFICATE, + stream=True, + timeout=(5, 300), + ) if res.status_code != 200: ... backup_file = dest_path / "backup.json.gz" with open(backup_file, "wb") as f: - f.write(res.content) + for chunk in res.iter_content(chunk_size=1024 * 1024): + if chunk: + f.write(chunk)
537-553: Missing timeout on metadata fetch; URL reconstruction still flawed.
The pagination requests lack timeouts. Additionally, the URL handling logic doesn't correctly handle absolute-path responses (e.g., /serdes/attachment-metadata/?page=2).
/serdes/attachment-metadata/?page=2).+from urllib.parse import urljoin + while url: - res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) + res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE, timeout=(5, 60)) ... url = data.get("next") - if url and not url.startswith("http"): - # Convert relative URL to absolute - url = f"{API_URL}/serdes/attachment-metadata/{url}" + if url and not url.startswith("http"): + url = urljoin(API_URL + "/", url)
845-871: Restore request lacks timeout; bare except: catches too broadly.
These issues were flagged in the previous review and remain unaddressed.
res = requests.post( url, headers=headers, files=files, verify=VERIFY_CERTIFICATE, + timeout=(5, 600), ) ... try: error_data = res.json() rprint(f"[red]{error_data}[/red]", file=sys.stderr) - except: + except ValueError: rprint(res.text, file=sys.stderr)backend/serdes/views.py (3)
279-288: Memory concern with full ZIP in RAM remains unaddressed.
The zip_buffer.getvalue() approach loads the entire ZIP into memory, which risks OOM on large attachment sets. This was flagged in a previous review.
870-884: File buffering defeats streaming purpose - previously flagged.
The entire attachment is read into the file_buffer BytesIO before yielding, which defeats the purpose of streaming and can cause memory issues for large files. This was flagged in a previous review with a suggested fix to calculate the file size first and stream directly.
439-455: ValidationError from compare_schema_versions not caught.
The compare_schema_versions function raises ValidationError for version mismatches (e.g., {"error": "backupGreaterVersionError"}), but only ValueError and TypeError are caught here. A version mismatch will fall through to the outer except Exception at line 490 and return a generic DatabaseRestoreFailed (500) instead of the appropriate InvalidSchemaVersion (400). Apply this diff to handle ValidationError:
ValidationError:from core.utils import compare_schema_versions +from django.core.exceptions import ValidationErrortry: schema_version_int = int(schema_version) compare_schema_versions(schema_version_int, backup_version) if backup_version != VERSION: raise ValueError( "The version of the current instance and the one that generated the backup are not the same." ) - except (ValueError, TypeError) as e: + except ValidationError as e: + # Handle version validation errors from compare_schema_versions + error_code = "InvalidSchemaVersion" + if hasattr(e, 'message_dict') and 'error' in e.message_dict: + error_code = e.message_dict['error'] + if isinstance(error_code, list): + error_code = error_code[0] + logger.error( + "Schema version validation failed", + schema_version=schema_version, + error_code=error_code, + ) + return Response( + {"error": error_code}, + status=status.HTTP_400_BAD_REQUEST, + ) + except (ValueError, TypeError) as e: logger.error( "Invalid schema version format", schema_version=schema_version, exc_info=e, ) return Response( {"error": "InvalidSchemaVersion"}, status=status.HTTP_400_BAD_REQUEST, )
🧹 Nitpick comments (4)
cli/clica.py (1)
632-647: Header parsing is inefficient but functional.
The byte-by-byte JSON parsing attempts up to 1024 decode operations per block. This works but is O(n²) in the worst case.
If the protocol can be modified, consider having the backend send the header length as a prefix (similar to the block length) to enable direct parsing. This would be a coordinated change with the backend.
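If that coordinated change were made, the framing could carry an explicit header length so the client can slice the header instead of probing for the JSON boundary. A sketch under that assumption (this is not the current wire format):

```python
import json
import struct


def pack_block(header: dict, file_bytes: bytes) -> bytes:
    """[4-byte total size][4-byte header size][header JSON][file bytes]"""
    header_bytes = json.dumps(header).encode("utf-8")
    total_size = 4 + len(header_bytes) + len(file_bytes)
    return (
        struct.pack(">I", total_size)
        + struct.pack(">I", len(header_bytes))
        + header_bytes
        + file_bytes
    )


def unpack_block(block: bytes) -> tuple[dict, bytes]:
    """Parse one block body (everything after the 4-byte total size prefix)."""
    header_size = struct.unpack(">I", block[:4])[0]
    header = json.loads(block[4 : 4 + header_size].decode("utf-8"))
    file_bytes = block[4 + header_size :]
    return header, file_bytes
```

A single json.loads call would then replace the byte-by-byte probing on both the CLI and backend parsers.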
backend/serdes/views.py (3)
502-667: Consider extracting shared attachment processing logic.
The attachment restoration code here (binary parsing, header validation, hash verification, storage save) is largely duplicated in BatchUploadAttachmentsView. Extracting it into a utility function would reduce maintenance burden and ensure consistent behavior.
749-778: Per-item storage I/O may impact performance on large result sets.
Each iteration calls default_storage.size() and default_storage.exists(), which may be slow depending on the storage backend (especially for S3/remote storage). For pagination over many items, consider caching file sizes in the model or limiting the default page size.
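For the page-size option, a small DRF pagination class is usually enough; a sketch, assuming the metadata view can take a custom pagination_class (class name and limits are illustrative):

```python
from rest_framework.pagination import PageNumberPagination


class AttachmentMetadataPagination(PageNumberPagination):
    # Keep pages small so per-item default_storage.size()/exists() calls stay bounded
    page_size = 100
    page_size_query_param = "page_size"
    max_page_size = 200
```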
980-988: Header parsing is functional but inefficient.
The loop attempts to parse progressively longer byte sequences as JSON, so a header of length n can take up to n parse attempts. Consider including a header-length prefix in the binary format for more efficient parsing in future versions.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/serdes/views.py(4 hunks)cli/.clica.env(0 hunks)cli/clica.py(3 hunks)
💤 Files with no reviewable changes (1)
- cli/.clica.env
🧰 Additional context used
🧬 Code graph analysis (1)
backend/serdes/views.py (4)
backend/serdes/export_utils.py (4)
AttachmentExporter(15-82)AttachmentImporter(85-194)create_attachments_zip(61-82)extract_attachments_from_zip(86-194)backend/core/views.py (5)
get(1005-1012)get(6597-6598)attachment(7156-7186)attachment(7203-7228)filter(190-216)backend/core/models.py (21)
EvidenceRevision(3388-3508)filename(3356-3361)filename(3489-3490)attachment_hash(3382-3385)delete(971-984)delete(5189-5196)save(1654-1662)save(1778-1781)save(2779-2781)save(3340-3342)save(3431-3487)save(3662-3666)save(3876-3900)save(4261-4263)save(4398-4401)save(4469-4471)save(5198-5228)save(5351-5357)save(6295-6315)save(6850-6863)save(7023-7039)backend/core/utils.py (2)
compare_schema_versions(180-266)sha256(37-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: test (3.12)
- GitHub Check: startup-docker-compose-test
- GitHub Check: enterprise-startup-functional-test (3.12)
- GitHub Check: startup-functional-test (3.12)
- GitHub Check: enterprise-startup-docker-compose-test
- GitHub Check: build (3.12)
- GitHub Check: build_enterprise_frontend
- GitHub Check: build_community_frontend
- GitHub Check: Analyze (python)
🔇 Additional comments (8)
cli/clica.py (2)
2-7: LGTM!
The new standard library imports are appropriate for the streaming protocol and integrity verification features.
906-907: LGTM!
Commands are properly registered following the existing pattern.
backend/serdes/views.py (6)
1-33: LGTM!
The imports are appropriately organized and all added modules are utilized by the new views.
60-60: Good fix for Issue #2700.
Excluding iam.personalaccesstoken from the backup prevents the IntegrityError: NOT NULL constraint failed: global_settings_globalsettings.folder_id that occurred when importing databases containing personal access tokens.
136-136: Consistent exclusion during import.
Correctly mirrors the export exclusion to ensure personal access tokens are excluded from the restore process.
354-363: Error handling looks appropriate.
Error details are logged server-side while clients receive sanitized messages. The "Check server logs for details" guidance is appropriate for operators.
1097-1117: Error handling improvements look good.
Exception details are now sanitized - using generic messages like "Error processing block" and "An internal error has occurred" instead of exposing str(e). Full details are appropriately logged server-side.
947-950: Reading entire request body into memory may cause issues with very large uploads, but Django's DATA_UPLOAD_MAX_MEMORY_SIZE setting (2.5 MB default) already protects against this. When accessing request.body, Django enforces a size limit calculated against the total request size excluding file upload data. For file uploads, Django's MemoryFileUploadHandler and TemporaryFileUploadHandler provide default behavior of reading small files into memory and large ones onto disk. However, if this code processes raw (non-multipart) request bodies or batches that bypass standard file upload handlers, verify that adequate size limits are configured for your use case. Consider whether FILE_UPLOAD_MAX_MEMORY_SIZE or DATA_UPLOAD_MAX_MEMORY_SIZE need adjustment based on typical batch sizes.
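If larger raw bodies are expected, the relevant knobs are plain Django settings; a sketch of tuning them in settings.py (the values are illustrative, not recommendations):

```python
# settings.py — upload/body size limits relevant to the batch endpoints

# Maximum size (bytes) of a request body Django will read into memory
# before raising RequestDataTooBig (default is 2.5 MB).
DATA_UPLOAD_MAX_MEMORY_SIZE = 50 * 1024 * 1024

# Uploaded files larger than this are spooled to disk by
# TemporaryFileUploadHandler instead of being kept in memory (default 2.5 MB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 10 * 1024 * 1024
```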
```python
# Request batch download
url = f"{API_URL}/serdes/batch-download-attachments/"
res = requests.post(
    url,
    headers=headers,
    json={"revision_ids": batch_ids},
    verify=VERIFY_CERTIFICATE,
    stream=True,
)
```
Add timeout to batch download request.
The batch download uses streaming correctly but lacks a timeout, which can cause the CLI to hang indefinitely on network issues.
res = requests.post(
url,
headers=headers,
json={"revision_ids": batch_ids},
verify=VERIFY_CERTIFICATE,
stream=True,
+ timeout=(5, 300),
)
🤖 Prompt for AI Agents
In cli/clica.py around lines 597 to 605, the requests.post call used for the
batch download is missing a timeout which can cause the CLI to hang; add a
timeout parameter (e.g., timeout=30 or a configurable constant) to the
requests.post call and update error handling to catch requests.Timeout and
requests.RequestException so the CLI fails fast and reports a clear error
instead of hanging.
Actionable comments posted: 0
♻️ Duplicate comments (5)
cli/clica.py (5)
597-605: Add timeout to batch download request.The batch download request lacks a timeout, which can cause the CLI to hang indefinitely on network issues.
Apply this diff:
# Request batch download url = f"{API_URL}/serdes/batch-download-attachments/" res = requests.post( url, headers=headers, json={"revision_ids": batch_ids}, verify=VERIFY_CERTIFICATE, stream=True, + timeout=(5, 300), )
488-504: Add timeout and stream the database backup to avoid OOM.The database export uses
res.content, which buffers the entire backup in memory before writing to disk. For large databases, this can cause OOM errors.Apply this diff to stream the response and add a timeout:
rprint("[bold blue]Step 1/2: Exporting database backup...[/bold blue]") url = f"{API_URL}/serdes/dump-db/" -res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) +res = requests.get( + url, + headers=headers, + verify=VERIFY_CERTIFICATE, + stream=True, + timeout=(5, 300), +) if res.status_code != 200: rprint( f"[bold red]Error exporting database: {res.status_code}[/bold red]", file=sys.stderr, ) rprint(res.text, file=sys.stderr) sys.exit(1) backup_file = dest_path / "backup.json.gz" with open(backup_file, "wb") as f: - f.write(res.content) + for chunk in res.iter_content(chunk_size=1024 * 1024): + if chunk: + f.write(chunk)
841-867: Add timeout to restore request and use specific exception handler.The restore POST request lacks a timeout (which can cause indefinite hangs), and the bare
except:clause at line 865 catches all exceptions including system exits.Apply this diff:
try: res = requests.post( url, headers=headers, files=files, verify=VERIFY_CERTIFICATE, + timeout=(5, 600), ) finally: # Close file handles for file_tuple in files.values(): file_tuple[1].close() # Clean up temp file if attachments_data: Path(attachments_data).unlink() if res.status_code != 200: rprint( f"[bold red]Error during restore: {res.status_code}[/bold red]", file=sys.stderr, ) try: error_data = res.json() rprint(f"[red]{error_data}[/red]", file=sys.stderr) - except: + except (ValueError, json.JSONDecodeError): rprint(res.text, file=sys.stderr)
537-554: Add timeout to metadata requests and fix URL reconstruction for absolute paths.The metadata fetch lacks a timeout and the URL reconstruction at lines 551-553 doesn't handle absolute-path pagination links (e.g.,
/serdes/attachment-metadata/?page=2), which would produce a malformed URL.Apply this diff:
while url: - res = requests.get(url, headers=headers, verify=VERIFY_CERTIFICATE) + res = requests.get( + url, + headers=headers, + verify=VERIFY_CERTIFICATE, + timeout=(5, 60), + ) if res.status_code != 200: rprint( f"[bold red]Error fetching metadata: {res.status_code}[/bold red]", file=sys.stderr, ) rprint(res.text, file=sys.stderr) sys.exit(1) data = res.json() all_metadata.extend(data["results"]) url = data.get("next") - if url and not url.startswith("http"): - # Convert relative URL to absolute - url = f"{API_URL}/serdes/attachment-metadata/{url}" + if url: + if not url.startswith("http"): + if url.startswith("/"): + # Absolute path: join with base URL + url = API_URL.rstrip("/") + url + else: + # Relative query string: append to base path + url = f"{API_URL}/serdes/attachment-metadata/{url}"
789-820: Stream attachment files to temp file to avoid loading large files into memory.The current code reads each attachment file fully into memory (line 796:
file_bytes = f.read()) before writing to the temp file. For large attachments, this can cause OOM errors.Apply this diff to stream files in chunks:
# Stream to temp file with reasonable buffering temp_attachments = tempfile.NamedTemporaryFile( suffix=".dat", delete=False, buffering=8 * 1024 * 1024 ) # 8MB buffer for file_info in to_upload: - with open(file_info["path"], "rb") as f: - file_bytes = f.read() - - # Build header + # Get file size without reading entire file + file_size = file_info["path"].stat().st_size + header = { "id": file_info["id"], "evidence_id": file_info["evidence_id"], "version": file_info["version"], "filename": file_info["filename"], "hash": file_info["hash"], - "size": len(file_bytes), + "size": file_size, } header_bytes = json.dumps(header).encode("utf-8") - total_size = len(header_bytes) + len(file_bytes) + total_size = len(header_bytes) + file_size # Write directly (OS will buffer intelligently) temp_attachments.write(struct.pack(">I", total_size)) temp_attachments.write(header_bytes) - temp_attachments.write(file_bytes) + + # Stream file content in chunks + with open(file_info["path"], "rb") as f: + for chunk in iter(lambda: f.read(10 * 1024 * 1024), b""): + temp_attachments.write(chunk)
🧹 Nitpick comments (2)
cli/clica.py (2)
461-465: Consider adding batch size validation.The CLI accepts a batch size but doesn't validate it against the backend's limit (default 200). Users won't discover an oversized batch until the request fails at runtime.
Consider adding a note in the help text or validating the batch size:
@click.option( "--batch-size", default=200, - help="Number of files to download per batch (default: 200)", + help="Number of files to download per batch (default: 200, max: 200)", type=int, )Or add runtime validation:
if batch_size > 200: rprint("[yellow]Warning: Batch size exceeds server limit (200), using 200[/yellow]") batch_size = 200
471-472: Consider adding a--quietoption for automation.For scripted/automated backups, the rich formatting and progress messages may not be desired.
Add a quiet option:
@click.option( "--resume/--no-resume", default=True, help="Resume from existing manifest (default: True)", ) -def backup_full(dest_dir, batch_size, resume): +@click.option( + "--quiet", + is_flag=True, + default=False, + help="Suppress progress output (for automation)", +) +def backup_full(dest_dir, batch_size, resume, quiet): """Create a full backup including database and attachments using streaming"""Then conditionally print progress messages based on the
quietflag.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cli/clica.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cli/clica.py (2)
backend/serdes/views.py (8)
get(39-73)get(251-303)get(712-793)post(176-247)post(309-384)post(398-703)post(803-917)post(927-1133)backend/core/utils.py (1)
sha256(37-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build (3.12)
- GitHub Check: test (3.12)
- GitHub Check: build_community_frontend
- GitHub Check: build_enterprise_frontend
- GitHub Check: startup-functional-test (3.12)
- GitHub Check: startup-docker-compose-test
- GitHub Check: enterprise-startup-functional-test (3.12)
- GitHub Check: enterprise-startup-docker-compose-test
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
cli/clica.py (3)
762-773: LGTM: Hash verification streams files efficiently.
The hash verification correctly streams files in chunks and compares against the manifest, avoiding memory issues with large attachments.
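For reference, the chunked-hashing idea boils down to something like this generic sketch (not the exact CLI code; the chunk size is arbitrary):

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest is then compared against the hash recorded in the manifest entry for that attachment.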
509-531: LGTM: Resume logic correctly validates existing files.
The manifest loading opens the file in read mode and verifies that each downloaded file actually exists on disk before including it in the resume state, preventing issues with incomplete or missing downloads.
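The resume check itself can be reduced to: load the manifest JSON and keep only entries whose files really exist on disk. A sketch assuming hypothetical filename and downloaded fields per entry:

```python
import json
from pathlib import Path


def load_resume_state(manifest_path: Path, dest_dir: Path) -> dict[str, dict]:
    """Return manifest entries whose files actually exist on disk."""
    if not manifest_path.exists():
        return {}
    with open(manifest_path, "r") as f:
        manifest = json.load(f)
    return {
        rev_id: entry
        for rev_id, entry in manifest.items()
        if entry.get("downloaded") and (dest_dir / entry["filename"]).exists()
    }
```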
615-680: LGTM: Streaming protocol correctly handles chunked transfers.
The binary protocol implementation properly handles the points below (a condensed buffering sketch follows the list):
- Buffers chunks until complete blocks are available
- Bounds header parsing to prevent excessive iteration
- Handles decode errors gracefully
- Updates the manifest incrementally
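A condensed sketch of that buffering scheme (simplified; the real CLI code also splits each payload into its JSON header and file bytes and verifies hashes, which is elided here):

```python
import struct


def iter_blocks(chunks):
    """Reassemble [4-byte size][payload] blocks from an arbitrary chunk stream."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may contain zero, one, or several complete blocks
        while len(buffer) >= 4:
            (total_size,) = struct.unpack(">I", buffer[:4])
            if len(buffer) < 4 + total_size:
                break  # wait for more data
            payload = buffer[4 : 4 + total_size]
            buffer = buffer[4 + total_size :]
            yield payload
```

Each yielded payload would then be decoded into its header and file content, written to disk, and recorded in the manifest before the next block is processed.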
Resolves #2869, resolves #2700

Creates a complete backup of your CISO Assistant instance, including both the database and all evidence attachments.
Restores a complete backup of your CISO Assistant instance, including both the database and all evidence attachments.
Summary by CodeRabbit
New Features
Data integrity
Documentation
Chores