
Feat/handle data policy#101

Merged
FloChehab merged 7 commits into main from feat/handle-data-policy
Jun 10, 2026

@FloChehab FloChehab commented Jun 4, 2026

Collaborator

Checks commits

Summary by CodeRabbit

  • New Features

    • File lifecycle state system to track file status through retention and deletion phases
    • Automatic hard deletion of files after configurable retention period plus grace period
    • Automatic deletion of original file data while preserving file metadata
    • API now exposes file lifecycle state and data retention policy configuration
  • Chores

    • Updated Helm chart with scheduled daily jobs for automated file retention management

@coderabbitai

coderabbitai Bot commented Jun 4, 2026



ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd9cb746-e8af-4851-8225-632baa9e7346

📥 Commits

Reviewing files that changed from the base of the PR and between 54af19f and 0c0dbe4.

📒 Files selected for processing (15)
  • src/backend/core/api/serializers.py
  • src/backend/core/tests/files/test_api_files_get.py
  • src/backend/core/tests/files/test_api_files_list.py
  • src/backend/dictaphone/settings.py
  • src/frontend/src/api/useConfig.ts
  • src/frontend/src/features/files/api/types.ts
  • src/frontend/src/features/recordings/components/FileActionMenu.tsx
  • src/frontend/src/features/recordings/components/Transcript.tsx
  • src/frontend/src/i18n/init.ts
  • src/frontend/src/locales/en-US/recordings.json
  • src/frontend/src/locales/en-US/shared.json
  • src/frontend/src/locales/fr-FR/recordings.json
  • src/frontend/src/locales/fr-FR/shared.json
  • src/frontend/src/pages/RecordingPage.scss
  • src/frontend/src/pages/RecordingPage.tsx
📝 Walkthrough


This PR implements a comprehensive file lifecycle management system with automatic data retention and deletion workflows. Files now track lifecycle state (ACTIVE, PENDING_ORIGINAL_DATA_DELETION, ORIGINAL_DATA_DELETED, PENDING_AUTO_HARD_DELETE), and two scheduled management commands (running daily via Helm cron jobs) automatically mark and delete files based on retention policies with configurable grace periods.
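The two-phase flow described above (mark a file once it passes retention, delete it once the grace period has also elapsed) can be sketched in plain Python. This is a minimal illustration, not the PR's code: the state names come from the walkthrough, but the enum values, the `FileRecord` stand-in, and the `sweep_hard_delete` function are hypothetical, and the sketch assumes the grace period is simply added to the retention period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class FileLifecycleState(str, Enum):
    # State names come from the walkthrough; the string values are assumed.
    ACTIVE = "active"
    PENDING_ORIGINAL_DATA_DELETION = "pending_original_data_deletion"
    ORIGINAL_DATA_DELETED = "original_data_deleted"
    PENDING_AUTO_HARD_DELETE = "pending_auto_hard_delete"


@dataclass
class FileRecord:
    # Hypothetical stand-in for the Django File model.
    created_at: datetime
    lifecycle_state: FileLifecycleState = FileLifecycleState.ACTIVE


def sweep_hard_delete(files, now, retention_days, grace_days):
    """Mark files past retention; return those also past the grace period."""
    mark_cutoff = now - timedelta(days=retention_days)
    delete_cutoff = now - timedelta(days=retention_days + grace_days)
    to_delete = []
    for f in files:
        if f.created_at <= mark_cutoff:
            # Phase 1: flag the file so users and the API can see it is expiring.
            f.lifecycle_state = FileLifecycleState.PENDING_AUTO_HARD_DELETE
        if f.created_at <= delete_cutoff:
            # Phase 2: the grace period has elapsed as well, so delete it.
            to_delete.append(f)
    return to_delete
```

With a 30-day retention and a 7-day grace period, a 35-day-old file would be marked but kept, while a 100-day-old file would be marked and returned for deletion.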

Changes

File Lifecycle Management & Data Retention

| Layer | File(s) | Summary |
|---|---|---|
| File Lifecycle Model & Retention Helpers | `src/backend/core/models.py`, `src/backend/core/migrations/0012_file_lifecycle_state_file_file_created_915352_idx.py`, `src/backend/core/factories.py` | `FileLifecycleStateChoices` enum defines four states; the `File` model adds a `lifecycle_state` CharField with an `ACTIVE` default and initializes it on first save. Helper functions `get_original_file_data_cutoff_datetime()` and `get_file_hard_delete_cutoff_datetime()` compute retention cutoffs from settings with an optional grace-period extension. The migration adds the field, indexes `-created_at` for deletion queries, and maintains the dependency chain. |
| File Deletion Workflows & Transcription Safeguards | `src/backend/core/management/commands/auto_hard_delete_files.py`, `src/backend/core/management/commands/delete_original_files_data.py`, `src/backend/core/management/commands/clean_pending_files.py`, `src/backend/core/tasks/file.py`, `src/backend/core/tests/commands/test_auto_hard_delete_files.py`, `src/backend/core/tests/commands/test_delete_original_files_data.py`, `src/backend/core/tests/test_tasks.py` | New `auto_hard_delete_files` command marks aged files `PENDING_AUTO_HARD_DELETE`, then soft/hard deletes them within the grace period and enqueues `process_file_deletion`. New `delete_original_files_data` command similarly marks files for original-data deletion and enqueues the `process_original_file_data_deletion` task, which deletes the stored file and updates the lifecycle state to `ORIGINAL_DATA_DELETED`. The `call_transcribe_service` task now rejects non-`ACTIVE` files. An unused import is removed from `clean_pending_files`. Tests verify state transitions, grace-period windows, already-deleted handling, and stdout summaries. |
| API Filtering, Serialization & Validation | `src/backend/core/api/serializers.py`, `src/backend/core/api/viewsets.py`, `src/backend/core/tests/files/test_api_files_get.py`, `src/backend/core/tests/files/test_api_files_list.py`, `src/backend/core/tests/ai_jobs/test_api_ai_jobs_retry.py` | `ListFileSerializer` serializes `lifecycle_state` and marks it read-only. `FileViewSet.get_queryset()` excludes `PENDING_AUTO_HARD_DELETE` files and returns only those created after the hard-delete cutoff. `AiJobViewSet.get_queryset()` applies the same filters to related files. The `get_url()` method additionally verifies the `ACTIVE` state and the original-data cutoff. `AiJobViewSet.retry()` validates that the file is `ACTIVE`, returning 400 Bad Request if not. Tests verify filtering behavior, URL suppression for non-active/expired files, and retry rejection. |
| Configuration, Admin UI & Deployment | `src/backend/dictaphone/settings.py`, `src/backend/core/admin.py`, `src/backend/core/api/__init__.py`, `src/helm/dictaphone/Chart.yaml`, `src/helm/dictaphone/values.yaml`, `src/backend/core/tests/test_api_frontend_configuration.py` | Four new retention settings (`ORIGINAL_FILE_DATA_DELETE_AFTER_DAYS`, `ORIGINAL_FILE_DATA_DELETE_AFTER_GRACE_PERIOD_DAYS`, `FILE_AUTO_HARD_DELETE_AFTER_DAYS`, `FILE_AUTO_HARD_DELETE_AFTER_GRACE_PERIOD_DAYS`) with environment overrides. `FileAdmin` exposes `lifecycle_state` in list display, filters, and readonly form fields. The configuration endpoint includes a `data_policy` object with the retention settings. Helm chart version bumped to 0.2.0; two new daily cronjobs schedule the management commands. A test validates the full configuration response, including the policy fields. |
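To make the cutoff helpers and the API filtering more concrete, here is a small framework-free sketch. It is an illustration under stated assumptions rather than the project's implementation: the two setting names and the `include_grace_period` parameter appear in this PR, but the example values, the explicit `now` argument (added here for testability), the function names `hard_delete_cutoff` and `is_visible`, and the idea that the grace period is added to the retention days are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed example values; in the project these come from Django settings.
FILE_AUTO_HARD_DELETE_AFTER_DAYS = 30
FILE_AUTO_HARD_DELETE_AFTER_GRACE_PERIOD_DAYS = 7


def hard_delete_cutoff(now, include_grace_period=False):
    """Files created at or before the returned datetime count as expired."""
    days = FILE_AUTO_HARD_DELETE_AFTER_DAYS
    if include_grace_period:
        days += FILE_AUTO_HARD_DELETE_AFTER_GRACE_PERIOD_DAYS
    return now - timedelta(days=days)


def is_visible(created_at, lifecycle_state, now):
    """In-memory analogue of the exclude(...).filter(...) queryset logic."""
    if lifecycle_state == "PENDING_AUTO_HARD_DELETE":
        return False
    # The API hides files as soon as retention ends, without the grace period.
    return created_at > hard_delete_cutoff(now, include_grace_period=False)
```

In a Django viewset the same predicate would normally be expressed as queryset filters so the database does the work; the pure-Python version only shows the intended semantics.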

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • suitenumerique/dictaphone#51: Both PRs modify the AI transcript retry flow; #51 adds/reworks the retry endpoint, while this PR further gates the same operation by requiring the file's lifecycle_state to be ACTIVE.
  • suitenumerique/dictaphone#47: Both PRs touch file-cleanup management commands; this PR only removes an unused import from clean_pending_files, while #47 introduces/tests the command itself.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title "Feat/handle data policy" broadly describes the main feature area (data policy handling) and aligns with the PR's primary objectives of implementing automatic file lifecycle management and deletion policies. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.31% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@FloChehab FloChehab force-pushed the feat/handle-data-policy branch from 90347aa to 089aae8 on June 4, 2026 15:51
Adds 2 commands:
1. Delete original data
2. Auto hard delete the file

They are parameterized through settings.
The file model is updated to enable this, and
lifecycle_state is added.

Note that this field might be a bit redundant
with other attributes but it makes it clear what
has been done automatically by the commands.

Tests cover all the new cases.

Viewsets & serializers have been updated accordingly.
@FloChehab FloChehab force-pushed the feat/handle-data-policy branch from 7ede068 to 54af19f on June 10, 2026 14:32
@FloChehab

Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/core/api/viewsets.py`:
- Around line 261-268: media_auth currently only checks readiness and thus
allows stale URLs; update media_auth to apply the same exclusion/filter logic
used elsewhere by excluding FileLifecycleStateChoices.PENDING_AUTO_HARD_DELETE
and filtering created_at >
get_file_hard_delete_cutoff_datetime(include_grace_period=False) before
authorizing access. Locate the media_auth function/method and add the
lifecycle_state exclusion and created_at cutoff filter to its queryset or
authorization check (mirror the logic around the existing
.exclude(...).filter(...)), and make the identical change in the other
media_auth occurrence mentioned (the similar block used elsewhere).

In `@src/backend/dictaphone/settings.py`:
- Around line 271-290: The four retention settings
(ORIGINAL_FILE_DATA_DELETE_AFTER_DAYS,
ORIGINAL_FILE_DATA_DELETE_AFTER_GRACE_PERIOD_DAYS,
FILE_AUTO_HARD_DELETE_AFTER_DAYS, FILE_AUTO_HARD_DELETE_AFTER_GRACE_PERIOD_DAYS)
currently allow negative integers which makes code that uses timezone.now() -
timedelta(days=days) (see core.models usage) behave incorrectly; add validation
when the settings are loaded to ensure each value is non-negative (or enforce a
sensible minimum), and if a value is negative either raise a clear configuration
error or coerce it to the minimum allowed value so lifecycle/deletion logic
cannot receive negative days. Locate these symbols in settings.py and implement
the check in the settings class initialization or immediately after values are
read so invalid negatives are caught early.
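
The second finding can be illustrated with a minimal, framework-free sketch. A negative day count turns `now - timedelta(days=days)` into a cutoff in the future, so every existing file would look expired. The helper below is hypothetical (the name `read_non_negative_days` and the plain environment lookup are assumptions; the project's actual settings mechanism is not shown in this PR page) and takes the "raise a clear configuration error" option rather than coercing the value.

```python
import os


def read_non_negative_days(name, default, environ=os.environ):
    """Read an integer day count from the environment, rejecting negatives."""
    value = int(environ.get(name, default))
    if value < 0:
        # Fail fast at startup instead of silently computing a future cutoff.
        raise ValueError(
            f"{name} must be a non-negative number of days, got {value}"
        )
    return value


# Example: falls back to 30 when the variable is not set in the environment.
FILE_AUTO_HARD_DELETE_AFTER_DAYS = read_non_negative_days(
    "FILE_AUTO_HARD_DELETE_AFTER_DAYS", 30
)
```

Raising at load time keeps a misconfigured deployment from starting at all, which seems safer for a deletion policy than clamping to a minimum, though either approach satisfies the review comment.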

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5418267c-154b-4e3c-a476-2b5f12adba8b

📥 Commits

Reviewing files that changed from the base of the PR and between 841945e and 54af19f.

📒 Files selected for processing (21)
  • src/backend/core/admin.py
  • src/backend/core/api/__init__.py
  • src/backend/core/api/serializers.py
  • src/backend/core/api/viewsets.py
  • src/backend/core/factories.py
  • src/backend/core/management/commands/auto_hard_delete_files.py
  • src/backend/core/management/commands/clean_pending_files.py
  • src/backend/core/management/commands/delete_original_files_data.py
  • src/backend/core/migrations/0012_file_lifecycle_state_file_file_created_915352_idx.py
  • src/backend/core/models.py
  • src/backend/core/tasks/file.py
  • src/backend/core/tests/ai_jobs/test_api_ai_jobs_retry.py
  • src/backend/core/tests/commands/test_auto_hard_delete_files.py
  • src/backend/core/tests/commands/test_delete_original_files_data.py
  • src/backend/core/tests/files/test_api_files_get.py
  • src/backend/core/tests/files/test_api_files_list.py
  • src/backend/core/tests/test_api_frontend_configuration.py
  • src/backend/core/tests/test_tasks.py
  • src/backend/dictaphone/settings.py
  • src/helm/dictaphone/Chart.yaml
  • src/helm/dictaphone/values.yaml
💤 Files with no reviewable changes (1)
  • src/backend/core/management/commands/clean_pending_files.py

Comment thread src/backend/core/api/viewsets.py
Comment thread src/backend/dictaphone/settings.py Outdated
@FloChehab FloChehab force-pushed the feat/handle-data-policy branch from 9c6f8eb to 0c0dbe4 on June 10, 2026 17:00
@FloChehab FloChehab merged commit 0c0dbe4 into main Jun 10, 2026
12 checks passed
@FloChehab FloChehab deleted the feat/handle-data-policy branch June 10, 2026 17:08