
feat: delete old environments (and refactor delete old data) #3928


Merged: 10 commits merged on Apr 25, 2025


@kaposke (Contributor) commented Apr 17, 2025

Description

This adds environments to our deleteOldData cron job. It goes deep into deleting everything under an environment. To achieve this, some existing parts of deleteOldData were refactored or modified. It should traverse the whole dependency graph under environments. Here's what the graph looks like (deleting a node also means deleting all of its child nodes):

_nango_environments
├── _nango_external_webhooks
├── _nango_slack_notifications
├── _nango_environment_variables
├── end_users
│   └── _nango_connections
│       ├── connect_sessions
│       │   └── _nango_oauth_sessions
│       └── _nango_syncs
│           ├── _nango_sync_jobs
│           └── _nango_active_logs
└── _nango_configs
    └── on_event_scripts
        ├── _nango_connections
        │   ├── connect_sessions
        │   │   └── _nango_oauth_sessions
        │   └── _nango_syncs
        │       ├── _nango_sync_jobs
        │       └── _nango_active_logs
        └── _nango_sync_configs
            ├── _nango_syncs
            │   ├── _nango_sync_jobs
            │   └── _nango_active_logs
            └── _nango_sync_endpoints
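The tree boils down to one rule: a table's rows can only be deleted after the rows of every table under it. The TypeScript sketch below is not the PR's implementation; it transcribes the edges from the tree into a hypothetical `dependents` map and derives a children-first deletion order with a post-order depth-first traversal (`deletionOrder` is likewise an illustrative name). The `visited` set matters because of the crossed dependencies: `_nango_connections` and `_nango_syncs` are reachable from several parents but should only be scheduled once.

```typescript
// Hypothetical model of the tree above: each table maps to the tables whose rows
// must be deleted before it. Edges are transcribed from the tree, not from Nango's code.
type Graph = Record<string, string[]>;

const dependents: Graph = {
    _nango_environments: [
        '_nango_external_webhooks',
        '_nango_slack_notifications',
        '_nango_environment_variables',
        'end_users',
        '_nango_configs'
    ],
    end_users: ['_nango_connections'],
    _nango_configs: ['on_event_scripts'],
    on_event_scripts: ['_nango_connections', '_nango_sync_configs'],
    _nango_connections: ['connect_sessions', '_nango_syncs'],
    connect_sessions: ['_nango_oauth_sessions'],
    _nango_sync_configs: ['_nango_syncs', '_nango_sync_endpoints'],
    _nango_syncs: ['_nango_sync_jobs', '_nango_active_logs']
};

// Post-order DFS: a table is appended only after all of its dependents have been,
// so deleting front to back never removes a parent before its children.
// Shared children are visited once thanks to the `visited` set.
function deletionOrder(graph: Graph, root: string): string[] {
    const order: string[] = [];
    const visited = new Set<string>();

    const visit = (table: string): void => {
        if (visited.has(table)) return;
        visited.add(table);
        for (const child of graph[table] ?? []) {
            visit(child);
        }
        order.push(table);
    };

    visit(root);
    return order;
}

console.log(deletionOrder(dependents, '_nango_environments').join(' -> '));
```

Walking the result front to back deletes leaves such as `_nango_oauth_sessions` and `_nango_sync_jobs` first and `_nango_environments` last. The sketch assumes the graph is acyclic and does not check for cycles.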

I hope this is all correct; looking at it is a bit mind-bending with all the crossed dependencies.

I'll add some comments throughout with more details.

Testing

I have not tested this anywhere yet. It's very unlikely that I got it all right on the first try. I'm open to suggestions on how to test it.


linear bot commented Apr 17, 2025

@kaposke requested a review from a team on April 17, 2025 23:52

@bodinsamuel (Collaborator) left a comment

Not tested yet, but it looks good overall. Just a few comments here and there 🙏🏻

Comment on lines 33 to 39
deleteFn: async () => {
    const endUsers = await db.knex.from<DBEndUser>('end_users').where({ environment_id: environment.id }).limit(opts.limit);

    for (const endUser of endUsers) {
        await deleteEndUserData(endUser, opts);
    }

Collaborator:

You could also batch delete directly, since connections will be deleted in another loop.

Contributor Author:

Where else are connections deleted? Do you mean the soft-deleted ones in the base cron?

There's a tension between having our delete functions do what you'd expect (delete all their dependencies) and having things deleted elsewhere, which means trusting other parts of the cron job to clean up properly. I was going for redundancy in most cases, but I'll remove it where performance could be significantly affected (I removed it for OAuth sessions and connect sessions, for instance).

Collaborator:

Sorry, I was not very clear. The end_users table is not an entry point to connections. They should be considered independent, because legacy customers don't use this system and it's optional for new customers.

So you can batch delete end users without selecting them first (e.g., DELETE WHERE IN) and delete connections independently.
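The batch delete suggested above can run entirely in SQL, without loading rows into the application. Below is a minimal sketch, assuming PostgreSQL (which has no `DELETE ... LIMIT`, hence the `id IN (SELECT ... LIMIT ...)` subquery) and tables with an `id` primary key. `deleteInBatches`, `batchDeleteSql`, and the injected `Execute` callback are hypothetical names, not Nango or knex APIs; `Execute` stands in for whatever database client runs the statement and reports the affected row count.

```typescript
import { setTimeout } from 'node:timers/promises';

// Hypothetical executor: runs a parameterized statement and resolves to the number
// of affected rows. Injected so the batching loop stays independent of the DB client.
type Execute = (sql: string, params: unknown[]) => Promise<number>;

// Assumes PostgreSQL: DELETE has no LIMIT, so the batch is bounded by a subquery.
// `table` and `column` are interpolated, so they must be trusted constants.
function batchDeleteSql(table: string, column: string): string {
    return `DELETE FROM ${table} WHERE id IN (SELECT id FROM ${table} WHERE ${column} = $1 LIMIT $2)`;
}

// Deletes matching rows `limit` at a time until a batch comes back short,
// optionally pausing between batches to spread the load on the database.
async function deleteInBatches(
    execute: Execute,
    table: string,
    column: string,
    value: unknown,
    limit: number,
    pauseMs = 0
): Promise<number> {
    const sql = batchDeleteSql(table, column);
    let total = 0;
    for (;;) {
        const deleted = await execute(sql, [value, limit]);
        total += deleted;
        if (deleted < limit) return total;
        if (pauseMs > 0) await setTimeout(pauseMs);
    }
}
```

For example, `deleteInBatches(execute, 'end_users', 'environment_id', environment.id, opts.limit)` would clear an environment's end users without selecting them, leaving connections to their own independent delete.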

Contributor Author:

Moved to an isolated delete under deleteEnvironment.

@@ -0,0 +1,31 @@
import { setTimeout } from 'node:timers/promises';
@TBonnin (Collaborator) commented Apr 18, 2025

I'll rename the crons/utils subfolder to crons/delete.

@kaposke requested a review from @bodinsamuel on April 22, 2025 14:26

gitguardian bot commented Apr 23, 2025

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 16627964 | Triggered | Username Password | fa7f53a | packages/shared/lib/utils/utils.unit.test.ts |
| 16627965 | Triggered | Generic Password | fa7f53a | packages/shared/lib/utils/utils.unit.test.ts |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@bodinsamuel (Collaborator) left a comment

Sorry, this back and forth is a bit annoying; maybe we should have made it testable.
It's currently not working: for providers and environments, it's missing the deletion of connections.
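One way to make the cleanup testable is to seed a small fixture, run the deletion, and assert that nothing is left pointing at a deleted parent. The sketch below assumes a hypothetical in-memory representation (`Tables`, `findOrphans`, and `withTablesEmptied` are illustrative names, not part of the PR) rather than a real database; its fixture reproduces the reported bug, where the environment and its config go away but the connection does not.

```typescript
// Hypothetical in-memory fixture: every row lists the parent rows it references.
interface ParentRef {
    table: string;
    id: number;
}
interface Row {
    id: number;
    parents?: ParentRef[];
}
type Tables = Record<string, Row[]>;

// Lists "table#id" for every row that references a parent row which no longer exists.
function findOrphans(tables: Tables): string[] {
    const orphans: string[] = [];
    for (const [table, rows] of Object.entries(tables)) {
        for (const row of rows) {
            const dangling = (row.parents ?? []).some(
                (ref) => !(tables[ref.table] ?? []).some((parent) => parent.id === ref.id)
            );
            if (dangling) orphans.push(`${table}#${row.id}`);
        }
    }
    return orphans;
}

// Simulates a cleanup pass by returning a copy with the given tables emptied.
function withTablesEmptied(tables: Tables, emptied: string[]): Tables {
    const copy: Tables = {};
    for (const [table, rows] of Object.entries(tables)) {
        copy[table] = emptied.includes(table) ? [] : rows;
    }
    return copy;
}

const seed: Tables = {
    _nango_environments: [{ id: 1 }],
    _nango_configs: [{ id: 10, parents: [{ table: '_nango_environments', id: 1 }] }],
    _nango_connections: [
        { id: 100, parents: [{ table: '_nango_environments', id: 1 }, { table: '_nango_configs', id: 10 }] }
    ]
};

// A pass that forgets connections leaves _nango_connections#100 dangling.
console.log(findOrphans(withTablesEmptied(seed, ['_nango_environments', '_nango_configs'])));
```

Against a real database, the same idea could become an integration test that runs the cron job on seeded data and then checks each child table for rows whose parent is gone.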

@bodinsamuel (Collaborator) left a comment

Working great 🚀

@kaposke merged commit 6fe25ed into master on Apr 25, 2025
16 checks passed
@kaposke deleted the gui/NAN-2685/delete-old-environments branch on April 25, 2025 19:36
3 participants