@bstrdsmkr
Collaborator

No description provided.

@minxu74
Collaborator

minxu74 commented Apr 29, 2025

@bstrdsmkr Thanks a lot for your contribution and for helping with the synchronizer deployment. I have several general questions:

  • If we find bugs, can we fix them, push the fixes to the repo, and redeploy the latest version immediately? Since the synchronizer runs every 5 minutes, a quick fix could then take effect in the next sync.
  • The CLI command takes some arguments and options, but I have not found where they are set. Can they be changed easily? We need to change the target index from backup to public for the production sync.
  • We need to monitor 5 staged indexes. Can we keep them from starting at the same time? Each job queries the NTP server, and sometimes there is a connection error when all five jobs query it at once.

@bstrdsmkr force-pushed the ci-setup branch 10 times, most recently from 5c0c041 to 584a6e6, on April 29, 2025 20:55
@bstrdsmkr
Collaborator Author

  • If we find bugs, can we fix them, push the fixes to the repo, and redeploy the latest version immediately? Since the synchronizer runs every 5 minutes, a quick fix could then take effect in the next sync.

We'll have a local deployment repo in GitLab. If you update the version number in that repo, it'll deploy that version into production.

  • The CLI command takes some arguments and options, but I have not found where they are set. Can they be changed easily? We need to change the target index from backup to public for the production sync.

Those are defined in the values file. By default, I've only defined the one instance named stage, but we can define as many as we need.

  • We need to monitor 5 staged indexes. Can we keep them from starting at the same time? Each job queries the NTP server, and sometimes there is a connection error when all five jobs query it at once.

An NTP server should be able to handle far more than 5 simultaneous connections, but NTP is built to be fault tolerant, so it's expected for the server to be unavailable from time to time.
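If occasional unavailability is expected, one common way to absorb it on the client side is to retry with a randomized, growing delay, which also keeps several jobs from hitting the server in lockstep. The following is only a sketch under assumptions: `call_with_retry` and its parameters are hypothetical names, not part of this PR, and `query` stands in for whatever function the synchronizer uses to ask for the time.

```python
import random
import time


def call_with_retry(query, attempts=3, base_delay=1.0,
                    sleep=time.sleep, rng=random.random):
    """Call query() until it succeeds, waiting a jittered delay between tries.

    Hypothetical helper: `query` is any zero-argument callable that raises
    OSError (which covers ConnectionError and TimeoutError) on failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return query()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Random wait in [0, base_delay * 2**attempt), so parallel
                # jobs are unlikely to retry at the same instant.
                sleep(base_delay * (2 ** attempt) * rng())
    raise last_error
```

The `sleep` and `rng` parameters exist only so the delay can be replaced in tests; in normal use the defaults apply.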

@minxu74
Collaborator

minxu74 commented Apr 29, 2025

We'll have a local deployment repo in GitLab. If you update the version number in that repo, it'll deploy that version into production.

Cool. Could I be added to the repo, so that I can push changes quickly if there are urgent fixes?

Those are defined in the values file. By default, I've only defined the one instance named stage, but we can define as many as we need.

Yes, we need to monitor all the staged indexes. In the code, we only pass stage on the CLI, but it finds the corresponding indexes using the project name (e.g., CMIP6Plus, e3sm, input4MIPs, obs4MIPs). We could also change --start-time to Apr. 30, 2025; it is used as the start time for the Globus query when the sync is a fresh run (i.e., there are no database files from the previous three days).

An NTP server should be able to handle far more than 5 simultaneous connections, but NTP is built to be fault tolerant, so it's expected for the server to be unavailable from time to time.

Cool. I wonder whether the container syncs its time with the NTP server frequently. If getting the time from the NTP server fails, the sync falls back to two other online time servers, and then to the local time. If all of them fail, the sync quits, but the next sync picks up from the time of the last successful sync, so it won't miss any data.
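The fallback order described above (NTP server, then other online time servers, then local time, quitting if everything fails) could be sketched roughly like this. It is an illustration only, not the synchronizer's actual code: `first_available_time` and the source names are hypothetical, and each source is assumed to be a callable that returns a Unix timestamp or raises on failure.

```python
import time


def first_available_time(sources):
    """Return (name, timestamp) from the first source that answers.

    `sources` is an ordered list of (name, callable) pairs. Hypothetical
    helper for illustration; raises RuntimeError if every source fails.
    """
    errors = []
    for name, get_time in sources:
        try:
            return name, get_time()
        except Exception as exc:  # any failure means "try the next source"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no time source available: " + "; ".join(errors))


def unreachable():
    # Stand-in for a failed NTP query.
    raise ConnectionError("server unreachable")


# The NTP stand-in fails, so the local clock is used instead.
name, now = first_available_time([("ntp", unreachable), ("local", time.time)])
```

A caller that catches the `RuntimeError` can exit the current run and rely on the next scheduled sync to resume from the last successful sync time.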

@bstrdsmkr force-pushed the ci-setup branch 7 times, most recently from 88d5988 to a411075, on April 30, 2025 22:23
Details can be seen in the design.md
"""
lock_file_path = f"/tmp/metadata_migrate_sync_{project.value}.lock"  # noqa: S108
Collaborator

@minxu74 minxu74 May 3, 2025

The lock file is important. We cannot guarantee that a sync finishes within 5 minutes, so a new sync could start while the previous one is still running. That will cause problems if the sync runs under crontab.
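For reference, one way to get this "skip the run if the previous one is still going" behavior on Unix is a non-blocking advisory lock on the lock file. This is a minimal sketch, assuming a POSIX system with `fcntl`; `single_instance` is a hypothetical name, and the PR's own locking code may work differently.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_file_path):
    """Yield True if this run holds the lock, False if another run does.

    Hypothetical helper: uses flock(), so the lock disappears automatically
    when the holding process exits, even after a crash.
    """
    fd = os.open(lock_file_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

A sync entry point would wrap its work in `with single_instance(lock_file_path) as acquired:` and return early when `acquired` is false. Unlike a plain "does the file exist" check, this leaves no stale lock behind if a run is killed.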

3 participants