-
Notifications
You must be signed in to change notification settings - Fork 35
Add support for OCI image #847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
1883d78 to
ddf5bd4
Compare
05222ea to
7ea86ef
Compare
|
Notes:
|
Sounds strange, probably config related...
If I understood correctly, dense shapes have ephemeral NVMe disks. Also, what is the main aim - VM or BM shapes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for building Scylla machine images for Oracle Cloud Infrastructure (OCI). It introduces OCI-specific Packer configurations, cloud instance detection and handling logic, and I/O setup routines to enable Scylla deployment on OCI alongside existing AWS, GCE, and Azure support.
Key Changes:
- Added OCI as a target cloud option across build scripts and installation tools
- Implemented OCI instance class with metadata service integration and disk configuration
- Added OCI-specific I/O setup with configurable parameters for NVMe disks
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| packer/scylla_install_image | Added custom run wrapper, OCI target cloud handling, and OCI-specific user management (opc user) with cloud-init workaround |
| packer/scylla.json | Added OCI builder configuration with shape settings, metadata, and authentication parameters |
| packer/oci_variables.json | New file defining default OCI-specific variables for Packer builds |
| packer/build_image.sh | Added OCI target support, JSON file override option, and OCI-specific build artifact detection |
| packer/apply_cis_rules | Extended target cloud choices to include OCI with appropriate grub variable configuration |
| lib/scylla_cloud.py | Implemented oci_instance class with DMI/metadata detection, disk management, and metadata service integration |
| common/scylla_create_devices | Added OCI detection to disk device creation logic |
| common/scylla_cloud_io_setup | Fixed UnsupportedInstanceClassError class definition and added oci_io_setup class with disk-count-based I/O parameters |
| README.md | Added OCI build instructions and reference to OCI build guide |
Comments suppressed due to low confidence (5)
lib/scylla_cloud.py:713
- Except block directly handles BaseException.
except:
lib/scylla_cloud.py:804
- Except block directly handles BaseException.
except:
lib/scylla_cloud.py:862
- Except block directly handles BaseException.
except:
lib/scylla_cloud.py:877
- Except block directly handles BaseException.
except:
lib/scylla_cloud.py:713
- 'except' clause does nothing but pass and there is no explanatory comment.
except:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except: | ||
| return '' |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bare except clause catches all exceptions including SystemExit and KeyboardInterrupt, which is an anti-pattern. Consider catching specific exceptions or at minimum use 'except Exception:' to avoid catching system-exiting exceptions.
| # FIX: in OCI, cloud-init init command always return non-zero exit code, cause failing to create user, | ||
| # but still when image boots the user is created correctly | ||
| # leaving it like this for the time being | ||
| check_init_command = False |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions fixing an issue with cloud-init init always returning a non-zero exit code in OCI, but the solution (setting check_init_command to False) silently ignores all errors from cloud-init init, including legitimate failures. This could mask real problems during image creation. Consider a more targeted approach, such as checking for specific expected error codes or validating that the user was created successfully after the cloud-init init call.
| else: | ||
| logging.warning("This is a supported instance but with no pre-configured io, scylla_io_setup will be run") | ||
| try: | ||
| result = subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300) |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable result is not used.
| result = subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300) | |
| subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300) |
| from io import StringIO | ||
| from subprocess import run, PIPE, STDOUT | ||
| import subprocess | ||
| from subprocess import PIPE, STDOUT, CalledProcessError |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'PIPE' is not used.
Import of 'STDOUT' is not used.
| from subprocess import PIPE, STDOUT, CalledProcessError | |
| from subprocess import CalledProcessError |
As far as I found, It's how their API works, you spin a machine the attachment the extra volume to it You are welcome to prove me wrong here :)
Same in other clouds, we want the fastest disks we can get hold of.
At the end probably both, but the dense ones seems to most fitting to our use case. (Not that I've been able to spin those) |
1 similar comment
As far as I found, It's how their API works, you spin a machine the attachment the extra volume to it You are welcome to prove me wrong here :)
Same in other clouds, we want the fastest disks we can get hold of.
At the end probably both, but the dense ones seems to most fitting to our use case. (Not that I've been able to spin those) |
9694223 to
8973d64
Compare
|
update: been able to test ontop of VM.DenseIO.E5.Flex with 8 OCPUs, and current image works smoothly. running scylla_io_setup, got me this: disks:
- mountpoint: /var/lib/scylla
read_iops: 1111069
read_bandwidth: 6928430080
write_iops: 612214
write_bandwidth: 4059799552official docs promise 290K ops for write, and says nothing about read or bandwidth: anyhow would change the code to run |
common/scylla_cloud_io_setup
Outdated
|
|
||
| if nr_disks >= 1 and nr_disks < 4: | ||
| # Conservative estimates for smaller configurations | ||
| self.disk_properties["read_iops"] = 150000 * nr_disks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those figures are copies from GCE, we should remove them completely
and fall straight to scylla_io_setup
packer/oci/setup_oci_tags.sh
Outdated
| set -euo pipefail | ||
|
|
||
| # Tag namespace OCID | ||
| TAG_NAMESPACE_OCID="ocid1.tagnamespace.oc1..aaaaaaaaw6wipwfzt5n6wuxk3jglbo2hlajvkojkdqru45u7i46vxya7azpq" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
9e4c553 to
b7f4a76
Compare
86da7fd to
306027d
Compare
* intro the needed packer configuration * add the code to identify and handle OCI image * use scyllaadm user (as in AWS)
This commit enhances the network bandwidth estimation for OCI instances by adding support for flexible shapes.
Previously, the estimation only worked for fixed shapes and failed for flex shapes where the network bandwidth is not a fixed value but depends on the number of OCPUs.
The changes include:
- A new script to generate a JSON file ('oci_net_params.json') with network parameters for all shapes, including per-OCPU bandwidth for flex shapes.
- An update to the 'oci_instance' class to retrieve the number of OCPUs from instance metadata.
- An update to the bandwidth estimation logic in 'param_estimation.py' to calculate the bandwidth for flex shapes at runtime.
This commit introduces a new tool to measure IO properties for OCI instances. The tool can launch instances of a given shape or a family of shapes, retrieve the IO properties from the instance, and update a YAML file with the results. The `scylla_cloud_io_setup` script is also updated to support Flex shapes, by using the OCPU count to look up the correct IO parameters.
Ref: SMI-244
Testing
TODO:
device_wait_secondsfeature