Skip to content

Conversation

@fruch
Copy link
Collaborator

@fruch fruch commented Dec 14, 2025

  • intro the needed packer configuration
  • add the code to identify and handle OCI image

Ref: SMI-244

Testing

TODO:

  • - report we need OCI Snitch
  • - remove wrong io prop defaults
  • - test & document device_wait_seconds feature
  • - document the OCI prerequisites for building images
  • - create service user for building images
  • - build pipelines parts for building OCI image

@fruch fruch requested a review from vponomaryov December 14, 2025 20:20
@fruch fruch force-pushed the oci_image branch 3 times, most recently from 1883d78 to ddf5bd4 Compare December 16, 2025 07:58
@fruch fruch force-pushed the oci_image branch 7 times, most recently from 05222ea to 7ea86ef Compare December 18, 2025 14:58
@fruch
Copy link
Collaborator Author

fruch commented Dec 18, 2025

Notes:

  • we have a problem with configuring extra volumes out of the box, since when we boot they are not yet attached. not fixing this for now
  • we don't have access to the dense shapes yet, so can't test those, would try with BM.Optimized3.36 which is the smallest with local disk I have available

@vponomaryov
Copy link

vponomaryov commented Dec 19, 2025

Notes:

  • we have a problem with configuring extra volumes out of the box, since when we boot they are not yet attached. not fixing this for now

Sounds strange, probably config related...

  • we don't have access to the dense shapes yet, so can't test those, would try with BM.Optimized3.36 which is the smallest with local disk I have available

If I understood correctly, dense shapes have ephemeral NVMe disks.
It means any stop/start operation will make all the data be gone.
If so, why use dense shapes?

Also, what is the main aim - VM or BM shapes?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for building Scylla machine images for Oracle Cloud Infrastructure (OCI). It introduces OCI-specific Packer configurations, cloud instance detection and handling logic, and I/O setup routines to enable Scylla deployment on OCI alongside existing AWS, GCE, and Azure support.

Key Changes:

  • Added OCI as a target cloud option across build scripts and installation tools
  • Implemented OCI instance class with metadata service integration and disk configuration
  • Added OCI-specific I/O setup with configurable parameters for NVMe disks

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
packer/scylla_install_image Added custom run wrapper, OCI target cloud handling, and OCI-specific user management (opc user) with cloud-init workaround
packer/scylla.json Added OCI builder configuration with shape settings, metadata, and authentication parameters
packer/oci_variables.json New file defining default OCI-specific variables for Packer builds
packer/build_image.sh Added OCI target support, JSON file override option, and OCI-specific build artifact detection
packer/apply_cis_rules Extended target cloud choices to include OCI with appropriate grub variable configuration
lib/scylla_cloud.py Implemented oci_instance class with DMI/metadata detection, disk management, and metadata service integration
common/scylla_create_devices Added OCI detection to disk device creation logic
common/scylla_cloud_io_setup Fixed UnsupportedInstanceClassError class definition and added oci_io_setup class with disk-count-based I/O parameters
README.md Added OCI build instructions and reference to OCI build guide
Comments suppressed due to low confidence (5)

lib/scylla_cloud.py:713

  • Except block directly handles BaseException.
        except:

lib/scylla_cloud.py:804

  • Except block directly handles BaseException.
        except:

lib/scylla_cloud.py:862

  • Except block directly handles BaseException.
        except:

lib/scylla_cloud.py:877

  • Except block directly handles BaseException.
        except:

lib/scylla_cloud.py:713

  • 'except' clause does nothing but pass and there is no explanatory comment.
        except:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +862 to +875
except:
return ''
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare except clause catches all exceptions including SystemExit and KeyboardInterrupt, which is an anti-pattern. Consider catching specific exceptions or at minimum use 'except Exception:' to avoid catching system-exiting exceptions.

Copilot uses AI. Check for mistakes.
Comment on lines +237 to +240
# FIX: in OCI, cloud-init init command always return non-zero exit code, cause failing to create user,
# but still when image boots the user is created correctly
# leaving it like this for the time being
check_init_command = False
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions fixing an issue with cloud-init init always returning a non-zero exit code in OCI, but the solution (setting check_init_command to False) silently ignores all errors from cloud-init init, including legitimate failures. This could mask real problems during image creation. Consider a more targeted approach, such as checking for specific expected error codes or validating that the user was created successfully after the cloud-init init call.

Copilot uses AI. Check for mistakes.
else:
logging.warning("This is a supported instance but with no pre-configured io, scylla_io_setup will be run")
try:
result = subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300)
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable result is not used.

Suggested change
result = subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300)
subprocess.run('scylla_io_setup', shell=True, check=True, capture_output=True, timeout=300)

Copilot uses AI. Check for mistakes.
from io import StringIO
from subprocess import run, PIPE, STDOUT
import subprocess
from subprocess import PIPE, STDOUT, CalledProcessError
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'PIPE' is not used.
Import of 'STDOUT' is not used.

Suggested change
from subprocess import PIPE, STDOUT, CalledProcessError
from subprocess import CalledProcessError

Copilot uses AI. Check for mistakes.
@fruch
Copy link
Collaborator Author

fruch commented Dec 20, 2025

Notes:

  • we have a problem with configuring extra volumes out of the box, since when we boot they are not yet attached. not fixing this for now

Sounds strange, probably config related..

As far as I found,

It's how their API works, you spin a machine the attachment the extra volume to it

You are welcome to prove me wrong here :)

  • we don't have access to the dense shapes yet, so can't test those, would try with BM.Optimized3.36 which is the smallest with local disk I have available

If I understood correctly, dense shapes have ephemeral NVMe disks.
It means any stop/start operation will make all the data be gone.
If so, why use dense shapes?

Same in other clouds, we want the fastest disks we can get hold of.

Also, what is the main aim - VM or BM shapes?

At the end probably both, but the dense ones seems to most fitting to our use case.

(Not that I've been able to spin those)

1 similar comment
@fruch
Copy link
Collaborator Author

fruch commented Dec 20, 2025

Notes:

  • we have a problem with configuring extra volumes out of the box, since when we boot they are not yet attached. not fixing this for now

Sounds strange, probably config related..

As far as I found,

It's how their API works, you spin a machine the attachment the extra volume to it

You are welcome to prove me wrong here :)

  • we don't have access to the dense shapes yet, so can't test those, would try with BM.Optimized3.36 which is the smallest with local disk I have available

If I understood correctly, dense shapes have ephemeral NVMe disks.
It means any stop/start operation will make all the data be gone.
If so, why use dense shapes?

Same in other clouds, we want the fastest disks we can get hold of.

Also, what is the main aim - VM or BM shapes?

At the end probably both, but the dense ones seems to most fitting to our use case.

(Not that I've been able to spin those)

@fruch fruch force-pushed the oci_image branch 2 times, most recently from 9694223 to 8973d64 Compare December 20, 2025 20:46
@fruch
Copy link
Collaborator Author

fruch commented Dec 20, 2025

update:

been able to test ontop of VM.DenseIO.E5.Flex with 8 OCPUs, and current image works smoothly.
the io configuration still needs to be tuned:

running scylla_io_setup, got me this:

disks:
  - mountpoint: /var/lib/scylla
    read_iops: 1111069
    read_bandwidth: 6928430080
    write_iops: 612214
    write_bandwidth: 4059799552

official docs promise 290K ops for write, and says nothing about read or bandwidth:
https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/Compute/Concepts/computeperformance.htm

anyhow would change the code to run scylla_io_setup until we decide on numbers
kind of hard to give the exact numbers, since we didn't run anything substantial on those


if nr_disks >= 1 and nr_disks < 4:
# Conservative estimates for smaller configurations
self.disk_properties["read_iops"] = 150000 * nr_disks
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those figures are copies from GCE, we should remove them completely

and fall straight to scylla_io_setup

set -euo pipefail

# Tag namespace OCID
TAG_NAMESPACE_OCID="ocid1.tagnamespace.oc1..aaaaaaaaw6wipwfzt5n6wuxk3jglbo2hlajvkojkdqru45u7i46vxya7azpq"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to parameter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@fruch fruch force-pushed the oci_image branch 5 times, most recently from 9e4c553 to b7f4a76 Compare December 28, 2025 15:35
@fruch fruch force-pushed the oci_image branch 3 times, most recently from 86da7fd to 306027d Compare December 28, 2025 20:05
* intro the needed packer configuration
* add the code to identify and handle OCI image
* use scyllaadm user (as in AWS)
This commit enhances the network bandwidth estimation for OCI instances by adding support for flexible shapes.

Previously, the estimation only worked for fixed shapes and failed for flex shapes where the network bandwidth is not a fixed value but depends on the number of OCPUs.

The changes include:
- A new script to generate a JSON file ('oci_net_params.json') with network parameters for all shapes, including per-OCPU bandwidth for flex shapes.
- An update to the 'oci_instance' class to retrieve the number of OCPUs from instance metadata.
- An update to the bandwidth estimation logic in 'param_estimation.py' to calculate the bandwidth for flex shapes at runtime.
This commit introduces a new tool to measure IO properties for OCI instances.
The tool can launch instances of a given shape or a family of shapes,
retrieve the IO properties from the instance, and update a YAML file with the results.

The `scylla_cloud_io_setup` script is also updated to support Flex shapes,
by using the OCPU count to look up the correct IO parameters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants