| title | Triton Hybrid Images |
|---|---|
| markdown2extras | code-friendly |
| apisections |
This document describes the best practices for HVM image creation such that they are compatible with the QEMU/KVM and bhyve hypervisors on SmartOS.
The key characteristics of a hybrid image are:
- Must be capable of booting from BIOS. The version of QEMU supported by SmartOS does not have UEFI boot support.
- Must be tolerant of disks and other virtual hardware appearing at different locations in the device tree, perhaps with different model identifiers, serial numbers. etc.
- Must configure networking using the mdata
protocol on the second serial
port (
ttyS1,ttyb,COM2, etc.). - Should take other actions as described in the data dictionary.
- Should have an administrative port on the first serial port (
ttyS0,ttya,COM1, etc.). - Should use virtio block and network devices.
The typical hybrid image creation process involves the following steps.
- Obtain guest OS installation media.
- Generate an ISO with configuration data. Depending on the guest OS, this may involve remastering the installation media or generating a configuration-only image.
- Create an empty ZFS volume (virtual disk) that is the same size as the
desired root disk. If the image requires a disk with its block size other
than 8 KiB,
volblocksizemust be set when this is being created. No other ZFS properties will be preserved in the generated image. - Start a QEMU process that uses the ISO image(s) and virtual disk mentioned above. Ideally, the output of the installer will be directed to a location that makes post-mortem debugging easy. One way to do this is to start QEMU in a way that attaches the first serial port to stdio and configure the guest installer to verbosely log to that serial port.
- Wait for the instance to shut itself down, then sanity check the installation. This sanity check likely involves looking for well-known markers in the log output described above.
- Generate an image, which is comprised of a zfs stream and a manifest.
- Optionally upload to Manta and/or https://updates.tritondatacenter.com.
The steps described above are generally handled with automation. Steps 1, 2, and 5 are specific to the image being created. Image-specific tools then invoke create-hybrid-image for the remaining steps.
The image build automation discussed in the sections that flow is designed to run in a joyent branded zone on a SmartOS compute node. The CN can be a standalone instance or part of a Triton cloud. The CN does require some configuration that is not supported by Triton or even vmadm.
The key requirements for the build zone are:
- 2.5 GiB or more RAM. This allows the guest running under QEMU to have 2 GiB.
- A delegated dataset with 30 GiB of available space.
/smartdcneeds to be a read-only lofs mount from the global zone/dev/kvmdevice needs to be delegated for QEMU hardware accelerationgit,gpg, andmkisofsneed to be installed and in$PATH. Ifpigzis available, the image creation will be quicker.
Some images may have additional requirements.
The following script may be used to customize an instance that was created with
triton instance create.
#! /bin/bash
uuid=$1
if [[ -z $uuid ]]; then
echo "Usage: $0 <uuid>" 1>&2
exit 1
}
set -euo pipefail
topds=zones/$uuid/data
zfs create -o zoned=on -o mountpoint=/data $topds
zonecfg -z $uuid <<EOF
add dataset
set name=$topds
end
add fs
set dir=/smartdc
set special=/smartds
set type=lofs
set options=ro
end
add device
set match=kvm
end
EOF
zlogin $uuid /opt/local/bin/pkgin update
zlogin $uuid /opt/local/bin/pkgin -y install git gpg cdrtools pigz
vmadm stop $uuid
vmadm start $uuidThe following repositories follow the patterns described above.
- mi-centos-hvm for CentOS 6 - 8
- mi-debian-hvm for Debian 8 - 10
Each repo is intended to handle a family of distributions. The mi-centos-hvm repo probably implements the hard parts required for RHEL, Fedora, and other RHEL-derived distributions. Likewise, it should be straight-forward to add support for Ubuntu to mi-debian-hvm.
Each of those repositories has this (eng) and sdc-vmtools as subrepos. The eng repo is home to common tools used for building the images and the sdc-vmtools repo is used for things that are added to the image.
Some images may require software that is specific to the OS version and is stored elsewhere. For example, some older versions ship with a version of cloud-init that is too old to properly support the metadata protocol. On these systems, a custom build of cloud-init is stored in Manta and installed via post-install scripting that is invoked by the guest operating system installer during image creation.
Each repository is designed to produce images using Jenkins, with the build
steps for each image specified in the Jenkinsfile in each repository.
The create-hybrid-image script is typically
invoked by an image-specific create-image script that resides in a
mi-<mumble>-hvm repository. See the usage message for details on options that
are supported.
This script is designed to be able to be run as any user that has the Primary Administrator profile. Usually it will be invoked as root, who has sufficient
permissions in all but the oddest of circumstances. To make the script usable
by alice:
# usermod -P 'Primary Administrator' alice
See usermod(1M) and profiles(1M) for details.
QEMU is started such that when the guest writes to the first serial port, it
appears on stdout of the QEMU process. Interaction with the guest over the
first serial port is possible. This is generally fine, as the installer running
in the guest is generally logging its progress to the first serial port. When
create-hybrid-image runs as part of Jenkins job, verbose logging to the serial
port means that the Jenkins console log is likely to be helpful in diagnosing
failures.
If the guest installer performs its own checks and logs the status of those
checks to the first serial port, stdout from QEMU then be used by other tools
to confirm that the installation completed properly. For example, in CentOS
images, the ks.cfg %pre and %post sections are designed to provide the
status of those phases. The corresponding create-image scripts verify that
the expected results are found before calling the image creation successful.
OS installers generally do not log their actions to the console very verbosely.
To get verbose logs, some form a pre-install script (e.g. %pre in ks.cfg for
CentOS) is used to tail log files in the background. When GNU tail is used in a
Linux guest, this may be done with:
tail -F foo.log bar.log ... >/dev/ttyS0 2>&1 </dev/null &
The -F option is used to ask tail to watch for logs to come into existence,
and reopen them when they are rotated. This allows you to start tailing a log
before it exists. The redirection of stderr and stdin are primarily to make
it so that tail closes all the file descriptors passed to it by its parent
process. Without this, some installers will think that the script is intended
to complete before the pre-installation script finishes and will cause the
installation to hang forever.
QEMU is started so that the graphical console is available over VNC. The ip and
port are shown on the terminal. Generally, there is no need to connect to the
graphical console. However, in the event of a failure it is often helpful to
connect to the console to poke around inside a VM using a shell. Most Linux
distributions have virtual consoles available with ctrl-alt-F{1,2,3}.
WARNING: The VNC port is accessible via all interfaces. If the instance has an interface on an untrusted network, use of Cloud Firewall to restrict access to TCP port 5901 is is highly recommended.
QEMU networking is configured such that a network that a private network exists within the QEMU process. QEMU intercepts DHCP requests and provides basic networking configuration to the guest. This network has access to other networks that are accessible by the zone via a NAT implementation that exists within QEMU.
When an image-specific create-image script needs to add additional devices,
have specialized control over the boot loader, etc., additional arguments can be
specified on the command line.
A key example of how this is used is in the CentOS
images.Rather
than remastering the image to add the appropriate boot options, the installation
media is mounted in the host and QEMU is started with options that specify the
kernel, initrd, kernel command line, and a secondary CD which contains the
kickstart configuration. This allows the CentOS create-image to avoid
gigabytes of I/O that would otherwise be required to remaster the installer ISO.