Drive Groups

Jan Fajerski edited this page Dec 12, 2018 · 37 revisions

Drive Group specification

drive_group:
  target: '*'
  data_devices: {device_spec}
  db_devices: {device_spec}
  wal_devices: {device_spec} # bluestore can be deployed on 3 devices
  osds_per_device: 1 # number of osd daemons per device. To fully utilize nvme devices multiple osds are required.
  objectstore: bluestore
  encrypted: True
  wal_slots: 5
  db_slots: 1 # if deploying on 3 devices how many wal volumes per db device
  # and other c-v flags like wal_underprovision_ratio, though this is still being discussed
  # also some keys are only available when others are present ({wal,db}_devices for bluestore, journal_devices for filestore)

with {device_spec} being

ID_MODEL: foo # substring match on the ID_MODEL property
size: 10G:40G # Size specification of format LOW:HIGH. Can also take the form :HIGH, LOW: or an exact value (as ceph-volume inventory reports)
rotates: 0 # is the drive rotating or not
count: 10 # if this is present, limit the number of drives to this number
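The size filter could be interpreted along the following lines. This is only a sketch: the function names are hypothetical, the decimal unit multipliers are an assumption, and treating both ends of the range as inclusive is also an assumption (the page does not say how boundary values are handled).

```python
import re

# Decimal multipliers; whether the real tooling uses decimal or binary units
# is an assumption made for this sketch.
_UNITS = {"M": 10**6, "G": 10**9, "T": 10**12}


def to_bytes(text):
    """Convert a string such as '10G' or '2TB' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([MGT])B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def parse_size_spec(spec):
    """Return (low, high) in bytes; None marks an open end of the range."""
    if ":" not in spec:
        exact = to_bytes(spec)
        return exact, exact
    low, high = spec.split(":", 1)
    return (to_bytes(low) if low else None, to_bytes(high) if high else None)


def size_matches(spec, size_bytes):
    """True if size_bytes lies within the (inclusive) range described by spec."""
    low, high = parse_size_spec(spec)
    return (low is None or size_bytes >= low) and (high is None or size_bytes <= high)
```

With inclusive bounds, a drive of exactly 2TB would satisfy both '2TB:' and ':2TB', so a real implementation would have to decide which side owns the boundary.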

This new structure is proposed to serve as a declarative way to specify OSD deployments. On a per-host basis, OSD deployments are defined by the list of devices and their intended use (data, wal, db or journal) and a list of flags for the deployment tool (ceph-volume in this case). The Drive Group specification (dg) is intended to be created manually by a user. It specifies a group of OSDs that are either interrelated (hybrid OSDs that are deployed on solid-state drives and spinners) or share the same deployment options (identical standalone OSDs, i.e. same objectstore, same encryption option, ...).

To avoid explicitly listing devices, we'll rely on a list of filter items. These correspond to a few selected fields of ceph-volume inventory reports. In the simplest case this could be the rotational flag (all solid-state drives are to be db_devices, all rotating ones data_devices) or something more involved like model strings, sizes or others. DeepSea will provide code that translates these drive groups into actual device lists for inspection by the user.
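The translation from filter items to a device list might look roughly like the sketch below. It is not DeepSea's code: the record fields ("path", "model", "vendor", "rotates") are hypothetical stand-ins for whatever ceph-volume inventory reports, the function names are made up, and the size filter is deliberately left out to keep the example short.

```python
def matches(device, spec):
    """True if a device record satisfies every filter in spec."""
    for key, wanted in spec.items():
        if key == "count":
            continue  # count limits the result; it is not a per-device filter
        if key == "size":
            raise NotImplementedError("size ranges are omitted from this sketch")
        if key == "rotates":
            if int(device["rotates"]) != int(wanted):
                return False
        elif str(wanted) not in str(device.get(key, "")):
            return False  # string properties use a substring match
    return True


def select(devices, spec):
    """Return the devices matching spec, truncated to spec['count'] if present."""
    chosen = [d for d in devices if matches(d, spec)]
    limit = spec.get("count")
    return chosen if limit is None else chosen[:limit]


inventory = [
    {"path": "/dev/sda", "model": "SSD-123-foo", "rotates": 1},
    {"path": "/dev/sdb", "model": "SSD-123-foo", "rotates": 1},
    {"path": "/dev/sdc", "model": "MC-55-44-ZX", "rotates": 0},
]
data_devices = select(inventory, {"rotates": 1})
db_devices = select(inventory, {"rotates": 0})
```

Printing `data_devices` and `db_devices` for each host is one way the resulting lists could be presented to the user for inspection.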

Example Drive Group Files

2 Nodes with the same setup:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

This is a common setup and can be described quite easily:

The simple case

drive_group:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX

This is a simple and valid configuration, but maybe not a future-safe one. The user may add disks from different vendors in the future, which wouldn't be included with this configuration.

We can improve it by reducing the filters to core properties of the drives:

drive_group:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    rotates: 0

Now all rotating devices are declared as data devices, and all non-rotating devices are used as shared devices (wal, db).

If you know that drives with more than 2TB will always be the slower data devices, you can also filter by size:

drive_group:
  target: '*'
  data_devices:
    size: 2TB:
  db_devices:
    size: :2TB

Forcing encryption on your OSDs is as simple as appending 'encrypted: True' to the layout. (The terminology is still to be agreed on; 'layout' is probably a bad name, and 'specification' or 'spec' may be better.)

drive_group:
  target: '*'
  data_devices:
    size: 2TB:
  db_devices:
    size: :2TB
  encrypted: True

This was a rather simple setup. Following this approach you can also describe more sophisticated setups.

The advanced case

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 12 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we have two distinct setups:

20 HDDs should share 2 SSDs.

10 SSDs should share 2 NVMEs.

This can be described with two layouts.

drive_group:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    model: MC-55-44-ZX
  db_slots: 10 # How many OSDs per DB device

Setting db_slots: 10 ensures that only two of the SSDs are used as DB devices (10 left).

followed by

drive_group:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    vendor: samsung
    size: 256GB
  db_slots: 5 # How many OSDs per DB device
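How many DB devices a layout consumes follows from simple arithmetic. The sketch below assumes that db_slots means "OSDs per DB device", as the comments in the layouts state; the function name is hypothetical.

```python
import math


def db_devices_needed(data_device_count, db_slots):
    """Number of DB devices required to back the given number of data devices."""
    if db_slots < 1:
        raise ValueError("db_slots must be at least 1")
    # A partially filled DB device still counts as a whole device.
    return math.ceil(data_device_count / db_slots)


nvmes_for_ssd_layout = db_devices_needed(10, 5)
```

For the layout above, 10 SSDs with db_slots: 5 give `nvmes_for_ssd_layout == 2`, i.e. both NVMEs are used.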

The advanced case (with non-uniform nodes)

The examples above assumed that all nodes have the same drives. However, that's not always the case. Example:

Node1-5:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

Node6-10:

  • 5 NVMEs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 20 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

You can use the 'target' key in the layout to target certain nodes. Salt target notation helps to keep things easy.

drive_group:
  target: 'node[1-5]'
  data_devices:
    rotates: 1
  db_devices:
    rotates: 0

followed by:

drive_group:
  target: 'node[6-10]'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: SSD-123-foo

The expert case

All previous cases co-located the WALs with the DBs. It is, however, also possible to deploy the WAL on a dedicated device (if it makes sense).

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

drive_group:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 10
  wal_slots: 10

The very unlikely (but possible) case

Neither Ceph, DeepSea, nor ceph-volume prevents you from making questionable decisions.

  • 23 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 10 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 1 NVME
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we are trying to define:

20 HDDs backed by 1 NVME

2 HDDs backed by 1 SSD (db) and 1 NVME (wal)

8 SSDs backed by 1 NVME

2 SSDs standalone (encrypted)

1 HDD is spare and should not be deployed

drive_group_hdd_nvme:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 20
drive_group_hdd_ssd_nvme:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 2
  wal_slots: 2
drive_group_ssd_nvme:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 8
drive_group_ssd_standalone_encrypted:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  encrypted: True

One HDD will remain unused, because the file is parsed from top to bottom (this is the intended behavior; it is not implemented yet) and the db_slots (formerly ratios) are strictly defined.
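The intended top-to-bottom behavior could be sketched as follows. Since the page states that this is not implemented, everything here is hypothetical: the function name, the string device labels, and the explicit per-group limit, which stands in for whatever number of OSDs the group's db_slots allows.

```python
def assign(devices, groups):
    """Walk the groups in order; each group claims matching devices that are still free."""
    free = list(devices)
    result = {}
    for name, predicate, limit in groups:
        # A limit of None means "take every matching device that is left".
        claimed = [d for d in free if predicate(d)][:limit]
        result[name] = claimed
        free = [d for d in free if d not in claimed]
    return result, free


def is_hdd(device):
    return device.startswith("hdd")


hdds = [f"hdd{i}" for i in range(23)]
groups = [
    ("drive_group_hdd_nvme", is_hdd, 20),     # 20 HDDs backed by the NVME
    ("drive_group_hdd_ssd_nvme", is_hdd, 2),  # 2 HDDs backed by SSD + NVME
]
assigned, spare = assign(hdds, groups)
```

Because earlier groups remove their devices from the free pool, the second group only sees the three HDDs that are left, takes two of them, and `spare` ends up holding the single remaining HDD.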

** Note

It won't be possible to have only 1 standalone SSD. Maybe there should be a 'limit' flag. On the other hand, there's no reason to leave a disk out.
