Intel Optane 900P performance

Brian S. O'Neill edited this page Nov 21, 2017 · 14 revisions

I wanted to see what kind of performance I could achieve with Tupl on the new Optane 900P SSD. Just like the earlier tests, this one inserted random keys into a new Tupl database, as quickly as possible, with no transactions.

  • Key: 8 bytes
  • Value: 0 bytes
  • RAM: 16GB
  • Storage: Intel Optane SSD 900P 480GB
  • CPU: Ryzen 7 1700
  • Kernel: 4.10.0-38
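The shape of this workload can be sketched in a few lines. This is not the actual test harness and it does not use Tupl: `store` is a hypothetical stand-in for whatever call inserts a key/value pair, and the thread and insert counts are illustrative only.

```python
import os
import threading
import time


def run_insert_benchmark(store, thread_count, inserts_per_thread):
    """Insert random 8-byte keys with empty values from several threads.

    `store` is a hypothetical stand-in: any callable taking (key, value).
    Returns (total_inserts, inserts_per_second).
    """
    def worker():
        for _ in range(inserts_per_thread):
            key = os.urandom(8)  # random 8-byte key
            store(key, b"")      # 0-byte value, no transaction

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    total = thread_count * inserts_per_thread
    return total, total / elapsed


if __name__ == "__main__":
    table = {}  # in-memory stand-in for a database index
    total, rate = run_insert_benchmark(table.__setitem__, 4, 10_000)
    print(f"{total} inserts, {rate:.0f} inserts/second")
```

An in-memory dictionary says nothing about storage performance, of course. The sketch only shows the access pattern: uniformly random keys, which defeat caching once the database outgrows RAM.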

Optane performance

The graph shows four test runs:

  • odirect_32: Block device opened with O_DIRECT, 32 threads (12 hours)
  • odirect_16: Block device opened with O_DIRECT, 16 threads (12 hours)
  • bdev_16: Block device opened normally, 16 threads (12 hours)
  • ext4_16: EXT4 file system, 16 threads (8 hours)

As the graph shows, the best performance is achieved when the database is written to a raw block device using direct I/O. With the file system, or with the block device opened normally, the primary bottleneck is the kernel: a lot of CPU time is spent in the kworker and kswapd processes. No actual page swapping occurred, so I don't know what kswapd is doing. With a NAND-flash SSD, the overhead of these kernel processes doesn't affect the test results, although a real application would still benefit from lower CPU load.
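For readers unfamiliar with direct I/O, here is a minimal Linux-only sketch of the difference between the two open modes. It is not how Tupl opens the device. The helper names and the 4KiB block size are assumptions made for illustration, and any device path passed in (for example an NVMe block device) is up to the caller. The main practical constraint of O_DIRECT is that buffers, offsets, and lengths generally have to be block-aligned, which is why the buffer comes from an anonymous memory mapping.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed I/O size for this sketch


def open_flags(direct):
    """Return open(2) flags for read/write access, optionally with O_DIRECT."""
    flags = os.O_RDWR
    if direct:
        flags |= os.O_DIRECT  # Linux-only; bypasses the kernel page cache
    return flags


def aligned_buffer(size=BLOCK_SIZE):
    """Allocate a page-aligned buffer; anonymous mmap memory is page-aligned."""
    if size % BLOCK_SIZE != 0:
        raise ValueError("size must be a multiple of the block size")
    return mmap.mmap(-1, size)


def read_block(path, block_index, direct=True):
    """Read one block from `path` at a block-aligned offset."""
    fd = os.open(path, open_flags(direct))
    buf = aligned_buffer()
    try:
        # Offset and length are both multiples of BLOCK_SIZE.
        os.preadv(fd, [buf], block_index * BLOCK_SIZE)
        return bytes(buf)
    finally:
        buf.close()
        os.close(fd)
```

Passing `direct=False` corresponds to the "opened normally" runs, where reads and writes go through the page cache. Some file systems reject O_DIRECT entirely, so the flag is best tried against the actual target device or file.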

The maximum raw performance (not using Tupl) I was able to get with the Optane SSD was 550K read/write operations per second, where each operation was 4KiB in size. When RAM is exhausted, each insert into a Tupl database requires at least one read and one write operation. So the best sustained throughput well into the test should be about 275K inserts per second (550K operations divided by two operations per insert). The actual observed rate at around 6 hours was about 180K inserts per second, or roughly 65% of that bound. Some of this overhead is due to contention introduced by Tupl, perhaps when accessing the free list. Contention could be reduced by striping the free list, or by accessing it in batches. A higher thread count of 32 only improves throughput slightly, which further suggests that contention is the issue. An application which inserts larger records might experience less contention, and so its throughput would be higher.
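The throughput bound is simple arithmetic, written out here as a small sketch. The function names are made up for illustration. The figures are the ones quoted above, and the two-operations-per-insert assumption is the minimum (one read plus one write), so the real ceiling can only be lower.

```python
def max_insert_rate(device_iops, ops_per_insert=2):
    """Upper bound on inserts/second when each insert costs a fixed number of I/Os."""
    return device_iops / ops_per_insert


def efficiency(observed_rate, device_iops, ops_per_insert=2):
    """Fraction of the theoretical bound that was actually achieved."""
    return observed_rate / max_insert_rate(device_iops, ops_per_insert)


if __name__ == "__main__":
    bound = max_insert_rate(550_000)
    print(f"bound: {bound:.0f} inserts/second")
    print(f"efficiency: {efficiency(180_000, 550_000):.1%}")
```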

When I first ran the tests, I observed strange performance dips. At times, the insert rate would plummet all the way down to zero. I ran a simple test which confirmed that this was due to thermal throttling. The Optane SSD has a large heatsink, and it also requires good airflow. After I cranked up the fan and directed more airflow over the SSD, the throttling problem disappeared.

Earlier tests with the Samsung 960 Pro SSD didn't really show any benefit when using a raw block device instead of a file system. This is because the bottleneck in those tests was still the SSD, so operating system overhead wasn't much of an issue. Because the Optane SSD is so fast, the bottleneck shifts away from the hardware, and eliminating overhead is essential for achieving the highest performance. Future Optane drives are expected to have performance closer to DRAM speeds, which will make bypassing the operating system even more important.
