Before deploying a Dremio cluster on Kubernetes, we recommend that you review the following configuration values. Some of them affect the performance of your cluster and should be adjusted to your needs.
- `imageTag`: As part of setup, update this value to reference the exact version of Dremio you wish to deploy, e.g. `4.7.0`. If you consume Dremio's images directly from Docker Hub, we recommend using the full version tag in the form `X.Y.Z` (e.g. `21.1.0`), because image tags in the form `X.Y` (e.g. `21.1`) are continually updated with the latest patch version released.
- `distStorage.type`: By default, `distStorage.type` is set to `local`. This must be changed prior to production use. We do not recommend local distributed storage as part of a production setup.
- `volumeSize` and `storageClass`: The size and type of volume used for Dremio have a direct impact on performance. With most Kubernetes providers, volume size directly affects IOPS and read/write speeds. Check your Kubernetes provider's documentation to determine how volume size affects the performance of your disk.
- `executor.cloudCache.storageClass`: Dremio C3 was designed to be used with performant NVMe storage. By default, the chart uses the default storage class that is configured on the Kubernetes cluster. For the major Kubernetes providers, NVMe storage is often available on appropriately sized nodes. We recommend using a local storage provisioner to unlock the benefits of the NVMe storage available on the physical Kubernetes nodes. For more information, see the Kubernetes Special Interest Group for Local Static Provisioner.
- `service.sessionAffinity`: By default, `service.sessionAffinity` is set to `false`. We currently recommend leaving this value as `false` unless you are using Flight, in which case you should consider the following factors:
  - When the Flight client is being used and this value is set to `false`, there are cases where the `DoGet` call happens on a different TCP connection than the original `GetFlightInfo` call.
    - For the Java Flight client, this happens when a different `ManagedChannel` is used for different `FlightClient` instances for different Dremio users.
    - For the Python Flight client, this happens when a different `FlightClient` is initialized for different Dremio users.
  - In the cases described above, the `DoGet` call goes to a different coordinator than the one that originally created the query plan.
  - This causes the query plan to be regenerated, which is less efficient than the case where both the `DoGet` and the `GetFlightInfo` calls go to the same coordinator.
  - When `service.sessionAffinity` is set to `true`, all the TCP connections from a particular client IP will be routed to a specific Dremio coordinator.
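
The values discussed above can be collected into a single override file. The following is a minimal sketch, not a definitive configuration. It assumes that `volumeSize` and `storageClass` are set per role under `coordinator` and `executor`, which should be checked against the `values.yaml` shipped with your chart version. The storage class names `fast-ssd` and `local-nvme` are hypothetical, the `128Gi` size is illustrative only, and `aws` is an assumed example of a non-local `distStorage.type`.

```yaml
# Illustrative overrides only; verify every key against your chart's values.yaml.

# Pin the full X.Y.Z tag so a later patch release is not pulled in implicitly.
imageTag: "21.1.0"

distStorage:
  # Must not stay "local" in production. "aws" is an assumed example value;
  # the provider-specific settings that go with it are omitted here.
  type: "aws"

coordinator:
  # Assumed location of the per-role volume settings.
  volumeSize: 128Gi          # illustrative; size often determines IOPS and throughput
  storageClass: "fast-ssd"   # hypothetical storage class name

executor:
  volumeSize: 128Gi          # illustrative
  storageClass: "fast-ssd"   # hypothetical storage class name
  cloudCache:
    # Hypothetical class backed by a local static provisioner on NVMe nodes.
    storageClass: "local-nvme"

service:
  # Leave as false unless Flight clients use separate connections per user.
  sessionAffinity: false
```

A file like this is typically passed to `helm install` or `helm upgrade` with the `-f` flag, so that the chart's own `values.yaml` stays unmodified.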
For users who wish to set up a Hive 2/3 source, please see the Setup Hive 2 and 3 documentation.