fruch commented Jan 8, 2026

Introduces AzurePerfTuner to apply network optimizations for Microsoft Azure environments.

This tuner:

  • Detects if running on an Azure VM.
  • Verifies if Accelerated Networking is enabled on NICs.
  • Applies Azure-specific settings for improved network performance, including increasing ring buffers, TX queue length, and tuning kernel sysctl parameters for high throughput.

Ref: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-optimize-network-bandwidth#achieving-consistent-transfer-speeds-in-linux-vms-in-azure
Ref: #2534


Copilot AI left a comment

Pull request overview

This PR adds Azure VM-specific network performance tuning capabilities to the perftune.py script. It introduces an AzurePerfTuner class that detects Azure VMs and applies optimized network settings when Accelerated Networking is enabled.

Key changes:

  • New AzurePerfTuner class with Azure VM detection and Accelerated Networking verification
  • Interface-specific tuning (ring buffers and TX queue length)
  • Global kernel sysctl optimizations for high-throughput networking on Azure

perftune_print("Azure VM detected. Checking for Accelerated Networking...")

an_enabled_any = False

Copilot AI Jan 8, 2026

The variable name 'an_enabled_any' is ambiguous. The abbreviation 'an' could stand for multiple things. Consider renaming to 'accelerated_networking_enabled' or 'azure_an_enabled' for better clarity.

Copilot uses AI. Check for mistakes.
Comment on lines +1331 to +1337
try:
    kernel_ver = platform.release().split('-')[0]
    major, minor = map(int, kernel_ver.split('.')[:2])
    if major > 4 or (major == 4 and minor >= 19):
        sysctl_params['net.ipv4.tcp_congestion_control'] = 'bbr'
except Exception:
    pass

Copilot AI Jan 8, 2026

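The kernel version parsing assumes a specific format and may fail for unusual kernel release strings. A string like '5.10.0-rc1' is handled correctly, because split('-') drops the '-rc1' suffix and only the first two dot-separated fields are converted. A failure needs a non-numeric major or minor field, for example a hypothetical '4.19+', where int('19+') raises ValueError. While the exception is caught, consider parsing more robustly or validating the format before conversion, so that the code does not depend on a broad exception handler for control flow.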

Suggested change
-try:
-    kernel_ver = platform.release().split('-')[0]
-    major, minor = map(int, kernel_ver.split('.')[:2])
-    if major > 4 or (major == 4 and minor >= 19):
-        sysctl_params['net.ipv4.tcp_congestion_control'] = 'bbr'
-except Exception:
-    pass
+kernel_ver = platform.release().split('-')[0]
+match = re.match(r'^(\d+)\.(\d+)', kernel_ver)
+if match:
+    major = int(match.group(1))
+    minor = int(match.group(2))
+    if major > 4 or (major == 4 and minor >= 19):
+        sysctl_params['net.ipv4.tcp_congestion_control'] = 'bbr'
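
The regex-based parsing discussed in this comment can be tried in isolation. The following is a minimal sketch, not code from the PR: `parse_kernel_version` and `meets_bbr_threshold` are hypothetical helper names, and the 4.19 threshold is taken from the snippet under review.

```python
import re
from typing import Optional, Tuple


def parse_kernel_version(release: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the release string starts with 'N.N', else None."""
    match = re.match(r'^(\d+)\.(\d+)', release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_bbr_threshold(release: str) -> bool:
    """True if the parsed version is at least 4.19 (threshold from the reviewed code)."""
    version = parse_kernel_version(release)
    # Tuple comparison orders by major first, then minor.
    return version is not None and version >= (4, 19)
```

Because the regex only consumes leading digits, suffixes such as '-rc1' or '+' never reach int(), and an unparseable string yields None instead of an exception.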

"""
driver = self.__get_driver_name(nic)

# Check 1: Is the NIC itself a VF? (e.g. running on bare metal or direct assignment)

Copilot AI Jan 8, 2026

The comment mentions 'running on bare metal or direct assignment' but this scenario is unlikely on Azure VMs with Accelerated Networking, which typically use the hv_netvsc driver with a bonded VF. Consider clarifying that this check handles edge cases or non-standard configurations where the VF driver is directly exposed.

Suggested change
-# Check 1: Is the NIC itself a VF? (e.g. running on bare metal or direct assignment)
+# Check 1: Is the NIC itself a VF? This covers edge/non-standard setups where the
+#          VF driver is bound directly to the interface (for example, bare metal
+#          or direct assignment). On Azure VMs with Accelerated Networking, the
+#          typical configuration is the synthetic hv_netvsc NIC bonding a VF,
+#          which is detected in Check 2 below.
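
For context on the second check, one way to find the devices stacked beneath a synthetic NIC is to read the `lower_<name>` entries that Linux exposes under `/sys/class/net/<nic>/`. The sketch below assumes that approach; the PR's actual implementation is not shown in this conversation, and `list_lower_devices` is a hypothetical name. The sysfs root is a parameter only so the function can be exercised against a temporary directory.

```python
import os
from typing import List


def list_lower_devices(nic: str, sysfs_net: str = '/sys/class/net') -> List[str]:
    """Return the names of devices linked as 'lower_<name>' under the given NIC."""
    nic_dir = os.path.join(sysfs_net, nic)
    try:
        entries = os.listdir(nic_dir)
    except OSError:
        # Missing or unreadable interface directory: report no lower devices.
        return []
    prefix = 'lower_'
    return sorted(e[len(prefix):] for e in entries if e.startswith(prefix))
```

A caller could then look up the driver of each returned device and compare it against the set of known VF drivers.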

Comment on lines +1217 to +1230
def __get_driver_name(self, nic):
    """
    Returns the driver name for a given interface using ethtool.
    """
    try:
        # run_ethtool returns a list of strings. We join them to search.
        # Output format is usually: "driver: mlx5_core\nversion: ..."
        output = run_ethtool(['-i', nic])
        for line in output:
            if line.startswith('driver:'):
                return line.split(':')[1].strip()
    except Exception:
        pass
    return None

Copilot AI Jan 8, 2026

This method duplicates the __get_driver_name implementation already present in NetPerfTuner class (line 955). Consider refactoring to reuse the existing implementation by either making NetPerfTuner's version accessible or extracting it to a shared utility function.
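
One possible shape for the shared utility is a pure function that parses `ethtool -i` output, which both tuner classes could call after obtaining the lines from `run_ethtool`. This is only a sketch: `parse_driver_name` is a hypothetical name, not an existing function in perftune.py.

```python
from typing import Iterable, Optional


def parse_driver_name(ethtool_lines: Iterable[str]) -> Optional[str]:
    """Return the value of the 'driver:' line from `ethtool -i` output, or None."""
    for line in ethtool_lines:
        if line.startswith('driver:'):
            # Split once so a value containing ':' would be kept intact.
            return line.split(':', 1)[1].strip()
    return None
```

Keeping the parsing separate from the command invocation also makes it testable without a real NIC, for example `parse_driver_name(['driver: mlx5_core', 'version: ...'])`.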

Comment on lines +1293 to +1297
# We use check=False because some drivers/versions are noisy even on success
run_one_command(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], check=False)

# Optimization: Increase TX Queue Length to 10000
run_one_command(['ip', 'link', 'set', nic, 'txqueuelen', '10000'], check=False)

Copilot AI Jan 8, 2026

The comment states 'check=False because some drivers/versions are noisy even on success', but using check=False silently ignores all errors including genuine failures. Consider logging when these commands fail to help with debugging, even if the failures are non-fatal. The dry_run_mode already prints the commands, but in normal mode failures are completely silent.

Suggested change
-# We use check=False because some drivers/versions are noisy even on success
-run_one_command(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], check=False)
-# Optimization: Increase TX Queue Length to 10000
-run_one_command(['ip', 'link', 'set', nic, 'txqueuelen', '10000'], check=False)
+# We keep failures non-fatal but log them for debugging.
+try:
+    run_one_command(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], check=True)
+except subprocess.CalledProcessError as e:
+    perftune_print(f"Warning: failed to set ring buffers on interface {nic}: {e}")
+# Optimization: Increase TX Queue Length to 10000
+try:
+    run_one_command(['ip', 'link', 'set', nic, 'txqueuelen', '10000'], check=True)
+except subprocess.CalledProcessError as e:
+    perftune_print(f"Warning: failed to set txqueuelen on interface {nic}: {e}")
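
The two try/except blocks follow the same pattern, so they could be folded into one helper. The sketch below is an illustration under assumptions: it calls `subprocess.run` directly rather than perftune.py's `run_one_command`, whose exact behavior is not shown here, and `run_best_effort` is a hypothetical name.

```python
import subprocess
import sys
from typing import Sequence


def run_best_effort(cmd: Sequence[str], what: str) -> bool:
    """Run cmd; on failure print a warning and return False instead of raising."""
    try:
        subprocess.run(list(cmd), check=True, capture_output=True, text=True)
        return True
    except subprocess.CalledProcessError as e:
        # Prefer the command's own stderr; fall back to the exit status.
        detail = (e.stderr or '').strip() or f"exit status {e.returncode}"
    except OSError as e:
        # Covers a missing executable (FileNotFoundError) and similar errors.
        detail = str(e)
    print(f"Warning: failed to {what}: {detail}", file=sys.stderr)
    return False
```

With such a helper, the ring buffer step would read `run_best_effort(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], f'set ring buffers on {nic}')`, and the txqueuelen step would follow the same form.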

"""
return iter(self.__slaves[nic])

AzurePerfTuner(self.nics).tune()

Copilot AI Jan 8, 2026

This line is unreachable code. It appears after a return statement in the slaves method and will never be executed. This line should be moved to the tune method of the NetPerfTuner class (after line 666) to actually invoke the Azure tuning functionality.

'net.core.netdev_budget': '1000',
'net.core.optmem_max': '65535',

# Connection Backlogs

Copilot AI Jan 8, 2026

Potential conflict with existing sysctl settings. The NetPerfTuner.tune method sets net.core.somaxconn to 4096 (line 660), while AzurePerfTuner.tune sets it to 32768 (line 1323). Once the misplaced call on line 700 is moved to the end of NetPerfTuner.tune, AzurePerfTuner will run after the generic tuning and its value will overwrite the generic one, which may be intentional for Azure VMs. Consider documenting this behavior or extracting the common sysctl tuning to avoid conflicts.

Suggested change
-# Connection Backlogs
+# Connection Backlogs
+# NOTE: This Azure-specific value intentionally overrides the generic
+# net.core.somaxconn setting applied by NetPerfTuner.tune to support
+# higher connection backlogs on Azure high-throughput VMs.
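
If the override is intentional, making it explicit in code is one alternative to a comment. The following sketch merges two parameter dictionaries and reports which keys the Azure set overrides. `merge_sysctl` is a hypothetical helper; the values are the ones quoted in this review (4096, 32768, and the netdev_budget of 1000).

```python
from typing import Dict, List, Tuple


def merge_sysctl(base: Dict[str, str],
                 overrides: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge overrides onto base; also return the keys whose value changed."""
    changed = sorted(k for k in overrides if k in base and base[k] != overrides[k])
    # Later entries win, so Azure-specific values take precedence.
    merged = {**base, **overrides}
    return merged, changed


generic = {'net.core.somaxconn': '4096'}
azure = {'net.core.somaxconn': '32768', 'net.core.netdev_budget': '1000'}
merged, changed = merge_sysctl(generic, azure)
for key in changed:
    print(f"{key}: {generic[key]} -> {merged[key]} (Azure override)")
```

Logging the overridden keys at tuning time would make the precedence visible to operators without requiring them to read both tuner classes.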

    for line in output:
        if line.startswith('driver:'):
            return line.split(':')[1].strip()
except Exception:

Copilot AI Jan 8, 2026

'except' clause does nothing but pass and there is no explanatory comment.

    if slave_driver in self.vf_drivers:
        return True
except Exception:
    pass

Copilot AI Jan 8, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
-pass
+logging.debug(
+    "Failed to inspect lower devices for NIC %s while checking for Azure Accelerated Networking",
+    nic,
+    exc_info=True,
+)
