-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(perftune): add Azure-specific network tuning #3194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -697,6 +697,8 @@ def slaves(self, nic): | |||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| return iter(self.__slaves[nic]) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| AzurePerfTuner(self.nics).tune() | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| #### Protected methods ########################## | ||||||||||||||||||||||||||||||||||
| def _get_irqs(self): | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
|
|
@@ -1206,6 +1208,139 @@ def __get_rx_queue_count(self, iface): | |||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| return min(self.__max_rx_queue_count(iface), rx_queues_count) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| class AzurePerfTuner(object): | ||||||||||||||||||||||||||||||||||
| def __init__(self, nics): | ||||||||||||||||||||||||||||||||||
| self.nics = nics | ||||||||||||||||||||||||||||||||||
| # Known drivers for Azure Accelerated Networking VFs | ||||||||||||||||||||||||||||||||||
| self.vf_drivers = {'mlx4_core', 'mlx4_en', 'mlx5_core', 'mana'} | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| def __get_driver_name(self, nic): | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| Returns the driver name for a given interface using ethtool. | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||
| # run_ethtool returns a list of strings. We join them to search. | ||||||||||||||||||||||||||||||||||
| # Output format is usually: "driver: mlx5_core\nversion: ..." | ||||||||||||||||||||||||||||||||||
| output = run_ethtool(['-i', nic]) | ||||||||||||||||||||||||||||||||||
| for line in output: | ||||||||||||||||||||||||||||||||||
| if line.startswith('driver:'): | ||||||||||||||||||||||||||||||||||
| return line.split(':')[1].strip() | ||||||||||||||||||||||||||||||||||
| except Exception: | ||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||
| pass | ||||||||||||||||||||||||||||||||||
| return None | ||||||||||||||||||||||||||||||||||
|
Comment on lines
+1217
to
+1230
|
||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| def __is_azure_vm(self): | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| Detects if the script is running on an Azure VM. | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||
| with open('/sys/class/dmi/id/sys_vendor', 'r') as f: | ||||||||||||||||||||||||||||||||||
| vendor = f.read().strip() | ||||||||||||||||||||||||||||||||||
| return "Microsoft Corporation" in vendor or "Microsoft" in vendor | ||||||||||||||||||||||||||||||||||
| except OSError: | ||||||||||||||||||||||||||||||||||
| return False | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| def __is_accelerated_nic(self, nic): | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| Checks if a specific network interface has Azure Accelerated Networking enabled. | ||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||
| driver = self.__get_driver_name(nic) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # Check 1: Is the NIC itself a VF? (e.g. running on bare metal or direct assignment) | ||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||
| # Check 1: Is the NIC itself a VF? (e.g. running on bare metal or direct assignment) | |
| # Check 1: Is the NIC itself a VF? This covers edge/non-standard setups where the | |
| # VF driver is bound directly to the interface (for example, bare metal | |
| # or direct assignment). On Azure VMs with Accelerated Networking, the | |
| # typical configuration is the synthetic hv_netvsc NIC bonding a VF, | |
| # which is detected in Check 2 below. |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'except' clause does nothing but pass and there is no explanatory comment.
| pass | |
| logging.debug( | |
| "Failed to inspect lower devices for NIC %s while checking for Azure Accelerated Networking", | |
| nic, | |
| exc_info=True, | |
| ) |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name 'an_enabled_any' is ambiguous. The abbreviation 'an' could stand for multiple things. Consider renaming to 'accelerated_networking_enabled' or 'azure_an_enabled' for better clarity.
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states 'check=False because some drivers/versions are noisy even on success', but using check=False silently ignores all errors including genuine failures. Consider logging when these commands fail to help with debugging, even if the failures are non-fatal. The dry_run_mode already prints the commands, but in normal mode failures are completely silent.
| # We use check=False because some drivers/versions are noisy even on success | |
| run_one_command(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], check=False) | |
| # Optimization: Increase TX Queue Length to 10000 | |
| run_one_command(['ip', 'link', 'set', nic, 'txqueuelen', '10000'], check=False) | |
| # We keep failures non-fatal but log them for debugging. | |
| try: | |
| run_one_command(['ethtool', '-G', nic, 'rx', '1024', 'tx', '1024'], check=True) | |
| except subprocess.CalledProcessError as e: | |
| perftune_print(f"Warning: failed to set ring buffers on interface {nic}: {e}") | |
| # Optimization: Increase TX Queue Length to 10000 | |
| try: | |
| run_one_command(['ip', 'link', 'set', nic, 'txqueuelen', '10000'], check=True) | |
| except subprocess.CalledProcessError as e: | |
| perftune_print(f"Warning: failed to set txqueuelen on interface {nic}: {e}") |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential conflict with existing sysctl settings. The NetPerfTuner.tune method sets net.core.somaxconn to 4096 (line 660), but AzurePerfTuner.tune sets it to 32768 (line 1323). If AzurePerfTuner runs after NetPerfTuner, it will overwrite this value. However, if the misplaced line 700 is fixed and placed at the end of NetPerfTuner.tune, this would result in the Azure value taking precedence, which may be intentional for Azure VMs. Consider documenting this behavior or extracting the common sysctl tuning to avoid conflicts.
| # Connection Backlogs | |
| # Connection Backlogs | |
| # NOTE: This Azure-specific value intentionally overrides the generic | |
| # net.core.somaxconn setting applied by NetPerfTuner.tune to support | |
| # higher connection backlogs on Azure high-throughput VMs. |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kernel version parsing assumes a specific format and may fail for certain kernel version strings. For example, versions like '5.10.0-rc1' would fail when trying to convert 'rc1' to an integer. While the exception is caught, consider adding more robust parsing or validating the format before conversion to avoid unnecessary exception handling.
| try: | |
| kernel_ver = platform.release().split('-')[0] | |
| major, minor = map(int, kernel_ver.split('.')[:2]) | |
| if major > 4 or (major == 4 and minor >= 19): | |
| sysctl_params['net.ipv4.tcp_congestion_control'] = 'bbr' | |
| except Exception: | |
| pass | |
| kernel_ver = platform.release().split('-')[0] | |
| match = re.match(r'^(\d+)\.(\d+)', kernel_ver) | |
| if match: | |
| major = int(match.group(1)) | |
| minor = int(match.group(2)) | |
| if major > 4 or (major == 4 and minor >= 19): | |
| sysctl_params['net.ipv4.tcp_congestion_control'] = 'bbr' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is unreachable code. It appears after a return statement in the slaves method and will never be executed. This line should be moved to the tune method of the NetPerfTuner class (after line 666) to actually invoke the Azure tuning functionality.