---
layout: default
title: NVMe-TCP on RHEL/Rocky/AlmaLinux - Best Practices Guide
---
Comprehensive best practices for deploying NVMe-TCP storage on RHEL-based systems in production environments.
{% include bestpractices/disclaimer-rhel.md %}
- Architecture Overview
- RHEL-Specific Considerations
- Network Configuration
- SELinux Configuration
- Firewall Configuration
- Performance Tuning
- High Availability
- Monitoring & Maintenance
- Security
- Troubleshooting
flowchart TB
subgraph "RHEL/Rocky/AlmaLinux Hosts"
HOST1[Linux Host 1<br/>2x Storage NICs]
HOST2[Linux Host 2<br/>2x Storage NICs]
HOST3[Linux Host 3<br/>2x Storage NICs]
end
subgraph "Storage Network"
SW1[Storage Switch 1<br/>10/25/100 GbE]
SW2[Storage Switch 2<br/>10/25/100 GbE]
end
subgraph "NVMe-TCP Storage Array"
CTRL1[Controller 1<br/>Portal 1 & 2]
CTRL2[Controller 2<br/>Portal 3 & 4]
NVME[(NVMe Namespace)]
end
SW1 --- SW2
HOST1 ---|NIC 1| SW1
HOST1 ---|NIC 2| SW2
HOST2 ---|NIC 1| SW1
HOST2 ---|NIC 2| SW2
HOST3 ---|NIC 1| SW1
HOST3 ---|NIC 2| SW2
SW1 --- CTRL1
SW1 --- CTRL2
SW2 --- CTRL1
SW2 --- CTRL2
CTRL1 --- NVME
CTRL2 --- NVME
style NVME fill:#5d6d7e,stroke:#333,stroke-width:2px,color:#fff
style SW1 fill:#1a5490,stroke:#333,stroke-width:2px,color:#fff
style SW2 fill:#1a5490,stroke:#333,stroke-width:2px,color:#fff
flowchart LR
subgraph "RHEL Host"
NIC1[Storage NIC 1<br/>10.100.1.101]
NIC2[Storage NIC 2<br/>10.100.1.102]
end
subgraph "Storage Network - VLAN 100"
SW1[Switch 1]
SW2[Switch 2]
end
subgraph "Storage Array"
P1[Portal 1<br/>10.100.1.10]
P2[Portal 2<br/>10.100.1.11]
P3[Portal 3<br/>10.100.1.12]
P4[Portal 4<br/>10.100.1.13]
end
NIC1 ---|Path 1-4| SW1
NIC2 ---|Path 5-8| SW2
SW1 --- SW2
SW1 --- P1
SW1 --- P2
SW1 --- P3
SW1 --- P4
SW2 --- P1
SW2 --- P2
SW2 --- P3
SW2 --- P4
style NIC1 fill:#1e8449,stroke:#333,stroke-width:2px,color:#fff
style NIC2 fill:#1e8449,stroke:#333,stroke-width:2px,color:#fff
style SW1 fill:#1a5490,stroke:#333,stroke-width:2px,color:#fff
style SW2 fill:#1a5490,stroke:#333,stroke-width:2px,color:#fff
Key Design Principles:
- Dual switches for network redundancy
- Minimum 2 NICs per host for multipath
- Dual controller array for storage HA
- 8 paths (2 NICs × 4 portals) for maximum redundancy
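Once connections are established, a quick way to confirm the expected path count from a host (the count of 8 assumes the 2 NICs × 4 portals layout above):

```bash
# List every NVMe-oF controller per subsystem
sudo nvme list-subsys
# With 2 NICs x 4 portals, expect 8 controllers in the "live" state
sudo nvme list-subsys | grep -c live
```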
{% include diagrams-storage-topology-nvme.md %}
{% include diagrams-network-architecture.md %}
Red Hat Enterprise Linux:
# Register system
sudo subscription-manager register --username <username>
# Attach subscription
sudo subscription-manager attach --auto
# Enable required repositories
sudo subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms
sudo subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms
# Update system
sudo dnf update -y
Rocky Linux / AlmaLinux:
# No subscription required
# Update system
sudo dnf update -y
# Enable EPEL if needed for additional tools
sudo dnf install -y epel-release
Minimum kernel versions:
- RHEL 8: Kernel 4.18.0-193 or later (NVMe-TCP support)
- RHEL 9: Kernel 5.14.0 or later (recommended)
Check kernel version:
uname -r
# Verify NVMe-TCP module is available
modinfo nvme-tcp
Update kernel if needed:
sudo dnf update kernel
sudo reboot
Essential packages:
# Core NVMe and multipath tools
sudo dnf install -y \
nvme-cli \
device-mapper-multipath \
lvm2 \
sg3_utils
# Performance monitoring tools
sudo dnf install -y \
sysstat \
iotop \
iftop \
htop \
perf
# Network tools
sudo dnf install -y \
NetworkManager \
NetworkManager-tui \
ethtool \
iproute \
iputils \
bind-utils
# Optional: Tuned for performance profiles
sudo dnf install -y tuned tuned-utils
Why NetworkManager:
- Default in RHEL 8/9
- Better integration with systemd
- Dynamic configuration support
- Consistent across RHEL ecosystem
Disable network-scripts (RHEL 8):
# Network scripts are deprecated
sudo systemctl disable network
sudo systemctl enable NetworkManager
sudo systemctl start NetworkManager
# Create connection for storage interface
sudo nmcli connection add type ethernet \
con-name storage-nvme-1 \
ifname ens1f0 \
ipv4.method manual \
ipv4.addresses 10.100.1.101/24 \
ipv4.never-default yes \
ipv4.may-fail no \
802-3-ethernet.mtu 9000 \
connection.autoconnect yes \
connection.autoconnect-priority 10
# Optimize for storage
sudo nmcli connection modify storage-nvme-1 \
ethtool.ring-rx 4096 \
ethtool.ring-tx 4096 \
ethtool.coalesce-rx-usecs 50 \
ethtool.coalesce-tx-usecs 50
# Activate
sudo nmcli connection up storage-nvme-1
Key NetworkManager parameters:
- ipv4.never-default yes - No default route on storage interface
- ipv4.may-fail no - Boot waits for this interface
- connection.autoconnect-priority 10 - Higher priority for storage
- ethtool.* - NIC tuning parameters
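The second storage NIC gets an equivalent profile; a minimal sketch assuming ens1f1 with the 10.100.1.102 address from the topology above (adjust names and addresses to your host):

```bash
# Second storage interface (same pattern as storage-nvme-1)
sudo nmcli connection add type ethernet \
    con-name storage-nvme-2 \
    ifname ens1f1 \
    ipv4.method manual \
    ipv4.addresses 10.100.1.102/24 \
    ipv4.never-default yes \
    ipv4.may-fail no \
    802-3-ethernet.mtu 9000 \
    connection.autoconnect yes \
    connection.autoconnect-priority 10
# Apply the same ethtool ring/coalesce tuning shown above to this profile as well
sudo nmcli connection up storage-nvme-2
```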
# Set MTU 9000 for jumbo frames
sudo nmcli connection modify storage-nvme-1 802-3-ethernet.mtu 9000
# Verify
nmcli connection show storage-nvme-1 | grep mtu
# Test MTU
ping -M do -s 8972 <storage_portal_ip>
Important: MTU must be 9000 on:
- Host interfaces
- All switches in path
- Storage array ports
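A minimal sketch that sweeps the jumbo-frame test across all four portal addresses from the example topology (substitute your own portal IPs):

```bash
# 8972 = 9000 MTU minus 28 bytes of IP/ICMP headers; -M do forbids fragmentation
for portal in 10.100.1.10 10.100.1.11 10.100.1.12 10.100.1.13; do
    if ping -M do -c 3 -s 8972 "$portal" > /dev/null 2>&1; then
        echo "MTU 9000 OK to $portal"
    else
        echo "MTU 9000 FAILED to $portal (check host NIC, switches, and array port)"
    fi
done
```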
SELinux modes:
- enforcing - SELinux policy is enforced (recommended for production)
- permissive - SELinux logs violations but doesn't block (testing)
- disabled - SELinux is disabled (not recommended)
Check SELinux status:
getenforce
sestatus
Allow NVMe-TCP connections:
# Check for denials
sudo ausearch -m avc -ts recent | grep nvme
# If denials found, generate policy
sudo ausearch -m avc -ts recent | audit2allow -M nvme_tcp_policy
# Review the policy
cat nvme_tcp_policy.te
# Install policy
sudo semodule -i nvme_tcp_policy.pp
Issue: NVMe connections blocked
# Check for denials
sudo ausearch -m avc -ts recent
# Temporary: Set to permissive for testing
sudo setenforce 0
# Test NVMe connection
sudo nvme connect -t tcp -a <portal_ip> -s 4420 -n <nqn>
# Check for new denials
sudo ausearch -m avc -ts recent
# Generate and install policy
sudo ausearch -m avc -ts recent | audit2allow -M nvme_fix
sudo semodule -i nvme_fix.pp
# Re-enable enforcing
sudo setenforce 1
Issue: Multipath device access denied
# Allow multipath to access devices
sudo setsebool -P virt_use_rawio 1
# Or create custom policy
sudo ausearch -m avc -ts recent | grep multipath | audit2allow -M multipath_nvme
sudo semodule -i multipath_nvme.pp
- Never disable SELinux in production
  - Use permissive mode for troubleshooting only
  - Always create proper policies
- Use audit2allow carefully
  - Review generated policies before installing
  - Understand what you're allowing
  - Document custom policies
- Monitor for denials
  # Set up monitoring
  sudo ausearch -m avc -ts today | grep denied
  # Or use setroubleshoot
  sudo dnf install -y setroubleshoot-server
  sudo sealert -a /var/log/audit/audit.log
- Keep policies updated
  # Update SELinux policies
  sudo dnf update selinux-policy\*
Why FirewallD:
- Default in RHEL 8/9
- Dynamic firewall management
- Zone-based configuration
- Integration with NetworkManager
Check firewall status:
sudo firewall-cmd --state
sudo firewall-cmd --list-all
For dedicated storage networks, disable firewall filtering on storage interfaces to eliminate CPU overhead from packet inspection. This is critical for high-throughput NVMe-TCP storage.
# Add storage interfaces to trusted zone (no packet filtering)
sudo firewall-cmd --permanent --zone=trusted --add-interface=ens1f0
sudo firewall-cmd --permanent --zone=trusted --add-interface=ens1f1
# Reload
sudo firewall-cmd --reload
# Verify
sudo firewall-cmd --zone=trusted --list-all
Why disable filtering on storage interfaces:
- CPU overhead: Firewall packet inspection adds latency and consumes CPU cycles
- Performance impact: At high IOPS (millions with NVMe-TCP), filtering overhead becomes significant
- Network isolation: Dedicated storage VLANs provide security at the network layer
- Simplicity: No port rules to maintain for storage traffic
Use port filtering only when storage interfaces share a network with other traffic or when additional host-level security is required by policy.
⚠️ Performance Note: Port filtering adds CPU overhead for every packet. For production storage with high IOPS requirements, use Option 1 (the trusted zone configuration above) with network-level isolation instead.
# Create custom storage zone
sudo firewall-cmd --permanent --new-zone=storage
# Port 4420 = Data port (connections)
# Port 8009 = Discovery port (optional, for nvme discover)
sudo firewall-cmd --permanent --zone=storage --add-port=4420/tcp
sudo firewall-cmd --permanent --zone=storage --add-port=8009/tcp
sudo firewall-cmd --permanent --zone=storage --add-interface=ens1f0
sudo firewall-cmd --permanent --zone=storage --add-interface=ens1f1
# Set target to DROP (deny by default)
sudo firewall-cmd --permanent --zone=storage --set-target=DROP
# Reload
sudo firewall-cmd --reload
# Allow NVMe-TCP only from specific subnet
sudo firewall-cmd --permanent --zone=storage --add-rich-rule='
rule family="ipv4"
source address="10.100.1.0/24"
port protocol="tcp" port="4420" accept'
sudo firewall-cmd --permanent --zone=storage --add-rich-rule='
rule family="ipv4"
source address="10.100.1.0/24"
port protocol="tcp" port="8009" accept'
# Log dropped packets
sudo firewall-cmd --permanent --zone=storage --add-rich-rule='
rule family="ipv4"
log prefix="STORAGE-DROP: " level="warning"
drop'
# Reload
sudo firewall-cmd --reload
Why use tuned:
- Red Hat's system tuning daemon
- Pre-configured profiles for different workloads
- Dynamic tuning based on system state
- Easy to customize
Install and enable tuned:
sudo dnf install -y tuned tuned-utils
sudo systemctl enable --now tuned
Available profiles:
# List available profiles
sudo tuned-adm list
# Recommended profiles for storage:
# - throughput-performance: Maximum throughput
# - latency-performance: Minimum latency
# - network-latency: Network-optimized
Apply profile:
# For maximum throughput
sudo tuned-adm profile throughput-performance
# For minimum latency
sudo tuned-adm profile latency-performance
# Verify active profile
sudo tuned-adm active
Create custom profile optimized for NVMe-TCP storage:
# Create custom profile directory
sudo mkdir -p /etc/tuned/nvme-tcp-storage
# Create profile configuration
sudo tee /etc/tuned/nvme-tcp-storage/tuned.conf > /dev/null <<'EOF'
[main]
summary=Optimized for NVMe-TCP storage workloads
include=throughput-performance
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[sysctl]
# Network tuning
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.ipv4.tcp_rmem=4096 87380 67108864
net.ipv4.tcp_wmem=4096 65536 67108864
net.core.netdev_max_backlog=30000
net.core.somaxconn=4096
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=1
# VM tuning
vm.dirty_ratio=10
vm.dirty_background_ratio=5
vm.swappiness=10
# ARP cache
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=8192
net.ipv4.neigh.default.gc_thresh3=16384
# ARP settings for same-subnet multipath (CRITICAL)
# Prevents ARP responses on wrong interface when multiple NICs share same subnet
# See: Network Concepts documentation for detailed explanation
net.ipv4.conf.all.arp_ignore=2
net.ipv4.conf.default.arp_ignore=2
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.default.arp_announce=2
# Interface-specific (adjust interface names as needed)
net.ipv4.conf.ens1f0.arp_ignore=2
net.ipv4.conf.ens1f1.arp_ignore=2
net.ipv4.conf.ens1f0.arp_announce=2
net.ipv4.conf.ens1f1.arp_announce=2
[disk]
# I/O scheduler for NVMe
elevator=none
[script]
script=${i:PROFILE_DIR}/script.sh
EOF
# Create script for NIC tuning
sudo tee /etc/tuned/nvme-tcp-storage/script.sh > /dev/null <<'EOF'
#!/bin/bash
. /usr/lib/tuned/functions
start() {
# Tune storage NICs (adjust interface names)
for iface in ens1f0 ens1f1; do
if [ -d "/sys/class/net/$iface" ]; then
# Ring buffers
ethtool -G $iface rx 4096 tx 4096 2>/dev/null || true
# Interrupt coalescing
ethtool -C $iface rx-usecs 50 tx-usecs 50 2>/dev/null || true
# Offloads
ethtool -K $iface tso on gso on gro on 2>/dev/null || true
# Flow control
ethtool -A $iface rx on tx on 2>/dev/null || true
fi
done
return 0
}
stop() {
return 0
}
process $@
EOF
# Make script executable
sudo chmod +x /etc/tuned/nvme-tcp-storage/script.sh
# Apply custom profile
sudo tuned-adm profile nvme-tcp-storage
# Verify
sudo tuned-adm active
⚠️ Note: The values in this custom tuned profile are starting points for testing. Actual optimal values depend on:
- Driver/firmware limitations: Check NIC and storage driver documentation for supported buffer sizes and queue depths
- Hardware capabilities: Use ethtool -g <interface> to verify ring buffer limits
- Workload characteristics: Sequential vs. random I/O, block sizes, concurrency
Always validate with performance monitoring (iostat -x 1, sar -n DEV 1, perf, vendor telemetry) before deploying to production. Measure baseline performance first, then test changes incrementally.
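A minimal sketch for capturing such a baseline before applying the custom profile, so post-change numbers have something to compare against (output paths and the 60-second window are arbitrary choices):

```bash
# 60-second baseline of disk and network activity (requires sysstat, installed earlier)
iostat -x 1 60 > /root/baseline-iostat-$(date +%F).log &
sar -n DEV 1 60 > /root/baseline-sar-$(date +%F).log &
wait
echo "Baseline captured; re-run after 'tuned-adm profile nvme-tcp-storage' and compare"
```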
Automatic IRQ distribution:
# Install irqbalance
sudo dnf install -y irqbalance
# Configure for storage workload
sudo tee /etc/sysconfig/irqbalance > /dev/null <<EOF
IRQBALANCE_BANNED_CPUS=00000001
IRQBALANCE_ARGS="--policyscript=/usr/local/bin/irq-policy.sh"
EOF
# Enable and start
sudo systemctl enable --now irqbalance
Manual IRQ affinity (for specific control):
# Find storage NIC IRQs
grep ens1f0 /proc/interrupts | awk '{print $1}' | sed 's/://'
# Pin IRQs to specific CPUs (example: CPUs 2-5)
#!/bin/bash
INTERFACE="ens1f0"
CPU_START=2
for IRQ in $(grep $INTERFACE /proc/interrupts | awk '{print $1}' | sed 's/://'); do
MASK=$(printf "%x" $((1 << $CPU_START)))
echo $MASK > /proc/irq/$IRQ/smp_affinity
echo "IRQ $IRQ -> CPU $CPU_START (mask: $MASK)"
CPU_START=$((CPU_START + 1))
done
Check NUMA topology:
# Install numactl
sudo dnf install -y numactl
# Show NUMA topology
numactl --hardware
# Show NIC NUMA node
cat /sys/class/net/ens1f0/device/numa_node
Optimize for NUMA:
# Pin NVMe-TCP connections to NUMA node with storage NICs
# Example: Identify NUMA node for network interfaces
# Find NUMA node for your NVMe interface
cat /sys/class/net/ens1f0/device/numa_node
# Set IRQ affinity for NVMe interfaces to matching NUMA node
# See the IRQ affinity section above
Note: NVMe-TCP uses native NVMe multipathing, not dm-multipath. There is no multipathd service to tune for NVMe-TCP.
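Putting the NUMA check above to use, a minimal sketch that binds an I/O-heavy process to the NIC-local NUMA node (the fio workload is purely illustrative and not among the packages installed earlier; substitute your own application and device):

```bash
# Discover the storage NIC's NUMA node, then bind CPU and memory allocation to it
NODE=$(cat /sys/class/net/ens1f0/device/numa_node)
# A value of -1 means no NUMA locality is reported; fall back to node 0
[ "$NODE" -lt 0 ] && NODE=0
numactl --cpunodebind="$NODE" --membind="$NODE" \
    fio --name=nvme-numa-test --filename=/dev/nvme0n1 --direct=1 \
        --rw=randread --bs=4k --iodepth=32 --runtime=30 --time_based
```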
Edit GRUB configuration:
# Edit /etc/default/grub
sudo vi /etc/default/grub
# Add to GRUB_CMDLINE_LINUX:
# isolcpus=2,3,10,11 nohz_full=2,3,10,11 rcu_nocbs=2,3,10,11 intel_iommu=on iommu=pt
# Update GRUB
sudo grub2-mkconfig -o /boot/grub2/grub.cfg # BIOS
# OR
sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg # UEFI (use EFI/rocky or EFI/almalinux on Rocky/AlmaLinux)
# Reboot
sudo reboot
Parameter explanations:
- isolcpus - Isolate CPUs from scheduler (dedicate to storage I/O)
- nohz_full - Disable timer ticks on isolated CPUs
- rcu_nocbs - Offload RCU callbacks from isolated CPUs
- intel_iommu=on iommu=pt - Enable IOMMU passthrough
⚠️ Note: These are general CPU and NUMA optimizations that improve overall system performance for I/O-intensive workloads. They do not directly affect NVMe-TCP protocol behavior. Measure baseline performance before and after changes to validate impact in your environment. The nvme_core.multipath=Y parameter (if not already set) directly affects NVMe multipath behavior.
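A quick post-reboot check that the parameters above actually landed on the kernel command line (the grep patterns match the example values; adjust to your own):

```bash
# Confirm boot parameters took effect
cat /proc/cmdline
grep -Eo 'isolcpus=[^ ]+|nohz_full=[^ ]+|rcu_nocbs=[^ ]+|iommu=pt' /proc/cmdline
# Confirm native NVMe multipath is enabled
cat /sys/module/nvme_core/parameters/multipath
```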
flowchart TB
subgraph "Linux Host"
HOST[Host NQN]
NIC1[NIC 1<br/>10.100.1.101]
NIC2[NIC 2<br/>10.100.1.102]
end
subgraph "8 Redundant Paths"
P1[Path 1: NIC1→Portal1]
P2[Path 2: NIC1→Portal2]
P3[Path 3: NIC1→Portal3]
P4[Path 4: NIC1→Portal4]
P5[Path 5: NIC2→Portal1]
P6[Path 6: NIC2→Portal2]
P7[Path 7: NIC2→Portal3]
P8[Path 8: NIC2→Portal4]
end
subgraph "Storage Array - 10.100.1.0/24"
SUBSYS[NVMe Subsystem<br/>Single Namespace]
PORTAL1[Portal 1]
PORTAL2[Portal 2]
PORTAL3[Portal 3]
PORTAL4[Portal 4]
end
HOST --> NIC1
HOST --> NIC2
NIC1 --> P1 --> PORTAL1
NIC1 --> P2 --> PORTAL2
NIC1 --> P3 --> PORTAL3
NIC1 --> P4 --> PORTAL4
NIC2 --> P5 --> PORTAL1
NIC2 --> P6 --> PORTAL2
NIC2 --> P7 --> PORTAL3
NIC2 --> P8 --> PORTAL4
PORTAL1 --> SUBSYS
PORTAL2 --> SUBSYS
PORTAL3 --> SUBSYS
PORTAL4 --> SUBSYS
style SUBSYS fill:#5d6d7e,stroke:#333,stroke-width:2px,color:#fff
style HOST fill:#1e8449,stroke:#333,stroke-width:2px,color:#fff
sequenceDiagram
participant App as Application
participant NVMe as NVMe Multipath
participant Path1 as Active Path 1
participant Path2 as Standby Path 2
participant Storage as Storage Array
App->>NVMe: Write Request
NVMe->>Path1: Route via Path 1
Path1->>Storage: I/O Operation
Storage->>Path1: Success
Note over Path1: Path 1 Fails
App->>NVMe: Write Request
NVMe->>Path1: Attempt Path 1
Path1--xNVMe: Timeout/Error
NVMe->>Path2: Automatic Failover
Path2->>Storage: I/O Operation
Storage->>Path2: Success
NVMe->>App: Success
Note over Path1: Path 1 Recovers
Path1->>NVMe: Path Available
NVMe->>NVMe: Rebalance I/O
{% include diagrams-nvme-multipath.md %}
{% include diagrams-failover-nvme.md %}
NVMe-TCP uses native NVMe multipathing built into the Linux kernel. This is NOT dm-multipath (multipath.conf, multipathd) - those are for iSCSI/Fibre Channel only.
Enable Native NVMe Multipath:
# Enable native NVMe multipathing
echo 'options nvme_core multipath=Y' | sudo tee /etc/modprobe.d/nvme-tcp.conf
# Reboot to apply (required if nvme_core already loaded)
sudo reboot
Configure IO Policy for HA:
# Create udev rule for NVMe IO policy
sudo tee /etc/udev/rules.d/99-nvme-iopolicy.rules > /dev/null <<'EOF'
# Set IO policy to queue-depth for all NVMe subsystems (recommended for HA)
ACTION=="add|change", SUBSYSTEM=="nvme-subsystem", ATTR{iopolicy}="queue-depth"
EOF
# Reload udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger
Configure NVMe Connection Timeouts for HA:
# When connecting, use appropriate timeout values
# ctrl-loss-tmo: Time to wait before declaring controller lost (seconds)
# reconnect-delay: Delay between reconnection attempts (seconds)
# Example: Conservative HA settings
nvme connect -t tcp -a <IP> -s 4420 -n <NQN> \
--ctrl-loss-tmo=1800 \
--reconnect-delay=10
# For faster failover (may cause more transient errors):
nvme connect -t tcp -a <IP> -s 4420 -n <NQN> \
--ctrl-loss-tmo=600 \
--reconnect-delay=5
Verify Native Multipath Status:
# Check multipath is enabled
cat /sys/module/nvme_core/parameters/multipath
# Should show: Y
# View all paths per subsystem
sudo nvme list-subsys
# Check IO policy
cat /sys/class/nvme-subsystem/nvme-subsys*/iopolicy
Ensure proper boot order:
# Create drop-in for services that depend on NVMe storage
sudo mkdir -p /etc/systemd/system/libvirtd.service.d
sudo tee /etc/systemd/system/libvirtd.service.d/storage.conf > /dev/null <<EOF
[Unit]
After=nvmf-autoconnect.service
Wants=nvmf-autoconnect.service
EOF
# Reload systemd
sudo systemctl daemon-reload
Set up monitoring with systemd:
# Create monitoring script for native NVMe multipath
sudo tee /usr/local/bin/check-nvme-paths.sh > /dev/null <<'EOF'
#!/bin/bash
# Check native NVMe multipath status (NOT dm-multipath)
# Count connections that are NOT in 'live' state
FAILED=$(nvme list-subsys 2>/dev/null | grep -c -E "connecting|deleting")
if [ $FAILED -gt 0 ]; then
echo "WARNING: $FAILED NVMe paths not in live state"
nvme list-subsys
exit 1
fi
# Check connection count
EXPECTED_CONNECTIONS=8
ACTUAL=$(nvme list-subsys 2>/dev/null | grep -c "live")
if [ $ACTUAL -lt $EXPECTED_CONNECTIONS ]; then
echo "WARNING: Only $ACTUAL of $EXPECTED_CONNECTIONS NVMe connections active"
nvme list-subsys
exit 1
fi
echo "OK: All NVMe storage paths healthy"
exit 0
EOF
sudo chmod +x /usr/local/bin/check-nvme-paths.sh
# Create systemd timer
sudo tee /etc/systemd/system/check-nvme-paths.service > /dev/null <<EOF
[Unit]
Description=Check NVMe-TCP path health
[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-nvme-paths.sh
StandardOutput=journal
EOF
sudo tee /etc/systemd/system/check-nvme-paths.timer > /dev/null <<EOF
[Unit]
Description=Check NVMe-TCP paths every 5 minutes
[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
[Install]
WantedBy=timers.target
EOF
# Enable timer
sudo systemctl enable --now check-nvme-paths.timer
{% include monitoring-maintenance-nvme.md %}
Using Cockpit:
# Install Cockpit
sudo dnf install -y cockpit cockpit-storaged
# Enable and start
sudo systemctl enable --now cockpit.socket
# Access via browser: https://<host>:9090
Using Performance Co-Pilot (PCP):
# Install PCP
sudo dnf install -y pcp pcp-system-tools
# Enable and start
sudo systemctl enable --now pmcd pmlogger
# Monitor storage performance
pmrep disk.dev.read disk.dev.write disk.dev.avactive
# Monitor network
pmrep network.interface.in.bytes network.interface.out.bytes
Insights Integration (RHEL only):
# Install Red Hat Insights client
sudo dnf install -y insights-client
# Register
sudo insights-client --register
# Run analysis
sudo insights-client
{% include security-best-practices-nvme.md %}
FIPS Mode (for compliance):
# Enable FIPS mode
sudo fips-mode-setup --enable
# Verify
fips-mode-setup --check
# Reboot required
sudo reboot
Audit Rules for Storage:
# Install auditd
sudo dnf install -y audit
# Add rules for storage access
sudo tee -a /etc/audit/rules.d/storage.rules > /dev/null <<EOF
# Monitor NVMe device access
-w /dev/nvme0n1 -p rwa -k nvme_access
# Monitor NVMe configuration changes
-w /etc/nvme/ -p wa -k nvme_config
-w /etc/modprobe.d/nvme-tcp.conf -p wa -k nvme_multipath_config
-w /etc/udev/rules.d/99-nvme-iopolicy.rules -p wa -k nvme_iopolicy_config
EOF
# Reload rules
sudo augenrules --load
# Enable and start auditd
sudo systemctl enable --now auditd
graph TD
START[NVMe Issue Detected]
START --> CHECK_PATHS{Are paths live?<br/>nvme list-subsys}
CHECK_PATHS -->|No paths| CHECK_NET[Check Network<br/>Connectivity]
CHECK_NET --> PING{Can ping<br/>storage portals?}
PING -->|No| FIX_NET[Fix Network:<br/>- Check cables<br/>- Check switch config<br/>- Check interface IP]
PING -->|Yes| CHECK_MOD{NVMe modules<br/>loaded?}
CHECK_MOD -->|No| LOAD_MOD[Load modules:<br/>modprobe nvme-tcp]
CHECK_MOD -->|Yes| CHECK_NQN[Verify Host NQN<br/>registered on array]
CHECK_PATHS -->|Some paths| CHECK_PARTIAL[Check Failed Paths:<br/>- Interface down?<br/>- Portal unreachable?]
CHECK_PATHS -->|All live| CHECK_PERF{Performance<br/>Issue?}
CHECK_PERF -->|Yes| CHECK_MTU[Verify MTU 9000<br/>end-to-end]
CHECK_MTU --> CHECK_POLICY[Check IO Policy:<br/>queue-depth recommended]
CHECK_POLICY --> CHECK_LOAD[Check Network<br/>Utilization]
CHECK_PERF -->|No| CHECK_PERSIST{Persistence<br/>Issue?}
CHECK_PERSIST -->|Yes| CHECK_SERVICE[Check nvmf-autoconnect<br/>service enabled]
CHECK_SERVICE --> CHECK_CONFIG[Verify discovery.conf]
FIX_NET --> RECONNECT[Reconnect:<br/>nvme connect-all]
LOAD_MOD --> RECONNECT
CHECK_NQN --> RECONNECT
CHECK_PARTIAL --> RECONNECT
CHECK_LOAD --> TUNE[Tune Performance]
CHECK_CONFIG --> ENABLE[Enable service]
RECONNECT --> VERIFY[Verify:<br/>nvme list-subsys]
TUNE --> VERIFY
ENABLE --> VERIFY
style START fill:#5d6d7e,stroke:#333,stroke-width:2px,color:#fff
style VERIFY fill:#1e8449,stroke:#333,stroke-width:2px,color:#fff
{% include diagrams-troubleshooting-nvme.md %}
{% include troubleshooting-common-nvme.md %}
Issue: NetworkManager conflicts with manual configuration
# Disable NetworkManager for specific interface
sudo nmcli device set ens1f0 managed no
# Or configure via NetworkManager instead (recommended)
Issue: Firewalld blocking connections
# Temporarily disable for testing
sudo systemctl stop firewalld
# Test connection
sudo nvme connect -t tcp -a <portal_ip> -s 4420 -n <nqn>
# If works, add proper firewall rules
sudo firewall-cmd --permanent --add-port=4420/tcp
sudo firewall-cmd --reload
# Re-enable firewall
sudo systemctl start firewalld
Issue: Subscription issues (RHEL)
# Check subscription status
sudo subscription-manager status
# Refresh subscriptions
sudo subscription-manager refresh
# Re-attach if needed
sudo subscription-manager attach --auto
- RHEL 9 Storage Administration Guide
- RHEL 9 Security Hardening
- Quick Start Guide
- [Network Concepts]({{ site.baseurl }}/common/network-concepts.html)
- [Multipath Concepts]({{ site.baseurl }}/common/multipath-concepts.html)
- [Performance Tuning]({{ site.baseurl }}/common/performance-tuning.html)
Daily:
- Check NVMe path status: sudo nvme list-subsys
- Check IO policy: cat /sys/class/nvme-subsystem/nvme-subsys*/iopolicy
- Review system logs: sudo journalctl -p err --since today
- Check firewall logs: sudo journalctl -u firewalld --since today
(A combined script covering these daily checks is sketched at the end of this guide.)
Weekly:
- Review SELinux denials: sudo ausearch -m avc -ts this-week
- Check for updates: sudo dnf check-update
- Review performance metrics: sudo pmrep disk.dev.avactive
- Verify backup completion
Monthly:
- Apply security updates: sudo dnf update --security
- Review tuned profile: sudo tuned-adm verify
- Check subscription status: sudo subscription-manager status
- Backup configurations
- Review Insights recommendations: sudo insights-client
Quarterly:
- Test failover procedures
- Review and update firewall rules
- Audit SELinux policies
- Capacity planning review
- Update documentation