- You can check the detailed version at [PackageCloud's article](https://blog.packagecloud.io/illustrated-guide-monitoring-tuning-linux-networking-stack-receiving-data).

<details>
<summary>Click to expand</summary>
- In network devices, it is common for the NIC to raise an **IRQ** to signal that a packet has arrived and is ready to be processed.
- An IRQ (Interrupt Request) is a hardware signal sent to the processor instructing it to suspend its current activity and handle some external event, such as keyboard input or mouse movement.
- In Linux, IRQ mappings are stored in **/proc/interrupts** (see the inspection commands after this list).
- When an IRQ handler is executed by the Linux kernel, it runs at a very high priority and often blocks additional IRQs from being generated. As such, IRQ handlers in device drivers must execute as quickly as possible and defer all long-running work to execute outside of this context. This is why the **softIRQ** system exists.
- The **softIRQ** system is what the kernel uses to process work outside of the device driver IRQ context. In the case of network devices, the softIRQ system is responsible for processing incoming packets.
- Initial setup (steps 1-4):

![](https://cdn.buttercms.com/hwT5dgTatRdfG7UshrAF)
- softIRQ kernel threads are created (1 per CPU).
- The ksoftirqd threads begin executing their processing loops.
- `softnet_data` structures are created (1 per CPU); they hold references to important data for processing network data. A `poll_list` is also created (1 per CPU).
- `net_dev_init` then registers the `NET_RX_SOFTIRQ` softirq with the softirq system by calling `open_softirq`; the handler registered for this softirq is `net_rx_action`.

- Alright, Linux has just initialized and set up the networking stack to wait for data to arrive:

![](https://cdn.buttercms.com/yharphBYTEm2Kt4G2fT9)
- Data is received by the NIC (Network Interface Card) from the network.
- The NIC uses DMA (Direct Memory Access) to write the network data to RAM (into a ring buffer).
- Some NICs are "multiqueue" NICs, meaning that they can DMA incoming packets to one of many ring buffers in RAM (see the `ethtool` commands after this list).
- The NIC raises an IRQ.
- The device driver's registered IRQ handler is executed.
- The IRQ is cleared on the NIC, so that it can generate IRQs for new packet arrivals.
- The NAPI softIRQ poll loop is started with a call to `napi_schedule`.

- Check the initial setup diagram above (steps 5-8):
- The call to `napi_schedule` in the driver adds the driver's NAPI poll structure to the `poll_list` for the current CPU.
- The softirq pending bit is set so that the `ksoftirqd` process on this CPU knows that there are packets to process.
- The `run_ksoftirqd` function (which is run in a loop by the `ksoftirqd` kernel thread) executes.
- `__do_softirq` is called, which checks the pending bitfield, sees that a softIRQ is pending, and calls the handler registered for the pending softIRQ: `net_rx_action` (the softIRQ kernel thread executes this, not the driver IRQ handler).
- Now, data processing begins:
- The `net_rx_action` loop starts by checking the NAPI poll list for NAPI structures.
- The `budget` and elapsed time are checked to ensure that the softIRQ will not monopolize CPU time.
- The registered `poll` function is called.
- The driver's `poll` function harvests packets from the ring buffer in RAM.
- Packets are handed over to `napi_gro_receive` (GRO - Generic Receive Offload).
- GRO is a widely used software-based offloading technique that reduces per-packet processing overhead.
- By reassembling small packets into larger ones, GRO lets the rest of the stack process fewer, larger packets, thus reducing the number of packets to be processed.
- Packets are either held for GRO (and the call chain ends here) or passed on to `netif_receive_skb` to proceed up toward the protocol stacks.
- Network data processing continues from `netif_receive_skb`, but the path of the data depends on whether Receive Packet Steering (RPS) is enabled (see the RPS configuration commands after this list).

![](https://cdn.buttercms.com/uoaSO7cgTwKaH1esQgWX)
- If RPS is disabled:
- 1. `netif_receive_skb` passes the data on to `__netif_receive_skb_core`.
- 6. `__netif_receive_skb_core` delivers the data to any taps.
- 7. `__netif_receive_skb_core` delivers data to registered protocol layer handlers.
- If RPS is enabled:
- 1. `netif_receive_skb` passes the data on to `enqueue_to_backlog`.
- 2. Packets are placed on a per-CPU input queue for processing.
- 3. The remote CPU's NAPI structure is added to that CPU's `poll_list` and an IPI is queued, which will trigger the softIRQ kernel thread on the remote CPU to wake up if it is not running already.
- 4. When the `ksoftirqd` kernel thread on the remote CPU runs, it follows the same pattern described in the previous section, but this time the registered poll function is `process_backlog`, which harvests packets from the current CPU's input queue.
- 5. Packets are passed on toward `__netif_receive_skb_core`.
- 6. `__netif_receive_skb_core` delivers data to any taps (like PCAP).
- 7. `__netif_receive_skb_core` delivers data to registered protocol layer handlers.

- Protocol stacks, netfilter, BPF, and finally the userland socket:
- Packets are received by the IPv4 protocol layer with `ip_rcv`.
- Netfilter and a routing optimization are performed.
- Data destined for the current system is delivered to higher-level protocol layers, like UDP.
- Packets are received by the UDP protocol layer with `udp_rcv` and are queued to the receive buffer of a userland socket by `udp_queue_rcv_skb` and `sock_queue_rcv_skb`. Before queuing to the receive buffer, socket filters (BPF) are processed (see the socket-buffer commands at the end of this section).

</details>
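
To see the IRQ/softIRQ side of this on a live system, a few quick checks (a minimal sketch; `eth0` is only a placeholder interface name, substitute your own NIC):

```bash
# Which IRQs belong to the NIC and which CPUs are servicing them:
grep eth0 /proc/interrupts

# How many NET_RX softIRQs each CPU has processed:
grep NET_RX /proc/softirqs

# Per-CPU softnet statistics (packets processed, drops, and
# "time squeeze" events where net_rx_action ran out of budget/time):
cat /proc/net/softnet_stat

# The budget net_rx_action works with:
sysctl net.core.netdev_budget
```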
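The ring buffers, multiqueue layout, and GRO mentioned above can be inspected with `ethtool` (again, `eth0` is a placeholder):

```bash
# Ring buffer sizes: current settings vs. hardware maximums
ethtool -g eth0

# Number of RX/TX queues on a multiqueue NIC
ethtool -l eth0

# Whether GRO is currently enabled
ethtool -k eth0 | grep generic-receive-offload
```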
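RPS itself is controlled per RX queue through sysfs with a hexadecimal CPU bitmask; all zeros means RPS is disabled for that queue. A minimal sketch, assuming interface `eth0`, queue `rx-0`, and an example mask:

```bash
# Show the current RPS CPU mask for the first RX queue
cat /sys/class/net/eth0/queues/rx-0/rps_cpus

# Example only: allow CPUs 0-3 (mask 0xf) to process packets steered from rx-0
echo f | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```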

![](https://raw.githubusercontent.com/ntk148v/til/master/linux/images/linux-networking-recv.png)
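
To check whether packets survive the final step (queuing into the userland socket's receive buffer), the UDP counters and the kernel's receive-buffer limits are a good starting point (a minimal sketch, not a tuning recipe):

```bash
# UDP statistics, including receive errors caused by full socket buffers
netstat -su

# Default and maximum socket receive buffer sizes, in bytes
sysctl net.core.rmem_default net.core.rmem_max
```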
