Skip to content

Various fixes for working rtwn #2385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 2, 2025
Merged

Various fixes for working rtwn #2385

merged 12 commits into from
Apr 2, 2025

Conversation

jrtc27
Copy link
Member

@jrtc27 jrtc27 commented Apr 1, 2025

  • rtwn: ensure TX work isn't scheduled during reset / abort
  • rtwn: change the USB TX transfers to only do one pending transfer per endpoint
  • rtwn: enable FCS in the recive config to work around truncated frames
  • rtwn: don't do 64 bit TSF extension by default
  • rtwn: make sure RCR_APPFCS stays set in monitor mode / mode changes.
  • net80211: crypto: ccmp: fix more hardware offload bits
  • net80211: validate control frame TA/RA before further processing
  • rtwn: enable reception of BAR frames
  • rtwn: Disable IP/TCP/UDP checksum offloading
  • rtwn: Add missing subobject bounds annotations
  • net80211: Don't bound structs that extend radiotap headers
  • net80211: Fix subobject bounds issue in IEEE80211_IOC_CHANINFO

Adrian Chadd and others added 12 commits April 1, 2025 22:15
Don't schedule work during reset / abort.  For USB NICs, work
must not be scheduled during a call to rtwn_usb_abort_xfers(),
as then it'll cause the call to usbd_transfer_drain() to hang.

This fixes a hang I've been seeing where the NIC hits a TX timeout
and then the reset/re-init path is called.  If data is scheduled
to be transmitted in that window, the call to usbd_transfer_drain()
would hang and require a hard reboot to recover.

Differential Revision: https://reviews.freebsd.org/D47479

(cherry picked from commit 8838f3c32ac0ebcb8b20863f8c455375039a505e)
… endpoint

I found I was getting constant device timeouts when doing anything
more complicated than a single SSH on laptop with RTL8811AU.

After digging into it, i found a variety of fun situations, including
traffic stalls that would recover w/ a shorter (1 second) USB transfer
timeout.  However, the big one is a straight up hang of any TX endpoint
until the NIC was reset.  The RX side kept going just fine; only the
TX endpoints would hang.

Reproducing it was easy - just start up a couple of traffic streams
on different WME AC's - eg a best effort + bulk transfer, like
browsing the web and doing an ssh clone - throw in a ping -i 0.1
to your gateway, and it would very quickly hit device timeouts every
couple of seconds.

I put everything into a single TX EP and the hangs went away.
Well, mostly.

So after some MORE digging, I found that this driver isn't checking
if the transfers are going into the correct EPs for the packet
WME access category / 802.11 TID; and would frequently be able
to schedule multiple transfers into the same endpoint.

Then there's a second problem - there's an array of endpoints
used for setting up the USB device, with .endpoint = UE_ADDR_ANY,
however they're also being setup with the same endpoint configured
in multiple transfer configs.  Eg, a NIC with 3 or 4 bulk TX endpoints
will configure the BK and BE endpoints with the same physical endpoint
ID.  This also leads to timed out transfers.

My /guess/ was that the firmware isn't happy with one or both of the
above, and so I solved both.

* drop the USB transfer timeout to 1 second, not 5 seconds -
  that way we'll either get a 1 second traffic pause and USB transfer
  failure, or a 5 second device timeout.  Having both the TX timeout
  and the USB transfer timeout made recovery from a USB transfer
  timeout (without a NIC reset) almost impossible.

* enforce one transfer per endpoint;
* separate pending/active buffer tracking per endpoint;
* each endpoint now has its own TX callback to make sure the queue /
  end point ID is known;
* and only frames from a given endpoint pending queue is going
  into the active queue and into that endpoint.
* Finally, create a local wme2qid array and populate it with the
  endpoint mapping that ensures unique physical endpoint use.

Locally tested:

* rtl8812AU, 11n STA mode
* rtl8192EU, 11n STA mode (with diffs to fix the channel config / power
  timeouts.)

Differential Revision: https://reviews.freebsd.org/D47522

(cherry picked from commit d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b)
I noticed that on RTL8812AU/RTL8821AU receiving VHT frames that
I'd occasionally see frames missing the last 4 bytes.  I can
easily reproduce it with a ping sweep and fast (10ms) between frames.

There's also a report of an earlier NIC (RTL8188EU) doing the same
thing with HT frames but not with OFDM (11g) frames.

After a bunch of poking, it turns out a driver where things DID work
properly for the other report kept FCS enabled, and trimmed it from
the frame before pushing it up to the network layer.

I did the same and it also worked fine.

The other solution was to disable PHYSTATUS notifications, but then
we'd get no per packet RX notifications (RX rate, RSSI, etc.)

Locally tested:

* RTL8192EU, STA mode (HT)
* RTL8812AU, STA mode (HT, VHT)
* RTL8821AU, STA mode (HT, VHT)

Differential Revision:	https://reviews.freebsd.org/D47775

(cherry picked from commit d76247e801dec95600a068fa1bb09f4f57e00031)
The TSF64 extension involves at least 3 reads from TSF registers
(R92C_TSFTR(0), R92C_TSFTR(1), R92C_TSFTR(2)) which are 4 byte
control transfers.  They take up valuable USB link time.

It's likely much less expensive for PCIe adapters.  At some point
it may be worthwhile enabling it by default just for those.

With this disabled, the only USB traffic that I see during
normal data operation are bulk TX/RX data transfers for 802.11
packets, and on NICs w/ net80211 rate control, the control register
space read/writes for TX completion.  (And that will also need
addressing.)

This is the difference between 15mbit TCP RX and 30mbit TCP RX
on the 11n NICs, and around 40 to 50mbit TCP RX on the 11ac NICs
in HT40 and VHT80.

Locally tested:

* RTL8188EU, STA mode
* RTL8192CU, STA mode
* RTL8192EU, STA mode
* RTL8811AU, STA mode
* RTL8821AU, STA mode

Differential Revision:	https://reviews.freebsd.org/D47861

(cherry picked from commit fcb5e8d0c19ac21515ab3047d39a76b32d835cec)
My previous commit meant that APPFCS wasn't enabled during monitor
mode and likely other corner cases.

Ensure it stays on at all times.

This, amusingly, fixes monitor mode in RTL8812AU/RTL8821AU - without it,
I don't see HT/VHT frames in monitor mode but I can still receive them
in normal STA mode.

Differential Revision:	https://reviews.freebsd.org/D48112

(cherry picked from commit 791170aaf7efb4e053ccbf537d80a43e8a81d1e4)
Add the missing IEEE80211_RX_F_DECRYPTED and IEEE80211_RX_F_MMIC_STRIP
(really just MIC_STRIP) checks to make hwaccel offload work.
This makes rtw8x drivers pass RX packets again at least with LinuxKPI
if HW_CRYPTO support is enabled.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D49030

(cherry picked from commit 3afc0bfecb1a927c37672dc245688c575e4d9ec4)
Summary:
An earlier commit relaxed the TA/RA rules around control frames
to fix other issues, however it now results in control frames
not specifically from a known node / to us to be handled in the control
path.

Specifically, rtwn(4) RTL8812/RTL8821 NICs are currently passing BARs
from the AP TA to any destination to us; which is tripping up BAW
tracking and causing traffic hangs.

Reviewers: #wireless, jrtc27, bz

Subscribers: bz, imp, melifaro, glebius

Tags: #wireless

Differential Revision: https://reviews.freebsd.org/D49575
Summary:
The RX filter wasn't enabling BAR frames, so we weren't receiving them
during normal operation.

Jessica noticed we WERE getting BAR frames, but only when promisc mode
is active.  Which is a different set of bugs, but it did highlight
the differences here.

PR: kern/285822

Reviewers: #wireless, jrtc27

Subscribers: imp

Tags: #wireless

Differential Revision: https://reviews.freebsd.org/D49596
I have consistently seen false-positive checksum failures for UDP/IP
packets with a missing, i.e. zero, checksum (specifically, for packets
from the UniOfCam-IoT network's DHCP server), but valid checksums
reported for UDP/IP packets with a non-zero checksum. Therefore I
believe that the hardware does not correctly handle the optional nature
of UDP/IP's checksums, so ignore the result and let the network stack
validate the checksum itself.

Neither Linux nor OpenBSD seems to use this feature that I can see.
Whilst there is the old Linux vendor driver that defines macros for
these specific fields, it too does not check them.
65fdef2 ("rtwn: Add subobject bounds annotations for softc
subclasses.") caught some of these but not all. This allows rtwn to
attach to a TP-Link Archer T2U Nano (RTL8821AU).
These are intended to be embedded, not used standalone, with a pointer
to the inner ieee80211_radiotap_header member passed to net80211
functions that is then used to walk off the end of the header into the
larger containing struct. Whilst ieee80211_radiotap_vendor_header
doesn't appear to be problematic today, it's still intended to be
embedded as a header, and it seems likely that a similar access pattern
(rather than always being derived from a ieee80211_radiotap_header *)
could appear, so keep it consistent with the non-vendor one.
We fault on the kernel access into the array after ic_nchans here which
manifests as EFAULT returned back to userspace.
@brooksdavis brooksdavis merged commit 2339ee8 into dev Apr 2, 2025
29 checks passed
@brooksdavis brooksdavis deleted the rtwn branch April 2, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants