Skip to content

Commit d3c5628

Browse files
charsyamnamjaejeon
authored andcommitted
ksmbd: centralize ksmbd_conn final release to plug transport leak
ksmbd_conn_free() is one of four sites that can observe the last refcount drop of a struct ksmbd_conn. The other three fs/smb/server/connection.c ksmbd_conn_r_count_dec() fs/smb/server/oplock.c __free_opinfo() fs/smb/server/vfs_cache.c session_fd_check() end the conn with a bare kfree(), skipping ida_destroy(&conn->async_ida) and conn->transport->ops->free_transport(conn->transport). Whenever one of them is the last putter, the embedded async_ida and the entire transport struct leak -- for TCP, that is also the struct socket and the kvec iov. __free_opinfo() being a final putter is not theoretical. opinfo_put() queues the callback via call_rcu(&opinfo->rcu, free_opinfo_rcu), so ksmbd_server_terminate_conn() can deposit N opinfo releases in RCU and have ksmbd_conn_free() run in the handler thread before any of them fire. ksmbd_conn_free() then observes refcnt > 0 and short-circuits; the last RCU-delivered __free_opinfo() falls onto its bare kfree(conn) branch and the transport is lost. Reproducer (QEMU/virtme guest, ksmbd server and CIFS client in the same guest, mounting //127.0.0.1/testshare): each iteration holds 8 files open via sleep processes, force-closes TCP with `ss -K sport = :445`, kills the holders, lazy-umounts; repeated 10 times, then ksmbd shutdown and kmemleak scan. Kprobes: ksmbd_conn_alloc, ksmbd_conn_free, ksmbd_tcp_free_transport, free_opinfo_rcu. (ksmbd_tcp_new_connection did not register stably so ksmbd_conn_alloc is the lifecycle anchor.) A/B validation, same image, varying only ksmbd.ko. Pre-patch is HEAD with only patch 1 of this series applied: state conn_alloc conn_free tcp_free opi_rcu kmemleak ---------- ---------- --------- -------- ------- -------- pre-patch 20 20 10 160 7 with patch 20 20 20 160 0 Pre-patch conn_free=20 with tcp_free=10 directly demonstrates the bare-kfree paths skipping transport cleanup, and kmemleak reports 7 unreferenced allocations whose backtraces point into the ksmbd text (struct tcp_transport, t->iov kvec). With this patch tcp_free matches conn_free at 20/20 and kmemleak is clean across two independent post-patch runs. opi_rcu=160 confirms the RCU opinfo release path that motivates the fix is exercised. The transport leak fix is established by this A/B comparison. Move the per-struct final release into __ksmbd_conn_release_work() and route the three bare-kfree final-put sites through a new ksmbd_conn_put(). Those sites now pair ida_destroy() and free_transport() with kfree(conn) regardless of which holder happens to release the last reference. The stop_sessions() path keeps patch 1's local pin-drop cleanup open-coded instead of routing that temporary reference through ksmbd_conn_put(), so this patch does not layer the conn-put conversion on top of the stop_sessions() iteration rewrite and the two fixes remain independently reviewable and revertible. Applying this patch alone still leaves stop_sessions() on its own local cleanup; patch 1 fixes that site separately. The centralized release reaches sock_release() -> tcp_close() -> lock_sock_nested() from every final putter. lock_sock_nested() has might_sleep(), and __free_opinfo() can be released from an RCU callback (free_opinfo_rcu(), softirq on default kernels). Calling the final release directly from that path trips CONFIG_DEBUG_ATOMIC_SLEEP: BUG: sleeping function called from invalid context at net/core/sock.c:3785 in_atomic(): 1, irqs_disabled(): 0, non_block: 0 Call Trace: <IRQ> lock_sock_nested+0x43/0xa0 tcp_close+0x19/0xa0 inet_release+0x44/0x90 sock_release+0x25/0x90 ksmbd_tcp_free_transport+0x16/0x40 [ksmbd] __ksmbd_conn_release_work+0x... [ksmbd] ksmbd_conn_put+0x... [ksmbd] free_opinfo_rcu+0x... [ksmbd] rcu_do_batch+0x1e5/0x5c0 rcu_core+0x395/0x4d0 Handle this once inside ksmbd_conn_put() by making the final release unconditional through a dedicated ksmbd_conn_wq workqueue. When the refcount reaches zero, ksmbd_conn_put() queues the pre-initialized release_work onto the workqueue and returns; the work handler runs the sleep-allowed teardown (ida_destroy, free_transport -> sock_release, kfree) in process context. That makes ksmbd_conn_put() safe to call from RCU callbacks and other non-sleeping putter contexts without each call site needing its own bounce. Moving the final release onto a workqueue is not by itself enough on the session_fd_check() path: __close_file_table_ids() holds write_lock(&ft->lock) across the skip callback, and session_fd_check() already sleeps in ksmbd_vfs_copy_durable_owner() -> kstrdup(GFP_KERNEL) and down_write(&fp->f_ci->m_lock) (a rw_semaphore) before it ever reaches ksmbd_conn_put(). These sleeps pre-date this patch but would equally trip CONFIG_DEBUG_ATOMIC_SLEEP on a durable-fd workload. Refactor __close_file_table_ids() to take a transient reference on fp and unpublish fp from the session idr *under ft->lock* before calling skip() outside the lock. A transient ref protects lifetime but not concurrent field mutation, so the idr_remove() is what keeps __ksmbd_lookup_fd() through this session's idr from granting a new ksmbd_fp_get() reference to an fp whose fp->conn / fp->tcon / fp->volatile_id / op->conn / lock_list links are about to be rewritten by session_fd_check(). The same unpublished transition also clears fp->volatile_id under ft->lock, preventing any later final close of the same fp from removing a reused idr slot. Durable reconnect is unaffected because it reaches fp through the global durable table (ksmbd_lookup_durable_fd -> global_ft). After skip() returns, the preserve path drops the transient with atomic_dec(): fp keeps the original +1 refcount that used to represent the session idr entry so the durable scavenger can later expire it once the timeout elapses. The close path transitions f_state to FP_CLOSED under ft->lock (matching ksmbd_close_fd()) so ksmbd_fp_get() lookups via any remaining path fail, then removes fp from m_fp_list before dropping both the transient and the original session-idr ref via atomic_sub_and_test(2). The list removal cannot be left for a deferred final putter because fp->volatile_id has already been cleared and __ksmbd_remove_fd() will intentionally skip both idr_remove() and list_del_init(). The subtraction cannot underflow because no concurrent close path can consume the session-idr ref after the idr_remove() above. If the subtraction hits zero we finalize fp ourselves, otherwise the remaining user's ksmbd_fd_put() finalizes via __put_fd_final() -> __ksmbd_close_fd(). The __close_file_table_ids() refactor is exercised separately on a debug kernel additionally built with CONFIG_DEBUG_LIST and CONFIG_DEBUG_OBJECTS_WORK using a same-session two-tcon workload: one tcon drives an open/write storm while the other tcon repeats 50 tree disconnects on the same session. Trace counts: 52 __close_file_table_ids invocations, 4793 __ksmbd_close_fd calls, 30337 __put_fd_final, 9578 ksmbd_conn_put decrements, 1 __ksmbd_conn_release_work execution. The workload exercises the idr_remove() / fp->volatile_id clear / m_fp_list unlink coupling under concurrent fp allocation in the same session table. This run validates the file-table/id/list rewrite under DEBUG_LIST/DEBUG_OBJECTS_WORK; it does not re-prove the transport leak fix, which the abrupt-disconnect A/B above already covered. No list-corruption, work_struct ODEBUG, sleep-in-atomic, lockdep or kmemleak reports were observed. The deferred-final-putter branch in __close_file_table_ids() (atomic_sub_and_test(2) returning false) is covered by analysis, not by a dedicated counter in this run: the trace points used above cannot distinguish a deferred-putter __ksmbd_close_fd from a normal SMB2_CLOSE __ksmbd_close_fd. __close_file_table_ids() unconditionally clears fp->volatile_id and unlinks fp from m_fp_list before atomic_sub_and_test(2), so __ksmbd_remove_fd() invoked from a later __put_fd_final() correctly skips both idr_remove() and list_del_init(). At module exit, the workqueue is flushed and destroyed after rcu_barrier(), so any release queued by a trailing RCU callback is drained before the inode hash and the module text go away. Verified by kprobe tracing that all 20 __ksmbd_conn_release_work() executions complete before ksmbd_conn_wq_destroy() enters, with ksmbd_tcp_free_transport() matching ksmbd_conn_alloc() at 20/20. The ida_destroy() previously added to ksmbd_conn_free()'s refcount branch is folded into __ksmbd_conn_release_work() so it runs from whichever site turns out to be the last putter. The conversion also closes a pre-existing ksmbd_conn lifetime gap: fp used to store a borrowed pointer to its connection (fp->conn = work->conn) without taking a reference, so nothing stopped the conn from being freed while an fp still held a stale fp->conn. Teach fp to own a strong reference on fp->conn for as long as fp->conn is non-NULL: * ksmbd_open_fd() and ksmbd_reopen_durable_fd() bump conn->refcnt when assigning fp->conn (matching put on the ksmbd_open_fd() error path and on the ksmbd_reopen_durable_fd() __open_id() failure path). Both now set fp->conn and fp->tcon before __open_id() publishes fp into the session's file table, so a concurrent teardown that iterates the table via idr cannot observe a valid volatile_id with fp->conn still NULL and preserve a partially-initialized fp. * session_fd_check() (durable preserve) and __ksmbd_close_fd() (fp destroy) release the owned reference via ksmbd_conn_put() and clear fp->conn. With that invariant in place, session_fd_check() needs no local pin across the op->conn puts -- fp's own reference keeps conn alive for the entire body of the function, including the subsequent conn->llist_lock access. The NULL-guard at the top of session_fd_check() stays: a durable reconnect that has already been through cleanup once leaves fp->conn cleared, and the lock_list loop would otherwise dereference NULL. The kernel under test was built with CONFIG_DEBUG_KMEMLEAK, CONFIG_PROVE_LOCKING, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_DEBUG_OBJECTS and CONFIG_FAILSLAB, plus CONFIG_DEBUG_LIST and CONFIG_DEBUG_OBJECTS_WORK for the two-tcon stress run. The __close_file_table_ids() refactor was also exercised under the same debug kernel with a local test harness that forces is_reconnectable() to return true so session_fd_check() reaches the ksmbd_vfs_copy_durable_owner()/down_write(&ci->m_lock) path for every fp (Linux cifs.ko does not request durable handles in the generic mount path so the slow path is otherwise not covered). With the refactor applied the harness traverses the full sleep path and CONFIG_DEBUG_ATOMIC_SLEEP / CONFIG_PROVE_LOCKING stay silent. Reverting only the __close_file_table_ids() hunk while keeping the harness produces: BUG: sleeping function called from invalid context at vfs_cache.c:1095 __might_sleep+0x49/0x60 [ BUG: Invalid wait context ] which confirms that the refactor is what keeps ft->lock out of the sleepable skip() body. Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
1 parent de46ee6 commit d3c5628

5 files changed

Lines changed: 248 additions & 38 deletions

File tree

connection.c

Lines changed: 59 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,62 @@ static int create_proc_clients(void) { return 0; }
8080
static void delete_proc_clients(void) {}
8181
#endif
8282

83+
static struct workqueue_struct *ksmbd_conn_wq;
84+
85+
int ksmbd_conn_wq_init(void)
86+
{
87+
ksmbd_conn_wq = alloc_workqueue("ksmbd-conn-release",
88+
WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
89+
if (!ksmbd_conn_wq)
90+
return -ENOMEM;
91+
return 0;
92+
}
93+
94+
void ksmbd_conn_wq_destroy(void)
95+
{
96+
if (ksmbd_conn_wq) {
97+
destroy_workqueue(ksmbd_conn_wq);
98+
ksmbd_conn_wq = NULL;
99+
}
100+
}
101+
102+
/*
103+
* __ksmbd_conn_release_work() - perform the final, once-per-struct cleanup
104+
* of a ksmbd_conn whose refcount has just dropped to zero.
105+
*
106+
* This is the common release path used by ksmbd_conn_put() for the embedded
107+
* state that outlives the connection thread: async_ida and the attached
108+
* transport (which owns the socket and iov for TCP). Called from a workqueue
109+
* so that sleep-allowed teardown (sock_release -> tcp_close ->
110+
* lock_sock_nested) never runs from an RCU softirq callback (free_opinfo_rcu)
111+
* or any other non-sleeping putter context.
112+
*/
113+
static void __ksmbd_conn_release_work(struct work_struct *work)
114+
{
115+
struct ksmbd_conn *conn =
116+
container_of(work, struct ksmbd_conn, release_work);
117+
118+
ida_destroy(&conn->async_ida);
119+
conn->transport->ops->free_transport(conn->transport);
120+
kfree(conn);
121+
}
122+
123+
/**
124+
* ksmbd_conn_put() - drop a reference and, if it was the last, queue the
125+
* release onto ksmbd_conn_wq so it runs from process context.
126+
*
127+
* Callable from any context including RCU softirq callbacks and non-sleeping
128+
* locks; the actual release is deferred to the workqueue.
129+
*/
130+
void ksmbd_conn_put(struct ksmbd_conn *conn)
131+
{
132+
if (!conn)
133+
return;
134+
135+
if (atomic_dec_and_test(&conn->refcnt))
136+
queue_work(ksmbd_conn_wq, &conn->release_work);
137+
}
138+
83139
/**
84140
* ksmbd_conn_free() - free resources of the connection instance
85141
*
@@ -98,19 +154,7 @@ void ksmbd_conn_free(struct ksmbd_conn *conn)
98154
kvfree(conn->request_buf);
99155
kfree(conn->preauth_info);
100156
kfree(conn->mechToken);
101-
if (atomic_dec_and_test(&conn->refcnt)) {
102-
/*
103-
* async_ida is embedded in struct ksmbd_conn, so pair
104-
* ida_destroy() with the final kfree() rather than with
105-
* the unconditional field teardown above. This keeps
106-
* the IDA valid for the entire lifetime of the struct,
107-
* even while other refcount holders (oplock / vfs
108-
* durable handles) still reference the connection.
109-
*/
110-
ida_destroy(&conn->async_ida);
111-
conn->transport->ops->free_transport(conn->transport);
112-
kfree(conn);
113-
}
157+
ksmbd_conn_put(conn);
114158
}
115159

116160
/**
@@ -141,6 +185,7 @@ struct ksmbd_conn *ksmbd_conn_alloc(void)
141185
conn->um = ERR_PTR(-EOPNOTSUPP);
142186
if (IS_ERR(conn->um))
143187
conn->um = NULL;
188+
INIT_WORK(&conn->release_work, __ksmbd_conn_release_work);
144189
atomic_set(&conn->req_running, 0);
145190
atomic_set(&conn->r_count, 0);
146191
atomic_set(&conn->refcnt, 1);
@@ -566,8 +611,7 @@ void ksmbd_conn_r_count_dec(struct ksmbd_conn *conn)
566611
if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count_q))
567612
wake_up(&conn->r_count_q);
568613

569-
if (atomic_dec_and_test(&conn->refcnt))
570-
kfree(conn);
614+
ksmbd_conn_put(conn);
571615
}
572616

573617
int ksmbd_conn_transport_init(void)

connection.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
#include <linux/kthread.h>
1717
#include <linux/nls.h>
1818
#include <linux/unicode.h>
19+
#include <linux/workqueue.h>
1920

2021
#include "smb_common.h"
2122
#include "ksmbd_work.h"
@@ -118,6 +119,7 @@ struct ksmbd_conn {
118119
bool binding;
119120
atomic_t refcnt;
120121
bool is_aapl;
122+
struct work_struct release_work;
121123
};
122124

123125
struct ksmbd_conn_ops {
@@ -163,6 +165,9 @@ void ksmbd_conn_wait_idle(struct ksmbd_conn *conn);
163165
int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id);
164166
struct ksmbd_conn *ksmbd_conn_alloc(void);
165167
void ksmbd_conn_free(struct ksmbd_conn *conn);
168+
void ksmbd_conn_put(struct ksmbd_conn *conn);
169+
int ksmbd_conn_wq_init(void);
170+
void ksmbd_conn_wq_destroy(void);
166171
bool ksmbd_conn_lookup_dialect(struct ksmbd_conn *c);
167172
int ksmbd_conn_write(struct ksmbd_work *work);
168173
int ksmbd_conn_rdma_read(struct ksmbd_conn *conn,

oplock.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -138,8 +138,8 @@ static void __free_opinfo(struct oplock_info *opinfo)
138138
{
139139
if (opinfo->is_lease)
140140
free_lease(opinfo);
141-
if (opinfo->conn && atomic_dec_and_test(&opinfo->conn->refcnt))
142-
kfree(opinfo->conn);
141+
if (opinfo->conn)
142+
ksmbd_conn_put(opinfo->conn);
143143
kfree(opinfo);
144144
}
145145

server.c

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -614,8 +614,15 @@ static int __init ksmbd_server_init(void)
614614
ret = ksmbd_workqueue_init();
615615
if (ret)
616616
goto err_crypto_destroy;
617+
618+
ret = ksmbd_conn_wq_init();
619+
if (ret)
620+
goto err_workqueue_destroy;
621+
617622
return 0;
618623

624+
err_workqueue_destroy:
625+
ksmbd_workqueue_destroy();
619626
err_crypto_destroy:
620627
ksmbd_crypto_destroy();
621628
err_release_inode_hash:
@@ -641,6 +648,12 @@ static void __exit ksmbd_server_exit(void)
641648
{
642649
ksmbd_server_shutdown();
643650
rcu_barrier();
651+
/*
652+
* ksmbd_conn_put() defers the final release onto ksmbd_conn_wq,
653+
* so drain it after rcu_barrier() has fired any pending RCU
654+
* callbacks that may have queued a release.
655+
*/
656+
ksmbd_conn_wq_destroy();
644657
ksmbd_release_inode_hash();
645658
}
646659

0 commit comments

Comments
 (0)