You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Component: glusterd — snapshot ZFS provider File:xlators/mgmt/glusterd/src/snapshot/glusterd-zfs-snapshot.c Affected: present since the ZFS provider was added (PR #2855); confirmed on devel @ ae1d69672f (v12dev) and shipped in release-11. Class: CWE-562 (Return of Stack Variable Address) / use-after-scope.
Summary
glusterd_zfs_dataset() hands its caller, through a char ** out-parameter, a pointer
that points into a function-local automatic buffer. The buffer's lifetime ends when
the function returns, so every caller dereferences a dangling pointer. It is the same
defect class that was already fixed twice in these very files (see Precedent), missed
here because the escape launders through strtok().
Affected code
int32_tglusterd_zfs_dataset(char*brick_path, char**pool_name)
{
chardataset[PATH_MAX] =""; /* line 37: automatic storage */
...
ptr=fgets(dataset, sizeof(dataset), runner_chio(&runner, STDOUT_FILENO));
...
*pool_name=strtok(dataset, "\n"); /* line 77: out-param -> into `dataset` */out:
returnret; /* line 80: `dataset` goes out of scope */
}
All four callers then use the returned pointer after the function has returned:
Caller
deref site
glusterd_zfs_snapshot_create_or_clone
snprintf(snap_device, …, "%s@%s_%d", dataset, …) — line 161
glusterd_zfs_brick_details
dict_set_str(rsp_dict, key, dataset) — line 256
glusterd_zfs_snapshot_remove
snprintf(snap_device, …, dataset, …) — line 318
glusterd_zfs_snapshot_restore
snprintf(snap_device, …, dataset, …) — line 410
Root cause
The out-parameter is set to strtok(dataset, …), i.e. a pointer into the local dataset[PATH_MAX]. Once the function returns, that storage is no longer valid. The code
"works" today only because each caller consumes the pointer before any intervening call
clobbers the reused stack frame — undefined behaviour that can silently corrupt the
dataset name and cause zfs snapshot/clone/destroy/restore to operate on the wrong
dataset, or store a garbage value into the details response dict.
It is not flagged by Coverity or GCC -Wreturn-local-addr because the address escapes
indirectly (through strtok() and a char ** out-param) rather than as a direct return &local;.
Precedent (same defect class, already fixed in these files)
d9b57691a9 — "glusterd: Fix coverity issue (CWE-562)" changed the
sibling glusterd_zfs_snap_clone_brick_path() from returning a local's address through char **snap_brick_path to writing directly into the caller-owned brickinfo->path.
glusterd_zfs_dataset() is the third instance of the pattern and was not caught by either pass.
Proposed fix
There is a subtlety the naive fix misses: dict_set_str() does not copy — str_to_data
→ strn_to_data stores the raw pointer (data->data = value; data->is_static = _gf_true) and
the response dict outlives the caller's frame (it is read at dict_serialize() time). So a
blanket "gf_strdup() + GF_FREE(dataset) in each of the four callers" turns the
use-after-scope into a use-after-free in glusterd_zfs_brick_details (the dict reads the
freed string later); dropping the GF_FREE instead leaks (the dict never frees a static
value). The details caller needs an owning dict set.
Preferred — give the helper a caller-owned output buffer (matches the precedent fix d9b57691a9, which wrote into brickinfo->path):
create / remove / restore (zfs:157/314/406): pass a local char dataset[PATH_MAX] and
use it directly — it is valid for the caller's lifetime.
glusterd_zfs_brick_details (zfs:247): pass a local buffer, then dict_set_dynstr_with_alloc(rsp_dict, key, dataset) (copies into the dict; the dict owns
its copy).
Alternative — heap return:*pool_name = gf_strdup(strtok(dataset, "\n")) (+ NULL-check),
then create/remove/restoreGF_FREE(dataset) after use, and brick_details uses dict_set_dynstr(rsp_dict, key, dataset) to transfer ownership to the dict (no GF_FREE
in the caller; the dict frees it on destroy). Do not pair dict_set_str with GF_FREE.
This is exactly the idiom the sibling glusterd_lvm_brick_details already uses for vgname
(value = gf_strdup(token); … dict_set_dynstr(rsp_dict, key, value);) — the ZFS provider just
needs to match it.
Reproduction
Reproduced on a shipping GlusterFS 11.2 + ZFS 2.4.2 host (the ZFS provider is selected
when the brick filesystem probes as zfs).
Deterministic, operator-visible, no debugger — a multi-brick volume whose bricks are distinct ZFS datasets, snapshotted and activated, then gluster snapshot status:
# bricks on datasets pool/aaa, pool/bbb, pool/ccc
gluster snapshot create snap1 vol no-timestamp
gluster snapshot activate snap1
gluster snapshot status snap1
Brick … pool/aaa … Volume Group : pool/ccc <-- should be pool/aaa
Brick … pool/bbb … Volume Group : pool/ccc <-- should be pool/bbb
Brick … pool/ccc … Volume Group : pool/ccc
Every brick reports the last brick's dataset. glusterd_zfs_dataset() is called at the
same stack depth for each brick, so dataset[PATH_MAX] lands at the same address and strtok
returns the same pointer each call; dict_set_str() stores that pointer without copying
(str_to_data/strn_to_data sets data->data = value; is_static = _gf_true), so every *.vgname entry aliases one dead stack address and dict_serialize() reads the last value
written. strace confirms zfs list -Ho name resolves each brick correctly — only the
stored pointer is wrong. (The Brick Path field, built independently, is correct per brick,
isolating the defect to the vgname/dataset field.)
Single-brick / create / restore paths consume the pointer before the popped frame is
reused, so they appear to work today — the classic latent-UB "works until it doesn't". A gdb
witness on a live glusterd confirms the pointer is dangling in all cases: the address
returned by strtok is above $sp inside glusterd_zfs_dataset (in scope) and ~4 KB below $sp by the time the caller stores it (a popped frame).
Compiler view: the pattern is invisible to -Wreturn-local-addr and Coverity (the address
escapes through strtok()'s return value and a char ** out-param), but AddressSanitizer
reports stack-use-after-return at the deref — usable directly as a regression test.
Component: glusterd — snapshot ZFS provider
File:
xlators/mgmt/glusterd/src/snapshot/glusterd-zfs-snapshot.cAffected: present since the ZFS provider was added (PR #2855); confirmed on
devel@ae1d69672f(v12dev) and shipped in release-11.Class: CWE-562 (Return of Stack Variable Address) / use-after-scope.
Summary
glusterd_zfs_dataset()hands its caller, through achar **out-parameter, a pointerthat points into a function-local automatic buffer. The buffer's lifetime ends when
the function returns, so every caller dereferences a dangling pointer. It is the same
defect class that was already fixed twice in these very files (see Precedent), missed
here because the escape launders through
strtok().Affected code
All four callers then use the returned pointer after the function has returned:
glusterd_zfs_snapshot_create_or_clonesnprintf(snap_device, …, "%s@%s_%d", dataset, …)— line 161glusterd_zfs_brick_detailsdict_set_str(rsp_dict, key, dataset)— line 256glusterd_zfs_snapshot_removesnprintf(snap_device, …, dataset, …)— line 318glusterd_zfs_snapshot_restoresnprintf(snap_device, …, dataset, …)— line 410Root cause
The out-parameter is set to
strtok(dataset, …), i.e. a pointer into the localdataset[PATH_MAX]. Once the function returns, that storage is no longer valid. The code"works" today only because each caller consumes the pointer before any intervening call
clobbers the reused stack frame — undefined behaviour that can silently corrupt the
dataset name and cause
zfs snapshot/clone/destroy/restoreto operate on the wrongdataset, or store a garbage value into the
detailsresponse dict.It is not flagged by Coverity or GCC
-Wreturn-local-addrbecause the address escapesindirectly (through
strtok()and achar **out-param) rather than as a directreturn &local;.Precedent (same defect class, already fixed in these files)
d9b57691a9— "glusterd: Fix coverity issue (CWE-562)" changed thesibling
glusterd_zfs_snap_clone_brick_path()from returning a local's address throughchar **snap_brick_pathto writing directly into the caller-ownedbrickinfo->path.05b0264903— "glusterd: do not return address of local variable" (glusterd: do not return address of local variable #3632) fixed thesame class in
snapshot/glusterd-lvm-snapshot.c.glusterd_zfs_dataset()is the third instance of the pattern and was not caught by either pass.Proposed fix
There is a subtlety the naive fix misses:
dict_set_str()does not copy —str_to_data→
strn_to_datastores the raw pointer (data->data = value; data->is_static = _gf_true) andthe response dict outlives the caller's frame (it is read at
dict_serialize()time). So ablanket "
gf_strdup()+GF_FREE(dataset)in each of the four callers" turns theuse-after-scope into a use-after-free in
glusterd_zfs_brick_details(the dict reads thefreed string later); dropping the
GF_FREEinstead leaks (the dict never frees astaticvalue). The
detailscaller needs an owning dict set.Preferred — give the helper a caller-owned output buffer (matches the precedent fix
d9b57691a9, which wrote intobrickinfo->path):create/remove/restore(zfs:157/314/406): pass a localchar dataset[PATH_MAX]anduse it directly — it is valid for the caller's lifetime.
glusterd_zfs_brick_details(zfs:247): pass a local buffer, thendict_set_dynstr_with_alloc(rsp_dict, key, dataset)(copies into the dict; the dict ownsits copy).
Alternative — heap return:
*pool_name = gf_strdup(strtok(dataset, "\n"))(+ NULL-check),then
create/remove/restoreGF_FREE(dataset)after use, andbrick_detailsusesdict_set_dynstr(rsp_dict, key, dataset)to transfer ownership to the dict (noGF_FREEin the caller; the dict frees it on destroy). Do not pair
dict_set_strwithGF_FREE.This is exactly the idiom the sibling
glusterd_lvm_brick_detailsalready uses forvgname(
value = gf_strdup(token); … dict_set_dynstr(rsp_dict, key, value);) — the ZFS provider justneeds to match it.
Reproduction
Reproduced on a shipping GlusterFS 11.2 + ZFS 2.4.2 host (the ZFS provider is selected
when the brick filesystem probes as
zfs).Deterministic, operator-visible, no debugger — a multi-brick volume whose bricks are
distinct ZFS datasets, snapshotted and activated, then
gluster snapshot status:Every brick reports the last brick's dataset.
glusterd_zfs_dataset()is called at thesame stack depth for each brick, so
dataset[PATH_MAX]lands at the same address andstrtokreturns the same pointer each call;
dict_set_str()stores that pointer without copying(
str_to_data/strn_to_datasetsdata->data = value; is_static = _gf_true), so every*.vgnameentry aliases one dead stack address anddict_serialize()reads the last valuewritten.
straceconfirmszfs list -Ho nameresolves each brick correctly — only thestored pointer is wrong. (The
Brick Pathfield, built independently, is correct per brick,isolating the defect to the
vgname/datasetfield.)Single-brick / create / restore paths consume the pointer before the popped frame is
reused, so they appear to work today — the classic latent-UB "works until it doesn't". A
gdbwitness on a live
glusterdconfirms the pointer is dangling in all cases: the addressreturned by
strtokis above$spinsideglusterd_zfs_dataset(in scope) and ~4 KB below$spby the time the caller stores it (a popped frame).Compiler view: the pattern is invisible to
-Wreturn-local-addrand Coverity (the addressescapes through
strtok()'s return value and achar **out-param), but AddressSanitizerreports
stack-use-after-returnat the deref — usable directly as a regression test.