Skip to content

glusterd/snapshot (ZFS): glusterd_zfs_dataset() returns a pointer into a stack-local buffer (CWE-562) #4687

Description

@ThalesBarretto

Component: glusterd — snapshot ZFS provider
File: xlators/mgmt/glusterd/src/snapshot/glusterd-zfs-snapshot.c
Affected: present since the ZFS provider was added (PR #2855); confirmed on devel @ ae1d69672f (v12dev) and shipped in release-11.
Class: CWE-562 (Return of Stack Variable Address) / use-after-scope.

Summary

glusterd_zfs_dataset() hands its caller, through a char ** out-parameter, a pointer
that points into a function-local automatic buffer. The buffer's lifetime ends when
the function returns, so every caller dereferences a dangling pointer. It is the same
defect class that was already fixed twice in these very files (see Precedent), missed
here because the escape launders through strtok().

Affected code

int32_t
glusterd_zfs_dataset(char *brick_path, char **pool_name)
{
    char dataset[PATH_MAX] = "";          /* line 37: automatic storage  */
    ...
    ptr = fgets(dataset, sizeof(dataset), runner_chio(&runner, STDOUT_FILENO));
    ...
    *pool_name = strtok(dataset, "\n");   /* line 77: out-param -> into `dataset` */
out:
    return ret;                           /* line 80: `dataset` goes out of scope */
}

All four callers then use the returned pointer after the function has returned:

Caller deref site
glusterd_zfs_snapshot_create_or_clone snprintf(snap_device, …, "%s@%s_%d", dataset, …) — line 161
glusterd_zfs_brick_details dict_set_str(rsp_dict, key, dataset) — line 256
glusterd_zfs_snapshot_remove snprintf(snap_device, …, dataset, …) — line 318
glusterd_zfs_snapshot_restore snprintf(snap_device, …, dataset, …) — line 410

Root cause

The out-parameter is set to strtok(dataset, …), i.e. a pointer into the local
dataset[PATH_MAX]. Once the function returns, that storage is no longer valid. The code
"works" today only because each caller consumes the pointer before any intervening call
clobbers the reused stack frame — undefined behaviour that can silently corrupt the
dataset name and cause zfs snapshot/clone/destroy/restore to operate on the wrong
dataset, or store a garbage value into the details response dict.

It is not flagged by Coverity or GCC -Wreturn-local-addr because the address escapes
indirectly (through strtok() and a char ** out-param) rather than as a direct
return &local;.

Precedent (same defect class, already fixed in these files)

  • d9b57691a9"glusterd: Fix coverity issue (CWE-562)" changed the
    sibling glusterd_zfs_snap_clone_brick_path() from returning a local's address through
    char **snap_brick_path to writing directly into the caller-owned brickinfo->path.
  • 05b0264903"glusterd: do not return address of local variable" (glusterd: do not return address of local variable #3632) fixed the
    same class in snapshot/glusterd-lvm-snapshot.c.

glusterd_zfs_dataset() is the third instance of the pattern and was not caught by either pass.

Proposed fix

There is a subtlety the naive fix misses: dict_set_str() does not copystr_to_data
strn_to_data stores the raw pointer (data->data = value; data->is_static = _gf_true) and
the response dict outlives the caller's frame (it is read at dict_serialize() time). So a
blanket "gf_strdup() + GF_FREE(dataset) in each of the four callers" turns the
use-after-scope into a use-after-free in glusterd_zfs_brick_details (the dict reads the
freed string later); dropping the GF_FREE instead leaks (the dict never frees a static
value). The details caller needs an owning dict set.

Preferred — give the helper a caller-owned output buffer (matches the precedent fix
d9b57691a9, which wrote into brickinfo->path):

int32_t glusterd_zfs_dataset(char *brick_path, char *pool_name, size_t len);
...
char *p = strtok(dataset, "\n");
if (!p || (size_t)snprintf(pool_name, len, "%s", p) >= len) { ret = -1; goto out; }
  • create / remove / restore (zfs:157/314/406): pass a local char dataset[PATH_MAX] and
    use it directly — it is valid for the caller's lifetime.
  • glusterd_zfs_brick_details (zfs:247): pass a local buffer, then
    dict_set_dynstr_with_alloc(rsp_dict, key, dataset) (copies into the dict; the dict owns
    its copy).

Alternative — heap return: *pool_name = gf_strdup(strtok(dataset, "\n")) (+ NULL-check),
then create/remove/restore GF_FREE(dataset) after use, and brick_details uses
dict_set_dynstr(rsp_dict, key, dataset) to transfer ownership to the dict (no GF_FREE
in the caller; the dict frees it on destroy). Do not pair dict_set_str with GF_FREE.
This is exactly the idiom the sibling glusterd_lvm_brick_details already uses for vgname
(value = gf_strdup(token); … dict_set_dynstr(rsp_dict, key, value);) — the ZFS provider just
needs to match it.

Reproduction

Reproduced on a shipping GlusterFS 11.2 + ZFS 2.4.2 host (the ZFS provider is selected
when the brick filesystem probes as zfs).

Deterministic, operator-visible, no debugger — a multi-brick volume whose bricks are
distinct ZFS datasets, snapshotted and activated, then gluster snapshot status:

# bricks on datasets pool/aaa, pool/bbb, pool/ccc
gluster snapshot create snap1 vol no-timestamp
gluster snapshot activate snap1
gluster snapshot status snap1
  Brick … pool/aaa …   Volume Group : pool/ccc     <-- should be pool/aaa
  Brick … pool/bbb …   Volume Group : pool/ccc     <-- should be pool/bbb
  Brick … pool/ccc …   Volume Group : pool/ccc

Every brick reports the last brick's dataset. glusterd_zfs_dataset() is called at the
same stack depth for each brick, so dataset[PATH_MAX] lands at the same address and strtok
returns the same pointer each call; dict_set_str() stores that pointer without copying
(str_to_data/strn_to_data sets data->data = value; is_static = _gf_true), so every
*.vgname entry aliases one dead stack address and dict_serialize() reads the last value
written. strace confirms zfs list -Ho name resolves each brick correctly — only the
stored pointer is wrong. (The Brick Path field, built independently, is correct per brick,
isolating the defect to the vgname/dataset field.)

Single-brick / create / restore paths consume the pointer before the popped frame is
reused, so they appear to work today — the classic latent-UB "works until it doesn't". A gdb
witness on a live glusterd confirms the pointer is dangling in all cases: the address
returned by strtok is above $sp inside glusterd_zfs_dataset (in scope) and ~4 KB below
$sp by the time the caller stores it (a popped frame).

Compiler view: the pattern is invisible to -Wreturn-local-addr and Coverity (the address
escapes through strtok()'s return value and a char ** out-param), but AddressSanitizer
reports stack-use-after-return at the deref — usable directly as a regression test.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions