Add UID-based dataset zoning for rootless container support #18167
li-nkSN wants to merge 1 commit into openzfs:master
Conversation
As an update testing Write operations (create, snapshot, destroy) etc locally... |
6766f6f to
bd21810
Compare
9cfc9b9 to
37ff4ca
Compare
Hi everyone. My impression (perhaps incorrect) is that the maintainers would like me to fix this for FreeBSD. Because I like to test changes locally, I have reached out to the BSD community in the post above. If FreeBSD testers are available, I would feel more confident about BSD support.
37ff4ca to
50ec0de
Compare
Looks like FreeBSD will need a similar jails implementation for rootless container support. I think they might find this approach helpful as a reference. FreeBSD compatibility: |
@robn @behlendorf would someone review? |
behlendorf
left a comment
Thanks for tackling this functionality. This is looking good, and it would be very nice to have! I made a first pass over the PR and posted a few questions inline. Thanks for aligning the implementation so closely with the existing namespace delegation support and including additional test cases.
@wca @0mp can you comment on this, since you authored the namespace delegation support for Linux?
This implements zoned_uid - a ZFS property that delegates dataset
visibility and administration to user namespaces owned by a specific
UID, enabling rootless Podman/Docker with native ZFS storage.
Usage: zfs set zoned_uid=1000 pool/dataset
Problem solved:
- zfs zone requires an existing namespace PID
- Podman creates a new namespace on each container start
- Solution: delegate to UID, any namespace owned by that UID is
authorized
Delegated operations:
- Visibility: zfs list, get, mount (read-only access)
- Create: child datasets and clones
- Snapshot: create snapshots
- Destroy: children only (delegation root protected)
- Rename: within delegation subtree only
- Properties: set on delegated datasets
Security model:
- Namespace owner UID must match zoned_uid value
- CAP_SYS_ADMIN required within the user namespace
- Delegation root cannot be destroyed or escaped via rename
Kernel changes:
- zone_dataset_attach_uid()/detach_uid() in SPL
- zone_dataset_admin_check() for write authorization
- Callback registration for zoned_uid property lookup
- Security policy hooks in zfs_secpolicy_*() functions
- Fixed inglobalzone() to use current_user_ns()
- zfs_prop_set_special() handles attach/detach as property
side-effects, eliminating the need for dedicated ioctls
- spa_import_os() restores zoned_uid delegations kernel-side
on pool import via dmu_objset_find() walk
Userspace changes:
- check_parents() defers to kernel when zoned_uid set
FreeBSD compatibility:
- include/os/freebsd/spl/sys/zone.h — Added FreeBSD stubs:
- zone_uid_op_t enum (ZONE_OP_CREATE, SNAPSHOT, CLONE, DESTROY,
RENAME, SETPROP)
- zone_admin_result_t enum (NOT_APPLICABLE, ALLOWED, DENIED)
- zone_dataset_admin_check() — static inline, always returns
ZONE_ADMIN_NOT_APPLICABLE
- zone_get_zoned_uid_fn_t callback typedef
- zone_register_zoned_uid_callback() — static inline no-op
- zone_unregister_zoned_uid_callback() — static inline no-op
- On FreeBSD, every zone_dataset_admin_check() call returns
ZONE_ADMIN_NOT_APPLICABLE, causing all security policy functions
to fall through to existing jail-based permission checks
Addressed review feedback from PR openzfs#18167:
- Removed dedicated ZFS_IOC_USERNS_ATTACH_UID/DETACH_UID ioctls;
attach/detach is now handled kernel-side as a property side-effect
in zfs_prop_set_special()
- Moved pool import delegation restoration from userspace
(zpool_restore_zoned) to kernel-side in spa_import_os()
- Removed unnecessary suppression file additions
- Reverted ABI files to upstream (will regenerate from CI)
- Added test scripts to tests/zfs-tests/tests/Makefile.am
Tests: zoned_uid_001 through zoned_uid_011
Signed-off-by: Colin K. Williams <colin@li-nk.org>
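The security model in the commit message above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the code in the PR: the `demo_*` names, the function signature, and the string-based subtree test are hypothetical, and the full enum spellings (for example `ZONE_OP_SNAPSHOT`, `ZONE_ADMIN_ALLOWED`) are inferred from the abbreviated list in the commit message. Returning "not applicable" for datasets outside the delegation and denying a rename of the delegation root are also assumptions of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Names mirror the commit message; the exact spellings are assumed. */
typedef enum {
	ZONE_OP_CREATE, ZONE_OP_SNAPSHOT, ZONE_OP_CLONE,
	ZONE_OP_DESTROY, ZONE_OP_RENAME, ZONE_OP_SETPROP
} zone_uid_op_t;

typedef enum {
	ZONE_ADMIN_NOT_APPLICABLE, ZONE_ADMIN_ALLOWED, ZONE_ADMIN_DENIED
} zone_admin_result_t;

/* Hypothetical stand-in for the caller's user namespace. */
typedef struct {
	unsigned int owner_uid;		/* UID that owns the namespace */
	bool has_cap_sys_admin;		/* CAP_SYS_ADMIN inside the namespace */
} demo_userns_t;

/* True if name is root itself, a descendant dataset, or a snapshot of it. */
static bool
demo_in_subtree(const char *root, const char *name)
{
	size_t len = strlen(root);

	if (strncmp(root, name, len) != 0)
		return (false);
	return (name[len] == '\0' || name[len] == '/' || name[len] == '@');
}

/* Hypothetical model of the write-authorization decision. */
zone_admin_result_t
demo_admin_check(const demo_userns_t *ns, const char *deleg_root,
    unsigned int zoned_uid, zone_uid_op_t op, const char *dataset,
    const char *rename_target)
{
	assert(ns != NULL && deleg_root != NULL && dataset != NULL);

	/* Datasets outside the delegation are left to other checks. */
	if (!demo_in_subtree(deleg_root, dataset))
		return (ZONE_ADMIN_NOT_APPLICABLE);

	/* Namespace owner must match zoned_uid and hold CAP_SYS_ADMIN. */
	if (ns->owner_uid != zoned_uid || !ns->has_cap_sys_admin)
		return (ZONE_ADMIN_DENIED);

	bool is_root = (strcmp(deleg_root, dataset) == 0);

	/* The delegation root itself is protected from destroy. */
	if (op == ZONE_OP_DESTROY && is_root)
		return (ZONE_ADMIN_DENIED);

	/* Renames must keep the dataset inside the delegated subtree. */
	if (op == ZONE_OP_RENAME) {
		if (is_root || rename_target == NULL ||
		    !demo_in_subtree(deleg_root, rename_target))
			return (ZONE_ADMIN_DENIED);
	}
	return (ZONE_ADMIN_ALLOWED);
}
```

In this model, a namespace owned by UID 1000 with `CAP_SYS_ADMIN` may create `pool/ds/child` under a delegation root `pool/ds`, but may not destroy `pool/ds` or rename a child to `pool/other/a`.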
50ec0de to
be72e8a
Compare
This implements zoned_uid - a ZFS property that delegates dataset
visibility and administration to user namespaces owned by a specific
UID, enabling rootless Podman/Docker with native ZFS storage.
Usage: zfs set zoned_uid=1000 pool/dataset
Problem solved:
- zfs zone requires an existing namespace PID
- Podman creates a new namespace on each container start
- Solution: delegate to UID, any namespace owned by that UID is
authorized
Delegated operations:
- Visibility: zfs list, get, mount (read-only access)
- Create: child datasets and clones
- Snapshot: create snapshots
- Destroy: children only (delegation root protected)
- Rename: within delegation subtree only
- Properties: set on delegated datasets
Security model:
- Namespace owner UID must match zoned_uid value
- CAP_SYS_ADMIN required within the user namespace
- Delegation root cannot be destroyed or escaped via rename
Kernel changes:
- zone_dataset_attach_uid()/detach_uid() in SPL
- zone_dataset_admin_check() for write authorization
- Callback registration for zoned_uid property lookup
- Security policy hooks in zfs_secpolicy_*() functions
- Fixed inglobalzone() to use current_user_ns()
- zfs_prop_set_special() handles attach/detach as property
side-effects, eliminating the need for dedicated ioctls
- spa_import_os() restores zoned_uid delegations kernel-side
on pool import via dmu_objset_find() walk
Userspace changes:
- check_parents() defers to kernel when zoned_uid set
FreeBSD compatibility:
- include/os/freebsd/spl/sys/zone.h — Added FreeBSD stubs:
- zone_uid_op_t enum (ZONE_OP_CREATE, SNAPSHOT, CLONE, DESTROY,
RENAME, SETPROP)
- zone_admin_result_t enum (NOT_APPLICABLE, ALLOWED, DENIED)
- zone_dataset_admin_check() — static inline, always returns
ZONE_ADMIN_NOT_APPLICABLE
- zone_dataset_attach_uid() — static inline, returns ENXIO
- zone_dataset_detach_uid() — static inline, returns ENXIO
- zone_get_zoned_uid_fn_t callback typedef
- zone_register_zoned_uid_callback() — static inline no-op
- zone_unregister_zoned_uid_callback() — static inline no-op
- On FreeBSD, every zone_dataset_admin_check() call returns
ZONE_ADMIN_NOT_APPLICABLE, causing all security policy functions
to fall through to existing jail-based permission checks
- Setting zoned_uid on FreeBSD returns ENXIO since user namespace
delegation requires Linux user namespaces
Addressed review feedback from PR openzfs#18167:
- Removed dedicated ZFS_IOC_USERNS_ATTACH_UID/DETACH_UID ioctls;
attach/detach is now handled kernel-side as a property side-effect
in zfs_prop_set_special()
- Moved pool import delegation restoration from userspace
(zpool_restore_zoned) to kernel-side in spa_import_os()
- Removed unnecessary suppression file additions
- Reverted ABI files to upstream (will regenerate from CI)
- Added test scripts to tests/zfs-tests/tests/Makefile.am
Tests: zoned_uid_001 through zoned_uid_011
Signed-off-by: Colin K. Williams <colin@li-nk.org>
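The FreeBSD fall-through behaviour described in this revision can be illustrated with a minimal sketch. The stub names come from the commit message, but their parameter lists here are assumptions, and `demo_secpolicy()` with its `legacy_check` callback is a hypothetical stand-in for a `zfs_secpolicy_*()` function that defers to the existing jail-based checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Enum spellings are assumed from the abbreviated list in the commit message. */
typedef enum {
	ZONE_OP_CREATE, ZONE_OP_SNAPSHOT, ZONE_OP_CLONE,
	ZONE_OP_DESTROY, ZONE_OP_RENAME, ZONE_OP_SETPROP
} zone_uid_op_t;

typedef enum {
	ZONE_ADMIN_NOT_APPLICABLE, ZONE_ADMIN_ALLOWED, ZONE_ADMIN_DENIED
} zone_admin_result_t;

/* Stub: UID-based zoning never applies on this platform. */
static inline zone_admin_result_t
zone_dataset_admin_check(const char *dataset, zone_uid_op_t op)
{
	(void) dataset;
	(void) op;
	return (ZONE_ADMIN_NOT_APPLICABLE);
}

/* Stubs: attaching or detaching a UID delegation is unsupported. */
static inline int
zone_dataset_attach_uid(const char *dataset, unsigned int uid)
{
	(void) dataset;
	(void) uid;
	return (ENXIO);
}

static inline int
zone_dataset_detach_uid(const char *dataset, unsigned int uid)
{
	(void) dataset;
	(void) uid;
	return (ENXIO);
}

/* Hypothetical pre-existing permission checks used for the demonstration. */
static inline int demo_legacy_allow(const char *ds) { (void) ds; return (0); }
static inline int demo_legacy_deny(const char *ds) { (void) ds; return (EPERM); }

/* Hypothetical policy hook: consult the zone check, else fall through. */
int
demo_secpolicy(const char *dataset, zone_uid_op_t op,
    int (*legacy_check)(const char *))
{
	assert(legacy_check != NULL);

	switch (zone_dataset_admin_check(dataset, op)) {
	case ZONE_ADMIN_ALLOWED:
		return (0);
	case ZONE_ADMIN_DENIED:
		return (EPERM);
	default:
		/* NOT_APPLICABLE: the existing checks decide. */
		return (legacy_check(dataset));
	}
}
```

Because the stub always reports "not applicable", the result of `demo_secpolicy()` is whatever the pre-existing check returns, which is the behaviour the commit message describes for FreeBSD.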
be72e8a to
7ba9094
Compare
7ba9094 to
ce96e4f
Compare
ce96e4f to
304f95f
Compare
Hi @tonyhutter and @behlendorf, thanks for your feedback. Regarding the Fedora 42 error: I believe this is a regression introduced after I removed the unused scripts and rebased the branch onto master, though I don't believe the script removal itself caused it. I will investigate and try to address this issue today. I will also review the original namespace commit that was shared. I look forward to hearing from @wca and @0mp if they are available. Thank you. Kind regards, Colin Williams
This implements zoned_uid - a ZFS property that delegates dataset
visibility and administration to user namespaces owned by a specific
UID, enabling rootless Podman/Docker with native ZFS storage.
Usage: zfs set zoned_uid=1000 pool/dataset
Problem solved:
- zfs zone requires an existing namespace PID
- Podman creates a new namespace on each container start
- Solution: delegate to UID, any namespace owned by that UID is
authorized
Delegated operations:
- Visibility: zfs list, get, mount (read-only access)
- Create: child datasets and clones
- Snapshot: create snapshots
- Destroy: children only (delegation root protected)
- Rename: within delegation subtree only
- Properties: set on delegated datasets
Security model:
- Namespace owner UID must match zoned_uid value
- CAP_SYS_ADMIN required within the user namespace
- Delegation root cannot be destroyed or escaped via rename
Kernel changes:
- zone_dataset_attach_uid()/detach_uid() in SPL
- zone_dataset_admin_check() for write authorization
- Callback registration for zoned_uid property lookup
- Security policy hooks in zfs_secpolicy_*() functions
- Fixed inglobalzone() to use current_user_ns()
- zfs_prop_set_special() handles attach/detach as property
side-effects, eliminating the need for dedicated ioctls
- spa_import_os() restores zoned_uid delegations kernel-side
on pool import via dmu_objset_find() walk
Userspace changes:
- check_parents() defers to kernel when zoned_uid set
FreeBSD compatibility:
- include/os/freebsd/spl/sys/zone.h — Added FreeBSD stubs:
- zone_uid_op_t enum (ZONE_OP_CREATE, SNAPSHOT, CLONE, DESTROY,
RENAME, SETPROP)
- zone_admin_result_t enum (NOT_APPLICABLE, ALLOWED, DENIED)
- zone_dataset_admin_check() — static inline, always returns
ZONE_ADMIN_NOT_APPLICABLE
- zone_dataset_attach_uid() — static inline, returns ENXIO
- zone_dataset_detach_uid() — static inline, returns ENXIO
- zone_get_zoned_uid_fn_t callback typedef
- zone_register_zoned_uid_callback() — static inline no-op
- zone_unregister_zoned_uid_callback() — static inline no-op
- On FreeBSD, every zone_dataset_admin_check() call returns
ZONE_ADMIN_NOT_APPLICABLE, causing all security policy functions
to fall through to existing jail-based permission checks
- Setting zoned_uid on FreeBSD returns ENXIO since user namespace
delegation requires Linux user namespaces
Addressed review feedback from PR openzfs#18167:
- Removed dedicated ZFS_IOC_USERNS_ATTACH_UID/DETACH_UID ioctls;
attach/detach is now handled kernel-side as a property side-effect
in zfs_prop_set_special()
- Moved pool import delegation restoration from userspace
(zpool_restore_zoned) to kernel-side in spa_import_os()
- Removed unnecessary suppression file additions
- Reverted ABI files to upstream (will regenerate from CI)
- Added test scripts to tests/zfs-tests/tests/Makefile.am
Fix CONFIG_USER_NS=n build failure and improve error reporting:
Upstream CI commit 640a217 ("CI: Test & fix Linux ZFS built-in
build", Tony Hutter) added a tinyconfig built-in kernel build test
to Fedora runners, which compiles with CONFIG_USER_NS disabled,
exposing unguarded static functions and variables that cause fatal
-Werror=unused-function/-Werror=unused-variable errors.
- Fixed #ifdef CONFIG_USER_NS guards for zone_uid_datasets_lookup(),
zone_dataset_is_zoned_uid_root(), and the zuds variable in
zone_dataset_visible()
- Added ZFS_ERR_NO_USER_NS_SUPPORT error code so users get a clear
message ("kernel was built without user namespace support") instead
of a generic "I/O error" when CONFIG_USER_NS is disabled
- Translate ENXIO from zone_dataset_attach_uid()/detach_uid() in
zfs_prop_set_special() to ZFS_ERR_NO_USER_NS_SUPPORT
- Also fixes a pre-existing bug in the upstream
zfs_ioc_userns_attach()/zfs_ioc_userns_detach() where ENXIO from
zone_dataset_attach()/detach() was not translated, producing the
same confusing "I/O error" on kernels without CONFIG_USER_NS
- Synced pyzfs constants with zfs.h (added missing
ZFS_ERR_ASHIFT_MISMATCH, ZFS_ERR_STREAM_LARGE_MICROZAP,
ZFS_ERR_TOO_MANY_SITOUTS, and the new
ZFS_ERR_NO_USER_NS_SUPPORT)
Tests: zoned_uid_001 through zoned_uid_011
Signed-off-by: Colin K. Williams <colin@li-nk.org>
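The error-reporting change in this revision can be sketched roughly as follows. This is a userspace illustration only: `DEMO_ZFS_ERR_NO_USER_NS_SUPPORT` and its numeric value, and every `demo_*` function, are hypothetical stand-ins rather than the PR's code. The `#ifdef CONFIG_USER_NS` split also shows why both branches need a definition: a static helper that is referenced only in one configuration would otherwise trip `-Werror=unused-function`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Placeholder value for illustration; the real constant is defined in zfs.h. */
#define	DEMO_ZFS_ERR_NO_USER_NS_SUPPORT	2000

#ifdef CONFIG_USER_NS
/* With user namespaces, the attach would record the delegation. */
static int
demo_attach_uid(const char *dataset, unsigned int uid)
{
	(void) dataset;
	(void) uid;
	return (0);
}
#else
/* Without user namespaces, the attach reports ENXIO. */
static int
demo_attach_uid(const char *dataset, unsigned int uid)
{
	(void) dataset;
	(void) uid;
	return (ENXIO);
}
#endif

/* Map ENXIO from attach/detach to the dedicated error code. */
int
demo_translate_attach_error(int err)
{
	return (err == ENXIO ? DEMO_ZFS_ERR_NO_USER_NS_SUPPORT : err);
}

/* Hypothetical property handler: attach as a side effect of the set. */
int
demo_prop_set_zoned_uid(const char *dataset, unsigned int uid)
{
	assert(dataset != NULL);
	return (demo_translate_attach_error(demo_attach_uid(dataset, uid)));
}

/* Hypothetical message lookup using the wording from the commit message. */
const char *
demo_strerror(int err)
{
	if (err == DEMO_ZFS_ERR_NO_USER_NS_SUPPORT)
		return ("kernel was built without user namespace support");
	return (strerror(err));
}
```

Translating at the property handler keeps the lower-level attach and detach functions returning a plain errno while still giving users a specific message instead of a generic "I/O error".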
c9d837f to
b75b974
Compare
@li-nkSN I've been trying to run these tests locally from my Fedora 42 VM, but I keep getting the same failures: Here's the first failure from To reproduce from ZFS source dir: Any ideas? |
b75b974 to
0125cd7
Compare
Hi @behlendorf and @tonyhutter, I investigated; please see the latest commit message for details. The tinyconfig kernel build test compiles with CONFIG_USER_NS=n, which exposed the build failure and also prompted the clearer error messages. If I understand correctly, this feature is now passing in all of the CI environments.
Try |
0125cd7 to
f9d1699
Compare
An install is not required to run ZTS locally. This fixed the issue for me (run within zfs source directory): Can you update your PR with the |
This implements zoned_uid - a ZFS property that delegates dataset
visibility and administration to user namespaces owned by a specific
UID, enabling rootless Podman/Docker with native ZFS storage.
Usage: zfs set zoned_uid=1000 pool/dataset
Problem solved:
- zfs zone requires an existing namespace PID
- Podman creates a new namespace on each container start
- Solution: delegate to UID, any namespace owned by that UID is
authorized
Delegated operations:
- Visibility: zfs list, get, mount (read-only access)
- Create: child datasets and clones
- Snapshot: create snapshots
- Destroy: children only (delegation root protected)
- Rename: within delegation subtree only
- Properties: set on delegated datasets
Security model:
- Namespace owner UID must match zoned_uid value
- CAP_SYS_ADMIN required within the user namespace
- Delegation root cannot be destroyed or escaped via rename
Kernel changes:
- zone_dataset_attach_uid()/detach_uid() in SPL
- zone_dataset_admin_check() for write authorization
- Callback registration for zoned_uid property lookup
- Security policy hooks in zfs_secpolicy_*() functions
- Fixed inglobalzone() to use current_user_ns()
- zfs_prop_set_special() handles attach/detach as property
side-effects, eliminating the need for dedicated ioctls
- spa_import_os() restores zoned_uid delegations kernel-side
on pool import via dmu_objset_find() walk
Userspace changes:
- check_parents() defers to kernel when zoned_uid set
FreeBSD compatibility:
- include/os/freebsd/spl/sys/zone.h — Added FreeBSD stubs:
- zone_uid_op_t enum (ZONE_OP_CREATE, SNAPSHOT, CLONE, DESTROY,
RENAME, SETPROP)
- zone_admin_result_t enum (NOT_APPLICABLE, ALLOWED, DENIED)
- zone_dataset_admin_check() — static inline, always returns
ZONE_ADMIN_NOT_APPLICABLE
- zone_dataset_attach_uid() — static inline, returns ENXIO
- zone_dataset_detach_uid() — static inline, returns ENXIO
- zone_get_zoned_uid_fn_t callback typedef
- zone_register_zoned_uid_callback() — static inline no-op
- zone_unregister_zoned_uid_callback() — static inline no-op
- On FreeBSD, every zone_dataset_admin_check() call returns
ZONE_ADMIN_NOT_APPLICABLE, causing all security policy functions
to fall through to existing jail-based permission checks
- Setting zoned_uid on FreeBSD returns ENXIO since user namespace
delegation requires Linux user namespaces
Addressed review feedback from PR openzfs#18167:
- Removed dedicated ZFS_IOC_USERNS_ATTACH_UID/DETACH_UID ioctls;
attach/detach is now handled kernel-side as a property side-effect
in zfs_prop_set_special()
- Moved pool import delegation restoration from userspace
(zpool_restore_zoned) to kernel-side in spa_import_os()
- Removed unnecessary suppression file additions
- Reverted ABI files to upstream (will regenerate from CI)
- Added test scripts to tests/zfs-tests/tests/Makefile.am
Fix CONFIG_USER_NS=n build failure and improve error reporting:
Upstream CI commit 640a217 ("CI: Test & fix Linux ZFS built-in
build", Tony Hutter) added a tinyconfig built-in kernel build test
to Fedora runners, which compiles with CONFIG_USER_NS disabled,
exposing unguarded static functions and variables that cause fatal
-Werror=unused-function/-Werror=unused-variable errors.
- Fixed #ifdef CONFIG_USER_NS guards for zone_uid_datasets_lookup(),
zone_dataset_is_zoned_uid_root(), and the zuds variable in
zone_dataset_visible()
- Added ZFS_ERR_NO_USER_NS_SUPPORT error code so users get a clear
message ("kernel was built without user namespace support") instead
of a generic "I/O error" when CONFIG_USER_NS is disabled
- Translate ENXIO from zone_dataset_attach_uid()/detach_uid() in
zfs_prop_set_special() to ZFS_ERR_NO_USER_NS_SUPPORT
- Also fixes a pre-existing bug in the upstream
zfs_ioc_userns_attach()/zfs_ioc_userns_detach() where ENXIO from
zone_dataset_attach()/detach() was not translated, producing the
same confusing "I/O error" on kernels without CONFIG_USER_NS
- Synced pyzfs constants with zfs.h (added missing
ZFS_ERR_ASHIFT_MISMATCH, ZFS_ERR_STREAM_LARGE_MICROZAP,
ZFS_ERR_TOO_MANY_SITOUTS, and the new
ZFS_ERR_NO_USER_NS_SUPPORT)
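The guard fix and the ENXIO translation described above can be illustrated together. This is a simplified sketch, not the PR's code: all `sketch_` names are hypothetical, and the numeric value given to the error code is a placeholder, since the real constant is defined in zfs.h and its value is not stated here.

```c
#include <errno.h>
#include <string.h>

/* Placeholder value; the real ZFS_ERR_NO_USER_NS_SUPPORT lives in zfs.h. */
#define	SKETCH_ZFS_ERR_NO_USER_NS_SUPPORT	2000

#ifdef CONFIG_USER_NS
/*
 * Helper compiled only when user namespaces exist. Guarding it avoids
 * -Werror=unused-function on kernels built with CONFIG_USER_NS=n.
 */
static int
sketch_attach_impl(const char *dataset, unsigned int uid)
{
	(void) dataset; (void) uid;
	return (0);
}
#endif

int
sketch_attach_uid(const char *dataset, unsigned int uid)
{
#ifdef CONFIG_USER_NS
	return (sketch_attach_impl(dataset, uid));
#else
	(void) dataset; (void) uid;
	return (ENXIO);
#endif
}

/* Map the stub's ENXIO to the dedicated error; pass others through. */
int
sketch_translate_error(int err)
{
	return (err == ENXIO ? SKETCH_ZFS_ERR_NO_USER_NS_SUPPORT : err);
}

/* User-facing message, using the wording quoted in the commit message. */
const char *
sketch_describe_error(int err)
{
	if (err == SKETCH_ZFS_ERR_NO_USER_NS_SUPPORT)
		return ("kernel was built without user namespace support");
	return (strerror(err));
}
```

Compiled without -DCONFIG_USER_NS, sketch_attach_uid() takes the ENXIO branch, which is the case the translation exists for.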
Tests: zoned_uid_001 through zoned_uid_011
Signed-off-by: Colin K. Williams <colin@li-nk.org>
f9d1699 to 336635e
@tonyhutter I was able to reproduce your issue, but found that the sed change didn't resolve it across multiple runs. I then created this helper, which resolved the issue: https://github.com/openzfs/zfs/pull/18167/changes#diff-abda02af92e31d80534ac3bf82322ab6165637a4a1c8ad18efb8bd5f27c46bc4R76
336635e to e3c0da7
@li-nkSN thanks, that fixed the local ZTS tests for me. I will take another look at this PR when I get back into the office.
@tonyhutter @behlendorf I should be upfront about how this was developed. I initially wrote the feature against the CachyOS ZFS 2.4.0 source tree on my server, and wrote the tests locally against that same server. I did not start from upstream ZFS master because, when I began, it wasn't clear to me which patches the CachyOS ZFS build carried. My other project sources live on the same platform, so I couldn't risk disabling the filesystem. After development, I ported the support to ZFS master. Had I known more about the differences between CachyOS ZFS and upstream, I might have tried another approach from the start, because patching between the versions takes effort. I believe this contributed to some confusion about the testing process: my original tests were ported to upstream, so the CI system was perhaps the first attempt at integrating them there. On the other hand, I do believe this led to the code being tested across more environments. I apologize if this has caused any confusion about local testing. I recently realized I could build the source on my Ubuntu laptop, and I developed the latest test fix there. A while back I also reached out to CachyOS about their ZFS patches, and I have not yet heard back. I am looking forward to this feature making its way "down the tubes", so to speak, and perhaps I will hear back on my informal inquiries. Finally, I have been running my patched builds on my server for quite some time now. Let me know if I can be of any further assistance.
@li-nkSN try asking on Discord. They have been very friendly with me, and they reviewed and merged my patches in a matter of days, so if you have any questions I'm sure they will answer them.
@darkbasic I messaged you there. IMO the feature is well developed, as shown above, and ready for review / merge. But I will ask again for guidance on running ZFS master on CachyOS.
Some first pass comments:
Can you talk about what you mean by that? I ask, because I'm able to create datasets as a user when I used the combined
However, if you try to This is somewhat of a misleading message, as the pool named 'tank' does exist. Is it possible to return a "permission denied" here as well?
    @@ -0,0 +1,75 @@
    # SPDX-License-Identifier: CDDL-1.0
Is this file (libtest_supplement.shlib) meant to be checked in? I didn't see where it gets included.
    function get_zoned_uid
    {
        typeset dataset=$1
        zfs get -H -p -o value zoned_uid $dataset
    - zfs get -H -p -o value zoned_uid $dataset
    + get_prop zoned_uid $dataset
    typeset uid=$1
    shift
    typeset zfs_cmd
    zfs_cmd=$(which zfs)
In the off-chance that zfs is in a directory with spaces in its name:
    - zfs_cmd=$(which zfs)
    + zfs_cmd="$(which zfs)"
    if [[ "$actual_uq" != "50M" ]]; then
        log_fail "Userquota not set correctly: expected 50M, got $actual_uq"
    fi
    log_note "Userquota set successfully to 50M"
For tests where you are looking to see if a numerical value is mostly within an expected range, I would write it like this (untested):
    typeset actual_uq=$(get_prop userquota@0 $TESTPOOL/$TESTFS/deleg_root/child)
    if ! within_percent "$actual_uq" $((50 * 1048576)) 99 ; then
        log_fail "Userquota not set correctly: expected ~50M, got $actual_uq"
    fi
(see functions in tests/zfs-tests/include/math.shlib)
That comment was from Jan 30/31, when this was still a WIP and I had opened the PR as a draft. All functionality was completed months ago.
Resolved by registering the property as PROP_INHERIT instead of PROP_DEFAULT. I had not tested sub-datasets, as that is not a feature I used. Added test 014.
This is existing behavior, not specific to zoned_uid, and here is a script for you to see that. The point is to avoid exposing information about the existence of resources when permission is denied.
This implements zoned_uid - a ZFS property that delegates dataset
visibility and administration to user namespaces owned by a specific
UID, enabling rootless Podman/Docker with native ZFS storage.
Usage: zfs set zoned_uid=1000 pool/dataset
Problem solved:
- zfs zone requires an existing namespace PID
- Podman creates a new namespace on each container start
- Solution: delegate to UID, any namespace owned by that UID is
authorized
Delegated operations:
- Visibility: zfs list, get, mount (read-only access)
- Create: child datasets and clones
- Snapshot: create snapshots
- Destroy: children only (delegation root protected)
- Rename: within delegation subtree only
- Properties: set on delegated datasets
Security model:
- Namespace owner UID must match zoned_uid value
- CAP_SYS_ADMIN required within the user namespace
- Delegation root cannot be destroyed or escaped via rename
Kernel changes:
- zone_dataset_attach_uid()/detach_uid() in SPL
- zone_dataset_admin_check() for write authorization
- Callback registration for zoned_uid property lookup
- Security policy hooks in zfs_secpolicy_*() functions
- Fixed inglobalzone() to use current_user_ns()
- zfs_prop_set_special() handles attach/detach as property
side-effects, eliminating the need for dedicated ioctls
- spa_import_os() restores zoned_uid delegations kernel-side
on pool import via dmu_objset_find() walk
- zoned_uid registered as PROP_INHERIT so child datasets
inherit the delegation, enabling sub-dataset creation
- zfs_get_zoned_uid() uses dsl_prop_get setpoint to identify
the true delegation root, correctly distinguishing inherited
values from locally-set ones for destroy/rename policy checks
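The setpoint idea in the last two items can be modelled in user space: an inherited property reports the ancestor where it was set locally, and only that ancestor is the delegation root. The sketch below is a toy model under stated assumptions, not the PR's zfs_get_zoned_uid(): the table `sketch_props` stands in for on-disk properties, and all `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for locally set zoned_uid values. */
struct sketch_local_prop {
	const char *dataset;
	unsigned int zoned_uid;
};

static const struct sketch_local_prop sketch_props[] = {
	{ "tank/ct", 1000 },
};

/*
 * Walk from `name` up through its ancestors until one has a locally set
 * zoned_uid. On success, store the value and the dataset where it was
 * set (the setpoint).
 */
bool
sketch_get_zoned_uid(const char *name, unsigned int *uid, char *setpoint,
    size_t len)
{
	char cur[256];
	size_t i, n = sizeof (sketch_props) / sizeof (sketch_props[0]);

	if (strlen(name) >= sizeof (cur))
		return (false);
	strcpy(cur, name);
	for (;;) {
		for (i = 0; i < n; i++) {
			if (strcmp(sketch_props[i].dataset, cur) == 0) {
				*uid = sketch_props[i].zoned_uid;
				snprintf(setpoint, len, "%s", cur);
				return (true);
			}
		}
		char *slash = strrchr(cur, '/');
		if (slash == NULL)
			return (false);	/* reached the pool, nothing set */
		*slash = '\0';		/* step up to the parent */
	}
}

/* A dataset is the delegation root only if it is its own setpoint. */
bool
sketch_is_delegation_root(const char *name)
{
	unsigned int uid;
	char setpoint[256];

	return (sketch_get_zoned_uid(name, &uid, setpoint,
	    sizeof (setpoint)) && strcmp(setpoint, name) == 0);
}
```

In this model tank/ct/a/b inherits zoned_uid 1000 with setpoint tank/ct, so a destroy or rename check can treat tank/ct as the protected root and its descendants as ordinary delegated children.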
Userspace changes:
- check_parents() defers to kernel when zoned_uid set
FreeBSD compatibility:
- include/os/freebsd/spl/sys/zone.h — Added FreeBSD stubs:
- zone_uid_op_t enum (ZONE_OP_CREATE, SNAPSHOT, CLONE, DESTROY,
RENAME, SETPROP)
- zone_admin_result_t enum (NOT_APPLICABLE, ALLOWED, DENIED)
- zone_dataset_admin_check() — static inline, always returns
ZONE_ADMIN_NOT_APPLICABLE
- zone_dataset_attach_uid() — static inline, returns ENXIO
- zone_dataset_detach_uid() — static inline, returns ENXIO
- zone_get_zoned_uid_fn_t callback typedef
- zone_register_zoned_uid_callback() — static inline no-op
- zone_unregister_zoned_uid_callback() — static inline no-op
- On FreeBSD, every zone_dataset_admin_check() call returns
ZONE_ADMIN_NOT_APPLICABLE, causing all security policy functions
to fall through to existing jail-based permission checks
- Setting zoned_uid on FreeBSD returns ENXIO since user namespace
delegation requires Linux user namespaces
Addressed review feedback from PR openzfs#18167:
- Removed dedicated ZFS_IOC_USERNS_ATTACH_UID/DETACH_UID ioctls;
attach/detach is now handled kernel-side as a property side-effect
in zfs_prop_set_special()
- Moved pool import delegation restoration from userspace
(zpool_restore_zoned) to kernel-side in spa_import_os()
- Removed unnecessary suppression file additions
- Reverted ABI files to upstream (will regenerate from CI)
- Added test scripts to tests/zfs-tests/tests/Makefile.am
Fix CONFIG_USER_NS=n build failure and improve error reporting:
Upstream CI commit 640a217 ("CI: Test & fix Linux ZFS built-in
build", Tony Hutter) added a tinyconfig built-in kernel build test
to Fedora runners, which compiles with CONFIG_USER_NS disabled,
exposing unguarded static functions and variables that cause fatal
-Werror=unused-function/-Werror=unused-variable errors.
- Fixed #ifdef CONFIG_USER_NS guards for zone_uid_datasets_lookup(),
zone_dataset_is_zoned_uid_root(), and the zuds variable in
zone_dataset_visible()
- Added ZFS_ERR_NO_USER_NS_SUPPORT error code so users get a clear
message ("kernel was built without user namespace support") instead
of a generic "I/O error" when CONFIG_USER_NS is disabled
- Translate ENXIO from zone_dataset_attach_uid()/detach_uid() in
zfs_prop_set_special() to ZFS_ERR_NO_USER_NS_SUPPORT
- Also fixes a pre-existing bug in the upstream
zfs_ioc_userns_attach()/zfs_ioc_userns_detach() where ENXIO from
zone_dataset_attach()/detach() was not translated, producing the
same confusing "I/O error" on kernels without CONFIG_USER_NS
- Synced pyzfs constants with zfs.h (added missing
ZFS_ERR_ASHIFT_MISMATCH, ZFS_ERR_STREAM_LARGE_MICROZAP,
ZFS_ERR_TOO_MANY_SITOUTS, and the new
ZFS_ERR_NO_USER_NS_SUPPORT)
Test improvements:
- run_in_userns helper resolves absolute zfs path to handle
environments where PATH does not include zfs (source builds)
- Test 004 updated: zoned_uid now inherits (PROP_INHERIT), test
verifies inheritance and override behavior
- Test 013 uses within_percent with parseable byte output (-Hp)
for robust quota value comparison across environments
- Test 014 added: verifies grandchild dataset creation from user
namespace, confirming inherited zoned_uid delegation works
- Shellcheck SC2155 fixes across all test scripts
Tests: zoned_uid_001 through zoned_uid_014
Signed-off-by: Colin K. Williams <colin@li-nk.org>
e3c0da7 to cbcd64e
@tonyhutter I responded to all of your comments above, and I added your suggested changes in be7a78d. If you are satisfied, would you click resolve, or would you like me to do that?
cbcd64e to be7a78d
This implements zoned_uid - a ZFS property that grants visibility of a dataset to any user namespace owned by a specific UID.
Usage: zfs set zoned_uid=1000 pool/dataset
This solves the chicken-and-egg problem with rootless Podman + ZFS:
Kernel changes:
Userspace changes:
Current limitations (WIP):