Skip to content

Debian#194

Draft
kalofoli wants to merge 59 commits into
NetworkBlockDevice:debianfrom
kalofoli:debian
Draft

Debian#194
kalofoli wants to merge 59 commits into
NetworkBlockDevice:debianfrom
kalofoli:debian

Conversation

@kalofoli
Copy link
Copy Markdown
Contributor

Added some new functionality I needed for my own server. Feel free to nitpick anything.
master branch:
* created nbd-server.service
* TLS write fixes
debian branch:
* debian packaging

I added some tags, which you may have to review/remove/edit.

eworm-de and others added 30 commits March 6, 2024 19:53
The shell template is no longer required to generate man pages, so more
it to systemd/ and ship it in tarball.

Signed-off-by: Wouter Verhelst <w@uter.be>
Commit 915444b introduced a regression
whereby an entry in nbdtab with no port specification was read as
wanting port "0" rather than the default "10809".
Currently, running "configure --disable-manpages" while docbook2man *is*
installed results in the error "don't know what to do here" when it
should disable manpages.

There also appears to be a missing conditional at the start of the line;
there's closing un-matched ]) at the end of the line. Still, at this
point the check can be done in pure shell; no need for AC macros. I've
also removed the confusing m4_divert_text call on the check case. Not
sure why that was there, but it appears unnecessary.
Now that we auto-free g_key_file stuff, we shouldn't manually free them
anymore.

Fixes: ab41c4f
Reported-By: Hilko Bengen <bengen@debian.org>
Older versions of GnuTLS did not support TLS1.3, and so we couldn't
update the version priority string to enable that by default, yet.

This now seems to no longer be a problem, so enable support for TLS1.3
by default while still disallowing TLS1.1 and below.
Disabling all versions of TLS and then enabling those versions that are
supported only means we get to do this again when (if ever) a new
version of TLS is defined.

Enabling all versions of TLS and then disabling those versions that are
*not* supported means we support it the moment GnuTLS supports it.
…ixes, remove unecessary heap allocation in open_treefile()
GCC-14 has promoted incompatible-pointer-types warning into error which is
now flagged especially with when building on musl

Fixes following error

| ../nbd-3.26.1/nbd-client.c: In function 'openunix':
| ../nbd-3.26.1/nbd-client.c:345:27: error: passing argument 2 of 'connect' from incompatible pointer type [-Wincompatible-pointer-types]
|   345 |         if (connect(sock, &un_addr, sizeof(un_addr)) == -1) {
|       |                           ^~~~~~~~
|       |                           |
|       |                           struct sockaddr_un *
| In file included from ../nbd-3.26.1/nbd-client.c:25:
| /mnt/b/yoe/master/build/tmp/work/core2-64-yoe-linux-musl/nbd/3.26.1/recipe-sysroot/usr/include/sys/socket.h:386:19: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_un *'
|   386 | int connect (int, const struct sockaddr *, socklen_t);
|       |                   ^~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Khem Raj <raj.khem@gmail.com>
nbd-client requires root to set up devices. The error message when we
are not root was less than helpful.

Fix.

Closes: NetworkBlockDevicegh-167

Signed-off-by: Wouter Verhelst <w@uter.be>
The daemon() helper function is widely implemented, but is not really
standardized and is actually somewhat buggy on some platforms.

Implementing the desired functionality is also not all that complicated.

So, do it ourselves.

Closes: NetworkBlockDevicegh-161
We have not looked at this in forever, and it probably does not work
with modern nbd clients anymore.
Somehow gcc figures out that nbd_err is err, because of the macro? not
sure.

This is confusing and black magic, stop doing it.
Upstream QEMU has moved the location of its NBD docs, as of its commit
8dac93a8e[1].  Instead of pointing to a raw git .txt source file that
no longer exists, we now point to the rendered html version built from
rST [2].

[1] https://gitlab.com/qemu-project/qemu/-/commit/8dac93a8e, see
also https://lists.gnu.org/archive/html/qemu-devel/2024-08/msg00223.html
[2] https://www.qemu.org/docs/master/interop/nbd.html

CC: qemu-devel@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Closes: NetworkBlockDevicegh-168
Signed-off-by: Wouter Verhelst <w@uter.be>
Add support for the netlink get status command to get the connection
status of one or all of the nbd devices in a system.

Signed-off-by: Josef Bacik <jbacik@fb.com>
[Forward-ported after seven (!) years by Wouter Verhelst]
Signed-off-by: Wouter Verhelst <w@uter.be>
Having a gazillion arguments to negotiate is unwieldy and
unmaintainable. Use the CLIENT type to handle parsed stuff instead.

Originally we were going to pass a pointer to a CLIENT, but we need to
keep the cur_client global variable for config file parsing, so might
as well reuse that.

Not so clean, so we might revisit this in the future, but at least this
reduces complexity somewhat.

Signed-off-by: Wouter Verhelst <w@uter.be>
When the nbd-client disconnects from a TLS connection, the gnutls_record_recv
function will return a zero value. Due to a faulty/missing check, this
causes the readit_tls call to enter an infinite loop, with all terrible
consequences that this has. This is a very problematic bug that causes a
full CPU usage, and is only treatable by killing the nbd-server.

This fix adds the missing check and an appropriate message that
terminates the forked server child graceously.

Signed-off-by: Janis Kalofolias <code@kalofolias.de>
When nbd-client tries to connect to nbd device, it talks to kernel
via netlink. If the nbd device is taken and locked by other process
like systemd-udevd, kernel will return EBUSY to nbd client. However,
nbd client just hide this error code with error message to check
dmesg logs.

Checking the dmesg logs is error-prone and not friendly for other
caller program. Instead, nbd-client should return the error code
to the caller to handle it properly.

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
The refactoring in commit 17043b0 causes nbd-client without -N to
segfault, instead of using the desired default export.

Signed-off-by: Eric Blake <eblake@redhat.com>
In order for userspace apps to idempotently ensure that a
netlink-managed NBD device corresponds to the device that the app is
expecting, kernel 5.14 added a netlink backend string identifier.
Expose the ability for setting this identifier when binding to a
device; the user can then check with /sys/block/nbdN/backend to see if
the contents match expectations.

Signed-off-by: Eric Blake <eblake@redhat.com>
At least nbdkit 1.42 has several scenarios where it can advertise a
minimum block size, but where block status results are not aligned to
that size.  While most of those instances are bugs fixed in the
upcoming 1.44, we have to consider the case when a server advertises
an image size which is not a multiple of the minimum block size.  The
spec is already clear that a server SHOULD advertise aligned sizes,
but when it doesn't, the requirement that block status results be
aligned is impossible to meet.  Relaxing the standard from MUST to
SHOULD warns clients to be prepared for weaknesses in the server, as
well as making it less troublesome to try and collect block status
even for an unaligned tail of an image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Copy-on-write can cause I/O errors and corruption, as
described in 'BRANCH'

First, copy-on-write can send corrupt data over the
network -- even though on-disk it's fine -- with
sequential reads.

Second, sparse-copy-on-write can fail to write
correctly to disk, when extending the file.
Commit 17043b0 ("Refactor the negotiate() and connected functions")
removed all parameters from openunix() and opennet(), but main()
still called them with host/port arguments. This causes build errors:

nbd-client.c:1224:32: error: too many arguments to function ‘openunix’; expected 0, have 1
nbd-client.c:1226:32: error: too many arguments to function ‘opennet’; expected 0, have 2

Update the calls in main() to match the new prototypes.

Fixes: 17043b0 ("Refactor the negotiate() and connected functions")
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Wouter Verhelst <w@uter.be>
yoe and others added 28 commits December 28, 2025 12:03
The original pull request did not see this code and therefore did not
include it. Fix.
Fixes: d294cda ("Depend on the nbd module being loaded")
We can't test the ioctl-based paths; but with some LD_PRELOAD tricks and
mocked libnl reimplementations, we *can* test the netlink-based paths.

(strictly speaking that's not true, we can also test the ioctl-based
paths in that way by mocking more, but fugly)

Signed-Off-By: Wouter Verhelst <w@uter.be>
netlink-get-status was a command written by Josef, but we already had a
check_conn function that did the same using the ioctl interface.

It's confusing to have two commands for that, so integrate the
connection status stuff into nbd-client and drop netlink-get-status
Uncomment the socket_wrapper and nss_wrapper checks in configure.ac and add custom GnuTLS push/pull functions to work better with socket_wrapper. Update test environment to use cwrap_test when both wrappers are available.
We set custom push/pull functions for GnuTLS so that it works under
socket-wrapper.

However, this has downsides:
- There is a slight performance impact
- We have also observed issues with nonblocking IO causing floods of
  EAGAIN errors in the test suite.

This is fine for a test environment, but not when running for real.

To avoid issues when the code is actually being used, only set the
custom push/pull functions when SOCKET_WRAPPER_DIR environment variable
is set, indicating that we're running under socket-wrapper.

Signed-Off-By: Wouter Verhelst <w@uter.be>
There is a "-p" mode for ioctl connections, which attempts to reconnect
to the server if the connection dies. It doesn't really work anymore
though.

The netlink interface has a "monitor" message that can get sent to a
multicast group, and a "reconfigure" message that can replace dropped
connections. Additionally, there is a "dead link timeout" option which
can be used to tell the kernel to block I/O to a device for X amount of
time.

The three combined allow for a much more flexible way of persist mode
using netlink.

Implement that, and a test for it.

TODO: test. It compiles, and seems to behave correctly against our
mocked libnl3, that's all I can say for now, but no guarantees (yet)
that it will actually work correctly against a kernel and/or an
unreliable network.

Signed-Off-By: Wouter Verhelst <w@uter.be>
…without /dev/ prefix

Signed-off-by: conanoc <conanoc@gmail.com>
Signed-Off-By: Wouter Verhelst <w@uter.be>
Adding missing check for graceously disconnected TLS transport, which
returns a zero write size. This solves a rare race condition, which
would arise if disconnect occurrs while in the writeit_tls loop; in that
case, the server would end in an infinite loop.
@kalofoli kalofoli closed this May 12, 2026
@kalofoli kalofoli reopened this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.