Add qcow2 as supported format by xcp-rrdd-iostat #6

gthvn1 · 2025-02-17T12:21:00Z

As XCP-ng now supports qcow2 files, xcp-rrdd-iostat generates an error like: "/usr/sbin/tap-ctl list" returned a line that could not be parsed. Ignoring.

This is because tap-ctl list can return strings like:

"1564848 0 0 qcow2 /var/run/sr-mount/..."

This patch allows the qcow2 type.
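As a rough illustration of the change described above, here is a minimal OCaml sketch of a line parser that accepts `qcow2` next to other image types. It is not the actual xcp-rrdd-iostat code: the `tap_kind` and `tapdev` types, the `parse_line` function, the `vhd`/`aio` type names, the five-field layout (inferred from the sample line), and the example path are all assumptions made for illustration.

```ocaml
(* Hypothetical sketch, not the real xcp-rrdd-iostat parser. The field layout
   "pid minor state type path" is assumed from the sample line in this PR. *)
type tap_kind = Vhd | Aio | Qcow2

type tapdev = {pid: int; minor: int; state: int; kind: tap_kind; path: string}

(* Accept "qcow2" in addition to the (assumed) pre-existing type names. *)
let kind_of_string = function
  | "vhd" -> Some Vhd
  | "aio" -> Some Aio
  | "qcow2" -> Some Qcow2
  | _ -> None

(* Returns None for any line that does not match the assumed layout, so the
   caller can log it and carry on. *)
let parse_line line =
  match String.split_on_char ' ' (String.trim line) with
  | pid :: minor :: state :: kind :: (_ :: _ as rest) -> (
      (* Re-join the remainder so that a path containing spaces survives. *)
      let path = String.concat " " rest in
      match
        ( int_of_string_opt pid
        , int_of_string_opt minor
        , int_of_string_opt state
        , kind_of_string kind )
      with
      | Some pid, Some minor, Some state, Some kind ->
          Some {pid; minor; state; kind; path}
      | _ -> None )
  | _ -> None

let () =
  (* The path below is a made-up placeholder. *)
  match parse_line "1564848 0 0 qcow2 /var/run/sr-mount/example" with
  | Some d -> Printf.printf "minor %d: %s\n" d.minor d.path
  | None -> prerr_endline "could not parse line; ignoring"
```

Returning an option instead of raising keeps the "could not be parsed, ignoring" behaviour explicit at the call site.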

gthvn1 · 2025-02-17T12:21:49Z

I didn't find a way to reopen #5, so this is opened as a new PR.

coveralls · 2025-02-17T12:26:16Z

Pull Request Test Coverage Report for Build 14661243996

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 77.719%

Totals
Change from base Build 14656120106:	0.0%
Covered Lines:	3359
Relevant Lines:	4322

💛 - Coveralls

Backport of 3b52b72

This enables PAM to be used in multithreaded mode (currently XAPI has a global lock around auth).

Using an off-cpu flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`. `pam_authenticate` calls `crypt_r` which calls `NSSLOW_Init` which on first use will try to initialize the just `dlopen`-ed library. If it encounters a race condition it does a `sleep(1)`. This race condition can be quite reliably reproduced when performing a lot of PAM authentications from multiple threads in parallel.

GDB can also be used to confirm this by putting a breakpoint on `sleep`:

```
#0 __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42
#1 0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122
#2 0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131
#3 NSSLOW_Init () at lowhash_vector.c:148
#4 0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0",
#5 0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>,
#6 0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111
#7 0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777
#8 0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178
#9 0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110
#10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426
#11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34
#12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83
#13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42
```

`pam_start` and `pam_end` don't help here, because on `pam_end` the library is `dlclose`-ed, so on the next `pam_authenticate` it will have to go through the initialization code again. (This initialization code would've belonged in `pam_start`, not `pam_authenticate`, but there are several layers here including a call to `crypt_r`.)

Upstream fixed this problem >5 years ago by switching to libxcrypt instead.

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Christian Lindig <[email protected]>

The actual location is currently missing from the backtrace, which is the bug CA-409628. Signed-off-by: Edwin Török <[email protected]>

There are 2 log_backtrace functions in the Debug module: one of them prints the full backtrace, the other one prints the backtrace just from the last time Backtrace.is_important got called, i.e. it drops the *important* part of the backtrace.

Delete the buggy function and replace it with a function that always takes an exception, so that we can look up any stashed backtrace.

There are good reasons why Backtrace.is_important wipes the current backtrace after it stashes it away (destroying the important part):

* if it didn't, then every time Backtrace.is_important got called we'd get the same backtrace prefix appended multiple times
* for cross-process/language backtraces to work you need to use Backtrace.get to retrieve the stashed backtrace when printing, and not use the OCaml one directly from Printexc.

But that means you need to be disciplined in how you deal with backtraces:

* catching and reraising the same exception with the same backtrace using Printexc.rawbacktrace is fine
* the first statement in an exception handler needs to be either Backtrace.is_important, or Debug.log_backtrace (which calls is_important internally)

Using Printexc.get_backtrace/Printexc.get_rawbacktrace for printing purposes has to be avoided if we ever called Backtrace.is_important.

The updated unit test shows that we properly print line 8 as the source of the exception now. This fixes backtraces from xcp-rrdd.

Signed-off-by: Edwin Török <[email protected]>
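The "first statement in the handler" rule can be illustrated with the OCaml standard library alone. The sketch below deliberately does not use the xapi `Backtrace` or `Debug` modules; it only shows, under that simplification, why a backtrace has to be captured before anything else runs in the handler. The names `cleanup`, `failing`, `careless` and `disciplined` are hypothetical.

```ocaml
(* Stdlib-only sketch of the "capture first" rule; it does not use the xapi
   Backtrace module. All function names here are made up for illustration. *)

(* Raises and handles an unrelated exception internally. When backtrace
   recording is on, a raise replaces the runtime's most recent backtrace. *)
let cleanup () =
  try ignore (List.find (fun x -> x > 10) [1; 2; 3]) with Not_found -> ()

let failing () = invalid_arg "index out of bounds"

(* Careless: other code runs before the backtrace is read, so the string
   returned may describe the Not_found from [cleanup] rather than [failing]. *)
let careless () =
  try failing () with _ -> cleanup () ; Printexc.get_backtrace ()

(* Disciplined: the backtrace is captured by the first statement of the
   handler, and the same exception is re-raised together with it. *)
let disciplined () =
  try failing ()
  with e ->
    let bt = Printexc.get_raw_backtrace () in
    cleanup () ;
    Printexc.raise_with_backtrace e bt

let () =
  Printexc.record_backtrace true ;
  try disciplined ()
  with e ->
    (* Again the capture is the first statement of the handler. *)
    let bt = Printexc.get_raw_backtrace () in
    prerr_endline (Printexc.to_string e) ;
    prerr_string (Printexc.raw_backtrace_to_string bt)
```

Whether line numbers appear in the printed trace depends on the program being built with debug information.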

These shouldn't be necessary anymore, since no exception is raised. Signed-off-by: Edwin Török <[email protected]>

We should only log_backtrace if we are the final handler. The exception is raised here, so the caller will have a chance to log it. This was also inconsistent: some *_interface logged the backtrace, and others didn't. In theory there is a chance that the caller is buggy and doesn't log the correct backtrace. But if we simplify the places that call the Backtrace module, we'll have fewer chances of that going wrong. Signed-off-by: Edwin Török <[email protected]>

response_internal_error already calls Backtrace.is_important in the correct place, and logs the exception. There is no need to do that a 2nd time in the caller. Signed-off-by: Edwin Török <[email protected]>

This shows how brittle the current Backtrace API is, this was missing from a lot of places. We have some better alternatives (`with_backtraces`, or a `try_with` function) that'd guarantee that `important` is always called in the right place, but that would be a more invasive change, which will be done in a followup commit. Signed-off-by: Edwin Török <[email protected]>

This was not set properly when it was first introduced in xapi-24.15.0 Signed-off-by: Vincent Liu <[email protected]>

Signed-off-by: Vincent Liu <[email protected]>

…mportant part of the backtrace (xapi-project#6430)

Before/after can be seen clearly in the newly added unit test:

* previously line 8 was not printed anywhere, even though that is the actual place the exception is raised
* the whole backtrace was printed on a single line, which can be quite annoying

After:

* line 8 is present
* the correct backtrace printer is used that prints each stack frame on a separate line with a counter

```
diff --git a/ocaml/libs/log/test/log_test.t b/ocaml/libs/log/test/log_test.t
index f25ee70..fbdeebf 100644
--- a/ocaml/libs/log/test/log_test.t
+++ b/ocaml/libs/log/test/log_test.t
@@ -1,4 +1,9 @@
   $ ./log_test.exe | sed -re 's/[0-9]+T[0-9:.]+Z//'
-  [|debug||0 |main|log_test.ml] Raised at Xapi_stdext_pervasives__Pervasiveext.finally in file \"ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml\", line 39, characters 6-15\nCalled from Dune__exe__Log_test.(fun) in file \"ocaml/libs/log/test/log_test.ml\", line 15, characters 4-55\n
+  [|error||0 |main|backtrace] Raised Invalid_argument("index out of bounds")
+  [|error||0 |main|backtrace] 1/4 log_test.exe Raised at file ocaml/libs/log/test/log_test.ml, line 8
+  [|error||0 |main|backtrace] 2/4 log_test.exe Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
+  [|error||0 |main|backtrace] 3/4 log_test.exe Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
+  [|error||0 |main|backtrace] 4/4 log_test.exe Called from file ocaml/libs/log/test/log_test.ml, line 15
+  [|error||0 |main|backtrace]
```

The `expected_votes` field in corosync represents the number of hosts that the cluster stack expects. In the context of corosync, this is the same as the number of hosts listed in the corosync.conf file*. This is a useful field to expose to the user so that they can see how many nodes are actually expected.

We also have the `Cluster_host` object, which represents xapi's view of which nodes should be in the cluster, but that might not be identical to corosync's view, especially when a host is disabled but is still left in the list of Cluster_host objects. Although one could argue that we could infer this `expected_votes` field from the number of enabled Cluster_hosts, it might still be useful to get this information directly from corosync.

*: there are ways in corosync to make one host cast multiple votes, but that feature is not used.

Signed-off-by: Vincent Liu <[email protected]>
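To make the difference between the two views concrete, here is a small OCaml sketch that infers a vote count from xapi's side and compares it with the value corosync reports. The `cluster_host` record is a cut-down, hypothetical stand-in for the real `Cluster_host` object, and `inferred_votes` and `views_agree` are illustrative names, not xapi functions.

```ocaml
(* Hypothetical record: a cut-down stand-in for xapi's Cluster_host object. *)
type cluster_host = {hostname: string; enabled: bool}

(* Infer the vote count from xapi's view: one vote per enabled host, assuming
   each host casts exactly one vote as the commit message states. *)
let inferred_votes hosts =
  List.length (List.filter (fun h -> h.enabled) hosts)

(* Compare xapi's inferred count with the value reported by corosync. *)
let views_agree ~expected_votes hosts = inferred_votes hosts = expected_votes

let () =
  let hosts =
    [ {hostname= "host-a"; enabled= true}
    ; {hostname= "host-b"; enabled= true}
    ; {hostname= "host-c"; enabled= false} ]
  in
  Printf.printf "inferred=%d agree_with_3=%b\n" (inferred_votes hosts)
    (views_agree ~expected_votes:3 hosts)
```

When the two numbers disagree, reading `expected_votes` straight from corosync shows the cluster stack's own figure without relying on the inference.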

This function updates the snapshot related db fields after the storage migration. There is no need to leave this in the storage layer as xapi-storage-script will not be able to access xapi db. Signed-off-by: Vincent Liu <[email protected]>

Move this to storage_utils.ml since it is used by storage_smapiv1.ml and storage_mux.ml Signed-off-by: Vincent Liu <[email protected]>

Signed-off-by: Vincent Liu <[email protected]>

Extract common logic on finding vdi_info given vdi, and also add a parameter to specify where to find the VDI (locally or remotely). Signed-off-by: Vincent Liu <[email protected]>

Move the update_snapshot_info_dest to storage mux as this function just does db operations. Also rescan the SR after updating the content_id during SXM, so that the latest content_id can be reflected in the returned vdi_info, which gets used later on in `update_snapshot_info`.

Since we now support qcow2 files, `tap-ctl list` can return strings like:

- "1564848 0 0 qcow2 /var/run/sr-mount/..."

Without this patch the type "qcow2" is unknown and xcp-rrdd-iostat generates an error like: returned a line that could not be parsed. Ignoring

This patch fixes the issue.

Signed-off-by: Guillaume <[email protected]>

gthvn1 self-assigned this Feb 17, 2025

gthvn1 mentioned this pull request Apr 11, 2025

Add qcow2 as supported format by xcp-rrdd-iostat #5

Closed

edwintorok and others added 17 commits April 17, 2025 17:15

CA-409628: Add backtrace logging test

bfea6f3

CA-409628: remove leftover log_backtrace from find->find_opt conversion

d5c00db

CA-409628: remove duplicate exception backtrace

eb5604b

Update cluster-stack-version lifecycle

571f36a

Update datamodel lifecycle

b18a1c9

Update cluster-stack-version datamodel lifecycle (xapi-project#6436)

0ab649c

Move update_snapshot_info_dest to storage_mux

f0d21ae

Refactor Storage_smapiv1.find_vdi

0a21358

Use the new scan2

7471d40

Refactor Storage_migrate.find_vdi

ad25956

gthvn1 force-pushed the gtn-add-qcow-to-xcp-rrdd-iostat branch from 85dd1d8 to 42ea6bb Compare April 24, 2025 13:39

gthvn1 force-pushed the gtn-add-qcow-to-xcp-rrdd-iostat branch from 42ea6bb to 6a1d8af Compare April 25, 2025 09:18

gthvn1 closed this Apr 29, 2025

gthvn1 deleted the gtn-add-qcow-to-xcp-rrdd-iostat branch April 29, 2025 08:15
