Will newer DRBD userland break older kernels? #99

Open
@bernardgut

Description

Hello

Sorry in advance if this is obvious, but I have been testing Linstor/DRBD on Talos for 2 weeks now and I have had a lot of issues with piraeus-operator (v2.6.0) on my Talos (1.7.6) cluster. Namely:

  • Random quorum loss on volumes: replicas get stuck in Connecting(<nodeID>)/Unconnected(<nodeID>), with no errors reported by drbdadm.
  • Volumes that randomly get stuck in "Terminating" when I delete them, with the events on the PV stating: Warning VolumeFailedDelete 4m42s linstor.csi.linbit.com_linstor-csi-controller-84674bd55b-4kd2n_20cbf029-05ef-4869-bf72-b9782a25f513 (combined from similar events): rpc error: code = Internal desc = failed to delete volume: Message: 'Resource 'pvc-a423738b-8249-48ae-8a57-a708f87c98e5' is still in use.'; Cause: 'Resource is mounted/in use.'; Details: 'Node: n2, Resource: pvc-a423738b-8249-48ae-8a57-a708f87c98e5'; Correction: 'Un-mount resource 'pvc-a423738b-8249-48ae-8a57-a708f87c98e5' on the node 'n2'.'; Reports: '[66EEEB28-00000-000019]' (I can share the error reports if you want).

Because I seemed to be the only one seeing these, I assumed I must have made a mistake somewhere in my config.

After investigating for a week, I found out that when you install mainline Talos (currently 1.7.6) and set up the drbd kernel module, you get version 9.2.8, while the DRBD version currently packaged with piraeus-operator is 9.2.11. This is easy to miss: you have to look for the version in the image tags. Talos also does not push extension updates to already-released versions of Talos, so you have to wait for a new Talos release to get the latest DRBD version.

I now assume this is the reason why I am seeing all these issues. But before I open a PR against the Talos section of the piraeus-operator docs to add a warning, so that other people who are new to this don't hit the same issue, I need to confirm this:

  • Can running DRBD 9.2.11 userland against a DRBD 9.2.8 kernel module cause these kinds of issues?

I am pretty new to DRBD, so apologies if the answer is obvious. Either way, if the answer is yes, then I would suggest adding a small printk("drbd kernel version mismatch <9.X.X> vs <9.X.Y>") so that the mismatch shows up in dmesg. I can even open the PR myself if you show me in which file to do it.
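
To make the suggestion concrete, here is a rough user-space sketch of the comparison I have in mind. It is not DRBD code: the function names are made up for illustration, and I am assuming that both version strings can be obtained somehow (for example from the module's version line and from the userland tools). It only flags that the two versions differ and says nothing about whether a given mismatch is actually harmful, which is the question above.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Optional[Version]:
    """Return the first X.Y.Z version found in text, or None if there is none."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def mismatch_warning(kernel_text: str, userland_text: str) -> Optional[str]:
    """Return a warning string if the two versions differ, else None.

    Hypothetical helper: the message format mirrors the printk suggested
    in this issue, it is not an existing DRBD message.
    """
    kernel = parse_version(kernel_text)
    userland = parse_version(userland_text)
    if kernel is None or userland is None:
        return "drbd: could not determine kernel or userland version"
    if kernel != userland:
        return "drbd kernel version mismatch <%d.%d.%d> vs <%d.%d.%d>" % (
            kernel + userland
        )
    return None


if __name__ == "__main__":
    # Example inputs only; the exact format of the kernel version line is an assumption.
    print(mismatch_warning("version: 9.2.8 (api:2/proto:86-122)", "9.2.11"))
```

With the versions from my cluster (9.2.8 on the kernel side, 9.2.11 packaged with the operator), this would produce the kind of one-line warning I would have liked to see in the logs.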

Thanks
Bernard.
