Help to determine if Mayastor is rebooting our servers #1912

@cmontemuino

Description

We have OpenEBS Mayastor installed in several K8S clusters, and recently we've started to experience multiple server reboots per day, which initially seem to be related to Mayastor.

Several details of our environment follow. Any ideas/suggestions are welcome.

This is what we always see in /var/crash/ files:

[10538.801189] general protection fault, probably for non-canonical address 0xb75d20f87555d73a: 0000 [#1] PREEMPT SMP NOPTI
[10538.801198] Workqueue: kblockd blk_mq_timeout_work
[10538.801205] RIP: 0010:blk_mq_queue_tag_busy_iter+0x6f/0x590
[10538.801209] Code: 00 65 48 ff 00 e8 c1 d8 bb ff 48 8b 44 24 08 48 8b 80 e0 03 00 00 f6 80 84 00 00 00 08 0f 84 ed 00 00 00 4c 8b a0 98 00 00 00 <41> 8b 54 24 04 85 d2 0f 85 bc 03 00 00 48 8b 44 24 08 41 8b 5c 24
[10538.801210] RSP: 0018:ff30d3e34fd1fda8 EFLAGS: 00010202
[10538.801212] RAX: ff2ba51a0701a130 RBX: ff2ba5196eca32a0 RCX: 0000000000000003
[10538.801213] RDX: ff2ba4d85297bac0 RSI: ffffffffbb1fc830 RDI: ff2ba51764a7a340
[10538.801214] RBP: ff2ba5196eca32a0 R08: ff2ba517070f8940 R09: ff2ba517070f8980
[10538.801215] R10: 0000000000000008 R11: 0000000000000008 R12: b75d20f87555d736
[10538.801216] R13: ff2ba555ff432e10 R14: ff2ba5170820b605 R15: 0000000000000000
[10538.801217] FS:  0000000000000000(0000) GS:ff2ba555ff400000(0000) knlGS:0000000000000000
[10538.801218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10538.801219] CR2: 00007f71a8002088 CR3: 0000004531ea4006 CR4: 0000000000773ef0
[10538.801221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10538.801221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10538.801222] PKRU: 55555554
[10538.801223] Call Trace:
[10538.801225]  <TASK>
[10538.801227]  ? show_trace_log_lvl+0x1c4/0x2df
[10538.801231]  ? show_trace_log_lvl+0x1c4/0x2df
[10538.801234]  ? blk_mq_timeout_work+0x74/0x1b0
[10538.801236]  ? __die_body.cold+0x8/0xd
[10538.801238]  ? die_addr+0x39/0x60
[10538.801242]  ? exc_general_protection+0x1ec/0x420
[10538.801247]  ? asm_exc_general_protection+0x22/0x30
[10538.801252]  ? __pfx_blk_mq_check_expired+0x10/0x10
[10538.801256]  ? blk_mq_queue_tag_busy_iter+0x6f/0x590
[10538.801257]  ? blk_mq_queue_tag_busy_iter+0x55a/0x590
[10538.801258]  ? __pfx_blk_mq_check_expired+0x10/0x10
[10538.801260]  ? pick_next_task_idle+0x26/0x40
[10538.801263]  ? pick_next_task+0x9f9/0xaf0
[10538.801266]  ? dequeue_task_fair+0xaa/0x370
[10538.801269]  ? __switch_to_asm+0x3a/0x80
[10538.801272]  blk_mq_timeout_work+0x74/0x1b0
[10538.801275]  process_one_work+0x194/0x380
[10538.801278]  worker_thread+0x2fe/0x410
[10538.801280]  ? __pfx_worker_thread+0x10/0x10
[10538.801282]  kthread+0xdd/0x100
[10538.801286]  ? __pfx_kthread+0x10/0x10
[10538.801289]  ret_from_fork+0x2c/0x50
[10538.801292]  </TASK>

Hardware: Dell PowerEdge R650xs/0441XG, BIOS 1.18.1
Disks: 6 x SAS HDD 2235 GB
OS: Oracle Linux 9 (kernel 5.14.0-570.44.1.0.1.el9_6.x86_64)
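
If it helps with triage, the vmcore can be opened with the crash utility for deeper inspection (a sketch, assuming the kernel-debuginfo package matching the crashed kernel is installed and kdump wrote the dump under a /var/crash/<timestamp>/ directory):

  # <timestamp> is a placeholder for the actual kdump directory name
  crash /usr/lib/debug/lib/modules/5.14.0-570.44.1.0.1.el9_6.x86_64/vmlinux /var/crash/<timestamp>/vmcore
  # inside crash: 'bt' shows the backtrace of the crashing kblockd worker,
  # 'mod' confirms whether nvme_tcp/nvme_fabrics were loaded at the time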

We had 6 DiskPools initially. We then reduced them to a single one, but the servers still keep rebooting.
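
For reference, this is how we list what's left from the Kubernetes side (assuming the DiskPool CRD installed by the chart and the mayastor namespace used elsewhere in this issue):

  kubectl -n mayastor get diskpools.openebs.io
  # or, with the kubectl-mayastor plugin installed:
  kubectl mayastor get pools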

The io-engine is configured with the following environment variables:

  env:
    - name: RUST_LOG
      value: info
    - name: NVMF_TCP_MAX_QPAIRS_PER_CTRL
      value: "6"
    - name: NVMF_TCP_MAX_QUEUE_DEPTH
      value: "32"
    - name: NVME_TIMEOUT
      value: 200s
    - name: NVME_TIMEOUT_ADMIN
      value: 90s
    - name: NVME_KATO
      value: 45s

We've increased NVME_TIMEOUT to be well above the kernel's timeout. We also increased the keep-alive (NVME_KATO) to 45s (the default is 10s, which looked a bit aggressive).
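
For comparison, this is roughly how we check the kernel-side initiator timeouts on the worker nodes (assuming the in-tree nvme_core module; NVME_TIMEOUT should stay well above io_timeout):

  # kernel NVMe I/O and admin timeouts, in seconds
  cat /sys/module/nvme_core/parameters/io_timeout
  cat /sys/module/nvme_core/parameters/admin_timeout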

io-engine args:

  - args:
    - -g$(MY_POD_IP)
    - -N$(MY_NODE_NAME)
    - -Rhttps://mayastor-agent-core:50051
    - -y/var/local/mayastor/io-engine/config.yaml
    - -l1,2,3,4,5,6
    - -p=mayastor-etcd:2379
    - --ptpl-dir=/var/local/mayastor/io-engine/ptpl/
    - --api-versions=v1
    - --tgt-crdt=30
    - --events-url=nats://mayastor-nats:4222
# ....
resources:
  limits:
    hugepages-2Mi: 2Gi
    memory: 4Gi
  requests:
    cpu: "8"
    hugepages-2Mi: 2Gi
    memory: 4Gi

We're allocating considerably more compute and memory than what's normally needed.
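
To double-check the node side of that, we verify the 2Mi hugepages reservation roughly like this (sketch; <node-name> is a placeholder):

  # on the worker node
  grep -i huge /proc/meminfo
  # what the kubelet advertises as allocatable
  kubectl describe node <node-name> | grep -i hugepages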

Pool list on the server:

NAME                                             UUID                                 STATE       CAPACITY         USED DISKS
mayastor-pool-on-live-k8sworker-010-disk-0 60a2937d-f0e9-4520-9f36-b1bdaafe47e8 online 2397468360704 214748364800 aio:///dev/disk/by-id/scsi-36f4ee080549f55003063f8a54b1a0146?uuid=668e3006-ac61-4a6f-b66a-9d59377fa2eb

Pool stats:

NAME                                             NUM_RD_OPS TOTAL_RD NUM_WR_OPS TOTAL_WR  NUM_UNMAP_OPS TOTAL_UNMAPPED RD_LAT   WR_LAT    UNMAP_LATENCY MAX_RD_LAT MIN_RD_LAT MAX_WR_LAT MIN_WR_LAT
mayastor-pool-on-live-k8sworker-010-disk-0 594609     2.65 GiB 1388440    81.42 GiB 0             0 B            47065162 208191293 0             592752     15         13683      15

Labels

kind/documentation (Improvements or additions to documentation), os/linux/bug (a bug on the linux kernel)
