
Stuck in the near data compaction #8

@Zivvv


Hello!

I ran dLSM following the README, but db_bench seems to get stuck in near-data compaction.
Here are the logs from the server and the compute node.
I have tested RDMA with perftest, and the connection works.
Did I miss something?
Thank you!

root@node005:~/dLSM/build$ ./Server
searching for IB devices in host
found 1 device(s)
device not specified, using first one found: mlx5_0
New MR was registered with addr=0x7f060fb95010, lkey=0xa362, rkey=0xa362, flags=0x7, size=10240000, total registered size is 0
New MR was registered with addr=0x7f060f1d0010, lkey=0x6e2d, rkey=0x6e2d, flags=0x7, size=10240000, total registered size is 10240000
SST buffer, send&receive buffer were registered with a
maximum outstanding wr number is32768
maximum query pair number is131072
maximum completion queue number is16777216
maximum memory region number is16777216
maximum memory region size is18446744073709551615
checkpoint0connection built up from192.168.6.744687
connection family is 2
A new shared memory thread start
checkpoint1checkpoint1QP was created, QP number=0x37
checkpoint2
Local LID = 0x0
total bytes: 23read byte: 23Remote QP number = 0x38
Remote LID = 0x0
Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22
QP 0x7f0608002278 state was change to RTS
The connected compute node's id is 1
Polling sync option handlerory 87 GB
total bytes: 1read byte: 1Option sync finished
Polling sync option handler
Option sync finished
compute node sync number is 1Register memory for computing node
create query pair command receive for
Remote QP number=0x39
Remote LID = 0x0
QP was created, QP number=0x38
Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22
QP 0x7f060801d678 state was change to RTS
create query pair command receive for
Remote QP number=0x3a
Remote LID = 0x0
QP was created, QP number=0x39
Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22
QP 0x7f0608025da8 state was change to RTS
create query pair command receive for
Remote QP number=0x3b
Remote LID = 0x0
QP was created, QP number=0x3a
Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22
QP 0x7f060802e638 state was change to RTS
create query pair command receive for
Remote QP number=0x3c
Remote LID = 0x0
QP was created, QP number=0x3b
Remote GID =fe:80:00:00:00:00:00:00:96:6d:ae:ff:fe:15:94:22
QP 0x7f0608036d88 state was change to RTS
near data compaction
Register memory for computing node
number 0 got bad completion with status: 0xc, vendor syndrome: 0x81
RDMA Write Failed
q id is
QP number=0x37
Register memory for computing node
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
RDMA Write Failed
q id is
QP number=0x37
corrupt message from client. 0ent
Polling Remote Compaction content

root@node007:~/dLSM/build# ./db_bench --benchmarks=fillrandom --threads=1 --value_size=400 --num=50000000 --bloom_bits=10 --readwritepercent=5 --compute_node_id=0 --fixed_compute_shards_num=0
Mark: valgrind socket info1
searching for IB devices in host
found 1 device(s)
device not specified, using first one found: mlx5_0
New MR was registered with addr=0x7fc2322dc010, lkey=0xb978, rkey=0xb978, flags=0x7, size=10240000, total registered size is 0
New MR was registered with addr=0x7fc231917010, lkey=0x7535, rkey=0x7535, flags=0x7, size=10240000, total registered size is 10240000
SST buffer, send&receive buffer were registered with a
maximum outstanding wr number is32768
maximum query pair number is131072
maximum completion queue number is16777216
maximum memory region number is16777216
maximum memory region size is18446744073709551615
Success to connect to 192.168.6.5
TCP connection was established
connect to node id 0QP was created, QP number=0x38

Local LID = 0x0
total bytes: 23read byte: 23Remote QP number = 0x37
Remote LID = 0x0
Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72
QP 0x7fc22c0022b8 state was change to RTS
total bytes: 1read byte: 1Finish the connection with node 0
New MR was registered with addr=0x7fc1ebfff010, lkey=0x9d11, rkey=0x9d11, flags=0x7, size=1073741824, total registered size is 20480000
dLSM: version 1.22
Date: Wed Jul 10 06:23:09 2024
Start to sync options
client handling thread
CPU: 64 * Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
CPUCache:
Keys: 16 bytes each
Values: 400 bytes each (200 bytes after compression)
Entries: 50000000
RawSize: 19836.4 MB (estimated)
FileSize: 10299.7 MB (estimated)
WARNING: Snappy compression is not enabled

DBImpl start
New MR was registered with addr=0x7fc1e9ffe010, lkey=0xcf8e, rkey=0xcf8e, flags=0x7, size=33554432, total registered size is 1094221824
Memory used up, Initially, allocate new one, memory pool is Version_edit, total memory this pool is 1
communication thread created
DBImpl finished
DBImpl deallocated
Total number of entries within the cahce is 0DBImpl start
communication thread created
DBImpl finished
validation write finished
start front-end threads
Wait for thread start
total bytes: 1read byte: 1sync wait time is 384180Threads start to run
New MR was registered with addr=0x7fc197fff010, lkey=0x9e12, rkey=0x9e12, flags=0x7, size=1073741824, total registered size is 1127776256
Memory used up, Initially, allocate new one, memory pool is FlushBuffer, total memory this pool is 1
New MR was registered with addr=0x7fc13ffff010, lkey=0xac13, rkey=0xac13, flags=0x7, size=1073741824, total registered size is 2201518080
Memory used up, Initially, allocate new one, memory pool is IndexChunk, total memory this pool is 1
New MR was registered with addr=0x7fc0effff010, lkey=0x14a14, rkey=0x14a14, flags=0x7, size=1073741824, total registered size is 3275259904
Memory used up, Initially, allocate new one, memory pool is FilterChunk, total memory this pool is 1
Remote memory registeration, size: 1073741824
polled reply bufferr
QP was created, QP number=0x39

QP num to be sent = 0x39
Local LID = 0x0
QP was created, QP number=0x3a

QP num to be sent = 0x3a
Local LID = 0x0
QP was created, QP number=0x3b
Polling reply buffer
QP num to be sent = 0x3b
Local LID = 0x0uffer
QP was created, QP number=0x3c
Polling reply buffer
QP num to be sent = 0x3c
Local LID = 0x0uffer
Remote QP number=0x38
Remote LID = 0x0ffer
Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72
QP 0x7fc1d80099e8 state was change to RTS
Remote QP number=0x39
Remote LID = 0x0ffer
Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72
QP 0x7fc180041538 state was change to RTS
Remote QP number=0x3a
Remote LID = 0x0ffer
Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72
QP 0x7fc138041538 state was change to RTS
Remote QP number=0x3b
Remote LID = 0x0
Remote GID =fe:80:00:00:00:00:00:00:ba:3f:d2:ff:fe:56:ee:72
QP 0x7fc188041538 state was change to RTS
number 0 got bad completion with status: 0xc, vendor syndrome: 0x81
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0xc, vendor syndrome: 0x81
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0xc, vendor syndrome: 0x81
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0xc, vendor syndrome: 0x81
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 3 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 4 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 5 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 6 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 7 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 8 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 9 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074Remote memory registeration, size: 1073741824
polled reply bufferr
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 0 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 1 got bad completion with status: 0x5, vendor syndrome: 0xf9
number 2 got bad completion with status: 0x5, vendor syndrome: 0xf9
BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074BloomFilter block size is 192074Remote memory registeration, size: 1073741824
Polling reply buffer ops
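The `bad completion` lines are easier to read once the `status` values are decoded. The sketch below assumes that dLSM prints the raw `ibv_wc.status` field from libibverbs. This has not been checked against the dLSM source. Under that assumption, the script counts each status/vendor-syndrome pair in a log. The script name `decode_wc.py` and the `summarize` helper are hypothetical, and only a subset of libibverbs' `enum ibv_wc_status` is listed.

```python
import re
import sys
from collections import Counter

# Subset of enum ibv_wc_status from libibverbs <infiniband/verbs.h>.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches log lines such as:
# "number 0 got bad completion with status: 0xc, vendor syndrome: 0x81"
BAD_CQE = re.compile(
    r"bad completion with status: (0x[0-9a-fA-F]+), "
    r"vendor syndrome: (0x[0-9a-fA-F]+)"
)


def summarize(log_text):
    """Count (status name, vendor syndrome) pairs found in a log (hypothetical helper)."""
    counts = Counter()
    for m in BAD_CQE.finditer(log_text):
        status = int(m.group(1), 16)  # int() accepts the "0x" prefix with base 16
        name = IBV_WC_STATUS.get(status, f"unknown({status})")
        counts[(name, m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 decode_wc.py compute_node.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for (name, syndrome), n in summarize(f.read()).most_common():
                print(f"{n:5d}  {name}  vendor syndrome {syndrome}")
```

In libibverbs, status 0xc is `IBV_WC_RETRY_EXC_ERR` and 0x5 is `IBV_WC_WR_FLUSH_ERR`. When a QP enters the error state, its outstanding work requests complete with `IBV_WC_WR_FLUSH_ERR`. The first 0xc completion in each burst is therefore likely the original failure, and the 0x5 entries that follow are probably its consequences.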

Thank you for your time; I appreciate your help!
