Replies: 8 comments 25 replies
-
Some possibilities for the writer that is racing with ...
-
So the subtlety here is that the value of the ...
-
@pcd1193182 so, if I understand it correctly, one needs to hold the lock while accessing the field, because if we don't hold it, the contents can change underneath us. If that's the case, then here are the places which don't do this properly:
I mean, ZFS is obviously overwriting random memory in our deployment on Linux currently, and this would be a nice explanation of why. It seems that the code has historically been optimized to take locks only where strictly necessary, but as the code evolves the context can change to contradict the original assumptions. The level of complexity here is, umm... something like RCU would be handy, at least for the ...
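Just to illustrate what I mean by that (purely a sketch of Linux-style RCU pointer publication, not anything ZFS does today, and every name other than the RCU primitives themselves is made up): readers would dereference the buffer pointer inside an RCU read-side section, and the writer would publish a replacement and only free the old buffer after a grace period.

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Hypothetical RCU-managed buffer pointer -- NOT a real dbuf field. */
struct demo_dbuf {
	void __rcu *db_data_rcu;
};

/* Reader: no mutex, just an RCU read-side critical section. */
static u64
demo_read_first_word(struct demo_dbuf *db)
{
	u64 v;

	rcu_read_lock();
	v = *(u64 *)rcu_dereference(db->db_data_rcu);
	rcu_read_unlock();

	return (v);
}

/* Writer: publish the new buffer, free the old one after a grace period. */
static void
demo_replace_buffer(struct demo_dbuf *db, void *new_buf)
{
	void *old = rcu_dereference_protected(db->db_data_rcu, 1);

	rcu_assign_pointer(db->db_data_rcu, new_buf);
	synchronize_rcu();	/* wait until no reader can still see 'old' */
	kfree(old);
}
```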
-
So it sounds like there are a few problems:
Based on @pcd1193182's explanation from two days ago, but not considering dbuf state, I've made the following table of functions. Included is every function that directly accesses ...
-
#17209 adds some assertions to functions that need db_mtx and do hold it, but don't currently assert that. I've also got a local branch that builds on that and adds lock acquisitions where needed; I'll open a PR after #17209 merges.
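For reference, a minimal sketch of the kind of assertion being talked about (this is not the actual #17209 patch, and dbuf_peek_data() is a made-up helper for illustration): any function that dereferences db->db_data first asserts that the caller holds db_mtx, so a missing lock trips immediately in debug builds instead of racing silently.

```c
/*
 * Illustration only -- not code from #17209.  The helper name is
 * hypothetical; the ASSERT/MUTEX_HELD pattern is the same one that
 * dbuf.c already uses elsewhere.
 */
static void *
dbuf_peek_data(dmu_buf_impl_t *db)
{
	/* Caller must hold db_mtx for db->db.db_data to be stable. */
	ASSERT(MUTEX_HELD(&db->db_mtx));
	return (db->db.db_data);
}
```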
-
My local patch, which adds additional lock acquisitions similar to @snajpa's, is now running in production on one machine. If it doesn't introduce any new problems for one week, then I'll deploy it more widely.
-
@asomers can you please take a look at the current state of the code here? -> module/zfs/dbuf.c, line 3111 in a497c5f. That's the loop that waits on db_changed while the dbuf is in DB_READ or DB_FILL. Assuming I haven't made a mistake and have this right, I would try two things:

```diff
diff --git a/module/zfs/dbuf.c b/module/zfs/dbuf.c
index 1a7274968..68b1a0aab 100644
--- a/module/zfs/dbuf.c
+++ b/module/zfs/dbuf.c
@@ -3116,11 +3116,10 @@ dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx)
 	ASSERT3U(arc_buf_lsize(buf), ==, db->db.db_size);
 	ASSERT(tx->tx_txg != 0);
 
+	mutex_enter(&db->db_mtx);
 	arc_return_buf(buf, db);
 	ASSERT(arc_released(buf));
 
-	mutex_enter(&db->db_mtx);
-
 	while (db->db_state == DB_READ || db->db_state == DB_FILL)
 		cv_wait(&db->db_changed, &db->db_mtx);
 
@@ -3252,9 +3251,9 @@ dbuf_destroy(dmu_buf_impl_t *db)
 	 * the hash table. We can now drop db_mtx, which allows us to
 	 * acquire the dn_dbufs_mtx.
 	 */
+	DB_DNODE_ENTER(db);
 	mutex_exit(&db->db_mtx);
 
-	DB_DNODE_ENTER(db);
 	dn = DB_DNODE(db);
 	dndb = dn->dn_dbuf;
 	if (db->db_blkid != DMU_BONUS_BLKID) {
```

The first is to take db_mtx before arc_return_buf() in dbuf_assign_arcbuf(), and the second is to do DB_DNODE_ENTER(db) before dropping db_mtx in dbuf_destroy(). The second part, if this works, could maaaaybe mitigate #12078?
-
So even with all the patches that we've both been floating around (incl. current #17209 as it is now, 3460c6c), I'm still able to reproduce it. The easiest reproducer I have is running Docker on top of a zpool, in a VM with 6 GB of RAM: ...

The traces seem to differ a bit depending on which attempts are applied; now with all the patches it's down to ...

However, on its own it's not reliable, it doesn't trigger on every such run. Essentially in all ... Now I have to dig through why kdump doesn't work on that dev Debian setup; I'm hoping that I'll be able to see more from a memory dump, hoping to see those line numbers instead of zeroes.
-
According to the comments in dbuf.h, db.db_data is protected by db_mtx, while db_buf is protected by db_rwlock. That's been the case ever since f664f1e. However, the code seems a little bit inconsistent. Here's a table of a few selected functions, and which fields they protect with which locks.

I can't find any examples of a function that takes db_rwlock just to protect db_mtx. I think that what should be happening is that db_buf is protected by db_mtx, and db.db_data should be protected by db_rwlock (or perhaps protected by both?). Can anybody please help me understand this?

It's not merely an academic exercise either. I suspect that locking mistakes on db.db_data are responsible for #16626 and the corruption described in #17077. cc @pcd1193182
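To make the question concrete, here is a minimal sketch of the access pattern I'd expect if the dbuf.h comments as quoted above are taken literally (db_mtx pinning db.db_data). The helper name dbuf_read_u64() is made up for illustration and is not in the tree, and whether db_rwlock should be held here too is exactly the open question:

```c
/*
 * Illustration only -- a hypothetical reader that follows the dbuf.h
 * comment literally: db->db.db_data is only dereferenced while db_mtx
 * is held, so the pointer cannot be swapped or freed underneath us.
 */
static uint64_t
dbuf_read_u64(dmu_buf_impl_t *db, size_t idx)
{
	uint64_t val;

	mutex_enter(&db->db_mtx);
	val = ((uint64_t *)db->db.db_data)[idx];
	mutex_exit(&db->db_mtx);

	return (val);
}
```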