Description
Problem
Running the validator with --snapshot-archive-format lz4 causes a crash.
I am using the latest version: solana-cli 2.1.18 (src:f91c2fca; feat:3271415109, client:Agave)
In short, whenever I use lz4 instead of the default format, memory and disk usage grow until the machine is overloaded, regardless of the hardware configuration.
I have tried:
- NVMe drives in RAID 0
- NVMe drives in RAID 10
- 22 CPU cores and 178 GB of memory
- 44 CPU cores and 346 GB of memory, as in the screenshot
My configs:
cat /etc/systemd/system/solana-validator.service
[Unit]
Description=Solana Validator
After=network.target
After=luks-unlock.service
[Service]
Type=simple
User=root
LimitNOFILE=1000000
LimitMEMLOCK=infinity
Environment="SOLANA_METRICS_CONFIG="
Environment="RUST_BACKTRACE=1"
Environment="RUST_LOG=info"
ExecStart=/root/.local/share/solana/install/active_release/bin/agave-validator \
--ledger /mnt/solana-ledger/validator-ledger \
--accounts /mnt/solana-ledger/validator-ledger/accounts \
--snapshots /mnt/solana-ledger/validator-snapshots \
--identity /mnt/solana-ledger/validator-identity.json \
--entrypoint entrypoint.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint2.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint3.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint4.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint5.mainnet-beta.solana.com:8001 \
--expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d \
--known-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 \
--known-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ \
--known-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ \
--known-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S \
--rpc-port 8899 \
--log /var/log/solana-ledger/mainnet/validator.log \
--limit-ledger-size \
--wal-recovery-mode skip_any_corrupted_record \
--rpc-bind-address 0.0.0.0 \
--private-rpc \
--full-rpc-api \
--snapshot-archive-format lz4 \
--maximum-local-snapshot-age 500 \
--snapshot-interval-slots 500 \
--account-index program-id \
--account-index spl-token-owner \
--accounts-db-cache-limit-mb 10240 \
--dynamic-port-range 8000-8020
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
cat /etc/sysctl.d/99-solana-network.conf
net.core.rmem_max=134217728
net.core.rmem_default=134217728
net.core.wmem_max=134217728
net.core.wmem_default=134217728
net.ipv4.tcp_rmem=4096 131072 134217728
net.ipv4.tcp_wmem=4096 16384 134217728
net.core.netdev_max_backlog=300000
Proposed Solution
Unfortunately, I am forced to move back to the default snapshot-archive-format.
I don't have an exact error message. Based on the metrics, I believe data keeps accumulating in memory and on disk until the instance crashes, whatever its size.
I have the logs saved locally in case someone wants to follow up on this.
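To put numbers on the growth described above, a sampling helper along these lines could be left running next to the validator. This is only a sketch: `sample_usage` is a hypothetical name, the 60-second interval and the `usage.log` file are arbitrary choices, and it assumes a Linux host with procps `ps` and `du` available. The snapshot path in the commented loop is the one from the unit file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Agave): print one sample as
# "<unix_timestamp> <rss_kib> <dir_kib>" for a process and a directory.
sample_usage() {
  local pid="$1" dir="$2"
  local rss dir_kib
  # Resident set size of the process in KiB (empty if the PID no longer exists).
  rss=$(ps -o rss= -p "$pid" | tr -d ' ')
  # Total size of the directory tree in KiB.
  dir_kib=$(du -sk "$dir" 2>/dev/null | cut -f1)
  printf '%s %s %s\n' "$(date +%s)" "${rss:-0}" "${dir_kib:-0}"
}

# Example loop, assuming the validator process is named agave-validator:
# pid=$(pgrep -o agave-validator)
# while sleep 60; do
#   sample_usage "$pid" /mnt/solana-ledger/validator-snapshots
# done >> usage.log
```

Comparing the resulting log between an lz4 run and a run with the default format should show whether memory, the snapshot directory, or both grow without bound before the crash.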