db-analyser: add DumpStakeDistributions pass #1421

Open
wants to merge 4 commits into
base: main
4 changes: 4 additions & 0 deletions ouroboros-consensus-cardano/app/DBAnalyser/Parsers.hs
@@ -120,6 +120,10 @@ parseAnalysis = asum [
]
, benchmarkLedgerOpsParser
, getBlockApplicationMetrics
, flag' DumpStakeDistributions $ mconcat [
long "dump-stake-distributions"
, help "Show the stake distribution for each epoch of some processed block"
]
, pure OnlyValidation
]

@@ -0,0 +1,24 @@
<!--
A new scriv changelog fragment.

Uncomment the section that is right (remove the HTML comment wrapper).
-->

<!--
### Patch

- A bullet item for the Patch category.

-->

### Non-Breaking

- Added the `--dump-stake-distributions` pass to `db-analyser`.


<!--
### Breaking

- A bullet item for the Breaking category.

-->
@@ -25,6 +25,8 @@ module Cardano.Tools.DBAnalyser.Analysis (
, runAnalysis
) where

import Cardano.Ledger.Crypto (StandardCrypto)
import qualified Cardano.Ledger.PoolDistr as SL
import qualified Cardano.Slotting.Slot as Slotting
import qualified Cardano.Tools.DBAnalyser.Analysis.BenchmarkLedgerOps.FileWriting as F
import qualified Cardano.Tools.DBAnalyser.Analysis.BenchmarkLedgerOps.SlotDataPoint as DP
@@ -115,6 +117,7 @@ runAnalysis analysisName = case go analysisName of
go (ReproMempoolAndForge nBks) = mkAnalysis $ reproMempoolForge nBks
go (BenchmarkLedgerOps mOutfile lgrAppMode) = mkAnalysis $ benchmarkLedgerOps mOutfile lgrAppMode
go (GetBlockApplicationMetrics nrBlocks mOutfile) = mkAnalysis $ getBlockApplicationMetrics nrBlocks mOutfile
go DumpStakeDistributions = mkAnalysis $ dumpStakeDistributions

mkAnalysis ::
forall startFrom. SingI startFrom
@@ -218,6 +221,7 @@ data TraceEvent blk =
-- * monotonic time to call 'Mempool.getSnapshotFor'
-- * total time spent in the mutator when calling 'Mempool.getSnapshotFor'
-- * total time spent in gc when calling 'Mempool.getSnapshotFor'
| DumpStakeDistribution EpochNo (SL.PoolDistr StandardCrypto)

instance (HasAnalysis blk, LedgerSupportsProtocol blk) => Show (TraceEvent blk) where
show (StartedEvent analysisName) = "Started " <> (show analysisName)
@@ -271,7 +275,14 @@ instance (HasAnalysis blk, LedgerSupportsProtocol blk) => Show (TraceEvent blk)
, "mutSnap " <> show mutSnap
, "gcSnap " <> show gcSnap
]

show (DumpStakeDistribution eno pd) =
intercalate "\t"
$ (\ss -> show eno : show (SL.pdTotalActiveStake pd) : show (Map.size mp) : ss)
$ [ show (keyhash, SL.individualTotalPoolStake x, SL.individualPoolStake x)
| (keyhash, x) <- Map.assocs mp
Comment on lines +280 to +282

@amesgen (Member), Mar 14, 2025:

AFAIK `pdTotalActiveStake` and `individualTotalPoolStake` are only used for Conway voting stuff, so `individualTotalPoolStake / pdTotalActiveStake` does not necessarily equal `individualPoolStake`, which is used for leader election and for the ledger peer info we give to Network.

So I think it makes sense to either ignore them here or make this explicit in the output somehow.

I looked at all epochs until 544, and so far, we apparently didn't have a difference here, no idea how likely it is for a difference to arise in the future.

See IntersectMBO/cardano-ledger#4324 (comment), cc @lehins for confirmation.

@lehins (Contributor), Mar 14, 2025:

> AFAIK `pdTotalActiveStake` and `individualTotalPoolStake` are only used for Conway voting stuff

That statement is not quite correct.

The `IndividualPoolStake` extracted from `PoolDistr`, as is done in consensus, will not have any voting-related stake in it. However, the version of it in the DRep pulser will have proposal deposits added to that stake. The issue was that the haddock for `individualTotalPoolStake` should not have mentioned proposal deposits at all.

> I looked at all epochs until 544, and so far, we apparently didn't have a difference here, no idea how likely it is for a difference to arise in the future.

So `individualTotalPoolStake / pdTotalActiveStake` should always equal `individualPoolStake` for leader election, unless you are looking at the version from the pulser.

Member:

Thanks, that is good to hear! I guess then the remaining mentions of proposal deposits here and here should also be removed?

]
where
mp = SL.unPoolDistr pd

{-------------------------------------------------------------------------------
Analysis: show block and slot number and hash for all blocks
@@ -863,6 +874,40 @@ reproMempoolForge numBlks env = do
-- this flushes blk from the mempool, since every tx in it is now on the chain
void $ Mempool.syncWithLedger mempool

{-------------------------------------------------------------------------------
Analysis: print out the stake distribution for each epoch
-------------------------------------------------------------------------------}

dumpStakeDistributions ::
forall blk.
( HasAnalysis blk,
LedgerSupportsProtocol blk
) =>
Analysis blk StartFromLedgerState
dumpStakeDistributions env = do
void $ processAll db registry GetBlock startFrom limit (initLedger, Nothing) process
pure Nothing
where
AnalysisEnv {db, cfg, limit, registry, startFrom, tracer} = env

FromLedgerState initLedger = startFrom

process
:: (ExtLedgerState blk, Maybe EpochNo)
-> blk
-> IO (ExtLedgerState blk, Maybe EpochNo)
process (oldLedger, mbEpoch) blk = do
let lcfg = ExtLedgerCfg cfg
newLedger = tickThenReapply lcfg blk oldLedger
lst = ledgerState newLedger

(,) newLedger <$> case HasAnalysis.epochPoolDistr lst of
Just (epoch, pd)
| mbEpoch /= Just epoch ->
Just epoch <$ traceWith tracer (DumpStakeDistribution epoch pd)

_ -> pure mbEpoch

{-------------------------------------------------------------------------------
Auxiliary: processing all blocks in the DB
-------------------------------------------------------------------------------}
@@ -41,6 +41,8 @@ instance HasAnalysis ByronBlock where
-- metrics for the Byron era only.
blockApplicationMetrics = []

epochPoolDistr _lst = Nothing

instance HasProtocolInfo ByronBlock where
data Args ByronBlock =
ByronBlockArgs {
@@ -91,6 +91,16 @@ analyseBlock f =
p :: Proxy HasAnalysis
p = Proxy

analyseLedgerState ::
(forall blk. HasAnalysis blk => LedgerState blk -> a)
-> LedgerState (CardanoBlock StandardCrypto) -> a
analyseLedgerState f =
hcollapse
. hcmap (Proxy @HasAnalysis) (K . f . currentState)
. Telescope.tip
. getHardForkState
. hardForkLedgerStatePerEra

-- | Lift a function polymorphic over all block types supporting `HasAnalysis`
-- into a corresponding function over `CardanoBlock.`
analyseWithLedgerState ::
@@ -299,6 +309,8 @@ instance (HasAnnTip (CardanoBlock StandardCrypto), GetPrevHash (CardanoBlock Sta
)
]

epochPoolDistr = analyseLedgerState epochPoolDistr

dispatch ::
LedgerState (CardanoBlock StandardCrypto)
-> (LedgerState ByronBlock -> IO Builder)
@@ -5,6 +5,7 @@
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

{-# OPTIONS_GHC -Wno-orphans #-}
@@ -49,11 +50,13 @@ import qualified Ouroboros.Consensus.Shelley.Ledger.Block as Shelley
import Ouroboros.Consensus.Shelley.Node (Nonce (..),
ProtocolParamsShelleyBased (..), ShelleyGenesis,
protocolInfoShelley)
import Ouroboros.Consensus.Shelley.Protocol.Abstract (ProtoCrypto)
import Text.Builder (decimal)

-- | Usable for each Shelley-based era
instance ( ShelleyCompatible proto era
, PerEraAnalysis era
, ProtoCrypto proto ~ StandardCrypto
) => HasAnalysis (ShelleyBlock proto era) where

countTxOutputs blk = case Shelley.shelleyBlockRaw blk of
@@ -103,6 +106,13 @@ instance ( ShelleyCompatible proto era
-- metrics for Shelley-only eras.
blockApplicationMetrics = []

epochPoolDistr lst =
Just (SL.nesEL nes, SL.nesPd nes)
where
nes = shelleyLedgerState lst

-----

class PerEraAnalysis era where
txExUnitsSteps :: Maybe (Core.Tx era -> Word64)

@@ -8,6 +8,8 @@ module Cardano.Tools.DBAnalyser.HasAnalysis (
, WithLedgerState (..)
) where

import Cardano.Ledger.Crypto (StandardCrypto)
import Cardano.Ledger.Shelley.API (PoolDistr)
import Data.Map.Strict (Map)
import Ouroboros.Consensus.Block
import Ouroboros.Consensus.HeaderValidation (HasAnnTip (..))
@@ -58,6 +60,16 @@ class (HasAnnTip blk, GetPrevHash blk, Condense (HeaderHash blk)) => HasAnalysis
-- the IO monad.
blockApplicationMetrics :: [(Builder, WithLedgerState blk -> IO Builder)]

-- | The epoch number of the block's slot, and the stake distribution used
-- for the leader schedule of that epoch
--
-- This pool distribution should match 'protocolLedgerView', for example.
--
-- It should return 'Nothing' if and only if the block is in the Byron era.
epochPoolDistr ::
LedgerState blk
-> Maybe (EpochNo, PoolDistr StandardCrypto)

class HasProtocolInfo blk where
data Args blk
mkProtocolInfo :: Args blk -> IO (ProtocolInfo blk)
@@ -38,6 +38,7 @@ data AnalysisName =
-- The metrics will be written to the provided file path, or to
-- the standard output if no file path is specified.
| GetBlockApplicationMetrics NumberOfBlocks (Maybe FilePath)
| DumpStakeDistributions
deriving Show

data AnalysisResult =
7 changes: 7 additions & 0 deletions scripts/genesis-stake-drift-analysis/README.md
@@ -0,0 +1,7 @@
The `scrutinize-stake-drift.sh` bash script postprocesses the output of the `db-analyser --dump-stake-distributions` pass.

It yields several temporary files in the local directory, so run it in a temporary folder.

The script prints a table showing, for each epoch, how much stake was held by the pools that were in the top 90% of stake in every epoch in the data.

The script also prints out a counterfactual table that pretends each of those pools' least-stake epochs were coincident, where that minimum iterates over every suffix of the list of epochs.
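
As a rough illustration of the "top 90%" selection described above, the sketch below keeps, for each epoch, the largest pools until their cumulative relative stake reaches 0.9 (the pool that crosses the threshold is kept as well). The three-column input format (`epoch pool relative_stake`), the pool names, and the `top90` function name are assumptions made for this example only; the actual script operates on the columns produced by `tidy.awk`.

```shell
# Hypothetical helper (not part of the PR): reads "epoch pool relative_stake"
# lines on stdin and keeps the biggest pools of each epoch until 90% is covered.
top90() {
  LC_ALL=C sort -k1,1n -k3,3nr | awk '
    ($1 != eno) { eno = $1; acc = 0 }   # new epoch: reset the running total
    (acc < 0.9) { acc += $3; print }    # keep pools while below the threshold
  '
}

# Made-up data for two epochs.
printf '%s\n' \
  '1 poolA 0.50' '1 poolB 0.05' '1 poolC 0.45' \
  '2 poolA 0.60' '2 poolB 0.35' '2 poolC 0.05' | top90
# prints:
# 1 poolA 0.50
# 1 poolC 0.45
# 2 poolA 0.60
# 2 poolB 0.35
```

In this made-up data, `poolC` is in the top 90% of epoch 1 only and `poolB` of epoch 2 only, so `poolA` is the only pool that would survive the later "in every epoch" restriction.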
48 changes: 48 additions & 0 deletions scripts/genesis-stake-drift-analysis/scrutinize-stake-drift.sh
@@ -0,0 +1,48 @@
# For example:
#
# $ db-analyser ... --dump-stake-distributions >foo.txt
# $ bash scrutinize-stake-drift.sh foo.txt

db_analyser_output_file=$1

echo "# the PoolDistrs in tabular form >tidied.txt"
cat "${db_analyser_output_file}" | tr '(,){=}"' ' ' | sed 's/KeyHash/\n/g' | awk -f tidy.awk >tidied.txt

firstEpoch=$(head -n1 tidied.txt | awk '{print $1}')
lastEpoch=$(tail -n1 tidied.txt | awk '{print $1}')
nepochs=$(expr $lastEpoch - $firstEpoch + 1)

echo "# discard pools outside of the 90% in each epoch >big.txt"
cat tidied.txt | sort -k1,1n -k5,5gr | awk '(eno != $1) { eno = $1; acc = 0 } (acc < 0.9) { acc = acc + $5; print $0 }' >big.txt
# cp tidied.txt big.txt # uncomment this command to use pools that were in all epochs regardless of their relative stake

echo "# for how many epochs was each pool in the top 90% >epochs.txt"
cat big.txt | awk '{print $4}' | sort | uniq -c >epochs.txt

echo "# histogram of epochs.txt"
cat epochs.txt | awk '{print $1}' | sort -n | uniq -c

echo "# big.txt sorted by pool and then by epoch >sorted.txt"
cat big.txt | sort -k4,4 -k1,1n >sorted.txt

echo "# restrict sorted.txt to the pools that are in all $nepochs epochs >steady.txt"
join -1 2 -2 4 <(grep -w -e $nepochs epochs.txt) sorted.txt >steady.txt

echo "# wc -l"
wc -l tidied.txt epochs.txt sorted.txt steady.txt

echo "# head -n5"
head -n5 tidied.txt epochs.txt sorted.txt steady.txt

echo "# cumulative stake per epoch within steady.txt"
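# seed the minimum from the first key rather than from 1/0, which gawk rejects as a division by zero
cat steady.txt | awk '{x[$3] = x[$3] + $6} END { for (k in x) { if (kacc == "" || acc > x[k]) { kacc = k; acc = x[k] }; print k, x[k] }; print " Min is ", kacc, acc }' | sort -n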

echo "# the statistical distance between each epoch and epoch $lastEpoch"
echo "# "
echo "# see https://en.wikipedia.org/wiki/Statistical_distance#Statistically_close"
cat steady.txt | awk -v eno=$lastEpoch '(eno == $3) { print $0 }' >tmpfile-lastEpoch
for i in $(seq $firstEpoch $lastEpoch); do
cat steady.txt | awk -v eno=$i '(eno == $3) { print $0 }' >tmpfile-$i

paste tmpfile-lastEpoch tmpfile-$i | awk -v eno=$i '($6 > $12) { x = x + ($6 - $12) } ($6 < $12) { x = x + ($12 - $6) } END { printf("%i %.3f\n", eno, (x / 2)) }'
done
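
The distance printed by the loop above is half the sum of the absolute differences between two distributions. A minimal sketch on made-up data, assuming two files of `pool relative_stake` lines that list the same pools in the same order; the `tv_distance` helper, the file layout, and the pool names are hypothetical and not part of the script:

```shell
# Hypothetical helper (not part of the PR): statistical (total variation)
# distance between two distributions stored as "pool relative_stake" lines.
tv_distance() {
  paste "$1" "$2" | awk '
    { d = $2 - $4; if (d < 0) d = -d; sum += d }   # accumulate the absolute differences
    END { printf("%.3f\n", sum / 2) }              # halve the L1 distance
  '
}

# Made-up data for two epochs.
workdir=$(mktemp -d)
printf '%s\n' 'poolA 0.50' 'poolB 0.30' 'poolC 0.20' >"$workdir/epoch1"
printf '%s\n' 'poolA 0.40' 'poolB 0.35' 'poolC 0.25' >"$workdir/epoch2"

tv_distance "$workdir/epoch1" "$workdir/epoch2"   # prints 0.100
rm -r "$workdir"
```

A distance of 0 means the two distributions are identical on these pools, and the value grows as stake moves between pools. Because the script restricts the comparison to the pools in `steady.txt`, the compared columns need not each sum to 1.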
10 changes: 10 additions & 0 deletions scripts/genesis-stake-drift-analysis/tidy.awk
@@ -0,0 +1,10 @@
/EpochNo/ {
eno = $3;
n = $7;
i = 0;
}

(/%/) {
print eno, i, n, $1, $5 / $7;
i++;
}