Streaming Transition interface #5549

@lehins

TLDR: Support data injection from genesis files using a streaming framework, rather than relying on lazy IO.

For testing and benchmarking, we provide the ability to start in an arbitrary era with some initial data populated in the ledger state through the genesis files.

The data injected for benchmarking is very large. The current solution of using lazy lists (namely ListMap) while decoding those large genesis files has proved to be very fragile and has led to serious space leaks, causing issues for the performance and tracing team. It has been agreed to adjust this injection mechanism: instead of embedding the large injection data directly in the genesis files and decoding a big JSON file lazily, genesis files will provide a filename for that data.
Here are all of the current occurrences of injected data that can be large:

Each of these needs to transition to:

  • a consistent interface that is specified in the "extraConfig" JSON field, as was recently done for Alonzo genesis in Add ability to inject any cost models via AlonzoGenesis #5379
  • streaming data from a flat JSON file with hash computation (a hash algorithm implementation that supports streaming data needs to be chosen; SHA-256 definitely supports it). Preferably the streaming library should be used, since it is already being used in some other project in cardano-node; however, if it has insufficient support for streaming aeson data, then conduit-aeson can be used instead
  • support for embedded data as an alternative to streaming from a file, since people also use this interface for testing with small payloads
  • support for the old fields until the new one has been fully integrated and adopted. If both the old fields and extraConfig are provided, this should be an error
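
The last two requirements amount to a small decision table at decoding time. Below is a minimal sketch of that logic using only base. The names `InjectionSource` and `resolveInjection` are hypothetical, and a plain `String` stands in for the hash; none of this is existing cardano-ledger API.

```haskell
-- Hypothetical sketch: names and types are illustrative only.

-- | Where the injected data comes from.
data InjectionSource a
  = FromFile FilePath String -- path to a flat JSON file and its expected hash (placeholder type)
  | Embedded a               -- small payload embedded directly in the genesis file
  deriving (Eq, Show)

-- | Combine the legacy field with the new "extraConfig" entry.
-- Supplying both is rejected; a legacy value is treated as embedded data.
resolveInjection ::
  String ->                    -- field name, used only in the error message
  Maybe a ->                   -- value of the old field, if present
  Maybe (InjectionSource a) -> -- value from "extraConfig", if present
  Either String (Maybe (InjectionSource a))
resolveInjection name legacy extra =
  case (legacy, extra) of
    (Just _, Just _) ->
      Left ("Field '" <> name <> "' is provided both as an old field and in extraConfig")
    (Just old, Nothing) -> Right (Just (Embedded old))
    (Nothing, src) -> Right src
```

Treating an old field as `Embedded` would let the rest of the pipeline handle a single representation while the old fields remain supported.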

The first part of this ticket is to design the interface using injection of the UTxO; the rest of the fields will follow in a subsequent PR.

I imagine Haskell types that look something like this:

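-- | Source of the injected data
data InjectionData k v
  = InjectFromFile !FilePath !Hash
  -- ^ Stream the data from a file and check it against the expected hash
  | EmbeddedData (ListMap k v)
  -- ^ Data embedded directly in the genesis file, for small test payloads

-- | Data that can be injected through the Shelley genesis
data ShelleyExtraConfig = ShelleyExtraConfig
  { secInitialFunds :: InjectionData Addr Coin
  , secStakePools :: InjectionData (KeyHash StakePool) StakePoolParams
  , secStakeCredentials :: InjectionData (KeyHash Staking) (KeyHash StakePool)
  }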

Initially, we can approach this by changing the transition interface to accept an action that allows reading a file:
https://github.com/IntersectMBO/cardano-ledger/blob/e04cde449e9f0dcf38d4bc822cc028ff8fedac4a/eras/shelley/impl/src/Cardano/Ledger/Shelley/Transition.hs#L119C3-L126

  injectIntoTestState ::
    MonadFail m =>
    (forall a. FilePath -> (Handle -> m a) -> m a) ->
    -- ^ File reading action
    TransitionConfig era ->
    NewEpochState era ->
    m (NewEpochState era)
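
For callers running in IO, the file reading action could be as simple as `withFile` in read mode. The sketch below shows that instantiation together with a strict fold over the handle, so that entries are consumed one at a time rather than accumulated in a lazy list. It assumes one entry per line purely for illustration; the real data would be decoded with a streaming JSON parser. The names `ioFileAction`, `foldLines`, and `countEntries` are hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}

import System.IO (Handle, IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- | One possible file reading action for IO: open the file for reading
-- and close the handle once the continuation finishes.
ioFileAction :: FilePath -> (Handle -> IO a) -> IO a
ioFileAction path = withFile path ReadMode

-- | Strict left fold over the lines of a handle. The accumulator is forced
-- after every line, so no chain of thunks builds up.
foldLines :: (acc -> String -> acc) -> acc -> Handle -> IO acc
foldLines step = go
  where
    go acc h = do
      eof <- hIsEOF h
      if eof
        then pure acc
        else do
          line <- hGetLine h
          let acc' = step acc line
          acc' `seq` go acc' h

-- | Example consumer that accepts an action of the same rank-2 shape as the
-- proposed interface, specialised to IO, and counts the entries in a file.
countEntries ::
  (forall a. FilePath -> (Handle -> IO a) -> IO a) ->
  FilePath ->
  IO Int
countEntries withHandle path =
  withHandle path (foldLines (\n _ -> n + 1) 0)
```

Passing the action as an argument keeps the consumer independent of how the file is opened, so a test could supply a different action without changing `countEntries`.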

We can later polish the interface a bit more in order to support direct streaming into LedgerHD, but for now this should suffice.

Labels: 💳 technical-debt (Issues related to technical debt we introduced)
