Skip to content

fix: remove db access per validation for fee data #328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 3, 2025

Conversation

pdp2121
Copy link
Collaborator

@pdp2121 pdp2121 commented Apr 3, 2025

High Level Overview of Change

The validator table in database is being queried every time there's a new validation to retrieve fee data, which caused this error:
KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
We can instead cache this validator-network data and update it every hour to reduce number of queries.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Release

@pdp2121 pdp2121 requested a review from ckeshava April 3, 2025 13:18
@@ -115,22 +111,17 @@ export async function handleWsMessageSubscribeTypes(
networks: string | undefined,
network_fee: Map<string, FeeVote>,
ws: WebSocket,
validationNetworkDb: Map<string, string>,
): Promise<void> {
if (data.type === 'validationReceived') {
const validationData = data as ValidationRaw
if (ledger_hashes.includes(validationData.ledger_hash)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ledger_hashes acts as a "cache" when VHS receives validations relevant to a network. We only need to perform the database query when we receive the "validation" before the "ledger". (Check the handling of ledgerStream below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ledger_hashes is maintained in-memory for every ws-connection. See the implementation of setHandlers() method.

Copy link
Contributor

@Patel-Raj Patel-Raj Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rely on the networks field passed to export async function handleWsMessageSubscribeTypes( to compute the fees? It would completely eliminate need to store the validationNetworkDb in-memory map and database access. In which case that networks fields would be inconsistent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are picking up the ws_urls to open up websocket connection from crawls table. And this piece of code makes sure that at least some network is set. So, I wanted to know if we can remove the networks check in the first place and assume that it would always be defined.


const promises = []
  const networks = await getNetworks()
  for (const network of networks) {
    const crawler = new Crawler()
    promises.push(crawler.crawl(network))
  }

Also, we can pick only those ws_urls where networks is not null from the existing database:

const nodes = await query('crawls')
    .select(['ip', 'ws_url', 'networks'])
    .whereNotNull('ip')
    .andWhere('start', '>', tenMinutesAgo)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Patel-Raj

Can we rely on the networks field passed to export async function handleWsMessageSubscribeTypes( to compute the fees? It would completely eliminate need to store the validationNetworkDb in-memory map and database access. In which case that networks fields would be inconsistent?

Yes, this is the current fall-back mechanism of the code. This is the else condition if the database does not have the network entry.

We are picking up the ws_urls to open up websocket connection from crawls table. And this piece of code makes sure that at least some network is set. So, I wanted to know if we can remove the networks check in the first place and assume that it would always be defined.

Are you suggesting that we use the crawls table to fetch this network information? We can use it, however it only changes the database read access from validators to the crawls table. It doesn't reduce the load on the db per se.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to fallback? Can't we just use the networks passed in the handleWsMessageSubscribeTypes method as source of truth?

It is possible for one rippled validator to be configured for multiple networks, for instance main and dev. Look at this comment for more info.

In such cases, I suspect one websocket connection could serve ledgers, validations pertaining to multiple networks. That is the only reason why we cannot use the networks input parameter to this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, #329 will not go to fetch the network from database, it could remain undefined.

The networks value is not undefined. We are using the validation to read the networks value with a fall-back on the function input parameter.

I have identified one scenario where this code does not work correctly, however I don't know of many such cases. I believe its a worthy optimization to get the VHS back on track.

Copy link
Contributor

@Patel-Raj Patel-Raj Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation gets its network from the function input parameter.


if (ledger_hashes.includes(validationData.ledger_hash)) {
      validationData.networks = networks
    }

It does not implicitly have network field. It would either remain undefined or get it from the function parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the original interface:

interface ValidationRaw {
  flags: number
  full: boolean
  ledger_hash: string
  ledger_index: string
  master_key: string
  signature: string
  signing_time: number
  type: string
  validation_public_key: string
  server_version?: string

  networks?: string

  amendments?: string[]
  base_fee?: number
  reserve_base?: number
  reserve_inc?: number
  ledger_fee?: FeeVote

  // The validation_public_key is the same as the signing_key in StreamManifest
}

validationData does implicitly contain an optional network field. If this ledger_hash hasn't been encountered in recent hashes (as evidenced by the if-condition), then we are not using the networks function parameter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Patel-Raj we cannot use the networks field passed in to populate the networks for validations since sometimes a node from one network can receive validation message on different networks. That was why we had to use past 10 ledger hashes as source of truth

Copy link
Contributor

@ckeshava ckeshava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have my concerns about this design. This introduces complexity and tech-debt into the codebase due to the poor performance of the database.

Do we have numbers on how many accesses are being performed on the validators table?

If the db performance is the bottleneck, we will run into a similar problem with the crawls table too, because it is read and written into very frequently. It is not sustainable for us to build in-memory-cache for each of these "hot" tables.

I prefer to explore the alternatives to the current db.

Comment on lines +250 to +251
validationNetworkDb.clear()
await getValidationNetworkDb()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The createConnections method is executed every CM_INTERVAL (1 hour). I'm concerned about the staleness of the data in the validationNetworkDb in-memory map.

New entries into the validator table are inserted in the saveValidators method. The saveValidator method is invoked upon receiving every new validation.

This is a hypothetical scenario: When we start the VHS connection service, we are not aware of many validators. However, we populate the validationNetworkDb in-memory map and it will not be updated for the next hour.

In the meantime, we can receive validations from new validators which are not present in our validationNetworkDb map. In the current code, we will mis-place these validations into the wrong network.

We can remedy this situation by falling-back to do a fresh db access, if we don't have any entry in validationNetworkDb in-memory map. Alternatively, we can run the createConnections method much more frequently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a fallback to database if we don't find in cache.

Copy link
Collaborator Author

@pdp2121 pdp2121 Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validationNetworkDb map is only used to fetch the network default fee voting data in case the current validation ledger_hash is not seen in the previous 10 hashes. It is not being used as source of truth for networks

@Patel-Raj Patel-Raj force-pushed the reduce-db-access-per-validation branch from 3226dca to 4ffb62b Compare April 3, 2025 21:47
Copy link
Contributor

@ckeshava ckeshava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The network_fee/ledger_fee column in the validators table might be stale for at most 1 hour. However, that will be rectified in the next update of the validators table.

@Patel-Raj Patel-Raj merged commit 8f65f24 into main Apr 3, 2025
4 checks passed
@Patel-Raj Patel-Raj deleted the reduce-db-access-per-validation branch April 3, 2025 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants