Expose WebSocket connection count as prometheus metrics and some refactors #337

Merged 17 commits on May 13, 2025
1 change: 1 addition & 0 deletions .env.example
@@ -17,6 +17,7 @@ DB_HOST=
DB_USER=
DB_DATABASE=
DB_PASSWORD=
ACQUIRE_CONNECTION_TIMEOUT=

# Mainnet entry point for crawler (without https://).
MAINNET_P2P_ENTRY=
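
This diff does not show where `ACQUIRE_CONNECTION_TIMEOUT` is read. The following is a minimal sketch, assuming the value is a number of milliseconds that ends up in Knex's `acquireConnectionTimeout` option. The helper name `parseTimeoutMs` and the fallback value are hypothetical and only illustrate how an empty or malformed setting could be handled.

```typescript
// Hypothetical helper (not part of this PR): turn the raw env value into a usable timeout.
function parseTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  // An unset or blank variable (as in .env.example) falls back to the default.
  if (raw === undefined || raw.trim() === '') {
    return fallbackMs
  }
  const parsed = Number(raw)
  // Reject NaN, Infinity, zero, and negative values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs
}

// Assumed usage; the real configuration code may differ:
// const acquireConnectionTimeout = parseTimeoutMs(process.env.ACQUIRE_CONNECTION_TIMEOUT, 60000)
```

Falling back on bad input keeps a typo in the environment from turning into a zero or `NaN` timeout at pool creation time.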
16 changes: 14 additions & 2 deletions ARCHITECTURE.md
@@ -12,6 +12,7 @@ There are 3 folders in `src`, corresponding to the 3 processes that the VHS runs
* `/`: Information about the endpoints.
* `v1`
* `/health`: A health check for the VHS. Returns the number of nodes that it is connected to.
* `/metrics`: A health check for the VHS. Returns the number of connected nodes for each network in prometheus exposition format.
Comment on lines 14 to +15
Collaborator

How come we are keeping both endpoints? It seems like the new metrics endpoint is a more detailed version of the same info the health endpoint has.

Contributor Author

The new metrics endpoint returns data in Prometheus exposition format, while /health returns JSON. Since /health was a public endpoint, users of VHS might already be consuming it. No harm in keeping both.

* `/networks`: Returns the list of all networks on VHS database.
* `/network/validator_reports`: Returns scores for the nodes that it has crawled in the last day.
* `/network/topology`: Returns information about all the nodes that the crawler has crawled in the last hour.
@@ -42,8 +43,6 @@ This table keeps track of the nodes in the network, which it finds via crawling
| `complete_shards` |The [history shards](https://xrpl.org/history-sharding.html) the node keeps track of.|
| `ip` |The IP address of the node. |
| `port` |The peer port of the node. |
- | `ws_url` |The WS URL of the node. Optional. |
- | `connected` |This appears to be false for every node. |
| `networks` |The network(s) that the node belongs to. |
| `type` |Whether the TCP connection to the peer is incoming or outgoing. |
| `uptime` |The uptime of the node. |
@@ -186,5 +185,18 @@ This table keeps track of the validators on the networks.
| `agreement_24hour` |Data about the reliability of the validator over the last 24 hours.|
| `agreement_30day` |Data about the reliability of the validator over the 30 days. |


### `connection_health`

This table keeps track of the WebSocket connection status for all networks.

| Key | Definition |
|----------------------|-------------------------------------------------------------------|
| `ws_url` |The connection websocket url. |
| `public_key` |The public key of the node. |
| `network` |The network that the node belongs to. |
| `connected` |Boolean denoting websocket connection status. |
| `status_update_time` |Time when the connected column was updated. |

*Partial validations are not meant to vote for any particular ledger. A partial validation indicates that the validator is still online but not keeping up with consensus.
**A chain is a group of validators validating the same set of ledgers. `main`, `test`, and `dev` represent the validated versions of mainnet, testnet, and devnet respectively. Validators on a fork/validating an alternate version of the ledger will have a different value, usually of the form `chain.[num]`.
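
The `connection_health` table added in this hunk can be pictured as the row type below. This is a sketch only: the column types are assumptions inferred from the definitions in the table, and `countConnectedByNetwork` is a hypothetical in-memory helper that mirrors the kind of per-network count the new endpoint needs.

```typescript
// Row shape assumed from the documented columns; actual column types may differ.
interface ConnectionHealthRow {
  ws_url: string
  public_key: string
  network: string
  connected: boolean
  status_update_time: Date
}

// Hypothetical helper: count connected rows per network, similar in spirit to
// a GROUP BY network query filtered on connected = true.
function countConnectedByNetwork(rows: ConnectionHealthRow[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const row of rows) {
    if (row.connected) {
      counts.set(row.network, (counts.get(row.network) ?? 0) + 1)
    }
  }
  return counts
}
```

Networks with no connected rows are absent from the result rather than reported as zero, which matches how a filtered `GROUP BY` behaves.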
33 changes: 31 additions & 2 deletions src/api/routes/v1/health.ts
@@ -13,11 +13,40 @@ export default async function handleHealth(
res: Response,
): Promise<void> {
try {
- const count = (await query('crawls')
- .countDistinct('ip')
+ const count = (await query('connection_health')
+ .count('ws_url')
.where('connected', '=', true)) as Array<{ [key: string]: number }>
res.status(200).send(count[0])
} catch {
res.send({ result: 'error', message: 'internal error' })
}
}
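
One detail of the `catch` branch above: `res.send` is called without `res.status`, so Express replies with its default 200 status even on failure. The sketch below shows one possible alternative that sets an explicit 500. `ResponseLike` is a stand-in interface defined here only to keep the example self-contained, and `sendInternalError` is a hypothetical helper, not something this PR adds.

```typescript
// Minimal stand-in for the parts of an Express Response used here (assumption).
interface ResponseLike {
  status: (code: number) => ResponseLike
  send: (body: unknown) => ResponseLike
}

// Hypothetical helper: report a failure with an explicit 500 status code
// while keeping the same JSON body the handler already sends.
function sendInternalError(res: ResponseLike): void {
  res.status(500).send({ result: 'error', message: 'internal error' })
}
```

Whether to change the status is a judgment call: existing consumers of `/health` may rely on the current behavior, so this would need to be checked before adopting it.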

/**
* Handles monitoring metrics requests.
*
* @param _req - HTTP request object.
* @param res - Response containing number of connected nodes in Prometheus exposition format.
*/
export async function handleMonitoringMetrics(
_req: Request,
res: Response,
): Promise<void> {
try {
const result = (await query('connection_health')
.select('network')
.count('* as count')
.where('connected', '=', true)
.groupBy('network')) as Array<{ network: string; count: number }>

const metrics = result
.map((row) => `connected_nodes{network="${row.network}"} ${row.count}`)
.join('\n')

res.set('Content-Type', 'text/plain')
res.status(200)
res.send(metrics)
} catch {
res.send({ result: 'error', message: 'internal error' })
}
}
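
The metrics body built above is a bare list of samples. For reference, here is a hedged sketch of a slightly fuller rendering in the Prometheus text format, with `# HELP` and `# TYPE` lines, escaped label values, and a trailing line feed. The function names and the help text are hypothetical. `count` is typed as `number | string` on the assumption that some database drivers return counts as strings.

```typescript
interface NetworkCount {
  network: string
  count: number | string
}

// Escape a label value for the text format: backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n')
}

// Hypothetical formatter: render per-network counts as a gauge named connected_nodes.
function formatConnectedNodes(rows: NetworkCount[]): string {
  const header = [
    '# HELP connected_nodes Number of connected nodes per network.',
    '# TYPE connected_nodes gauge',
  ]
  const samples = rows.map(
    (row) => `connected_nodes{network="${escapeLabelValue(row.network)}"} ${row.count}`,
  )
  // Every line, including the last one, ends with a line feed.
  return `${header.concat(samples).join('\n')}\n`
}
```

Prometheus text output is conventionally served with the content type `text/plain; version=0.0.4`. The plain `text/plain` used in the handler is generally accepted by scrapers as well, so this is a refinement rather than a required fix.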
3 changes: 2 additions & 1 deletion src/api/routes/v1/index.ts
@@ -8,7 +8,7 @@ import {
} from './amendments'
import handleDailyScores from './daily-report'
import getNetworkOrAdd from './get-network'
- import handleHealth from './health'
+ import handleHealth, { handleMonitoringMetrics } from './health'
import handleValidatorManifest from './manifests'
import handleNetworks from './networks'
import { handleNode, handleNodes, handleTopology } from './nodes'
@@ -18,6 +18,7 @@ import handleValidatorReport from './validator-report'
const api = createRouter()

api.use('/health', handleHealth)
api.use('/metrics', handleMonitoringMetrics)
api.use('/network/validator_reports', handleDailyScores)
api.use('/network/amendment/info/:param', handleAmendmentInfo)
api.use('/network/amendments/info', handleAmendmentsInfo)
11 changes: 11 additions & 0 deletions src/api/routes/v1/info.ts
@@ -82,6 +82,17 @@ const info = {
example:
'https://data.xrpl.org/v1/network/amendments/vote/{network}/{identifier}',
},
{
Collaborator

Since this is for internal use, I don't think we need to expose it in info.

Contributor Author

@ckeshava If you don't feel strongly about having this and /health, I can remove it from here and ARCHITECTURE.md.

@pdp2121 Do we remove the /health endpoint from ARCHITECTURE.md as well? It's been there from the beginning.

Contributor

Even if it's for internal use, I feel good documentation is helpful for future development.

@pdp2121 Are there any concerns about privacy/security? Are there any disadvantages to exposing it?

Collaborator

I meant we don't show it in the info when users hit https://data.xrpl.org, but still keep it within the architecture document.

action: 'Get total number of connected rippled nodes.',
route: '/v1/health',
example: 'https://data.xrpl.org/v1/health',
},
{
action:
Collaborator

ditto

'Get total number of connected rippled nodes for each network in prometheus exposition format.',
route: '/v1/metrics',
example: 'https://data.xrpl.org/v1/metrics',
},
],
}
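
The review thread on this file asks for the two endpoints to stay documented but not appear in the public info listing. One way that could look, as a sketch only: the `internal` flag and the `publicRoutes` helper are hypothetical and not part of this PR, and the entry shape is assumed from the objects shown in the diff.

```typescript
interface RouteInfo {
  action: string
  route: string
  example: string
  // Hypothetical flag: marks routes that should be left out of the public listing.
  internal?: boolean
}

// Hypothetical helper: keep only the routes that are meant to be advertised.
function publicRoutes(routes: RouteInfo[]): RouteInfo[] {
  return routes.filter((route) => route.internal !== true)
}
```

With this shape, the route entries can stay in the source next to the public ones, and only the response served at the root is filtered.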
