
Expose WebSocket connection count as prometheus metrics and some refactors #337

Open: wants to merge 12 commits into base: main
1 change: 1 addition & 0 deletions .env.example
@@ -17,6 +17,7 @@ DB_HOST=
DB_USER=
DB_DATABASE=
DB_PASSWORD=
ACQUIRE_CONNECTION_TIMEOUT=
Collaborator

Why does this need to be an env variable instead of a constant?

Author

It keeps this setting consistent with the other database configs. As an env variable, it can be changed without touching the code, and staging, dev, etc. can each use a different value.
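
A minimal sketch of how such a setting could be read, assuming the value is a timeout in milliseconds. The helper name `readAcquireConnectionTimeout` and the fallback of 60000 ms are hypothetical and not taken from this PR; the environment is passed in as a plain object so the function is easy to test.

```typescript
// Assumed fallback (in milliseconds) when the variable is unset or invalid.
// This value is illustrative, not taken from the PR.
const DEFAULT_ACQUIRE_CONNECTION_TIMEOUT_MS = 60000

// Hypothetical helper: parse ACQUIRE_CONNECTION_TIMEOUT from an env-like map.
function readAcquireConnectionTimeout(
  env: Record<string, string | undefined>,
): number {
  const raw = env.ACQUIRE_CONNECTION_TIMEOUT
  // An empty value (as in .env.example) falls back to the default.
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_ACQUIRE_CONNECTION_TIMEOUT_MS
  }
  const parsed = Number(raw)
  // Reject non-numeric, zero, and negative values.
  if (!Number.isFinite(parsed) || parsed <= 0) {
    return DEFAULT_ACQUIRE_CONNECTION_TIMEOUT_MS
  }
  return parsed
}
```

In a Node process this could be called as `readAcquireConnectionTimeout(process.env)`, so each environment (staging, dev, and so on) picks up its own value.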


# Mainnet entry point for crawler (without https://).
MAINNET_P2P_ENTRY=
16 changes: 14 additions & 2 deletions ARCHITECTURE.md
@@ -12,6 +13,7 @@ There are 3 folders in `src`, corresponding to the 3 processes that the VHS runs
* `/`: Information about the endpoints.
* `v1`
* `/health`: A health check for the VHS. Returns the number of nodes that it is connected to.
* `/metrics/:network`: A health check for the VHS. Returns the number of nodes that it is connected to for a particular network in prometheus exposition format.
* `/networks`: Returns the list of all networks on VHS database.
* `/network/validator_reports`: Returns scores for the nodes that it has crawled in the last day.
* `/network/topology`: Returns information about all the nodes that the crawler has crawled in the last hour.
@@ -42,8 +43,6 @@ This table keeps track of the nodes in the network, which it finds via crawling
| `complete_shards` |The [history shards](https://xrpl.org/history-sharding.html) the node keeps track of.|
| `ip` |The IP address of the node. |
| `port` |The peer port of the node. |
-| `ws_url` |The WS URL of the node. Optional. |
-| `connected` |This appears to be false for every node. |
| `networks` |The network(s) that the node belongs to. |
| `type` |Whether the TCP connection to the peer is incoming or outgoing. |
| `uptime` |The uptime of the node. |
@@ -186,5 +185,18 @@ This table keeps track of the validators on the networks.
| `agreement_24hour` |Data about the reliability of the validator over the last 24 hours.|
| `agreement_30day` |Data about the reliability of the validator over the 30 days. |


### `connection_health`

This table keeps track of the WebSocket connection status for all networks.

| Key | Definition |
|----------------------|-------------------------------------------------------------------|
| `ws_url` |The connection websocket url. |
| `public_key ` |The public key of the node. |
| `network` |The network that the node belongs to. |
| `connected` |Boolean denoting websocket connection status. |
| `status_update_time` |Time when the connected column was updated. |
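
As a rough illustration of the table above, one row could be modelled as follows. The column types are assumptions (the PR does not show the migration), and `countConnectedByNetwork` is a hypothetical in-memory helper rather than code from the repository.

```typescript
// Assumed row shape for `connection_health`; the column types are guesses.
interface ConnectionHealthRow {
  ws_url: string
  public_key: string
  network: string
  connected: boolean
  status_update_time: Date
}

// Hypothetical helper: tally rows with `connected === true` per network.
function countConnectedByNetwork(
  rows: ConnectionHealthRow[],
): Map<string, number> {
  const counts = new Map<string, number>()
  for (const row of rows) {
    if (row.connected) {
      counts.set(row.network, (counts.get(row.network) ?? 0) + 1)
    }
  }
  return counts
}
```

Networks with no connected rows are simply absent from the returned map.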

*Partial validations are not meant to vote for any particular ledger. A partial validation indicates that the validator is still online but not keeping up with consensus.
**A chain is a group of validators validating the same set of ledgers. `main`, `test`, and `dev` represent the validated versions of mainnet, testnet, and devnet respectively. Validators on a fork/validating an alternate version of the ledger will have a different value, usually of the form `chain.[num]`.
30 changes: 28 additions & 2 deletions src/api/routes/v1/health.ts
@@ -13,11 +13,37 @@ export default async function handleHealth(
  res: Response,
): Promise<void> {
  try {
-    const count = (await query('crawls')
-      .countDistinct('ip')
+    const count = (await query('connection_health')
+      .count('ws_url')
      .where('connected', '=', true)) as Array<{ [key: string]: number }>
    res.status(200).send(count[0])
  } catch {
    res.send({ result: 'error', message: 'internal error' })
  }
}

/**
 * Handles monitoring metrics requests.
 *
 * @param req - HTTP request object.
 * @param res - Response containing number of connected nodes in Prometheus exposition format.
 */
export async function handleWebSocketHealthMetrics(
  req: Request,
  res: Response,
): Promise<void> {
  try {
    const { network } = req.params
    const result = (await query('connection_health')
      .count('ws_url')
      .where('network', '=', network)
      .andWhere('connected', '=', true)) as Array<{ [key: string]: number }>

    const metrics = `connected_nodes{network="${network}"} ${result[0].count}`
    res.set('Content-Type', 'text/plain')
    res.status(200)
    res.send(metrics)
  } catch {
    res.send({ result: 'error', message: 'internal error' })
  }
}
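
The handler above builds one sample line in the Prometheus text exposition format. A hedged sketch of that formatting step in isolation is shown here; the helper names `escapeLabelValue` and `formatConnectedNodes`, the `# HELP` text, and the choice of `gauge` as the metric type are assumptions added for illustration and are not part of this PR. Since `network` comes from the URL, the sketch escapes backslashes, double quotes, and line feeds in the label value, as the exposition format requires.

```typescript
// Escape a label value for the Prometheus text exposition format:
// backslash, double quote, and line feed must be escaped.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
}

// Hypothetical formatter for the `connected_nodes` metric.
// `count` may arrive as a string with some database drivers, so it is
// normalised with Number() before being written out.
function formatConnectedNodes(network: string, count: number | string): string {
  const value = Number(count)
  return [
    '# HELP connected_nodes Number of nodes with an open WebSocket connection.',
    '# TYPE connected_nodes gauge',
    `connected_nodes{network="${escapeLabelValue(network)}"} ${value}`,
    '', // the exposition format expects a trailing newline
  ].join('\n')
}
```

Keeping the formatting in a pure function like this would let it be unit tested without a database or an HTTP server.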
3 changes: 2 additions & 1 deletion src/api/routes/v1/index.ts
@@ -8,7 +8,7 @@ import {
} from './amendments'
import handleDailyScores from './daily-report'
import getNetworkOrAdd from './get-network'
-import handleHealth from './health'
+import handleHealth, { handleWebSocketHealthMetrics } from './health'
import handleValidatorManifest from './manifests'
import handleNetworks from './networks'
import { handleNode, handleNodes, handleTopology } from './nodes'
@@ -18,6 +18,7 @@ import handleValidatorReport from './validator-report'
const api = createRouter()

api.use('/health', handleHealth)
api.use('/metrics/:network', handleWebSocketHealthMetrics)
api.use('/network/validator_reports', handleDailyScores)
api.use('/network/amendment/info/:param', handleAmendmentInfo)
api.use('/network/amendments/info', handleAmendmentsInfo)
11 changes: 11 additions & 0 deletions src/api/routes/v1/info.ts
@@ -82,6 +82,17 @@ const info = {
example:
'https://data.xrpl.org/v1/network/amendments/vote/{network}/{identifier}',
},
{
Collaborator

Since this is for internal use, I don't think we need to expose it in info.

Author

@ckeshava If you don't feel strongly about having this and /health, I can remove it from here and ARCHITECTURE.md.

@pdp2121 Should we remove the /health endpoint from ARCHITECTURE.md as well? It's been there from the beginning.

Contributor

Even if it's for internal use, I feel good documentation is helpful for future development.

@pdp2121 Are there any concerns about privacy/security? Are there any disadvantages to exposing it?

action: 'Get total number of connected rippled nodes.',
route: '/v1/health',
example: 'https://data.xrpl.org/v1/health',
},
{
action:
Collaborator

ditto

'Get total number of connected rippled nodes for a particular network in prometheus exposition format.',
route: '/v1/metrics/{network}',
example: 'https://data.xrpl.org/v1/metrics/{network}',
},
],
}
