sqldb+graph/db: add node related tables and implement some node CRUD #9824

Open · ellemouton wants to merge 5 commits into elle-graphSQL8-prep from graphSQL7-nodes-tables

Conversation

@ellemouton ellemouton commented May 19, 2025

Builds on top of #9853

In this PR, the following schemas are defined for storing graph nodes (a rough sketch of the nodes table is given after the list):

(NOTE: the block_height column is included only as an example of how v2 data might fit into this schema. It is not part of the initial schema definition.)

  • nodes: stores the main info about a node from its node_announcement. Note that we might sometimes insert a "shell node" when we receive a channel announcement that points to a node that we don't yet have a record for. In this case, we will only have the public key of the node along with the gossip version it was advertised on. So the version:pub_key pair is always unique.
  • node_addresses: normalised node addresses.
  • node_extra_types: stores the normalised TLV records of a node_announcement for any fields that we don't explicitly store in the nodes table.
  • node_features: stores the normalised features advertised in a node announcement.
  • source_nodes: stores a pointer to an entry in the nodes table to indicate which node is our own source node. NOTE: there will be an entry here per gossip protocol that we are aware of.
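
A rough sketch of the nodes table, pieced together from the queries and review comments in this PR (the exact column set, types, and index name are assumptions, not the final migration):

```sql
CREATE TABLE IF NOT EXISTS nodes (
    -- The primary key that the other node_* tables reference.
    id BIGINT PRIMARY KEY,

    -- The gossip protocol version that this node was advertised on.
    version SMALLINT NOT NULL,

    -- The public key of the node (unique per gossip version, see
    -- the index below).
    pub_key BLOB NOT NULL,

    -- Fields from the node_announcement itself. These are nullable so
    -- that a "shell node" can be inserted from just a channel
    -- announcement.
    alias TEXT,
    last_update BIGINT,
    signature BLOB

    -- block_height BIGINT -- example only: how v2 data might fit in.
);

-- The version:pub_key pair is always unique.
CREATE UNIQUE INDEX IF NOT EXISTS nodes_unique ON nodes (pub_key, version);
```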

The following methods are implemented on the SQLStore:

  • AddLightningNode
  • FetchLightningNode
  • HasLightningNode
  • AddrsForNode
  • DeleteLightningNode
  • FetchNodeFeatures
  • LookupAlias
  • NodeUpdatesInHorizon
  • SourceNode
  • SetSourceNode

This then lets us run a number of existing unit tests against the new SQLStore backend. Namely:

  • TestNodeInsertionAndDeletion
  • TestLightningNodePersistence
  • TestAliasLookup
  • TestNodeUpdatesInHorizon
  • TestSourceNode

Part of #9795

@ellemouton ellemouton added this to the v0.20.0 milestone May 19, 2025
@ellemouton ellemouton self-assigned this May 19, 2025

coderabbitai bot commented May 19, 2025

Review skipped: auto reviews are limited to specific labels (llm-review). To trigger a single review, invoke the @coderabbitai review command.

@ellemouton ellemouton added the graph and database (Related to the database/storage of LND) labels May 19, 2025
@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from eac15ac to eae4d81 on May 19, 2025 12:21
@saubyk saubyk added this to lnd v0.20 May 19, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 May 19, 2025
@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from eae4d81 to 2802707 on May 19, 2025 18:49
@ellemouton ellemouton added the sql label May 19, 2025
@ellemouton ellemouton requested review from bhandras and guggero May 20, 2025 03:57
// information.
//
// NOTE: part of the V1Store interface.
func (s *SQLStore) AddLightningNode(node *models.LightningNode,
Collaborator Author:

currently working on a benchmark to compare this to the performance of the kvdb version, which uses the batch scheduler. We may end up needing something similar here to batch these txs

Collaborator Author:

So this method will be updated to make use of the timed scheduler once this PR is merged: #9845.

Order of operations, either:

  1. merge #9845, then update this PR, or
  2. merge this, then #9845, then a new PR to update this to use that.

All depends on what gets merged first.

@guggero guggero left a comment

Very nice!
Did a first pass, will probably need a second one since it's quite a chunk of code. But looks great so far.

node_id BIGINT NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,

-- The id of the feature that this node has.
feature_id BIGINT NOT NULL REFERENCES features(id)
Collaborator:

Hmm, similar to today's offline discussion: what's the advantage of having the feature bits defined in a separate table, given that we don't store any extra information with each bit that would be de-duplicated? IMO this can just be the bit itself.
I know @Roasbeef had different opinions on such enums (or, generally, numbers that correspond to Golang constants). So I think we should settle this discussion once and then do the same everywhere.

@ellemouton ellemouton commented May 20, 2025

> IMO this can just be the bit itself.

unlike the discussion offline today though, this is an array of feature bits. So I think the idea was that if you normalise this out, you can answer "which nodes advertise feature x", which is hard to do if you keep it in array form

Collaborator:

I'm not talking about removing the node_features table, just the features table. The "array" is node_features, which is good. But the features table just maps a database ID to a numeric feature number. So we can inline the feature number here, in the node_features table (and save quite a bit of space while doing so).

Collaborator Author:

gotcha 👍

cool - will update as suggested
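
As a sketch, the inlined shape being agreed on here might look like the following (the feature_bit column name is an assumption):

```sql
CREATE TABLE IF NOT EXISTS node_features (
    -- The node that advertises this feature bit.
    node_id BIGINT NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,

    -- The feature bit itself, inlined rather than referencing a
    -- separate features lookup table.
    feature_bit INT NOT NULL
);

-- One row per (node, bit). Since node_id is the leading column, this
-- unique index also serves plain node_id lookups, so no separate
-- node_id index is needed (see the index comments below).
CREATE UNIQUE INDEX IF NOT EXISTS node_features_unique ON node_features (
    node_id, feature_bit
);

-- "Which nodes advertise feature x" then stays a simple query:
-- SELECT node_id FROM node_features WHERE feature_bit = 31;
```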

feature_id BIGINT NOT NULL REFERENCES features(id)
);
CREATE INDEX IF NOT EXISTS node_feature_node_id_idx ON node_features(node_id);
CREATE UNIQUE INDEX IF NOT EXISTS node_features_unique ON node_features (
Collaborator:

A unique index also acts as a normal index. So we don't need the first one, as the second one already covers that same leading field (node_id).

CREATE UNIQUE INDEX IF NOT EXISTS node_addresses_unique ON node_addresses (
node_id, type, position
);
CREATE INDEX IF NOT EXISTS node_addresses_node_id_idx ON node_addresses(node_id);
Collaborator:

Same here re duplicate index for node_id field.


-- The gossip protocols are distinct. So often we will only want to
-- query for nodes that are gossiped on a specific protocol version.
CREATE INDEX IF NOT EXISTS nodes_version_idx ON nodes(version);
Collaborator:

Index not needed since we have the unique one (see other comments).

-- announcement for a channel that is connected to a node that we
-- have not yet received a node announcement for.
signature BLOB
);
Collaborator:

Do we want to add some checks to this table for extra validation?
For example: CHECK ((version = 0 OR version = 1) AND length(pub_key) = 33)?
I think the downside is that we can't have named checks. So if we ever wanted to add checks or change them, we'd need to drop the table and re-create it, which is a bit tedious (especially with multiple foreign keys).

So I'm leaning toward no, but let's discuss.

Collaborator Author:

I think for version, at least, we should not be strict, since we expect future versions and would then need to change the checks.

Even for pub_key, I'm hesitant to add the length check, given that there might be a version in which we start advertising Schnorr (x-only) public keys, which are 32 bytes.
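
For reference, a sketch of what such a check could look like as a named table-level constraint (the table name here is hypothetical). Both SQLite and Postgres accept a named CHECK at creation time, though SQLite has no ALTER TABLE ... DROP CONSTRAINT, so changing it later means recreating the table, which matches the concern above:

```sql
CREATE TABLE IF NOT EXISTS nodes_checked (
    version SMALLINT NOT NULL,
    pub_key BLOB NOT NULL,
    -- (other columns elided)

    -- Illustration only: this would reject future gossip versions and
    -- 32-byte x-only pub keys, which is exactly the hesitation above.
    CONSTRAINT nodes_version_pub_key_check CHECK (
        version IN (0, 1) AND length(pub_key) = 33
    )
);
```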

)
ON CONFLICT (pub_key, version)
DO UPDATE SET
alias = EXCLUDED.alias,
Collaborator:

Since these ON CONFLICT ... DO UPDATE statements can be a bit hard to read, I've made it a habit of adding a comment above stating what the expectation is. For example, the one above on the feature bit is a no-op because the bit field is part of the unique index.
But here we update the fields that aren't covered by the unique index.

Collaborator Author:

ok yeah - good idea - will add!
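
For example, the kind of comment being suggested, shown over a sketch of the node upsert (the exact column list is an assumption):

```sql
INSERT INTO nodes (pub_key, version, alias, last_update, signature)
VALUES ($1, $2, $3, $4, $5)
-- A conflict means a record for this (pub_key, version) pair already
-- exists (possibly just a shell node), so update the
-- announcement-derived fields; the key fields covered by the unique
-- index are left untouched.
ON CONFLICT (pub_key, version)
DO UPDATE SET
    alias = EXCLUDED.alias,
    last_update = EXCLUDED.last_update,
    signature = EXCLUDED.signature;
```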

-- name: GetNodeFeaturesByPubKey :many
SELECT f.bit
FROM nodes n
JOIN node_features nf ON nf.node_id = n.id
Collaborator:

nit: indentation of JOIN doesn't match above statement.

}

if node.HaveNodeAnnouncement {
params.LastUpdate = sql.NullInt64{
Collaborator:

nit: use sqldb.SQLInt64(), SQLStr() and SQLTime().

Collaborator Author:

oh cool! TIL - thanks.

Just a quick check: I've opted to store LastUpdate as an int64 in the DB, since it is the unix timestamp advertised in the node announcement and we want to be sure it is not altered by timezone handling etc.

Just want to check that reviewers are ok with that part?

Collaborator:

Sounds good to me. I assume precision down to one second is enough here? Otherwise we'd need to use UnixNano(), in which case we should make it very clear in the documentation that it's nanoseconds and not seconds.

Collaborator Author:

yeah it is advertised in seconds in the announcement 👍
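
For illustration, the helper-based version of the fragment above might read as follows (a sketch only; the field names come from the surrounding snippet and the exact helper signatures are assumed):

```go
if node.HaveNodeAnnouncement {
	// Store the announcement timestamp as raw unix seconds (an
	// int64 in the DB) so the advertised value round-trips without
	// any timezone interpretation.
	params.LastUpdate = sqldb.SQLInt64(node.LastUpdate.Unix())
	params.Alias = sqldb.SQLStr(node.Alias)
}
```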

-- name: GetNodesByLastUpdateRange :many
SELECT *
FROM nodes
WHERE last_update >= sqlc.arg(start_time)
Collaborator:

nit: just use @start_time and @end_time.
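
With that shorthand, the query above might read (the end_time comparison operator is an assumption, since it is cut off above):

```sql
-- name: GetNodesByLastUpdateRange :many
SELECT *
FROM nodes
WHERE last_update >= @start_time
  AND last_update < @end_time;
```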

@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from 2802707 to 0e86ef1 on May 22, 2025 09:25
@ellemouton ellemouton left a comment

updated, thanks @guggero!

as noted offline, I'll continue to iterate on the node_addresses schema to see if we can improve update efficiency there. I'll work on this in parallel with the upcoming PRs - the migrations are not available in prod builds, so this should not be an issue. A sketch of the current shape follows.
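
For context, a sketch of the node_addresses shape implied by the unique index discussed above (the address column and the exact types are assumptions):

```sql
CREATE TABLE IF NOT EXISTS node_addresses (
    -- The node that this address record belongs to.
    node_id BIGINT NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,

    -- The address type (the transport/encoding of the address).
    type SMALLINT NOT NULL,

    -- The position of the address within the announcement, so that
    -- the advertised ordering can be reproduced.
    position INT NOT NULL,

    -- The address itself.
    address TEXT NOT NULL
);

-- One row per (node, type, position); the unique index also covers
-- plain node_id lookups.
CREATE UNIQUE INDEX IF NOT EXISTS node_addresses_unique ON node_addresses (
    node_id, type, position
);
```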

depending on what gets approved first, I'll also update AddLightningNode here to use the batcher: #9845

CREATE TABLE IF NOT EXISTS source_nodes (
node_id BIGINT PRIMARY KEY REFERENCES nodes (id) ON DELETE CASCADE
Collaborator Author:

isn't that only the case if we want this to be auto-incrementing? We definitely don't want this to auto-increment.


CREATE TABLE IF NOT EXISTS source_nodes (
node_id BIGINT PRIMARY KEY REFERENCES nodes (id) ON DELETE CASCADE
Collaborator Author:

the reason I added PRIMARY KEY is just to make sure this is unique - but I guess we could also remove that and just use UNIQUE to avoid confusion?
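
As a sketch, the UNIQUE variant would look like this:

```sql
CREATE TABLE IF NOT EXISTS source_nodes (
    -- UNIQUE still guarantees at most one source-node entry per nodes
    -- row, without the auto-increment connotations of PRIMARY KEY.
    node_id BIGINT NOT NULL UNIQUE REFERENCES nodes (id) ON DELETE CASCADE
);
```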

@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from 0e86ef1 to f89c2f2 on May 22, 2025 09:30
@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from f89c2f2 to 32d4287 on May 22, 2025 14:40
@ellemouton ellemouton removed the request for review from bhandras May 22, 2025 14:41
@ellemouton ellemouton commented

ok, I'll update this now to use the generic batch scheduler, now that that is merged. Will do that before re-requesting review.

@ellemouton ellemouton changed the base branch from elle-graph to master May 23, 2025 08:38

In this commit, the various SQL schemas required to store graph node related data are defined. Specifically, the following tables are defined:

- nodes
- node_extra_types
- node_features
- node_addresses

In this commit, we add the various sqlc queries that we need in order
to implement the following V1Store methods:

- AddLightningNode
- FetchLightningNode
- HasLightningNode
- AddrsForNode
- DeleteLightningNode
- FetchNodeFeatures

These are implemented by SQLStore which then lets us use the SQLStore
backend for the following unit tests:

- TestNodeInsertionAndDeletion
- TestLightningNodePersistence

In this commit, we let the SQLStore implement LookupAlias. This then
lets us run the TestAliasLookup unit test against the SQL backends.

In this commit, we add the necessary SQL queries and then implement the
SQLStore's NodeUpdatesInHorizon method. This lets us run the
TestNodeUpdatesInHorizon unit tests against SQL backends.

In this commit, we add the `source_nodes` table. It points to entries in
the `nodes` table. This table will store one entry per protocol version
that we are announcing a node_announcement on.

With this commit, we can run the TestSourceNode unit test against our
SQL backends.

@ellemouton ellemouton force-pushed the graphSQL7-nodes-tables branch from 32d4287 to ef435a7 on May 23, 2025 09:13
@ellemouton ellemouton changed the base branch from master to elle-graphSQL8-prep May 23, 2025 09:13
Labels
database (Related to the database/storage of LND), graph, no-changelog, sql
Projects
Status: In progress