Description
In #9729 we have a long delay on startup with postgres, and a medium delay on startup with SQLite, while searching the payments DB for in-flight HTLCs.
With postgres, I'm having another problem on startup. After the scan for in-flight HTLCs completes, I still have a postgres process running at 100% CPU:
I believe this process was started around the same time and was running in parallel with the in-flight HTLC search in the payments DB. However, after that search completed, I let things run for another ~3 hours waiting for the extra postgres process to finish, and it never did.
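I didn't think to capture it at the time, but for anyone trying to reproduce this, something along these lines in psql should show what that busy backend is actually executing and how long it has been at it (the PID here is just a placeholder; use whatever top reports for the postgres process pinned at 100% CPU):

```sql
-- Hypothetical diagnostic (not captured during my run): show what the busy
-- backend is executing. Replace 36 with the PID of the 100% CPU process.
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE pid = 36;
```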
I tried running a profile to see what could be going on:
I'm not sure what is going on, so I tried killing the postgres process. After killing it, I got the following in my logs:
2025-04-17 20:14:04.922 [ERR] RPCS: [/lnrpc.Lightning/SubscribeInvoices]: unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [DBG] STAT: Setting the accounts sub-server as errored: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [ERR] STAT: could not start the accounts sub-server: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [ERR] LITD: Error thrown in the accounts service, keeping litd running: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
db-1 | 2025-04-17 20:14:04.910 UTC [36] FATAL: terminating connection due to administrator command
db-1 | 2025-04-17 20:14:04.910 UTC [36] STATEMENT: -- name: FilterInvoices :many
db-1 | SELECT
db-1 | invoices.id, invoices.hash, invoices.preimage, invoices.settle_index, invoices.settled_at, invoices.memo, invoices.amount_msat, invoices.cltv_delta, invoices.expiry, invoices.payment_addr, invoices.payment_request, invoices.payment_request_hash, invoices.state, invoices.amount_paid_msat, invoices.is_amp, invoices.is_hodl, invoices.is_keysend, invoices.created_at
db-1 | FROM invoices
db-1 | WHERE (
db-1 | id >= $1 OR
db-1 | $1 IS NULL
db-1 | ) AND (
db-1 | id <= $2 OR
db-1 | $2 IS NULL
db-1 | ) AND (
db-1 | settle_index >= $3 OR
db-1 | $3 IS NULL
db-1 | ) AND (
db-1 | settle_index <= $4 OR
db-1 | $4 IS NULL
db-1 | ) AND (
db-1 | state = $5 OR
db-1 | $5 IS NULL
db-1 | ) AND (
db-1 | created_at >= $6 OR
db-1 | $6 IS NULL
db-1 | ) AND (
db-1 | created_at < $7 OR
db-1 | $7 IS NULL
db-1 | ) AND (
db-1 | CASE
db-1 | WHEN $8 = TRUE THEN (state = 0 OR state = 3)
db-1 | ELSE TRUE
db-1 | END
db-1 | )
db-1 | ORDER BY
db-1 | CASE
db-1 | WHEN $9 = FALSE OR $9 IS NULL THEN id
db-1 | ELSE NULL
db-1 | END ASC,
db-1 | CASE
db-1 | WHEN $9 = TRUE THEN id
db-1 | ELSE NULL
db-1 | END DESC
db-1 | LIMIT $11 OFFSET $10
db-1 |
Which leads me to believe that because I'm actually running litd, litd might be interacting with the DB here too. I'm wondering: can litd connect to the database directly, without going through lnd? If so, could litd be missing the tombstone marker on the old KV schema DB, or not obeying the --lnd.db.use-native-sql argument?
Or is this query from litd intended to be an ongoing thing that keeps litd accounts in sync with lnd invoices, and it is just implemented so inefficiently on postgres that it needs to be refactored?
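I haven't verified this myself, but one thing that might be worth checking is whether the planner handles this query very differently when it arrives as a parameterized/prepared statement (with all the `... OR $n IS NULL` guards) than it would with literal values. A rough sketch of how one could look at that in psql, based on the query shape in the log above: the added-since index 4700862 comes from the error log, while the parameter types, column list, limit and offset values are guesses on my part.

```sql
-- Sketch only: reproduce the FilterInvoices query shape as a prepared statement
-- so the planner sees parameters the way the driver does. Parameter/column types
-- are assumptions; 4700862 is the "added since" index from the log above.
SET plan_cache_mode = force_generic_plan;  -- PostgreSQL 12+, mimic a cached generic plan

PREPARE filter_invoices (BIGINT, BIGINT, BIGINT, BIGINT, SMALLINT,
                         TIMESTAMP, TIMESTAMP, BOOLEAN, BOOLEAN,
                         BIGINT, BIGINT) AS
SELECT id, settle_index, state, created_at  -- column list trimmed
FROM invoices
WHERE (id >= $1 OR $1 IS NULL)
  AND (id <= $2 OR $2 IS NULL)
  AND (settle_index >= $3 OR $3 IS NULL)
  AND (settle_index <= $4 OR $4 IS NULL)
  AND (state = $5 OR $5 IS NULL)
  AND (created_at >= $6 OR $6 IS NULL)
  AND (created_at < $7 OR $7 IS NULL)
  AND (CASE WHEN $8 = TRUE THEN (state = 0 OR state = 3) ELSE TRUE END)
ORDER BY
  CASE WHEN $9 = FALSE OR $9 IS NULL THEN id ELSE NULL END ASC,
  CASE WHEN $9 = TRUE THEN id ELSE NULL END DESC
LIMIT $11 OFFSET $10;

EXPLAIN (ANALYZE, BUFFERS)
EXECUTE filter_invoices (4700862, NULL, NULL, NULL, NULL, NULL, NULL,
                         FALSE, FALSE, 0, 1000);
```

If the generic plan turns out to be a sequential scan over the whole invoices table for every call, that would at least be consistent with a backend sitting at 100% CPU for hours.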
Also, I re-tested the above with SQLite using what should be the same DB (it was created from the same original bbolt DB). After the 17 minute search through the payments DB for in-flight HTLCs (#9729 (comment)), there was no further CPU activity. So SQLite must be finishing this extra (invoice?) workload quickly enough that I didn't even notice it.