Feat/preorg tsv event replay #1

Open: wants to merge 134 commits into base: develop
Commits
8718112
refactor: split postgres-store into several files
rafaelcr Apr 18, 2022
fcbe844
refactor: inherit all from PgStore
rafaelcr Apr 18, 2022
491a632
refactor: start using PgStore everywhere
rafaelcr Apr 19, 2022
c817352
refactor: fix test imports
rafaelcr Apr 19, 2022
c8891fa
refactor: restore notifier into PgStore
rafaelcr Apr 19, 2022
e6f6119
refactor: rename to PgWriteStore
rafaelcr Apr 19, 2022
30f80d9
Merge branch 'develop' into feat/pg-refactor
rafaelcr Apr 19, 2022
2bba410
fix: tests and lint
rafaelcr Apr 20, 2022
fffec65
fix: unused exports
rafaelcr Apr 20, 2022
737bd38
refactor: pg-store converted
rafaelcr Apr 22, 2022
56050ca
refactor: pg-write-store
rafaelcr Apr 25, 2022
9588295
refactor: use legacy pg for event requests, notifier and migrations
rafaelcr Apr 25, 2022
543bb86
refactor: update tests
rafaelcr Apr 26, 2022
e02e3ce
refactor: tx insert mode
rafaelcr Apr 26, 2022
d105d91
refactor: first test passing
rafaelcr Apr 26, 2022
77af0b9
refactor: transformed some inserts
rafaelcr Apr 27, 2022
c309b26
refactor: transform all inserts
rafaelcr Apr 27, 2022
4e6484a
refactor: fix token-tests
rafaelcr Apr 27, 2022
e5ecbc7
refactor: some more tests fixed
rafaelcr Apr 27, 2022
8706f98
fix: more tests
rafaelcr Apr 27, 2022
e36459a
fix: nasty mempool query
rafaelcr Apr 27, 2022
6190ed5
fix: microblock hash test
rafaelcr Apr 27, 2022
0a72a8c
fix: events weird test
rafaelcr Apr 27, 2022
1de3145
fix: api tests all passing
rafaelcr Apr 28, 2022
df0e291
fix: mb tests
rafaelcr Apr 28, 2022
d20d546
fix: bns code
rafaelcr Apr 28, 2022
780b253
fix: datastore connection tests
rafaelcr Apr 28, 2022
dec3711
fix: rosetta, bns, token tests
rafaelcr Apr 28, 2022
4f42d3d
fix: faucet write db
rafaelcr Apr 29, 2022
5384367
fix: write store to api server for faucet
rafaelcr Apr 29, 2022
e70da5f
style: cleanup
rafaelcr Apr 29, 2022
359a9b8
fix: faucet launch
rafaelcr Apr 29, 2022
02e44d4
fix: upgrade postgres
rafaelcr Apr 29, 2022
bffdd82
fix: test launch
rafaelcr Apr 29, 2022
1687660
fix: remove type transform
rafaelcr Apr 29, 2022
eb4f0eb
refactor: go back to buffer fields
rafaelcr Apr 29, 2022
29e0380
fix: insert numeric as string so values can fit
rafaelcr Apr 29, 2022
99ad314
refactor: move all types to common.ts
rafaelcr Apr 29, 2022
b083fe4
style: comments
rafaelcr Apr 29, 2022
98e4e68
fix: rosetta cli tests
rafaelcr Apr 30, 2022
8eb2a3a
fix: undo launch custom
rafaelcr Apr 30, 2022
22e666a
refactor: bytea type, api tests pass
rafaelcr Apr 30, 2022
8d6f6bc
fix: pg uri test
rafaelcr May 1, 2022
63a030d
chore: define PgNumeric type
rafaelcr May 1, 2022
87bbabb
style: sql object references
rafaelcr May 1, 2022
a74ea3c
style: clean up tx events query
rafaelcr May 1, 2022
4902975
style: add explicit PgJsonb type
rafaelcr May 1, 2022
a0ea63d
feat: dont parse into buffer from pg
rafaelcr May 4, 2022
154c329
fix: datastore tests
rafaelcr May 4, 2022
74195fa
fix: rosetta tests
rafaelcr May 4, 2022
5f45bfe
fix: bns tests
rafaelcr May 4, 2022
d0b8e77
fix: microblock tests
rafaelcr May 4, 2022
ea63864
chore: add nvmrc set to v16.15.0
rafaelcr May 4, 2022
9b6f0f7
chore: update package-lock.json
rafaelcr May 4, 2022
2e6dde2
chore: restore vscode config
rafaelcr May 4, 2022
697d5c3
chore: shorten poll connection name to avoid hitting pg limits
rafaelcr May 5, 2022
2a0c1d5
fix: remove returning statements when not required, use count instead
rafaelcr May 10, 2022
4e4978e
fix: nvmrc back to v16
rafaelcr May 12, 2022
f7e68ef
fix: pool max env config
rafaelcr May 12, 2022
0e5b90d
feat: implement new "pre-org" event-replay mode
zone117x May 3, 2022
2e5c983
feat: batch sql inserts
zone117x May 3, 2022
90eb686
test(perf): use pg COPY stream for streaming insert
zone117x May 4, 2022
eddbeed
test: for reference, commit several approaches for fast pg inserting
zone117x May 5, 2022
b61e232
temp: commit read cpu profiling output
zone117x May 5, 2022
8d9730a
feat: pre-org mode inserts for /new_burn_block data
zone117x May 9, 2022
07843d1
feat: pre-org insert into `blocks` table from /new_block
zone117x May 9, 2022
24a3b80
feat: pre-org insert into `microblocks` table from /new_block
zone117x May 9, 2022
60b4949
feat: pre-org insert into `txs` table from /new_block
zone117x May 9, 2022
b1a75b4
feat: pre-org insert into `stx_events` table from /new_block
zone117x May 10, 2022
a4be310
feat: pre-org insert into `principal_stx_txs` table from /new_block
zone117x May 10, 2022
666bdd0
feat: pre-org insert into `contract_logs` table from /new_block
zone117x May 10, 2022
b8c7e0f
feat: pre-org insert into `stx_lock_events` table from /new_block
zone117x May 10, 2022
6ebdea8
feat: pre-org insert into `ft_events`, `nft_events`, and `smart_contr…
zone117x May 10, 2022
874e1da
feat: pre-org insert into `zonefiles`, `names`, and `namespaces` tabl…
zone117x May 10, 2022
a80b84e
feat: pre-org insert into `zonefiles` tables from /attachments/new ev…
zone117x May 10, 2022
157ac2f
chore: cleanup duplicate attachment parsing code
zone117x May 10, 2022
115741e
feat: pre-org insert into `subdomains` and `zonefiles` tables from /a…
zone117x May 10, 2022
77f5654
chore: logging improvements
zone117x May 10, 2022
daa746b
chore: move `preorg` logic into separate file
zone117x May 11, 2022
9a71d93
chore: add more logging to `insertRawEvents`
zone117x May 11, 2022
d03d7b4
feat: cumulative time tracking and reporting for each sql function du…
zone117x May 11, 2022
e24a0c2
feat: batch inserts for 2x speedup to `txs` table updates
zone117x May 11, 2022
88b1883
feat: batch inserts for 3x speedup to `principal_stx_txs` table updates
zone117x May 11, 2022
f2cdf76
feat: batch inserts for 25% speedup to `event_observer_requests` tabl…
zone117x May 11, 2022
b39ac4a
feat: batch inserts for 25% speedup to `stx_events` table updates
zone117x May 11, 2022
cd01ff8
feat: batch inserts for 2x speedup to `ft_events` table updates
zone117x May 11, 2022
48edc7b
feat: batch inserts for 3x speedup to `nft_events` table updates
zone117x May 11, 2022
79371f3
feat: batch inserts for 2x speedup to `contract_logs` table updates
zone117x May 11, 2022
a1fbf07
chore: fix log strings
zone117x May 11, 2022
cdb1ac9
feat: batched inserts for `blocks` and `microblocks` tables
zone117x May 12, 2022
3a9f1e6
feat: batch inserts for 10x speedup to `names` and `zonefiles` table …
zone117x May 12, 2022
4ba0e4b
chore: fix typos
zone117x May 12, 2022
3c081d4
chore: revert no longer used code
zone117x May 12, 2022
ffd872f
feat: run `db.FinishEventReplay()` in `preorg` mode
zone117x May 12, 2022
c051915
feat: support for inserting unpruned & un-org'd for last N (default 2…
zone117x May 12, 2022
5760246
fix: populate bns subdomain block data during bulk insertion
zone117x May 13, 2022
f73ac5d
feat: configurable sql index method for identifier columns (default t…
zone117x May 13, 2022
ae23acc
chore: post-rebase lint fixes
zone117x May 13, 2022
343ffaa
chore: fix more linter errors
zone117x May 13, 2022
d2c15e1
chore(ci): test event-import perf in gh actions
zone117x May 13, 2022
d9ab137
fix(tests): repair `Block execution cost` unit test
zone117x May 13, 2022
9f1d271
chore: logging improvements
zone117x May 13, 2022
d2e70f1
perf: run importing in parallel, about 20% faster
zone117x May 13, 2022
ec76c56
perf: leave indexes alone on `event_observer_requests` table
zone117x May 13, 2022
8530774
feat: write temporary tsv preprocessing file output to /tmp dir
zone117x May 13, 2022
aee7d1a
feat: perform table re-indexing in parallel, around 40% faster
zone117x May 13, 2022
d6c0bb0
chore: lint fixes
zone117x May 13, 2022
4781372
chore: time tracking for parallel /new_block table reindexing
zone117x May 13, 2022
e03c656
Merge branch 'develop' into feat/preorg-tsv-event-replay
zone117x Jun 6, 2022
4d74742
chore: update `postgres` lib to latest stable
zone117x Jun 6, 2022
0711613
chore: restore vscode debug config to archival event replay
zone117x Jun 6, 2022
07dcaf7
chore: always use default (btree) index method for `sender` and `reci…
zone117x Jun 6, 2022
2d64210
chore: remove cpuprofile files, should not have been committed
zone117x Jun 6, 2022
54a4151
chore: reduce in-memory buffer sizes
zone117x Jun 7, 2022
747ccaf
chore: pin client and docs package.json versions
zone117x Jun 7, 2022
b72c162
chore: use npm ci everywhere
zone117x Jun 7, 2022
b706764
chore: update client and docs package-lock.json files
zone117x Jun 7, 2022
e624fbd
chore: fix root package-lock.json
zone117x Jun 7, 2022
2bbb3f7
chore: fix docs package-lock.json
zone117x Jun 7, 2022
70d0028
chore: fix docs package-lock.json
zone117x Jun 7, 2022
8f5243f
chore: fix docs package-lock.json, attempt 3
zone117x Jun 7, 2022
dbda2ae
chore: fix docs package-lock.json, attempt 4
zone117x Jun 7, 2022
b7f8d84
chore: log process exit code
zone117x Jun 9, 2022
533b56a
chore: preorg vscode debug config
zone117x Jun 16, 2022
5bf8e67
Merge branch 'develop' into feat/preorg-tsv-event-replay
zone117x Jun 16, 2022
724ac4c
chore: breakup large `preOrgTsvImport` fn
zone117x Aug 15, 2022
2232a4d
Merge branch 'develop' into feat/preorg-tsv-event-replay
zone117x Aug 15, 2022
f3589fd
chore: post-merge fixes
zone117x Aug 15, 2022
2a238db
chore: remove parallel event pg inserts, simplify table index togglin…
zone117x Aug 15, 2022
1d70ee3
chore: progress on more memory-friendly tsv streaming and transforming
zone117x Aug 15, 2022
0892e9e
chore: rewrite the /new_block preorg ingest loop to use less nodejs m…
zone117x Aug 15, 2022
80446ec
chore: fix pg memory usage issue caused by large batch inserts into t…
zone117x Aug 16, 2022
813964c
chore: lint fix
zone117x Aug 16, 2022
405a0f3
ci: patch
rafaelcr Oct 28, 2022
6 changes: 6 additions & 0 deletions .env
@@ -35,6 +35,12 @@ PG_APPLICATION_NAME=stacks-blockchain-api
# See https://node-postgres.com/api/pool
# PG_CONNECTION_POOL_MAX=10

# Specify the sql index method used for identifier columns like `tx_id`, `block_hash`, etc. By
# default `btree` is used which is fast for writes but slightly slower for reads. For production
# environments the recommended type is `hash` which is slower for writes but faster for reads.
# See https://www.postgresql.org/docs/current/indexes-types.html
# PG_IDENT_INDEX_TYPE=hash

# Enable to have stacks-node events streamed to a file while the application is running
# STACKS_EXPORT_EVENTS_FILE=/tmp/stacks-events.tsv
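
Reviewer note: the new `PG_IDENT_INDEX_TYPE` setting names two index methods, `btree` (the default) and `hash`. A minimal sketch of how such a value could be validated before it is interpolated into any index definition is shown here. The helper name `parseIdentIndexMethod` and the explicit `env` parameter are assumptions made for illustration; this hunk does not show how the PR actually reads the variable.

```typescript
// Hypothetical helper, not taken from the PR: validate a PG_IDENT_INDEX_TYPE-style value.
type IdentIndexMethod = 'btree' | 'hash';

function parseIdentIndexMethod(
  env: Record<string, string | undefined>
): IdentIndexMethod {
  const raw = env['PG_IDENT_INDEX_TYPE'];
  // Unset or blank falls back to the default named in the .env comment.
  if (raw === undefined || raw.trim() === '') {
    return 'btree';
  }
  const value = raw.trim().toLowerCase();
  if (value === 'btree') {
    return 'btree';
  }
  if (value === 'hash') {
    return 'hash';
  }
  // Rejecting unknown values keeps arbitrary text out of generated SQL.
  throw new Error(`Unsupported PG_IDENT_INDEX_TYPE: "${raw}"`);
}
```

In a Node.js process this would be called as `parseIdentIndexMethod(process.env)`. Taking the environment as a parameter keeps the function easy to test.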

44 changes: 44 additions & 0 deletions .github/workflows/ci.yml
@@ -465,6 +465,49 @@ jobs:
flag-name: run-${{ github.job }}
parallel: true

test-event-import:
if: ${{ false }} # gh runner not given enough resources to complete this on time
runs-on: ubuntu-latest
services:
postgres:
image: postgres
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: stacks_blockchain_api
ports:
- 5490:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0

- name: Use Node.js
uses: actions/setup-node@v2
with:
node-version-file: '.nvmrc'

- name: Install deps
run: npm ci

- name: Build
run: npm run build

- name: Fetch tsv file
run: |
wget https://public-hiro-m.s3.amazonaws.com/stacks-node-events.tsv.tar.zst
tar axvf stacks-node-events.tsv.tar.zst

- name: Run event-import
env:
NODE_ENV: production
run: node lib/index.js import-events --wipe-db --force --mode preorg --file stacks-node-events.tsv

upload-coveralls:
runs-on: ubuntu-latest
needs:
@@ -497,6 +540,7 @@ jobs:
with:
token: ${{ secrets.GH_TOKEN || secrets.GITHUB_TOKEN }}
fetch-depth: 0
persist-credentials: false

- name: Semantic Release
uses: cycjimmy/[email protected]
45 changes: 45 additions & 0 deletions .vscode/launch.json
@@ -138,6 +138,51 @@
"TS_NODE_SKIP_IGNORE": "true"
}
},
{
"type": "node",
"request": "launch",
"name": "Launch: import-events - preorg mode",
"runtimeArgs": ["-r", "ts-node/register/transpile-only", "-r", "tsconfig-paths/register"],
"args": [
"${workspaceFolder}/src/index.ts",
"import-events",
"--wipe-db",
"--force",
"--mode",
"preorg",
"--file",
"/Users/matt/Downloads/tsv/stacks-node-events.tsv"
],
// 2.19 GB
"outputCapture": "std",
"internalConsoleOptions": "openOnSessionStart",
"env": {
"NODE_ENV": "production",
"TS_NODE_SKIP_IGNORE": "true"
}
},
{
"type": "node",
"request": "launch",
"name": "Launch: (compiled) import-events - preorg mode",
"args": [
"${workspaceFolder}/lib/index.js",
"import-events",
"--wipe-db",
"--force",
"--mode",
"preorg",
"--file",
"/Users/matt/Downloads/tsv/stacks-node-events.tsv"
],
// 2.19 GB
"outputCapture": "std",
"internalConsoleOptions": "openOnSessionStart",
"env": {
"NODE_ENV": "production",
"TS_NODE_SKIP_IGNORE": "true"
}
},
{
"type": "node",
"request": "launch",
6 changes: 6 additions & 0 deletions src/datastore/common.ts
@@ -516,6 +516,7 @@ export interface DbBnsSubdomain {
tx_id: string;
tx_index: number;
canonical: boolean;
index_block_hash?: string;
}

export interface DbConfigState {
@@ -1145,6 +1146,11 @@ export interface FaucetRequestInsertValues {
occurred_at: number;
}

export interface RawEventRequestInsertValues {
event_path: string;
payload: string;
}

export interface PrincipalStxTxsInsertValues {
principal: string;
tx_id: PgBytea;
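
Reviewer note: several commits in this PR describe batched inserts, and `RawEventRequestInsertValues` is the row shape added here. A minimal sketch of splitting rows into fixed-size batches before handing each batch to a multi-row insert is shown here. The `chunk` helper and the batch size used in the usage note are hypothetical; the PR's actual batching code is not visible in this hunk.

```typescript
// Row shape copied from the diff above.
interface RawEventRequestInsertValues {
  event_path: string;
  payload: string;
}

// Hypothetical helper: split `items` into consecutive batches of at most `size` entries.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error(`Batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example rows; the paths mirror the event endpoints named in the commit list.
const exampleRows: RawEventRequestInsertValues[] = [
  { event_path: '/new_burn_block', payload: '{}' },
  { event_path: '/new_block', payload: '{}' },
  { event_path: '/new_block', payload: '{}' },
  { event_path: '/attachments/new', payload: '[]' },
  { event_path: '/new_block', payload: '{}' },
];
const exampleBatches = chunk(exampleRows, 2);
```

Each batch in `exampleBatches` could then be written with a single multi-row `INSERT`, which trades some memory per statement for fewer round trips to the database.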
16 changes: 11 additions & 5 deletions src/datastore/connection.ts
@@ -23,14 +23,14 @@ const PG_TYPE_MAPPINGS = {
from: [17],
serialize: (x: any) => {
if (typeof x === 'string') {
- if (/^(0x|0X)[a-fA-F0-9]*$/.test(x)) {
+ if (x.length === 0) {
+ return '\\x';
+ } else if (/^(0x|0X)[a-fA-F0-9]*$/.test(x)) {
// hex string with "0x" prefix
if (x.length % 2 !== 0) {
throw new Error(`Hex string is an odd number of digits: "${x}"`);
}
return '\\x' + x.slice(2);
- } else if (x.length === 0) {
- return '\\x';
} else if (/^\\x[a-fA-F0-9]*$/.test(x)) {
// hex string with "\x" prefix (already encoded for postgres)
if (x.length % 2 !== 0) {
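
Reviewer note: the hunk above moves the empty-string check ahead of the `0x` regex test. Since an empty string never matched that regex, the result for `''` looks unchanged and the reorder mainly gives an early return. A standalone sketch of the string branches is shown here for reference. The function name `hexToPgBytea`, the unchanged return value in the `\x` branch, and the final `throw` are assumptions, because the rest of the serializer is cut off in this view.

```typescript
// Sketch of the string handling visible in the diff; not the full serializer.
function hexToPgBytea(x: string): string {
  // Empty input maps to an empty bytea literal.
  if (x.length === 0) {
    return '\\x';
  }
  // Hex string with "0x" prefix: swap the prefix for postgres' "\x".
  if (/^(0x|0X)[a-fA-F0-9]*$/.test(x)) {
    if (x.length % 2 !== 0) {
      throw new Error(`Hex string is an odd number of digits: "${x}"`);
    }
    return '\\x' + x.slice(2);
  }
  // Hex string with "\x" prefix (already encoded for postgres).
  if (/^\\x[a-fA-F0-9]*$/.test(x)) {
    if (x.length % 2 !== 0) {
      throw new Error(`Hex string is an odd number of digits: "${x}"`);
    }
    // Assumption: an already-encoded value is passed through unchanged.
    return x;
  }
  // Assumption: anything else is rejected in this sketch.
  throw new Error(`Not a recognized hex string: "${x}"`);
}
```

Both prefixes are two characters long, so checking the parity of the full string length is equivalent to checking the parity of the digit count.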
@@ -69,9 +69,11 @@ export type PgJsonb = any;
export async function connectPostgres({
usageName,
pgServer,
maxPoolOverride,
}: {
usageName: string;
pgServer: PgServer;
maxPoolOverride?: number;
}): Promise<PgSqlClient> {
const initTimer = stopwatch();
let connectionError: Error | undefined;
@@ -81,6 +83,7 @@ export async function connectPostgres({
const testSql = getPostgres({
usageName: `${usageName};conn-poll`,
pgServer: pgServer,
maxPoolOverride,
});
try {
await testSql`SELECT version()`;
@@ -110,16 +113,19 @@ export async function connectPostgres({
const sql = getPostgres({
usageName: `${usageName};datastore-crud`,
pgServer: pgServer,
maxPoolOverride,
});
return sql;
}

export function getPostgres({
usageName,
pgServer,
maxPoolOverride,
}: {
usageName: string;
pgServer?: PgServer;
maxPoolOverride?: number;
}): PgSqlClient {
// Retrieve a postgres ENV value depending on the target database server (read-replica/default or primary).
// We will fall back to read-replica values if a primary value was not given.
@@ -164,7 +170,7 @@ export function getPostgres({
uri.searchParams.set('application_name', appName);
sql = postgres(uri.toString(), {
types: PG_TYPE_MAPPINGS,
- max: pgEnvVars.poolMax,
+ max: maxPoolOverride ?? pgEnvVars.poolMax,
connection: {
application_name: appName,
search_path: schema,
@@ -179,7 +185,7 @@ export function getPostgres({
host: pgEnvVars.host,
port: parsePort(pgEnvVars.port),
ssl: parseArgBoolean(pgEnvVars.ssl),
- max: pgEnvVars.poolMax,
+ max: maxPoolOverride ?? pgEnvVars.poolMax,
types: PG_TYPE_MAPPINGS,
connection: {
application_name: appName,
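
Reviewer note: `maxPoolOverride ?? pgEnvVars.poolMax` uses the nullish coalescing operator, so the configured pool size is used only when no override is passed. A small sketch of that fallback in isolation is shown here. The function name `resolvePoolMax` and the `number | undefined` type for the configured value are assumptions; the diff does not show the type of `pgEnvVars.poolMax`.

```typescript
// Hypothetical standalone version of the fallback used in getPostgres().
function resolvePoolMax(
  maxPoolOverride: number | undefined,
  envPoolMax: number | undefined
): number | undefined {
  // `??` falls back only on null/undefined, unlike `||`, which also
  // falls back on other falsy values such as 0.
  return maxPoolOverride ?? envPoolMax;
}
```

When both values are `undefined` the sketch returns `undefined`, leaving the client library to apply its own default.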
2 changes: 1 addition & 1 deletion src/datastore/event-requests.ts
@@ -48,7 +48,7 @@ export async function* getRawEventRequests(
id bigint PRIMARY KEY,
receive_timestamp timestamptz NOT NULL,
event_path text NOT NULL,
- payload jsonb NOT NULL
+ payload text NOT NULL
) ON COMMIT DROP
`);
// Use a `temp_raw_tsv` table first to store the raw TSV data as it might come with duplicate
9 changes: 9 additions & 0 deletions src/datastore/helpers.ts
@@ -803,6 +803,15 @@ export function createDbTxFromCoreMsg(msg: CoreNodeParsedTxMessage): DbTx {
return dbTx;
}

const DEFAULT_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD = 256;

export function getMempoolTxGarbageCollectionThreshold() {
return parseInt(
process.env['STACKS_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD'] ??
`${DEFAULT_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD}`
);
}

export function registerMempoolPromStats(pgEvents: PgStoreEventEmitter) {
const mempoolTxCountGauge = new prom.Gauge({
name: `mempool_tx_count`,
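
Reviewer note: `getMempoolTxGarbageCollectionThreshold` calls `parseInt` without a radix and returns `NaN` silently when the variable holds a non-numeric value. A possible stricter variant is sketched here. The function name `parseMempoolGcThreshold`, the explicit `env` parameter, and the choice to throw on invalid input are assumptions, not behaviour taken from the PR.

```typescript
const DEFAULT_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD = 256;

// Hypothetical stricter variant of the getter shown in the diff.
function parseMempoolGcThreshold(
  env: Record<string, string | undefined>
): number {
  const raw = env['STACKS_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD'];
  if (raw === undefined) {
    return DEFAULT_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD;
  }
  // An explicit radix keeps inputs such as "0x10" from being read as hexadecimal.
  const parsed = parseInt(raw, 10);
  if (isNaN(parsed)) {
    throw new Error(
      `STACKS_MEMPOOL_TX_GARBAGE_COLLECTION_THRESHOLD is not a number: "${raw}"`
    );
  }
  return parsed;
}
```

In the application this would be invoked as `parseMempoolGcThreshold(process.env)`.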
2 changes: 1 addition & 1 deletion src/datastore/migrations.ts
@@ -4,7 +4,7 @@ import { Client } from 'pg';
import { APP_DIR, isDevEnv, isTestEnv, logError, logger } from '../helpers';
import { getPgClientConfig, PgClientConfig } from './connection-legacy';

- const MIGRATIONS_TABLE = 'pgmigrations';
+ export const MIGRATIONS_TABLE = 'pgmigrations';
const MIGRATIONS_DIR = path.join(APP_DIR, 'migrations');

export async function runMigrations(