diff --git a/.gitignore b/.gitignore index b01cf2d2..dc281e26 100644 --- a/.gitignore +++ b/.gitignore @@ -9,3 +9,4 @@ draft/ .terraform.plan .github/workflows/.artifacts/ .vercel +indexer/indexer \ No newline at end of file diff --git a/README.md b/README.md index e5a4e714..47ae5fde 100644 --- a/README.md +++ b/README.md @@ -1,176 +1,40 @@ -# Kadena Indexer +# Kadindexer - Kadena Indexer This project is a monorepo that contains the following packages: -- `@kadena-indexer/indexer`: The indexer package, which is responsible for scanning and storing blocks for Kadena blockchain. -- `@kadena-indexer/terraform`: The Terraform configuration for provisioning the infrastructure required to run the indexer and the node. +- [`@kadena-indexer/indexer`](indexer/README.md): The indexer package, which is responsible for scanning and storing blocks for Kadena blockchain. +- [`@kadena-indexer/terraform`](terraform/README.md): The Terraform configuration for provisioning the infrastructure required to run the indexer and the node. +- [`@kadena-indexer/backfill`](backfill/README.md): The backfill package, which is responsible for backfilling the indexer data. -## Prerequisites +## Requirements -- [Terraform](https://www.terraform.io/downloads.html) -- [AWS CLI](https://aws.amazon.com/cli/) -- [AWS Account](https://aws.amazon.com/) -- [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) - -### Dev Container - -This project is configured to run in a dev container. You can use the `Dev Containers: Open Folder in Container` command in VSCode to open the project in a dev container. This will automatically install the required dependencies and set up the environment. To use the dev container, you need to have Docker installed on your machine. - -If you don't have Dev Containers installed, you can install it from the [VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers). - -### Configure Environment Variables - -Under the `/terraform` directory, create an `.env` file using the `.env.template` as a reference and set the environment variables accordingly. - -```bash -cp terraform/.env.template terraform/.env -``` - -`TF_VAR_AWS_ACCESS_KEY_ID` is your AWS access key ID. -`TF_VAR_AWS_SECRET_ACCESS_KEY` is your AWS secret access key. -`TF_VAR_AWS_ACCOUNT_ID` is your AWS account ID. -`TF_VAR_AWS_USER_NAME` is the name of the user you created in AWS. -`TF_VAR_AWS_DB_USERNAME` is the username for the postgress database. -`TF_VAR_AWS_DB_PASSWORD` is the password for the postgress database. - -Under the `/indexer` directory, create an `.env` file using the `.env.template` as a reference and set the environment variables accordingly. - -```bash -cp indexer/.env.template indexer/.env -``` - -`AWS_S3_REGION` is the region where the S3 bucket is located. -`AWS_S3_BUCKET_NAME` is the name of the S3 bucket where the data will be stored. -`AWS_ACCESS_KEY_ID` is the access key ID for the S3 bucket. -`AWS_SECRET_ACCESS_KEY` is the secret access key for the S3 bucket. - -`SYNC_BASE_URL` is the base URL for the Kadena node. -`SYNC_MIN_HEIGHT` is the minimum height to start syncing from. -`SYNC_FETCH_INTERVAL_IN_BLOCKS` is the interval in blocks to fetch. -`SYNC_TIME_BETWEEN_REQUESTS_IN_MS` is the time between requests in milliseconds. -`SYNC_ATTEMPTS_MAX_RETRY` is the maximum number of attempts to retry. -`SYNC_ATTEMPTS_INTERVAL_IN_MS` is the interval in milliseconds between attempts. 
-`SYNC_NETWORK` is the network to sync.
-
-`DB_USERNAME` is the username for the postgress database.
-`DB_PASSWORD` is the password for the postgress database.
-`DB_NAME` is the name of the postgress database.
-`DB_HOST` is the host for the postgress database. You have the host after the resource creation, so you can check for this information in the AWS console or in terraform output (postgres_db_host).
-
-### Initialize Terraform
-
-Initialize your Terraform workspace, which will download the provider and initialize it with the values provided in the terraform.`tfvars`` file.
-
-```bash
-terraform init
-```
-
-### Deploy Infrastructure
-
-Plan and apply the Terraform configuration to provision your AWS resources:
-
-```bash
-yarn terraform plan
-yarn terraform apply
-```
-
-### Destroy Infrastructure
-
-If you want to destroy the infrastructure created, you can use the following command:
-
-```bash
-yarn terraform destroy
-```
+- Install dependencies
+- See individual package READMEs for specific prerequisites

 ## Installation

-Set up the indexer with the following commands:
+Install dependencies with the following command:

 ```bash
-yarn && yarn indexer build
+yarn install
 ```

-## Features
+## Quick Start

-### Run processing
+This is the quickest way to get the indexer running.

-Continuous process of streaming, headers, payloads and missing blocks from node to s3 bucket and from s3 bucket to database
+Install [Docker](https://www.docker.com/).

-```bash
-yarn indexer dev:run
-```
-
-## Additional Commands
-
-### Running with Docker
+Create the `.env` file in the `indexer` folder from the template and fill in the values. See the [Environment Variables Reference](indexer/README.md#32-environment-variables-reference).

 ```bash
-sudo docker build -t kadena-indexer:latest .
-sudo docker run --env-file ./indexer/.env -p 3000:3000 kadena-indexer:latest
-```
-
-### Backfilling Blocks
-
-Scan for and store historical blocks.
-
-```bash
-yarn indexer dev:backfill
-```
-
-### Streaming Blocks
-
-Listen for new blocks and store them in real-time.
-
-```bash
-yarn indexer dev:streaming
-```
-
-### Identifying Missing Blocks
-
-Scan for and store any blocks that were missed.
-
-```bash
-yarn indexer dev:missing
-```
-
-### Processing Headers
-
-Start the header processing from S3 to the database.
-
-```bash
-yarn indexer dev:headers
-```
-
-### Processing Payloads
-
-Start the payload processing from S3 to the database.
-
-```bash
-yarn indexer dev:payloads
-```
-
-## Advanced Usage
-
-### Local Workflow Testing
-
-For testing workflows locally, act is required. Install it using Homebrew:
-
-```bash
-brew install act
+cp indexer/.env.template indexer/.env
 ```

-### Run Terraform Workflow Manually
-
-If you want to run the terraform workflow manually, you can use the following command:
-
+To start all services:
 ```bash
-yarn run-terraform-workflow
+yarn indexer dev
 ```

-### Run Indexer Workflow Manually
-
-If you want to run the indexer workflow manually, you can use the following command:
+**NOTE:** When using the image with Docker Compose, the database `DB_USERNAME` must be set to the default value `postgres`.
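+
+For reference, `yarn indexer dev` invokes the `dev` script defined in `indexer/package.json`, so it is roughly equivalent to running Docker Compose directly from the `indexer` directory:
+
+```bash
+# Bring up the development stack defined in docker-compose.development.yml
+# and follow the logs of the indexer service
+docker-compose -f docker-compose.development.yml up && docker-compose logs -f indexer
+```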
-```bash
-yarn run-indexer-workflow
-```
diff --git a/backfill/.env.template b/backfill/.env.template
new file mode 100644
index 00000000..86df1e91
--- /dev/null
+++ b/backfill/.env.template
@@ -0,0 +1,15 @@
+CERT_PATH=./global-bundle.pem
+SYNC_BASE_URL=https://api.chainweb.com/chainweb/0.0
+
+CHAIN_ID=0
+NETWORK=mainnet01
+SYNC_MIN_HEIGHT=5370495
+SYNC_FETCH_INTERVAL_IN_BLOCKS=100
+SYNC_ATTEMPTS_MAX_RETRY=5
+SYNC_ATTEMPTS_INTERVAL_IN_MS=500
+
+DB_USERNAME=postgres
+DB_PASSWORD=password
+DB_NAME=indexer
+DB_HOST=localhost
+DB_PORT=5432
\ No newline at end of file
diff --git a/backfill/README.md b/backfill/README.md
new file mode 100644
index 00000000..f27ab31b
--- /dev/null
+++ b/backfill/README.md
@@ -0,0 +1,99 @@
+# Kadena Indexer Backfill
+
+## 1. Introduction
+
+The Kadindexer Backfill is a utility tool designed to synchronize historical blockchain data from the Kadena network into your local database. It allows you to fetch and index past blocks and transactions, ensuring your database has a complete history of the chain. The backfill process can be configured to sync data from any specified block height, making it useful for both initial data population and recovery scenarios where data needs to be resynced from a particular point.
+
+## 2. Prerequisites
+
+- [Docker](https://www.docker.com/)
+- Kadena Indexer PostgreSQL database running
+- Network access to the Kadena network
+- Running your own Kadena node
+
+## 3. Setup
+
+### 3.1. Starting Docker
+Start Docker Desktop from the command line or via the macOS application.
+
+```bash
+# MacOS - Start Docker Desktop from command line
+open -a Docker
+
+# Linux - Start Docker daemon
+sudo systemctl start docker
+```
+
+### 3.2. Environment Variables
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `CERT_PATH` | Path to SSL certificate bundle | `./global-bundle.pem` |
+| `SYNC_BASE_URL` | Base URL for the Chainweb API | `https://api.chainweb.com/chainweb/0.0` |
+| `CHAIN_ID` | ID of the chain to backfill | `0` |
+| `NETWORK` | Kadena network to sync from | `mainnet01` |
+| `SYNC_MIN_HEIGHT` | Starting block height for backfill | `5370495` |
+| `SYNC_FETCH_INTERVAL_IN_BLOCKS` | Number of blocks to fetch in each interval | `100` |
+| `SYNC_ATTEMPTS_MAX_RETRY` | Maximum number of retry attempts | `5` |
+| `SYNC_ATTEMPTS_INTERVAL_IN_MS` | Interval between retry attempts in milliseconds | `500` |
+| `DB_USERNAME` | PostgreSQL database username | `postgres` |
+| `DB_PASSWORD` | PostgreSQL database password | `password` |
+| `DB_NAME` | Name of the database | `indexer` |
+| `DB_HOST` | Database host address | `localhost` |
+| `DB_PORT` | Database port number | `5432` |
+
+**NOTE:** The public Chainweb API used in the examples will not work for the indexer's purposes. You will need to run your own Kadena node and set `SYNC_BASE_URL` to your node's API URL.
+
+## 4. Usage
+
+### 4.1. Start the Kadindexer services
+
+Please refer to the [Kadena Indexer README](../indexer/README.md) for instructions on how to start the Kadindexer services.
+
+### 4.2. Build the backfill image
+
+Build the image:
+```bash
+docker build -t chainbychain -f Dockerfile .
+```
+
+### 4.3. Run the container
+
+#### Dockerfile (Chain by Chain)
+This Dockerfile is designed to run the backfill process for a single chain at a time. It's useful when you need to:
+- Sync data for a specific chain ID
+- Have more granular control over the backfill process
+- Debug issues with a particular chain
+- Manage resources more efficiently
+
+#### Dockerfile.indexes
+This Dockerfile is specifically for recreating database indexes. Use this when you need to:
+- Rebuild corrupted indexes
+- Optimize existing indexes
+- Add new indexes to improve query performance
+- Perform database maintenance
+
+#### Dockerfile.middle-backfill
+This Dockerfile orchestrates the backfill process across all chains simultaneously. It's beneficial when you want to:
+- Perform a complete system backfill
+- Sync data for all chains in parallel
+- Save time by running multiple chain syncs concurrently
+- Ensure consistency across all chains
+
+For single chain backfill:
+```bash
+docker build -t chainbychain -f Dockerfile .
+docker run --rm --name chainbychain --env-file .env chainbychain
+```
+
+For rebuilding indexes:
+```bash
+docker build -t rebuild-indexes -f Dockerfile.indexes .
+docker run --rm --name rebuild-indexes --env-file .env rebuild-indexes
+```
+
+For all chains backfill:
+```bash
+docker build -t all-chains -f Dockerfile.middle-backfill .
+docker run --rm --name all-chains --env-file .env all-chains
+```
\ No newline at end of file
diff --git a/backfill/config/env.go b/backfill/config/env.go
index 83922926..6a996a2c 100644
--- a/backfill/config/env.go
+++ b/backfill/config/env.go
@@ -35,7 +35,7 @@ func InitEnv(envFilePath string) {
 	}

 	config = &Config{
-		DbUser: getEnv("DB_USER"),
+		DbUser: getEnv("DB_USERNAME"),
 		DbPassword: getEnv("DB_PASSWORD"),
 		DbName: getEnv("DB_NAME"),
 		DbHost: getEnv("DB_HOST"),
diff --git a/indexer/.env.template b/indexer/.env.template
index eb90fd20..3bbeb7af 100644
--- a/indexer/.env.template
+++ b/indexer/.env.template
@@ -7,8 +7,8 @@ SYNC_NETWORK="mainnet01"
 KADENA_GRAPHQL_API_URL=localhost
 KADENA_GRAPHQL_API_PORT=3001

-DB_USERNAME="postgres"
-DB_PASSWORD="YOUR_DB_PASSWORD"
-DB_NAME="indexer"
-DB_SSL_ENABLED=false
-DB_HOST="YOUR_DB_HOST"
\ No newline at end of file
+DB_USERNAME=postgres
+DB_PASSWORD=password
+DB_NAME=indexer
+DB_HOST="YOUR_DB_HOST"
+DB_SSL_ENABLED=false
\ No newline at end of file
diff --git a/indexer/README.md b/indexer/README.md
new file mode 100644
index 00000000..5e727bdd
--- /dev/null
+++ b/indexer/README.md
@@ -0,0 +1,177 @@
+# Kadena Indexer - Infrastructure Configuration
+
+### 🚀 Getting Started
+- [Introduction](#1-introduction)
+- [Prerequisites](#2-prerequisites)
+
+### ⚙️ Configuration
+- [Environment Setup](#3-environment-setup)
+  - [Configure Variables](#31-configure-environment-variables)
+  - [Variables Reference](#32-environment-variables-reference)
+
+### 🐳 Docker Setup
+- [Starting Docker](#41-starting-docker)
+- [Dev Container](#42-dev-container)
+- [Running Options](#43-running-with-docker)
+  - [Basic Docker Run](#43-running-with-docker)
+  - [Docker Compose](#44-running-with-docker-compose)
+  - [Temporary Containers](#45-running-postgres-container)
+
+## 1. Introduction
+This directory contains instructions on how to set up the Docker container for the Kadena indexer, configure the environment variables, and run the indexer. We present two options for running the indexer: using Docker Compose, or running the services separately.
+
+## 2. Prerequisites
+- [Docker](https://www.docker.com/)
+- [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) for VSCode or Cursor (optional)
+- Dependencies installed (`yarn install`)
+- PostgreSQL (will be run in Docker)
+- Sufficient disk space for Docker images and blockchain data
+- Internet connection to access the Kadena node API
+
+## 3. Environment Setup
+
+### 3.1. Configure Environment Variables
+From the repository root, run the following command to create an `.env` file under the `/indexer` directory, using the `.env.template` as a reference:
+
+```bash
+cp indexer/.env.template indexer/.env
+```
+
+### 3.2. Environment Variables Reference
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `NODE_API_URL` | Base URL for the Kadena node API | `https://api.chainweb.com` |
+| `SYNC_BASE_URL` | Base URL for the Chainweb API | `https://api.chainweb.com/chainweb/0.0` |
+| `SYNC_MIN_HEIGHT` | Minimum height to start syncing from | `0` |
+| `SYNC_FETCH_INTERVAL_IN_BLOCKS` | Interval in blocks to fetch | `100` |
+| `SYNC_NETWORK` | Network to sync | `mainnet01`, `testnet04`, `devnet` |
+| `KADENA_GRAPHQL_API_URL` | GraphQL API host | `localhost` |
+| `KADENA_GRAPHQL_API_PORT` | GraphQL API port | `3000` |
+| `DB_USERNAME` | PostgreSQL database username | `postgres` |
+| `DB_PASSWORD` | PostgreSQL database password | `your_password` |
+| `DB_NAME` | PostgreSQL database name | `indexer` |
+| `DB_HOST` | PostgreSQL database host | `localhost` |
+| `DB_SSL_ENABLED` | Enable/disable SSL for database | `true` or `false` |
+
+**NOTE:** The public Chainweb node API used in the examples will not work for the indexer's purposes. You will need to run your own Kadena node and set `NODE_API_URL` to your node's API URL.
+
+## 4. Docker Setup
+
+### 4.1. Starting Docker
+Start Docker Desktop from the command line or via the macOS application.
+
+```bash
+# MacOS - Start Docker Desktop from command line
+open -a Docker
+
+# Linux - Start Docker daemon
+sudo systemctl start docker
+```
+
+**NOTE:** Make sure to check the `.env` file to set the correct environment variables.
+
+### 4.2. Dev Container
+This project is configured to run in a dev container. You can use the `Dev Containers: Open Folder in Container` command in VSCode to open the project in a dev container. This will automatically install the required dependencies and set up the environment. To use the dev container, you need to have Docker installed on your machine.
+
+If you don't have Dev Containers installed, you can install it from the [VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
+
+### 4.3. Running with Docker
+```bash
+# Build a Docker image named 'kadena-indexer' using the Dockerfile in current directory
+sudo docker build -t kadena-indexer:latest .
+# Run a container from the image, load environment variables from .env file, and map port 3000
+sudo docker run --env-file ./indexer/.env -p 3000:3000 kadena-indexer:latest
+```
+
+### 4.4. Running with Docker Compose
+
+Docker Compose provides a way to run the entire indexer stack with a single command. While you could run each service separately (database, migrations, GraphQL server, and streaming service), Docker Compose orchestrates all these components together, handling their dependencies and startup order automatically. The services are defined in `docker-compose.development.yml`, which includes:
+- PostgreSQL database
+- Database migrations
+- GraphQL API server
+- Streaming indexer service
+
+To start all services:
+```bash
+yarn dev
+```
+
+**NOTE:** When using the image with Docker Compose, the database `DB_USERNAME` must be set to the default value `postgres`.
+
+### 4.5. Running Postgres Container
+This will start the PostgreSQL database in a temporary container. Remove the `--rm` flag to keep the container running after the command is finished.
+
+```bash
+# First, load the environment variables from .env
+source .env
+
+# Then run the container using the environment variables
+docker run --rm --name postgres-indexer \
+  -e POSTGRES_USER=$DB_USERNAME \
+  -e POSTGRES_PASSWORD=$DB_PASSWORD \
+  -e POSTGRES_DB=$DB_NAME \
+  -p 5432:5432 \
+  postgres
+```
+
+## 5. Indexer
+
+### 5.1. Running the Indexer
+Assuming you've already started the database container, you can run the following commands to start the indexer:
+
+**Note**: Run each command in a separate terminal window -- with the exception of `yarn create:database` -- as they are long-running processes.
+
+```bash
+# Run the database migrations
+yarn create:database
+
+# Start the streaming service
+yarn dev:streaming
+
+# Start the GraphQL server with hot reload
+yarn dev:hot:graphql
+```
+
+### 5.2. Additional Commands
+
+The following commands will aid in the maintenance of the indexer.
+
+```bash
+# Identifying Missing Blocks - Scan for and store any blocks that were missed.
+yarn dev:missing
+
+# Processing Headers - Start the header processing from S3 to the database.
+yarn dev:headers
+
+# Processing Payloads - Start the payload processing from S3 to the database.
+yarn dev:payloads
+
+# Update GraphQL - Performs a hot reload (without building)
+yarn dev:hot:graphql
+
+# Generate GraphQL types - Generate the GraphQL types from the schema.
+yarn graphql:generate-types
+
+# Run the pagination tests offline
+yarn test
+```
+
+### 5.3. Local Workflow Testing
+
+**NOTE:** This is not being actively maintained at the moment.
+ +Install act for local testing: +```bash +# For MacOS +brew install act + +# For Linux +sudo apt-get update +sudo apt-get install act +``` + +Then run the indexer workflow by using the following command: +```bash +yarn run-indexer-workflow +``` diff --git a/indexer/package.json b/indexer/package.json index b9a4e449..1cd9b93f 100644 --- a/indexer/package.json +++ b/indexer/package.json @@ -63,20 +63,15 @@ "typescript": "^5.3.3" }, "scripts": { - "build": "tsc", - "graphql:generate-types": "npx graphql-codegen", + "create:database": "ts-node src/index.ts --database && yarn migrate:up", + "dev": "docker-compose -f docker-compose.development.yml up && docker-compose logs -f indexer", "dev:database": "ts-node src/index.ts --database", "dev:streaming": "ts-node src/index.ts --streaming", "dev:graphql": "ts-node src/index.ts --graphql", - "dev:old-graphql": "ts-node src/index.ts --oldGraphql", "dev:hot:graphql": "nodemon src/index.ts --graphql", - "prod:start": "docker-compose up --build indexer && docker-compose logs -f indexer", - "prod:streaming": "node dist/index.js --streaming", - "prod:backfill": "node dist/index.js --backfill", - "test": "NODE_ENV=test mocha -r ts-node/register 'tests/**/*.test.ts'", - "test:unit": "jest tests/unit/*.test.ts", - "create:database": "ts-node src/index.ts --database && yarn migrate:up", + "graphql:generate-types": "npx graphql-codegen", "migrate:up": "dotenv -e .env npx sequelize-cli db:migrate", - "migrate:down": "dotenv -e .env npx sequelize-cli db:migrate:undo" + "migrate:down": "dotenv -e .env npx sequelize-cli db:migrate:undo", + "test": "jest tests/unit/*.test.ts" } } diff --git a/package.json b/package.json index 4890469f..67ee1c2a 100644 --- a/package.json +++ b/package.json @@ -6,10 +6,8 @@ "repository": "https://github.com/hack-a-chain-software/indexer-kadena.git", "license": "MIT", "scripts": { - "web": "yarn workspace @kadena-indexer/web", "indexer": "yarn workspace @kadena-indexer/indexer", "terraform": "yarn workspace @kadena-indexer/terraform", - "serverless": "yarn workspace @kadena-indexer/serverless", "run-terraform-workflow": "act -W .github/workflows/terraform.yml -P ubuntu-latest=-self-hosted --artifact-server-path ./.github/workflows/.artifacts/ --secret-file ./terraform/.env", "run-indexer-workflow": "act -W .github/workflows/indexer.yml --secret-file ./indexer/.env" }, diff --git a/terraform/README.md b/terraform/README.md new file mode 100644 index 00000000..806feea9 --- /dev/null +++ b/terraform/README.md @@ -0,0 +1,82 @@ +# Kadena Indexer - Terraform Configuration + +### 🚀 Getting Started +- [Introduction](#1-introduction) +- [Prerequisites](#2-prerequisites) + +### ⚙️ Configuration +- [Environment Setup](#3-environment-setup) + - [Configure AWS Credentials](#31-configure-aws-credentials) + - [Environment Variables](#32-environment-variables) + +### 🛠️ Infrastructure Management +- [Terraform Operations](#4-terraform-operations) + - [Initialize](#41-initialize-terraform) + - [Deploy](#42-deploy-infrastructure) + - [Destroy](#43-destroy-infrastructure) + - [Local Testing](#44-local-workflow-testing) + +## 1. Introduction +This directory contains the infrastructure configuration for running the Kadena indexer assuming that you have already set up your Kadena node. + +## 2. 
Prerequisites +- [Terraform](https://www.terraform.io/downloads.html) +- [AWS CLI](https://aws.amazon.com/cli/) +- [AWS Account](https://aws.amazon.com/) +- [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) + +## 3. Environment Setup + +### 3.1. Configure AWS Credentials +Create an `.env` file using the `.env.template` as a reference: +```bash +cp .env.template .env +``` + +### 3.2. Environment Variables +Required variables: +- `TF_VAR_AWS_ACCESS_KEY_ID`: Your AWS access key ID +- `TF_VAR_AWS_SECRET_ACCESS_KEY`: Your AWS secret access key +- `TF_VAR_AWS_ACCOUNT_ID`: Your AWS account ID +- `TF_VAR_AWS_USER_NAME`: The name of the user created in AWS +- `TF_VAR_AWS_DB_USERNAME`: Username for the PostgreSQL database +- `TF_VAR_AWS_DB_PASSWORD`: Password for the PostgreSQL database + +Don't forget to define the remaining variables. Their values are described in [Environment Variables Reference](../indexer/README.md#32-environment-variables-reference). + +## 4. Terraform Operations + +### 4.1. Initialize Terraform +```bash +terraform init +``` + +### 4.2. Deploy Infrastructure +```bash +yarn terraform plan +yarn terraform apply +``` + +### 4.3. Destroy Infrastructure +```bash +yarn terraform destroy +``` + +### 4.4. Local Workflow Testing + +**NOTE:** This is not being actively maintained at the moment. + +Install act for local testing: +```bash +# For MacOS +brew install act + +# For Linux +sudo apt-get update +sudo apt-get install act +``` + +Run the terraform workflow: +```bash +yarn run-terraform-workflow +```
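+
+Whether you apply the configuration directly or through the workflow, the indexer needs the host of the provisioned PostgreSQL instance for its `DB_HOST` variable. A minimal sketch for retrieving it, assuming the configuration defines a `postgres_db_host` output (the value is also visible in the AWS console):
+
+```bash
+# Print the provisioned database host after `terraform apply` completes,
+# then copy the value into indexer/.env as DB_HOST
+terraform output postgres_db_host
+```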