hack-a-chain-software · 0xneves · Feb 14, 2025
diff --git a/.gitignore b/.gitignore
@@ -9,3 +9,4 @@ draft/
 .terraform.plan
 .github/workflows/.artifacts/
 .vercel
+indexer/indexer
diff --git a/README.md b/README.md
@@ -1,176 +1,40 @@
-# Kadena Indexer
+# Kadindexer - Kadena Indexer
 
 This project is a monorepo that contains the following packages:
 
-- `@kadena-indexer/indexer`: The indexer package, which is responsible for scanning and storing blocks for Kadena blockchain.
-- `@kadena-indexer/terraform`: The Terraform configuration for provisioning the infrastructure required to run the indexer and the node.
+- [`@kadena-indexer/indexer`](indexer/README.md): The indexer package, which is responsible for scanning and storing blocks for the Kadena blockchain.
+- [`@kadena-indexer/terraform`](terraform/README.md): The Terraform configuration for provisioning the infrastructure required to run the indexer and the node.
+- [`@kadena-indexer/backfill`](backfill/README.md): The backfill package, which is responsible for backfilling the indexer data.
 
-## Prerequisites
+## Requirements
 
-- [Terraform](https://www.terraform.io/downloads.html)
-- [AWS CLI](https://aws.amazon.com/cli/)
-- [AWS Account](https://aws.amazon.com/)
-- [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys)
-
-### Dev Container
-
-This project is configured to run in a dev container. You can use the `Dev Containers: Open Folder in Container` command in VSCode to open the project in a dev container. This will automatically install the required dependencies and set up the environment. To use the dev container, you need to have Docker installed on your machine.
-
-If you don't have Dev Containers installed, you can install it from the [VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
-
-### Configure Environment Variables
-
-Under the `/terraform` directory, create an `.env` file using the `.env.template` as a reference and set the environment variables accordingly.
-
-```bash
-cp terraform/.env.template terraform/.env
-```
-
-`TF_VAR_AWS_ACCESS_KEY_ID` is your AWS access key ID.
-`TF_VAR_AWS_SECRET_ACCESS_KEY` is your AWS secret access key.
-`TF_VAR_AWS_ACCOUNT_ID` is your AWS account ID.
-`TF_VAR_AWS_USER_NAME` is the name of the user you created in AWS.
-`TF_VAR_AWS_DB_USERNAME` is the username for the postgress database.
-`TF_VAR_AWS_DB_PASSWORD` is the password for the postgress database.
-
-Under the `/indexer` directory, create an `.env` file using the `.env.template` as a reference and set the environment variables accordingly.
-
-```bash
-cp indexer/.env.template indexer/.env
-```
-
-`AWS_S3_REGION` is the region where the S3 bucket is located.
-`AWS_S3_BUCKET_NAME` is the name of the S3 bucket where the data will be stored.
-`AWS_ACCESS_KEY_ID` is the access key ID for the S3 bucket.
-`AWS_SECRET_ACCESS_KEY` is the secret access key for the S3 bucket.
-
-`SYNC_BASE_URL` is the base URL for the Kadena node.
-`SYNC_MIN_HEIGHT` is the minimum height to start syncing from.
-`SYNC_FETCH_INTERVAL_IN_BLOCKS` is the interval in blocks to fetch.
-`SYNC_TIME_BETWEEN_REQUESTS_IN_MS` is the time between requests in milliseconds.
-`SYNC_ATTEMPTS_MAX_RETRY` is the maximum number of attempts to retry.
-`SYNC_ATTEMPTS_INTERVAL_IN_MS` is the interval in milliseconds between attempts.
-`SYNC_NETWORK` is the network to sync.
-
-`DB_USERNAME` is the username for the postgress database.
-`DB_PASSWORD` is the password for the postgress database.
-`DB_NAME` is the name of the postgress database.
-`DB_HOST` is the host for the postgress database. You have the host after the resource creation, so you can check for this information in the AWS console or in terraform output (postgres_db_host).
-
-### Initialize Terraform
-
-Initialize your Terraform workspace, which will download the provider and initialize it with the values provided in the terraform.`tfvars`` file.
-
-```bash
-terraform init
-```
-
-### Deploy Infrastructure
-
-Plan and apply the Terraform configuration to provision your AWS resources:
-
-```bash
-yarn terraform plan
-yarn terraform apply
-```
-
-### Destroy Infrastructure
-
-If you want to destroy the infrastructure created, you can use the following command:
-
-```bash
-yarn terraform destroy
-```
+- Install dependencies
+- See individual package READMEs for specific prerequisites
 
 ## Installation
 
-Set up the indexer with the following commands:
+Install dependencies with the following command:
 
 ```bash
-yarn && yarn indexer build
+yarn install
 ```
 
-## Features
+## Quick Start
 
-### Run processing
+This is the quickest way to get the indexer running.
 
-Continuous process of streaming, headers, payloads and missing blocks from node to s3 bucket and from s3 bucket to database
+Install [Docker](https://www.docker.com/).
 
-```bash
-yarn indexer dev:run
-```
-
-## Additional Commands
-
-### Running with Docker
+Create the `.env` file in the `indexer` folder from the template and fill in its values. See [Environment Variables Reference](indexer/README.md#32-environment-variables-reference).
 
 ```bash
-sudo docker build -t kadena-indexer:latest .
-sudo docker run --env-file ./indexer/.env -p 3000:3000 kadena-indexer:latest
-```
-
-### Backfilling Blocks
-
-Scan for and store historical blocks.
-
-```bash
-yarn indexer dev:backfill
-```
-
-### Streaming Blocks
-
-Listen for new blocks and store them in real-time.
-
-```bash
-yarn indexer dev:streaming
-```
-
-### Identifying Missing Blocks
-
-Scan for and store any blocks that were missed.
-
-```bash
-yarn indexer dev:missing
-```
-
-### Processing Headers
-
-Start the header processing from S3 to the database.
-
-```bash
-yarn indexer dev:headers
-```
-
-### Processing Payloads
-
-Start the payload processing from S3 to the database.
-
-```bash
-yarn indexer dev:payloads
-```
-
-## Advanced Usage
-
-### Local Workflow Testing
-
-For testing workflows locally, act is required. Install it using Homebrew:
-
-```bash
-brew install act
+cp indexer/.env.template indexer/.env
 ```
 
-### Run Terraform Workflow Manually
-
-If you want to run the terraform workflow manually, you can use the following command:
-
+To start all services:
 ```bash
-yarn run-terraform-workflow
+yarn indexer dev
 ```
 
-### Run Indexer Workflow Manually
-
-If you want to run the indexer workflow manually, you can use the following command:
+**NOTE:** When running the image with Docker Compose, the database `DB_USERNAME` must keep its default value, `postgres`.
 
-```bash
-yarn run-indexer-workflow
-```
diff --git a/backfill/.env.template b/backfill/.env.template
@@ -0,0 +1,15 @@
+CERT_PATH=./global-bundle.pem
+SYNC_BASE_URL=https://api.chainweb.com/chainweb/0.0
+
+CHAIN_ID=0
+NETWORK=mainnet01
+SYNC_MIN_HEIGHT=5370495
+SYNC_FETCH_INTERVAL_IN_BLOCKS=100
+SYNC_ATTEMPTS_MAX_RETRY=5
+SYNC_ATTEMPTS_INTERVAL_IN_MS=500
+
+DB_USERNAME=postgres
+DB_PASSWORD=password
+DB_NAME=indexer
+DB_HOST=localhost
+DB_PORT=5432
diff --git a/backfill/README.md b/backfill/README.md
@@ -0,0 +1,99 @@
+# Kadena Indexer Backfill
+
+## 1. Introduction
+
+The Kadindexer Backfill is a utility tool designed to synchronize historical blockchain data from the Kadena network into your local database. It allows you to fetch and index past blocks and transactions, ensuring your database has a complete history of the chain. The backfill process can be configured to sync data from any specified block height, making it useful for both initial data population and recovery scenarios where data needs to be resynced from a particular point.
+
+## 2. Prerequisites
+
+- [Docker](https://www.docker.com/)
+- Kadena Indexer PostgreSQL database running
+- Network access to the Kadena network
+- Your own running Kadena node
+
+## 3. Setup
+
+### 3.1. Starting Docker
+Start Docker Desktop from the command line or from the macOS application.
+
+```bash
+# macOS - Start Docker Desktop from the command line
+open -a Docker
+
+# Linux - Start Docker daemon
+sudo systemctl start docker
+```
+
+### 3.2. Environment Variables
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `CERT_PATH` | Path to SSL certificate bundle | `./global-bundle.pem` |
+| `SYNC_BASE_URL` | Base URL for the Chainweb API | `https://api.chainweb.com/chainweb/0.0` |
+| `CHAIN_ID` | ID of the chain to backfill | `0` |
+| `NETWORK` | Kadena network to sync from | `mainnet01` |
+| `SYNC_MIN_HEIGHT` | Starting block height for backfill | `5370495` |
+| `SYNC_FETCH_INTERVAL_IN_BLOCKS` | Number of blocks to fetch in each interval | `100` |
+| `SYNC_ATTEMPTS_MAX_RETRY` | Maximum number of retry attempts | `5` |
+| `SYNC_ATTEMPTS_INTERVAL_IN_MS` | Interval between retry attempts in milliseconds | `500` |
+| `DB_USERNAME` | PostgreSQL database username | `postgres` |
+| `DB_PASSWORD` | PostgreSQL database password | `password` |
+| `DB_NAME` | Name of the database | `indexer` |
+| `DB_HOST` | Database host address | `localhost` |
+| `DB_PORT` | Database port number | `5432` |
+
+**NOTE:** The example Chainweb API URL shown above will not work for indexing. You will need to run your own Kadena node and set `NODE_API_URL` to your node's API URL.
+
+## 4. Usage
+
+### 4.1. Start the Kadindexer services
+
+Please refer to the [Kadena Indexer README](../indexer/README.md) for instructions on how to start the Kadindexer services.
+
+### 4.2. Build the backfill image
+
+Build the image:
+```bash
+docker build -t chainbychain -f Dockerfile .
+```
+
+### 4.3. Run the container
+
+#### Dockerfile (Chain by Chain)
+This Dockerfile is designed to run the backfill process for a single chain at a time. It's useful when you need to:
+- Sync data for a specific chain ID
+- Have more granular control over the backfill process
+- Debug issues with a particular chain
+- Manage resources more efficiently
+
+#### Dockerfile.indexes
+This Dockerfile is specifically for recreating database indexes. Use this when you need to:
+- Rebuild corrupted indexes
+- Optimize existing indexes
+- Add new indexes to improve query performance
+- Perform database maintenance
+
+#### Dockerfile.middle-backfill
+This Dockerfile orchestrates the backfill process across all chains simultaneously. It's beneficial when you want to:
+- Perform a complete system backfill
+- Sync data for all chains in parallel
+- Save time by running multiple chain syncs concurrently
+- Ensure consistency across all chains
+
+For single chain backfill:
+```bash
+docker build -t chainbychain -f Dockerfile .
+docker run --rm --name chainbychain --env-file .env chainbychain
+```
+
+For rebuilding indexes:
+```bash
+docker build -t rebuild-indexes -f Dockerfile.indexes .
+docker run --rm --name rebuild-indexes --env-file .env rebuild-indexes
+```
+
+For all chains backfill:
+```bash
+docker build -t all-chains -f Dockerfile.middle-backfill .
+docker run --rm --name all-chains --env-file .env all-chains
+```
diff --git a/backfill/config/env.go b/backfill/config/env.go
@@ -35,7 +35,7 @@ func InitEnv(envFilePath string) {
 	}
 
 	config = &Config{
-		DbUser:                    getEnv("DB_USER"),
+		DbUser:                    getEnv("DB_USERNAME"),
 		DbPassword:                getEnv("DB_PASSWORD"),
 		DbName:                    getEnv("DB_NAME"),
 		DbHost:                    getEnv("DB_HOST"),

diff --git a/indexer/.env.template b/indexer/.env.template
@@ -7,8 +7,8 @@ SYNC_NETWORK="mainnet01"
 KADENA_GRAPHQL_API_URL=localhost
 KADENA_GRAPHQL_API_PORT=3001
 
-DB_USERNAME="postgres"
-DB_PASSWORD="YOUR_DB_PASSWORD"
-DB_NAME="indexer"
-DB_SSL_ENABLED=false
-DB_HOST="YOUR_DB_HOST"
+DB_USERNAME=postgres
+DB_PASSWORD=password
+DB_NAME=indexer
+DB_HOST="YOUR_DB_HOST"
+DB_SSL_ENABLED=false