Agent Store Migration

Migrates an S3 symlink-based agent store to DynamoDB.

Setup

1. Install Dependencies

pnpm install

2. Configure Environment Variables

Create a .env file based on .env.example:

The easiest way to switch between environments is using STORACHA_ENV:

# .env file

# For staging (us-east-2, staging-w3infra-* tables)
STORACHA_ENV=staging

# For production (us-west-2, prod-w3infra-* tables) - default
# STORACHA_ENV=production

# AWS credentials (same for both environments)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key

This automatically configures:

  • AWS Region (staging: us-east-2, production: us-west-2)
  • Table/Bucket names (staging-w3infra-* vs prod-w3infra-*)
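The environment switch described above can be sketched roughly as follows. This is an illustrative sketch only: the `ENVIRONMENTS` map and `resolveConfig` helper are assumptions, not the script's actual API.

```javascript
// Hypothetical sketch of how STORACHA_ENV could map to a region and
// resource-name prefix. Names here are assumptions for illustration.
const ENVIRONMENTS = {
  staging: { region: 'us-east-2', tablePrefix: 'staging-w3infra-' },
  production: { region: 'us-west-2', tablePrefix: 'prod-w3infra-' },
}

function resolveConfig(env = process.env.STORACHA_ENV || 'production') {
  const config = ENVIRONMENTS[env]
  if (!config) throw new Error(`unknown STORACHA_ENV: ${env}`)
  return config
}

console.log(resolveConfig('staging').region) // us-east-2
```

Note that production is the default when `STORACHA_ENV` is unset, matching the commented-out default in `.env.example`.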

AWS IAM Permissions Required:

  • See policy: arn:aws:iam::505595374361:policy/agent-store-migration-access

Usage

node ./src/migrate.js

Migration Process

  1. Ensure the W3Infra version deployed in the environment you are migrating is at commit 807a03b or later (at the time of writing, staging and forge-prod have this commit, but prod does not). This is the code that creates the relevant DynamoDB tables and begins writing to them, so that the S3 index no longer receives new data.

  2. Create an S3 inventory job for the S3 bucket containing the agent index, if one does not already exist (at the time of writing, staging and forge-prod have one, but prod does not):

https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-inventory.html#configure-inventory-destination-bucket-policy

The source -> destination mappings are as follows:

  • prod: invocation-store-prod-0 -> invocation-store-prod-inventory
  • staging: invocation-store-staging-0 -> invocation-store-staging-inventory
  • forge-prod: forge-prod-upload-api-invocation-store-0 -> forge-prod-invocation-store-inventory

Call the inventory policy "ListEverything"

If you have to create the policy for your environment, you'll have to wait for the first inventory job to finish, which may take up to 48 hours. This creates a full index of the source bucket's contents. However, the job keeps running daily, which isn't needed, so you should disable it after it completes one run. You'll know it has run when data appears in the destination bucket.
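The inventory output lists one object per row, with the object key URL-encoded. A minimal sketch of pulling keys out of one decompressed CSV chunk might look like the following; the column layout (bucket first, key second) and the helper name are assumptions, not the script's actual code.

```javascript
// Hypothetical parser for decompressed S3 inventory CSV rows.
// Assumes the default field order -- "bucket","key",... -- with keys
// URL-encoded, as S3 inventory produces by default.
function parseInventoryKeys(csv) {
  return csv
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => {
      // Fields are double-quoted; the key is the second column.
      const fields = line.split(',').map(f => f.replace(/^"|"$/g, ''))
      return decodeURIComponent(fields[1])
    })
}

console.log(parseInventoryKeys('"bucket","a%2Fb"\n"bucket","c"'))
// [ 'a/b', 'c' ]
```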

  3. Warm the destination DynamoDB table to handle high throughput

The script will attempt to write items to the destination DynamoDB table as quickly as possible. However, by default a DynamoDB table will initially accept only 4,000 writes/sec and then scale up very slowly, causing a number of write failures along the way. Fortunately, you can pre-warm a table in the AWS management console, ideally to 40,000 writes/sec, which is the maximum: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/update-warm-throughput.html

  4. Run the script

The script downloads the inventory to get a list of files in the original S3 bucket. These files are effectively just symlinks: the key contains all the relevant information and the file itself is empty. The script iterates through each key and inserts it as an item in the DynamoDB table. To maximize throughput, the script calls BatchWriteItem with up to 25 items, and can make up to 100 BatchWriteItem calls at once. For the prod table of 3.8 billion records, the migration will still take over a day even at the maximum throughput of 40,000 records/sec, which we probably won't sustain.
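The batching strategy described above can be sketched like this. The helper names are illustrative, not the script's actual API; a real run would also need to retry any `UnprocessedItems` returned by BatchWriteItem.

```javascript
// Sketch of the batching strategy: keys are grouped into
// BatchWriteItem-sized chunks of 25, written in waves of up to 100
// concurrent calls. `writeBatch` stands in for the actual DynamoDB call.
const BATCH_SIZE = 25   // BatchWriteItem accepts at most 25 items
const CONCURRENCY = 100 // concurrent BatchWriteItem calls

function chunk(items, size) {
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

async function migrateKeys(keys, writeBatch) {
  const batches = chunk(keys, BATCH_SIZE)
  // Process batches in waves of up to CONCURRENCY parallel writes.
  for (const wave of chunk(batches, CONCURRENCY)) {
    await Promise.all(wave.map(batch => writeBatch(batch)))
  }
}

// At the 40,000 writes/sec warm-throughput ceiling, 3.8 billion records
// take at least 3.8e9 / 40000 = 95,000 seconds, i.e. roughly 26.4 hours --
// hence "over a day" even in the best case.
```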

  5. Deploy code to remove the reads from S3 and delete the underlying tables
