Migrates an S3 symlink-based agent store to DynamoDB.
Install dependencies:

```sh
pnpm install
```

Create a `.env` file based on `.env.example`. The easiest way to switch between environments is using `STORACHA_ENV`:

```sh
# .env file

# For staging (us-east-2, staging-w3infra-* tables)
STORACHA_ENV=staging

# For production (us-west-2, prod-w3infra-* tables) - default
# STORACHA_ENV=production

# AWS credentials (same for both environments)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
```

This automatically configures:

- AWS region (staging: `us-east-2`, production: `us-west-2`)
- Table/bucket names (`staging-w3infra-*` vs `prod-w3infra-*`)
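The environment selection above can be sketched as a small helper. This is illustrative only: the actual script keeps its own config, and the function and field names below are assumptions, not its real API.

```javascript
// Sketch of how STORACHA_ENV could map to region and resource-name
// prefix (the real script's internal config may differ).
function configFor(env = process.env.STORACHA_ENV || 'production') {
  const envs = {
    staging: { region: 'us-east-2', prefix: 'staging-w3infra-' },
    production: { region: 'us-west-2', prefix: 'prod-w3infra-' },
  }
  const config = envs[env]
  if (!config) throw new Error(`unknown STORACHA_ENV: ${env}`)
  return config
}
```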
AWS IAM permissions required:

- See policy: `arn:aws:iam::505595374361:policy/agent-store-migration-access`
```sh
node ./src/migrate.js
```
- Ensure the W3Infra version deployed in the environment you are migrating is at commit 807a03b or later (at the time of writing, staging and forge-prod have this commit, but prod does not). This is the code that creates the relevant DynamoDB tables and begins writing to them, so that the S3 index no longer receives data.
- Create an S3 inventory job for the S3 bucket with the agent index, if one does not exist (at the time of this writing, staging and forge-prod have this, but prod does not). The source -> destination mappings are as follows:
  - prod: `invocation-store-prod-0` -> `invocation-store-prod-inventory`
  - staging: `invocation-store-staging-0` -> `invocation-store-staging-inventory`
  - forge-prod: `forge-prod-upload-api-invocation-store-0` -> `forge-prod-invocation-store-inventory`
  Name the inventory configuration "ListEverything".
  If you have to create the inventory configuration for your environment, you'll have to wait for the first inventory job to finish, which may take up to 48 hours. This creates a full index of the source bucket's contents. However, the job keeps running daily, which isn't needed, so disable it after it completes one run. You'll know it has run when data appears in the destination bucket.
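If you prefer the CLI to the console, an inventory configuration can be created with `aws s3api put-bucket-inventory-configuration`. The bucket names below are the staging mapping from above, and the JSON is a minimal sketch (CSV output, current versions only, daily schedule — remember to disable it after the first run completes):

```sh
aws s3api put-bucket-inventory-configuration \
  --bucket invocation-store-staging-0 \
  --id ListEverything \
  --inventory-configuration '{
    "Id": "ListEverything",
    "IsEnabled": true,
    "IncludedObjectVersions": "Current",
    "Schedule": { "Frequency": "Daily" },
    "Destination": {
      "S3BucketDestination": {
        "Bucket": "arn:aws:s3:::invocation-store-staging-inventory",
        "Format": "CSV"
      }
    }
  }'
```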
- Warm the destination DynamoDB table to handle high throughput

  The script will attempt to write items to the destination table as quickly as possible. However, by default DynamoDB tables initially accept only 4,000 writes/sec and scale up very slowly, which causes a number of write failures along the way. Fortunately, you can pre-warm a table in the AWS Management Console, ideally to the 40,000 writes/sec maximum: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/update-warm-throughput.html
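Warm throughput can also be set from the CLI via `aws dynamodb update-table --warm-throughput` (the table name below is a placeholder, not the real table's name):

```sh
# Pre-warm writes on the destination table before starting the migration.
# Reads stay at the 12,000/sec default; only write throughput matters here.
aws dynamodb update-table \
  --table-name prod-w3infra-agent-store \
  --warm-throughput ReadUnitsPerSecond=12000,WriteUnitsPerSecond=40000
```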
- Run the script
  The script will download the inventory to get a list of files in the original S3 bucket. These files are effectively just symlinks: the key holds all the relevant information and the file itself is empty. The script iterates through each key and inserts it as an item in the DynamoDB table. To maximize throughput, the script calls BatchWriteItem with up to 25 items, and can make up to 100 BatchWriteItem calls at once. For the prod table of 3.8 billion records, it will still take over a day to run (3.8B / 40,000 records/sec ≈ 26 hours) even at the maximum throughput of 40,000 records/sec, which we probably won't sustain.
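The batching strategy above can be sketched without the AWS SDK: chunk the keys into groups of up to 25 (the BatchWriteItem limit) and keep up to 100 batches in flight. The `writeBatch` parameter stands in for the real BatchWriteItem call; this is a simplified model of the approach, not the script's actual code:

```javascript
// Split keys into BatchWriteItem-sized chunks (max 25 items per call).
function chunk(keys, size = 25) {
  const batches = []
  for (let i = 0; i < keys.length; i += size) {
    batches.push(keys.slice(i, i + size))
  }
  return batches
}

// Run writeBatch over all batches with at most `concurrency` in flight.
// Real code would issue a BatchWriteItemCommand here and retry any
// UnprocessedItems the response returns.
async function writeAll(keys, writeBatch, concurrency = 100) {
  const batches = chunk(keys)
  let next = 0
  let written = 0
  async function worker() {
    while (next < batches.length) {
      const batch = batches[next++]
      await writeBatch(batch)
      written += batch.length
    }
  }
  const workers = Math.min(concurrency, batches.length)
  await Promise.all(Array.from({ length: workers }, worker))
  return written
}
```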
- Deploy code to remove the reads from S3 and delete the underlying tables