This example builds an embedding index from files stored in an Amazon S3 bucket. It continuously updates the index as files are added, updated, or deleted in the source bucket, so the index stays in sync with the bucket without manual re-indexing.
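
To make "keeping the index in sync" concrete, here is a minimal conceptual sketch of how additions, updates, and deletions can be derived by comparing a bucket listing with the current index state. It is not CocoIndex's implementation; `SyncPlan`, `plan_sync`, and the sample keys and version tags are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Keys grouped by the action needed to bring the index up to date."""
    to_add: list[str]
    to_update: list[str]
    to_delete: list[str]


def plan_sync(source: dict[str, str], indexed: dict[str, str]) -> SyncPlan:
    """Compare source objects (key -> version tag) with what is already indexed."""
    # New in the source: must be embedded and inserted.
    to_add = sorted(k for k in source if k not in indexed)
    # Present in both but with a different version tag: must be re-embedded.
    to_update = sorted(k for k in source if k in indexed and source[k] != indexed[k])
    # Gone from the source: must be removed from the index.
    to_delete = sorted(k for k in indexed if k not in source)
    return SyncPlan(to_add, to_update, to_delete)


# Hypothetical bucket listing and index state (key -> ETag-like version tag).
bucket = {"docs/a.md": "v2", "docs/b.md": "v1", "docs/d.md": "v1"}
index = {"docs/a.md": "v1", "docs/b.md": "v1", "docs/c.md": "v1"}

plan = plan_sync(bucket, index)
print(plan)
```

Only the keys in the plan need to be processed; unchanged files such as `docs/b.md` are left alone.
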
Before running the example, you need to:
- Install Postgres if you don't have one.
- Prepare for Amazon S3. See Setup for AWS S3 for more details.
- Create a .env file with your Amazon S3 bucket name and (optionally) prefix. Start by copying .env.example, then edit the copy to fill in your bucket name and prefix:

      cp .env.example .env
      $EDITOR .env

  Example .env file:

      # Database Configuration
      DATABASE_URL=postgresql://localhost:5432/cocoindex

      # Amazon S3 Configuration
      AMAZON_S3_BUCKET_NAME=your-bucket-name
      AMAZON_S3_SQS_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/123456789/S3ChangeNotifications
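
As a quick sanity check before running, the sketch below parses `.env`-style text and verifies that the expected settings are present. It uses only the standard library and is not how the example itself loads its configuration. `parse_env` and `check_config` are hypothetical helpers, treating `DATABASE_URL` and `AMAZON_S3_BUCKET_NAME` as required is an assumption, and the queue variable is assumed to be spelled `AMAZON_S3_SQS_QUEUE_URL` with underscores throughout (a hyphen is not valid in an environment variable name).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        values[key.strip()] = value.strip()
    return values


def check_config(
    values: dict[str, str],
    required: tuple[str, ...] = ("DATABASE_URL", "AMAZON_S3_BUCKET_NAME"),
) -> dict[str, str]:
    """Raise if any required setting (an assumed list) is missing or empty."""
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return values


# Sample content mirroring the example .env file.
SAMPLE = """
# Database Configuration
DATABASE_URL=postgresql://localhost:5432/cocoindex

# Amazon S3 Configuration
AMAZON_S3_BUCKET_NAME=your-bucket-name
AMAZON_S3_SQS_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/123456789/S3ChangeNotifications
"""

config = check_config(parse_env(SAMPLE))
print(config["AMAZON_S3_BUCKET_NAME"])
```

To check a real file, pass its contents instead of `SAMPLE`, for example `parse_env(open(".env").read())`.
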
Install dependencies:

    pip install -e .

Run:

    python main.py

While running, it keeps observing changes in the Amazon S3 bucket and updates the index automatically. At the same time, it accepts queries from the terminal and searches the up-to-date index.
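
When an SQS queue is configured for S3 change notifications, each message body carries a JSON document with a `Records` list describing what changed. The sketch below shows one way such a body could be turned into index actions. It illustrates the notification format only and is not CocoIndex's internal code; `changes_from_message` and the sample bucket and keys are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def changes_from_message(body: str) -> list[tuple[str, str]]:
    """Return (action, key) pairs from one S3 event notification body."""
    changes: list[tuple[str, str]] = []
    # Messages without "Records" (such as S3 test events) yield no changes.
    for record in json.loads(body).get("Records", []):
        event = record.get("eventName", "")
        # Object keys in S3 notifications are URL-encoded ("+" for spaces).
        key = unquote_plus(record["s3"]["object"]["key"])
        if event.startswith("ObjectCreated:"):
            changes.append(("upsert", key))  # new or overwritten file
        elif event.startswith("ObjectRemoved:"):
            changes.append(("delete", key))  # file removed from the bucket
    return changes


# Hypothetical notification covering one upload and one deletion.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "your-bucket-name"},
                   "object": {"key": "docs/my+notes.md"}},
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "your-bucket-name"},
                   "object": {"key": "docs/old.md"}},
        },
    ]
})

print(changes_from_message(sample_body))
```

Mapping both creations and overwrites to a single "upsert" action keeps the handler idempotent if the same notification is delivered more than once.
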
CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: Watch on YouTube.
Run CocoInsight to understand your RAG data pipeline:

    cocoindex server -ci main

You can also add a -L flag to make the server keep updating the index to reflect source changes at the same time:

    cocoindex server -ci -L main

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.