To install libraries using uv, there is a pyproject.toml at the root:
uv syncTo start working, always do:
uv sync
source ./.venv/bin/activate # or ./.venv/Scripts/activateUse qdrant_delete_by_filter.py when deleting points in a filtered way (i.e only CSV chunks).
Recommended to set encoding to UTF-8 if encountering errors related to Unicode that may not render correctly when dumping into UTF-8 text files:
export PYTHONIOENCODING="utf-8" # or $env:PYTHONIOENCODING="utf-8"
- Start Qdrant: ensure Docker Desktop is running
To create the storage folder in the local filesystem (i.e, PWD):
docker run -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrantTo instead use a Docker volume to run from any directory or prevent potential file corruption:
docker volume create qdrant_storage
# Removing the $(pwd) from above to instead use the Docker volume
docker run -d -p 6333:6333 -p 6334:6334 -v "qdrant_storage:/qdrant/storage:z" qdrant/qdrant- Generate embeddings for PDF documents by running
embed.pyfrom the root of this repo - Run
preprocess_csv.pybefore runningembed_csv.py, both from the root of this repo - Set up inference source: locally with Ollama or with AWS Bedrock with the instructions below
- Run
rag_workflow_combine_aws.pyfrom the root of this repo.
rag_local.pyandrag_aws.pyare legacy scripts without the most up-to-date chunking methods.
To test Qdrant retrievals (i.e which documents or latency), run retrieval.py.
See boto3 documentation (Python client for Bedrock).
Ensure AWS CLI is installed: see aws/README.md for manual installation on Linux, else with package managers:
# Windows via winget
winget install Amazon.AWSCLI
scoop install aws # or via Scoop# macOS via brew
brew install awscli# Linux, globally via pip if preferred
# Can add --user for just your user
sudo python -m pip install awscliGenerate API keys from the login page. You can set these as environment variables, or:
aws configureMake sure you set your region to us-west-2 ONLY.
To check what models are available on this region: (The models that we can use are the ones' listed ON-DEMAND)
aws bedrock list-foundation-models \
--region us-west-2 \
--query "modelSummaries[].{id:modelId, name:modelName, provider:providerName, types:inferenceTypesSupported}"Running the above command should also validate your credentials, i.e. if your credential or region setup is wrong, the above would fail as well.
Run scripts/bedrock_quickstart.py to run a prompt request to a model (obviously, don't spam this).