Skip to content

sneha-afk/cse291a-proj

Repository files navigation

cse291a

Development

To install libraries using uv, there is a pyproject.toml at the root:

uv sync

To start working, always do:

uv sync
source ./.venv/bin/activate     # or ./.venv/Scripts/activate

Use qdrant_delete_by_filter.py when deleting points in a filtered way (i.e only CSV chunks).

Recommended to set encoding to UTF-8 if encountering errors related to Unicode that may not render correctly when dumping into UTF-8 text files:

export PYTHONIOENCODING="utf-8"
# or
$env:PYTHONIOENCODING="utf-8"

Quickstarts

Setup

  1. Start Qdrant: ensure Docker Desktop is running

To create the storage folder in the local filesystem (i.e, PWD):

docker run -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrant

To instead use a Docker volume to run from any directory or prevent potential file corruption:

docker volume create qdrant_storage
# Removing the $(pwd) from above to instead use the Docker volume
docker run -d -p 6333:6333 -p 6334:6334 -v "qdrant_storage:/qdrant/storage:z" qdrant/qdrant
  1. Generate embeddings for PDF documents by running embed.py from the root of this repo
  2. Run preprocess_csv.py before running embed_csv.py, both from the root of this repo
  3. Set up inference source: locally with Ollama or with AWS Bedrock with the instructions below
  4. Run rag_workflow_combine_aws.py from the root of this repo.

rag_local.py and rag_aws.py are legacy scripts without the most up-to-date chunking methods.

To test Qdrant retrievals (i.e which documents or latency), run retrieval.py.

Running AWS Bedrock

See boto3 documentation (Python client for Bedrock).

Ensure AWS CLI is installed: see aws/README.md for manual installation on Linux, else with package managers:

# Windows via winget
winget install Amazon.AWSCLI
scoop install aws             # or via Scoop
# macOS via brew
brew install awscli
# Linux, globally via pip if preferred
# Can add --user for just your user
sudo python -m pip install awscli

Generate API keys from the login page. You can set these as environment variables, or:

aws configure

Make sure you set your region to us-west-2 ONLY.

To check what models are available on this region: (The models that we can use are the ones' listed ON-DEMAND)

aws bedrock list-foundation-models \
  --region us-west-2 \
  --query "modelSummaries[].{id:modelId, name:modelName, provider:providerName, types:inferenceTypesSupported}"

Running the above command should also validate your credentials, i.e. if your credential or region setup is wrong, the above would fail as well.

Run scripts/bedrock_quickstart.py to run a prompt request to a model (obviously, don't spam this).

About

CSE 291A @ UCSD, Fall 2025

Resources

License

Stars

Watchers

Forks

Contributors