Vector Embeddings using Hugging Face and MongoDB

A mini vector embedding project using Hugging Face's sentence-transformers/all-MiniLM-L6-v2 transformer model and MongoDB Atlas.

The project runs on the MongoDB Atlas M0 Sandbox free tier plan (shared RAM, 512 MB storage).

MongoDB Atlas's search index functionality is used to run semantic search over the movie plot embeddings.

Personal MongoDB Atlas details:

Org: Rishi's Org - 2024-03-15
Project: Project 0
Database: VectorEmbeddingCluster (sample_mflix dataset)
Collection: movies

Environment Variables

The following environment variables must be saved in a .env file:

1. MongoDB Connection String
2. Hugging Face Inference Token
3. Hugging Face Inference API
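
A minimal sketch of how a script could read these three values once the .env file has been loaded into the process environment (for example with python-dotenv, or by exporting the values in the shell). The variable names MONGODB_URI, HF_TOKEN, and HF_API_URL are assumptions made for illustration; use whatever keys your .env file actually contains.

```python
import os

# Hypothetical variable names; match them to the keys in your .env file.
REQUIRED_VARS = ("MONGODB_URI", "HF_TOKEN", "HF_API_URL")


def load_settings(environ=os.environ):
    """Return the required settings as a dict, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing at startup with the list of missing names is usually easier to debug than a connection error later in the run.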

MongoDB Atlas Setup

1. Create a MongoDB Atlas account
2. Create a project, select the M0 free tier plan, and host it in the AWS / Frankfurt (eu-central-1) region
3. Create a database for vector embeddings and populate it with the sample_mflix dataset
4. Security > Database Access > copy the username and password
5. Go to the database: Connect database > Drivers > copy the connection string
6. Paste the username and password into the connection string and store it in the .env file
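
Step 6 can go wrong when the password contains characters such as @ or /, because they must be percent-encoded inside a MongoDB connection string. The sketch below shows one way to do that with the standard library. The <username> and <password> placeholders and the host name are assumptions for illustration; the string copied from Atlas may look slightly different.

```python
from urllib.parse import quote_plus


def build_connection_string(template, username, password):
    """Fill <username> and <password> placeholders with percent-encoded values."""
    return (
        template
        .replace("<username>", quote_plus(username))
        .replace("<password>", quote_plus(password))
    )


# Hypothetical host name and credentials, for illustration only.
template = "mongodb+srv://<username>:<password>@cluster0.example.mongodb.net/"
print(build_connection_string(template, "app_user", "p@ss/word"))
# prints mongodb+srv://app_user:p%40ss%2Fword@cluster0.example.mongodb.net/
```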

Hugging Face Setup

1. Log in to your Hugging Face account
2. Go to Settings > Access Tokens > create a new token
3. Copy the token and paste it into the .env file
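
With the token in place, an embedding request could look roughly like the sketch below. It uses only the standard library and assumes that the Inference API URL stored in .env points at a feature-extraction endpoint for sentence-transformers/all-MiniLM-L6-v2, and that the endpoint returns a flat list of floats for a single input string. The function names are hypothetical and are not taken from movie_recs.py.

```python
import json
import urllib.request


def build_embedding_request(api_url, token, text):
    """Build a POST request asking the Inference API to embed `text`."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(api_url, data=body, headers=headers, method="POST")


def generate_embedding(api_url, token, text, expected_dims=384):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(api_url, token, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        embedding = json.loads(response.read().decode("utf-8"))
    # all-MiniLM-L6-v2 produces 384-dimensional vectors; anything else
    # suggests an error payload or a different response shape (assumption).
    if len(embedding) != expected_dims:
        raise ValueError(f"Expected {expected_dims} dimensions, got {len(embedding)}")
    return embedding
```

Keeping request construction separate from the network call makes it possible to check the headers and payload without spending API quota.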

MongoDB Atlas Search Index Setup

1. Go to Database > VectorEmbeddingCluster > Atlas Search > Create Search Index
2. Select JSON Editor > Next > select the movies collection inside the sample_mflix dataset
3. Name the index SemanticSearchMoviePlot
4. Enter the JSON below > Create Search Index
5. Once the search index is active, run the deploy command in a terminal

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "plot_embedding_hf": {
        "dimensions": 384,
        "similarity": "dotProduct",
        "type": "knnVector"
      }
    }
  }
}
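
Once the index above is active, a query against it could be expressed as an aggregation pipeline. The sketch below assumes the script queries with the $vectorSearch stage and returns the title and plot fields of the sample_mflix movies; the limit and numCandidates values are illustrative, and the function name is hypothetical.

```python
def build_vector_search_pipeline(query_vector, limit=4, num_candidates=100):
    """Return an aggregation pipeline for a semantic search over movie plots."""
    if len(query_vector) != 384:
        raise ValueError("query vector must have 384 dimensions, as in the index")
    return [
        {
            "$vectorSearch": {
                "index": "SemanticSearchMoviePlot",  # index name from step 3
                "path": "plot_embedding_hf",         # field mapped in the index
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return only the human-readable fields of each match.
        {"$project": {"_id": 0, "title": 1, "plot": 1}},
    ]
```

With PyMongo, the result would be passed to the aggregate method of the movies collection. Note that the dotProduct similarity in the index expects unit-length vectors, so the query vector should be normalized the same way as the stored embeddings.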

Installation

Run the setup script setup.sh to create the .env file.

The script also uses the package manager pip to install the packages listed in requirements.txt.

sh setup.sh

Deploy

python3 movie_recs.py
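
The contents of movie_recs.py are not reproduced in this README. As a rough, hypothetical sketch of the kind of step such a script needs before searching, the function below stores an embedding in the plot_embedding_hf field for movies that have a plot but no embedding yet. It assumes a PyMongo-style collection object and an embed callable that maps text to a list of floats; both are passed in so the logic can be tried without a live cluster.

```python
def add_plot_embeddings(collection, embed, batch_size=50):
    """Embed the plot of up to `batch_size` movies that lack an embedding."""
    query = {"plot": {"$exists": True}, "plot_embedding_hf": {"$exists": False}}
    updated = 0
    for doc in collection.find(query).limit(batch_size):
        vector = embed(doc["plot"])
        # Write only the new field instead of replacing the whole document.
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"plot_embedding_hf": vector}},
        )
        updated += 1
    return updated
```

Processing a small batch per run keeps the number of Inference API calls low and stays well within the 512 MB storage limit of the M0 tier.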

Thanks

Free Code Camp

Hugging Face
