A mini vector embedding project using Hugging Face's sentence-transformers/all-MiniLM-L6-v2 transformer model and MongoDB Atlas.
MongoDB Atlas's M0 Sandbox (Shared RAM, 512 MB Storage) free tier plan is used for the project.
MongoDB Atlas's search index functionality is used to run semantic search over the movie plot embeddings.
Personal MongoDB Atlas details:
Org: Rishi's Org - 2024-03-15
Project: Project 0
Database: VectorEmbeddingCluster (sample_mflix dataset)
Collection: movies
The following environment variables need to be saved in a .env file:
1. MongoDB Connection String
2. Hugging Face Inference Token
3. Hugging Face Inference API
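As a rough illustration of how a script could pick up these three values, here is a minimal standard-library sketch. The key names MONGODB_URI, HF_TOKEN and HF_API_URL are assumptions made for this example, not names confirmed by the project; use whatever keys your .env file actually defines.

```python
import os

# Hypothetical key names; replace with the keys your .env file defines.
REQUIRED_VARS = ("MONGODB_URI", "HF_TOKEN", "HF_API_URL")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so URIs with query strings survive.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path=".env", environ=None):
    """Merge .env values with the process environment and check required keys."""
    environ = os.environ if environ is None else environ
    merged = read_env_file(path)
    # Values already set in the environment take precedence over the file.
    merged.update({k: environ[k] for k in REQUIRED_VARS if k in environ})
    missing = [name for name in REQUIRED_VARS if not merged.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {name: merged[name] for name in REQUIRED_VARS}
```

Calling load_settings() at startup fails early with the list of missing keys, which is easier to debug than a connection error later on.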
1. Create a MongoDB Atlas account
2. Create a project, select M0 free tier plan, and host it in AWS / Frankfurt (eu-central-1) region
3. Create a database for vector embeddings and populate it with sample_mflix dataset
4. Security > Database Access > copy username and password
5. Go to the database: Connect database > Drivers > copy connection string
6. Paste the username and password into the connection string and store it in the .env file
1. Log in to your Hugging Face account
2. Go to Settings > Access Tokens > Create new token
3. Copy the token and paste it into the .env file
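With the token in place, a plot string can be sent to the Inference API to get its embedding. The sketch below uses only the standard library and takes the API URL and token as parameters, since both come from the .env file. The function names are hypothetical, and the response shape (a flat list of 384 floats for a single input string) is an assumption to verify against your own endpoint.

```python
import json
import urllib.request

EMBEDDING_DIMENSIONS = 384  # output size of all-MiniLM-L6-v2


def build_embedding_request(api_url, token, text):
    """Build the POST request without sending it (hypothetical helper)."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_embedding(api_url, token, text, timeout=30):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(api_url, token, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        embedding = json.loads(response.read().decode("utf-8"))
    # Assumption: a single input string yields one flat 384-element vector.
    if not isinstance(embedding, list) or len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError("Expected a flat list of 384 floats from the API")
    return embedding
```

Keeping request construction separate from the network call makes the headers and payload easy to inspect without spending API quota.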
1. Go to Database > VectorEmbeddingCluster > Atlas Search > Create Search Index
2. Select JSON editor > next > select movies collection inside sample_mflix dataset
3. Name the index as SemanticSearchMoviePlot
4. Enter the below JSON code > create search index
5. Once the search index is active, run the deploy command in the terminal
{
"mappings": {
"dynamic": true,
"fields": {
"plot_embedding_hf": {
"dimensions": 384,
"similarity": "dotProduct",
"type": "knnVector"
}
}
}
}
Use the package manager pip to install the packages. Run the setup file setup.sh to create .env file.
The setup file also installs the requirements from requirements.txt file.
sh setup.sh
python3 movie_recs.py
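Once the SemanticSearchMoviePlot index is active, a semantic query over plot_embedding_hf could look roughly like the sketch below. It is not the contents of movie_recs.py. It assumes the $vectorSearch aggregation stage is available on your cluster and accepts this knnVector index, and the helper names, the limit of 4, and numCandidates of 100 are illustrative choices.

```python
INDEX_NAME = "SemanticSearchMoviePlot"
EMBEDDING_FIELD = "plot_embedding_hf"
EMBEDDING_DIMENSIONS = 384


def build_plot_search_pipeline(query_vector, limit=4, num_candidates=100):
    """Build an aggregation pipeline for a vector query (hypothetical helper)."""
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError("Query vector must have 384 dimensions")
    return [
        {
            # Assumption: $vectorSearch is supported for this index type.
            "$vectorSearch": {
                "index": INDEX_NAME,
                "path": EMBEDDING_FIELD,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return only the fields needed to display a recommendation.
        {"$project": {"_id": 0, "title": 1, "plot": 1}},
    ]


def search_movies(collection, query_vector, limit=4):
    """Run the pipeline on any object exposing aggregate(), e.g. a pymongo collection."""
    return list(collection.aggregate(build_plot_search_pipeline(query_vector, limit)))


# Usage with pymongo (assumed to be installed via requirements.txt):
#   from pymongo import MongoClient
#   movies = MongoClient(connection_string)["sample_mflix"]["movies"]
#   for doc in search_movies(movies, query_embedding):
#       print(doc["title"])
```

The query vector must come from the same model that produced the stored embeddings. Because the index uses dotProduct similarity, both the stored and the query vectors should be normalized to unit length.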