Description
What happened?
I have been using the HttpClient of Chroma, and if I send 500 requests in parallel, the Chroma server's memory usage grows during these calls but the memory is not freed after the results are returned. I want to know the possible causes and solutions.
Here is the code:

```python
import chromadb
import gc
import os
from langchain_community.vectorstores import Chroma
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
from langchain_openai import OpenAIEmbeddings
import psutil
from chromadb.types import SegmentScope

model_name = "text-embedding-ada-002"
openai_api_key = my_api_key
EMBEDDINGS = OpenAIEmbeddings(model=model_name, openai_api_key=openai_api_key, request_timeout=30000)

organization_id = my_organization_id
customer_id = customer_id
VECTOR_DB_HOST = "127.0.0.1"
VECTOR_DB_PORT = "5000"
collection_name = "657c87c28f04e8decca92969_66a8f9f5284fdd5763f671a2"

# Initialize the shared client outside of the parallel loop
client = chromadb.HttpClient(host=VECTOR_DB_HOST, port=VECTOR_DB_PORT)

# Initialize the shared Chroma vector database object
VECTORDB = Chroma(collection_name=collection_name, embedding_function=EMBEDDINGS,
                  client=client, collection_metadata={"hnsw:space": "cosine"})

PROMPT = "write job description of a software engineer"
total_calls = 500
parallel_requests = 100

# Function to make the API request
def make_request(i):
    print(f"Starting call {i}...")
    try:
        # Use the shared VECTORDB instance for the request
        results = VECTORDB.similarity_search_with_score(PROMPT, k=10)
        if results:
            print(results)
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
    print(f"Finished call {i}.")

# Use a ThreadPoolExecutor to handle the parallel requests
with ThreadPoolExecutor(max_workers=parallel_requests) as executor:
    futures = [executor.submit(make_request, i) for i in range(total_calls)]
    for future in as_completed(futures):
        pass

print("All results extracted")
```
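To make the report reproducible, here is a minimal sketch of how the server's memory could be sampled around the request burst using `psutil` (already imported above). `SERVER_PID` is an assumption: the PID of the separately running Chroma server process, not something taken from the code above.

```python
import psutil

def sample_rss(pid: int, label: str) -> int:
    """Print and return the resident set size (RSS) of a process, in bytes."""
    rss = psutil.Process(pid).memory_info().rss
    print(f"{label}: RSS = {rss / (1024 * 1024):.1f} MiB")
    return rss

# SERVER_PID is hypothetical: the PID of the `chroma run` server process,
# found e.g. via `pgrep -f chroma` on the host.
# before = sample_rss(SERVER_PID, "before burst")
# ... run the 500 parallel similarity_search_with_score calls ...
# after = sample_rss(SERVER_PID, "after burst")
# Sampling again after an idle period shows whether memory is ever released.
```

Comparing the "before" and "after" readings (and a reading taken after the server has been idle for a while) distinguishes a genuine leak from memory that is cached and reused.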
Versions
Name: chromadb
Version: 0.5.5
Relevant log output
No response