
[Bug]: Chroma server does not free up the memory #2673

Open
@saqlainumer-181


What happened?

I have been using Chroma's `HttpClient`. When I send 500 requests in parallel, the Chroma server's memory usage grows during these calls, but the memory is not freed after the results are returned. What are the possible causes and solutions?

Here is the code:

```python
import chromadb
import gc
import os
from langchain_community.vectorstores import Chroma
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
from langchain_openai import OpenAIEmbeddings
import psutil
from chromadb.types import SegmentScope

model_name = "text-embedding-ada-002"
# Redacted in the report as my_api_key; read from the environment here
openai_api_key = os.environ["OPENAI_API_KEY"]

EMBEDDINGS = OpenAIEmbeddings(model=model_name, openai_api_key=openai_api_key, request_timeout=30000)

organization_id = "<my_organization_id>"  # redacted placeholder, unused below
customer_id = "<customer_id>"  # redacted placeholder, unused below

VECTOR_DB_HOST = "127.0.0.1"
VECTOR_DB_PORT = "5000"
collection_name = "657c87c28f04e8decca92969_66a8f9f5284fdd5763f671a2"

# Initialize the shared client outside of the parallel loop
client = chromadb.HttpClient(host=VECTOR_DB_HOST, port=VECTOR_DB_PORT)

# Initialize the shared Chroma vector database object
VECTORDB = Chroma(collection_name=collection_name, embedding_function=EMBEDDINGS,
                  client=client, collection_metadata={"hnsw:space": "cosine"})

PROMPT = "write job description of a software engineer"

total_calls = 500
parallel_requests = 100


# Function to make the API request
def make_request(i):
    print(f"Starting call {i}...")
    try:
        # Use the shared VECTORDB instance for the request
        results = VECTORDB.similarity_search_with_score(PROMPT, k=10)
        if results:
            print(results)
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
    print(f"Finished call {i}.")


# Using ThreadPoolExecutor to handle parallel requests
with ThreadPoolExecutor(max_workers=parallel_requests) as executor:
    futures = [executor.submit(make_request, i) for i in range(total_calls)]
    for future in as_completed(futures):
        future.result()  # re-raise any exception not caught inside make_request

print("All results extracted")
```
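To tell a leak apart from memory the server deliberately keeps (for example a loaded index or a cache), one possible approach is to run the same burst several times and record the server's resident memory (RSS) after each round. The sketch below is a minimal, hypothetical helper, not part of chromadb: `read_rss` and `run_burst` are callables you supply, and `rss_after_each_round` / `keeps_growing` are illustrative names. As a heuristic only: a jump after the first round followed by a plateau is consistent with caching, while growth on every round looks more like a leak.

```python
import time
from typing import Callable, List


def rss_after_each_round(
    read_rss: Callable[[], int],
    run_burst: Callable[[], None],
    rounds: int = 5,
    settle_seconds: float = 0.0,
) -> List[int]:
    """Return [baseline, RSS after round 1, ..., RSS after round N] in bytes."""
    samples = [read_rss()]  # baseline before any traffic
    for _ in range(rounds):
        run_burst()  # e.g. the 500-request ThreadPoolExecutor loop, wrapped in a function
        time.sleep(settle_seconds)  # optional pause before sampling again
        samples.append(read_rss())
    return samples


def keeps_growing(samples: List[int], tolerance_bytes: int = 0) -> bool:
    """True if RSS rises by more than tolerance_bytes after every round past the first.

    A jump on round 1 followed by a plateau returns False (cache-like);
    growth on every later round returns True (leak-like).
    """
    if len(samples) < 3:
        raise ValueError("need a baseline plus at least two rounds")
    after_rounds = samples[1:]
    return all(b - a > tolerance_bytes for a, b in zip(after_rounds, after_rounds[1:]))
```

For example, with psutil (already imported in the script above) you could pass `read_rss=lambda: psutil.Process(server_pid).memory_info().rss`, where `server_pid` is a hypothetical variable holding the PID of the Chroma server process.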

Versions

Name: chromadb
Version: 0.5.5

Relevant log output

No response

Labels: bug (Something isn't working), http
