Description
In lab_3_mongodb_vector_search, when a collection contains a large number of documents, an error is raised while iterating through the cursor returned by collection.find(). Converting the cursor to a list first and then iterating over the list resolves it.
Before:
def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    for doc in collection.find():
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]
        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)
        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations
    collection.bulk_write(bulk_operations)
After converting the cursor to a list:
def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    # read every document up front so the cursor is fully consumed before embeddings are generated
    documents = list(collection.find())
    for doc in documents:
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]
        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)
        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations
    collection.bulk_write(bulk_operations)
Can I raise a PR to make this change?
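One possible refinement, offered only as a sketch: the list-based version holds every document and every pending update in memory until a single bulk_write call at the end. If that becomes a concern for large collections, the updates could be sent in fixed-size chunks instead. The helper names `batched` and `write_in_batches` and the `batch_size` default below are hypothetical and not part of the lab; the helpers take plain callables so they do not depend on pymongo.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
Op = TypeVar("Op")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(
    documents: Iterable[T],
    build_operation: Callable[[T], Op],
    write_batch: Callable[[List[Op]], object],
    batch_size: int = 100,  # assumed value, tune as needed
) -> int:
    """Build one operation per document and pass them to write_batch chunk by chunk.

    Returns the number of batches written.
    """
    batches_written = 0
    for chunk in batched(documents, batch_size):
        write_batch([build_operation(doc) for doc in chunk])
        batches_written += 1
    return batches_written
```

In the lab function, `documents` would be `list(collection.find())`, `build_operation` would be a function that removes any existing contentVector, generates the embedding, and returns the `pymongo.UpdateOne(...)` shown above, and `write_batch` would be `collection.bulk_write`.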