Cursor not found error while adding vector fields to the documents #3

Open
@ricurious

Description

In lab_3_mongodb_vector_search, when a collection contains a large number of documents, a "cursor not found" error is raised while iterating over the cursor returned by collection.find(). Converting the cursor to a list first and then iterating over that list resolves the error.
Before:

def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    for doc in collection.find():
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]

        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)       
        
        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations
    collection.bulk_write(bulk_operations)

After converting the cursor to a list:

def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    # read every document up front so no server-side cursor stays open during the slow embedding calls
    documents = list(collection.find())
    for doc in documents:
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]

        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)       
        
        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations (bulk_write raises InvalidOperation on an empty list)
    if bulk_operations:
        collection.bulk_write(bulk_operations)

Can I raise a PR to make this change?
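
A possible alternative, in case loading the whole collection into memory with list() is a concern for very large collections: a likely cause of the error is that the server closes the cursor while the loop is waiting on slow embedding calls, so the sketch below reads only the _id values first and then fetches and updates the documents in small batches, fully reading each batch before any embedding is generated. This is a minimal sketch and not the lab's code. The function name, the batched helper, the batch_size default, and the make_update parameter are assumptions for illustration; make_update is expected to be pymongo.UpdateOne in real use and is passed in only so the logic can be exercised without a database.

```python
import json
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def add_content_vectors_in_batches(
    collection,
    generate_embeddings: Callable[[str], list],
    make_update: Callable[[dict, dict], Any],  # assumption: pymongo.UpdateOne in real use
    batch_size: int = 100,  # assumption: tune to the collection and embedding latency
) -> int:
    """Add a contentVector field to every document; return the number of updates sent."""
    # Only _id values are read here, so this cursor is consumed quickly.
    ids = [doc["_id"] for doc in collection.find({}, {"_id": 1})]

    updated = 0
    for id_batch in batched(ids, batch_size):
        # Read the whole batch before doing any slow work, so no cursor
        # is left waiting while embeddings are generated.
        docs = list(collection.find({"_id": {"$in": id_batch}}))

        operations = []
        for doc in docs:
            # remove any previous contentVector embeddings
            doc.pop("contentVector", None)
            # generate embeddings for the document string representation
            content = json.dumps(doc, default=str)
            operations.append(
                make_update(
                    {"_id": doc["_id"]},
                    {"$set": {"contentVector": generate_embeddings(content)}},
                )
            )

        # bulk_write rejects an empty list, so skip empty batches
        if operations:
            collection.bulk_write(operations)
            updated += len(operations)
    return updated
```

In the lab this would be called as add_content_vectors_in_batches(db[collection_name], generate_embeddings, pymongo.UpdateOne). The sketch omits upsert=True because every document being updated was just read from the collection.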
