Having trouble replicating IrisVectorStore Llama Index demo from iris-vector-search for my program's user table #10

Open
@ericmariasis

I'll preface this by saying I'm not sure whether I found a bug or am somehow misusing IRISVectorStore. Also, to test this you'll probably need an OpenAI token.

Basically, I have code using the regular llama-index module working in my Python project. It uses SimpleDirectoryReader objects much like the demo I mentioned (https://github.com/intersystems-community/iris-vector-search/blob/main/d...). I also have other code working (not shown) that adds new users to a SQL table in IRIS.

I tried to use IRISVectorStore in the same way as the excerpt from the demo code below, changing only two things: the table name, which I set to the name of my user table, and the documents object, which I replaced with the output of my own SimpleDirectoryReader.

However, no matter how many times I run it with those changes, I get a flurry of exceptions whose trace makes little sense to me. I can confirm that the code that connects to my user table locally does work. I'll include the trace at the bottom.

# StorageContext captures how vectors will be stored
vector_store = IRISVectorStore.from_params(
    connection_string = url,
    table_name = "paul_graham_essay",
    embed_dim = 1536,  # openai embedding dimension
    engine_args = { "connect_args": {"sslcontext": sslcontext} }
)

Below is the entire code module where I use LlamaIndex. In run_query_on_files you can see the commented-out block I tried to add, on top of the setup steps above that mirror the demo.

import textwrap

import nest_asyncio
from openai import OpenAIError
from pydantic import ValidationError

nest_asyncio.apply()

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.llms.openai import OpenAI

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_iris import IRISVectorStore

import os
from .myconfig import *

os.environ["OPENAI_API_KEY"] = f'{OPENAI_API_KEY}'

username = f'{DB_USER}'
password = f'{DB_PASS}'
hostname = os.getenv('IRIS_HOSTNAME', f'{DB_URL}')
port = f'{DB_PORT}'
namespace = f'{DB_NAMESPACE}'

from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo")

import ssl

certificateFile = "/usr/cert-demo/certificateSQLaaS.pem"

if os.path.exists(certificateFile):
    print(f"Located SSL certificate at '{certificateFile}', initializing SSL configuration")
    sslcontext = ssl.create_default_context(cafile=certificateFile)
else:
    print("No certificate file found, continuing with insecure connection")
    sslcontext = None

from sqlalchemy import create_engine, text

url = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"

engine = create_engine(url, connect_args={"sslcontext": sslcontext})
with engine.connect() as conn:
    print(conn.execute(text("SELECT 'hello world!'")).first()[0])

# StorageContext captures how vectors will be stored
vector_store = IRISVectorStore.from_params(
    connection_string = url,
    table_name = "user",
    embed_dim = 1536,  # openai embedding dimension
    engine_args = { "connect_args": {"sslcontext": sslcontext} }
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)


def get_filename_before_dot(filename):
    name, extension = os.path.splitext(filename)
    return name


def run_query_on_files(files, query):
    # Check if the OpenAI API key is provided
    if not os.getenv("OPENAI_API_KEY"):
        return "Cannot run model. No API key provided."

    try:
        queryEngineTools = []
        for file in files:
            curDoc = SimpleDirectoryReader(input_files=[file]).load_data()
            # index = VectorStoreIndex.from_documents(
            #     curDoc,
            #     storage_context=storage_context,
            #     show_progress=True,
            # )
            # query_engine = index.as_query_engine()
            # userResp = query_engine.query("Summarize this content.")
            # print(textwrap.fill(str(userResp), 100))
            curVectorStore = VectorStoreIndex.from_documents(curDoc)
            curEngine = curVectorStore.as_query_engine(similarity_top_k=3)
            curTool = QueryEngineTool(query_engine=curEngine, metadata=ToolMetadata(
                name=get_filename_before_dot(file),
                description=get_filename_before_dot(file)
            ))
            queryEngineTools.append(curTool)

        if len(files) > 0:
            s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=queryEngineTools)
            response = s_engine.query(query)
            return response
        return ''
    except OpenAIError as e:
        return "Cannot run model. Invalid API key or other OpenAI error."
    except ValidationError as e:
        print(f"Validation error: {str(e)}")
        return "Validation error occurred."
    except Exception as e:
        print(f"An unexpected error occurred: {str(e)}")
        return "An unexpected error occurred."

And here is a trace of the error I get.

Parsing nodes: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 991.33it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.29s/it]
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1009.22it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.74it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s] 
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.55it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1009.70it/s] 
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.21it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)

My question is basically this: does anybody know for sure that IRISVectorStore can successfully extract information from an existing user table? Or might I have hit some weird edge case when trying to use it this way?
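
Since I can't tell whether this is a bug or misuse, the installed package versions may be relevant to anyone trying to reproduce it. Here is a small sketch for collecting them with importlib.metadata. The distribution names in PACKAGES are my assumption about what is installed and may need adjusting to match pip list:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust to whatever `pip list` shows locally.
PACKAGES = ("llama-index-core", "llama-iris", "sqlalchemy-iris", "pydantic", "openai")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```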
