embedding-model-clients

This project provides a collection of clients for interacting with embedding models. It includes modules for specific providers, such as Graphwise Transformer and the OpenAI API. Each client implements the langchain4j EmbeddingModel interface, so it can be used in any system that provides similarity search by creating embeddings from text, such as the GraphDB Elasticsearch and OpenSearch connectors.

The existing clients serve both as examples and as the default implementations that GraphDB connectors use for similarity search. To provide an additional client, implement the EmbeddingModel interface and add the implementation to your GraphDB distribution, where GraphDB connectors can then use it for similarity search.

Modules

  • assembly: Creates an assembly JAR that contains the clients and their dependencies.
  • embedding-clients-common: Common code used by the other client modules.
  • graphwise-transformer-client: A client for interacting with a Graphwise Transformer.
  • openai-embedding-client: A client for interacting with the OpenAI embedding API.

Building

To build the project, you can use Maven. Run the following command from the project's root directory:

mvn clean install

Installation

  1. Build the project to create the assembly JAR.
  2. Copy the generated JAR from ./assembly/target/embedding-model-clients-assembly-{project.version}.jar to your application's classpath.
  • For GraphDB Connectors, place the JAR in the directory of the respective connector, for example: dist/graphdb/target/graphdb/lib/plugins/elasticsearch-connector/.

Configuration

Clients are configured via system properties.

GraphwiseTransformerClient

| Property | Description | Required | Default |
| --- | --- | --- | --- |
| graphwise.transformer.address | The host and port of the Graphwise Transformer service. | no | localhost:5050 |
| graphwise.transformer.embedding.model.name | The name of the sentence transformer model to use. | no | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
| graphwise.transformer.batch.size | The maximum request batch size in kilobytes. | no | 256 |
| graphwise.transformer.auth.token.secret | Shared secret for authentication. | no | none |
| graphwise.transformer.thread.pool.size | The size of the client-side thread pool. | no | Number of available processors |
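
As an illustration of how the properties and defaults in the table above could be resolved, here is a minimal Java sketch. The class GraphwiseSettingsSketch and its fields are hypothetical and not part of this project; only the property names and default values come from the table. It is not the client's actual implementation.

```java
import java.util.Properties;

// Hypothetical helper, not part of this project: it only illustrates how the
// documented property names and defaults could be resolved.
final class GraphwiseSettingsSketch {
    final String address;
    final String modelName;
    final int batchSizeKb;
    final String authTokenSecret; // null when the property is not set
    final int threadPoolSize;

    private GraphwiseSettingsSketch(String address, String modelName, int batchSizeKb,
                                    String authTokenSecret, int threadPoolSize) {
        this.address = address;
        this.modelName = modelName;
        this.batchSizeKb = batchSizeKb;
        this.authTokenSecret = authTokenSecret;
        this.threadPoolSize = threadPoolSize;
    }

    // Reads each property from the given source and falls back to the documented default.
    static GraphwiseSettingsSketch from(Properties props) {
        int processors = Runtime.getRuntime().availableProcessors();
        return new GraphwiseSettingsSketch(
            props.getProperty("graphwise.transformer.address", "localhost:5050"),
            props.getProperty("graphwise.transformer.embedding.model.name",
                "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"),
            Integer.parseInt(props.getProperty("graphwise.transformer.batch.size", "256")),
            props.getProperty("graphwise.transformer.auth.token.secret"),
            Integer.parseInt(props.getProperty("graphwise.transformer.thread.pool.size",
                String.valueOf(processors))));
    }

    // Convenience overload that reads JVM system properties (for example, set with -Dname=value).
    static GraphwiseSettingsSketch fromSystemProperties() {
        return from(System.getProperties());
    }
}
```

Calling GraphwiseSettingsSketch.fromSystemProperties() with no properties set would yield the defaults from the table; passing -Dgraphwise.transformer.address=host:port to the JVM would override the address.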

OpenAIEmbeddingClient

| Property | Description | Required | Default |
| --- | --- | --- | --- |
| openai.embedding.model.api.key | Your OpenAI API key. | yes | none |
| openai.embedding.model.name | The OpenAI model to use. | yes | none |
| openai.embedding.model.dimensions | The OpenAI model dimensions. | no | none |
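
Because two of these properties are required and have no default, it can help to check them before starting a connector. The following Java sketch shows one way to do that. The class OpenAiPropertyCheckSketch is hypothetical and not part of this project, and treating a blank value as missing is an assumption made here, not documented behavior of the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of this project.
final class OpenAiPropertyCheckSketch {
    static final String API_KEY = "openai.embedding.model.api.key";
    static final String MODEL_NAME = "openai.embedding.model.name";
    static final String DIMENSIONS = "openai.embedding.model.dimensions";

    // Returns the names of required properties that are missing or blank
    // (treating blank as missing is an assumption of this sketch).
    static List<String> missingRequired(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : new String[] {API_KEY, MODEL_NAME}) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Returns the optional dimensions value, or null when the property is not set.
    static Integer dimensions(Properties props) {
        String value = props.getProperty(DIMENSIONS);
        return value == null ? null : Integer.valueOf(value.trim());
    }
}
```

A caller could run OpenAiPropertyCheckSketch.missingRequired(System.getProperties()) at startup and report any names in the returned list.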

Usage

Once the JAR is on the classpath and the necessary properties are configured, you can use the clients in GraphDB Connectors by specifying the fully qualified class name of the desired implementation in the connector configuration.

Example values for the embeddingModel parameter:

  • com.ontotext.embeddings.GraphwiseTransformerClient
  • com.ontotext.embeddings.OpenAIEmbeddingClient
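
This document does not describe how a connector turns the class name into a client instance. One common approach for configuration by class name is reflective instantiation, sketched below in Java. The class ClassNameLoaderSketch is hypothetical, and the sketch assumes that the named class is on the classpath and has a public no-argument constructor; GraphDB's actual loading mechanism may differ.

```java
// Hypothetical illustration of configuration by class name, not GraphDB code.
final class ClassNameLoaderSketch {

    // Loads the named class, checks that it implements the expected type,
    // and creates an instance through its public no-argument constructor.
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> raw = Class.forName(className);
            if (!expectedType.isAssignableFrom(raw)) {
                throw new IllegalArgumentException(
                    className + " does not implement " + expectedType.getName());
            }
            return expectedType.cast(raw.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and failures inside the constructor.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With langchain4j on the classpath, the expected type would be the EmbeddingModel interface and the class name would be one of the values listed above. A misspelled class name or a JAR missing from the classpath would surface here as an IllegalStateException.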

License

Licensed under Apache 2.0.
