This project provides a collection of clients for interacting with embedding models. It includes modules for specific embedding model providers, such as Graphwise Transformer and the OpenAI API. Each client implements the langchain4j `EmbeddingModel` interface and can be used in systems that provide similarity search by creating embeddings from text, such as the GraphDB Elasticsearch and OpenSearch Connectors.

The existing clients serve both as examples and as the default implementations that GraphDB Connectors use for similarity search. You can provide additional clients by implementing the `EmbeddingModel` interface and adding them to your GraphDB distribution, where GraphDB Connectors can use them for similarity search.
- assembly: Creates an assembly JAR that contains the clients and their dependencies.
- embedding-clients-common: Common code used by the other client modules.
- graphwise-transformer-client: A client for interacting with a Graphwise Transformer.
- openai-embedding-client: A client for interacting with the OpenAI embedding API.
To build the project, you can use Maven. Run the following command from the project's root directory:
`mvn clean install`

- Build the project to create the assembly JAR.
- Copy the generated JAR from `./assembly/target/embedding-model-clients-assembly-{project.version}.jar` to your application's classpath.
- For GraphDB Connectors, place the JAR in the directory of the respective connector, for example: `dist/graphdb/target/graphdb/lib/plugins/elasticsearch-connector/`.
Clients are configured via system properties.
| Property | Description | Required | Default |
|---|---|---|---|
| `graphwise.transformer.address` | The host and port of the Graphwise Transformer service. | no | `localhost:5050` |
| `graphwise.transformer.embedding.model.name` | The name of the sentence transformer model to use. | no | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| `graphwise.transformer.batch.size` | The maximum request batch size in kilobytes. | no | `256` |
| `graphwise.transformer.auth.token.secret` | Shared secret for authentication. | no | none |
| `graphwise.transformer.thread.pool.size` | The size of the client-side thread pool. | no | Number of available processors |

| Property | Description | Required | Default |
|---|---|---|---|
| `openai.embedding.model.api.key` | Your OpenAI API key. | yes | none |
| `openai.embedding.model.name` | The OpenAI model to use. | yes | none |
| `openai.embedding.model.dimensions` | The OpenAI model dimensions. | no | none |

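
As an illustration of how the properties above behave, here is a minimal sketch of reading them with plain JDK calls. The `PropertyLookup` class and its `optional` and `required` helpers are hypothetical names introduced for this example and are not part of this project. The sketch only assumes what the tables state: optional properties fall back to a documented default, and required ones have no default. JVM system properties are usually set with `-Dname=value` on the command line.

```java
import java.util.Properties;

/** Hypothetical helper for this example only; not part of the project's API. */
final class PropertyLookup {

    /** Returns the property value, or the given default when the property is not set. */
    static String optional(Properties props, String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    /** Returns the property value, or fails fast when it is missing or empty. */
    static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required system property: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = System.getProperties();

        // Optional Graphwise Transformer settings, with the defaults from the table.
        String address = optional(props, "graphwise.transformer.address", "localhost:5050");
        int batchSizeKb = Integer.parseInt(
                optional(props, "graphwise.transformer.batch.size", "256"));
        int threadPoolSize = Integer.parseInt(
                optional(props, "graphwise.transformer.thread.pool.size",
                        String.valueOf(Runtime.getRuntime().availableProcessors())));

        System.out.println("address=" + address
                + ", batchSizeKb=" + batchSizeKb
                + ", threadPoolSize=" + threadPoolSize);

        // Required OpenAI settings: report a clear error instead of a later null failure.
        try {
            required(props, "openai.embedding.model.api.key");
            required(props, "openai.embedding.model.name");
            System.out.println("OpenAI client properties are set.");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running this with, for example, `-Dgraphwise.transformer.batch.size=512` overrides the default batch size, while leaving the property unset keeps `256`.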
Once the JAR is on the classpath and the necessary properties are configured, you can use the clients in GraphDB Connectors by specifying the fully qualified class name of the desired implementation in the connector configuration.
Example values for the `embeddingModel` parameter:
- `com.ontotext.embeddings.GraphwiseTransformerClient`
- `com.ontotext.embeddings.OpenAIEmbeddingClient`
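
To show why the class name must be fully qualified and why the JAR must be on the classpath, the sketch below resolves a class name string into an instance using standard Java reflection. This is not a description of how GraphDB Connectors load clients internally, which this document does not specify. `ClientLoader` is a hypothetical name, and `java.util.ArrayList` stands in for a client class so that the example runs without any extra dependencies.

```java
import java.util.List;

/** Hypothetical loader for this example only; the connectors' real mechanism may differ. */
final class ClientLoader {

    /**
     * Loads the named class from the classpath, checks that it is a subtype of
     * expectedType, and creates an instance through its no-argument constructor.
     */
    static <T> T newInstance(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className); // throws ClassNotFoundException if absent
        return raw.asSubclass(expectedType)      // throws ClassCastException on a type mismatch
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Stand-in for a value such as com.ontotext.embeddings.OpenAIEmbeddingClient.
        List<?> instance = newInstance("java.util.ArrayList", List.class);
        System.out.println("Loaded " + instance.getClass().getName());

        try {
            newInstance("com.example.NotOnClasspath", Object.class);
        } catch (ClassNotFoundException e) {
            // This is the kind of failure to expect when the JAR is missing from the classpath.
            System.out.println("Class not found: " + e.getMessage());
        }
    }
}
```

A short or misspelled class name fails in the same way as a missing JAR, so checking both is a reasonable first step when a configured client cannot be found.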
Licensed under Apache 2.0.