Overview
OpenSearch Benchmark (OSB) is the de facto performance testing suite for OpenSearch. Customers have several search solutions to choose from and have expressed growing interest in comparative benchmarking of these solutions. However, OSB does not support benchmarking other search engines and databases. As a consequence, despite OSB's growth in usage, users are increasingly turning to other popular benchmarking tools for cross-database performance comparisons and using the results to decide which search solution to adopt.
This document outlines a high-level design and plan to convert OpenSearch Benchmark into a database-agnostic benchmarking tool. To understand more about the intent behind this effort, see this RFC.
High-Level Design (HLD)
User Stories
- OpenSearch customers and developers should be able to benchmark against OpenSearch and other database offerings and compare performance
- OpenSearch developers should be able to easily contribute support for new offerings to OSB in ~500 lines of new code
- Users should also be able to compare different variants of OpenSearch (open-source, managed, and serverless offerings) with ease
Stakeholders & Customers
- OpenSearch users who need to benchmark and understand OpenSearch’s performance in comparison with other databases and offerings
- Corporate benchmarking teams, who may be interested in benchmarking their use-cases with OpenSearch against other options
- ML engineers, DevOps engineers, Solution Architects, and Performance Engineers who are interested in comparative benchmarking
Requirements
Functional Requirements:
- Expanded Targets: OSB can benchmark OpenSearch along with other databases such as Elasticsearch, Milvus, and Vespa, as well as future databases and search offerings
- Core Operations: OSB should support core operations across all databases and offerings. This includes index creation and deletion, ingestion / bulk indexing, search / querying, cluster health checks, retrieving database / engine info, and refresh operations.
- Database-Specific Operations: OSB should support operations specific to a database or offering
- Metrics: OSB should collect common metrics — such as throughput, service time, latency, and error rate — as well as engine-specific metrics (e.g. JVM telemetry). The database type must be recorded with each metric.
- New CLI Flag: OSB should be able to connect to other databases by simply providing a flag, such as --database-type. The value provided must be one of the databases onboarded on OSB (see the sketch after this list)
- Pipelines: OSB should reject tests that use the from-sources and from-distributions pipelines against non-OpenSearch targets
- Workloads Support: OSB should support new workload formats (i.e. formats like new test procedures or operations that are exclusive to other offerings)
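To make the CLI flag concrete, here is a minimal sketch of how it might be wired up in benchmark.py, assuming an argparse-based CLI. The flag name comes from this proposal; the list of onboarded databases and the validation details are illustrative assumptions, not a committed interface.

```python
# Hypothetical sketch for benchmark.py; SUPPORTED_DATABASES and the
# validation approach are assumptions, not a final design.
import argparse

SUPPORTED_DATABASES = ["opensearch", "elasticsearch", "milvus", "vespa"]

parser = argparse.ArgumentParser(prog="opensearch-benchmark")
parser.add_argument(
    "--database-type",
    choices=SUPPORTED_DATABASES,  # reject databases that are not onboarded
    default="opensearch",         # OpenSearch remains the default engine
    help="The database offering to benchmark against.",
)

args = parser.parse_args(["--database-type", "milvus"])
print(args.database_type)  # -> milvus
```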
Non-Functional Requirements:
- OpenSearch Performance: Overhead should be under 5% relative to current OSB (OpenSearch-only, latest version 2.1)
- Other Database Performance: Overhead should be under 5% compared to using each database's native SDK directly
- Throughput: The system should sustain the same throughput levels as current OSB (OpenSearch-only, latest version 2.1)
- Backwards Compatibility: Existing workloads should still work on OpenSearch and, with some exceptions (e.g. vector databases), on all new databases
- Maintainability: Core benchmarking logic should be isolated from database-specific logic. Code should be modular and extensible. Adding support for a new database should only require changes in the database/ directory; no hard-coded database implementations should exist outside this directory or the factories
Assumptions
- OpenSearch will remain as the default engine OpenSearch Benchmark runs against
- A rename of OpenSearch Benchmark might be needed
Current Architecture
# Current Architecture (Tree Directory)
osbenchmark/
├── benchmark.py # CLI entry point
├── test_run_orchestrator.py # Pipeline
├── config.py # Configuration management
├── metrics.py
├── async_connection.py # Directly tied to async OpenSearch client connections used by runners
├── client.py # Creating clients is specific to OpenSearch clients: hardcoded opensearchpy imports and
│ # OsClientFactory creates OpenSearch clients
├── telemetry.py # Telemetry specific to OpenSearch: FlightRecorder, GC, Heapdump (JVM-only)
├── worker_coordinator/
│ ├── worker_coordinator.py
│ │
│ ├── runner.py # Runners are specific to OpenSearch:
│ │ # - 50+ runners with opensearch client parameter
│ │ # - Direct calls to opensearch.bulk() and opensearch.search()
│ │ # - OpenSearch-specific response parsing
│ │
│ └── scheduler.py # Scheduling (database-agnostic)
│
├── workload/
│ ├── loader.py # Workload loading (mostly agnostic)
│ └── __init__.py
│
├── resources/
│ └── cluster_configs/ # OpenSearch-Specific configs
│ └── main/
│ └── templates/
│ └── config/
│ └── opensearch.yml # OpenSearch configuration template
│
└── utils/
└── ...
The current code has OpenSearch clients and OpenSearch-specific features directly coupled to the main workflow. The following architecture proposes how to abstract OpenSearch-specific logic out of the main workflow and make adding other databases extensible.
Proposed Architecture
# Proposed Architecture (Tree Directory)
osbenchmark/
├── benchmark.py # Update: Add --database-type flag
├── test_run_orchestrator.py # Update: Limit builder to only be used if OpenSearch is the database type
├── actor.py
├── config.py # Update: Store database type
├── metrics.py
├── client.py # Updated: Moved into database directory and wrapped by adapter
├── telemetry.py
├── async_connection.py # Updated: Abstract opensearch
│
├── database/ # New: Abstraction layer
│ ├── __init__.py
│ ├── interface.py # New: DatabaseClient abstract class
│ ├── registry.py # New: Database type registry
│ ├── factory.py # New: Factory for creating clients/builders
│ └── clients/ # New: Clients for specific library implementations
│ ├── __init__.py
│ ├── Milvus/milvus.py
│ ├── Vespa/vespa.py
│ ├── Elasticsearch/elasticsearch.py
│ └── OpenSearch/opensearch.py
│
├── worker_coordinator/
│ ├── worker_coordinator.py # Updated: Pass DatabaseClient to runners
│ │
│ ├── runner.py # Updated: All 50+ runners to be database-agnostic
│ │ # Changed: opensearch → database parameter
│ │
│ │
│ └── scheduler.py # Already agnostic (remains the same)
│
├── workload/
│ ├── loader.py # Remains the same
│ └── __init__.py
│
├── telemetry/ # Updated: Reorganized telemetry
│ ├── __init__.py
│ ├── registry.py # Updated: Database-aware telemetry registry
│ │
│ └── devices/ # Updated: Per-database devices
│ ├── __init__.py
│ ├── common/ # Updated: Database-agnostic devices
│ │ ├── system_stats.py # CPU, memory, disk (works for all)
│ │ └── node_stats.py # Database node stats (generic)
│ │
│ └── jvm/ # For JVM-specific devices
│ ├── __init__.py
│ ├── flight_recorder.py # Moved: Java Flight Recorder
│ ├── gc.py # Moved: GC logs
│ ├── heapdump.py # Moved: Heap dumps
│ └── jit.py # Moved: JIT compiler logs
│
├── resources/
│ ├── database_configs/ # Updated: Database-specific configs
│ │ └── opensearch/ # Updated: OpenSearch configs
│ │ └── templates/
│ │ └── config/
│ │ └── opensearch.yml # Moved: cluster_configs/
│ │
│ └── cluster_configs/ # Moved into database_configs/
│
└── utils/
└── ...
This design focuses on abstracting four large components of the end-to-end workflow — Database Client, Database Builder, Runners, and Metrics Collection & Telemetry — so that core benchmarking logic is decoupled and support for other databases is extensible.
Key Areas to Abstract
Database Client
All standard methods shared among supported offerings will be abstracted into the DatabaseClient interface. Methods like bulk(), search(), and create_index() will be housed here. Database implementations of this interface will use native SDK libraries to implement each method. For example, OpenSearchClient will still use opensearch-py, while MilvusClient will use pymilvus.
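As a rough sketch, the interface and one adapter might look like the following. The method names mirror the core operations listed earlier, but the exact signatures are assumptions rather than a final API.

```python
# Illustrative sketch of database/interface.py and an OpenSearch adapter.
from abc import ABC, abstractmethod


class DatabaseClient(ABC):
    """Standard operations shared by all supported offerings."""

    @abstractmethod
    async def create_index(self, index_name, body=None): ...

    @abstractmethod
    async def bulk(self, index_name, documents): ...

    @abstractmethod
    async def search(self, index_name, query): ...

    @abstractmethod
    async def cluster_health(self): ...


class OpenSearchClient(DatabaseClient):
    """Adapter that delegates to the existing opensearch-py async client."""

    def __init__(self, os_client):
        self._client = os_client  # e.g. an opensearchpy.AsyncOpenSearch

    async def create_index(self, index_name, body=None):
        return await self._client.indices.create(index=index_name, body=body)

    async def bulk(self, index_name, documents):
        return await self._client.bulk(index=index_name, body=documents)

    async def search(self, index_name, query):
        return await self._client.search(index=index_name, body=query)

    async def cluster_health(self):
        return await self._client.cluster.health()
```

A MilvusClient would implement the same methods on top of pymilvus, translating index and query semantics where they differ.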
Runners
There are currently 50+ operation runners in the worker coordinator. These runners should be refactored to remove the tightly coupled opensearch-py client code and use the generic DatabaseClient instead. Changing runner signatures from async def __call__(self, opensearch, params) to async def __call__(self, database, params) makes them work with any database implementation. This applies to runners that work across all implementations of DatabaseClient; runners specific to one database should keep that database's name as the parameter name. We'll also need to inspect async_connection.py, as it is used by runners and is directly tied to OpenSearch clients.
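A before/after sketch of a single runner illustrates the change. The real runners in worker_coordinator/runner.py are more involved; the class name and bodies here are illustrative only.

```python
# Before: tightly coupled to an opensearch-py client.
class QueryRunner:
    async def __call__(self, opensearch, params):
        return await opensearch.search(index=params["index"], body=params["body"])


# After: accepts any DatabaseClient implementation.
class QueryRunner:  # noqa: F811 -- redefined here only for illustration
    async def __call__(self, database, params):
        return await database.search(params["index"], params["body"])
```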
Metrics Collection & Telemetry
Common metric devices (such as CPU, memory, and disk) can be abstracted and collected from all offerings. However, some offerings have specialized telemetry devices. For example, JVM telemetry is exclusive to JVM-based offerings like OpenSearch and Elasticsearch. Because of this, we should create a mapping from each offering to its compatible specialized telemetry devices, as sketched below.
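For instance, the mapping could be as simple as a dictionary keyed by database type. The device names below follow the proposed telemetry/ tree; the structure itself is an assumption rather than a committed design.

```python
# Illustrative compatibility mapping between database types and telemetry devices.
COMMON_DEVICES = ["system_stats", "node_stats"]             # work for all offerings
JVM_DEVICES = ["flight_recorder", "gc", "heapdump", "jit"]  # JVM-only

TELEMETRY_COMPATIBILITY = {
    "opensearch": COMMON_DEVICES + JVM_DEVICES,
    "elasticsearch": COMMON_DEVICES + JVM_DEVICES,
    "milvus": COMMON_DEVICES,  # no JVM, so no JVM devices
    "vespa": COMMON_DEVICES,   # simplified for illustration
}


def compatible_devices(database_type):
    """Return the telemetry devices that can attach to this database type."""
    return TELEMETRY_COMPATIBILITY.get(database_type, COMMON_DEVICES)
```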
Patterns that OSB currently implements, such as the factory, registry, and strategy patterns, will be upheld. Additionally, this design will introduce an adapter pattern to retain the general benchmarking workflow and promote easy extensibility.
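A minimal sketch of how the registry and factory in the new database/ directory might fit together, assuming decorator-based registration. The module and function names are illustrative.

```python
# Illustrative sketch of database/registry.py + database/factory.py.
# The goal: adding a database touches only the database/ directory,
# with no hard-coded branching anywhere else.
_REGISTRY = {}


def register(database_type):
    """Class decorator that registers a DatabaseClient implementation."""
    def decorator(client_cls):
        _REGISTRY[database_type] = client_cls
        return client_cls
    return decorator


def create_client(database_type, **connection_opts):
    """Factory: look up the registered adapter class and instantiate it."""
    try:
        client_cls = _REGISTRY[database_type]
    except KeyError:
        raise ValueError(
            f"Unknown database type {database_type!r}; "
            f"onboarded types: {sorted(_REGISTRY)}"
        )
    return client_cls(**connection_opts)


# Usage inside database/clients/, e.g.:
# @register("milvus")
# class MilvusClient(DatabaseClient): ...
```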
Workloads
Workloads should remain the same, though some may need enhancements to include test procedures for database-specific operations. This can be investigated further after implementing the first new database aside from OpenSearch.
What about Builder?
Database Builder (Can Wait)
All provisioning and lifecycle management logic will be abstracted into the DatabaseBuilder interface. Each database implementation will need to provide its own downloaders, provisioners, and launchers. OpenSearch already has these implemented, but we'll need to migrate them to the new architecture. While this is essential, it can be delayed to a later phase because most users follow the recommended approach of benchmarking an externally provisioned database. The community can also consider making this more useful by expanding it to provision clusters in a cloud-provider account; however, that effort is out of scope for this design review.
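For later reference, here is a sketch of what the DatabaseBuilder interface might look like. The downloader/provisioner/launcher split mirrors the description above; the method names are assumptions.

```python
# Illustrative sketch of a DatabaseBuilder interface for a later phase.
from abc import ABC, abstractmethod


class DatabaseBuilder(ABC):
    """Provisioning and lifecycle management for a database under test."""

    @abstractmethod
    def download(self, version):
        """Fetch the distribution for the requested version."""

    @abstractmethod
    def provision(self, config):
        """Lay down configuration files and prepare the installation."""

    @abstractmethod
    def launch(self):
        """Start the database process(es) and wait until they are reachable."""

    @abstractmethod
    def stop(self):
        """Shut the database down and clean up resources."""
```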
Recommended Plan
This development will be implemented in two phases. Phase 1 prioritizes the most common benchmarking workflow (benchmarking existing systems under test), while less critical features are deferred to Phase 2. This document focuses primarily on the Phase 1 tasks; Phase 2 tasks will need separate documents to explore their designs.
- Pre-Phase 1: Investigate runners and the hack → asyncio executor (take a look at Rishabh's branch for CW support)
- Phase 1: Enable benchmark-only pipeline for multiple databases (OpenSearch, Milvus). Users can benchmark existing clusters end-to-end with zero breaking changes to existing OpenSearch workflows.
- Phase 2: Add full feature parity including builder support for all databases, advanced telemetry, cloud provider integrations, and additional database support (Vespa, Elasticsearch).
Phase 1 Success Metrics
- Easy extensibility: Users can add database support in less than 500 lines of code
- Zero breaking changes: Existing workloads can be run without modifications
- Performance: Less than 5% overhead from abstraction layer
- Testing: Passes integration tests with minimal regressions. Core operations are also tested. Exception handling works for OpenSearch and any other databases added.
- Benchmarking Databases: A newly added database can be benchmarked end-to-end, with a sample workload tested on both OpenSearch and that database
- Clear messages for unsupported features: Features that are not yet ready fail with clear error messages
- Workloads: Popular official workloads — Big5, NYC Taxis, and Vectorsearch — work against both OpenSearch and the newly added database
Phase 2 Success Metrics
- Builder is now agnostic: Builder works for all other supported databases (out of scope for this document)
- All pipelines work with all databases: Pipelines beyond benchmark-only, including from-sources and from-distributions, work for every supported database
- Advanced telemetry: Telemetry for other databases is now supported (out of scope for this document)
- Documentation on adding new databases: Comprehensive documentation on adding new databases is added
- Cloud Providers enhanced: Cloud provider integration now works with all types of databases (excluding managed service offerings like Pinecone) (out of scope for this document)
Pros and Cons
Pros:
- Faster go-to-market: Prioritizing a minimum viable product around the most common and recommended workflow, benchmarking external clusters, delivers the product to users sooner (< 4 weeks)
- Early validation: compare OpenSearch and Milvus before adding other databases
- No migration and no breaking changes: No migration to new major versions or major changes to user experience
- Easier extensibility: adding new databases will be easy
- Modernizes OSB: OSB can now collect competitive results and perform comparative benchmarking like other industry-standard tools
Cons:
- Only common features prioritized: Does not prioritize all features, such as Builder or create-workload, but these are not used often
- SDKs for other databases are not exactly like OpenSearch's: Abstraction might not be straightforward for all databases. Some databases might not offer asynchronous clients and might require more work
- Mappings and operations are not exactly like OpenSearch's: Operations in some offerings or databases might not map 1:1 to OpenSearch
- Builder complexity: Builder might be complex for other databases. It might be better to deprecate it in favor of a provisioning feature for cloud providers
- Incompatible telemetry devices: Some specialized telemetry devices cannot be supported on every database
How Can You Help?
- Any general comments about the overall direction are welcome.
- Indicating whether the areas identified above for workload enhancement include your scenarios and use-cases will be helpful in prioritizing them.
- Provide early feedback by testing the new workload features as they become available.
- Help out on the implementation! Check out the issues page for work that is ready to be picked up.