[Performance]: Performance improvement for OpenSearch storage #2785

@LantaoJin

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

#2739 introduced an OpenSearch storage backend. We noticed a significant performance gap: indexing with the OpenSearch backend took 8 minutes, while the default storage took only 1.5 minutes (about 5.3x slower). Both runs were local; the OpenSearch cluster was started from docker-compose-3.x.yml with a 512 MB heap.

Steps to reproduce

Profiling on examples/lightrag_openai_demo.py

================================================================================
STORAGE PROFILING REPORT
================================================================================

  [DOC_STATUS_STORAGE]  total: 0.0645s  calls: 8
    upsert                                      0.0472s  (3 calls)
    index_done_callback                         0.0166s  (1 calls)
    get_docs_by_status                          0.0005s  (3 calls)
    filter_keys                                 0.0003s  (1 calls)

  [GRAPH_STORAGE]  total: 0.5220s  calls: 1492
    upsert_edge                                 0.1228s  (245 calls)
    upsert_node                                 0.1101s  (252 calls)
    get_node                                    0.1065s  (735 calls)
    index_done_callback                         0.0517s  (1 calls)
    has_edge                                    0.0470s  (245 calls)
    get_edges_batch                             0.0291s  (4 calls)
    edge_degrees_batch                          0.0232s  (2 calls)
    get_nodes_batch                             0.0152s  (4 calls)
    node_degrees_batch                          0.0095s  (2 calls)
    get_nodes_edges_batch                       0.0069s  (2 calls)

  [KV_STORAGE]  total: 1.0799s  calls: 1672
    get_by_id                                   0.6464s  (1114 calls)
    upsert                                      0.3611s  (543 calls)
    index_done_callback                         0.0720s  (12 calls)
    get_by_ids                                  0.0005s  (3 calls)

  [VECTOR_STORAGE]  total: 263.6970s  calls: 754
    upsert                                    262.7471s  (498 calls)
    query                                       0.3988s  (5 calls)
    delete                                      0.3905s  (245 calls)
    index_done_callback                         0.1551s  (3 calls)
    get_vectors_by_ids                          0.0055s  (3 calls)

  ────────────────────────────────────────────────────────────
  CATEGORY SUMMARY                                TIME  % OF TOTAL
  ────────────────────────────────────────────────────────────
  VECTOR_STORAGE                             263.6970s       99.4%
  KV_STORAGE                                   1.0799s        0.4%
  GRAPH_STORAGE                                0.5220s        0.2%
  DOC_STATUS_STORAGE                           0.0645s        0.0%
  ────────────────────────────────────────────────────────────
  GRAND TOTAL                                265.3635s
================================================================================
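
The issue does not say how the report above was collected. As a minimal sketch, assuming the storage methods are `async` and that per-call wall-clock time is what gets summed, a timing wrapper along these lines could produce the same kind of table. `STATS`, `profiled`, `instrument`, and `FakeVectorStorage` are hypothetical names for illustration and are not part of LightRAG.

```python
import asyncio
import time
from collections import defaultdict
from functools import wraps

# (category, method) -> [total_seconds, call_count]
STATS = defaultdict(lambda: [0.0, 0])


def profiled(category, method_name):
    """Wrap an async callable and accumulate its wall-clock time per call."""
    def decorator(fn):
        @wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Record the call even if the wrapped method raised.
                entry = STATS[(category, method_name)]
                entry[0] += time.perf_counter() - start
                entry[1] += 1
        return wrapper
    return decorator


def instrument(storage, category, method_names):
    """Replace the named async methods on one storage instance with timed versions."""
    for name in method_names:
        setattr(storage, name, profiled(category, name)(getattr(storage, name)))


class FakeVectorStorage:
    """Hypothetical stand-in for a real storage backend."""
    async def upsert(self, data):
        await asyncio.sleep(0.01)  # simulate I/O latency
        return len(data)


async def demo():
    storage = FakeVectorStorage()
    instrument(storage, "VECTOR_STORAGE", ["upsert"])
    for _ in range(3):
        await storage.upsert({"id": "vec"})
    return STATS[("VECTOR_STORAGE", "upsert")]


total_seconds, calls = asyncio.run(demo())
print(f"upsert {total_seconds:.4f}s ({calls} calls)")
```

With this approach each call is timed independently, so when calls overlap the summed time can exceed the elapsed wall-clock time of the run. That may be why the grand total is larger than the duration quoted in the description.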

Profiling on examples/lightrag_openai_opensearch_graph_demo.py

================================================================================
STORAGE PROFILING REPORT
================================================================================

  [DOC_STATUS_STORAGE]  total: 2.5121s  calls: 8
    upsert                                      2.3110s  (3 calls)
    get_docs_by_status                          0.1890s  (3 calls)
    index_done_callback                         0.0076s  (1 calls)
    filter_keys                                 0.0046s  (1 calls)

  [GRAPH_STORAGE]  total: 481.7155s  calls: 1516
    upsert_node                               252.1001s  (266 calls)
    upsert_edge                               223.6067s  (244 calls)
    get_node                                    3.4759s  (747 calls)
    has_edge                                    2.0792s  (244 calls)
    get_edges_batch                             0.3045s  (4 calls)
    index_done_callback                         0.0491s  (1 calls)
    node_degrees_batch                          0.0470s  (2 calls)
    get_nodes_edges_batch                       0.0332s  (2 calls)
    get_nodes_batch                             0.0197s  (4 calls)
    edge_degrees_batch                          0.0000s  (2 calls)

  [KV_STORAGE]  total: 434.8673s  calls: 1886
    upsert                                    429.1647s  (651 calls)
    get_by_id                                   5.6020s  (1221 calls)
    index_done_callback                         0.0800s  (12 calls)
    get_by_ids                                  0.0206s  (2 calls)

  [VECTOR_STORAGE]  total: 749.0035s  calls: 765
    upsert                                    620.2629s  (511 calls)
    delete                                    127.7973s  (244 calls)
    query                                       0.6355s  (5 calls)
    get_vectors_by_ids                          0.2081s  (2 calls)
    index_done_callback                         0.0996s  (3 calls)

  ────────────────────────────────────────────────────────────
  CATEGORY SUMMARY                                TIME  % OF TOTAL
  ────────────────────────────────────────────────────────────
  VECTOR_STORAGE                             749.0035s       44.9%
  GRAPH_STORAGE                              481.7155s       28.9%
  KV_STORAGE                                 434.8673s       26.1%
  DOC_STATUS_STORAGE                           2.5121s        0.2%
  ────────────────────────────────────────────────────────────
  GRAND TOTAL                               1668.0984s
================================================================================
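
To make the two reports easier to compare, the totals can be turned into mean latency per call. The sketch below only does arithmetic on figures copied from the reports above; the dictionary keys and the helper `mean_ms` are illustrative names, not LightRAG identifiers.

```python
# (total_seconds, calls) copied from the two profiling reports
default_backend = {
    "vector.upsert": (262.7471, 498),
    "vector.delete": (0.3905, 245),
    "kv.upsert": (0.3611, 543),
    "graph.upsert_node": (0.1101, 252),
    "graph.upsert_edge": (0.1228, 245),
}
opensearch_backend = {
    "vector.upsert": (620.2629, 511),
    "vector.delete": (127.7973, 244),
    "kv.upsert": (429.1647, 651),
    "graph.upsert_node": (252.1001, 266),
    "graph.upsert_edge": (223.6067, 244),
}


def mean_ms(total_seconds, calls):
    """Mean latency per call in milliseconds."""
    return 1000.0 * total_seconds / calls


rows = {}
for op, default_stats in default_backend.items():
    default_ms = mean_ms(*default_stats)
    opensearch_ms = mean_ms(*opensearch_backend[op])
    rows[op] = (default_ms, opensearch_ms, opensearch_ms / default_ms)
    print(f"{op:<20} {default_ms:10.3f} ms {opensearch_ms:10.3f} ms "
          f"{opensearch_ms / default_ms:10.1f}x")

# Ratio of the two GRAND TOTAL lines (summed storage time, not wall-clock time)
overall_ratio = 1668.0984 / 265.3635
print(f"summed storage time ratio: {overall_ratio:.2f}x")
```

Read this way, every OpenSearch write operation listed here averages roughly 0.5 to 1.2 seconds per call. In the default backend only the vector upsert is in that range, and the other writes average under 2 ms.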



Labels: enhancement
