Semantic search by khaledk2 · Pull Request #109 · ome/omero_search_engine

khaledk2 · 2025-05-29T10:51:51Z

The semantic search uses machine learning to understand the meaning behind words rather than just matching keywords.
It can find related concepts, handle natural language queries, and retrieve results based on context rather than exact words.

I’ve built a basic prototype for a feature that allows users to perform semantic search locally. It has been tested with IDR data.

I have tested it with various queries, and the results look promising. For instance, I used the following query:

Provide me with images related to cancer

The top results are:

Number of images: 90642, Pathology is carcinoid, malignant, nos
Number of images: 111381, Pathology is carcinoma, embryonal, nos
Number of images: 37620, Pathology is carcinoma, nos
Number of images: 15984, Pathology is adenocarcinoma, metastatic, nos
Number of images: 1029, Pathology is glioma, malignant, nos
Number of images: 915, Pathology is neoplasm, malignant, nos
Number of images: 166130, Pathology is glioma, malignant, high grade
Number of images: 3, CIS - Tumors is 18798
Number of images: 75955, Pathology is glioma, malignant, low grade
Number of images: 1464774, Pathology is adenocarcinoma, nos

As you can see, it returns the metadata related to the query. This could be extended to build a query automatically that returns all the detailed results, as the exact-match queries already do.
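A minimal sketch of that extension, assuming the semantic hits are available as (key, value, image count) tuples like the ones listed above. The `hits_to_or_query` helper and the `query_details` / `or_filters` layout are hypothetical, chosen for illustration; they are not the search engine's confirmed query schema.

```python
from typing import Any

# A semantic-search hit: (attribute key, attribute value, number of images).
Hit = tuple[str, str, int]


def hits_to_or_query(hits: list[Hit], resource: str = "image") -> dict[str, Any]:
    """Turn semantic-search hits into a hypothetical exact-match OR query.

    NOTE: the "query_details"/"or_filters" structure below is an assumed
    shape for illustration, not the search engine's documented schema.
    """
    or_filters = [
        {"name": key, "value": value, "operator": "equals", "resource": resource}
        for key, value, _count in hits
    ]
    return {
        "resource": resource,
        "query_details": {"and_filters": [], "or_filters": or_filters},
    }


# Example using two of the hits shown above.
top_hits = [
    ("Pathology", "carcinoma, nos", 37620),
    ("Pathology", "adenocarcinoma, nos", 1464774),
]
print(hits_to_or_query(top_hits))
```

OR-ing the hits mirrors how the semantic results are related alternatives rather than constraints that must all hold at once.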

Under the hood, it uses the vector search capabilities of Elasticsearch, which requires embedding the data with an NLP model.
This prototype uses the all-MiniLM-L6-v2 model from Sentence Transformers. It is small and fast enough to run locally while still producing good sentence embeddings.
The model embeds the search engine's cached data (the search engine metadata); the search engine then sends the embeddings to Elasticsearch, which stores them in dedicated fields predefined in the index template. This has been implemented for the data source's cached data. The query text is encoded with the same model before the query is sent to Elasticsearch, so the query and the indexed metadata are compared in the same vector space.
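The pipeline above can be sketched as follows, assuming Elasticsearch 8.x `dense_vector` fields and its top-level `knn` search option. The field name `key_value_vector` and the `_source` field names are hypothetical; the PR's actual template fields are not shown here. all-MiniLM-L6-v2 produces 384-dimensional embeddings, and the pure-Python `rank_by_cosine` shows the exact ranking that approximate kNN search approximates.

```python
import math
from typing import Any

# all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings.
EMBEDDING_DIMS = 384

# Hypothetical index-template fragment; "key_value_vector" is an assumed name.
VECTOR_FIELD_MAPPING: dict[str, Any] = {
    "properties": {
        "key_value_vector": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        }
    }
}


def knn_search_body(query_vector: list[float], k: int = 10,
                    num_candidates: int = 100) -> dict[str, Any]:
    """Build an Elasticsearch 8.x approximate kNN search request body.

    The query vector would come from the same model used at index time, e.g.
    SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(text).tolist()
    """
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} dims, got {len(query_vector)}")
    return {
        "knn": {
            "field": "key_value_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,  # must be >= k
        },
        "_source": ["key", "value", "number_of_images"],  # hypothetical fields
    }


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vector: list[float],
                   docs: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Brute-force exact ranking: return the ids of the k most similar docs."""
    scored = sorted(docs, key=lambda d: cosine_similarity(query_vector, d[1]),
                    reverse=True)
    return [doc_id for doc_id, _vec in scored[:k]]
```

Using the same model for documents and queries is what makes the cosine comparison meaningful; vectors from different models live in unrelated spaces.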

The user can access this feature through the following endpoint:

/api/v1/resources/semanticsearch/?query_text=Provide me with images related to cancer

I will deploy it to a server, allowing the team to collaborate and provide feedback.


khaledk2 · 2025-06-23T08:47:36Z

The semantic search is deployed on idr-testing.

It supports questions such as:

Provide me with images related to liver cancer

It uses a new endpoint, /semanticsearch; the user provides the query text in the query_text parameter.
It can be tested using the Swagger documentation:
https://idr-testing.openmicroscopy.org/searchengine/apidocs/#/semantic%20search/get_searchengine__api_v1_resources_semanticsearch_

or directly through the API:
https://idr-testing.openmicroscopy.org/searchengine//api/v1/resources/semanticsearch/?query_text=Provide%20me%20with%20images%20related%20to%20liver%20cancer
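A minimal client sketch using only the Python standard library. The base URL is assumed from the Swagger path above, and the helper names `semantic_search_url` and `semantic_search` are hypothetical; the sketch also assumes the endpoint returns JSON, since the response format is not described here.

```python
import json
from typing import Any
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of the semantic search endpoint on idr-testing.
SEMANTIC_SEARCH_URL = (
    "https://idr-testing.openmicroscopy.org/searchengine"
    "/api/v1/resources/semanticsearch/"
)


def semantic_search_url(query_text: str, base_url: str = SEMANTIC_SEARCH_URL) -> str:
    """Percent-encode the natural-language query into the query_text parameter."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) uses "+".
    return f"{base_url}?{urlencode({'query_text': query_text}, quote_via=quote)}"


def semantic_search(query_text: str, timeout: float = 30.0) -> Any:
    """Call the endpoint and decode the body, assuming it is JSON."""
    with urlopen(semantic_search_url(query_text), timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request):
# results = semantic_search("Provide me with images related to liver cancer")
```

Encoding the query with `urlencode` avoids the raw spaces that appear when the URL is typed by hand.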


khaledk2 added 9 commits May 26, 2025 21:21

semantic search prototype

f05a432

Add NOT_INDEX_VECTOR configuration

7f56563

change index vectore configration name

7036303

upgrade elasticsearch version

df8938a

select key_value_buckets_info_template according to index_vector configuration

9c0eb18

update requirments file

17846af

fix import statment

1e0dfac

Add url to the results

0f17b54

fix merge conflict

9f6c7ac

joshmoore changed the title ~~Sematic search~~ Semantic search Jun 23, 2025

khaledk2 and others added 2 commits March 18, 2026 11:51

Merge branch 'main' into sematic_search

64c4216

[pre-commit.ci] auto fixes from pre-commit.com hooks

d46abc6

for more information, see https://pre-commit.ci


khaledk2 wants to merge 11 commits into ome:main from khaledk2:sematic_search
