Fix search_repository: Docker URL, ecm:fulltext field, configurable highlights #21
Open
bdelbosc wants to merge 3 commits into
Pull request overview
This PR fixes Elasticsearch passthrough behavior for search_repository / search_audit in Dockerized deployments, aligns repository fulltext querying/highlighting with Nuxeo’s OpenSearch mapping, and adds configurable highlight/source-field controls to improve content extraction.
Changes:
- Use `NUXEO_URL` env var (fallback to `nuxeo.client.host`) for Elasticsearch passthrough base URL, and increase the repository ES probe timeout.
- Switch fulltext query target from `ecm:fulltext` to `all_field`, and update highlighting to target `ecm:binarytext` with `require_field_match: false`.
- Add `source_fields`, `highlight_fragment_size`, and `highlight_number_of_fragments` parameters through the tool → passthrough → parser stack; update docs and one unit test accordingly.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| `USAGE.md` | Documents new `search_repository` parameters for highlight sizing and `_source` field inclusion. |
| `tests/test_es_query_builder.py` | Updates expected fulltext query fields to `all_field`. |
| `src/nuxeo_mcp/tools.py` | Adds new `search_repository` parameters; uses `NUXEO_URL` for passthrough; increases repository ES probe timeout. |
| `src/nuxeo_mcp/nl_parser.py` | Updates highlight configuration (field + configurable fragment sizing). |
| `src/nuxeo_mcp/es_query_builder.py` | Changes default fulltext search field list to `["all_field"]`. |
| `src/nuxeo_mcp/es_passthrough.py` | Threads highlight params; adds `_source` passthrough and modifies result formatting. |
| `nuxeo_mcp_config.md` | Adds Docker note about setting `NUXEO_URL` to a reachable service hostname. |
| `Dockerfile` | Uses explicit octal permissions (`--chmod=0755`) for the entrypoint copy. |
Comments suppressed due to low confidence (1)
src/nuxeo_mcp/es_passthrough.py:96
`source_fields` is documented/used as “extra fields to include”, but it is passed straight into `_source.includes`. When a caller supplies only an extra field (e.g. `['ecm:binarytext']`), Elasticsearch will omit the standard metadata fields (`dc:title`, `ecm:path`, etc.), and the formatted results will end up with empty strings for those base properties. Consider always including the required base fields plus any extras when building `_source.includes` (or treating `source_fields` as additive rather than replacing).
# Parse natural language to Elasticsearch DSL
es_request = self.nl_parser.parse_to_elasticsearch(
query,
index="repository",
include_sort=True,
include_pagination=True,
include_highlight=True,
highlight_fragment_size=highlight_fragment_size,
highlight_number_of_fragments=highlight_number_of_fragments,
apply_acl=True,
user_principals=[principal] + groups,
source_includes=source_fields,
)
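The additive behaviour suggested in this comment could look roughly like the sketch below. It is not code from the PR: `BASE_SOURCE_FIELDS` and `build_source_includes` are hypothetical names, and the base field list is an assumption about what the result formatter needs.

```python
from typing import Iterable, List, Optional

# Hypothetical base field list (assumption): the fields the result
# formatter reads. The real list in es_passthrough.py may differ.
BASE_SOURCE_FIELDS = ["ecm:uuid", "ecm:path", "dc:title", "ecm:primaryType"]


def build_source_includes(extra_fields: Optional[Iterable[str]] = None) -> List[str]:
    """Return the base fields plus any caller-supplied extras.

    Extras are appended in order and duplicates are skipped, so the
    base metadata is always requested from Elasticsearch.
    """
    merged = list(BASE_SOURCE_FIELDS)
    for field in extra_fields or []:
        if field not in merged:
            merged.append(field)
    return merged
```

With this shape, a caller passing only `['ecm:binarytext']` would still get `dc:title` and `ecm:path` back, because the extras extend the include list instead of replacing it.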
guirenard
approved these changes
May 27, 2026
ataillefer
reviewed
May 28, 2026
…g inside Docker
Use os.environ.get("NUXEO_URL", nuxeo.client.host) so a Docker-aware URL
can be injected via environment variable. Also increases the ES connectivity
probe timeout from 2s to 10s.
Query field changed from ecm:fulltext to all_field (the copy_to aggregate used by Nuxeo OpenSearch mapping). Highlight field changed from ecm:fulltext to ecm:binarytext with require_field_match=false, since all_field has no store setting and cannot be highlighted directly.
…ch_repository
Adds `highlight_fragment_size` and `highlight_number_of_fragments` parameters to the `search_repository` tool, `ElasticsearchPassthrough.search_repository()` and `NaturalLanguageParser.parse_to_elasticsearch()`. Also adds `source_fields` passthrough so callers can request extra `_source` fields when needed.
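As a rough illustration of what the last two commits describe, the request body sent to Elasticsearch/OpenSearch might be shaped like the sketch below. `build_fulltext_request` is a hypothetical helper, and the `multi_match` query type is an assumption; the commits only state that the query targets `all_field` and that highlighting targets `ecm:binarytext` with `require_field_match` disabled.

```python
from typing import Any, Dict


def build_fulltext_request(
    text: str,
    fragment_size: int = 150,
    number_of_fragments: int = 3,
) -> Dict[str, Any]:
    """Build a search body: query all_field, highlight ecm:binarytext."""
    return {
        # Assumption: multi_match is used here; the builder in the PR
        # may use a different query type over the same field list.
        "query": {"multi_match": {"query": text, "fields": ["all_field"]}},
        "highlight": {
            # The query matches on all_field, not on the highlighted
            # field, so field matching has to be relaxed.
            "require_field_match": False,
            "fields": {
                "ecm:binarytext": {
                    "fragment_size": fragment_size,
                    # 0 asks for the whole field instead of fragments.
                    "number_of_fragments": number_of_fragments,
                }
            },
        },
    }
```

Calling `build_fulltext_request("invoice", fragment_size=2000, number_of_fragments=1)` would request a single, larger fragment per hit, which is the kind of tuning the new parameters are meant to allow.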
ataillefer
approved these changes
May 29, 2026
Summary

Three fixes and one improvement to `search_repository` and `search_audit`.

SUPINT-2477: Fix search_repository/search_audit failing inside Docker

`nuxeo.client.host` stores the host as seen at connection time (`http://localhost:8080`), which does not resolve to the Nuxeo service from inside a Docker container.

Fix: use `os.environ.get("NUXEO_URL", nuxeo.client.host)` so the Docker service name can be injected via env var, with the client host as fallback. Also increases the ES probe timeout from 2s to 10s.

SUPINT-2478: Fix fulltext search targeting non-existent ecm:fulltext field

`ElasticsearchQueryBuilder` was querying `ecm:fulltext` / `ecm:fulltext.title`, and `NaturalLanguageParser` was requesting highlights on `ecm:fulltext`. These fields do not exist in the Nuxeo OpenSearch mapping; the catch-all field is `all_field` (a `copy_to` aggregate).

Fix:
- Query field: `ecm:fulltext` → `all_field`
- Highlight field: `ecm:fulltext` → `ecm:binarytext` with `require_field_match: false` (`all_field` has no `store` setting, so it cannot be highlighted directly)

SUPINT-2479: Add configurable highlight fragment size and count

Fixed-size highlights (150 chars) were too small for meaningful content extraction from large PDFs. Added two optional parameters throughout the call chain (`tools.py` → `es_passthrough.py` → `nl_parser.py`):
- `highlight_fragment_size` (default: 150, max: ~9,000,000)
- `highlight_number_of_fragments` (default: 3; set to 0 for entire field)

Also adds `source_fields` passthrough for cases where raw `_source` content is needed.

Docs

- `nuxeo_mcp_config.md`: note on `NUXEO_URL` Docker service name requirement
- `USAGE.md`: examples for new `search_repository` parameters
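For the `NUXEO_URL` note, the fallback logic can be sketched in isolation as follows. `resolve_nuxeo_base_url` is a hypothetical helper written for illustration, and `http://nuxeo:8080` is an assumed Docker service URL; the PR itself only states that `os.environ.get("NUXEO_URL", nuxeo.client.host)` is used.

```python
import os
from typing import Mapping, Optional


def resolve_nuxeo_base_url(
    client_host: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer NUXEO_URL from the environment, else the client host.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return env.get("NUXEO_URL", client_host)


# Inside a container, localhost points at the container itself, so the
# service name (assumed here to be "nuxeo") is injected through the env.
docker_env = {"NUXEO_URL": "http://nuxeo:8080"}
base_url = resolve_nuxeo_base_url("http://localhost:8080", docker_env)
```

Outside Docker, where `NUXEO_URL` is unset, the same call falls back to the host recorded by the client.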