SEC-438: Solr 9 update. #17
base: main
Also, change the version of the `solr-ocrhighlighting` plugin accordingly, so that it does _not_ use the Solr 7/8 build.
This would generally get suppressed when storage is mounted here, anyway.
Walkthrough
Parameterizes the Docker base image and module paths, relocates OCR highlighting into a computed Solr module path, updates many Solr schema fieldTypes/dynamicFields and analyzers for Solr 9, replaces legacy caches with CaffeineCache, adjusts solrconfig options (streaming, versioning, circuit-breaker scaffold), bumps core properties, and adds data/.gitignore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Builder as Docker build
    participant ImageFS as Image filesystem
    participant Container as Solr runtime
    rect rgb(230,248,255)
        note right of Builder: Build-time (Dockerfile)
        Builder->>ImageFS: FROM solr:$SOLR_VERSION
        Builder->>ImageFS: create $SOLR_MODULE_DIR and $SOLR_HOCR_PLUGIN_PATH
        Builder->>ImageFS: ADD/download ocrhighlighting.jar -> $SOLR_HOCR_PLUGIN_PATH
        Builder->>ImageFS: set permissions (chmod 444) and ownership (0:0)
        Builder->>ImageFS: set USER solr
    end
    rect rgb(255,249,230)
        note right of Container: Runtime (container start)
        Container->>Container: start without -Dsolr.hocr.plugin.path sysprop
        Container->>Container: solrconfig_extra has no OCR <lib/>
        Container->>Container: discover modules under Solr module path provided in image
    end
```
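The runtime step assumes the OCR plugin is picked up from the module path baked into the image. Solr 9 keeps each module's jars under `<modules dir>/<name>/lib`, so a quick sanity check of a built image can simply look for those directories. This is a minimal sketch, assuming `/opt/solr/modules` as the module directory and `ocrhighlighting` as the module name; both are hypothetical, since the actual values are computed in the Dockerfile and are not shown here.

```python
from pathlib import Path

# Hypothetical module directory; the PR computes $SOLR_MODULE_DIR in the
# Dockerfile, and /opt/solr/modules is only assumed here for illustration.
SOLR_MODULE_DIR = Path("/opt/solr/modules")


def module_lib_dir(module_dir: Path, name: str) -> Path:
    """Return the directory holding a module's jars: <module_dir>/<name>/lib."""
    return module_dir / name / "lib"


def missing_module_dirs(module_dir: Path, names: list[str]) -> list[str]:
    """List the modules whose lib directory does not exist on this filesystem."""
    return [n for n in names if not module_lib_dir(module_dir, n).is_dir()]


if __name__ == "__main__":
    # "ocrhighlighting" is a hypothetical name for the relocated plugin module.
    print(module_lib_dir(SOLR_MODULE_DIR, "ocrhighlighting"))
    print(missing_module_dirs(SOLR_MODULE_DIR, ["extraction", "ocrhighlighting"]))
```

Because the module directory is a parameter, the same check can be pointed at whatever path the Dockerfile actually computes.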
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- `islandora8/data/index/write.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- `Dockerfile` (1 hunk)
- `islandora8/conf/schema.xml` (8 hunks)
- `islandora8/conf/schema_extra_types.xml` (8 hunks)
- `islandora8/conf/solrconfig.xml` (10 hunks)
- `islandora8/conf/solrconfig_extra.xml` (2 hunks)
- `islandora8/conf/solrconfig_query.xml` (1 hunk)
- `islandora8/conf/solrcore.properties` (1 hunk)
- `islandora8/data/.gitignore` (1 hunk)
- `solr.in.sh` (0 hunks)
💤 Files with no reviewable changes (1)
- solr.in.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-29T19:08:13.952Z
Learnt from: adam-vessey
Repo: discoverygarden/solr-image PR: 15
File: Dockerfile:13-16
Timestamp: 2025-08-29T19:08:13.952Z
Learning: Solr automatically maps environment variables that start with SOLR_ to system properties by lowercasing the name and replacing underscores with dots. For example, SOLR_HOCR_PLUGIN_PATH becomes solr.hocr.plugin.path as a system property that can be referenced in Solr configuration files.
Applied to files:
- `islandora8/conf/solrconfig_extra.xml`
- `islandora8/conf/solrconfig.xml`
- `islandora8/conf/solrcore.properties`
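The mapping rule stated in this learning (keep only `SOLR_`-prefixed names, lowercase them, turn underscores into dots) is easy to express directly. The sketch below only mirrors the rule as described in the learning; it is not Solr's implementation, and the helper names `env_to_sysprop` and `sysprops_from_env` are hypothetical.

```python
import os
from typing import Dict, Optional


def env_to_sysprop(name: str) -> Optional[str]:
    """Map a SOLR_-prefixed environment variable name to a system property
    name using the described rule: lowercase, then '_' -> '.'.
    Returns None for names without the SOLR_ prefix."""
    if not name.startswith("SOLR_"):
        return None
    return name.lower().replace("_", ".")


def sysprops_from_env(env: Dict[str, str]) -> Dict[str, str]:
    """Build the property map this rule would produce from an environment."""
    props: Dict[str, str] = {}
    for key, value in env.items():
        prop = env_to_sysprop(key)
        if prop is not None:
            props[prop] = value
    return props


if __name__ == "__main__":
    print(sysprops_from_env(dict(os.environ)))
```

Under this rule, `SOLR_HOCR_PLUGIN_PATH` becomes `solr.hocr.plugin.path`, matching the example in the learning.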
🔇 Additional comments (1)
islandora8/conf/schema_extra_types.xml (1)
25-25: Verify tokenizer changes don't break existing search behavior.

Multiple fieldTypes were updated to use `StandardTokenizerFactory` instead of `WhitespaceTokenizerFactory`. This is a functional change that affects how text is tokenized:

- `WhitespaceTokenizerFactory` splits on whitespace only.
- `StandardTokenizerFactory` uses Unicode word boundaries and handles punctuation.

This change improves tokenization quality but represents a behavioral shift. Affected fieldTypes: `text_edge`, `text_phonetic_und`, `text_und`, `text_spell_und`, `text_ngramstring`, `text_ngram`.

Verify that this change is intentional and won't negatively impact existing search behavior. If the index-time analyzers changed, existing documents will need to be reindexed so that index-time and query-time tokenization match.

Also applies to: 36-36, 105-105, 115-115, 195-195, 205-205, 221-221, 256-256, 267-267
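To make the behavioral shift concrete, here is a rough Python approximation of the two tokenizers. The regex `\w+` is only a stand-in for `StandardTokenizerFactory`, whose real behavior follows the Unicode UAX #29 word-break rules; the function names are hypothetical.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Rough stand-in for WhitespaceTokenizerFactory: split on whitespace only,
    so punctuation stays attached to the neighboring word."""
    return text.split()


def word_tokens(text: str) -> list[str]:
    """Very rough stand-in for StandardTokenizerFactory: keep runs of word
    characters and drop punctuation. The real tokenizer applies UAX #29
    word-break rules, which this regex only approximates."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    sample = "Solr 9, ocr-highlighting!"
    print(whitespace_tokens(sample))
    print(word_tokens(sample))
```

With the whitespace version, a query for `highlighting` would not match the indexed token `ocr-highlighting!`, while the word-boundary version produces a separate `highlighting` token. Documents indexed under the old analyzer keep the old tokens until they are reindexed.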
This doesn't seem to be documented, so we probably shouldn't depend on it.
```dockerfile
# extraction,langid,ltr,analysis-extras are required by search_api_solr, so
# let's set 'em by default.
ENV SOLR_MODULES=extraction,langid,ltr,analysis-extras
```
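If containers using the image ever override `SOLR_MODULES`, dropping one of these modules would break search_api_solr. A minimal sketch of a sanity check, assuming a hypothetical helper `missing_required` that treats the value as a comma-separated list; the required set is copied from the Dockerfile comment.

```python
import os

# Modules the Dockerfile comment says search_api_solr requires.
REQUIRED_MODULES = {"extraction", "langid", "ltr", "analysis-extras"}


def parse_modules(value: str) -> set[str]:
    """Split a comma-separated SOLR_MODULES value, ignoring blanks and spaces."""
    return {m.strip() for m in value.split(",") if m.strip()}


def missing_required(value: str) -> set[str]:
    """Return the required modules absent from a SOLR_MODULES value."""
    return REQUIRED_MODULES - parse_modules(value)


if __name__ == "__main__":
    missing = missing_required(os.environ.get("SOLR_MODULES", ""))
    if missing:
        print("SOLR_MODULES is missing:", ", ".join(sorted(missing)))
    else:
        print("SOLR_MODULES includes all modules required by search_api_solr")
```

Keeping the default in the image, as this PR does, means such a check only matters when a deployment overrides the variable.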
This could instead be set by the containers that use the image, but it's probably fine here, especially since the image is already coupled to this config set.
Supersedes #16
Summary by CodeRabbit
New Features
Improvements
Infrastructure
Behavioral