hOCR Files not being Indexed #15

@HyphenHook

Setup

  1. Got the latest isle-dc stack up
  2. Installed and enabled the module
  3. Ran drush migrate:import islandora_hocr_media_uses
  4. Added the islandora_hocr_field:content property to be indexed in Solr via the Search API and set its Type to Fulltext ("islandora_hocr")
  5. Set up a Repository item with a .tif file and a .hocr file in media
  • The .hocr file has the hOCR media use on it
  6. Installed the Solr OCR Highlighting Plugin per the instructions in the documentation (my Solr configs are linked below in case I messed them up)
  • The schema for the installation is here
  • I've also included the plugin's directive in solrconfig.xml and added the needed lines of config in solrconfig_extra.xml here
  7. Set the correct path for the SOLR_HOCR_PLUGIN_PATH environment variable
  8. Restarted the Solr container and indexed the nodes in Drupal

Problem

I cannot get the hOCR indexed into Solr even after all of the setup steps above. I traced the code and found that the processor is doing its job: it reads the content out of the file and adds the value to be sent to Solr. However, when I run a query in the Solr web interface, the field does not appear in the results. If I change the islandora_hocr_field:content property type to Fulltext instead of Fulltext ("islandora_hocr"), the field does show up in the Solr query, containing the raw file content. The OCR highlighting doesn't return anything either.
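One way to check which fields actually reached the index, independently of the query screen, would be Solr's Luke request handler (/admin/luke). The following is only a minimal sketch: the base URL, the core name my_core, the "hocr" name fragment, and the helper names luke_url, fetch_luke and find_fields are all assumptions for illustration, not values taken from this setup, and the sketch assumes the Luke JSON response carries a top-level "fields" object keyed by field name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(base_url, core):
    """Build the URL of Solr's Luke handler for one core.

    numTerms=0 skips the per-field top-terms listing, which keeps the
    response small. base_url and core are placeholders to adapt.
    """
    query = urlencode({"numTerms": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/admin/luke?{query}"


def fetch_luke(base_url, core):
    """Fetch and decode the Luke response (needs a reachable Solr)."""
    with urlopen(luke_url(base_url, core)) as response:
        return json.load(response)


def find_fields(luke_response, needle):
    """Return {field name: field type} for index fields whose name contains needle.

    Assumes the response has a top-level "fields" object keyed by field name.
    """
    fields = luke_response.get("fields", {})
    return {
        name: info.get("type", "")
        for name, info in fields.items()
        if needle in name
    }


# Example call (hypothetical host and core name):
# print(find_fields(fetch_luke("http://localhost:8983", "my_core"), "hocr"))
```

If a field turns up here but not in query results, one thing worth ruling out is that it is indexed without being stored, since Solr only returns stored (or docValues-backed) fields in results.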

Am I missing something in the setup steps that is preventing the module from working? Some guidance would be appreciated! Thanks!
