Setting Stemmer for unlisted languages #139

Open
@deeplearning101

Hello,
I'm interested in using opensemanticsearch to index documents in Norwegian.
I see that Norwegian is not listed in the Document Language section of the setup page (http://[yourserver]/search-apps/setup/).

However, opensemanticsearch integrates versions of Solr and Tika that support Norwegian and many other languages that are not among the languages opensemanticsearch officially supports.

Is it possible to manually edit the configuration files to enable at least stemming (or other grammar-related features) for languages that are supported by Solr but not listed in the opensemanticsearch settings?
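
As an illustration of what such a manual configuration could look like on the Solr side, here is a minimal sketch that builds a Solr Schema API `add-field-type` command using the Snowball stemmer with `language="Norwegian"`. The field type name `text_no_sketch` and the helper function are hypothetical, and the sketch assumes the Solr core behind opensemanticsearch uses a managed schema that accepts Schema API requests; it says nothing about how opensemanticsearch itself maps documents to fields.

```python
import json


def norwegian_field_type(name="text_no_sketch"):
    """Build a Solr Schema API command that adds a Norwegian text field type.

    The field type name is a hypothetical placeholder, not a name that
    opensemanticsearch is known to use.
    """
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                # Split text into word tokens.
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [
                    # Normalize case before stemming.
                    {"class": "solr.LowerCaseFilterFactory"},
                    # Snowball stemmer configured for Norwegian.
                    {
                        "class": "solr.SnowballPorterFilterFactory",
                        "language": "Norwegian",
                    },
                ],
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON body; sending it to Solr is left out on purpose.
    print(json.dumps(norwegian_field_type(), indent=2))
```

The printed JSON would be sent as the body of a POST request to the core's `/schema` endpoint. The host, port, and core name depend on the installation and are not assumed here. A field using the new type would still have to be defined, and existing documents reindexed, before searches benefit from stemming.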

I only need to search PDFs and have NO need for the other language-dependent features (e.g. named entity recognition, OCR).

I think my request may be of general interest, since it would open opensemanticsearch to users who work with languages not listed on the official webpage.

Thank you in advance!
