Support for Slovene language #257

@filips123

Is it possible to add official support for a Slovene stemming algorithm to Snowball?

Martin Porter started working on a Slovene stemmer in 2005 but never finished it because it had some problems. That stemmer could probably be used as a starting point.

I found some papers about Slovene stemming that might be useful:

I'm not familiar with how Snowball algorithms work, but here are my suggestions for some of the questions raised about the original algorithm:

Would not sloven (or slov), be a more desirable stem in this case?

I think "sloven" would be the most appropriate.

Another point. I notice a common -ah suffix, which you have not removed, as for example here [...] besedah besedah [...] Could this be added to the list of suffixes?

I don't know what else that would affect, but the -ah suffix should be removed in cases like "besedah".
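To make the -ah suggestion concrete, here is a minimal Python sketch of the idea. It is not Snowball code and not taken from Porter's draft stemmer; the function name `strip_ah` and the `min_stem_len` threshold are hypothetical, chosen only for illustration.

```python
def strip_ah(word: str, min_stem_len: int = 4) -> str:
    """Remove a trailing -ah when the remaining stem is long enough.

    min_stem_len is an assumed guard, not a value from any existing stemmer.
    """
    suffix = "ah"
    stem_len = len(word) - len(suffix)
    if word.endswith(suffix) and stem_len >= min_stem_len:
        return word[:stem_len]
    return word


print(strip_ah("besedah"))  # besed
print(strip_ah("strah"))    # strah (left alone: remaining stem too short)
```

The length guard is a crude stand-in for the "what else would this affect" question: a short word such as "strah", where -ah belongs to the root, is left unchanged, but a real stemmer would probably need a more careful condition.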
