Support for Slovene language

Is it possible to add official support for Slovene stemming algorithm to Snowball?

Martin Porter [started working](http://snowball.tartarus.org/archives/snowball-discuss/0725.html) on a Slovene stemmer in 2005, but [never finished it](http://snowball.tartarus.org/archives/snowball-discuss/0930.html) because of some unresolved problems. That draft could probably serve as a starting point.

I found some papers about Slovene stemming that might be useful:

* Stemming of Slovenian library science texts: https://www.researchgate.net/publication/50392133_Stemming_of_Slovenian_library_science_texts
* The effectiveness of stemming for natural-language access to Slovene textual data: https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-4571%28199206%2943%3A5%3C384%3A%3AAID-ASI6%3E3.0.CO%3B2-L
* Processing of documents and queries in a Slovene language free text retrieval system: https://academic.oup.com/dsh/article-abstract/5/2/182/943275 (this one is actually referenced in the Snowball introduction)

I'm not familiar with how Snowball algorithms work, but here are my suggestions for some of the questions raised about [the original algorithm](http://snowball.tartarus.org/archives/snowball-discuss/0725.html):

> Would not sloven (or slov), be a more desirable stem in this case?

I think "sloven" would be the most appropriate.

> Another point. I notice a common -ah suffix, which you have not removed, as for example here [...] besedah besedah [...] Could this be added to the list of suffixes?

I don't know what else this would affect, but the -ah suffix should be removed in cases like "besedah".
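To illustrate what adding -ah to the suffix list would do, here is a minimal Python sketch of longest-match suffix stripping. It is not Porter's 2005 draft or an official Snowball stemmer: the suffix list `SUFFIXES`, the function `strip_suffix` and the fixed minimum stem length `MIN_STEM` are all hypothetical, chosen only for this example. Real Snowball stemmers usually restrict removal to regions such as R1 instead of using a fixed length.

```python
# Hypothetical suffix list for illustration only; this is NOT the suffix set
# from Porter's 2005 draft or from any official Snowball stemmer.
SUFFIXES = ("ah", "ih", "om", "a", "e", "i", "o", "u")

# Hypothetical minimum stem length, so that very short words are left alone.
MIN_STEM = 3


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix, keeping at least MIN_STEM characters."""
    # Try longer suffixes first so "-ah" wins over a shorter overlapping match.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    # No suffix matched (or the stem would be too short): keep the word as-is.
    return word


if __name__ == "__main__":
    for w in ("besedah", "beseda"):
        print(w, "->", strip_suffix(w))
```

With -ah in the list, both "besedah" and "beseda" reduce to "besed", so the two forms conflate; without it, "besedah" would be left unstemmed.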

Support for Slovene language #257