This repository was archived by the owner on Oct 6, 2025. It is now read-only.

Add Slovak (sk) language #41

@neurlang


I would like to suggest adding the dataset.txt of 24865 Slovak words, all of which have been hand-reviewed. Which license would be preferable for the gruut project? I am the author and can release it under any license you prefer.

https://github.com/neurlang/toipa/tree/master/sk2ipa

Fixes that would be needed:

  1. remove the ' character
  2. replace θ with c
  3. add spaces between phonemes
  4. remove words which map to the A / F placeholder
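
The four fixes above could be sketched roughly as follows. This is only a minimal illustration, not the actual conversion script: the function names, the sample strings, and the reading of "placeholder" as "any pronunciation containing an uppercase A or F" are assumptions, as is the rule that tie bars, combining marks, and length marks stay attached to the preceding symbol when phonemes are separated.

```python
import unicodedata

# Assumption: a placeholder entry is any pronunciation containing "A" or "F".
PLACEHOLDER_CHARS = {"A", "F"}
# Tie bars join the symbol before them to the symbol after them (affricates).
TIE_BARS = {"\u0361", "\u035c"}
# Length marks are not combining characters, so they are listed explicitly.
LENGTH_MARKS = {"\u02d0", "\u02d1"}


def split_phonemes(ipa):
    """Split an IPA string into phonemes, one base symbol per phoneme."""
    phonemes = []
    join_next = False
    for ch in ipa:
        attach = join_next or unicodedata.combining(ch) or ch in LENGTH_MARKS
        if phonemes and attach:
            phonemes[-1] += ch
        else:
            phonemes.append(ch)
        join_next = ch in TIE_BARS
    return phonemes


def clean_entry(word, ipa):
    """Apply fixes 1-4; return (word, phonemes) or None if the entry is dropped."""
    if any(ch in PLACEHOLDER_CHARS for ch in ipa):
        return None                      # fix 4: placeholder pronunciation
    ipa = ipa.replace("'", "")           # fix 1: remove the ' character
    ipa = ipa.replace("\u03b8", "c")     # fix 2: replace θ with c
    return word, " ".join(split_phonemes(ipa))  # fix 3: space-separate


# Made-up sample strings, for illustration only.
print(clean_entry("w1", "'\u03b8esta"))
print(clean_entry("w2", "A"))
```

Splitting one symbol at a time is the simplest choice; any phoneme written with several base symbols and no tie bar would need an explicit list of multi-character phonemes instead.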

The cleaned entries would then be loaded into the word_phonemes table of lexicon.db.
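
As a rough sketch of that loading step, the snippet below inserts (word, phonemes) pairs into a word_phonemes table using Python's sqlite3 module. The column layout (word, pron_order, phonemes) is an assumption about the lexicon.db schema and should be checked against the real database; the function name and the sample rows are hypothetical.

```python
import sqlite3


def load_word_phonemes(conn, entries):
    """Insert (word, phonemes) pairs; returns the number of rows written."""
    # Assumed schema: verify against the actual lexicon.db before use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_phonemes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "word TEXT, pron_order INTEGER, phonemes TEXT)"
    )
    next_order = {}  # per-word counter so alternative pronunciations keep their order
    rows = []
    for word, phonemes in entries:
        order = next_order.get(word, 0)
        rows.append((word, order, phonemes))
        next_order[word] = order + 1
    conn.executemany(
        "INSERT INTO word_phonemes (word, pron_order, phonemes) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
count = load_word_phonemes(conn, [("w1", "c e s t a"), ("w1", "c e s t"), ("w2", "m a")])
print(count)
```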

What is the g2p_alignments table for?

I can also generate a larger dictionary (up to 300k words) using the neural network, but its entries could contain mistakes.
