@bitextor

Bitextor Team

Translation memories generator

Pinned

  1. bitextor

    Bitextor generates translation memories from multilingual websites

    Python, 293 stars, 43 forks

  2. bicleaner

    Bicleaner is a parallel corpus classifier/cleaner that detects noisy sentence pairs in a parallel corpus.

    Python, 157 stars, 22 forks

  3. bifixer

    Tool to fix bitexts and tag near-duplicates for removal

    Python, 30 stars, 3 forks

  4. biroamer

    Utility that helps you ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.

    Python, 10 stars, 2 forks

  5. pdf-extract

    PDF parser and converter to HTML

    Java, 85 stars, 13 forks

  6. warc2text

    Extracts plain text, language identification and other metadata from WARC records

    C++, 22 stars, 5 forks

Repositories

Showing 10 of 29 repositories
