
Releases: malaysia-ai/malaya

Version 4.0

16 Nov 04:27


  1. Added quantized versions of all Malaya models, reducing inference time by 2x and model size by 4x.
  2. Retrained constituency parsing, slightly improving accuracy by ~1-2%.
  3. Added a sentence-level and word-level vectorization interface for all classification models.
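The vectorization interface exposes embeddings per sentence or per word. As a concept-only illustration (not Malaya's actual API), a sentence-level vector can be sketched as a mean pool over word vectors:

```python
# Concept sketch only: Malaya's vectorization interface returns contextual
# embeddings from its transformer models; this toy version mean-pools
# fixed word vectors to show what a sentence-level vector is.

def mean_pool(word_vectors):
    """Average a list of equal-length word vectors into one sentence vector."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dim)]

# Made-up 3-dimensional vectors for the Malay sentence "saya suka makan".
vectors = {
    "saya": [0.1, 0.2, 0.3],
    "suka": [0.4, 0.0, 0.2],
    "makan": [0.1, 0.4, 0.1],
}
sentence_vector = mean_pool([vectors[w] for w in "saya suka makan".split()])
print([round(v, 2) for v in sentence_vector])  # → [0.2, 0.2, 0.2]
```

Word-level vectorization would instead return one vector per token; a classifier head typically consumes the pooled sentence vector.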

Version 3.8.1

16 Aug 16:07


  1. Released constituency parsing.

Version 3.8

05 Aug 18:13


  1. Improved spelling correction.
  2. Improved normalizer.
  3. Improved EN-MS translation; it now supports longer texts and US-style texts.

Version 3.7

10 Jul 05:02


  1. Added EN-to-MS and MS-to-EN translation modules.
  2. Added paraphrase module.
  3. Added keyword extraction module.
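To illustrate what a keyword extraction module does, here is a naive frequency-based stand-in; it is not Malaya's implementation, and the stopword list is made up for the example:

```python
# Toy keyword extraction: count non-stopword terms and return the most
# frequent ones. Real extractors use graph- or phrase-based scoring; this
# only illustrates the input/output shape of the task.
from collections import Counter

# Illustrative stopword list (a few Malay and English function words).
STOPWORDS = {"dan", "yang", "di", "ke", "and", "the", "a", "of", "is"}

def extract_keywords(text, top_k=3):
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

print(extract_keywords("Malaya adalah perpustakaan NLP dan Malaya menyokong bahasa Melayu", top_k=2))
```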

Version 3.4

27 Apr 13:53


release 3.4

Version 2.7

07 Aug 17:36


  1. BERT-Bahasa interface available.
  2. Added BERT-Multilanguage, BERT-Base and BERT-small for emotion analysis.
  3. Added BERT-Multilanguage, BERT-Base and BERT-small for Named Entity Recognition.
  4. Added BERT-Multilanguage, BERT-Base and BERT-small for Part-of-Speech tagging.
  5. Added BERT-Multilanguage and BERT-Base for relevancy analysis.
  6. Added BERT-Multilanguage, BERT-Base and BERT-small for sentiment analysis.
  7. Added an encoder interface for text similarity; skip-thought, BERT, or XLNET can be used as the encoder model.
  8. Added tree plot visualization for text similarity.
  9. Added BERT-Multilanguage, BERT-Base and BERT-small for subjectivity analysis.
  10. Added an encoder interface for text summarization; skip-thought, BERT, or XLNET can be used as the encoder model.
  11. Added BERT / XLNET interface for topic modeling.
  12. Added BERT-Multilanguage, BERT-Base and BERT-small for toxicity analysis.
  13. Removed siamese models for text similarity.
  14. Removed fast-text-char models, replaced by BERT models.
  15. Malaya no longer supports a training interface.
  16. XLNET-Bahasa interface available.
  17. Sequence models are no longer improved by Malaya; we have moved on to attention-based models.

Version 2.6

25 Jun 03:56


  1. Added deep siamese network, https://malaya.readthedocs.io/en/latest/Similarity.html#deep-siamese-network.
  2. Added BERT deep siamese network, https://malaya.readthedocs.io/en/latest/Similarity.html#bert-model
  3. Added Doc2Vec to calculate semantic similarity, https://malaya.readthedocs.io/en/latest/Similarity.html#calculate-similarity-using-doc2vec
  4. All extractive summarization now uses the TextRank algorithm for scoring.
  5. Added Doc2Vec for extractive summarization, https://malaya.readthedocs.io/en/latest/Summarization.html#load-doc2vec-summarization
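TextRank scoring ranks sentences by running PageRank over a sentence-similarity graph. A minimal sketch under simplified assumptions (word-overlap similarity and plain power iteration; not Malaya's actual implementation):

```python
# Minimal TextRank-style sentence scoring. Similarity is Jaccard word
# overlap; scores come from damped power iteration, as in PageRank.

def overlap_similarity(a, b):
    """Jaccard overlap between two token lists."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def textrank_scores(sentences, damping=0.85, iterations=50):
    tokens = [s.lower().split() for s in sentences]
    n = len(sentences)
    # Symmetric similarity graph with no self-loops.
    sim = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * sim[j][i] / row_sums[j]
                for j in range(n) if row_sums[j] > 0 and sim[j][i] > 0
            )
            for i in range(n)
        ]
    return scores

sentences = [
    "Malaya menyokong ringkasan teks",
    "Malaya menyokong terjemahan teks",
    "cuaca hari ini panas",
]
scores = textrank_scores(sentences)
# The two overlapping sentences score higher than the unrelated one.
```

An extractive summarizer then keeps the top-scoring sentences in their original order.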

Version 2.4

01 Jun 05:40


  1. Added relevancy analysis, to determine whether an article or piece of text is relevant or tends toward fake news. https://malaya.readthedocs.io/en/latest/Relevancy.html
  2. Added a visualization dashboard for emotion analysis, relevancy analysis, sentiment analysis, subjectivity analysis and toxicity analysis. Very easy to use: call the predict_words function and the dashboard will pop up.
  3. Added neutral class for relevancy analysis, sentiment analysis and subjectivity analysis.
  4. All deep learning classification models now use Malaya preprocessing.

Version 1.9

27 Feb 14:34


  1. Fixed some English loading bugs.
  2. Added clustering visualization, https://malaya.readthedocs.io/en/latest/Cluster.html
  3. Added text augmentation, https://malaya.readthedocs.io/en/latest/Generator.html
  4. The normalizer and spelling correction are now able to detect English words.

Version 1.7

15 Feb 12:37


  1. Added text similarity and released partial topic-related features, https://malaya.readthedocs.io/en/latest/Similarity.html
  2. Added word-mover distance interface, https://malaya.readthedocs.io/en/latest/Mover.html
  3. Added pretrained fast-text trained on Wikipedia, https://malaya.readthedocs.io/en/latest/Fasttext.html
  4. Improved sentiment analysis: trained on more than 800k sentences and now more sensitive to social-media texts.
  5. Removed n-grams for all fast-text models to reduce the curse of dimensionality.
  6. Removed the sparse limit for all fast-text-char models to improve n-gram sensitivity.