Skip to content

Java implementation of a sophisticated Kneser-Ney trigram model using customized open address hash maps with linear probing for efficient data storage and retrieval.

Notifications You must be signed in to change notification settings

jakemdaly/Advanced-Language-Modeling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

AdvancedLanguageModeling

Java implementation of a sophisticated Kneser-Ney trigram model using customized open address hash maps with linear probing for efficient data storage and retrieval.

git clone https://github.com/jakemdaly/AdvancedLanguageModeling.git

java/src/

  • Contains the KN trigram implementation, open address hash map implementations for various N-gram orders, and the database used for storing counts.

utils/edu/berkeley/nlp

  • Contains helper functions from the Berkeley NLP libraries

Abstract

In this project we will build a Kneser-Ney trigram language model to be used and tested in a machine translation system. Kneser-Ney smoothing is a sophisticated form of regularization for N-gram modeling which uses absolute discounting and context fertility to redistribute mass from higher order N-gram terms to lower order ones. Kneser-Ney-Trigram.pdf is a full exploration of the model and implementation methods, and is organized as follows: the introduction gives a high level overview of language modeling with N-grams; the next section provides the motivation and intuition behind Kneser-Ney smoothing, and then formalizes the theory with a recursive equation; after this we unveil lower level details about choices that were made for efficient implementation of the model; the last section shows statistical and computation performance results.

Keywords : language modeling, Kneser-Ney smoothing, context fertility, discounting, smoothing, machine translation

About

Java implementation of a sophisticated Kneser-Ney trigram model using customized open address hash maps with linear probing for efficient data storage and retrieval.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages