Problem: A Prolog project that evaluates the correctness of English sentences using a bigram model.
Approach: The project builds a bigram language model in Prolog from the small DA_Corpus.text corpus.
Steps taken (bigram_model.pl):
- The DA_Corpus.text corpus is normalized using Unix commands.
- Prolog-readable unigram.pl and bigram.pl databases are created from the normalized corpus.
- In the final step, bigram_model.pl is implemented. It computes the probability of a word sequence of any length via the predicate calc_prob/2. The predicate works in log space and applies Laplace smoothing on the fly to compute the probability of the given sentence.
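The last step can be sketched as follows. This is a minimal illustration, not the project's actual bigram_model.pl. The fact layouts unigram(Word, Count) and bigram(First, Second, Count), the helper predicates, and the toy counts are all assumptions made for the example. Only the name calc_prob/2 comes from the description above, and its argument convention (a list of word atoms and a log probability) is also assumed. Sentence-boundary markers are left out for brevity. It is written for SWI-Prolog.

```prolog
:- use_module(library(aggregate)).

% Toy counts invented for illustration (NOT taken from DA_Corpus.text).
% Assumed fact layout: unigram(Word, Count), bigram(First, Second, Count).
unigram(the, 3).
unigram(book, 2).
unigram(fell, 1).

bigram(the, book, 2).
bigram(book, fell, 1).

% vocab_size(-V): number of distinct word types in the unigram table.
vocab_size(V) :-
    aggregate_all(count, unigram(_, _), V).

% Count lookups that fall back to 0 for unseen words or word pairs.
unigram_count(W, C) :-
    ( unigram(W, C0) -> C = C0 ; C = 0 ).
bigram_count(W1, W2, C) :-
    ( bigram(W1, W2, C0) -> C = C0 ; C = 0 ).

% bigram_log_prob(+W1, +W2, -LP): Laplace (add-one) smoothed estimate,
%   LP = log(count(W1,W2) + 1) - log(count(W1) + V)
bigram_log_prob(W1, W2, LP) :-
    bigram_count(W1, W2, C12),
    unigram_count(W1, C1),
    vocab_size(V),
    LP is log(C12 + 1) - log(C1 + V).

% calc_prob(+Words, -LogProb): sums the smoothed log probabilities of
% every adjacent word pair, so no tiny probabilities are multiplied.
calc_prob(Words, LogProb) :-
    calc_prob_(Words, 0.0, LogProb).

calc_prob_([W1, W2 | Rest], Acc0, LogProb) :-
    !,
    bigram_log_prob(W1, W2, LP),
    Acc is Acc0 + LP,
    calc_prob_([W2 | Rest], Acc, LogProb).
calc_prob_(_, LogProb, LogProb).   % fewer than two words left: done
```

With these toy counts, the query `calc_prob([the, book, fell], A), calc_prob([book, the, fell], B), A > B.` succeeds: the scrambled order only hits unseen bigrams, so it gets the lower log probability. Summing logs instead of multiplying probabilities avoids underflow on long sentences, and add-one smoothing keeps unseen bigrams from driving the score to negative infinity.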
Sample outputs: As shown in the output below, a sentence like "the book fell" scores better (is assigned a higher probability) than "i fell on the book".
Similarly, the sentence "the book that he wanted fell on my feet" scores better than the scrambled "book the that he wanted fell on my feet".