In this project we generate sentences using n-gram language models.
We use the 20 Newsgroups dataset, a standard dataset for text-related tasks.
-
The project covers the following tasks:
- train unigram, bigram, and trigram models on all files of rec.sport.baseball and rec.motorcycles
- given a sentence, find its log probability under each of the above models
- given a sentence, find its perplexity under each of the above models
- given a sentence, find its log probability under each model with Good-Turing smoothing
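The tasks above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`ngrams`, `train`, `log_prob`, `perplexity`, `good_turing`) and the `<s>` / `</s>` padding tokens are assumptions made for the example, and sentences are assumed to be tokenized already. The Good-Turing part uses the basic adjusted-count formula c* = (c + 1) * N_{c+1} / N_c and simply keeps the raw count when N_{c+1} is zero, which a real implementation would likely handle more carefully.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a sentence padded with <s> and </s>."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def train(sentences, n):
    """Count n-grams and their (n-1)-gram contexts over tokenized sentences."""
    counts, contexts = Counter(), Counter()
    for tokens in sentences:
        for gram in ngrams(tokens, n):
            counts[gram] += 1
            contexts[gram[:-1]] += 1
    return counts, contexts

def log_prob(tokens, counts, contexts, n):
    """Unsmoothed (maximum likelihood) log probability of a sentence.

    Returns -inf as soon as an unseen n-gram is encountered.
    """
    total = 0.0
    for gram in ngrams(tokens, n):
        c = counts[gram]
        if c == 0:
            return float("-inf")
        total += math.log(c / contexts[gram[:-1]])
    return total

def perplexity(tokens, counts, contexts, n):
    """Perplexity = exp(-log P / number of predicted tokens)."""
    num_predictions = len(tokens) + 1  # every word plus the </s> marker
    return math.exp(-log_prob(tokens, counts, contexts, n) / num_predictions)

def good_turing(counts):
    """Return Good-Turing adjusted counts and the mass reserved for unseen n-grams."""
    freq_of_freq = Counter(counts.values())  # N_c: how many n-grams occur c times
    total = sum(counts.values())
    adjusted = {}
    for gram, c in counts.items():
        if freq_of_freq[c + 1] > 0:
            adjusted[gram] = (c + 1) * freq_of_freq[c + 1] / freq_of_freq[c]
        else:
            adjusted[gram] = float(c)  # assumption: fall back to the raw count
    unseen_mass = freq_of_freq[1] / total  # N_1 / N
    return adjusted, unseen_mass
```

Without smoothing, any sentence containing an unseen n-gram gets a log probability of negative infinity and an infinite perplexity, which is the motivation for the Good-Turing step.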
-
The code is documented inline in the Python notebook.