
An ML model that detects the language of a text using Bayesian classification.


KajaBraz/LanguageDetector


LanguageDetector

The program implements a language model that detects the language of a given text. First, the texts go through a preprocessing stage that removes noise and prepares them for the subsequent steps. Multinomial Naive Bayes is used as the learning method. The workflow consists of training the model on a set of texts in different languages, predicting the language of the validation data, and evaluating the model's performance.
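The README does not state the implementation language, the exact noise-removal steps, or which features the classifier uses. The following is a minimal, stdlib-only Python sketch of the described pipeline under those gaps. It assumes that preprocessing means lowercasing and stripping punctuation, digits, and underscores, that features are character trigrams, and that Laplace smoothing uses `alpha=1.0`. The names `preprocess`, `char_ngrams`, and `MultinomialNB` are hypothetical, and the class is a from-scratch illustration, not scikit-learn's `MultinomialNB` or this repository's code.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text: str) -> str:
    """Assumed noise removal: lowercase, drop punctuation, digits, and underscores."""
    text = text.lower()
    # \w is Unicode-aware in Python 3, so letters such as "è", "ą", "ã" are kept
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams, padded with spaces so text boundaries form n-grams too."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes over character n-gram counts with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0, n: int = 3):
        self.alpha = alpha  # assumed smoothing strength
        self.n = n          # assumed n-gram length

    def _features(self, text: str) -> Counter:
        return Counter(char_ngrams(preprocess(text), self.n))

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        for text, label in zip(texts, labels):
            self.counts[label].update(self._features(text))
        self.vocab = set().union(*self.counts.values())
        docs_per_label = Counter(labels)
        # log P(language), estimated from the share of training texts
        self.log_prior = {c: math.log(k / len(labels)) for c, k in docs_per_label.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, text: str) -> str:
        grams = self._features(text)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, prior in self.log_prior.items():
            denom = self.totals[c] + self.alpha * v
            score = prior
            for g, k in grams.items():
                if g in self.vocab:  # skip n-grams never seen during training
                    # add k * log P(g | language) with Laplace smoothing
                    score += k * math.log((self.counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    train = [("the cat is on the table", "en"), ("il gatto è sul tavolo", "it"),
             ("o gato está na mesa", "pt"), ("kot jest na stole", "pl")]
    model = MultinomialNB().fit([t for t, _ in train], [lang for _, lang in train])
    print(model.predict("Where is the cat?"))
```

Scoring in log space avoids numeric underflow when many small probabilities are multiplied. N-grams absent from the training vocabulary are skipped, much as a fitted vectorizer ignores features it has never seen.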

Here, we use Italian, English, Portuguese, and Polish data.

The model's performance is visualized as a confusion matrix, which shows how well the model performs overall and for which languages its prediction accuracy is highest or lowest.
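A confusion matrix counts, for each true language (row), how many texts were assigned to each predicted language (column), so the diagonal holds the correct predictions. The README does not say how its matrix is produced, so the following stdlib-only Python sketch assumes rows are true labels and columns are predictions, the same convention scikit-learn's `confusion_matrix` uses. It also assumes "per-language accuracy" means the diagonal cell divided by the row total (row-wise recall). The function names are hypothetical, and the labels in the usage block are toy values, not results from this repository.

```python
from collections import Counter


def confusion_matrix(y_true: list[str], y_pred: list[str], labels: list[str]) -> list[list[int]]:
    """Rows are true languages, columns are predicted languages."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def per_language_accuracy(matrix: list[list[int]], labels: list[str]) -> dict[str, float]:
    """Share of each language's texts classified correctly (diagonal / row total)."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


def overall_accuracy(matrix: list[list[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def format_matrix(matrix: list[list[int]], labels: list[str]) -> str:
    """Plain-text table with true languages down the side, predictions across the top."""
    header = "true/pred " + " ".join(f"{p:>5}" for p in labels)
    rows = [f"{t:<9} " + " ".join(f"{n:>5}" for n in row) for t, row in zip(labels, matrix)]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    labels = ["en", "it", "pt", "pl"]
    y_true = ["en", "en", "it", "pt", "pl", "pl"]  # toy validation labels
    y_pred = ["en", "pt", "it", "pt", "pl", "pl"]  # toy model predictions
    m = confusion_matrix(y_true, y_pred, labels)
    print(format_matrix(m, labels))
    print(per_language_accuracy(m, labels), overall_accuracy(m))
```

In this toy example, one English text is misread as Portuguese, so English has the lowest per-language accuracy (0.5) and the off-diagonal cell in the `en` row, `pt` column shows which language it was confused with.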