LanguageDetector

The program implements a language model that detects the language of a given text. First, the texts go through a preprocessing stage that removes noise and prepares them for the next steps. The learning method is Multinomial Naive Bayes. The workflow consists of training the model on a set of texts in different languages, predicting the language of validation data, and evaluating the model's performance.
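The workflow above can be illustrated with a minimal, standard-library sketch. It is not the code from `naive_bayes_model.py` or `text_handling.py`: the function names, the choice of character trigrams as features, the add-one smoothing, and the tiny training sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase the text and replace punctuation, digits and underscores with spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (assumed feature type)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class MultinomialNaiveBayes:
    """Hypothetical minimal classifier with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.feature_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()                 # label -> number of training texts
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            features = char_ngrams(preprocess(text))
            self.feature_counts[label].update(features)
            self.doc_counts[label] += 1
            self.vocabulary.update(features)
        return self

    def predict(self, text):
        features = char_ngrams(preprocess(text))
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for feature in features:
                score += math.log(
                    (counts[feature] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example only.
train_texts = [
    "the cat is on the table and the dog is in the garden",
    "il gatto è sul tavolo e il cane è nel giardino",
    "kot jest na stole a pies jest w ogrodzie",
    "o gato está na mesa e o cachorro está no jardim",
]
train_labels = ["english", "italian", "polish", "portuguese"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("the dog is on the table"))
print(model.predict("il cane è sul tavolo"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps an n-gram that was never seen for a language from driving that language's score to minus infinity.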

Here, we use Italian, English, Portuguese, and Polish data.

The model's performance is shown in a confusion matrix, which reveals how successful the model is and for which languages its prediction accuracy is highest or lowest.
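As a rough sketch of how such a matrix can be built and read, the snippet below counts (true, predicted) label pairs and derives the share of correctly classified texts per language. The function names are hypothetical and the labels are made up for illustration; they are not results from this project.

```python
from collections import Counter

LANGUAGES = ["english", "italian", "polish", "portuguese"]


def confusion_matrix(true_labels, predicted_labels, labels):
    """Rows are true languages, columns are predicted languages."""
    pair_counts = Counter(zip(true_labels, predicted_labels))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


def per_language_recall(matrix, labels):
    """Fraction of texts of each true language that were predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


# Invented labels, used only to show the shape of the output.
true_labels = ["english", "english", "italian", "italian",
               "polish", "portuguese", "portuguese"]
predicted = ["english", "english", "italian", "portuguese",
             "polish", "portuguese", "italian"]

matrix = confusion_matrix(true_labels, predicted, LANGUAGES)
for language, row in zip(LANGUAGES, matrix):
    print(f"{language:<12}{row}")
print(per_language_recall(matrix, LANGUAGES))
```

The diagonal holds the correct predictions, and off-diagonal cells show which pairs of languages the model confuses with each other.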

Files in the repository:

.gitignore
README.md
confusion_matrix.png
detect_language.py
english_data.txt
english_validation_data.txt
italian_data.txt
italian_validation_data.txt
language_data.py
naive_bayes_model.py
polish_data.txt
polish_validation_data.txt
portuguese_data.txt
portuguese_validation_data.txt
requirements.txt
text_handling.py
