Skip to content

bakkyn/Spam-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spam detection

MultinomialNB implements the naive Bayes algorithm for multiunit distributed data and is one of the two classic naive Bayes variants used in text classification. Based on the classical Naive Bayes, Q (yi) specifies the probability of occurrence of variable i of P (xi, y). The relevant address coding stage was taken as a reference for the application of the formulas. (https://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes)
Code details are presented in the code as a comment line.

Basically, the steps taken are as follows;
-Purification of data from unwanted characters
-Removing words
-Spam and raw data probability calculation
-Train process and realization of test process (application of the specified formula)
-Getting the predicted values of TN, FN, TP, FP
-Extraction of results as Accuracy, Precision, Recall, F-Measure

Train data was used as presented in the given file, while test data was used in a single file for spam and raw data.

In the project, data capture and printing, data processing, ready-made classifier functions, packages and libraries other than standard data structures were not used.

The results are as follows;
Accuracy(TP+TN)/(TP+TN+FP+FN) : 0.8833333333333333

Precision(TP/(TP+FP)) : 0.822429906542056

Recall(TP/(TP+FN)) : 0.9777777777777777

F-Measure(2PR/(P+R)) : 0.8934010152284264