- Purvi Harniya, 1814023
- Neelay Jagani, 1814024
- Esha Gupta, 1814025
Data is growing exponentially, at an unprecedented rate: roughly 2.7 quintillion bytes are produced every day.
Fake news is information that misleads its readers. It spreads like wildfire these days because people share it without verifying it. It is frequently created to promote or impose specific views, often in service of a political agenda.
As a result, it is vital to recognise fake news.
Fake news has become more prevalent in recent years. Given how quickly the internet and social media change, distinguishing facts from opinions about commercial or political upheavals has become more difficult than ever.
Fake information is purposely or unintentionally spread throughout the internet. The massive dissemination of fake news has left an indelible mark on people and culture.
We use several NLP preprocessing techniques (tokenization, stop-word removal, lemmatization and stemming) together with nine machine learning classification algorithms: Logistic Regression, Passive Aggressive Classifier (PAC), AdaBoost (ADA), Naive Bayes, SVM, Random Forest (RF), XGBoost (XGB), Decision Tree (DT) and RNN. With these we build models that differentiate between fake news and real news, and we analyze the performance of each classifier to choose the best one for our dataset.
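The preprocessing steps named above can be illustrated with a minimal pure-Python sketch. It is not the notebook's code: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and lemmatization is omitted because it needs a dictionary-backed library.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline would use
# a fuller list, for example the one shipped with NLTK or scikit-learn.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "that"}

# Suffixes for a deliberately crude stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


tokens = preprocess("The senators are voting on the leaked reports")
print(tokens)
```

The output of `preprocess` is a list of normalized tokens, which is the form a count or tf-idf vectorizer would consume.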
The ISOT Fake news dataset was downloaded from https://www.uvic.ca/ecs/ece/isot/datasets/index.php.
We compare the classification algorithms to determine the best one for our dataset.
The results are as follows:
No. | Model | Accuracy | Precision | F1 Score | Recall |
---|---|---|---|---|---|
1 | Logistic Regression | 0.987973 | 0.986926 | 0.987387 | 0.987848 |
2 | ADA | 0.988241 | 0.987661 | 0.987661 | 0.987661 |
3 | PAC | 0.995724 | 0.994958 | 0.995516 | 0.996074 |
4 | XGB | 0.990468 | 0.994342 | 0.989954 | 0.985605 |
5 | RF | 0.984677 | 0.986835 | 0.983874 | 0.980931 |
6 | Naive Bayes | 0.952339 | 0.951087 | 0.949930 | 0.948775 |
7 | SVM | 0.994833 | 0.993287 | 0.994586 | 0.995887 |
8 | DT | 0.985835 | 0.986684 | 0.985114 | 0.983548 |
9 | RNN | 0.992428 | 0.994877 | 0.992104 | 0.989347 |
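For reference, the four metrics in the table can be derived from the counts of true and false positives and negatives. The sketch below is a plain-Python illustration on made-up labels; treating label `1` as the positive class is an assumption, since the text does not say whether "fake" or "real" was the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only (not taken from the ISOT dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check for each row of the table.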
We downloaded the ISOT dataset from https://www.uvic.ca/ecs/ece/isot/datasets/index.php, uploaded it to our drive, then loaded and preprocessed it using NLP techniques such as tokenization, stop-word removal, lemmatization and stemming. We vectorized the text documents using a count vectorizer and a tf-idf vectorizer. After preprocessing, we split the data into training and testing sets, built nine models using nine different classification algorithms, and used their predictions to calculate the performance metrics. The details of each are given below:
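The vectorize, split, train and evaluate flow described above might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions rather than the notebook's code: the toy corpus and labels are invented, only two of the nine classifiers are shown, and `LinearSVC`, the split ratio and the random seed are choices made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy corpus standing in for the ISOT articles (1 = fake, 0 = real).
texts = [
    "shocking secret cure doctors hate revealed",
    "aliens endorse candidate in shocking twist",
    "miracle pill melts fat overnight say insiders",
    "celebrity secretly replaced by clone claims blog",
    "shocking proof the moon is hollow",
    "secret memo reveals miracle election plot",
    "parliament passes budget after long debate",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "court schedules hearing on trade dispute",
    "ministry publishes annual health statistics",
    "senate committee reviews infrastructure bill",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Two of the nine classifiers, as an example; the others plug in the same way.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

scores = {}
for name, clf in models.items():
    # The pipeline fits tf-idf on the training texts only, then the classifier.
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    pipeline.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

print(scores)
```

Wrapping the vectorizer and classifier in one pipeline keeps the test texts out of the vocabulary and idf statistics, so the reported score reflects unseen data.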
From the ROC curve and the bar plot that compares the performance of all the models, we conclude that SVM (accuracy 99.48%, precision 99.33%, F1 score 99.46%, recall 99.59%) is the best algorithm on our ISOT dataset for the task of fake news detection and classification. Note that PAC scores marginally higher than SVM on all four metrics in the results table.
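As a reminder of what an ROC comparison measures, the sketch below builds ROC points by sweeping the decision threshold over a classifier's scores and integrates them with the trapezoidal rule. The labels and scores are made up for illustration, and the helper names `roc_points` and `auc` are hypothetical; the notebook may well use a library routine instead.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by sweeping the decision threshold.

    Assumes binary labels (1 = positive, 0 = negative) with both classes present.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Each distinct score, from highest to lowest, acts as one threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

A curve that hugs the top-left corner gives an area close to 1, while a classifier that guesses at random gives an area near 0.5.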
The main notebook file is 'Group6_Code_ML_IA2_Implementation.ipynb' - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_Code_ML_IA2_Implementation.ipynb
The dataset is available in the dataset folder - https://github.com/Purviharniya/Fake-news-detection/tree/master/dataset
The notebook file's pdf is available at - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_CodePDF_ML_IA2.pdf
The ppt is available at - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_PPT_ML_IA2.pptx
The summary document is available at - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_Document_FakeNewsDetection_ML_IA2_EXP8.docx
The screen cast is available at - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_Screencast_ML_IA2.mp4
The research paper for IA1 - https://github.com/Purviharniya/Fake-news-detection/blob/master/Group6_Reasearch%20Paper_ML_IA.pdf