Plagiarism Detector in C++

Uses simple token-based similarity measures to detect surface-level plagiarism in text files.

Tests implemented

1. Token frequency matching

The token counts of the target file are compared with the token counts of each text file in the database folder to find tokens that occur with similar frequency in both.

2. N-Gram matching

Runs of consecutive tokens (n-grams) are matched directly to detect similarity in token patterns and in the neighbourhood of each token. The value of N used for n-gram generation is varied, and the results for the different values of N are combined into a single score by a weighted average.

3. Cosine matching

The cosine of the angle between the token vectors of the target file and a reference file is computed to estimate how similar the two files are.
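As a rough sketch, assuming each file is represented by a vector of token counts (the exact vector representation used in this project is not specified here), the cosine is the dot product of the two vectors divided by the product of their lengths. The names TokenVector and cosineSimilarity are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>

// Sparse token vector: token -> number of occurrences (an assumption).
using TokenVector = std::unordered_map<std::string, std::size_t>;

// cos(theta) = (A . B) / (|A| * |B|), computed over sparse count vectors.
// Returns 0 when either vector is empty, to avoid dividing by zero.
double cosineSimilarity(const TokenVector& a, const TokenVector& b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (const auto& entry : a) {
        const double countA = static_cast<double>(entry.second);
        normA += countA * countA;
        auto it = b.find(entry.first);
        if (it != b.end()) {
            dot += countA * static_cast<double>(it->second);
        }
    }
    for (const auto& entry : b) {
        const double countB = static_cast<double>(entry.second);
        normB += countB * countB;
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Because counts are never negative, the result lies between 0 (no tokens in common) and 1 (identical token proportions). Unlike a raw count comparison, the cosine does not depend on the lengths of the two files, only on the relative frequencies of their tokens.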

Getting started

1. Place all the reference text files in the database directory.

2. Place all the text files required to be checked in the target directory.

3. (Optional) Edit the stopwords.txt text file to add any words that should be ignored in the analysis.

4. Set the database and target_folder variables to the actual locations of the database and target directories.

5. Compile run.cpp as C++11 (or later), with the command g++ run.cpp -std=c++11.

6. Run the generated executable with the command ./a.out (on Linux).


AhmedRaja1/Plagiarism-Detector
