A neural network classifier for detecting plagiarism in student answers

khadija267/Plagiarism-Detection

Plagiarism-Detection

This repository contains code and associated files for deploying a plagiarism detector using AWS SageMaker.

It is part of the Udacity Machine Learning Engineering Nanodegree.

Project Overview

In this project, you will build a plagiarism detector that examines a text file and performs binary classification, labeling that file as either plagiarized or not depending on how similar it is to a provided source text. Detecting plagiarism is an active area of research; the task is non-trivial, and the differences between paraphrased answers and original work are often subtle.

This project will be broken down into three main notebooks:

Notebook 1: Data Exploration

Load in the corpus of plagiarism text data.
Explore the existing data features and the data distribution.
This first notebook is not required in your final project submission.

Notebook 2: Feature Engineering

Clean and pre-process the text data.
Define features for comparing the similarity of an answer text and a source text, and extract similarity features.
Select "good" features by analyzing the correlations between different features.
Create train/test .csv files that hold the relevant features and class labels for train/test data points.
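
The feature-engineering steps above can be illustrated with a minimal sketch. This README does not name the similarity features the notebook uses, so the two shown here (word n-gram containment and a normalized longest common subsequence) are assumptions chosen for illustration, as are the function names `containment`, `lcs_norm`, and `pearson`. The `pearson` helper shows one way feature correlations might be checked before selecting features.

```python
from collections import Counter
from math import sqrt


def ngrams(text, n):
    """Return a Counter of lowercase word n-grams in the text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer, source, n=1):
    """Fraction of the answer's n-grams that also occur in the source.

    Assumed feature for illustration; not confirmed by the project files.
    """
    a, s = ngrams(answer, n), ngrams(source, n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum(min(count, s[gram]) for gram, count in a.items())
    return shared / total


def lcs_norm(answer, source):
    """Longest common word subsequence length divided by answer length."""
    a, s = answer.lower().split(), source.lower().split()
    if not a:
        return 0.0
    prev = [0] * (len(s) + 1)  # previous row of the dynamic-programming table
    for wa in a:
        cur = [0]
        for j, ws in enumerate(s, start=1):
            cur.append(prev[j - 1] + 1 if wa == ws else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(a)


def pearson(xs, ys):
    """Pearson correlation of two equal-length feature columns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(vx * vy)
```

Two features whose correlation is close to 1 carry nearly the same information, so under this sketch only one of them would typically be kept in the train/test .csv files.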

Notebook 3: Train and Deploy Your Model in SageMaker

Upload your train/test feature data to S3.
Define a binary classification model and a training script.
Train your model and deploy it using SageMaker.
Evaluate your deployed classifier.
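
As a rough sketch of the evaluation step, the snippet below computes accuracy, precision, and recall from true and predicted labels. It assumes the predictions have already been retrieved from the deployed endpoint and rounded to 0 or 1; no SageMaker call is shown, and the helper name `binary_metrics` is hypothetical rather than part of the project.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for 0/1 labels.

    Label 1 is treated as "plagiarized" (an assumption for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Reporting precision and recall alongside accuracy helps when the classes are imbalanced, since a classifier that never predicts "plagiarized" can still score a high accuracy.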
