A Java program to check for plagiarism among multiple documents using shingling, MinHashing, and locality-sensitive hashing (LSH).
Updated Jul 31, 2020 · Java
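Shingling turns each document into a set of short substrings so that two documents can be compared with Jaccard similarity, the ratio of shared shingles to total distinct shingles. The following is a minimal sketch in Java. It assumes character-level k-shingles, simple lowercase and whitespace normalization, and a hypothetical shingle length of k = 5. The class name `Shingling` and its methods are illustrative and are not taken from the repository.

```java
import java.util.HashSet;
import java.util.Set;

public class Shingling {
    // Build the set of character k-shingles (all substrings of length k) of a document.
    public static Set<String> shingles(String text, int k) {
        // Assumed normalization: lowercase and collapse whitespace runs to a single space.
        String norm = text.toLowerCase().replaceAll("\\s+", " ").trim();
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= norm.length(); i++) {
            result.add(norm.substring(i, i + k));
        }
        return result;
    }

    // Exact Jaccard similarity: size of the intersection divided by size of the union.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // treat two empty documents as identical
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        String d1 = "the quick brown fox jumps over the lazy dog";
        String d2 = "the quick brown fox jumped over the lazy dog";
        int k = 5; // hypothetical shingle length
        double sim = jaccard(shingles(d1, k), shingles(d2, k));
        System.out.printf("Jaccard similarity: %.3f%n", sim);
    }
}
```

Computing exact Jaccard similarity for every pair of documents is quadratic in the number of documents, which is the cost that MinHashing and LSH are meant to reduce.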
Explored Jaccard distance, MinHashing, and LSH for finding similar users in a movie-rating dataset. Tasks include dataset preprocessing, exact Jaccard similarity computation, MinHash signature generation, and LSH implementation. Results and observations are documented in the code, output files, and a report.
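MinHash compresses each user's set of rated movies into a short signature. The fraction of positions at which two signatures agree estimates the Jaccard similarity of the two sets. LSH banding then splits each signature into bands and reports two users as a candidate pair if they match on every row of at least one band. The following is a minimal Java sketch under these assumptions:

* Hash functions have the form h(x) = (a*x + b) mod p, with p = 2^31 - 1.
* Each band's raw values serve directly as the bucket key.
* Signatures use 20 bands of 5 rows.

The class name `MinHashLsh`, these parameter choices, and the toy ratings are hypothetical and do not describe the project's actual setup.

```java
import java.util.*;

public class MinHashLsh {
    static final long P = 2_147_483_647L; // prime 2^31 - 1 for h(x) = (a*x + b) mod P

    final int numHashes;
    final long[] a, b;

    public MinHashLsh(int numHashes, long seed) {
        this.numHashes = numHashes;
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) (P - 1)); // a in [1, P-1]
            b[i] = rnd.nextInt((int) P);           // b in [0, P-1]
        }
    }

    // MinHash signature: for each hash function, the minimum hash value over the set.
    public int[] signature(Set<Integer> items) {
        int[] sig = new int[numHashes];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (int x : items) {
            for (int i = 0; i < numHashes; i++) {
                int h = (int) Math.floorMod(a[i] * x + b[i], P); // result < P, so it fits in an int
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    // Estimated Jaccard similarity: fraction of signature positions that agree.
    public static double estimate(int[] s1, int[] s2) {
        int eq = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) eq++;
        return (double) eq / s1.length;
    }

    // LSH banding: users whose signatures agree on every row of some band become candidates.
    // Assumes bands * rows equals the signature length.
    public static Set<List<Integer>> candidatePairs(Map<Integer, int[]> sigs, int bands, int rows) {
        Set<List<Integer>> pairs = new HashSet<>();
        for (int band = 0; band < bands; band++) {
            Map<List<Integer>, List<Integer>> buckets = new HashMap<>();
            for (Map.Entry<Integer, int[]> e : sigs.entrySet()) {
                List<Integer> key = new ArrayList<>(rows); // the band's values serve as the bucket key
                for (int r = 0; r < rows; r++) key.add(e.getValue()[band * rows + r]);
                buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getKey());
            }
            for (List<Integer> users : buckets.values()) {
                for (int i = 0; i < users.size(); i++) {
                    for (int j = i + 1; j < users.size(); j++) {
                        int u = Math.min(users.get(i), users.get(j));
                        int v = Math.max(users.get(i), users.get(j));
                        pairs.add(List.of(u, v)); // store each pair in (smaller, larger) order
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: user ID -> set of movie IDs the user rated.
        Map<Integer, Set<Integer>> ratings = new HashMap<>();
        ratings.put(1, Set.of(10, 20, 30, 40));
        ratings.put(2, Set.of(10, 20, 30, 50));
        ratings.put(3, Set.of(70, 80, 90));
        MinHashLsh mh = new MinHashLsh(100, 42L); // 100 hash functions = 20 bands x 5 rows
        Map<Integer, int[]> sigs = new HashMap<>();
        ratings.forEach((user, movies) -> sigs.put(user, mh.signature(movies)));
        for (List<Integer> p : candidatePairs(sigs, 20, 5)) {
            double est = estimate(sigs.get(p.get(0)), sigs.get(p.get(1)));
            System.out.printf("candidate %s: estimated Jaccard = %.2f%n", p, est);
        }
    }
}
```

With b bands of r rows, two sets with Jaccard similarity s become a candidate pair with probability 1 - (1 - s^r)^b. Tuning b and r trades false positives against false negatives before the candidates are checked with the exact Jaccard similarity.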
Finding Similar Items: Textually Similar Documents