Parallel-PDF-Clustering-using-Spark

The number of clusters is not predefined; it is determined during execution. Processing and clustering run in a distributed environment using Spark for efficiency.
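The README does not say how the cluster count is chosen at run time. The sketch below shows one common approach, not necessarily the one used in `PDF_clustering.ipynb`: fit K-means for several candidate values of k on TF-IDF features and keep the k with the best silhouette score. The function names, the `text` column, the candidate range, and the feature count are all assumptions made for illustration; the PDF text is assumed to have been extracted already.

```python
from typing import Dict


def pick_best_k(silhouette_by_k: Dict[int, float]) -> int:
    """Return the k with the highest silhouette score (ties go to the smaller k)."""
    if not silhouette_by_k:
        raise ValueError("no candidate k values were scored")
    return min(silhouette_by_k, key=lambda k: (-silhouette_by_k[k], k))


def cluster_documents(docs_df, k_candidates=range(2, 11), num_features=1 << 14):
    """Cluster a Spark DataFrame that has a string column named `text`.

    Hypothetical helper: returns (best_k, predictions_df, silhouette_by_k).
    """
    # Imported lazily so pick_best_k can be used without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.evaluation import ClusteringEvaluator
    from pyspark.ml.feature import HashingTF, IDF, StopWordsRemover, Tokenizer

    # Build TF-IDF features; Spark distributes each stage across executors.
    featurizer = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="tokens"),
        StopWordsRemover(inputCol="tokens", outputCol="filtered"),
        HashingTF(inputCol="filtered", outputCol="tf", numFeatures=num_features),
        IDF(inputCol="tf", outputCol="features"),
    ])
    features_df = featurizer.fit(docs_df).transform(docs_df).cache()

    # ClusteringEvaluator computes the silhouette score by default.
    evaluator = ClusteringEvaluator(featuresCol="features", predictionCol="prediction")

    models, scores = {}, {}
    for k in k_candidates:
        models[k] = KMeans(k=k, seed=42, featuresCol="features").fit(features_df)
        scores[k] = evaluator.evaluate(models[k].transform(features_df))

    best_k = pick_best_k(scores)
    return best_k, models[best_k].transform(features_df), scores
```

With a DataFrame `docs_df` holding one row per PDF, `best_k, clustered, scores = cluster_documents(docs_df)` would return the selected k and a DataFrame with a `prediction` column. Sweeping k costs one model fit per candidate, so a narrower range is cheaper on large datasets.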

The dataset is available at: https://drive.google.com/drive/folders/10Ysiq2I1TQy_319uSWvzLIu_xzxSXQM5?usp=sharing

Different datasets can be used for experimentation.

Repository files: PDF_clustering.ipynb, README.md

shreyansh-kothari/Parallel-PDF-Clustering-using-Spark

Folders and files

Latest commit

History

Repository files navigation

Parallel-PDF-Clustering-using-Spark

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages