Similarity Search

This project contains implementations for the following dimensionality reduction methods and their application to the domains of similarity search (i.e. approximate k-Nearest Neighbors) and clustering:

Clustering through reverse assignment using range queries
---------------------------------------------------------

1. Index all data points of interest in either data structure (LSH hash tables or a single Hypercube hash table)
2. Initialize all cluster centroids
3. At each iteration, for each centroid execute a ranged query around it and assign all neighboring points to it  
4. The initial radii can be computed as min(dist between centers)/2 and after each iteration they are doubled
5. Repeat the above steps until the algorithm converges to a fix point, or certain criteria are met
6. If there are any unassigned points, we can assign them using the classic K-Means algorithm
7. Output computed clusters

For clustering, we used the initialization scheme of K-Means++ and we implemented the classic Lloyd's algorithm, comparing it to the above methods based on the well-known Silhouette metric. See the project documentation, benchmark results and the related bibliography.

Contributors

• George Sittas (Jo)
• Dimitra Kousta (Demesta)

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
bin		bin
config		config
input		input
modules		modules
src		src
Makefile		Makefile
README.md		README.md
details.pdf		details.pdf
report.pdf		report.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Similarity Search

Contributors

About

Releases

Packages

Languages

georgesittas/similarity-search

Folders and files

Latest commit

History

Repository files navigation

Similarity Search

Contributors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages