Skip to content

Implementation and survey of similarity search methods that rely on dimensionality reduction (e.g. LSH), D-dimensional vector clustering

Notifications You must be signed in to change notification settings

georgesittas/similarity-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Similarity Search

This project contains implementations for the following dimensionality reduction methods and their application to the domains of similarity search (i.e. approximate k-Nearest Neighbors) and clustering:

Clustering through reverse assignment using range queries
---------------------------------------------------------

1. Index all data points of interest in either data structure (LSH hash tables or a single Hypercube hash table)
2. Initialize all cluster centroids
3. At each iteration, for each centroid execute a ranged query around it and assign all neighboring points to it  
4. The initial radii can be computed as min(dist between centers)/2 and after each iteration they are doubled
5. Repeat the above steps until the algorithm converges to a fix point, or certain criteria are met
6. If there are any unassigned points, we can assign them using the classic K-Means algorithm
7. Output computed clusters

For clustering, we used the initialization scheme of K-Means++ and we implemented the classic Lloyd's algorithm, comparing it to the above methods based on the well-known Silhouette metric. See the project documentation, benchmark results and the related bibliography.

Contributors

George Sittas (Jo)
Dimitra Kousta (Demesta)

About

Implementation and survey of similarity search methods that rely on dimensionality reduction (e.g. LSH), D-dimensional vector clustering

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published