ghamerly/fast-kmeans
Code to speed up k-means clustering. Originally at BaylorCS/baylorml.

===============================
Fast K-means Clustering Toolkit
===============================

----------------------
Version 0.1 (Sat May 17 17:41:11 CDT 2014) - Initial release.

----------------------
WHAT:

This software is a testbed for comparing variants of Lloyd's k-means
clustering algorithm. It includes implementations of several algorithms
that accelerate the algorithm by avoiding unnecessary distance
calculations.

----------------------
WHO:

Greg Hamerly (hamerly@cs.baylor.edu, primary contact) and Jonathan Drake
(drakej@hp.com).

----------------------
HOW TO BUILD THE SOFTWARE:

Type "make" (and hope for the best).

----------------------
HOW TO RUN THE SOFTWARE:

The driver is designed to take commands from standard input, usually a
file that has been redirected as input:

    ./kmeans < commands.txt

You can read the source to find all the possible commands, but here is a
summary:

  - threads T -- use T threads for clustering
  - maxiterations I -- use at most I iterations; the default (or a
    negative value) means an unlimited number
  - dataset D -- use the given path name to a file as the dataset for
    clustering. The dataset file should have a first line with the
    number of points n and the dimension d; the next n*d tokens are
    taken as the n vectors to cluster (see the example file below)
  - initialize k {kpp|random} -- use the given method (k-means++ or a
    random sample of the points) to initialize k centers
  - lloyd, hamerly, annulus, elkan, compare, sort, heap, adaptive --
    perform k-means clustering with the given algorithm (requires first
    having initialized the centers). The adaptive algorithm is Drake's
    algorithm with a heuristic for choosing an initial B
  - drake B -- use Drake's algorithm with B lower bounds
  - kernel [gaussian T | linear | polynomial P] -- use kernelized
    k-means with the given kernel
  - elkan_kernel [gaussian T | linear | polynomial P] -- use kernelized
    k-means with the given kernel, and Elkan's accelerations
  - center -- give the previously-loaded dataset a mean of 0
  - quit -- quit the program

Note that when a set of centers is initialized, that same set of centers
is used from then on (until a new initialization occurs), so running a
clustering algorithm multiple times will use the same initialization
each time.

Here is an example of a simple set of commands:

    dataset smallDataset.txt
    initialize 10 kpp
    annulus
    hamerly
    adaptive
    heap
    elkan
    sort
    compare
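For illustration, a dataset file such as the smallDataset.txt referenced
above might look like the following (the values here are made up; the
required layout is just n and d on the first line, followed by n*d
whitespace-separated coordinates):

    4 2
    0.0  0.1
    0.2  0.0
    5.1  4.9
    5.0  5.2

This hypothetical file describes n = 4 points in d = 2 dimensions.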
----------------------
CAVEATS:

  - This software has been developed and tested on Linux. Other
    platforms may not work. Please let us know if you have
    difficulties, and, if possible, send fixes for the code.

  - This software uses a non-standard pthreads function called
    pthread_barrier_wait(), which is implemented on Linux but not on
    OSX. Therefore, multithreading doesn't currently work on OSX. To
    turn it off, comment out the lines in the Makefile that say:

        CPPFLAGS += -DUSE_THREADS
        LDFLAGS += -lpthread

----------------------
REFERENCES:

Phillips, Steven J. "Acceleration of k-means and related clustering
algorithms." In Algorithm Engineering and Experiments, pp. 166-177.
Springer Berlin Heidelberg, 2002.

Elkan, Charles. "Using the triangle inequality to accelerate k-means."
In ICML, vol. 3, pp. 147-153. 2003.

Hamerly, Greg. "Making k-means Even Faster." In SDM, pp. 130-140. 2010.

Drake, Jonathan, and Greg Hamerly. "Accelerated k-means with adaptive
distance bounds." In 5th NIPS Workshop on Optimization for Machine
Learning. 2012.

Drake, Jonathan. "Faster k-means clustering." MS thesis, 2013.

Hamerly, Greg, and Jonathan Drake. "Accelerating Lloyd's algorithm for
k-means clustering." To appear in Partitional Clustering Algorithms,
Springer, 2014.
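----------------------
APPENDIX: THE IDEA BEHIND THE ACCELERATIONS

The algorithms above differ mainly in how they maintain distance bounds
so that most point-to-center distance computations can be skipped. The
sketch below is a simplified illustration with made-up names, not code
from this repository; it shows the per-point bound test at the heart of
Hamerly's algorithm, where each point keeps an upper bound on the
distance to its assigned center and a lower bound on the distance to its
second-closest center.

    // Illustrative sketch only (assumed names, not this repository's API).
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    using Point = std::vector<double>;

    static double dist(const Point& a, const Point& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            s += d * d;
        }
        return std::sqrt(s);
    }

    // One assignment pass. upper[x] bounds dist(x, assigned center) from
    // above; lower[x] bounds dist(x, second-closest center) from below.
    // Returns how many points actually needed all k distance computations.
    std::size_t assignWithBounds(const std::vector<Point>& points,
                                 const std::vector<Point>& centers,
                                 std::vector<std::size_t>& assignment,
                                 std::vector<double>& upper,
                                 std::vector<double>& lower) {
        std::size_t recomputed = 0;
        for (std::size_t x = 0; x < points.size(); ++x) {
            // Bound test 1: if u(x) <= l(x), the assigned center is provably
            // still closest, so skip every distance calculation for this point.
            if (upper[x] <= lower[x]) continue;

            // Tighten u(x) with a single exact distance, then retest.
            const std::size_t orig = assignment[x];
            upper[x] = dist(points[x], centers[orig]);
            if (upper[x] <= lower[x]) continue;

            // The bounds could not prune: fall back to all k distances,
            // reassign the point, and reset both bounds to exact values.
            ++recomputed;
            double best = upper[x];
            double second = std::numeric_limits<double>::max();
            std::size_t bestIdx = orig;
            for (std::size_t j = 0; j < centers.size(); ++j) {
                if (j == orig) continue;  // already have this distance in best
                const double d = dist(points[x], centers[j]);
                if (d < best) {
                    second = best;
                    best = d;
                    bestIdx = j;
                } else if (d < second) {
                    second = d;
                }
            }
            assignment[x] = bestIdx;
            upper[x] = best;
            lower[x] = second;
        }
        return recomputed;
    }

In the full algorithms, after the centers move, each upper bound is
inflated by how far the point's assigned center moved and each lower
bound is deflated by the largest movement of any center, which keeps the
bounds valid without touching the data. Elkan's algorithm maintains one
lower bound per center, and Drake's algorithm keeps B of them, trading
memory and update cost for tighter pruning.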