Experiments in initialisation strategies for the K-means data clustering algorithm, as research for my MSc by Dissertation at the University of Essex. This is the software used to perform the experiments for our paper published in IEEE Access.
runner.py
: the main bootstrap file. Runs experiments using parallelisation where possible
initialisations/
: implementations of K-means initialisation algorithms
datasets/
: data importers, preprocessors/wranglers and resulting data
metrics/
: implementations and wrappers of algorithms used to measure clustering success
notebooks/
: Jupyter notebooks used to demonstrate clustering using the initialisations
tests/
: unit tests
kmeans.py
: implementation of K-means algorithm. It is not anticipated that this will be used for the experiments, though parts of it have been incorporated into implemented initialisations
cluster.py
: ...
dataset.py
: ...
$ python3 runner.py <algorithm> <datadir> <restarts>
The parameters which must be supplied to runner.py, as shown above, are:
algorithm
: the identifier for the initialisation algorithm to be run, each of which can be found in Table 4.12
datadir
: the relative path to the directory containing the data sets for the experimental run
restarts
: the number of restarts to be performed per data set, which will typically be 1 for deterministic initialisation algorithms and more for non-deterministic algorithms
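As an illustration of how the three positional parameters fit together, the following is a minimal sketch of a command-line parser built on the standard library's argparse. It is not taken from runner.py, whose actual argument handling may differ. The helper names `positive_int` and `parse_args`, the identifier `example_algorithm` and the use of `datasets/` as the data directory are all hypothetical and chosen only for the example.

```python
import argparse


def positive_int(text):
    """Convert a command-line string to an integer of at least 1."""
    value = int(text)  # a ValueError here is reported by argparse as a usage error
    if value < 1:
        raise argparse.ArgumentTypeError("restarts must be at least 1")
    return value


def parse_args(argv=None):
    """Parse <algorithm> <datadir> <restarts> (hypothetical helper, not from runner.py)."""
    parser = argparse.ArgumentParser(
        prog="runner.py",
        description="Run a K-means initialisation experiment.",
    )
    parser.add_argument("algorithm", help="identifier of the initialisation algorithm")
    parser.add_argument("datadir", help="relative path to the directory of data sets")
    parser.add_argument("restarts", type=positive_int,
                        help="number of restarts per data set")
    return parser.parse_args(argv)


# Example call with placeholder values; a real run would take them from the shell.
args = parse_args(["example_algorithm", "datasets/", "5"])
print(args.algorithm, args.datadir, args.restarts)
```

Validating restarts as a positive integer at parse time means that a mistyped value is rejected before any data set is loaded. A deterministic initialisation would simply be run with a value of 1.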