By Beyza Bozdemir¹, Sébastien Canard², Orhan Ermis¹, Helen Möllering³, Melek Önen¹, and Thomas Schneider³. (¹EURECOM; ²Applied Cryptography Group, Orange Labs; ³ENCRYPTO, TU Darmstadt)
In 16th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2021). Paper available here.
This work is based on the ABY framework for efficient mixed-protocol secure two-party computation.
For simplicity, we will only describe the short version of the build process here, including the differences and additional steps required compared to the ABY build procedure. Please also refer to the detailed build instructions of ABY.
This code is provided as an experimental implementation for testing purposes and should not be used in a productive environment. We cannot guarantee security and correctness.
This code requires all ABY requirements and can be set up and executed like any other ABY example.
-r [Role: 0/1, required]
-d address of dataset file
-e input dataset size
More (optional) parameters can be inspected by simply running ./ppDBSCAN_distance
.
-r [Role: 0/1, required]
-d address of dataset file
-e input dataset size
-i index of input record that is currently checked if it creates a cluster (including the cluster expansion by neighbors).
More (optional) parameters can be inspected by simply running ./ppDBSCAN_grouping
.
For an easy test of our ppDBSCAN implementation, we included two scripts (1 and 2), each for one of the two computing parties, that run ppDBSCAN on the Lsun dataset [1] scaled to integer values.
As visible in the scripts, ppDBSCAN is split into distance calculation and grouping. Furthermore, the grouping is interrupted after checking each element of the input dataset to reset the ABY circuit and free memory occupied from ABY. This is realized by writing the secret shares at each of the two computing parties into a local file at the respective party. Thereby, nothing about the result is leaked as all data is kept as secret-shares.
The Lsun data file must be placed in ABY/build/bin/data
to be automatically found when executing the two scripts.
To cluster different datasets, the values of ppDBSCAN's 'eps' and 'minPts' as well as the dataset's dimension (indicated by 'dim') should be adapted in this file. Additionally, we recommend to re-use the scripts for clustering Lsun by simply adapting the parameters for the dataset's size and changing the file's address. The dataset size must also be used for the number of grouping iterations (i.e., the number of calls to ppDBSCAN_grouping) as each iteration checks for one element if it creates a cluster including the recursive expansion through potential neighbors. Further, if the new dataset is not given in the same format, the function loadData(..)
in this file should be adapted.
For running the grouping phase of ppTRACLUS, one has to change the distance measure in the function calculateDistances(..)
from buildSquaredEuclideanDistCircuit(..)
to buildApproxTraclusDistanceCircuit(..)
in this file.
ABY has a known problem where it randomly returns invalid results in about 1 out of 10 executions. This problem is still an open and also reported in several issues, e.g., here or here. Given that we have to do many runs of an ABY program in the splitted grouping phase, it follows that our prototype can only give valid runtimes but not valid clustering results. To get valid clustering results, the split of the grouping phase must be removed.
[1] Ultsch, A. (2005). Clustering with SOM: U*C. Proc. Workshop on Self-Organizing Maps, Paris, France, pp. 75-82.