Privacy-preserving DBSCAN and TRACLUS

By Beyza Bozdemir¹, Sébastien Canard², Orhan Ermis¹, Helen Möllering³, Melek Önen¹, and Thomas Schneider³. (¹EURECOM; ²Applied Cryptography Group, Orange Labs; ³ENCRYPTO, TU Darmstadt)
In 16th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2021). Paper available here.

This work is based on the ABY framework for efficient mixed-protocol secure two-party computation.

For simplicity, we will only describe the short version of the build process here, including the differences and additional steps required compared to the ABY build procedure. Please also refer to the detailed build instructions of ABY.

This code is provided as an experimental implementation for testing purposes and should not be used in a productive environment. We cannot guarantee security and correctness.

(Short) Requirements and Installation

This code requires all ABY requirements and can be set up and executed like any other ABY example.

Parameters Distance Calculation

 -r [Role: 0/1, required]
 -d address of dataset file
 -e input dataset size

More (optional) parameters can be inspected by simply running ./ppDBSCAN_distance.

Parameters Grouping

 -r [Role: 0/1, required]
 -d address of dataset file
 -e input dataset size
 -i index of input record that is currently checked if it creates a cluster (including the cluster expansion by neighbors).

More (optional) parameters can be inspected by simply running ./ppDBSCAN_grouping.

Running ppDBSCAN on Lsun

For an easy test of our ppDBSCAN implementation, we included two scripts (1 and 2), each for one of the two computing parties, that run ppDBSCAN on the Lsun dataset [1] scaled to integer values.

As visible in the scripts, ppDBSCAN is split into distance calculation and grouping. Furthermore, the grouping is interrupted after checking each element of the input dataset to reset the ABY circuit and free memory occupied from ABY. This is realized by writing the secret shares at each of the two computing parties into a local file at the respective party. Thereby, nothing about the result is leaked as all data is kept as secret-shares.

The Lsun data file must be placed in ABY/build/bin/data to be automatically found when executing the two scripts.

Clustering other datasets

To cluster different datasets, the values of ppDBSCAN's 'eps' and 'minPts' as well as the dataset's dimension (indicated by 'dim') should be adapted in this file. Additionally, we recommend to re-use the scripts for clustering Lsun by simply adapting the parameters for the dataset's size and changing the file's address. The dataset size must also be used for the number of grouping iterations (i.e., the number of calls to ppDBSCAN_grouping) as each iteration checks for one element if it creates a cluster including the recursive expansion through potential neighbors. Further, if the new dataset is not given in the same format, the function loadData(..) in this file should be adapted.

Running the grouping phase of ppTRACLUS

For running the grouping phase of ppTRACLUS, one has to change the distance measure in the function calculateDistances(..) from buildSquaredEuclideanDistCircuit(..) to buildApproxTraclusDistanceCircuit(..) in this file.

Issues & Notes

ABY has a known problem where it randomly returns invalid results in about 1 out of 10 executions. This problem is still an open and also reported in several issues, e.g., here or here. Given that we have to do many runs of an ABY program in the splitted grouping phase, it follows that our prototype can only give valid runtimes but not valid clustering results. To get valid clustering results, the split of the grouping phase must be removed.

References

[1] Ultsch, A. (2005). Clustering with SOM: U*C. Proc. Workshop on Self-Organizing Maps, Paris, France, pp. 75-82.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
src/examples/ppdbscan		src/examples/ppdbscan
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
client_script.sh		client_script.sh
server_script.sh		server_script.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Privacy-preserving DBSCAN and TRACLUS

(Short) Requirements and Installation

Parameters Distance Calculation

Parameters Grouping

Running ppDBSCAN on Lsun

Clustering other datasets

Running the grouping phase of ppTRACLUS

Issues & Notes

References

About

Releases

Packages

Languages

License

encryptogroup/ppDBSCAN

Folders and files

Latest commit

History

Repository files navigation

Privacy-preserving DBSCAN and TRACLUS

(Short) Requirements and Installation

Parameters Distance Calculation

Parameters Grouping

Running ppDBSCAN on Lsun

Clustering other datasets

Running the grouping phase of ppTRACLUS

Issues & Notes

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages