dare-rf is a Python library that implements machine unlearning for random forests: it removes training data from a trained model efficiently, without retraining from scratch. It is built with Cython and designed to scale to large datasets.
pip install dare-rf
Simple example of removing a single training instance:
import dare
import numpy as np
# training data
X_train = np.array([[0, 1], [0, 1], [0, 1], [1, 0], [1, 0]])
y_train = np.array([1, 1, 1, 0, 1])
X_test = np.array([[1, 0]]) # test instance
# train a DaRE RF model
rf = dare.Forest(n_estimators=100,
                 max_depth=3,
                 k=5,  # no. thresholds to consider per attribute
                 topd=0,  # no. random node layers
                 random_state=1)
rf.fit(X_train, y_train)
rf.predict_proba(X_test) # prediction before deletion => [0.5, 0.5]
rf.delete(3) # delete training example at index 3 ([1, 0], 0)
rf.predict_proba(X_test) # prediction after deletion => [0.0, 1.0]
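The predictions in the example above can be sanity-checked by hand. The following is a minimal sketch that uses only NumPy and does not call dare-rf at all: it counts the labels of the training rows identical to the test instance, before and after dropping row 3. The helper `class_frequencies` is a hypothetical name introduced here for illustration, and treating the forest's output as these empirical frequencies is an assumption that holds only because this toy dataset has just two distinct feature patterns.

```python
import numpy as np

# Same toy data as in the example above.
X_train = np.array([[0, 1], [0, 1], [0, 1], [1, 0], [1, 0]])
y_train = np.array([1, 1, 1, 0, 1])
x_test = np.array([1, 0])


def class_frequencies(X, y, x, n_classes=2):
    """Hypothetical helper: class proportions among rows of X equal to x."""
    mask = np.all(X == x, axis=1)           # rows identical to the test instance
    counts = np.bincount(y[mask], minlength=n_classes)
    return counts / counts.sum()


# Before deletion: rows 3 and 4 match [1, 0], with labels 0 and 1.
before = class_frequencies(X_train, y_train, x_test)

# Remove training row 3, i.e. the instance ([1, 0], 0).
X_after = np.delete(X_train, 3, axis=0)
y_after = np.delete(y_train, 3)

# After deletion: only the instance ([1, 0], 1) still matches.
after = class_frequencies(X_after, y_after, x_test)

print(before)  # [0.5 0.5]
print(after)   # [0. 1.]
```

These frequencies match the values noted in the comments of the example: an even split before the deletion, and full confidence in class 1 once the only conflicting instance is gone.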
Brophy and Lowd. Machine Unlearning for Random Forests. ICML 2021.
@inproceedings{brophy2021machine,
  title={Machine Unlearning for Random Forests},
  author={Brophy, Jonathan and Lowd, Daniel},
  booktitle={International Conference on Machine Learning},
  pages={1092--1104},
  year={2021},
  organization={PMLR}
}