Implementation for the TMLR paper: Retiring ΔDP: New Distribution-Level Metrics for Demographic Parity, [Openreview], [Arxiv], by Xiaotian Han*, Zhimeng Jiang*, Hongye Jin*, Zirui Liu, Na Zou, Qifan Wang, Xia Hu
Many fairness definitions (e.g., demographic parity, equalized opportunity) have been proposed to address different types of fairness issues. In this paper, we focus on the measurement of demographic parity, $\Delta DP$.
We rethink the rationale of $\Delta DP$ and find two major drawbacks:
- Zero-value $\Delta DP$ does not guarantee zero violation of demographic parity.
- The value of $\Delta DP$ depends heavily on the threshold chosen for the classification task.
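As a minimal illustration of both drawbacks, the sketch below assumes the common definition $\Delta DP(t) = |P(\hat{y} > t \mid z=0) - P(\hat{y} > t \mid z=1)|$, where $\hat{y}$ is the predicted score and $t$ the classification threshold. The helper `delta_dp` and the toy scores are hypothetical and not part of this repository: the two groups have clearly different score distributions, yet $\Delta DP$ is zero at $t = 0.5$ and nonzero at other thresholds.

```python
import numpy as np


def delta_dp(scores, z, threshold=0.5):
    """Hypothetical helper: |P(score > t | z=0) - P(score > t | z=1)|."""
    scores = np.asarray(scores, dtype=float)
    z = np.asarray(z)
    rate0 = np.mean(scores[z == 0] > threshold)  # positive rate of group 0
    rate1 = np.mean(scores[z == 1] > threshold)  # positive rate of group 1
    return float(abs(rate0 - rate1))


# Toy scores: group 0 is polarized, group 1 is concentrated around 0.5.
scores = np.array([0.1, 0.2, 0.8, 0.9, 0.4, 0.45, 0.55, 0.6])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])

for t in (0.3, 0.5, 0.7):
    print(t, delta_dp(scores, z, t))
# 0.3 0.5
# 0.5 0.0
# 0.7 0.5
```

A "fair" reading at the default threshold of 0.5 therefore says nothing about the other thresholds, even though the score distributions of the two groups are far apart.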
We propose two distribution-level metrics, namely Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC), to retire $\Delta DP$.
ABPC and ABCC have the following advantages:
- Zero-value ABPC/ABCC is a necessary and sufficient condition for demographic parity.
- Independence of the predictions from the sensitive attribute can be guaranteed for any threshold.
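For reference, the two metrics as computed by the code in this README (integration over the score range $[0, 1]$) can be written as below, where $f_z$ and $F_z$ denote the density and cumulative distribution of the predicted score $\hat{y}$ within group $z$. The last two lines are a short derivation, assuming $\Delta DP$ at threshold $t$ uses the decision rule $\hat{y} > t$; they show why ABCC accounts for every threshold at once.

$$
\begin{aligned}
\mathrm{ABPC} &= \int_0^1 \left| f_0(x) - f_1(x) \right| dx, \qquad
\mathrm{ABCC} = \int_0^1 \left| F_0(x) - F_1(x) \right| dx, \\
\Delta DP(t) &= \left| P(\hat{y} > t \mid z = 0) - P(\hat{y} > t \mid z = 1) \right|
= \left| \bigl(1 - F_0(t)\bigr) - \bigl(1 - F_1(t)\bigr) \right|
= \left| F_0(t) - F_1(t) \right|, \\
\Rightarrow \quad \mathrm{ABCC} &= \int_0^1 \Delta DP(t) \, dt .
\end{aligned}
$$

Since $\Delta DP(t) \ge 0$, $\mathrm{ABCC} = 0$ implies $\Delta DP(t) = 0$ for almost every threshold $t$.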
Dependencies:
- torch 1.10.0
- statsmodels 0.13.1
- scikit-learn 1.0.1
- pandas 1.3.4
- numpy 1.21.2
- aif360 0.4.0
Run the following command from the repository's root directory:

```bash
bash run.sh
```

Each baseline can also be run individually, e.g., on the Adult dataset with `sex` as the sensitive attribute:

```bash
python -u ./src/bs_tabular_mlp.py --data_path ./data/adult --dataset adult --sensitive_attr sex --exp_name adult_mlp --batch_size 256 --epoch 10 --seed 31314
python -u ./src/bs_tabular_reg.py --data_path ./data/adult --dataset adult --sensitive_attr sex --exp_name adult_reg --batch_size 256 --epoch 10 --seed 31314 --lam 1
python -u ./src/bs_tabular_adv.py --data_path ./data/adult --dataset adult --sensitive_attr sex --exp_name adult_adv --batch_size 256 --epoch 40 --seed 31314 --lam 170
```
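```python
import numpy as np
from scipy.stats import gaussian_kde


def ABPC(y_pred, y_gt, z_values, bw_method="scott", sample_n=5000):
    """Area Between Probability density function Curves of the two sensitive groups."""
    # y_gt is not used by the metric; it is kept for a uniform metric signature.
    y_pred = y_pred.ravel()
    y_gt = y_gt.ravel()
    z_values = z_values.ravel()

    # split predicted scores by the binary sensitive attribute
    y_pre_1 = y_pred[z_values == 1]
    y_pre_0 = y_pred[z_values == 0]

    # KDE estimate of each group's score PDF
    kde0 = gaussian_kde(y_pre_0, bw_method=bw_method)
    kde1 = gaussian_kde(y_pre_1, bw_method=bw_method)

    # integrate |pdf_0 - pdf_1| over the score range [0, 1] (trapezoidal rule)
    x = np.linspace(0, 1, sample_n)
    kde1_x = kde1(x)
    kde0_x = kde0(x)
    abpc = np.trapz(np.abs(kde0_x - kde1_x), x)  # np.trapz is deprecated in NumPy >= 2.0 in favor of np.trapezoid
    return abpc
```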
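A quick sanity check of the ABPC computation, written as a self-contained sketch: `abpc_sketch` is a hypothetical re-implementation that mirrors the `ABPC` logic (one Gaussian KDE per group, integrated over $[0, 1]$) with a hand-written trapezoidal rule, and the Beta-distributed scores are synthetic. Groups with identical scores give exactly 0, two independent draws from the same distribution give a small value, and groups with mirrored score distributions give a large one (the value cannot exceed 2, since each KDE has at most unit mass on $[0, 1]$).

```python
import numpy as np
from scipy.stats import gaussian_kde


def trapezoid(y, x):
    """Trapezoidal rule, spelled out to avoid np.trapz / np.trapezoid version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def abpc_sketch(y_pred, z_values, bw_method="scott", sample_n=5000):
    """Hypothetical re-implementation: area between per-group KDE curves on [0, 1]."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    z_values = np.asarray(z_values).ravel()
    kde0 = gaussian_kde(y_pred[z_values == 0], bw_method=bw_method)
    kde1 = gaussian_kde(y_pred[z_values == 1], bw_method=bw_method)
    x = np.linspace(0.0, 1.0, sample_n)
    return trapezoid(np.abs(kde0(x) - kde1(x)), x)


rng = np.random.default_rng(0)
n = 1000
z = np.repeat([0, 1], n)  # first n samples belong to group 0, the next n to group 1

same = rng.beta(2, 5, size=n)
identical = np.concatenate([same, same])  # both groups share exactly the same scores
similar = np.concatenate([rng.beta(2, 5, size=n), rng.beta(2, 5, size=n)])
shifted = np.concatenate([rng.beta(2, 5, size=n), rng.beta(5, 2, size=n)])

abpc_identical = abpc_sketch(identical, z)
abpc_similar = abpc_sketch(similar, z)
abpc_shifted = abpc_sketch(shifted, z)
print(abpc_identical, abpc_similar, abpc_shifted)
```

Evaluating the KDEs only on $[0, 1]$ matches the range of probability scores; any mass the Gaussian kernels place outside that range is ignored.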
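```python
import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF


def ABCC(y_pred, y_gt, z_values, sample_n=10000):
    """Area Between Cumulative density function Curves of the two sensitive groups."""
    # y_gt is not used by the metric; it is kept for a uniform metric signature.
    y_pred = y_pred.ravel()
    y_gt = y_gt.ravel()
    z_values = z_values.ravel()

    # split predicted scores by the binary sensitive attribute
    y_pre_1 = y_pred[z_values == 1]
    y_pre_0 = y_pred[z_values == 0]

    # empirical CDF of each group's scores
    ecdf0 = ECDF(y_pre_0)
    ecdf1 = ECDF(y_pre_1)

    # integrate |cdf_0 - cdf_1| over the score range [0, 1] (trapezoidal rule)
    x = np.linspace(0, 1, sample_n)
    ecdf0_x = ecdf0(x)
    ecdf1_x = ecdf1(x)
    abcc = np.trapz(np.abs(ecdf0_x - ecdf1_x), x)  # np.trapz is deprecated in NumPy >= 2.0 in favor of np.trapezoid
    return abcc
```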
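The following self-contained sketch checks the threshold interpretation of ABCC numerically. `abcc_sketch` and `mean_delta_dp` are hypothetical helpers: `abcc_sketch` follows the `ABCC` logic but swaps statsmodels' `ECDF` for an equivalent `np.searchsorted` lookup, and `mean_delta_dp` averages $|P(\hat{y} > t \mid z=0) - P(\hat{y} > t \mid z=1)|$ over a grid of thresholds $t \in [0, 1]$. The two agree up to floating-point error. As a cross-check, for scores in $[0, 1]$ the area between the two CDFs equals the 1-Wasserstein distance between the group score distributions, which `scipy.stats.wasserstein_distance` computes exactly (the grid introduces a small discretization error).

```python
import numpy as np
from scipy.stats import wasserstein_distance


def ecdf(samples):
    """Return F(t) = fraction of samples <= t, i.e. an empirical CDF."""
    sorted_s = np.sort(np.asarray(samples, dtype=float))
    return lambda t: np.searchsorted(sorted_s, t, side="right") / sorted_s.size


def trapezoid(y, x):
    """Trapezoidal rule, spelled out to avoid np.trapz / np.trapezoid version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def abcc_sketch(y_pred, z_values, sample_n=10000):
    """Hypothetical re-implementation: area between per-group empirical CDFs on [0, 1]."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    z_values = np.asarray(z_values).ravel()
    F0, F1 = ecdf(y_pred[z_values == 0]), ecdf(y_pred[z_values == 1])
    x = np.linspace(0.0, 1.0, sample_n)
    return trapezoid(np.abs(F0(x) - F1(x)), x)


def mean_delta_dp(y_pred, z_values, sample_n=10000):
    """Hypothetical helper: Delta DP averaged over thresholds t in [0, 1] (rule: y > t)."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    z_values = np.asarray(z_values).ravel()
    s0, s1 = y_pred[z_values == 0], y_pred[z_values == 1]
    x = np.linspace(0.0, 1.0, sample_n)
    rate0 = (s0[None, :] > x[:, None]).mean(axis=1)  # P(y > t | z=0) for each threshold t
    rate1 = (s1[None, :] > x[:, None]).mean(axis=1)  # P(y > t | z=1) for each threshold t
    return trapezoid(np.abs(rate0 - rate1), x)


rng = np.random.default_rng(0)
n = 500
s0 = rng.beta(2, 5, size=n)  # synthetic scores for group 0
s1 = rng.beta(3, 3, size=n)  # synthetic scores for group 1
scores = np.concatenate([s0, s1])
z = np.repeat([0, 1], n)

abcc = abcc_sketch(scores, z)
avg_ddp = mean_delta_dp(scores, z)
w1 = wasserstein_distance(s0, s1)
print(abcc, avg_ddp, w1)
```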
Please kindly cite the following paper if you find our code helpful!
```bibtex
@article{han2023retiring,
  title={Retiring $\Delta$DP: New Distribution-Level Metrics for Demographic Parity},
  author={Han, Xiaotian and Jiang, Zhimeng and Jin, Hongye and Liu, Zirui and Zou, Na and Wang, Qifan and Hu, Xia},
  journal={arXiv preprint arXiv:2301.13443},
  year={2023}
}
```
We used GitHub Copilot to generate some of the comments in key parts of the code.