This repository contains Python code for simulation experiments on methods for privately releasing statistics in GWAS. We applied the Laplace mechanism, which is based on the concept of differential privacy, to the main statistical tests that use contingency tables: the chi-squared test, Fisher's exact test, and the Cochran-Armitage trend test.
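As a rough illustration of the Laplace mechanism mentioned above, the sketch below adds Laplace noise with scale sensitivity / epsilon to a single statistic. The function name `laplace_mechanism`, the example statistic, and the sensitivity value are hypothetical and are not taken from this repository; the sensitivities actually used for each test are derived in the paper.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Return value plus Laplace noise with scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential variates with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


# Hypothetical inputs: a statistic of 12.3 with an assumed sensitivity of 4.0.
noisy_stat = laplace_mechanism(12.3, sensitivity=4.0, epsilon=1.0,
                               rng=random.Random(0))
```

A smaller epsilon (stronger privacy) gives a larger noise scale, so the released value is less accurate.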
Nowadays, it is not desirable to publish genome statistics based on their
The significance of this study (at present) would be that the results provide evidence that
For the chi-squared test, we experimented with methods for releasing P-values, log(P), and chi-squared statistics when using
2 × 2 and 3 × 2 contingency tables. For log(P) and the chi-squared statistics, we demonstrated appropriate thresholds
for several privacy levels
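For reference, the three non-private quantities named above can be computed from a contingency table as in the following sketch. It uses only the standard library; the function names and the example counts are hypothetical, the closed-form P-values cover only 1 and 2 degrees of freedom (2 × 2 and 3 × 2 tables), and the use of the natural logarithm for log(P) is an assumption.

```python
import math


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table given as a list of rows.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_squared_p_value(stat, df):
    """Upper-tail P-value, using the closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical counts, not data from the experiments.
table_2x2 = [[30, 20], [20, 30]]
stat_2x2 = chi_squared_statistic(table_2x2)
p_2x2 = chi_squared_p_value(stat_2x2, df=1)
log_p_2x2 = math.log(p_2x2)  # natural log is an assumption

table_3x2 = [[10, 20], [20, 20], [30, 10]]
stat_3x2 = chi_squared_statistic(table_3x2)
p_3x2 = chi_squared_p_value(stat_3x2, df=2)
```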
For Fisher's exact test, as with the chi-squared test, we conducted experiments to evaluate the utility of our proposed methods and the thresholds for practical use of the P-value and log(P).
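The quantity underlying Fisher's exact test is the hypergeometric probability of a table with fixed margins. The sketch below computes that probability for a single 2 × 2 table; the function name and the cell labels a, b, c, d are hypothetical, and it is meant only to illustrate the exact probability of a given table, not the full test or the repository's implementation.

```python
import math


def exact_table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]]
    given its row and column totals."""
    n = a + b + c + d
    return math.comb(a + b, a) * math.comb(c + d, c) / math.comb(n, a + c)


# Hypothetical counts, not data from the experiments.
prob = exact_table_probability(8, 2, 3, 7)
log_prob = math.log(prob)  # natural log is an assumption
```

Summing this probability over every table that shares the same margins gives 1, which is a convenient sanity check.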
For the Cochran-Armitage trend test using a 3 × 2 contingency table, we focused on the chi-squared statistics and log(P). The method for generating the simulation data is the same as above.
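A minimal sketch of the non-private trend-test statistic for a 3 × 2 table follows. It uses the standard textbook form of the Cochran-Armitage statistic with 1 degree of freedom; the function name, the genotype weights (0, 1, 2), and the example counts are assumptions made for illustration, and the repository's implementation may differ.

```python
import math


def cochran_armitage_statistic(cases, controls, weights=(0, 1, 2)):
    """Trend-test chi-squared statistic (1 degree of freedom).

    cases and controls hold the counts for the three genotype categories.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_w_cases = sum(w * r for w, r in zip(weights, cases))
    sum_w_totals = sum(w * t for w, t in zip(weights, totals))
    sum_w2_totals = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_w_cases - n_cases * sum_w_totals) ** 2
    denominator = n_cases * n_controls * (n * sum_w2_totals - sum_w_totals ** 2)
    return numerator / denominator


# Hypothetical counts, not data from the experiments.
stat = cochran_armitage_statistic(cases=[20, 30, 50], controls=[40, 35, 25])
p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-squared, df = 1
log_p = math.log(p_value)  # natural log is an assumption
```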
(2023/07) The original paper states that the Cochran-Armitage trend test has 2 degrees of freedom, which is incorrect. It has 1 degree of freedom. Accordingly, our Theorem 9 on the sensitivity of
In the Supplementary Material, the last sentence in S1.3 is incorrect, and the discussion of Theorem S9 (= Theorem 9 in the main text) is meaningless.
(Regarding the discussion on
<------ I uploaded the revised version of Theorem 9 (Revised_Theorem9.pdf).
・Restrictions on the number of cases and controls.
・No consideration of genome dependencies.
・The analyses for Fisher's exact test are not rigorous. We only considered the exact probabilities obtained from the given contingency tables, as shown in Supplementary Material Section S1.2. (i.e., we did not consider the effects of data more extreme than the given table on its
<----- For data with a small number of individuals, an approach other than adding noise to the output values may be desirable.
・Our methods for publishing log(P) in the Cochran-Armitage trend test (in the original paper) must not be used. Instead, please refer to Revised_Theorem9.pdf. In the future, while analyzing the
For more details, please see our paper entitled "More practical differentially private publication of key statistics in GWAS" (https://doi.org/10.1093/bioadv/vbab004) published in Bioinformatics Advances.
Errata:
・p.8. 3.3. l.5-7 "the degree of freedom of the Cochran-Armitage's trend test ~ is 2" → "~ is 1"
・p.8. 3.3. l.8 "33.6" → "29.7"
・Supplementary Material p.1. S1.3. The last sentence is incorrect. Please see Revised_Theorem9.pdf.
Akito Yamamoto
Division of Medical Data Informatics, Human Genome Center,
the Institute of Medical Science, the University of Tokyo