This repository has been archived by the owner on Feb 5, 2024. It is now read-only.

update README
pommedeterresautee committed Aug 13, 2018
1 parent 23cedf9 commit 638a19b
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions README.md
@@ -93,10 +93,10 @@ To each type, dataset augmentation and miscellaneous tricks have been applied.

In the future, French legislation may require pseudo-anonymizing the following mentions in addition to those already known:

-* first name of natural person
-* judge name
-* clerk name
-* lawyer name
+- first name of natural person
+- judge name
+- clerk name
+- lawyer name

> At first, only `PERS` and `ADDRESS` entities were handled.
It turned out that there were some issues with the other entity types.
@@ -105,10 +105,12 @@ Therefore, these entity types have been added, greatly improving the quality of

Type of entities that will not be included:

-- social security numbers: there are too few, not enough to learn anything and it makes the associated risk very low (3 numbers for 30 000 cases checked)
-- credit card number: not found in 30 000 cases, very low risk.
+- social security numbers: too few examples to learn from (3 numbers in 30 000 cases checked). Low risk.
+- credit card numbers: none found in 30 000 cases, but lots of false positives. Low risk.

-All the types to add may be managed by `regex`.
+For both types of entity, there are lots of false positives.
+To limit these cases, we check the control number included in these IDs, but that is not enough to remove all false positives.
+Therefore, it seems wiser not to search for these IDs; moreover, they are quite hard to use for re-identification.
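
The control-number check mentioned above can be illustrated with a minimal sketch. This is not the repository's code: it assumes the Luhn checksum for card numbers and the mod-97 key for 15-digit French social security numbers (NIR), and the function names `luhn_is_valid` and `nir_is_valid` are hypothetical.

```python
import re


def luhn_is_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def nir_is_valid(candidate: str) -> bool:
    """Return True if a 15-digit NIR carries a consistent 2-digit key.

    Assumption: key = 97 - (first 13 digits mod 97).
    Corsican department codes (2A/2B) are not handled in this sketch.
    """
    compact = re.sub(r"\s", "", candidate)
    if not re.fullmatch(r"\d{15}", compact):
        return False
    number, key = int(compact[:13]), int(compact[13:])
    return key == 97 - (number % 97)
```

A checksum only filters candidates. Roughly one random digit string in ten passes Luhn, which is consistent with the observation above that the control number alone does not remove all false positives.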

## Model
