Add report notes

Hugo Sales 2022-06-02 17:25:37 +01:00
parent 25951c7b12
commit ecd5ad68ef
Signed by untrusted user who does not match committer: someonewithpc
GPG Key ID: 7D0C7EAFC9D835A0
1 changed files with 60 additions and 0 deletions

report.md Normal file

@@ -0,0 +1,60 @@
Despite its high distinction (66.48%) and separation (99.99%), the `fnlwgt` column is not a QID because it represents a weight, not a
count of individuals in the same equivalence class in the original dataset. Additionally, it cannot easily be linked to
an auxiliary dataset.
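
As a rough cross-check of figures like the ones above, the sketch below computes both metrics for one column. It assumes the usual definitions (distinction is the number of distinct values divided by the number of records; separation is the fraction of record pairs whose values differ), a header row, and no commas inside fields. The function name `qid_metrics` is hypothetical, and the tool that produced the reported percentages may compute them differently.

```bash
# Hypothetical helper: print distinction and separation (as percentages)
# for column number $2 of the comma-separated file $1 (header row skipped).
# Assumes no field contains a comma; leading/trailing spaces are trimmed.
qid_metrics() {
  tail -n +2 "$1" | awk -F',' -v col="$2" '
    {
      v = $col
      gsub(/^ +| +$/, "", v)
      count[v]++; n++
    }
    END {
      if (n < 2) { print "need at least two records"; exit 1 }
      same = 0; distinct = 0
      # Pairs sharing a value: sum over values of c * (c - 1) / 2.
      for (v in count) { distinct++; same += count[v] * (count[v] - 1) / 2 }
      pairs = n * (n - 1) / 2
      printf "distinction=%.2f%% separation=%.2f%%\n",
             100 * distinct / n, 100 * (1 - same / pairs)
    }'
}

# Example call (fnlwgt is assumed to be the third column):
# qid_metrics adult_data.csv 3
```
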
We determined that `age` is a QID, since it is widely regarded as one across datasets, in line with HIPAA recommendations.
We noted this dataset contains more males than females.
Higher Precision (Generalization Intensity) implies the attributes are closer to the ones in the original dataset and therefore
provide higher utility.
We noted that the contingency between `sex` and `relationship` kept the same distribution after anonymization,
so `relationship` does not identify an individual's `sex` any more than it did in the original dataset.
We exported the anonymized dataset and used the following command to verify there weren't any discrepancies between the
`education` and `education-num` columns:
```bash
cat anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f4,5 | sort -u
```
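
The `sort -u` listing still has to be inspected by eye. A possible way to automate the check is sketched below: it prints every value of one column that appears with more than one distinct value of another column, so empty output means no discrepancy in that direction. The function name `inconsistent_pairs` is hypothetical, and the sketch assumes a header row and no commas inside fields, which may not hold for generalized values in the anonymized file.

```bash
# Hypothetical helper: list values of column $2 that co-occur with more than
# one distinct value of column $3 in the comma-separated file $1.
# Assumes a header row and no commas inside fields.
inconsistent_pairs() {
  tail -n +2 "$1" | awk -F',' -v a="$2" -v b="$3" '
    {
      x = $a; y = $b
      gsub(/^ +| +$/, "", x); gsub(/^ +| +$/, "", y)
      # Count each distinct (x, y) combination once per x.
      if (!((x, y) in seen)) { seen[x, y] = 1; partners[x]++ }
    }
    END { for (x in partners) if (partners[x] > 1) print x }' | sort
}

# Example calls, checking both directions (column numbers are assumptions):
# inconsistent_pairs anonymized.csv 4 5
# inconsistent_pairs anonymized.csv 5 4
```
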
/projects/uni/DataAnonymisation/ (master)$ cat adult_data.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d',' -f8,10 | sort | uniq -c | sort -n
1 Husband, Female
2 Wife, Male
430 Other-relative, Female
551 Other-relative, Male
792 Unmarried, Male
1566 Wife, Female
2245 Own-child, Female
2654 Unmarried, Female
2823 Own-child, Male
3875 Not-in-family, Female
4430 Not-in-family, Male
13192 Husband, Male
~/projects/uni/DataAnonymisation/ (master)$ cat anonymized.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f8,10 | sort | uniq -c | sort -n
1 Husband Female
2 Wife Male
168 Other-relative *
336 Own-child *
342 Other-relative Female
471 Other-relative Male
552 Wife *
573 Unmarried Male
728 Unmarried *
1014 Wife Female
1649 Not-in-family *
2042 Husband *
2081 Own-child Female
2145 Unmarried Female
2651 Own-child Male
3209 Not-in-family Female
3447 Not-in-family Male
11150 Husband Male
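
Raw counts like the ones above are easier to compare across the two files as shares of all rows. The sketch below prints each pair of values with its percentage. The function name `pair_shares` is hypothetical, and it assumes a header row and no commas inside fields, so the anonymized file may need its separators normalized first.

```bash
# Hypothetical helper: print each (column $2, column $3) pair of the
# comma-separated file $1 with its share of all rows, as a percentage.
# Output is tab-separated and sorted by pair.
pair_shares() {
  tail -n +2 "$1" | awk -F',' -v a="$2" -v b="$3" '
    {
      x = $a; y = $b
      gsub(/^ +| +$/, "", x); gsub(/^ +| +$/, "", y)
      count[x "\t" y]++; n++
    }
    END { for (k in count) printf "%s\t%.2f\n", k, 100 * count[k] / n }' | sort
}

# Example call (relationship and sex are assumed to be columns 8 and 10):
# pair_shares adult_data.csv 8 10
```
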