Add report notes
parent 25951c7b12
commit ecd5ad68ef
60
report.md
Normal file
@@ -0,0 +1,60 @@
Despite high values of distinction (66.48%) and separation (99.99%), the `fnlwgt` column is not a quasi-identifier (QID)
because it represents a weight, not a count of individuals in the same equivalence class in the original dataset.
Additionally, it cannot easily be linked to an auxiliary dataset.

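As a rough cross-check of figures like these, the sketch below computes both metrics for a single column. It assumes the common definitions (distinction = distinct values / records, separation = share of record pairs whose values differ), which may not match the tool's exact computation. `column_metrics` is a hypothetical helper, and the file layout (one header row, comma-separated fields with no commas inside values) is assumed from the commands further down in these notes.

```bash
# Hypothetical helper: distinction and separation of one column, as percentages.
# Assumed definitions (not taken from the report):
#   distinction = distinct values / records
#   separation  = record pairs with different values / all record pairs
# Arguments: $1 = CSV file with one header row, $2 = 1-based column index.
column_metrics() {
  tail -n +2 "$1" | cut -d',' -f"$2" | sort | uniq -c |
    awk '
      # Each input line is "<count> <value>"; $1 is the size of one value group.
      { n += $1; d++; same += $1 * ($1 - 1) / 2 }
      END {
        if (n == 0) exit 1
        pairs = n * (n - 1) / 2
        sep = (pairs > 0) ? 100 * (1 - same / pairs) : 0
        printf "distinction=%.2f%% separation=%.2f%%\n", 100 * d / n, sep
      }'
}

# Example (fnlwgt is assumed to be the third column):
# column_metrics adult_data.csv 3
```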
We treated `age` as a QID, since it is widely regarded as one across datasets, in line with HIPAA recommendations.

We noted that this dataset contains more males than females (21,790 versus 10,771 records in the original data).

Higher Precision (Generalization Intensity) implies that the attribute values are closer to those in the original
dataset and therefore provide higher utility.

We noted that the contingency table for `sex` and `relationship` kept the same distribution after anonymization,
so `relationship` does not reveal an individual's `sex` any more than it does in the original dataset.

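One way to put numbers on this comparison is to compute, for each `relationship` value, the share of `Male` among rows whose `sex` was not suppressed, and then compare the two files. The sketch below is only an illustration: `male_share` is a hypothetical helper, and the column positions (8 and 10) and delimiters are assumed from the commands further down in these notes.

```bash
# Hypothetical helper: reads tab-separated "relationship<TAB>sex" pairs on stdin
# and prints, per relationship, the share of Male among non-suppressed rows
# (rows whose sex is neither Male nor Female, such as "*", are ignored).
male_share() {
  awk -F'\t' '
    $2 == "Male" || $2 == "Female" {
      total[$1]++
      if ($2 == "Male") male[$1]++
    }
    END {
      for (r in total) printf "%s\t%.1f%%\n", r, 100 * male[r] / total[r]
    }
  ' | sort
}

# Original file: fields are assumed to be separated by ", " (comma and space).
# tail -n +2 adult_data.csv | awk -F', ' -v OFS='\t' '{ print $8, $10 }' | male_share
# Anonymized export: fields are assumed to be separated by a bare comma.
# tail -n +2 anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | cut -f8,10 | male_share
```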
We exported the anonymized dataset and used the following command to verify that there were no discrepancies between the
`education` and `education-num` columns:

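```bash
# Turn field-separating commas (those not followed by a space) into tabs, then list the distinct education/education-num pairs.
sed -r 's/,([^ ])/\t\1/g' anonymized.csv | cut -d$'\t' -f4,5 | sort -u
```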
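Reading the list of distinct pairs by eye works for a handful of values. As a possible automated variant, the sketch below prints only the `education` values that are paired with more than one `education-num`, so empty output means no discrepancy in that direction. `ambiguous_keys` is a hypothetical helper, not part of the original workflow. Swapping the two columns before piping them in would check the reverse direction.

```bash
# Hypothetical helper: reads tab-separated "key<TAB>value" pairs on stdin and
# prints every key that appears with more than one distinct value.
ambiguous_keys() {
  # sort -u keeps one line per distinct pair; a key that is still repeated
  # after that must be paired with at least two different values.
  sort -u | cut -f1 | sort | uniq -d
}

# Example, reusing the pipeline from the check above:
# sed -r 's/,([^ ])/\t\1/g' anonymized.csv | cut -f4,5 | ambiguous_keys
```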
/projects/uni/DataAnonymisation/ (master)$ cat adult_data.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d',' -f8,10 | sort | uniq -c | sort -n
1 Husband, Female
2 Wife, Male
430 Other-relative, Female
551 Other-relative, Male
792 Unmarried, Male
1566 Wife, Female
2245 Own-child, Female
2654 Unmarried, Female
2823 Own-child, Male
3875 Not-in-family, Female
4430 Not-in-family, Male
13192 Husband, Male
~/projects/uni/DataAnonymisation/ (master)$ cat anonymized.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f8,10 | sort | uniq -c | sort -n
1 Husband Female
2 Wife Male
168 Other-relative *
336 Own-child *
342 Other-relative Female
471 Other-relative Male
552 Wife *
573 Unmarried Male
728 Unmarried *
1014 Wife Female
1649 Not-in-family *
2042 Husband *
2081 Own-child Female
2145 Unmarried Female
2651 Own-child Male
3209 Not-in-family Female
3447 Not-in-family Male
11150 Husband Male