Add report notes
parent
25951c7b12
commit
ecd5ad68ef
60
report.md
Normal file
@ -0,0 +1,60 @@
Despite high distinction (66.48%) and separation (99.99%) values, the `fnlwgt` column is not a QID because it represents a weight, not a count of individuals in the same equivalence class in the original dataset. Additionally, it cannot easily be linked to another auxiliary dataset.

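As a rough illustration of where figures like these can come from, the sketch below assumes that distinction is the share of distinct values among all records and that separation is the share of record pairs whose values differ. Those definitions, the helper name `column_stats`, and the assumption that the CSV has a header row and `,` or `, ` separators without quoted commas are assumptions made for this sketch, not something the anonymization tool confirms.

```bash
# Hypothetical helper: prints distinction and separation (in percent) for one
# column of a CSV file. Assumes a header row and "," or ", " separators.
column_stats() {
    local file=$1 col=$2
    tail -n +2 "$file" | awk -F', *' -v col="$col" '
        NF >= col { count[$col]++; n++ }          # skip short or blank lines
        END {
            if (n < 2) { print "need at least two rows"; exit 1 }
            for (v in count) {
                distinct++
                # ordered pairs of records that share this value
                same += count[v] * (count[v] - 1)
            }
            printf "distinction=%.2f%% separation=%.2f%%\n",
                   100 * distinct / n, 100 * (1 - same / (n * (n - 1)))
        }'
}
```

For `fnlwgt`, which is the third column of the dataset, this would be invoked as `column_stats adult_data.csv 3`. Whether the result matches the percentages above depends on the tool using the same definitions.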
We determined that `age` is a QID, since it is widely regarded as such in all datasets, in line with HIPAA recommendations.

We noted that this dataset contains more males than females (21,790 versus 10,771 records in the original data).

Higher Precision (Generalization Intensity) means the attribute values are closer to those in the original dataset and therefore provide higher utility.

We noted that the contingency between `sex` and `relationship` kept the same distribution after anonymization, so `relationship` does not reveal an individual's `sex` any more than it does in the original dataset.

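One way to put a number on this comparison is to compute, for each `relationship` value, the share of rows marked `Male`. The sketch below is a hypothetical helper (`male_share` is not part of the original notes). It assumes tab-separated input on stdin and ignores rows whose `sex` was suppressed to `*`, which is a simplifying assumption.

```bash
# Hypothetical helper: reads tab-separated rows on stdin and prints, for each
# value of the relationship column, the percentage of rows whose sex is "Male".
# Rows with a suppressed sex ("*") are left out (an assumption of this sketch).
# Arguments: relationship column number, sex column number.
male_share() {
    local rel=$1 sex=$2
    awk -F'\t' -v r="$rel" -v s="$sex" '
        NF >= r && NF >= s && $s != "*" {
            total[$r]++
            if ($s == "Male") male[$r]++
        }
        END {
            for (k in total)
                printf "%s %.2f\n", k, 100 * male[k] / total[k]
        }' | sort
}
```

It could be run on both files after converting the separators to tabs, for example `tail -n +2 anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | male_share 8 10`. Going by the counts recorded in these notes, `Unmarried` rows are roughly 23% male before anonymization and roughly 21% after.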
We exported the anonymized dataset and used the following command to verify there weren't any discrepancies between the `education` and `education-num` columns:

```bash
# Turn field-separating commas (those not followed by a space) into tabs,
# then list the unique (education, education-num) pairs.
cat anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f4,5 | sort -u
```

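The list of unique pairs still has to be checked by eye. A possible automated variant is sketched below. The helper name `inconsistent_pairs` is hypothetical, and it assumes tab-separated input on stdin, as produced by the `sed` step above. It prints every value of one column that appears with more than one value of another column, so empty output suggests a consistent mapping.

```bash
# Hypothetical helper: reads tab-separated rows on stdin and prints each value
# of the key column that occurs with more than one value of the other column.
# Arguments: key column number, value column number.
inconsistent_pairs() {
    local key=$1 val=$2
    awk -F'\t' -v k="$key" -v v="$val" '
        # count each distinct (key, value) combination once per key
        NF >= k && NF >= v && !seen[$k, $v]++ { variants[$k]++ }
        END { for (e in variants) if (variants[e] > 1) print e }
    ' | sort
}
```

For the check described above, this could be run as `tail -n +2 anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | inconsistent_pairs 4 5`. If the columns were generalized to different levels, a reported value is not necessarily an error and would need a manual look.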
/projects/uni/DataAnonymisation/ (master)$ cat adult_data.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d',' -f8,10 | sort | uniq -c | sort -n
1 Husband, Female
2 Wife, Male
430 Other-relative, Female
551 Other-relative, Male
792 Unmarried, Male
1566 Wife, Female
2245 Own-child, Female
2654 Unmarried, Female
2823 Own-child, Male
3875 Not-in-family, Female
4430 Not-in-family, Male
13192 Husband, Male

~/projects/uni/DataAnonymisation/ (master)$ cat anonymized.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f8,10 | sort | uniq -c | sort -n
1 Husband Female
2 Wife Male
168 Other-relative *
336 Own-child *
342 Other-relative Female
471 Other-relative Male
552 Wife *
573 Unmarried Male
728 Unmarried *
1014 Wife Female
1649 Not-in-family *
2042 Husband *
2081 Own-child Female
2145 Unmarried Female
2651 Own-child Male
3209 Not-in-family Female
3447 Not-in-family Male
11150 Husband Male