DataAnonymisation/report.md
2022-06-02 17:25:37 +01:00

Despite high distinction (66.48%) and separation (99.99%) values, the fnlwgt column is not a QID because it represents a weight, not a count of individuals in the same equivalence class in the original dataset. Additionally, it cannot easily be linked to an auxiliary dataset.
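
To make the two figures concrete, here is a minimal sketch of how distinction and separation are commonly defined: distinction as the share of distinct values in a column, and separation as the share of record pairs that differ on it. Treating these as the definitions behind the percentages above is an assumption, and the sample weights are made up for illustration.

```python
from collections import Counter


def distinction(values):
    """Share of distinct values among all records in the column."""
    return len(set(values)) / len(values)


def separation(values):
    """Share of record pairs whose values differ on this column."""
    n = len(values)
    total_pairs = n * (n - 1) // 2
    # Pairs that share a value cannot be told apart by this column.
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(values).values())
    return (total_pairs - same_pairs) / total_pairs


# Hypothetical weight-like values, not taken from the dataset.
sample_weights = [100, 200, 200, 300, 400]
print(f"distinction = {distinction(sample_weights):.2%}")
print(f"separation  = {separation(sample_weights):.2%}")
```

Any nearly unique numeric column scores high on both measures, which is why the scores alone do not settle whether a column is a QID.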

We determined that age is a QID, since it is widely regarded as one in all datasets, in line with HIPAA recommendations.

We noted that this dataset contains roughly twice as many males (21790) as females (10771).

Higher Precision (Generalization Intensity) implies the attributes are closer to the ones in the original dataset, and therefore provide higher utility.
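
As an illustration of why less generalization means a higher score, the sketch below follows Sweeney's precision metric: one minus the average ratio between the generalization level applied to a cell and the height of that attribute's hierarchy. Whether the tool used here computes Precision exactly this way is an assumption, and the hierarchy heights and levels are hypothetical.

```python
def precision(levels, heights):
    """Sweeney-style precision for a generalized table.

    levels[i][j] is the generalization level applied to attribute j in row i
    (0 means the original value was kept); heights[j] is the height of the
    generalization hierarchy of attribute j.
    """
    cells = len(levels) * len(heights)
    loss = sum(level / heights[j] for row in levels for j, level in enumerate(row))
    return 1 - loss / cells


# Hypothetical example: two attributes with hierarchy heights 4 and 2.
heights = [4, 2]
levels = [
    [0, 0],  # row left untouched
    [2, 1],  # both attributes generalized halfway
    [4, 2],  # both attributes fully generalized
]
score = precision(levels, heights)
print(f"precision = {score:.2f}")
```

A table with every level at 0 scores 1, and a fully generalized table scores 0.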

We noted that the contingency table between sex and relationship kept the same distribution after anonymization, so the anonymized relationship column reveals no more about an individual's sex than it does in the original dataset.
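
One way to check this claim is to compare the share of females within each relationship value before and after anonymization. The sketch below uses the (relationship, sex) counts from the terminal output at the end of this report. Leaving out the rows whose sex was suppressed (`*`) is an assumption of this sketch, not something the original analysis states.

```python
# (relationship, sex) counts copied from the terminal output in this report.
original = {
    ("Husband", "Female"): 1, ("Husband", "Male"): 13192,
    ("Wife", "Female"): 1566, ("Wife", "Male"): 2,
    ("Own-child", "Female"): 2245, ("Own-child", "Male"): 2823,
    ("Unmarried", "Female"): 2654, ("Unmarried", "Male"): 792,
    ("Not-in-family", "Female"): 3875, ("Not-in-family", "Male"): 4430,
    ("Other-relative", "Female"): 430, ("Other-relative", "Male"): 551,
}
# Assumption: rows with suppressed sex ("*") are excluded here.
anonymized = {
    ("Husband", "Female"): 1, ("Husband", "Male"): 11150,
    ("Wife", "Female"): 1014, ("Wife", "Male"): 2,
    ("Own-child", "Female"): 2081, ("Own-child", "Male"): 2651,
    ("Unmarried", "Female"): 2145, ("Unmarried", "Male"): 573,
    ("Not-in-family", "Female"): 3209, ("Not-in-family", "Male"): 3447,
    ("Other-relative", "Female"): 342, ("Other-relative", "Male"): 471,
}


def female_share(counts):
    """Share of females per relationship, over records whose sex is known."""
    totals, females = {}, {}
    for (relationship, sex), n in counts.items():
        totals[relationship] = totals.get(relationship, 0) + n
        if sex == "Female":
            females[relationship] = females.get(relationship, 0) + n
    return {r: females.get(r, 0) / totals[r] for r in totals}


before = female_share(original)
after = female_share(anonymized)
max_shift = max(abs(before[r] - after[r]) for r in before)
for r in sorted(before):
    print(f"{r:15s} before={before[r]:.4f} after={after[r]:.4f}")
print(f"largest shift: {max_shift:.4f}")
```

With these counts the largest shift is just under two percentage points (for Unmarried), which is consistent with the observation that the distribution was preserved.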

We exported the anonymized dataset and used the following command to verify there weren't any discrepancies between the education and education-num columns:

    cat anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f4,5 | sort -u
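
The same check can be sketched in Python: every education value should appear with exactly one education-num value, and vice versa. The header names are assumed to match the column names used in this report, and the sample rows are hypothetical stand-ins for anonymized.csv, with one deliberately inconsistent row.

```python
import csv
import io
from collections import defaultdict


def mapping_conflicts(rows, left="education", right="education-num"):
    """Return left values seen with several right values, and vice versa."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        left_to_right[row[left]].add(row[right])
        right_to_left[row[right]].add(row[left])
    left_conflicts = {k: sorted(v) for k, v in left_to_right.items() if len(v) > 1}
    right_conflicts = {k: sorted(v) for k, v in right_to_left.items() if len(v) > 1}
    return left_conflicts, right_conflicts


# Hypothetical sample; the last row is inconsistent on purpose so the
# check has something to report.
sample_csv = """education,education-num
Bachelors,13
HS-grad,9
Bachelors,13
HS-grad,10
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
left_conflicts, right_conflicts = mapping_conflicts(rows)
print("education values with several numbers:", left_conflicts)
print("numbers with several education values:", right_conflicts)
```

To run it on the real export, replace the `io.StringIO(sample_csv)` argument with an opened file handle. Two empty dictionaries mean the columns agree.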

    /projects/uni/DataAnonymisation/ (master)$ cat adult_data.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d',' -f8,10 | sort | uniq -c | sort -n
          1 Husband, Female
          2 Wife, Male
        430 Other-relative, Female
        551 Other-relative, Male
        792 Unmarried, Male
       1566 Wife, Female
       2245 Own-child, Female
       2654 Unmarried, Female
       2823 Own-child, Male
       3875 Not-in-family, Female
       4430 Not-in-family, Male
      13192 Husband, Male

    ~/projects/uni/DataAnonymisation/ (master)$ cat anonymized.csv | tail -n +2 | sed -r 's/,([^ ])/\t\1/g' | cut -d$'\t' -f8,10 | sort | uniq -c | sort -n
          1 Husband Female
          2 Wife Male
        168 Other-relative *
        336 Own-child *
        342 Other-relative Female
        471 Other-relative Male
        552 Wife *
        573 Unmarried Male
        728 Unmarried *
       1014 Wife Female
       1649 Not-in-family *
       2042 Husband *
       2081 Own-child Female
       2145 Unmarried Female
       2651 Own-child Male
       3209 Not-in-family Female
       3447 Not-in-family Male
      11150 Husband Male