Minor layout changes and text revision
parent
7b7fc0e617
commit
380400f343
44
README.md
@@ -7,12 +7,21 @@ author:
|
|||||||
date: 2022/06/02
|
date: 2022/06/02
|
||||||
geometry: margin=2cm
|
geometry: margin=2cm
|
||||||
output: pdf_document
|
output: pdf_document
|
||||||
|
header-includes: |
|
||||||
|
\usepackage{float}
|
||||||
|
\let\origfigure\figure
|
||||||
|
\let\endorigfigure\endfigure
|
||||||
|
\renewenvironment{figure}[1][2]{\expandafter\origfigure\expandafter[H]}{\endorigfigure}
|
||||||
---
|
---
|
||||||
|
|
||||||
|
\vspace{3em}
|
||||||
|
|
||||||
# Attribute classification
|
# Attribute classification
|
||||||
|
|
||||||
We classified the attributes as follows:
|
We classified the attributes as follows:
|
||||||
|
|
||||||
|
\vspace{3em}
|
||||||
|
|
||||||
Attribute | Classification
|
Attribute | Classification
|
||||||
-----------------+---------------
|
-----------------+---------------
|
||||||
`age` | QID
|
`age` | QID
|
||||||
@@ -33,6 +42,7 @@ Attribute | Classification
|
|||||||
|
|
||||||
Table: Attribute classifications
|
Table: Attribute classifications
|
||||||
|
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
## Justifications
|
## Justifications
|
||||||
|
|
||||||
@@ -46,7 +56,7 @@ set of attributes.
|
|||||||
According to HIPAA recommendations, and together with its very high separation value (99.87%), we classify
|
According to HIPAA recommendations, and together with its very high separation value (99.87%), we classify
|
||||||
this attribute as a QID.
|
this attribute as a QID.
|
||||||
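The distinction and separation percentages quoted in these justifications can be approximated outside the anonymization tool. The sketch below assumes the usual definitions (distinction: distinct values divided by rows; separation: fraction of record pairs that differ on the attribute), which may not match the tool's exact computation. The `column_stats` helper and the sample rows are hypothetical, and the column order is assumed to follow the commands used elsewhere in this report (`fnlwgt` in column 3, `sex` in column 10).

```sh
# Hypothetical sample with the assumed column layout (header row + 4 records).
sample='age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex
39,Private,100,Bachelors,13,Never-married,Sales,Not-in-family,White,Male
50,Private,200,HS-grad,9,Divorced,Sales,Unmarried,White,Female
28,Private,300,Bachelors,13,Married-civ-spouse,Sales,Husband,Black,Male
41,Private,150,HS-grad,9,Divorced,Sales,Unmarried,White,Female'

# column_stats COLUMN: reads CSV with a header on stdin (needs at least 2 records).
column_stats() {
  tail -n +2 | awk -F',' -v col="$1" '
    { count[$col]++; n++ }
    END {
      for (v in count) {
        distinct++
        same += count[v] * (count[v] - 1) / 2   # pairs sharing this value
      }
      pairs = n * (n - 1) / 2                   # all record pairs
      printf "distinction=%.2f%% separation=%.2f%%\n",
             100 * distinct / n, 100 * (1 - same / pairs)
    }'
}

# Statistics for the sex column of the sample.
printf '%s\n' "$sample" | column_stats 10
```

On the real data this would be invoked as `column_stats 1 < adult_data.csv` for `age`, assuming the file has a single header row.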
|
|
||||||
![Hierarchy for attribute `age`](coding-model/hierarchies/age/age.png){width=14cm}
|
![Hierarchy for attribute `age`](coding-model/hierarchies/age/age.png){width=10cm}
|
||||||
|
|
||||||
|
|
||||||
### `workclass`
|
### `workclass`
|
||||||
@@ -60,7 +70,7 @@ deemed Insensitive.
|
|||||||
Despite high values of distinction (66.48%) and separation (99.99%) the `fnlwgt` column is not a QID
|
Despite high values of distinction (66.48%) and separation (99.99%) the `fnlwgt` column is not a QID
|
||||||
because it represents a weight, not a count of individuals in the same equivalence class in the
|
because it represents a weight, not a count of individuals in the same equivalence class in the
|
||||||
original dataset. This can be seen with the results below. Additionally, it's not easily connected
|
original dataset. This can be seen with the results below. Additionally, it's not easily connected
|
||||||
to another auxiliary info dataset.
|
to other auxiliary datasets.
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
$ tail -n '+2' adult_data.csv | awk -F',' '{count[$10] += $3;} \
|
$ tail -n '+2' adult_data.csv | awk -F',' '{count[$10] += $3;} \
|
||||||
@@ -79,23 +89,26 @@ Table: Sum of `fnlwgt` for each `sex`
|
|||||||
The sum of these values is 6,179,373,392. This value is much larger than the population of the
|
The sum of these values is 6,179,373,392. This value is much larger than the population of the
|
||||||
U.S.A., the origin of the dataset, which implies this attribute is not a count, as stated.
|
U.S.A., the origin of the dataset, which implies this attribute is not a count, as stated.
|
||||||
|
|
||||||
We also note there are substantially more Male than Female records (more than double the `fnlwgt`).
|
We also note there are substantially more Male than Female records:
|
||||||
|
the sum of `fnlwgt` for Male is more than double that of
|
||||||
|
Female, and the dataset contains 21790 Male rows against 10771
|
||||||
|
Female rows.
|
||||||
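The row counts and `fnlwgt` sums per `sex` can be obtained in a single pass. This is a minimal sketch, not the exact command used for the report: the `rows_and_weight_per_sex` helper and the sample rows are hypothetical, and the column positions (`fnlwgt` in column 3, `sex` in column 10, one header row) are assumed from the command shown earlier.

```sh
# Hypothetical sample with the assumed column layout (header row + 3 records).
sample='age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex
39,Private,100,Bachelors,13,Never-married,Sales,Not-in-family,White,Male
50,Private,200,HS-grad,9,Divorced,Sales,Unmarried,White,Female
28,Private,300,Bachelors,13,Married-civ-spouse,Sales,Husband,Black,Male'

# Prints "<sex> <row count> <sum of fnlwgt>" for each value of the sex column.
rows_and_weight_per_sex() {
  tail -n +2 | awk -F',' '
    { rows[$10]++; weight[$10] += $3 }
    END {
      # %.0f keeps large sums readable in awk variants with 32-bit %d
      for (s in rows) printf "%s %d %.0f\n", s, rows[s], weight[s]
    }' | sort
}

printf '%s\n' "$sample" | rows_and_weight_per_sex
```

Against the real file this would be run as `rows_and_weight_per_sex < adult_data.csv`. If the fields carry a leading space after each comma, the keys will include that space.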
|
|
||||||
### `education`
|
### `education`
|
||||||
|
|
||||||
This attribute presents a separation of 80.96%, which is quite high, thus we classified it as a QID.
|
This attribute presents a separation of 80.96%, which is quite high, thus we classified it as a QID.
|
||||||
|
|
||||||
![Hierarchy for attribute `education`](coding-model/hierarchies/education/education.png){width=14cm}
|
![Hierarchy for attribute `education`](coding-model/hierarchies/education/education.png){width=18cm}
|
||||||
|
|
||||||
|
\vspace{-2em}
|
||||||
|
|
||||||
### `education-num`
|
### `education-num`
|
||||||
|
|
||||||
We exported the anonymized dataset and used the following command to verify there weren't any discrepancies
|
We used the following command to verify there weren't any
|
||||||
between the `education` and `education-num` columns:
|
discrepancies between the `education` and `education-num` columns:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
$ cat anonymized.csv | sed -r 's/,([^ ])/\t\1/g' | awk -F';' '{print $5, $4}' | sort -un
|
$ cat adult_data.csv | awk -F',' '{print $5, $4}' | sort -un
|
||||||
```
|
```
|
||||||
|
|
||||||
Since there was a one-to-one mapping, we confirmed this was just a
|
Since there was a one-to-one mapping, we confirmed this was just a
|
||||||
@@ -103,7 +116,9 @@ representation of the `education` attribute. As such, this attribute
|
|||||||
receives the same classification, which is backed by the equally high
|
receives the same classification, which is backed by the equally high
|
||||||
separation value of 80.96%, so it's classified as a QID.
|
separation value of 80.96%, so it's classified as a QID.
|
||||||
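The one-to-one check between `education` and `education-num` can also be automated rather than read off the sorted pairs. The sketch below is one possible way to do it, not the method used for the report: `check_one_to_one` and the sample rows are hypothetical, and the column positions (`education` in column 4, `education-num` in column 5, one header row) are assumed from the command above.

```sh
# Hypothetical sample with the assumed column layout (header row + 3 records).
sample='age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex
39,Private,100,Bachelors,13,Never-married,Sales,Not-in-family,White,Male
50,Private,200,HS-grad,9,Divorced,Sales,Unmarried,White,Female
28,Private,300,Bachelors,13,Married-civ-spouse,Sales,Husband,Black,Male'

# Prints "one-to-one" if every education value maps to exactly one
# education-num and vice versa, otherwise prints "conflict".
check_one_to_one() {
  tail -n +2 | awk -F',' '
    {
      if (($4 in num) && num[$4] != $5) bad = 1   # same label, different number
      if (($5 in edu) && edu[$5] != $4) bad = 1   # same number, different label
      num[$4] = $5
      edu[$5] = $4
    }
    END { print (bad ? "conflict" : "one-to-one") }'
}

printf '%s\n' "$sample" | check_one_to_one
```

Tracking both directions matters: checking only `education` to `education-num` would miss two labels sharing the same number.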
|
|
||||||
![Hierarchy for attribute `education-num`](coding-model/hierarchies/education/education-num.png){width=14cm}
|
\vspace{-1em}
|
||||||
|
|
||||||
|
![Hierarchy for attribute `education-num`](coding-model/hierarchies/education/education-num.png){height=9.5cm}
|
||||||
|
|
||||||
|
|
||||||
### `marital-status`
|
### `marital-status`
|
||||||
@@ -111,21 +126,22 @@ separation value of 80.96%, so it's classified as a QID.
|
|||||||
With a relatively high separation value of 66.01%, together with the fact that it could be cross
|
With a relatively high separation value of 66.01%, together with the fact that it could be cross
|
||||||
referenced with other available datasets, we classify this attribute as a QID.
|
referenced with other available datasets, we classify this attribute as a QID.
|
||||||
|
|
||||||
![Hierarchy for attribute `marital-status`](coding-model/hierarchies/marital-status/marital-status.png){width=14cm}
|
![Hierarchy for attribute `marital-status`](coding-model/hierarchies/marital-status/marital-status.png){width=10cm}
|
||||||
|
|
||||||
|
|
||||||
### `occupation`
|
### `occupation`
|
||||||
|
|
||||||
With a separation of 90.02%, this attribute is classified as a QID.
|
With a separation of 90.02%, this attribute is classified as a QID.
|
||||||
|
|
||||||
![Hierarchy for attribute `occupation`](coding-model/hierarchies/occupation.png){width=14cm}
|
![Hierarchy for attribute `occupation`](coding-model/hierarchies/occupation.png){width=8cm}
|
||||||
|
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
### `relationship`
|
### `relationship`
|
||||||
|
|
||||||
Given its separation value of 73.21%, this attribute is classified as a QID.
|
Given its separation value of 73.21%, this attribute is classified as a QID.
|
||||||
|
|
||||||
![Hierarchy for attribute `relationship`](coding-model/hierarchies/relationship/relationship.png){width=14cm}
|
![Hierarchy for attribute `relationship`](coding-model/hierarchies/relationship/relationship.png){width=8cm}
|
||||||
|
|
||||||
|
|
||||||
### `race`
|
### `race`
|
||||||
@@ -134,7 +150,7 @@ This column presents some weirdly specific values (Amer-Indian-Eskimo), but has
|
|||||||
that this attribute could be cross-referenced with other datasets, it is classified as a QID, so
|
that this attribute could be cross-referenced with other datasets, it is classified as a QID, so
|
||||||
it may be transformed into more generic values.
|
it may be transformed into more generic values.
|
||||||
|
|
||||||
![Hierarchy for attribute `race`](coding-model/hierarchies/race.png)
|
![Hierarchy for attribute `race`](coding-model/hierarchies/race.png){width=7cm}
|
||||||
|
|
||||||
|
|
||||||
### `sex`
|
### `sex`
|
||||||
@@ -165,7 +181,7 @@ Doctorate | 86 | 327
|
|||||||
|
|
||||||
Table: Number of records with each `education` for each `sex`
|
Table: Number of records with each `education` for each `sex`
|
||||||
|
|
||||||
![Hierarchy for attribute `sex`](coding-model/hierarchies/sex.png)
|
![Hierarchy for attribute `sex`](coding-model/hierarchies/sex.png){width=7cm}
|
||||||
|
|
||||||
|
|
||||||
### `capital-gain` & `capital-loss`
|
### `capital-gain` & `capital-loss`
|
||||||
|
BIN
report.pdf
Binary file not shown.