75 lines
1.1 KiB
Markdown
75 lines
1.1 KiB
Markdown
# Clean dataset
|
|
|
|
The cleaned dataset will have the following structure:
|
|
|
|
| Index | Name | Type | Range |
|
|
|-------|------------|-------|-------|
|
|
| 0 | Grade | int | [1-5] |
|
|
| 1 | Sex | enum | [0-1] |
|
|
| 2 | GPA | float | [1-5] |
|
|
| 3 | Math | int | [1-5] |
|
|
| 4 | Slovak | int | [1-5] |
|
|
| 5 | English | int | [1-5] |
|
|
| 6 | SES | enum | [0-2] |
|
|
| 7 | Occupation | enum | [0-5] |
|
|
| 8 | Living | enum | [0-4] |
|
|
| 9 | Commute | enum | [0-4] |
|
|
| 10 | Sleep | enum | [0-2] |
|
|
| 11 | Absence | int | [0-∞] |
|
|
|
|
It will be saved in a `.npy` file (numpy format)
|
|
|
|
### Sex
|
|
|
|
```
|
|
0 - female
|
|
1 - male
|
|
```
|
|
|
|
### SES
|
|
|
|
```
|
|
0 - lower class
|
|
1 - middle class
|
|
2 - upper class
|
|
```
|
|
|
|
### Occupation
|
|
|
|
```
|
|
0 - work hours / week >= 10
|
|
1 - work hours / week < 10
|
|
2 - sport
|
|
3 - music
|
|
4 - other
|
|
5 - none
|
|
```
|
|
|
|
### Living
|
|
|
|
```
|
|
0 - with family
|
|
1 - with family member
|
|
2 - alone / roomates
|
|
3 - dorms
|
|
4 - other
|
|
```
|
|
|
|
### Commute
|
|
|
|
```
|
|
0 - dorms
|
|
1 - <= 15m
|
|
2 - <= 30m
|
|
3 - <= 1h
|
|
4 - > 1h
|
|
```
|
|
|
|
### Sleep
|
|
|
|
```
|
|
0 - short sleepers
|
|
1 - medium sleepers
|
|
2 - long sleepers
|
|
```
|