soc-2024/DATASET.md
Daniel Svitan dfbd7498ed 📝 Updates docs
2024-12-21 21:53:45 +01:00

# Clean dataset
The cleaned dataset will have the following structure:

| Index | Name | Type | Range |
|-------|------------|-------|-------|
| 0 | Grade | int | [1-5] |
| 1 | Sex | enum | [0-1] |
| 2 | GPA | float | [1-5] |
| 3 | Math | int | [1-5] |
| 4 | Slovak | int | [1-5] |
| 5 | English | int | [1-5] |
| 6 | SES | enum | [0-2] |
| 7 | Occupation | enum | [0-5] |
| 8 | Living | enum | [0-4] |
| 9 | Commute | enum | [0-4] |
| 10 | Sleep | enum | [0-2] |
| 11 | Absence | int | [0-∞] |

It will be saved as a `.npy` file (NumPy's binary array format).

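
As a rough illustration of the layout above, the sketch below writes two made-up rows to a `.npy` file, reads them back, and checks each column against the ranges in the table. The names `COLUMNS`, `BOUNDS`, `validate`, and the file name `dataset.npy` are hypothetical, and storing everything as `float64` (because GPA is fractional) is an assumption, not something this document specifies.

```python
import os
import tempfile

import numpy as np

# Column order taken from the table above; the constant names are illustrative.
COLUMNS = ["Grade", "Sex", "GPA", "Math", "Slovak", "English",
           "SES", "Occupation", "Living", "Commute", "Sleep", "Absence"]

# Inclusive (low, high) bounds per column; Absence has no upper bound.
BOUNDS = {
    "Grade": (1, 5), "Sex": (0, 1), "GPA": (1, 5), "Math": (1, 5),
    "Slovak": (1, 5), "English": (1, 5), "SES": (0, 2), "Occupation": (0, 5),
    "Living": (0, 4), "Commute": (0, 4), "Sleep": (0, 2), "Absence": (0, np.inf),
}


def validate(data):
    """Return True if `data` is (n, 12) and every column is within its bounds."""
    if data.ndim != 2 or data.shape[1] != len(COLUMNS):
        return False
    for index, name in enumerate(COLUMNS):
        low, high = BOUNDS[name]
        column = data[:, index]
        # NaN compares False against any bound, so reject it explicitly.
        if np.isnan(column).any() or np.any(column < low) or np.any(column > high):
            return False
    return True


# Two made-up rows, stored as float64 because GPA is fractional (an assumption).
rows = np.array([
    [2, 0, 1.8, 2, 1, 2, 1, 5, 0, 2, 1, 14],
    [4, 1, 3.2, 4, 3, 3, 0, 2, 3, 0, 0, 57],
], dtype=np.float64)

path = os.path.join(tempfile.mkdtemp(), "dataset.npy")  # hypothetical file name
np.save(path, rows)
loaded = np.load(path)
```

A single two-dimensional array holds one dtype, so the integer and enum columns come back as floats here and would need a cast (for example `loaded[:, 0].astype(int)`) before being used as indices.
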
### Sex
```
0 - female
1 - male
```
### SES
```
0 - lower class
1 - middle class
2 - upper class
```
### Occupation
```
0 - work hours / week >= 10
1 - work hours / week < 10
2 - sport
3 - music
4 - other
5 - none
```
### Living
```
0 - with family
1 - with family member
2 - alone / roommates
3 - dorms
4 - other
```
### Commute
```
0 - dorms
1 - <= 15 min
2 - <= 30 min
3 - <= 1h
4 - > 1h
```
### Sleep
```
0 - short sleepers
1 - medium sleepers
2 - long sleepers
```
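
The enum codes listed above can be turned back into readable labels with a small lookup table. The following is a minimal sketch under that assumption; `ENUM_LABELS` and `decode_row` are hypothetical names, and the sample row is invented for illustration.

```python
import numpy as np

# Labels copied from the listings above, keyed by column index.
ENUM_LABELS = {
    1: ["female", "male"],                                        # Sex
    6: ["lower class", "middle class", "upper class"],            # SES
    7: ["work hours / week >= 10", "work hours / week < 10",
        "sport", "music", "other", "none"],                       # Occupation
    8: ["with family", "with family member", "alone / roommates",
        "dorms", "other"],                                        # Living
    9: ["dorms", "<= 15 min", "<= 30 min", "<= 1h", "> 1h"],      # Commute
    10: ["short sleepers", "medium sleepers", "long sleepers"],   # Sleep
}


def decode_row(row):
    """Map the enum columns of one row to their labels (hypothetical helper)."""
    return {index: labels[int(row[index])] for index, labels in ENUM_LABELS.items()}


# One invented row in the column order of the table.
row = np.array([2, 0, 1.8, 2, 1, 2, 1, 5, 0, 2, 1, 14])
decoded = decode_row(row)
```

Keying the lookup by column index keeps it aligned with the `Index` column of the table, so a change in column order only needs to be reflected in one place.
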