# Used datasets

The column "feat." is the number of values (features) for each point in the dataset. The column "points total" is the number of points in the minority set plus the number of points in the majority set. The column "minority" contains the number of points in the minority set. The column "common" is the number of points in the majority set that also occur in the minority set plus the number of points in the minority set that also occur in the majority set. If "common" is not zero, then even a perfect classifier will produce false positive or false negative predictions, because identical points carry different labels.

| dataset | feat. | points total | minority | common |
|---|---|---|---|---|
| folding | | | | |
| $~$ abalone (17 vs. 7, 8, 9, 10) | 8 | 2338 | 58 | 0 |
| $~$ abalone (9-18) | 8 | 731 | 42 | 0 |
| $~$ car good | 6 | 1728 | 69 | 0 |
| $~$ car vgood | 6 | 1728 | 65 | 0 |
| $~$ flare-F | 11 | 1066 | 43 | 93 |
| $~$ hypothyroid | 25 | 3163 | 151 | 2 |
| $~$ kddcup (guess passwd vs. satan) | 38 | 1642 | 53 | 0 |
| $~$ kr-vs-k (3 vs. 11) | 6 | 2935 | 81 | 0 |
| $~$ kr-vs-k (zero-one vs. draw) | 6 | 2901 | 105 | 0 |
| $~$ shuttle (2 vs. 5) | 9 | 3316 | 49 | 0 |
| $~$ winequality (red 4) | 11 | 1599 | 53 | 0 |
| $~$ yeast4 | 10 | 1484 | 51 | 0 |
| $~$ yeast5 | 10 | 1484 | 44 | 0 |
| $~$ yeast6 | 10 | 1484 | 35 | 0 |
| imblearn | | | | |
| $~$ webpage | 300 | 34780 | 981 | 152 |
| $~$ mammography | 6 | 11183 | 260 | 3329 |
| $~$ protein homo | 74 | 145751 | 1296 | 0 |
| $~$ ozone level | 72 | 2536 | 73 | 0 |
| kaggle | | | | |
| $~$ creditcard | 30 | 284807 | 492 | 0 |
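
The column definitions given before the table can be sketched in a few lines of Python. This is only an illustration, not the script used to produce the numbers above: the function name `dataset_stats` is hypothetical, and it assumes that two points count as "the same" when all their feature values are exactly equal and that every duplicate is counted separately.

```python
def dataset_stats(majority, minority):
    """Compute the table columns for one dataset.

    `majority` and `minority` are sequences of points; each point is a
    sequence of feature values. (Hypothetical helper for illustration.)
    """
    # Convert points to tuples so they can be stored in sets.
    maj_rows = [tuple(p) for p in majority]
    min_rows = [tuple(p) for p in minority]
    maj_set = set(maj_rows)
    min_set = set(min_rows)

    # Assumption: points match only if all feature values are exactly equal,
    # and each duplicate point is counted once per occurrence.
    maj_in_min = sum(1 for p in maj_rows if p in min_set)
    min_in_maj = sum(1 for p in min_rows if p in maj_set)

    return {
        "feat.": len(maj_rows[0]) if maj_rows else len(min_rows[0]),
        "points total": len(maj_rows) + len(min_rows),
        "minority": len(min_rows),
        "common": maj_in_min + min_in_maj,
    }


# Toy data: the point (1, 1) appears twice in the majority set
# and once in the minority set.
majority = [(0, 0), (1, 1), (1, 1), (2, 2)]
minority = [(1, 1), (3, 3)]
stats = dataset_stats(majority, minority)
print(stats)
```

In the toy data, two majority points and one minority point are shared, so "common" is 3. No classifier can label all three correctly, since identical inputs appear with both labels.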