Think about that you simply’ve skilled a predictive mannequin with an accuracy rating as excessive as 0.9. The analysis metrics like precision, recall and f1-score additionally seem promising. However your expertise and instinct advised you that one thing isn’t proper so you probably did additional investigation and located this:
The mannequin’s seemingly sturdy efficiency is pushed by the bulk class 0
in its goal variable. Because of the evident imbalance between the bulk and minority courses, the mannequin excels at predicting its majority class 0
whereas the efficiency of the minority class 1
is much from passable. Nonetheless, as a result of class 1
represents a really small portion of the goal variable, its efficiency has little influence on the general scores of those analysis metrics, which provides you an phantasm that the mannequin is robust.
This isn’t a uncommon case. Quite the opposite, knowledge scientists often come throughout imbalanced datasets within the real-world tasks. An imbalanced dataset refers to a dataset the place the courses or classes will not be…