When we do classification in ML, we often assume that the target label is evenly distributed across our dataset. This helps the training algorithm learn the features, since we have enough examples for each case. For example, when training a spam filter, we would like a good amount of data for both spam and non-spam emails.
Such an even distribution is not always possible. Here I'll discuss one technique, known as undersampling, that helps us tackle this issue.
Undersampling is one of the techniques used for handling class imbalance. In this technique, we randomly drop examples from the majority class until its size matches that of the minority class.
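As a minimal sketch of the idea, the snippet below builds a hypothetical imbalanced dataset (900 majority examples, 100 minority) and randomly undersamples the majority class down to the minority-class size using NumPy; the variable names and class sizes are illustrative, not from any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced dataset: 900 examples of class 0, 100 of class 1
X = rng.normal(size=(1000, 3))
y = np.array([0] * 900 + [1] * 100)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# Randomly keep only as many majority examples as there are minority examples
sampled_majority_idx = rng.choice(majority_idx, size=len(minority_idx), replace=False)

# Combine and shuffle to get a balanced dataset
balanced_idx = np.concatenate([sampled_majority_idx, minority_idx])
rng.shuffle(balanced_idx)

X_bal, y_bal = X[balanced_idx], y[balanced_idx]
print(np.bincount(y_bal))  # both classes now have 100 examples
```

Libraries such as imbalanced-learn offer a ready-made `RandomUnderSampler` that does the same thing, but the core operation is just this random subsampling of the majority class.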