Dataset Imbalance treatment with re-samplers pipeline
Okorie, Ndifreke (2022)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2022060214854
Abstract
Class imbalance in datasets is a long-standing problem, and it receives considerable attention because of its impact on model outcomes. A dataset is imbalanced when one class is over-represented and another is under-represented. Many data analysts balance their samples with over-sampling or under-sampling. In this thesis we apply both methods, together with SMOTE (Synthetic Minority Oversampling Technique), a derivative of over-sampling, in order to compare their performance against that of two re-sampling techniques combined in a pipeline. The work compares and contrasts the results of the standalone and ensembled techniques, notes the pros and cons of each method, and shows that the ensembled method works to the advantage of the minority class. The experiment is repeated on a second, severely imbalanced dataset, and the consistency of the results across both datasets lends them credence.
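The abstract does not state which two re-samplers the pipeline combines. As a hedged illustration only, the pure-Python sketch below assumes one common pairing: SMOTE over-sampling of the minority class followed by random under-sampling of the majority class, on a binary problem where label 1 is the minority. The function names `smote` and `resample_pipeline`, the ratio parameters and the toy data are hypothetical and not taken from the thesis. Libraries such as imbalanced-learn provide maintained versions of these steps (`SMOTE`, `RandomUnderSampler`, and `imblearn.pipeline.Pipeline` to chain them).

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def resample_pipeline(X, y, over_ratio=0.5, under_ratio=1.0, k=5, seed=0):
    """Hypothetical two-step pipeline: SMOTE the minority class (label 1) up to
    over_ratio * |majority|, then randomly under-sample the majority (label 0)
    until |minority| / |majority| reaches under_ratio."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]

    # Step 1: over-sample the minority class with SMOTE.
    target_min = int(over_ratio * len(majority))
    minority += smote(minority, max(0, target_min - len(minority)), k, rng)

    # Step 2: randomly drop majority samples (without replacement).
    target_maj = int(len(minority) / under_ratio)
    if target_maj < len(majority):
        majority = rng.sample(majority, target_maj)

    return majority + minority, [0] * len(majority) + [1] * len(minority)


# Toy 2-D data: 200 majority points vs 10 minority points (20:1 imbalance).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
X += [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]
y = [0] * 200 + [1] * 10

X_res, y_res = resample_pipeline(X, y)
print(y_res.count(0), y_res.count(1))  # → 100 100
```

In this sketch, over-sampling first means SMOTE does not have to close the whole gap by synthesising points, and under-sampling afterwards discards fewer real majority samples than under-sampling alone would (100 here instead of 190).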