Learning Imbalanced Data Sets with Noisy Replication
Dong, Ensheng. (2017). Learning Imbalanced Data Sets with Noisy Replication. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/dong_idaho_0089n_11106.html
- Title:
- Learning Imbalanced Data Sets with Noisy Replication
- Author:
- Dong, Ensheng
- Date:
- 2017
- Keywords:
- Imbalanced data; Machine learning; Noisy replication
- Program:
- Statistical Sciences
- Subject Category:
- Statistics; Mathematics
- Abstract:
- Previous research has shown the noisy replication method to be an effective approach to learning from imbalanced binary data sets. This thesis extends the concept and tests its effectiveness in broader scenarios: several levels of noise (sigma), a wide range of imbalance ratios (IR), eight commonly used machine learning models, both binary and multi-class data sets, the addition of both noise and anti-noise, and more than 60 simulated and real data sets. The thesis finds that, for some models (KNN, neural network, and C5.0, for instance), adding a relatively small amount of noise significantly improves the performance of the noisy replication method as the IR increases. It further shows that the noisy replication method is an ideal model-free approach to learning from both binary and multi-class imbalanced data sets in terms of ROC area and Kullback-Leibler distance.
- Description:
- masters, M.S., Statistical Sciences -- University of Idaho - College of Graduate Studies, 2017
- Major Professor:
- Lee, Stephen S
- Committee:
- Wiest, Michelle M; Gao, Fuchang
- Defense Date:
- 2017
- Identifier:
- Dong_idaho_0089N_11106
- Type:
- Text
- Format Original:
- Format:
- application/pdf
- Rights:
- In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
- Standardized Rights:
- http://rightsstatements.org/vocab/InC-EDU/1.0/
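
The abstract names the technique but this record does not describe its mechanics, so the following is only a minimal sketch of one plausible reading, not the thesis's implementation. It assumes that noisy replication means copying rows of the under-represented classes and adding zero-mean Gaussian noise until every class matches the majority count, that sigma scales each feature's standard deviation, and that IR is the majority count divided by the smallest class count. The function names `imbalance_ratio` and `noisy_replicate` are hypothetical, and anti-noise is not covered.

```python
import random
from collections import Counter
from statistics import pstdev


def imbalance_ratio(labels):
    """Assumed definition of IR: largest class count over smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def noisy_replicate(rows, labels, sigma=0.1, seed=0):
    """Balance classes by replicating under-represented rows with Gaussian noise.

    Assumption: the noise for each feature has standard deviation
    sigma * (population standard deviation of that feature).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Per-feature noise scale, computed over the whole data set.
    scales = [sigma * pstdev(column) for column in zip(*rows)]

    new_rows, new_labels = [list(r) for r in rows], list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        # Add (target - n) noisy copies; the majority class adds none.
        for _ in range(target - n):
            base = rng.choice(pool)
            new_rows.append([v + rng.gauss(0.0, s) for v, s in zip(base, scales)])
            new_labels.append(cls)
    return new_rows, new_labels
```

For a data set with four rows of class 0 and one row of class 1, `imbalance_ratio` returns 4.0 before the call and 1.0 after it. The original rows are kept unchanged and the noisy copies are appended after them.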