A Genetic Algorithm-based Local Outlier Factor for Efficient Big Data Stream Processing
Alghushairy, Omar Saleh. (2021-05). A Genetic Algorithm-based Local Outlier Factor for Efficient Big Data Stream Processing. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/alghushairy_idaho_0089e_12040.html
- Title:
- A Genetic Algorithm-based Local Outlier Factor for Efficient Big Data Stream Processing
- Author:
- Alghushairy, Omar Saleh
- Date:
- 2021-05
- Keywords:
- Data Science; Outlier Detection
- Program:
- Computer Science
- Subject Category:
- Computer science
- Abstract:
- Interest in outlier detection methods is growing because detecting outliers is an important operation in many applications, such as credit card fraud detection, network intrusion detection, and data analysis across many domains. We are now in the big data era, and data streams are an important type of big data. As the need to analyze high-velocity data streams grows, older outlier detection methods become difficult to apply efficiently. The Local Outlier Factor (LOF) is a well-known outlier detection algorithm. A major challenge of LOF is that it requires the entire dataset and the distance values to be stored in memory. Another issue is that LOF must be recalculated from scratch whenever the dataset changes. This research proposes a novel local outlier detection algorithm for data streams, called the Genetic-based Incremental Local Outlier Factor (GILOF). We further improve GILOF's performance on data streams by proposing a new calculation method for LOF, called the Local Outlier Factor by Reachability distance (LOFR). The resulting improved algorithm for local outlier detection in data streams is called the Genetic-based Incremental Local Outlier Factor by Reachability distance (GILOFR). Both GILOF and GILOFR work without prior knowledge of the data distribution and can execute in limited memory. Experiments on various real-world datasets demonstrate that the proposed algorithms perform very well in both execution time and outlier detection accuracy.
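To make the two LOF limitations named in the abstract concrete (the full dataset and its distances held in memory, and recomputation after any change), here is a minimal sketch of the standard batch LOF definition. It is not GILOF or LOFR, whose details are not given in this record. It assumes Euclidean distance, uses exactly k neighbours (ties broken arbitrarily rather than included), and assumes no group of more than k identical points; the function name `lof_scores` is hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def lof_scores(points: List[Point], k: int) -> List[float]:
    """Standard batch LOF: every score depends on the full pairwise distance matrix."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # Full n x n Euclidean distance matrix kept in memory (O(n^2) space).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself.
    neighbours = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        neighbours.append(others[:k])

    # k-distance(o): distance from o to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    # Assumption: no more than k duplicates, so the sum below is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(k / sum(reach))

    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    # The four square corners score 1.0; the isolated point scores about 13.09.
    print(lof_scores(data, k=2))
```

Because each score depends on the whole distance matrix and every point's neighbour set, appending one new point can change the k-distances, densities, and scores of existing points, so this batch version has to be rerun from scratch on every update. That is the behaviour an incremental, memory-bounded approach such as GILOF is designed to avoid.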
- Description:
- doctoral, Ph.D., Computer Science -- University of Idaho - College of Graduate Studies, 2021-05
- Major Professor:
- Ma, Xiaogang
- Committee:
- Soule, Terence; Sheldon, Frederick; Song, Jia
- Defense Date:
- 2021-05
- Identifier:
- Alghushairy_idaho_0089E_12040
- Type:
- Text
- Format:
- application/pdf
- Rights:
- In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
- Standardized Rights:
- http://rightsstatements.org/vocab/InC-EDU/1.0/