Hybrid model approach in data mining
- Batuhan Bakirarar
- Erdal Cosgun
- Atilla Halil Elhan
Communications in Statistics - Simulation and Computation
Studies on hybrid data mining approaches have been increasing in recent years. Hybrid data mining is defined as an effective combination of various data mining techniques that exploits the strengths of each technique and compensates for the weaknesses of the others. The purpose of this study is to present state-of-the-art data mining algorithms and applications and to propose a new hybrid data mining approach for classifying medical data. The study also aimed to calculate performance metrics for the individual data mining methods and to compare them with the metrics obtained from the hybrid model. The study used simulated datasets generated under various scenarios and the hepatitis dataset obtained from the UCI database. Supervised learning algorithms were used, and hybrid models were created by combining these algorithms. In the simulated datasets, MCC (Matthews correlation coefficient) values increased with larger sample sizes and with higher correlation between the independent variables. In addition, as the correlation between independent variables increased in imbalanced datasets, a noticeable increase was observed in the performance metrics of the group with the smaller sample size. A similar pattern was observed with the real data.
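Because the abstract reports its results in terms of MCC, a minimal sketch of how that metric is computed from a binary confusion matrix may help. The function name `mcc` and the counts below are illustrative assumptions, not code or results from the study. Returning 0.0 when the denominator is zero is a common convention, and it is assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (assumed convention),
    since the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only to contrast balanced and imbalanced groups.
balanced = dict(tp=45, tn=45, fp=5, fn=5)    # 50 positives, 50 negatives
imbalanced = dict(tp=5, tn=90, fp=0, fn=5)   # 10 positives, 90 negatives

for name, counts in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
```

In the imbalanced example the classifier misses half of the smaller group, yet accuracy stays high because the larger group dominates. MCC uses all four cells of the confusion matrix, so it drops noticeably, which is one reason it is often preferred when group sizes differ.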