Induced Partitioning for Incremental Feature Selection via Rough Set Theory and Long-tail Position Grey Wolf Optimizer

Said Al Afghani Edsa, Khamron Sunat

Issue: 1/2025
Journal: Acta Informatica Pragensia
DOI: 10.18267/j.aip.254

Keywords: Optimizer; Rough set theory; Feature selection; Incremental; Data partitioning


Abstract: Background: Feature selection methods play a crucial role in handling challenges such as imbalanced classes, noisy data and high dimensionality. However, existing techniques, including swarm intelligence and set-theoretic approaches, often struggle with high-dimensional datasets because they repeatedly reassess the selected features, which increases processing time and computational cost.

Objective: This study aims to develop an enhanced incremental feature selection method that minimizes dependency on the initial dataset while improving computational efficiency. Specifically, the approach focuses on dynamic sampling and adaptive optimization to address the challenges in high-dimensional data environments.

Methods: We implement a dynamic sampling approach based on rough set theory, integrating the Long-Tail Position Grey Wolf Optimizer. The method adjusts incrementally to new data samples without revisiting the original dataset for feature selection, reducing variance across partitioned datasets. Performance is evaluated on benchmark datasets against existing techniques.
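To make the two building blocks concrete, the sketch below pairs a rough-set dependency degree (the fraction of samples whose equivalence class under the chosen features maps to a single decision label) with a plain binary grey wolf optimizer that searches over feature subsets. This is an illustrative simplification, not the paper's method: the long-tail position update and the incremental partitioning scheme are omitted, and all names (`dependency_degree`, `binary_gwo_select`, the size-penalty weight) are assumptions for the example.

```python
import random
from collections import defaultdict

def dependency_degree(data, labels, feature_subset):
    """Rough-set dependency gamma_B(D): share of samples whose equivalence
    class under the selected features is consistent with one label."""
    if not feature_subset:
        return 0.0
    classes = defaultdict(list)
    for row, y in zip(data, labels):
        key = tuple(row[i] for i in feature_subset)
        classes[key].append(y)
    # Positive region: equivalence classes with a single decision label.
    pos = sum(len(ys) for ys in classes.values() if len(set(ys)) == 1)
    return pos / len(data)

def binary_gwo_select(data, labels, n_wolves=8, n_iter=30, seed=0):
    """Toy binary grey wolf optimizer over feature subsets (illustrative)."""
    rng = random.Random(seed)
    n_feat = len(data[0])

    def fitness(mask):
        subset = [i for i, b in enumerate(mask) if b]
        # Reward dependency, lightly penalize subset size.
        return dependency_degree(data, labels, subset) - 0.01 * len(subset) / n_feat

    wolves = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_wolves)]
    for t in range(n_iter):
        wolves.sort(key=fitness, reverse=True)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2 - 2 * t / n_iter  # coefficient shrinks over iterations
        for w in wolves[3:]:
            for j in range(n_feat):
                leader = rng.choice((alpha, beta, delta))
                # Copying a leader bit becomes more likely as `a` shrinks.
                if rng.random() < 1 - a / 2:
                    w[j] = leader[j]
                elif rng.random() < 0.1:
                    w[j] = 1 - w[j]  # occasional flip keeps exploration alive
    best = max(wolves, key=fitness)
    return [i for i, b in enumerate(best) if b]
```

In an incremental setting, the key property the abstract emphasizes is that `dependency_degree` can be recomputed on a new partition of samples alone, so the optimizer does not need to re-scan the original dataset when new data arrives.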

Results: Experimental evaluations demonstrate that the proposed method outperforms existing techniques in terms of F1 score, precision, recall and computation time. The incremental adjustment and reduced dependence on the initial data improve the overall accuracy and efficiency of feature selection in high-dimensional contexts.

Conclusion: This study offers a significant advancement in feature selection methods for high-dimensional datasets. By addressing computational demands and improving accuracy, the proposed approach contributes to data science and machine learning, paving the way for more efficient and reliable feature selection processes in complex data environments. Future work may focus on extending this method to new optimization frameworks and enhancing its adaptability.