Employee Turnover Prediction Model Based on Feature Selection and Imbalanced Data Handling

Project Code: TCMAPY2109

Objective

This study predicts employee turnover using the Kaggle HR dataset. A machine learning pipeline, including preprocessing, EDA, feature selection (correlation matrix, Chi-Square, RFE), and class imbalance handling (SMOTE, GAN), was applied. Five models were evaluated: SVM + CNN, Stacking Classifier, LightGBM, CatBoost, and a hybrid SVM + CNN model. The hybrid model, trained on GAN-balanced data with RFE features, achieved the best performance with 98% accuracy, 92% recall, and 95% F1-score. The approach was also applied to the IBM HR dataset, showing robust performance across datasets.
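The SMOTE step named above can be illustrated with a minimal sketch. This is not the project's implementation and not the imbalanced-learn API: it is a simplified, standard-library version of the core idea (interpolating between a minority-class row and one of its nearest minority-class neighbours). The function name `smote_sketch`, the parameter values, and the sample rows are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base row (excluding the row itself)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already-scaled minority rows (two numeric features per employee)
minority_rows = [[0.10, 0.80], [0.12, 0.85], [0.40, 0.50], [0.38, 0.55]]
new_rows = smote_sketch(minority_rows, n_new=6)
for row in new_rows:
    print([round(value, 3) for value in row])
```

Because every synthetic row lies on a line segment between two real minority rows, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.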

Abstract

Employee turnover prediction is vital for organizations aiming to retain talent and reduce operational costs. In this study, we utilized the Kaggle HR dataset to predict employee turnover through a comprehensive machine learning pipeline, including preprocessing, exploratory data analysis (EDA), and feature selection using correlation matrix analysis, Chi-Square tests, and Recursive Feature Elimination (RFE). To address class imbalance, we applied SMOTE and GAN-based oversampling techniques. We evaluated five models: SVM with CNN, Stacking Classifier, LightGBM, CatBoost, and a hybrid SVM + CNN model, assessing their performance using recall, F1-score, Cohen’s Kappa, ROC-AUC, PR-AUC, and confusion matrix. Results revealed that the hybrid SVM + CNN model, trained on GAN-balanced data and using RFE-selected features, achieved the best performance with an accuracy of 0.98, precision of 0.98, recall of 0.92, F1-score of 0.95, and Cohen’s Kappa of 0.94. The Stacking Classifier, LightGBM, and CatBoost also performed well, with results comparable to the hybrid model. The methodology was applied to the IBM HR dataset, where the same techniques improved minority class detection, demonstrating the robustness and generalizability of the framework across different datasets.
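As a rough illustration of how the reported metrics relate to one another, the sketch below derives accuracy, precision, recall, F1-score, and Cohen's Kappa from the four cells of a binary confusion matrix. The counts are made up for illustration and were picked only so that the outputs land near the figures quoted above; they are not the study's confusion matrix, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa: observed agreement corrected for agreement expected by chance
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts: 100 employees who left (positive class), 400 who stayed
m = binary_metrics(tp=92, fp=2, fn=8, tn=398)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With an imbalanced target, accuracy alone can look high even when many leavers are missed, which is why recall, F1-score, and Cohen's Kappa are reported alongside it.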

Keywords: Employee turnover prediction, human resource management, Kaggle HR dataset, machine learning, recursive feature elimination (RFE), SMOTE

NOTE: Please do not submit this to the college without our team's consent. This abstract varies based on student requirements.

Block Diagram

Specifications

SOFTWARE REQUIREMENTS

Operating System : Windows 7/8/10

Front End : HTML, CSS, Bootstrap & JavaScript

Programming Language : Python

Libraries : Flask, Pandas, TensorFlow, Keras, scikit-learn, NumPy, Seaborn, mysql.connector

IDE/Workbench : VS Code

Server Deployment : XAMPP Server

Database : SQLite

 

HARDWARE REQUIREMENTS

Processor - Intel i3 processor

RAM - 8 GB (min)

Hard Disk - 128 GB

Keyboard - Standard Windows Keyboard

Mouse - Two- or Three-Button Mouse

Monitor - Any
