Project Details
Predicting Factors Behind Employee Attrition
Using Logistic Regression, Random Forest Classifier, and XGBoost
This is the capstone project from the Google Advanced Data Analytics Professional Program on Coursera
Introduction
In today's competitive job market, employee attrition is a significant challenge for organizations. High turnover disrupts productivity, strains resources, and hurts overall business performance. To address this, I analyzed HR data to uncover the factors behind employee attrition and identify potential solutions.
Stakeholders
The stakeholders for this project include the HR department, management team, and company executives, all of whom are invested in understanding employee turnover and identifying factors contributing to attrition.
Goal
The goal of this project is to analyze employee turnover and identify the key factors influencing attrition. These insights let the company take proactive measures to improve employee retention and workforce management.
Objective
The objective was to analyze a comprehensive HR dataset and build predictive models to gain insights into why employees leave the company. By understanding the key drivers of attrition, we aimed to provide actionable recommendations to reduce turnover and boost employee retention.
Approach
Situation
Our dataset included various employee attributes such as satisfaction level, last evaluation, number of projects, average monthly hours, tenure, work accidents, promotions, salary, and department. It covered both employees who had left the company and those who had stayed.
Task
The primary task was to conduct Exploratory Data Analysis (EDA) to understand the relationships between variables and identify patterns and trends. We aimed to explore the impact of different factors on employee attrition and create predictive models for informed decision-making.
Action
During EDA, we identified and removed duplicate rows, renamed columns for consistency, and encoded categorical variables for model building. Box plots, scatter plots, and bar charts were then used to uncover patterns in the data.
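The cleaning steps above can be sketched with pandas. This is a minimal sketch, not the project's notebook: the toy rows are invented, and the raw column names (`average_montly_hours`, `time_spend_company`, `Work_accident`, `Department`) are assumed from the public HR dataset commonly used for this capstone. Ordinal encoding for salary and one-hot encoding for department are also assumptions about how the encoding was done.

```python
import pandas as pd

# Toy rows for illustration only. Column names are ASSUMED to match the
# public HR dataset; rows 0 and 2 are deliberate duplicates.
raw = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.38, 0.72],
    "average_montly_hours": [157, 262, 157, 223],
    "time_spend_company": [3, 6, 3, 5],
    "Work_accident": [0, 0, 0, 1],
    "Department": ["sales", "technical", "sales", "IT"],
    "salary": ["low", "medium", "low", "high"],
    "left": [1, 1, 1, 0],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop exact duplicate rows.
    df = df.drop_duplicates().copy()
    # 2. Rename columns to consistent snake_case (and fix the "montly" typo).
    df = df.rename(columns={
        "average_montly_hours": "average_monthly_hours",
        "time_spend_company": "tenure",
        "Work_accident": "work_accident",
        "Department": "department",
    })
    # 3. Salary levels are ordered, so map them to integers.
    df["salary"] = df["salary"].map({"low": 0, "medium": 1, "high": 2})
    # 4. Departments have no natural order, so one-hot encode them.
    df = pd.get_dummies(df, columns=["department"])
    return df

clean = preprocess(raw)
print(clean.shape)  # (3, 9)
```

Ordinal encoding keeps salary as a single numeric column that preserves its ranking, while one-hot encoding avoids implying an order between departments.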
We then applied machine learning models, including Logistic Regression, Random Forest, and XGBoost, to predict employee attrition. Hyperparameters were optimized using GridSearchCV, and models were evaluated based on accuracy, precision, recall, F1-score, and AUC-ROC.
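The tuning and evaluation step can be sketched with scikit-learn's `GridSearchCV`. This is a minimal sketch under stated assumptions: the data are synthetic (`make_classification`), and the parameter grid, 25% stratified test split, 4-fold cross-validation, and ROC AUC as the selection metric are illustrative choices, not the project's documented settings. XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, so it could be passed to `GridSearchCV` the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the HR features (about 17% positives;
# the ratio is an ASSUMPTION chosen only for illustration).
X, y = make_classification(n_samples=2000, n_features=9, weights=[0.83],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Small, hypothetical grid; the project's actual grid is not stated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 2],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # metric used to pick the best parameter combination
    cv=4,
)
search.fit(X_train, y_train)

model = search.best_estimator_
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1 (left)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc_roc": roc_auc_score(y_test, y_prob),  # needs scores, not labels
}
print(search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Evaluating on a held-out test set that the grid search never saw keeps the reported metrics from being inflated by the tuning itself.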
Comparison of Evaluation Metrics:
Model | Accuracy | Precision | Recall | F1-score | AUC-ROC |
---|---|---|---|---|---|
Logistic Regression | 82.51% | 46.69% | 26.19% | 33.56% | 88.13% |
Random Forest Classifier | 98.50% | 98.05% | 92.80% | 95.35% | 98.06% |
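The F1-scores in the table above can be checked from the precision and recall columns, since F1 is their harmonic mean. The check also shows why Logistic Regression's 82.51% accuracy is misleading on its own: its low recall drags F1 down to about a third, a gap that typically appears when the positive class (employees who left) is the minority.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall copied from the table above, as fractions.
lr = f1(0.4669, 0.2619)
rf = f1(0.9805, 0.9280)
print(f"{lr:.2%}, {rf:.2%}")  # 33.56%, 95.35%
```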
Result
Key insights from the analysis include:
- Low satisfaction levels and low tenure were strongly associated with higher attrition rates.
- Work accidents had a minimal impact on attrition.
- Lack of promotions was linked to increased attrition.
- Sales and technical departments experienced higher attrition.
- Salary level mattered: employees in lower salary brackets were more likely to leave.
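Group-level findings such as the salary and department results above can be computed as attrition rates per group. This is a minimal sketch on invented toy rows (not the project's data), assuming a 0/1 `left` column and plain-text `salary` and `department` columns.

```python
import pandas as pd

# Toy data for illustration only; these are NOT the project's figures.
df = pd.DataFrame({
    "salary": ["low", "low", "low", "medium", "medium", "high", "high", "low"],
    "department": ["sales", "technical", "sales", "IT", "sales", "IT",
                   "technical", "technical"],
    "left": [1, 0, 1, 0, 1, 0, 0, 1],
})

# With a 0/1 target, the group mean is the share of employees who left.
by_salary = df.groupby("salary")["left"].mean().sort_values(ascending=False)
by_department = df.groupby("department")["left"].mean().sort_values(ascending=False)
print(by_salary)
print(by_department)
```

Comparing rates rather than raw counts keeps large groups, such as a big sales team, from looking worse simply because they have more employees.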
Conclusion
Our analysis identified satisfaction, tenure, promotions, department, and salary as the factors most associated with employee attrition. Organizations can use these findings to take targeted actions aimed at improving employee retention.
Recommendations
Based on the results, we recommend:
- Conducting regular employee satisfaction surveys to identify and address areas of improvement.
- Implementing talent development programs to offer career advancement opportunities and promotions.
- Reviewing and adjusting salary scales to ensure competitiveness and retain top talent.
- Providing support for work-life balance to reduce burnout and stress.
Next Steps
While the Random Forest model showed high accuracy and predictive power (the Logistic Regression baseline caught only 26.19% of leavers), continuous monitoring and refinement are essential. Organizations should regularly retrain the models on fresh data to keep them relevant and effective.
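One way to make that monitoring concrete is to re-score the deployed model on newly labeled data and flag it when performance slips. This is a hypothetical sketch: the `needs_retraining` helper, the choice of recall as the tracked metric (missing an employee who leaves is assumed to be the costlier error), and the 0.05 tolerance are all assumptions; the 0.928 baseline is the Random Forest recall from the table.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import recall_score

def needs_retraining(model, X_new, y_new, baseline_recall, tolerance=0.05):
    """HYPOTHETICAL helper: flag the model if recall on fresh labeled data
    falls more than `tolerance` below the recall measured at deployment."""
    current = recall_score(y_new, model.predict(X_new))
    return bool(current < baseline_recall - tolerance), current

# Demo: a "stale" model that always predicts 0 (stayed) misses every leaver.
X = np.zeros((6, 2))
y = np.array([1, 0, 1, 0, 1, 0])
stale = DummyClassifier(strategy="constant", constant=0).fit(X, y)
flag, current = needs_retraining(stale, X, y, baseline_recall=0.928)
print(flag, current)  # True 0.0
```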
Ethical Considerations
Throughout the project, we adhered to ethical principles, ensuring the confidentiality and privacy of sensitive employee data.
In conclusion, our data-driven insights into employee attrition offer a roadmap for organizations to build a motivated and engaged workforce. Employee retention is an investment in the future, and by taking informed actions, companies can create a thriving workplace and foster long-term success.