FRITO-LAY ATTRITION
Employee Retention Predictive Analytics
Statistical analysis of employee attrition across 870 employee records, using machine learning models to identify key retention factors and predict turnover risk.
The project compared several modeling approaches and turned the results into targeted retention strategies aimed at reducing turnover costs.
Dataset Overview
The dataset contained employee demographics, job characteristics, satisfaction metrics, and compensation data, which together served as the inputs for predictive modeling.
Key Findings
Attrition Patterns
- Higher attrition among younger employees, especially those earning below $5,000
- Employees with 1-5 years of tenure showed 10% higher attrition than those with 6-10 years
- Job satisfaction alone was not a decisive factor in retention
Departmental Insights
- Sales: $6,500 average income, 20% attrition rate
- R&D: $5,800 average income, 12% attrition rate
- Advanced job levels showed only 2% lower attrition than entry levels
Model Performance
Attrition Prediction
Naïve Bayes with Feature Engineering
Salary Prediction
Linear Regression Model
Features: Age, Department, YearsAtCompany, JobLevel, TotalWorkingYears
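As a rough illustration of how a linear salary model of this kind can be fitted, the sketch below solves the least-squares normal equations in plain Python. It is not the project's code: the function names `fit_ols` and `predict` are hypothetical, only numeric predictors are handled (a categorical feature such as Department would need one-hot encoding first), and any data passed to it here would be toy data.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit ordinary least squares; returns [intercept, coef_1, ..., coef_k].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Hypothetical helper, not from the source.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Predicted income for one record given fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
```

Calling `fit_ols` on rows of, say, (TotalWorkingYears, JobLevel) and a list of incomes returns the intercept followed by one coefficient per predictor, which `predict` then applies to a new record.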
Business Recommendations
Competitive Salary Review
Address salary discrepancies across departments
Targeted Retention Programs
Focus on high-risk groups identified by the model
Enhanced Job Satisfaction
Implement measures beyond traditional satisfaction metrics
Flexible Work Arrangements
Address work-life balance concerns
Revise Salary Structure
Align compensation with performance and market rates
Foster Internal Mobility
Create clear career advancement pathways
Implement Predictive Analytics
Use models for proactive retention interventions
Policy Changes
Target interventions at employees with 1-5 years of tenure
Technical Approach
Feature Engineering
Model performance improved after adding engineered features: income bands, interaction terms between job satisfaction and income, high-income flags, and satisfaction-adjusted income metrics.
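The four kinds of engineered features named above could be derived along the following lines. This is a minimal sketch under stated assumptions: the field names `MonthlyIncome` and `JobSatisfaction`, the band cut points, the high-income threshold, and the definition of satisfaction-adjusted income as income divided by satisfaction are all illustrative choices, not taken from the project.

```python
# Hypothetical cut points; the source only singles out $5,000 as a notable level.
INCOME_BANDS = [(5000, "low"), (10000, "mid")]
HIGH_INCOME_THRESHOLD = 10000  # assumed value, not from the source


def income_band(monthly_income: float) -> str:
    """Return the label of the first band whose upper bound exceeds the income."""
    for upper, label in INCOME_BANDS:
        if monthly_income < upper:
            return label
    return "high"


def engineer_features(record: dict) -> dict:
    """Return a copy of the record with the four engineered features added.

    Assumes JobSatisfaction is a positive score (for example 1 to 4).
    """
    income = record["MonthlyIncome"]
    satisfaction = record["JobSatisfaction"]
    out = dict(record)
    out["IncomeBand"] = income_band(income)
    out["SatisfactionXIncome"] = satisfaction * income        # interaction term
    out["HighIncome"] = int(income >= HIGH_INCOME_THRESHOLD)  # 0/1 flag
    out["SatisfactionAdjustedIncome"] = income / satisfaction  # assumed definition
    return out
```

Keeping the cut points in named constants makes it easy to try alternative bands when checking how sensitive the classifier is to them.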
Model Selection
The project evaluated KNN, linear regression, and Naïve Bayes models. Naïve Bayes with the engineered features performed best, with balanced sensitivity and specificity.
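To make the Naïve Bayes idea concrete, here is a small from-scratch classifier for categorical features. The source does not say which Naïve Bayes variant or library was used, so this is only an illustrative sketch: the class name `CategoricalNB`, the add-one (Laplace) smoothing, and the tuple-per-row input format are assumptions.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[label][feature_index][value] -> number of occurrences
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[label][j][value] += 1
                self.values[j].add(value)
        return self

    def predict_proba(self, row):
        """Return {label: posterior probability} for one record."""
        log_scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)  # log prior
            for j, value in enumerate(row):
                numerator = self.counts[c][j][value] + 1
                denominator = self.class_counts[c] + len(self.values[j])
                score += math.log(numerator / denominator)  # log likelihood
            log_scores[c] = score
        # Normalise in log space for numerical stability
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(exp_scores.values())
        return {c: v / total for c, v in exp_scores.items()}
```

Because the model multiplies per-feature likelihoods, banded or flagged versions of numeric columns can be passed in directly as categorical values.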
Statistical Validation
All models were statistically significant (p < 0.001) and were checked with cross-validation. A classification threshold of 0.6 gave the best balance between false positives and false negatives.
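The effect of a classification threshold on sensitivity and specificity can be checked with a few lines of code. The sketch below assumes that attrition is the positive class (coded 1) and that a record is flagged when its predicted probability is at least the threshold; the source does not state either convention, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def classify(probabilities: Sequence[float], threshold: float = 0.6) -> List[int]:
    """Flag a record as predicted attrition (1) when its probability meets the threshold."""
    return [int(p >= threshold) for p in probabilities]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


def sweep(y_true: Sequence[int], probabilities: Sequence[float],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each candidate threshold."""
    results = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(y_true, classify(probabilities, t))
        results.append((t, sens, spec))
    return results
```

Running `sweep` over a grid of thresholds on held-out predictions shows the trade-off directly: raising the threshold reduces false positives at the cost of more false negatives.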