“IBM” HR Analytics Data (EDA)
Using analytics to decode hiring
Data analytics is a major driver of change for HR professionals across industries. From hiring the right talent to improving employee retention, HR analytics can inform every stage of the process.
“Today HR has a seat at the table, and in order to maintain that business partnership, you need to have an analytics framework” — Andy Kaslow
To explore the HR analytics domain, I downloaded the IBM HR Analytics dataset from Kaggle. The dataset is fairly detailed and covers both employees who have left the firm and those who are still with it.
Let’s begin with the EDA process.
Data Preparation and Cleaning
- Reading the CSV file and doing an initial statistical analysis (shape, values, etc.)
- Data preprocessing: reading the unique values for each column and removing the columns that won’t be significant in the rest of the analysis.
- Create a new dataframe to proceed with the analysis further.
for col in emp_df.columns:  # emp_df is the DataFrame read from the CSV
    print("No of Values for {} is {}".format(col, emp_df[col].nunique()))
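The unique-value counts above can feed directly into the column clean-up step. Below is a minimal sketch of one way to do that, not necessarily the approach used in the full notebook. The tiny frame is made-up stand-in data, and the column names `EmployeeCount` and `EmployeeNumber` are assumed to match the Kaggle file.

```python
import pandas as pd

# Made-up stand-in for emp_df; the values are illustrative only.
emp_df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Attrition": ["Yes", "No", "Yes", "No"],
    "EmployeeCount": [1, 1, 1, 1],   # same value in every row
    "EmployeeNumber": [1, 2, 4, 5],  # identifier, assumed column name
})

# A column with a single distinct value cannot separate one employee from another.
constant_cols = [col for col in emp_df.columns if emp_df[col].nunique() == 1]

# Identifier columns are dropped by name rather than by rule, because a
# genuine numeric feature can also be unique per row in a small sample.
id_cols = ["EmployeeNumber"]

# New DataFrame used for the rest of the analysis.
emp_clean = emp_df.drop(columns=constant_cols + id_cols)
print(emp_clean.columns.tolist())
```

Keeping the original `emp_df` untouched and working on `emp_clean` makes it easy to bring a column back later if it turns out to matter.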
Exploratory Analysis and Visualization
- Finding patterns in the data through visualization and revealing hidden trends.
- Using both the matplotlib and seaborn libraries to visualize the data
- Finding relationships between features using bar graphs, histograms, box plots, and heatmaps
- Analyzing the numerical and the categorical columns separately
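Splitting the columns by type, as the last point describes, can be done with pandas' `select_dtypes`. This is a small sketch on made-up rows. The column names are assumed to follow the Kaggle dataset, and the full notebook may separate the columns differently.

```python
import pandas as pd

# Made-up rows for illustration only.
emp_df = pd.DataFrame({
    "Age": [41, 49, 37],
    "MonthlyIncome": [5993, 5130, 2090],
    "Department": ["Sales", "Research & Development", "Research & Development"],
    "Attrition": ["Yes", "No", "Yes"],
})

# Numeric columns suit histograms, box plots, and correlation heatmaps.
numeric_cols = emp_df.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical and suits bar graphs and counts.
categorical_cols = emp_df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```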
I came up with a few questions to guide my analysis of the reasons for attrition, and I tried to apply the concepts I had learned to answer them.
Q1: Which department and job role has seen the most attrition?
I have displayed the rate of attrition across departments and job roles. As we can see, Human Resources and Technical Degree had the highest attrition rates, while Sales Representative was the job role most affected by attrition.
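For readers who want to reproduce a rate like this, here is a minimal sketch. It assumes `Attrition` holds the strings "Yes" and "No" and uses made-up rows. The helper `attrition_rate_by` is a hypothetical name, not something from the original notebook.

```python
import pandas as pd

# Made-up rows for illustration only.
emp_df = pd.DataFrame({
    "JobRole": ["Sales Representative", "Sales Representative", "Manager", "Manager"],
    "Attrition": ["Yes", "No", "No", "No"],
})

def attrition_rate_by(df, column):
    """Share of employees who left, per value of `column`, highest first."""
    left = df["Attrition"].eq("Yes")  # True for employees who left
    # The mean of a boolean Series is the fraction of True values.
    return left.groupby(df[column]).mean().sort_values(ascending=False)

rates = attrition_rate_by(emp_df, "JobRole")
print(rates)
```

Comparing rates instead of raw counts matters because the groups differ in size: a small department can have few leavers in absolute terms and still have the highest rate.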
Q2: What is the average number of companies employees worked at previously?
I created a column called generation based on the age of employees, to see whether younger and older generations of employees behave differently. The average is currently lower for millennials than for boomers, which is expected: they are younger and have had less time to change jobs. In the future, I would like to study the attrition rate by generation to get a better picture.
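A generation column of this kind can be derived with `pd.cut`. The sketch below is only an illustration: the age cut-offs and labels are hypothetical, since the exact boundaries are not stated here, and the rows are made up. `Age` and `NumCompaniesWorked` are assumed to be the column names in the Kaggle file.

```python
import pandas as pd

# Made-up rows for illustration only.
emp_df = pd.DataFrame({
    "Age": [25, 30, 45, 58, 60],
    "NumCompaniesWorked": [1, 3, 4, 6, 8],
})

# Hypothetical age boundaries; adjust them to the definition you prefer.
# With the default right=True, the bins are (17, 37], (37, 54], (54, 100].
emp_df["Generation"] = pd.cut(
    emp_df["Age"],
    bins=[17, 37, 54, 100],
    labels=["Millennials", "Generation X", "Boomers"],
)

# Average number of previous companies per generation.
avg_companies = (
    emp_df.groupby("Generation", observed=True)["NumCompaniesWorked"].mean()
)
print(avg_companies)
```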
Q3: Does promotion affect attrition?
As we can see in the scatter plot above, all the employees who left had been promoted less than one year earlier, while employees who have not been promoted are still with the company. This suggests that employees may wait at a company until they get a promotion and then start looking for opportunities outside as soon as they are promoted.
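A scatter plot can hide how many points overlap, so a small table is a useful cross-check. This is a sketch under stated assumptions: the rows are made up, `YearsSinceLastPromotion` is assumed to be the relevant column in the Kaggle file, and the one-year threshold simply mirrors the observation above.

```python
import pandas as pd

# Made-up rows for illustration only.
emp_df = pd.DataFrame({
    "YearsSinceLastPromotion": [0, 0, 1, 5, 7, 0],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Label each employee by how recently they were promoted.
recent = emp_df["YearsSinceLastPromotion"] < 1
emp_df["PromotionStatus"] = recent.map(
    {True: "under 1 year", False: "1 year or more"}
)

# normalize="index" turns counts into shares within each promotion group,
# so each row of the table sums to 1.
promo_table = pd.crosstab(
    emp_df["PromotionStatus"], emp_df["Attrition"], normalize="index"
)
print(promo_table)
```

One caveat worth checking on the real data: a value of 0 years may also include recent hires, so it can help to look at tenure alongside this table.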
Inferences and Conclusion
- The correlation graph showed the correlations between features, in particular that performance rating and salary hike are highly correlated.
- The Sales Representative job role has the highest attrition rate.
- Most employees leave their job after getting a promotion.
- Within the same department, employees who left have a lower monthly income than those who stayed.
- Promotion impacts the attrition rate: employees who get promoted leave sooner than those who did not get promoted.
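The first inference comes from a correlation matrix. Here is a minimal sketch of how such a matrix can be computed and searched for its strongest pair. The rows are made up, and `PercentSalaryHike` and `PerformanceRating` are assumed to be the column names behind "hike" and "performance rating".

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
emp_df = pd.DataFrame({
    "PercentSalaryHike": [11, 14, 18, 22, 25],
    "PerformanceRating": [3, 3, 3, 4, 4],
    "Age": [30, 52, 41, 28, 45],
})

# Pearson correlation between every pair of numeric columns.
corr = emp_df.select_dtypes(include="number").corr()

# Keep only the upper triangle so each pair appears once and the
# diagonal (always 1.0) is ignored.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pair of columns with the largest absolute correlation.
strongest_pair = pairs.abs().idxmax()
print(strongest_pair, round(pairs[strongest_pair], 2))
```

The same `corr` frame can be passed to seaborn's `heatmap` function to draw the correlation graph.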
References and Future Work
I plan to apply a machine learning algorithm to this dataset to predict whether employees will leave the organization in the next X months.
References:
- https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas
- https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
- https://matplotlib.org/3.1.1/index.html
- https://pandas.pydata.org/pandas-docs/stable/index.html
- https://www.geeksforgeeks.org/
- https://seaborn.pydata.org/examples/index.html
You can see the complete code on my GitHub profile:
Suggestions
Please share your thoughts, ideas, and suggestions below.
Thanks for the read!