Edu2Job is a machine learning project developed as part of the Infosys Springboard Internship (Batch 3).
The goal is to predict suitable job roles based on a candidate’s educational qualification, experience, and skills.
- Dataset Source: Kaggle –
candidate_job_role_dataset - Tasks Completed:
- Data Cleaning (duplicates, missing values, normalization)
- Feature extraction (
num_skills,qualification_course) - Exploratory Data Analysis (EDA)
- Visualization using Seaborn & Matplotlib
Cleaned file: cleaned_job_roles.csv
Notebook: milestone1_EDA.ipynb
- Encoded categorical features (qualification, experience_level, etc.)
- Standardized numerical columns
- Feature Selection
- Split data into train and test sets
Machine Learning Models Used -Logistic Regression -Decision Tree Classifier -Random Forest Classifier -Support Vector Machine (SVM) -XGBoost Classifier
Techniques Applied
1️.Cross-Validation: To ensure reliability and reduce bias, k-Fold Cross Validation (k=5) was applied to assess each model’s performance consistency across different splits.
2️.Regularization: Regularization techniques were applied to control overfitting by limiting model complexity Logistic Regression: Used L2 regularization Random Forest: Reduced max_depth, max_features, and increased min_samples_split XGBoost: Tuned gamma, reg_alpha, and reg_lambda parameters
3️.Hyperparameter Tuning (RandomizedSearchCV): To identify optimal hyperparameters efficiently, RandomizedSearchCV was applied to XGBoost and Random Forest models to achieve better generalization and balanced accuracy.
- XGBoost - best model
- Streamlit – for the web interface
- SQLite3 – for storing user and prediction data
- Scikit-learn / Joblib – for machine learning model handling
- Predicts top 3 job roles based on user profile
- Uses trained ML model (
best_model.pkl) for prediction - Stores user data using SQLite
- Simple and interactive Streamlit web interface
- Modular design with separate database helper file (
db_helper.py) - Admin Dashboard
This project is licensed under the MIT License – see the 'LICENSE' file for details.
For any questions, reach out to Milka Joseph.