Loan approvals play a crucial role in financial stability, helping individuals and businesses achieve their goals. Banks and financial institutions, however, need a robust predictive model to assess creditworthiness and minimize risk. This project tackles that challenge using XGBoost, a gradient boosting library widely used for structured (tabular) data. Our goal? To predict loan approvals with high accuracy, using extensive Exploratory Data Analysis (EDA) and feature importance techniques to understand what drives each decision.
Before building our predictive model, we conducted a thorough EDA to uncover trends, distributions, and relationships between variables. Here are the key visual insights:
A histogram showcasing the distribution of loan amounts, revealing common loan ranges and outliers.
A count plot visualizing the proportion of approved and rejected loans, helping us understand class imbalances.
A scatter plot of annual income against loan amount, highlighting risk patterns.
A histogram showing the distribution of CIBIL scores, a key factor in loan approvals.
A histogram to analyze how loan terms vary across applications, indicating typical repayment periods.
A bar chart showing whether self-employed individuals have higher rejection rates.
A visualization comparing loan approval rates across different education levels.
A correlation heatmap revealing how different features interact, helping us select the most predictive variables.
A boxplot showing how loan amounts differ for approved vs. rejected applications.
A pairplot showing the relationships between major features like income, loan amount, CIBIL score, and loan term.
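Two of the checks above, rejection rate by group and pairwise correlation, can be sketched without any plotting library. This is a minimal illustration, not the code in EDA.py: the rows are made-up toy values, and the column names (`self_employed`, `income`, `loan_amount`, `approved`) are assumptions rather than the dataset's actual schema.

```python
from collections import defaultdict
from math import sqrt

# Toy rows for illustration only; not taken from the project's dataset.
rows = [
    {"self_employed": "Yes", "income": 40, "loan_amount": 100, "approved": 0},
    {"self_employed": "Yes", "income": 90, "loan_amount": 250, "approved": 1},
    {"self_employed": "No", "income": 60, "loan_amount": 150, "approved": 1},
    {"self_employed": "No", "income": 30, "loan_amount": 90, "approved": 0},
    {"self_employed": "No", "income": 80, "loan_amount": 210, "approved": 1},
]


def rejection_rate_by(rows, key):
    """Share of rejected applications (approved == 0) for each value of `key`."""
    total = defaultdict(int)
    rejected = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        rejected[row[key]] += 1 - row["approved"]
    return {group: rejected[group] / total[group] for group in total}


def pearson(xs, ys):
    """Pearson correlation, the quantity shown in each cell of a correlation heatmap."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rates = rejection_rate_by(rows, "self_employed")
corr = pearson([r["income"] for r in rows], [r["loan_amount"] for r in rows])
print("Rejection rate by employment type:", rates)
print("Income vs. loan amount correlation:", round(corr, 3))
```

On real data the same two numbers are what the bar chart and the heatmap visualize; the plots only make them easier to compare across many features.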
To build a high-performing model, we utilized XGBoost, an optimized gradient boosting algorithm known for its efficiency and predictive power. Feature selection was guided by gain-based and SHAP-based importance scores.
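Gain-based importance aggregates, for each feature, how much the splits on that feature reduce the training loss. As a rough sketch of where a single split's gain comes from, the snippet below evaluates the split-gain formula from XGBoost's boosted-trees tutorial for log loss. The labels, predictions, and the `lam` and `gamma` values are illustrative assumptions, not settings taken from mix_XGBoost.py.

```python
def logistic_grad_hess(y_true, p_pred):
    """Gradient and Hessian of log loss with respect to the raw score (log-odds)."""
    grads = [p - y for y, p in zip(y_true, p_pred)]
    hess = [p * (1 - p) for p in p_pred]
    return grads, hess


def split_gain(grads, hess, left_idx, lam=1.0, gamma=0.0):
    """Gain of sending the rows in `left_idx` to the left child and the rest to the right.

    gain = 0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    """
    left = set(left_idx)
    g_left = sum(g for i, g in enumerate(grads) if i in left)
    h_left = sum(h for i, h in enumerate(hess) if i in left)
    g_right = sum(grads) - g_left
    h_right = sum(hess) - h_left

    def leaf_score(g, h):
        return g * g / (h + lam)

    parent = leaf_score(g_left + g_right, h_left + h_right)
    return 0.5 * (leaf_score(g_left, h_left) + leaf_score(g_right, h_right) - parent) - gamma


# Six toy applications: three rejected (0), three approved (1),
# all starting from an uninformative prediction of 0.5.
y = [0, 0, 0, 1, 1, 1]
p = [0.5] * 6
grads, hess = logistic_grad_hess(y, p)

clean_split = split_gain(grads, hess, left_idx=[0, 1, 2])  # separates the classes
mixed_split = split_gain(grads, hess, left_idx=[0, 3])     # mixes both classes
print("Gain of the clean split:", clean_split)
print("Gain of the mixed split:", mixed_split)
```

A feature whose thresholds separate approved from rejected applications cleanly earns large gains, which is why it ranks high in a gain-based importance chart.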
These scores indicate an extremely accurate model with minimal prediction errors.
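For a binary approve/reject task, accuracy is usually read alongside precision, recall, and F1, especially when the classes are imbalanced. The sketch below shows how these four scores are derived from predictions. It assumes these are the metrics of interest, and the label lists are toy values, not output from the trained model.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = approved, 0 = rejected.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```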
Understanding which factors impact loan approvals the most is crucial. We analyzed feature importance using two techniques: gain-based importance from XGBoost and SHAP values.
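SHAP attributes each prediction to its input features using Shapley values. To make the idea concrete, here is a small sketch that computes exact Shapley values by brute force over feature orderings. The scoring rule `toy_score`, the feature names, and the baseline applicant are all hypothetical; the SHAP library computes the same kind of attribution far more efficiently for tree models, and this is not the project's code.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from the baseline applicant
        previous = predict(current)
        for name in order:             # switch features to the real values one by one
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical approval score; stands in for a trained model's raw output."""
    score = 0.01 * x["cibil_score"] - 0.002 * x["loan_amount"]
    if x["cibil_score"] >= 700 and x["income"] >= 50:
        score += 1.0                   # interaction between credit score and income
    return score


baseline = {"cibil_score": 600, "income": 40, "loan_amount": 200}
applicant = {"cibil_score": 750, "income": 80, "loan_amount": 300}

phi = shapley_values(toy_score, applicant, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the applicant's score and the baseline score. Averaging the absolute contributions over many applicants gives a global ranking of the kind a mean-impact SHAP chart shows.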
To replicate the results, follow these steps:
git clone https://github.com/DhruvSTrivedi/loan_approval_prediction
cd loan_approval_prediction
Ensure you have Python and the required libraries installed:
pip install -r requirements.txt
python EDA.py
This will generate EDA visuals in the EDA_Visuals folder.
python mix_XGBoost.py
This will train the XGBoost model and display key performance metrics.
Results, including feature importance and SHAP values, will be saved as images in the output directory.
├── EDA.py # Exploratory Data Analysis Script
├── mix_XGBoost.py # XGBoost Model Training Script
├── feature_importance_gain.png # Gain-Based Feature Importance
├── Feature_importance.png # Overall Feature Importance
├── mean_shap_impact.png # SHAP Value Mean Impact
├── shap_value_impact.png # SHAP Summary Plot
└── README.md # Project Documentation
Contributions are welcome! Feel free to fork the repo and submit PRs.
This project is licensed under the MIT License.
📌 Happy Coding! 🚀