loan_approval_prediction

Loan Approval Prediction using XGBoost

📖 Project Story: Why This Matters?

Loan approvals play a crucial role in financial stability, helping individuals and businesses achieve their goals. However, banks and financial institutions need a robust predictive model to assess creditworthiness and minimize risks. This project tackles this challenge using XGBoost, one of the most powerful machine learning algorithms for structured data. Our goal? To predict loan approvals with high accuracy, leveraging extensive Exploratory Data Analysis (EDA) and feature importance techniques to gain deep insights.


📊 Exploratory Data Analysis (EDA)

Before building our predictive model, we conducted a thorough EDA to uncover trends, distributions, and relationships between variables. Here are the key visual insights:


1️⃣ Loan Amount Distribution

A histogram showcasing the distribution of loan amounts, revealing common loan ranges and outliers.

Loan Amount Distribution


2️⃣ Loan Status Distribution (Approved vs. Rejected)

A count plot visualizing the proportion of approved and rejected loans, helping us understand class imbalances.

Loan Amount Distribution


3️⃣ Income vs. Loan Amount (Scatter Plot)

A scatter plot depicting how annual income influences loan amounts, highlighting risk patterns.

Loan Amount Distribution


4️⃣ CIBIL Score Distribution

A histogram showing the distribution of CIBIL scores, which is a key factor in loan approvals.

Loan Amount Distribution


5️⃣ Loan Term Distribution

A histogram to analyze how loan terms vary across applications, indicating typical repayment periods.

Loan Amount Distribution


6️⃣ Self-Employment vs. Loan Status

A bar chart showing whether self-employed individuals have higher rejection rates.

Loan Amount Distribution


7️⃣ Education Level Impact

A visualization comparing loan approval rates across different education levels.

Loan Amount Distribution


8️⃣ Feature Correlation Heatmap

A correlation heatmap revealing how different features interact, helping us select the most predictive variables.

Loan Amount Distribution


9️⃣ Loan Amount by Loan Status

A boxplot showing how loan amounts differ for approved vs. rejected applications.

Loan Amount Distribution


🔟 Pairplot of Key Features

A pairplot showing the relationships between major features like income, loan amount, CIBIL score, and loan term.

Loan Amount Distribution


🚀 Model Used: XGBoost

To build a high-performing model, we utilized XGBoost, an optimized gradient boosting algorithm known for its efficiency and predictive power. Feature selection was guided by gain-based and SHAP-based importance scores.

🎯 Final Model Performance:

These scores indicate an extremely accurate model with minimal prediction errors.


🔥 Feature Importance Analysis

Understanding which factors impact loan approvals the most is crucial. We analyzed feature importance using two techniques:

1️⃣ Gain-Based Feature Importance

Feature Importance (Gain)


2️⃣ SHAP Value Impact

Feature Importance (Weight)


Mean SHAP Impact


SHAP Value Impact


🛠️ How to Run This Project?

To replicate the results, follow these steps:

1️⃣ Clone the Repository

git clone https://github.com/DhruvSTrivedi/loan_approval_prediction
cd loan-approval-prediction

2️⃣ Install Dependencies

Ensure you have Python and the required libraries installed:

pip install -r requirements.txt

3️⃣ Run EDA Analysis

python EDA.py

This will generate EDA visuals in the EDA_Visuals folder.

4️⃣ Train the Model

python mix_XGBoost.py

This will train the XGBoost model and display key performance metrics.

5️⃣ Evaluate Predictions

Results, including feature importance and SHAP values, will be saved as images in the output directory.


📌 Project Structure

├── EDA.py                    # Exploratory Data Analysis Script
├── mix_XGBoost.py            # XGBoost Model Training Script
├── feature_importance_gain.png  # Gain-Based Feature Importance
├── Feature_importance.png     # Overall Feature Importance
├── mean_shap_impact.png       # SHAP Value Mean Impact
├── shap_value_impact.png      # SHAP Summary Plot
├── README.md                 # Project Documentation

🎯 Future Improvements


🤝 Contributing

Contributions are welcome! Feel free to fork the repo and submit PRs.


📜 License

This project is licensed under the MIT License.

📌 Happy Coding! 🚀