Case Study
Loan Safety with Decision Trees and a Small Random Forest
2025
- Python
- pandas
- scikit-learn
- Decision Trees
Used LendingClub loan data to predict whether a loan is safe (+1) or risky (-1) with decision trees and a simple random forest, and compared training vs validation accuracy across different tree depths.
Problem & Motivation:
Given LendingClub data with features like grade, home ownership, purpose, term, and debt-to-income ratio, predict whether a loan is a safe loan (+1) or a risky loan (-1).
Data & Approach:
- Loaded lending-club-data.csv, created the safe_loans label from bad_loans, and explored features such as grade and home_ownership.
- Selected the assignment’s feature list and used pd.get_dummies to one-hot encode the categorical columns into the numeric format scikit-learn expects.
- Trained DecisionTreeClassifier models with different max_depth values, and used GridSearchCV over max_depth and min_samples_leaf to tune early-stopping settings.
- Implemented a small RandomForest416 class that fits multiple trees on bootstrap samples and predicts by majority vote, then compared its train/validation accuracy to a single tree.
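The loading and labeling steps above might look like the following sketch. A tiny inline DataFrame stands in for lending-club-data.csv (which the real notebook would read with pd.read_csv), and the column subset is illustrative rather than the assignment's exact feature list:

```python
import pandas as pd

# Toy stand-in for lending-club-data.csv; the real case study loads the CSV
# with pd.read_csv("lending-club-data.csv"). Column names follow the write-up.
loans = pd.DataFrame({
    "grade": ["A", "B", "C", "A"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE"],
    "term": [" 36 months", " 60 months", " 36 months", " 36 months"],
    "dti": [12.5, 20.1, 8.3, 15.0],
    "bad_loans": [0, 1, 0, 1],  # 1 means the loan went bad
})

# safe_loans label: +1 for safe loans, -1 for risky ones.
loans["safe_loans"] = loans["bad_loans"].apply(lambda b: +1 if b == 0 else -1)

# One-hot encode the categorical columns; numeric columns (dti) pass through.
features = ["grade", "home_ownership", "term", "dti"]
X = pd.get_dummies(loans[features])
y = loans["safe_loans"]
```

pd.get_dummies leaves numeric columns untouched and expands each categorical column into one indicator column per category (e.g. grade_A, grade_B), which is what makes these features consumable by scikit-learn trees.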
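The depth/leaf-size tuning step could be sketched as below. The data here is synthetic (a stand-in for the encoded loan features), and the specific grid values are assumptions, not the assignment's exact grid:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the one-hot encoded loan matrix.
X_grid = rng.normal(size=(200, 5))
y_grid = (X_grid[:, 0] + X_grid[:, 1] > 0).astype(int)

# Grid over the two early-stopping knobs mentioned above:
# max_depth (pre-pruning depth limit) and min_samples_leaf (leaf size floor).
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 10], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
grid.fit(X_grid, y_grid)
best_params = grid.best_params_
```

GridSearchCV cross-validates every combination in the grid and exposes the winning settings via best_params_, so the tuned tree can be refit on the full training split.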
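A minimal sketch of the RandomForest416 idea (bootstrap-sampled trees plus majority vote) is shown below. The constructor signature and tie-breaking rule are assumptions; the assignment's actual class may differ:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomForest416:
    """Sketch of a bagged forest: each tree fits a bootstrap sample of
    (X, y); predictions are the majority vote across trees."""

    def __init__(self, n_trees=10, max_depth=None, seed=0):
        self._rng = np.random.default_rng(seed)
        self._trees = [
            DecisionTreeClassifier(max_depth=max_depth, random_state=i)
            for i in range(n_trees)
        ]

    def fit(self, X, y):
        n = len(X)
        for tree in self._trees:
            # Bootstrap: sample n row indices with replacement.
            idx = self._rng.integers(0, n, size=n)
            tree.fit(X[idx], y[idx])
        return self

    def predict(self, X):
        # Stack per-tree predictions: shape (n_trees, n_samples).
        votes = np.stack([tree.predict(X) for tree in self._trees])
        # With +1/-1 labels the sign of the vote sum is the majority class;
        # the small epsilon breaks exact ties toward +1.
        return np.sign(votes.sum(axis=0) + 1e-9).astype(int)
```

Because each tree sees a slightly different resample of the data, the averaged vote tends to cancel out individual trees' overfit splits, which is the mechanism behind the validation-accuracy gains reported below.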
Results:
- Deeper trees fit the training data very well but did not always improve validation accuracy, showing overfitting at large depths.
- The RandomForest416 model generally gave better validation accuracy than a single decision tree at similar depths.
- Categorical features such as grade, sub_grade, home_ownership, purpose, and term became usable by scikit-learn once expanded into one-hot encoded columns.
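The overfitting pattern in the first result can be reproduced with a small depth sweep. The data here is synthetic with deliberately noisy labels (a stand-in for the loan features), so the exact accuracy numbers are illustrative only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Synthetic features with label noise, standing in for the loan data.
X_sweep = rng.normal(size=(500, 8))
y_sweep = ((X_sweep[:, 0] + 0.5 * rng.normal(size=500)) > 0).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_sweep, y_sweep, test_size=0.3, random_state=0
)

# Train accuracy keeps climbing with depth; validation accuracy plateaus
# or drops once the tree starts memorizing noise.
for depth in [2, 6, 20]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_va, y_va))
```

Plotting (or printing) both scores per depth makes the train/validation gap visible, which is how the case study diagnosed overfitting at large depths.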
Limitations:
Evaluated only decision trees and a small random forest on train/validation splits, with no separate held-out test set and no model families beyond the assignment scope.