Case Study
House Prices Prediction
2025
- Python
- scikit-learn
- pandas
Explored housing data with pandas and built linear regression models in scikit-learn to predict house prices.
Problem & Motivation:
Given a dataset of houses, the goal was to predict the price column from the input columns while keeping the validation and test sets out of model fitting, so that no information leaked from them into training.
Data & Approach:
- Explored the dataset using pandas to understand the rows, columns, and basic statistics.
- Split the data into train, validation, and test sets using the provided code.
- Trained two linear regression models in scikit-learn: one using a small set of basic features and one using a larger set of advanced features.
- Evaluated both models by computing the root mean squared error (RMSE) on the training and validation sets.
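The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`sqft`, `bedrooms`, `bathrooms`, `lot_size`, `price`) and the two feature lists are hypothetical, and scikit-learn's `train_test_split` with an assumed 60/20/20 split stands in for the provided splitting code, which is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real dataset and its columns are assumptions here.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
    "lot_size": rng.uniform(1000, 20000, n),
})
df["price"] = (
    150 * df["sqft"]
    + 10_000 * df["bedrooms"]
    + 15_000 * df["bathrooms"]
    + 2 * df["lot_size"]
    + rng.normal(0, 20_000, n)
)

# Basic exploration: dimensions and summary statistics.
print(df.shape)
print(df.describe())

# Assumed 60/20/20 split: hold out the test set first, then carve out validation.
train_val, test = train_test_split(df, test_size=0.2, random_state=0)
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

# Hypothetical feature sets: a small basic one and a larger advanced one.
basic_features = ["sqft", "bedrooms"]
advanced_features = ["sqft", "bedrooms", "bathrooms", "lot_size"]


def rmse(model, data, features, target="price"):
    """Root mean squared error of the model's predictions on the given rows."""
    pred = model.predict(data[features])
    return float(np.sqrt(mean_squared_error(data[target], pred)))


results = {}
for name, features in [("basic", basic_features), ("advanced", advanced_features)]:
    # Fit on the training rows only; validation rows are used for scoring alone.
    model = LinearRegression().fit(train[features], train["price"])
    results[name] = {
        "train": rmse(model, train, features),
        "val": rmse(model, val, features),
    }
    print(name, results[name])
```

Fitting only on `train` and scoring on `val` is what keeps the comparison free of leakage; the test rows are not touched at this stage.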
Results:
- The model with the advanced features achieved a lower RMSE on the validation data, so it was the one used to compute the final test error.
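The selection step can be illustrated with a short sketch: fit each candidate on the training rows, pick the one with the lowest validation RMSE, and only then score that single model on the test set. The helper name `select_and_test`, the columns `sqft` and `age`, and the synthetic data are all hypothetical and chosen for illustration; they do not reflect the project's real features or results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


def select_and_test(feature_sets, train, val, test, target="price"):
    """Hypothetical helper: choose a feature set by validation RMSE, then test once."""
    fitted, val_rmse = {}, {}
    for name, cols in feature_sets.items():
        # Each candidate is fitted on the training rows only.
        fitted[name] = LinearRegression().fit(train[cols], train[target])
        val_rmse[name] = rmse(val[target], fitted[name].predict(val[cols]))
    # The validation score decides the winner; the test set plays no part in the choice.
    best = min(val_rmse, key=val_rmse.get)
    cols = feature_sets[best]
    test_rmse = rmse(test[target], fitted[best].predict(test[cols]))
    return best, val_rmse, test_rmse


# Synthetic stand-in data (assumption): price depends on both sqft and age.
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({"sqft": rng.uniform(500, 4000, n), "age": rng.uniform(0, 80, n)})
df["price"] = 150 * df["sqft"] - 1000 * df["age"] + rng.normal(0, 10_000, n)

# Rows are already in random order, so positional slicing gives a 60/20/20 split.
train, val, test = df.iloc[:360], df.iloc[360:480], df.iloc[480:]

feature_sets = {"basic": ["sqft"], "advanced": ["sqft", "age"]}
best, val_rmse, test_rmse = select_and_test(feature_sets, train, val, test)
print("selected:", best)
print("validation RMSE:", val_rmse)
print("test RMSE:", test_rmse)
```

Computing the test error once, for the selected model only, keeps it an honest estimate; comparing candidates on the test set would turn it into a second validation set.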
Limitations:
Only linear regression was used; no other feature sets or modeling choices were explored beyond the assignment requirements.