Case Study

House Prices Prediction

2025
  • Python
  • scikit-learn
  • pandas

Used pandas to explore a housing dataset and scikit-learn linear regression to build models that predict house prices.

Problem & Motivation:

Given a dataset of houses, the goal was to predict the price column from the input columns without leaking information from the validation or test sets into training.

Data & Approach:

  • Explored the dataset using pandas to understand the rows, columns, and basic statistics.
  • Split the data into train, validation, and test sets using the provided code.
  • Trained two linear regression models in scikit-learn: one using a small set of basic features and one using a larger set of advanced features.
  • Evaluated both models by computing the root mean squared error (RMSE) on the training and validation sets.
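
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (sqft, bedrooms, lot_size, age), the two feature lists, and the 60/20/20 split ratio are all assumptions, and train_test_split stands in for the provided splitting code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 500
houses = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "lot_size": rng.uniform(1000, 20000, n),
    "age": rng.uniform(0, 100, n),
})
houses["price"] = (
    150 * houses["sqft"]
    + 10000 * houses["bedrooms"]
    + 5 * houses["lot_size"]
    - 800 * houses["age"]
    + rng.normal(0, 20000, n)
)

# Quick exploration: shape and summary statistics.
print(houses.shape)
print(houses.describe())

# Assumed 60/20/20 split; the original used assignment-provided code instead.
train_val, test = train_test_split(houses, test_size=0.2, random_state=0)
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

# Hypothetical feature lists for the two models.
basic_features = ["sqft", "bedrooms"]
advanced_features = ["sqft", "bedrooms", "lot_size", "age"]


def rmse(model, frame, features):
    """Root mean squared error of the model's price predictions on frame."""
    predictions = model.predict(frame[features])
    return float(np.sqrt(mean_squared_error(frame["price"], predictions)))


# Fit each model on the training set only, then score train and validation.
results = {}
for name, features in {"basic": basic_features, "advanced": advanced_features}.items():
    model = LinearRegression().fit(train[features], train["price"])
    results[name] = {
        "train_rmse": rmse(model, train, features),
        "val_rmse": rmse(model, val, features),
    }

print(pd.DataFrame(results).T)
```

Fitting only on the training rows and leaving the test rows untouched at this stage is what keeps validation and test information out of the models.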

Results:

  • The model with the advanced features achieved a lower RMSE on the validation data, so it was used to compute the final test error.
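
The selection step can be illustrated with a short sketch: pick whichever model has the lower validation RMSE, then score that one model on the test set a single time. The data here is synthetic and the feature sets are hypothetical column indices, so the printed numbers say nothing about the project's actual results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data: four informative columns plus noise (purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X @ np.array([3.0, 2.0, 1.5, -1.0]) + rng.normal(scale=0.5, size=300)

# Rows are already random, so contiguous slices act as a 60/20/20 split.
X_train, X_val, X_test = X[:180], X[180:240], X[240:]
y_train, y_val, y_test = y[:180], y[180:240], y[240:]

# Hypothetical feature sets, expressed as column indices.
feature_sets = {"basic": [0, 1], "advanced": [0, 1, 2, 3]}

# Fit on training data only and record each model's validation RMSE.
models = {}
val_rmse = {}
for name, cols in feature_sets.items():
    models[name] = LinearRegression().fit(X_train[:, cols], y_train)
    val_rmse[name] = rmse(y_val, models[name].predict(X_val[:, cols]))

# Choose by validation RMSE, then touch the test set exactly once.
best = min(val_rmse, key=val_rmse.get)
test_rmse = rmse(y_test, models[best].predict(X_test[:, feature_sets[best]]))

print(val_rmse)
print(best, test_rmse)
```

Because the test set plays no part in choosing between the models, the final test RMSE remains an unbiased estimate of how the selected model performs on unseen houses.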

Limitations:

Only linear regression was used; no other feature sets or modeling choices were explored beyond the assignment requirements.