Case Study

INFO 201 Capstone — Stock Market as an Economic Indicator

2025
  • R
  • tidyverse
  • Data Wrangling
  • Visualization

RMarkdown analysis tying S&P 500 trends to CPI, tax burden, and housing costs (US vs non-US), with fully reproducible data cleaning, merging, and visualization.

Problem & Motivation:

Test whether the S&P 500 can be used as a proxy for real-world economic conditions by relating it to inflation (CPI), tax costs, and housing costs in the US and abroad.

Data & Approach:

  • Ingested three public datasets in R (Yahoo Finance S&P 500, FRED CPIAUCSL, Kaggle cost-of-living data) and converted all to a yearly panel with consistent Year keys.
  • Used tidyverse pipelines to clean and aggregate data: yearly averages for CPI and S&P 500, grouped cost-of-living metrics by Year × Country, and derived numeric cost components (housing, healthcare, education, transport, and tax) from percentage fields.
  • Tagged rows as US vs Non-US, merged CPI and S&P 500 only where appropriate (S&P 500 kept for US only), and enforced one row per country-year via summarize(across(..., mean, na.rm = TRUE)).
  • Built ggplot2 visuals for four research questions: S&P 500 vs CPI (scatter + lm line), S&P 500 vs tax cost (dual bar chart + trend), US vs non-US housing cost trends (line plot), and US housing vs S&P 500 (scaled overlay).
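The cleaning and merging steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the project's actual code: every column name (Close, CPIAUCSL as a column, Income, Housing_Pct), the "United States" label, and all values are assumptions. The percentage-to-dollar conversion assumes the percentage applies to an income column, and CPI is joined on Year for every row because the write-up does not say whether it was restricted. The sketch uses a lambda inside across() because recent dplyr releases deprecate passing extra arguments such as na.rm through across().

```r
library(dplyr)

# Toy inputs: all values and column names are illustrative assumptions.
sp500 <- data.frame(
  Date  = as.Date(c("2020-01-02", "2020-06-01", "2021-01-04", "2021-06-01")),
  Close = c(3200, 3000, 3700, 4200)
)
cpi <- data.frame(
  DATE     = as.Date(c("2020-01-01", "2020-07-01", "2021-01-01", "2021-07-01")),
  CPIAUCSL = c(258, 259, 262, 272)
)
col <- data.frame(
  Country     = c("United States", "United States", "Canada", "Canada", "Canada"),
  Year        = c(2020, 2021, 2020, 2020, 2021),
  Income      = c(60000, 62000, 50000, 52000, 53000),
  Housing_Pct = c(30, 32, 25, NA, 27)
)

# Yearly averages keyed by an integer Year.
sp500_yearly <- sp500 %>%
  mutate(Year = as.integer(format(Date, "%Y"))) %>%
  group_by(Year) %>%
  summarize(SP500_Avg = mean(Close, na.rm = TRUE), .groups = "drop")

cpi_yearly <- cpi %>%
  mutate(Year = as.integer(format(DATE, "%Y"))) %>%
  group_by(Year) %>%
  summarize(CPI_Avg = mean(CPIAUCSL, na.rm = TRUE), .groups = "drop")

# Derive a dollar cost from a percentage field (assumed: percent of Income),
# tag US vs Non-US, and collapse to one row per country-year.
col_yearly <- col %>%
  mutate(
    Year         = as.integer(Year),
    Housing_Cost = Income * Housing_Pct / 100,
    Region       = if_else(Country == "United States", "US", "Non-US")
  ) %>%
  group_by(Country, Region, Year) %>%
  summarize(
    across(c(Income, Housing_Cost), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )

# Merge on Year, then keep S&P 500 values for US rows only.
panel <- col_yearly %>%
  left_join(cpi_yearly, by = "Year") %>%
  left_join(sp500_yearly, by = "Year") %>%
  mutate(SP500_Avg = if_else(Region == "US", SP500_Avg, NA_real_))
```

Comparing nrow(panel) with nrow(distinct(panel, Country, Year)) is a quick way to confirm the one-row-per-country-year constraint after the merge.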

Results:

  • Found a positive correlation between yearly average S&P 500 closing values and yearly average CPI, consistent with the idea that strong markets often co-occur with periods of rising prices and economic expansion.
  • Observed that S&P 500 levels trend upward over time while tax costs (in dollar terms) fluctuate without a clear long-run pattern, suggesting tax burden is shaped more by policy and macro conditions than by market performance alone.
  • Showed that US housing costs are noticeably more volatile than non-US housing costs from 2000–2023, with sharper booms and dips likely tied to US-specific cycles and policy shocks.
  • For the US, housing costs and the S&P 500 both rise over the long run and sometimes move together (especially post-2012), but the relationship is non-linear and likely shaped by distinct drivers such as interest rates, housing demand, and supply.
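For reference, the S&P 500 vs CPI check (correlation plus the scatter with an lm line) might look something like the sketch below. The data frame us_yearly, its column names, and every value in it are made up for illustration. They are not the project's data, and the resulting numbers say nothing about the actual findings.

```r
library(ggplot2)

# Synthetic yearly values for illustration only (not project data).
us_yearly <- data.frame(
  Year      = 2015:2020,
  CPI_Avg   = c(100, 102, 105, 107, 110, 114),
  SP500_Avg = c(1000, 1050, 1200, 1180, 1300, 1450)
)

# Pearson correlation between the two yearly series.
r <- cor(us_yearly$SP500_Avg, us_yearly$CPI_Avg, use = "complete.obs")

# Simple linear fit; the slope is the change in S&P 500 level per CPI point.
fit <- lm(SP500_Avg ~ CPI_Avg, data = us_yearly)

# Scatter plot with the fitted lm line overlaid.
p <- ggplot(us_yearly, aes(x = CPI_Avg, y = SP500_Avg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Average CPI (yearly)", y = "Average S&P 500 close (yearly)")
```

Printing p renders the plot, and summary(fit) reports the slope and fit statistics.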

Limitations:

  • The analysis relies on correlations between yearly aggregates, with no causal identification.
  • The S&P 500 is used as a single broad index with no sector breakdown.
  • Cross-country housing comparisons ignore purchasing-power differences.
  • Some cost-of-living and tax fields contain missing values, which were skipped during averaging (na.rm = TRUE) rather than imputed.
  • External macro factors (rates, employment, policy regimes) were left out of the models.
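To make the missing-data limitation concrete, the short sketch below shows how mean() with na.rm = TRUE behaves in base R. The vectors are hypothetical and only illustrate the mechanics.

```r
# Hypothetical percentage values for one country-year; NA marks a missing entry.
housing_pct <- c(30, NA, 34)

# na.rm = TRUE drops the NA and averages only the observed values.
observed_mean <- mean(housing_pct, na.rm = TRUE)

# If every value in a group is missing, the result is NaN rather than NA,
# so such country-years still appear in the panel without a usable value.
all_missing <- c(NA_real_, NA_real_)
empty_mean  <- mean(all_missing, na.rm = TRUE)
```

Here observed_mean is 32 and empty_mean is NaN, so a check such as is.nan() on the aggregated columns can flag country-years that had no underlying data.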