U.S. Traffic, Pollution & Accidents — Data Analysis • Varun Panuganti

Problem & Motivation:

Understand how congestion, weather, and pollution interact: which conditions create high accident risk, how congestion varies by month and state, and whether high-traffic areas contribute more to emissions.

Data & Approach:

Cleaned & merged accidents, pollution, and congestion datasets by grouping each to the (date, state) level and aggregating severity, delays, emissions, and accident counts.
Built EDA: scatterplots, weather-wise accident curves, month-level trends, state-level geospatial maps for congestion and pollutants.
Trained models: linear regression for accidents, logistic regression for congestion buckets, and DecisionTree/RandomForest/GradientBoosting for pollutant prediction.
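
The (date, state) aggregation and merge step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes pandas, and the toy rows, column names (severity, delay_minutes, no2), and the choice of inner joins are all hypothetical.

```python
import pandas as pd

# Toy rows standing in for the three raw datasets.
# All column names and values here are hypothetical.
accidents = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-01", "2023-05-02"],
    "state": ["TX", "TX", "CA"],
    "severity": [2, 4, 3],
})
congestion = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-01", "2023-05-02"],
    "state": ["TX", "TX", "CA"],
    "delay_minutes": [10.0, 30.0, 12.0],
})
pollution = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-02"],
    "state": ["TX", "CA"],
    "no2": [18.0, 25.0],
})

keys = ["date", "state"]

# Collapse each dataset to one row per (date, state).
acc_daily = accidents.groupby(keys, as_index=False).agg(
    accident_count=("severity", "size"),
    mean_severity=("severity", "mean"),
)
cong_daily = congestion.groupby(keys, as_index=False).agg(
    mean_delay=("delay_minutes", "mean"),
)
poll_daily = pollution.groupby(keys, as_index=False).agg(
    mean_no2=("no2", "mean"),
)

# Inner joins keep only (date, state) pairs present in all three datasets
# (an assumption; an outer join would keep unmatched pairs as NaN rows).
merged = (
    acc_daily
    .merge(cong_daily, on=keys, how="inner")
    .merge(poll_daily, on=keys, how="inner")
)
print(merged)
```

Aggregating before merging keeps the join one-to-one on (date, state), which avoids the row duplication that merging raw event-level tables would cause.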

Results:

Accidents peak at medium congestion under clear weather; delay metrics are right-skewed and weakly correlated.
Congestion is stable across states except in low-population regions (e.g., ND, ME), and peaks in late spring/summer.
SO₂ and NO₂ show the strongest (but still weak) pollution–congestion alignment; GradientBoosting achieved the lowest RMSE for all pollutants.
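
A per-model RMSE comparison of this kind can be sketched as below. It assumes scikit-learn's regressor variants of the three tree-based models (inferred from the model names, not confirmed by the project) and uses synthetic data, so the numbers it produces say nothing about the results above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): columns of X play the role of
# congestion/weather aggregates, y the role of one pollutant level.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

# The model with the smallest held-out RMSE.
best = min(rmse, key=rmse.get)
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.3f}")
print("Lowest RMSE:", best)
```

Repeating the loop once per pollutant target, with the same train/test split for every model, gives a like-for-like comparison across models.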

Limitations:

State-level aggregation hides local patterns; weather bucketing simplifies rich categories; correlations can’t confirm causation.