U.S. Traffic, Pollution & Accidents — Data Analysis
2025: Merged three nationwide datasets (33M congestion rows, a 46-column accidents table, and a 22-column pollution table) into a unified state-day panel, then explored how congestion relates to accidents, seasonality, and emissions using statistical models and geospatial plots.
Python, pandas, scikit-learn, GeoPandas, Data Cleaning, Regression, Classification
LLM Uncertainty Quantification
2025: Tool that runs multiple LLMs on the same dataset and reports confidence, calibration (expected calibration error, ECE), and an aggregated ensemble prediction. Supports CSV uploads through a simple React + Node/Python workflow.
Python, pandas, Machine Learning, Git/GitHub
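As a rough illustration of the calibration metric named above, here is a minimal sketch of expected calibration error with equal-width confidence bins. The function name, the 10-bin default, and the binning rule are assumptions made for the example, not details taken from the tool itself.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assumed binning rule: bin i covers [i/n_bins, (i+1)/n_bins),
    # with a confidence of exactly 1.0 placed in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence but is right only three times out of four would score roughly 0.15 under this definition.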
Stock Market Prediction — CPI → S&P 500
2025: Full-stack app that tests how inflation data (CPI) relates to short-term S&P 500 returns. Includes a FastAPI backend for models and a Next.js dashboard for running scenarios.
Python, scikit-learn, pandas, Machine Learning, Git/GitHub, FastAPI, Next.js
Student Social Media Addiction — Relational DB & Analytics
2025: Designed a normalized SQL Server schema and analytic queries to study how student social media use relates to sleep, mental health, relationships, and academics.
SQL Server, T-SQL, Database Design, ERD, Data Modeling
Deques (ArrayDeque & LinkedDeque)
2024: Implemented two representations of the Deque abstract data type: an array-based version with front/back indices and a node-based version using sentinel nodes. Focused on correctness, invariants, and performance.
Java, Data Structures, JUnit/Testing
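The array-based representation can be sketched as a circular buffer. The version below is written in Python rather than the project's Java, tracks a front index plus a size instead of separate front/back indices, and doubles the array when full; the class and method names are hypothetical.

```python
class ArrayDeque:
    """Circular-array deque sketch (illustrative, not the project's code)."""

    def __init__(self, capacity=8):
        self._items = [None] * capacity
        self._front = 0  # index of the first element
        self._size = 0   # invariant: 0 <= size <= len(items)

    def __len__(self):
        return self._size

    def _resize(self, capacity):
        # Copy elements in logical order so the front lands at index 0.
        new_items = [None] * capacity
        for i in range(self._size):
            new_items[i] = self._items[(self._front + i) % len(self._items)]
        self._items = new_items
        self._front = 0

    def add_first(self, item):
        if self._size == len(self._items):
            self._resize(2 * len(self._items))
        self._front = (self._front - 1) % len(self._items)
        self._items[self._front] = item
        self._size += 1

    def add_last(self, item):
        if self._size == len(self._items):
            self._resize(2 * len(self._items))
        back = (self._front + self._size) % len(self._items)
        self._items[back] = item
        self._size += 1

    def remove_first(self):
        if self._size == 0:
            raise IndexError("remove from empty deque")
        item = self._items[self._front]
        self._items[self._front] = None
        self._front = (self._front + 1) % len(self._items)
        self._size -= 1
        return item

    def remove_last(self):
        if self._size == 0:
            raise IndexError("remove from empty deque")
        back = (self._front + self._size - 1) % len(self._items)
        item = self._items[back]
        self._items[back] = None
        self._size -= 1
        return item

    def get(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._items[(self._front + index) % len(self._items)]
```

Wrapping indices with the modulo operator keeps every operation constant time apart from the occasional resize.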
Autocomplete
2024: Built multiple implementations of the autocomplete operation and compared how different data structures handle prefix queries, sorting, and returning the top matches.
Java, Data Structures, Algorithms, JUnit/Testing
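One possible design for the prefix query, sketched in Python rather than Java, keeps the terms sorted, binary-searches for the first candidate, scans while the prefix still matches, and then picks the heaviest matches. The class name, the (term, weight) input format, and the weight-based ranking are assumptions for the example; the project compared several implementations that are not detailed here.

```python
import bisect
import heapq


class SortedAutocomplete:
    """Sorted-array autocomplete sketch (illustrative only)."""

    def __init__(self, weighted_terms):
        # weighted_terms: iterable of (term, weight) pairs.
        self._entries = sorted(weighted_terms)
        self._terms = [term for term, _ in self._entries]

    def top_matches(self, prefix, k):
        # Every term starting with `prefix` sorts at or after `prefix`,
        # and all such terms are contiguous in sorted order.
        i = bisect.bisect_left(self._terms, prefix)
        matches = []
        while i < len(self._terms) and self._terms[i].startswith(prefix):
            matches.append(self._entries[i])
            i += 1
        best = heapq.nlargest(k, matches, key=lambda entry: entry[1])
        return [term for term, _ in best]
```

For example, with the terms app (10), apple (5), apply (7), apt (1), and banana (3), asking for the top two matches of "app" returns app and apply.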
Priority Queues (MinPQ)
2024: Implemented multiple versions of a priority queue and compared how their different representations affect operations like remove-smallest and changing priorities.
Java, Data Structures, Algorithms, JUnit/Testing
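To show why the representation matters for changing priorities, here is a sketch of one candidate: a binary heap paired with a dictionary from item to heap position, so that an item can be located in constant time before it is re-heapified. It is written in Python rather than Java, and the class and method names are hypothetical.

```python
class HeapMinPQ:
    """Binary min-heap with an item-to-index map (illustrative sketch)."""

    def __init__(self):
        self._heap = []   # entries are [priority, item]
        self._index = {}  # item -> current position in self._heap

    def __len__(self):
        return len(self._heap)

    def _swap(self, i, j):
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._index[self._heap[i][1]] = i
        self._index[self._heap[j][1]] = j

    def _swim(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i][0] >= self._heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sink(self, i):
        n = len(self._heap)
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and self._heap[child + 1][0] < self._heap[child][0]:
                child += 1
            if self._heap[child][0] >= self._heap[i][0]:
                break
            self._swap(i, child)
            i = child

    def add(self, item, priority):
        if item in self._index:
            raise ValueError("item already present")
        self._heap.append([priority, item])
        self._index[item] = len(self._heap) - 1
        self._swim(len(self._heap) - 1)

    def remove_min(self):
        if not self._heap:
            raise IndexError("remove from empty priority queue")
        self._swap(0, len(self._heap) - 1)
        _, item = self._heap.pop()
        del self._index[item]
        if self._heap:
            self._sink(0)
        return item

    def change_priority(self, item, priority):
        i = self._index[item]  # O(1) lookup instead of a linear scan
        self._heap[i][0] = priority
        self._swim(i)
        self._sink(self._index[item])
```

Without the index map, changing a priority would first require a linear search of the heap; with it, both remove-smallest and change-priority take logarithmic time.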
Shortest Paths & Seam Carving
2024: Implemented shortest-path algorithms on weighted directed graphs and applied the same ideas to find minimum-energy seams for image resizing.
Java, Graphs, Dynamic Programming, Shortest Paths
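The seam search can be viewed as a shortest path through a grid-shaped DAG, where each pixel connects to the three pixels below it. The sketch below, in Python rather than Java, solves it with dynamic programming over a precomputed energy grid; the function name and the list-of-rows input format are assumptions for the example.

```python
def find_vertical_seam(energy):
    """Return one column index per row for a minimum-energy vertical seam.

    energy: non-empty list of equal-length rows of numeric pixel energies.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]  # cost[r][c]: cheapest seam ending at (r, c)
    back = [[None] * cols]    # back[r][c]: column chosen in row r - 1

    for r in range(1, rows):
        cost_row, back_row = [], []
        for c in range(cols):
            # A seam may arrive from the upper-left, upper, or upper-right pixel.
            candidates = [c2 for c2 in (c - 1, c, c + 1) if 0 <= c2 < cols]
            best = min(candidates, key=lambda c2: cost[r - 1][c2])
            cost_row.append(energy[r][c] + cost[r - 1][best])
            back_row.append(best)
        cost.append(cost_row)
        back.append(back_row)

    # Start from the cheapest bottom pixel and follow the back-pointers up.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        seam.append(c)
    seam.reverse()
    return seam
```

Because the grid has no cycles and edges only point downward, processing rows top to bottom is a valid topological order, which is why a single pass suffices.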
House Prices Prediction
2025: Used pandas and linear regression to explore housing data and build models that predict house prices.
Python, scikit-learn, pandas
Sentiment Analysis (Amazon Reviews)
2025: Used product review data from Amazon.com, turned reviews into word-count features, and trained logistic regression models to predict whether a review is positive or negative.
Python, pandas, scikit-learn, Logistic Regression
Loan Safety with Decision Trees and a Small Random Forest
2025: Used LendingClub loan data to predict whether a loan is safe (+1) or risky (-1) with decision trees and a simple random forest, and compared training vs validation accuracy across different tree depths.
Python, pandas, scikit-learn, Decision Trees
CIFAR-10 Image Classification (NetA–NetD)
2025: Built and trained several PyTorch neural networks (NetA–NetD) on CIFAR-10 to classify 32×32 color images into 10 classes, comparing fully connected and convolutional models using GPU training.
Python, PyTorch, Deep Learning, Neural Networks
K-Means from Scratch (Wikipedia, TF-IDF)
2025: Implemented k-means in NumPy and applied it to TF-IDF vectors of ~5.9k Wikipedia biographies to study clustering behavior under different initializations and values of K.
Python, NumPy, Clustering, Machine Learning
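A minimal NumPy sketch of the k-means loop (Lloyd's algorithm) follows. It assumes a dense float array and random initialization from the data points; the function names, the seed parameter, and the empty-cluster handling are choices made for the example, and TF-IDF matrices in practice are often sparse and would need different distance code.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed init: k distinct data points chosen uniformly at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iters):
        labels = assign_clusters(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, assign_clusters(X, centroids)
```

Changing `seed` changes the starting centroids, which is one simple way to observe how sensitive the final clusters are to initialization.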
Twitter Topic Modeling (NMF)
2025: Modeled ~119k COVID-era tweets from April 30, 2020 with TF-IDF + NMF to discover latent topics, inspect top words per topic, and analyze tweet–topic weights and outliers.
Python, NMF/Topic Modeling, Machine Learning
INFO 201 Capstone — Stock Market as an Economic Indicator
2025: RMarkdown analysis tying S&P 500 trends to CPI, tax burden, and housing costs (US vs non-US), with fully reproducible data cleaning, merging, and visualization.
R, tidyverse, Data Wrangling, Visualization
Stochastic Model Selection via MC3
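2025: Markov chain Monte Carlo model composition (MC3) over add-one/remove-one neighborhoods, with rcdd linearity checks, a Metropolis-Hastings (MH) acceptance step corrected for neighbor counts, and a comparison against greedy selection.
R, Statistical Computing, Bayesian Methods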
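The neighbor-count correction arises because some add-one/remove-one neighbors may be ruled out as invalid, so two models can have neighborhoods of different sizes and a uniform proposal is no longer symmetric. The sketch below is in Python rather than the project's R. Models are frozensets of variable indices, and `log_score` and `is_valid` are hypothetical callables standing in for the model score and the validity check (the project used rcdd for the latter).

```python
import math
import random


def neighbors(model, n_vars, is_valid):
    """All valid models reachable by adding or removing exactly one variable."""
    result = []
    for j in range(n_vars):
        candidate = model ^ frozenset([j])  # toggle variable j
        if is_valid(candidate):
            result.append(candidate)
    return result


def mc3_step(model, n_vars, log_score, is_valid, rng):
    """One MH step with a uniform proposal over the valid neighborhood.

    Assumes the current model is itself valid, so the proposed model
    always has at least one neighbor (the current model).
    """
    nbd = neighbors(model, n_vars, is_valid)
    if not nbd:
        return model
    proposal = rng.choice(nbd)
    nbd_proposal = neighbors(proposal, n_vars, is_valid)

    # Proposal probabilities are 1/|nbd(M)| forward and 1/|nbd(M')| back,
    # so the ratio picks up the factor |nbd(M)| / |nbd(M')|.
    log_ratio = (
        log_score(proposal) - log_score(model)
        + math.log(len(nbd)) - math.log(len(nbd_proposal))
    )
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return model
```

A chain would call `mc3_step` repeatedly, for example with `rng = random.Random(1)`, and record how often each model is visited.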
Bayesian Univariate Logistic Regression (Laplace + MH)
2025: Found the posterior mode via Newton–Raphson, applied a Laplace approximation, and ran a Metropolis-Hastings (MH) sampler; parallelized 60 fits with snow and reported posterior means alongside MLE sanity checks.
R, Statistical Computing, Bayesian Methods, Parallel Computing
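The first two steps, the Newton–Raphson mode search and the Laplace approximation, can be sketched for a logistic model with an intercept and one slope. The example is in Python rather than the project's R, and it assumes independent N(0, 1) priors on both coefficients; that prior, like the function names, is an assumption for illustration and is not stated in the project description.

```python
import math


def _sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_posterior(b0, b1, x, y):
    """Log likelihood plus log prior, with N(0, 1) priors (assumed)."""
    loglik = sum(yi * (b0 + b1 * xi) - _softplus(b0 + b1 * xi)
                 for xi, yi in zip(x, y))
    log_prior = -0.5 * (b0 * b0 + b1 * b1) - math.log(2 * math.pi)
    return loglik + log_prior


def grad_and_neg_hessian(b0, b1, x, y):
    """Gradient of the log posterior and the entries of -Hessian."""
    g0, g1 = -b0, -b1
    h00, h01, h11 = 1.0, 0.0, 1.0  # the prior contributes the identity
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def posterior_mode(x, y, tol=1e-10, max_iter=100):
    """Plain Newton-Raphson from the origin, with no line search."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = grad_and_neg_hessian(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (-H)^{-1} g, using the 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def laplace_log_marginal(x, y):
    """log p(D) ~ log posterior at mode + (d/2) log(2 pi) - 0.5 log det(-H), d = 2."""
    b0, b1 = posterior_mode(x, y)
    _, (h00, h01, h11) = grad_and_neg_hessian(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return log_posterior(b0, b1, x, y) + math.log(2 * math.pi) - 0.5 * math.log(det)
```

With no observations the posterior equals the prior, so the approximation is exact and the log marginal likelihood is zero, which makes a convenient sanity check.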
Marginal Likelihood for Linear Regression (C/C++: LAPACK & GSL)
2025: Two high-performance C/C++ implementations (LAPACKE and GSL) of the marginal likelihood for linear models, covering GEMM, linear solves, and log-determinants with careful memory layout; results matched the R baseline and the spec.
C/C++, LAPACK, GSL, Statistical Computing, Performance Optimization
MPI Volleyball Match Simulator
2025: 13-process simulation (one referee plus 12 players) using point-to-point messages, with rally probabilities, compact payloads, scoring and set logic, and clean termination.
C/C++, MPI, Parallel Computing