
Projects

Here are my projects:

U.S. Traffic, Pollution & Accidents — Data Analysis

2025

Merged three nationwide datasets (33M congestion rows, a 46-column accidents table, and a 22-column pollution table) into a unified state-day panel, then explored how congestion relates to accidents, seasonality, and emissions using statistical models and geospatial plots.

Python, pandas, scikit-learn, GeoPandas, Data Cleaning, Regression, Classification

LLM Uncertainty Quantification

2025

Tool that runs multiple LLMs on the same dataset and reports confidence, calibration (ECE), and an aggregated ensemble prediction. Includes CSV uploads and a simple React + Node/Python workflow.

Python, pandas, Machine Learning, Git/GitHub
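As a rough illustration of the calibration metric mentioned above, here is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins. The function name, the 10-bin default, and the half-open binning rule are assumptions made for this example, not necessarily what the tool itself does.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - mean confidence| over bins.

    confidences: predicted probabilities in [0, 1] for the chosen answer.
    correct: 1/0 (or True/False) flags, one per prediction.
    n_bins: number of equal-width bins (assumed default for this sketch).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin running totals: [count, sum of confidences, sum of correct flags].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin i covers [i / n_bins, (i + 1) / n_bins); 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(bool(ok))

    ece = 0.0
    for count, conf_sum, correct_sum in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        gap = abs(correct_sum / count - conf_sum / count)
        ece += (count / n) * gap
    return ece
```

A lower value means stated confidence tracks observed accuracy more closely. The result depends on the number of bins, so comparisons across models should use the same setting.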

Stock Market Prediction — CPI → S&P 500

2025

Full-stack app that tests how inflation data (CPI) relates to short-term S&P 500 returns. Includes a FastAPI backend for models and a Next.js dashboard for running scenarios.

Python, scikit-learn, pandas, Machine Learning, Git/GitHub, FastAPI, Next.js

Student Social Media Addiction — Relational DB & Analytics

2025

Designed a normalized SQL Server schema and analytic queries to study how student social media use relates to sleep, mental health, relationships, and academics.

SQL Server, T-SQL, Database Design, ERD, Data Modeling

Deques (ArrayDeque & LinkedDeque)

2024

Implemented two representations of the Deque abstract data type: an array-based version with front/back indices and a node-based version using sentinel nodes. Focused on correctness, invariants, and performance.

Java, Data Structures, JUnit/Testing
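To make the array-based representation concrete, here is a minimal circular-buffer sketch. It is written in Python for brevity rather than the project's Java, and it tracks a front index plus a size (deriving the back index) instead of storing both indices. The class and method names are illustrative assumptions.

```python
class ArrayDeque:
    """Deque backed by a circular array (illustrative sketch).

    Invariant: the items occupy slots front, front+1, ..., front+size-1,
    all taken modulo the array length.
    """

    def __init__(self, capacity=8):
        self._data = [None] * capacity
        self._front = 0
        self._size = 0

    def __len__(self):
        return self._size

    def _resize(self, capacity):
        # Copy items in logical order so the front lands at index 0.
        new_data = [None] * capacity
        for i in range(self._size):
            new_data[i] = self._data[(self._front + i) % len(self._data)]
        self._data = new_data
        self._front = 0

    def add_first(self, item):
        if self._size == len(self._data):
            self._resize(2 * len(self._data))
        self._front = (self._front - 1) % len(self._data)
        self._data[self._front] = item
        self._size += 1

    def add_last(self, item):
        if self._size == len(self._data):
            self._resize(2 * len(self._data))
        back = (self._front + self._size) % len(self._data)
        self._data[back] = item
        self._size += 1

    def remove_first(self):
        if self._size == 0:
            raise IndexError("remove from empty deque")
        item = self._data[self._front]
        self._data[self._front] = None  # drop the reference
        self._front = (self._front + 1) % len(self._data)
        self._size -= 1
        return item

    def remove_last(self):
        if self._size == 0:
            raise IndexError("remove from empty deque")
        back = (self._front + self._size - 1) % len(self._data)
        item = self._data[back]
        self._data[back] = None  # drop the reference
        self._size -= 1
        return item
```

Doubling the array when it fills keeps adds at amortized constant time. This sketch never shrinks the array, which a fuller implementation would likely handle.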

Autocomplete

2024

Built multiple implementations of the autocomplete operation and compared how different data structures handle prefix queries, sorting, and returning the top matches.

Java, Data Structures, Algorithms, JUnit/Testing
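One common way to answer prefix queries is binary search over a sorted list of terms. The sketch below shows that idea in Python rather than the project's Java. The function name is hypothetical, and this is only one of several possible representations, not necessarily one the project used.

```python
from bisect import bisect_left


def all_matches(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.

    sorted_terms must already be in ascending lexicographic order.
    """
    # The first term >= prefix is where any matches must begin,
    # because every string with this prefix compares >= the prefix itself.
    lo = bisect_left(sorted_terms, prefix)
    hi = lo
    # Matches are contiguous in sorted order, so scan until one fails.
    while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
        hi += 1
    return sorted_terms[lo:hi]


terms = sorted(["apple", "app", "banana", "apricot", "bandana"])
print(all_matches(terms, "ap"))
```

Locating the first match takes logarithmic time, and the scan that follows is proportional to the number of matches. Returning only the top matches would need an extra ranking step, for example by term weight.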

Priority Queues (MinPQ)

2024

Implemented multiple versions of a priority queue and compared how their different representations affect operations like remove-smallest and changing priorities.

Java, Data Structures, Algorithms, JUnit/Testing

Shortest Paths & Seam Carving

2024

Implemented shortest-path algorithms on weighted directed graphs and applied the same ideas to find minimum-energy seams for image resizing.

Java, Graphs, Dynamic Programming, Shortest Paths
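The link between shortest paths and seams can be sketched as a small dynamic program. Each pixel is a node, and each pixel connects to the (up to) three pixels directly below it. The code below is a Python sketch rather than the project's Java, and it assumes the energy values are already computed and given as a list of rows. The function name is illustrative.

```python
def min_energy_vertical_seam(energy):
    """Find a top-to-bottom seam with minimum total energy.

    energy: non-empty list of equal-length rows of numbers.
    Returns (seam, total), where seam[r] is the column chosen in row r.
    """
    rows, cols = len(energy), len(energy[0])

    # cost[r][c] = cheapest total energy of any seam ending at (r, c).
    cost = [list(energy[0])]
    # parent[r][c] = column in row r-1 that the cheapest seam came from.
    parent = [[None] * cols]

    for r in range(1, rows):
        cost_row, parent_row = [], []
        for c in range(cols):
            # Neighbours above: up-left, up, up-right (clipped at the edges).
            candidates = range(max(c - 1, 0), min(c + 1, cols - 1) + 1)
            best = min(candidates, key=lambda k: cost[r - 1][k])
            cost_row.append(energy[r][c] + cost[r - 1][best])
            parent_row.append(best)
        cost.append(cost_row)
        parent.append(parent_row)

    # Start from the cheapest cell in the last row and walk back up.
    col = min(range(cols), key=lambda k: cost[-1][k])
    total = cost[-1][col]
    seam = [col]
    for r in range(rows - 1, 0, -1):
        col = parent[r][col]
        seam.append(col)
    seam.reverse()
    return seam, total
```

Because the pixel graph is a DAG processed row by row, a single pass is enough and no priority queue is needed. Running a general shortest-path algorithm over the same graph should give the same seam cost.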

House Prices Prediction

2025

Used pandas and linear regression to explore housing data and build models that predict house prices.

Python, scikit-learn, pandas

Sentiment Analysis (Amazon Reviews)

2025

Used product review data from Amazon.com, turned reviews into word-count features, and trained logistic regression models to predict whether a review is positive or negative.

Python, pandas, scikit-learn, Logistic Regression

Loan Safety with Decision Trees and a Small Random Forest

2025

Used LendingClub loan data to predict whether a loan is safe (+1) or risky (-1) with decision trees and a simple random forest, and compared training vs validation accuracy across different tree depths.

Python, pandas, scikit-learn, Decision Trees

CIFAR-10 Image Classification (NetA–NetD)

2025

Built and trained several PyTorch neural networks (NetA–NetD) on CIFAR-10 to classify 32×32 color images into 10 classes, comparing fully connected and convolutional models using GPU training.

Python, PyTorch, Deep Learning, Neural Networks

K-Means from Scratch (Wikipedia, TF-IDF)

2025

Implemented k-means in NumPy and applied it to TF-IDF vectors of ~5.9k Wikipedia biographies to study clustering behavior under different initializations and values of K.

Python, NumPy, Clustering, Machine Learning

Twitter Topic Modeling (NMF)

2025

Modeled ~119k COVID-era tweets from April 30, 2020 with TF-IDF + NMF to discover latent topics, inspect top words per topic, and analyze tweet–topic weights and outliers.

Python, NMF/Topic Modeling, Machine Learning

INFO 201 Capstone — Stock Market as an Economic Indicator

2025

RMarkdown analysis tying S&P 500 trends to CPI, tax burden, and housing costs (US vs non-US), with fully reproducible data cleaning, merging, and visualization.

R, tidyverse, Data Wrangling, Visualization

Stochastic Model Selection via MC3

2025

MC3 over add/remove-one neighborhoods; rcdd linearity checks; MH with neighbor-count correction; compared to greedy selections.

R, Statistical Computing, Bayesian Methods

Bayesian Univariate Logistic Regression (Laplace + MH)

2025

Posterior mode via Newton–Raphson; Laplace approximation; MH sampler; parallelized 60 fits with snow; posterior means + MLE sanity checks.

R, Statistical Computing, Bayesian Methods, Parallel Computing

Marginal Likelihood for Linear Regression (C/C++: LAPACK & GSL)

2025

Two high-performance C/C++ versions (LAPACKE, GSL) of the linear-model marginal likelihood; GEMM/solve/log-det with careful memory/layout; matched R baseline and spec.

C/C++, LAPACK, GSL, Statistical Computing, Performance Optimization

MPI Volleyball Match Simulator

2025

13-process simulation (referee + 12 players) with point-to-point messages; rally probabilities; compact payloads; scoring/set logic; clean termination.

C/C++, MPI, Parallel Computing