Case Study

LLM Uncertainty Quantification

2025
  • Python
  • pandas
  • Machine Learning
  • Git/GitHub

A tool that runs multiple LLMs on the same dataset and reports each model's confidence, its calibration (expected calibration error, ECE), and an aggregated ensemble prediction. It supports CSV uploads and uses a simple React frontend with a Node/Python backend.

Problem & Motivation:

Single-model LLM outputs can be unstable, and teams often need a clearer read on confidence before trusting model decisions.

Data & Approach:

  • Built a UI for uploading a CSV of prompts and entering a Hugging Face token.
  • Built a Node + Python backend that runs multiple models on the uploaded dataset, collects each model's confidence scores, and computes ECE.
  • Combined the model outputs into a simple confidence-weighted ensemble and exported results as JSON/CSV.
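The write-up doesn't say how ECE was binned, so here is a minimal sketch of the standard equal-width-bin version. It assumes each prediction comes with a confidence in [0, 1] and a correct/incorrect label. The function name `expected_calibration_error` and the 10-bin default are illustrative assumptions, not the project's actual code.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (hypothetical helper, not the project's code).

    ECE = sum over bins b of (|B_b| / n) * |accuracy(B_b) - mean_confidence(B_b)|
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin accumulators: prediction count, sum of confidences, number correct.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} is outside [0, 1]")
        # Bin b covers [b/n_bins, (b+1)/n_bins); conf == 1.0 goes in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hits[b] += int(ok)

    ece = 0.0
    for count, conf_sum, hit in zip(counts, conf_sums, hits):
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = hit / count
        avg_conf = conf_sum / count
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece
```

For example, `expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, False, True, True])` is about 0.4. The 0.9 bin is overconfident (50% accurate) and the 0.6 bin is underconfident (100% accurate), and each gap of 0.4 is weighted by half the data.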
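"Confidence-weighted ensemble" can be implemented several ways. One simple reading is a weighted vote: each model's confidence is added to the label it predicted, and the label with the most confidence mass wins. The sketch below assumes that approach and a hypothetical input shape of one `(label, confidence)` pair per prompt per model. It also shows JSON/CSV export using only the standard library. All function names are illustrative, not the project's code.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical input shape: one (label, confidence) pair per prompt, per model.
Prediction = tuple[str, float]


def weighted_vote(predictions: list[Prediction]) -> tuple[str, float]:
    """Sum each label's confidence across models and return the heaviest label.

    The score is the winning label's share of the total confidence mass.
    Ties go to the label that appears first.
    """
    totals: dict[str, float] = defaultdict(float)
    for label, conf in predictions:
        totals[label] += conf
    total = sum(totals.values())
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    best = max(totals, key=totals.__getitem__)
    return best, totals[best] / total


def ensemble_rows(prompts: list[str], per_model: dict[str, list[Prediction]]) -> list[dict]:
    """Build one output row per prompt with the ensemble label and score."""
    for name, outputs in per_model.items():
        if len(outputs) != len(prompts):
            raise ValueError(f"{name} returned {len(outputs)} outputs for {len(prompts)} prompts")
    rows = []
    for i, prompt in enumerate(prompts):
        label, score = weighted_vote([outputs[i] for outputs in per_model.values()])
        rows.append({"prompt": prompt, "ensemble_label": label,
                     "ensemble_confidence": round(score, 4)})
    return rows


def to_json(rows: list[dict]) -> str:
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV text; assumes at least one row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Weighting by raw confidence gives an overconfident model more say than it has earned. This is one reason a per-model ECE check is useful alongside the ensemble.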

Results:

  • The ensemble produced more stable predictions than any single model alone.
  • ECE revealed which models were poorly calibrated, meaning their confidence scores did not match their actual accuracy.
  • Because of the UI flow, models can be added or removed without changing code.

Limitations:

Runs slowly when many models are selected. Model outputs can also still be correlated because the models are trained on similar data, which limits how much the ensemble can reduce shared errors.