AIED 2026 · Long Paper · Seoul

Can We Trust AI’s Self-Assessment?

Evaluating and improving LLM confidence calibration in educational dialogue coding.

University of Florida · Florida State University · VIABLE Lab

Hongming (Chip) Li · Dr. Huan Kuang · Dr. Anthony F. Botelho
Confidence distribution density under three anchoring conditions
When a model says “confidence: 0.9,” can we believe it?