@henrytdowling So overall I'd say FC is less prioritized as it's newer and can be in tension with traditional notions of calibration. Re. the post-training thought: if you're interested, please keep an eye out for another preprint announcement coming soon :))
🔥Excited to share our paper: Quantifying Faithful Confidence Expression in Large Reasoning Models (LRMs)!🔥
We trust reasoning models partly because they show their work. But do their words reflect how confident they really are?
Check our preprint to find out!
Details 🧵
@henrytdowling Knowing likelihood of accuracy is good, but in some cases expressing this type of "factually" calibrated confidence can undermine truthfulness, the property that LLMs truthfully convey their inner states. In such settings faithful calibration is also important.
When multimodal AI meets real-world expertise, reasoning gets harder, deeper, and much more exciting.
Join us at KnowledgeMR @ #CVPR2026 to push this frontier forward!
Thu June 4 | 8am | Room 704/706
Speakers: @thoma_gu @huang_biwei @pliang279 @MengdiWang10 @xwang_lk
⚠️ Key Finding #6: Different intrinsic confidence estimators produce divergent faithfulness profiles on identical CoT traces. This reveals fragility in prior evaluation methods & suggests LRM uncertainty signals do not map neatly to linguistic expression.
(13/n) If you're interested in LLM reasoning, uncertainty, or faithfulness, check out our paper and analysis framework! We'd love feedback or questions.
Paper: https://t.co/q1cSFBqc9D
GitHub: https://t.co/dF0Il5iloR
⚠️ Key Finding #5: Trajectory-level faithfulness dynamics vary with model and estimator. Expressed confidence in later reasoning steps is not uniformly more faithful than in earlier ones, despite being better calibrated to accuracy.
(3/n) Yet studying FC in LRMs is uniquely hard. Long CoT traces lack clean step boundaries, exhibit inconsistent step structure, and encode complex conditional dependencies that evolve throughout the trace, making existing FC evaluation methods ill-suited to this setting.
⚠️ Key Finding #4: Prompt interventions that boost FC in standard LLMs fail to generalize to LRMs. Even metacognitive prompting, shown in prior work to robustly improve faithful calibration of non-reasoning models, yields minimal gains in the reasoning setting.
⚠️ Key Finding #3: Distillation differentially reshapes & distorts FC vs. reasoning training in ways that cannot be inferred from architecture, scale, or accuracy alone. Distilled models should not be treated as FC proxies for their teachers!
⚠️ Key Finding #2: Reasoning training degrades FC. Comparing matched reasoning & non-reasoning checkpoints of the same model backbone, reasoning-tuned variants produce more hesitation, but surface-level caution does not correspond to lower internal confidence.
⚠️ Key Finding #1: Reasoning behaviors do not automatically translate to improved faithfulness of uncertainty expression. LRMs remain highly decisive even when frequently wrong, and larger model size helps LRMs only to a limited extent, in contrast to FC of LLMs.
(6/n) We apply our framework across 7 models, 5 diverse reasoning-intensive datasets (math, science, law, multi-step soft reasoning), and various prompt interventions, finding that faithful confidence expression remains a significant challenge for LRMs.
(5/n) This gives a multi-dimensional view of faithfulness throughout a CoT trace. We also introduce a prefix-conditioned sampling approach to control for conditional dependencies and structure across sampled traces, a key challenge that existing methods overlook.
(4/n) To address this, we present a novel framework to systematically quantify FC in LRMs. Our framework analyzes linguistic decisiveness against 3 complementary sources of internal confidence, derived from hidden states, token probabilities, & sampling consistency.
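For readers who want a concrete feel for the idea in (4/n), here is a rough sketch, not the paper's actual method. It assumes decisiveness and internal confidence have already been scored in [0, 1], and it covers only the sampling-consistency source. The function names `sampling_consistency` and `faithfulness_gap` and the mean-absolute-gap metric are illustrative assumptions, not taken from the paper or its repo.

```python
from collections import Counter


def sampling_consistency(answers):
    """Internal-confidence proxy (assumed): share of sampled answers
    that agree with the most common answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)


def faithfulness_gap(decisiveness, confidence):
    """Hypothetical metric: mean absolute gap between expressed
    decisiveness and internal confidence, both scored in [0, 1].
    0.0 means the words match the internal signal exactly."""
    if not decisiveness or len(decisiveness) != len(confidence):
        raise ValueError("inputs must be non-empty and equal in length")
    gaps = [abs(d - c) for d, c in zip(decisiveness, confidence)]
    return sum(gaps) / len(gaps)


# Toy usage: the model sounds fully decisive, but its samples disagree.
conf = sampling_consistency(["A", "A", "B", "A"])
gap = faithfulness_gap([1.0], [conf])
```

A large gap on an item like this would flag language that is more decisive than the sampling signal supports. Per the thread, the hidden-state and token-probability estimators can give a different picture on the same trace.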
(2/n) Faithful calibration (FC), the alignment between models' intrinsic & expressed uncertainty, is a persistent failure mode for LLMs. This is especially consequential for LRMs, whose reasoning traces are seen as concrete signals of competence & confidence.