2026-05-27 | alignment, agents, code

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

Gabrielle Kaili-May Liu, Arman Cohan

Key claim

LLMs struggle with stable and aligned confidence markers.

This paper investigates how well large language models (LLMs) use linguistic confidence markers to express their uncertainty. The key finding is that LLMs are often miscalibrated in their use of these markers, indicating a need for improved alignment to enhance trustworthiness.

In plain English

The authors examine how well large language models (LLMs) use language to express confidence in their answers, specifically through phrases that signal uncertainty, such as 'it is likely...'. They find that LLMs often misrepresent their confidence: the phrases they choose do not reliably reflect their actual uncertainty. This departs from previous research, which mainly studied how LLMs understand these markers rather than how well they use them. The findings suggest that improving how LLMs use confidence markers could make them more reliable and trustworthy in applications. Builders should care because better-calibrated LLMs yield more dependable AI systems, which matters for user trust and effective decision-making.

Novelty

8.0/10

The paper introduces a new framework for evaluating LLMs' confidence markers, which extends existing understanding of calibration.

Reliability

7.5/10

The study employs multiple metrics and diverse models, providing solid evidence for its claims about miscalibration.

Deep reliability assessment

The methodology supports the claim that LLMs struggle to consistently apply their own linguistic confidence framework, but it may overclaim the generalizability of these findings across all LLMs and contexts.

Reproducibility

Yes. The paper mentions open-source code available at https://github.com/yale-nlp/marker_internal_confidence

Discussion questions

1. How does the assumption that LLMs can have a consistent internal confidence framework align with human cognitive processes?
2. What are the practical implications for developers aiming to improve LLMs' uncertainty communication in real-world applications?
3. What evidence or results would contradict the paper's conclusion that LLMs are faithfully miscalibrated even under model-centric interpretation?

Key figure

Figure 1 illustrates the framework for calculating the internal confidence a model associates with the epistemic marker 'I think.'
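
This summary does not spell out how the internal confidence for a marker is computed. As a rough illustration only, the sketch below assumes one common approach: treat a marker's confidence as the empirical accuracy of the answers in which the model used that marker, then compare that value across two datasets to see whether it is stable. The function names `marker_confidence` and `marker_stability_gap` and the toy records are hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def marker_confidence(records):
    """Return {marker: accuracy of the answers that used that marker}.

    `records` is an iterable of (marker, is_correct) pairs.
    Assumption: accuracy stands in for the model's internal confidence.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for marker, is_correct in records:
        totals[marker] += 1
        correct[marker] += int(is_correct)
    return {m: correct[m] / totals[m] for m in totals}


def marker_stability_gap(conf_a, conf_b):
    """Absolute confidence gap per marker, over markers seen in both datasets."""
    shared = conf_a.keys() & conf_b.keys()
    return {m: abs(conf_a[m] - conf_b[m]) for m in shared}


# Toy data (made up): which marker the model used, and whether the answer was right.
dataset_a = [
    ("I think", True), ("I think", False), ("I think", True), ("I think", True),
    ("it is likely", True), ("it is likely", False),
    ("I am certain", True), ("I am certain", True),
]
dataset_b = [
    ("I think", True), ("I think", False),
    ("I am certain", True), ("I am certain", False),
]

conf_a = marker_confidence(dataset_a)
conf_b = marker_confidence(dataset_b)
gaps = marker_stability_gap(conf_a, conf_b)

for marker in sorted(gaps):
    print(f"{marker}: A={conf_a[marker]:.2f} B={conf_b[marker]:.2f} gap={gaps[marker]:.2f}")
```

Under this assumption, a large gap for the same marker across datasets would be one sign that the model does not use the marker with a stable meaning, which is the kind of instability the key claim describes.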

GitHub (1 repo)

yale-nlp/marker_internal_confidence (Official)