2026-05-25 | agents | alignment

Retrying vs Resampling in AI Control

James Lucassen, Adam Kaufman

Key claim

Auditing based on the maximum suspicion score across resampled outputs improves safety without sacrificing usefulness.

This paper compares retrying and resampling in AI coding scaffolds. Retrying can reduce suspicion scores, but it may also allow sneakier attacks. A key finding is that, when resampling, auditing based on the maximum suspicion score significantly improves safety without sacrificing usefulness.
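To make the max-score idea concrete, here is a minimal sketch, not the paper's implementation. It assumes each task yields a list of monitor suspicion scores, one per sampled output, and that a task is audited when its aggregated score reaches a threshold. The function names, the example scores, the threshold, and the comparison against a "final sample only" rule are all illustrative assumptions.

```python
from typing import Sequence


def flag_by_max(scores: Sequence[float], threshold: float) -> bool:
    """Flag a task for audit if ANY sampled output looks suspicious.

    Assumption: `scores` holds one monitor suspicion score per sample.
    """
    if not scores:
        raise ValueError("need at least one sample score")
    return max(scores) >= threshold


def flag_by_final(scores: Sequence[float], threshold: float) -> bool:
    """Hypothetical baseline: look only at the last (accepted) sample."""
    if not scores:
        raise ValueError("need at least one sample score")
    return scores[-1] >= threshold


# Illustrative scores: the first attempt looks suspicious, later ones do not.
attempt_scores = [8.5, 3.0, 1.2]
THRESHOLD = 7.0  # hypothetical audit threshold

max_flagged = flag_by_max(attempt_scores, THRESHOLD)
final_flagged = flag_by_final(attempt_scores, THRESHOLD)
print(max_flagged, final_flagged)
```

In this toy case the max rule still flags the task after later samples score low, while the final-sample rule does not. Whether that matches the paper's exact protocol is not stated in this summary.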

Novelty

7.5/10

The paper introduces a new perspective on retrying and resampling in AI coding scaffolds, revealing contradictions with prior findings.

Reliability

8.0/10

The study provides solid experimental results and clear comparisons, supporting its claims effectively.

Deep reliability assessment

The methodology supports the claim that resampling can improve safety without leaking exploitable information, but the effectiveness of retrying protocols against strategic attackers is overclaimed without sufficient evidence.

Reproducibility

No open source code or dataset is mentioned in the paper.

Discussion questions

How might the assumption that retrying protocols leak exploitable information be challenged?
What are the practical implications for AI developers in terms of choosing between retrying and resampling protocols?
What experimental results or scenarios would falsify the claim that resampling improves safety without increasing costs?

Key figure

Figure 1 illustrates the safety achieved at a 0.3% auditing budget by different protocols, highlighting the vulnerabilities of retrying protocols to exploitation.
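As a rough illustration of what a 0.3% auditing budget means in practice, the sketch below turns a budget into a score threshold using scores from honest runs. This calibration procedure is an assumption made for illustration; the summary does not say how the paper sets its threshold, and the synthetic scores here are made up.

```python
import math
from typing import Sequence


def audit_threshold(honest_scores: Sequence[float], budget: float) -> float:
    """Return a threshold t such that auditing scores strictly above t
    audits at most `budget` of the honest scores (fewer if scores tie at t).

    Assumption: `honest_scores` are per-task suspicion scores from runs
    with no attack, and `budget` is a fraction such as 0.003 for 0.3%.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be a fraction in [0, 1]")
    ordered = sorted(honest_scores, reverse=True)
    # Small epsilon guards against float error in len * budget.
    n_audit = math.floor(len(ordered) * budget + 1e-9)
    if n_audit >= len(ordered):
        return float("-inf")  # budget covers everything
    return ordered[n_audit]


# Synthetic honest scores 0.0 .. 999.0, purely for illustration.
honest = [float(i) for i in range(1000)]
threshold = audit_threshold(honest, 0.003)
n_audited = sum(score > threshold for score in honest)
print(threshold, n_audited)
```

With 1,000 honest tasks, a 0.3% budget covers only three audits, which is why the choice of score to rank by matters so much.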
