2026-05-26 | vision | multimodal | data

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Kim Jihyeon, Sohee Kim, Soosan Lee, Souhwan Jung, James Matthew Rehg, Hyesong Choi

Key claim

Social Gaze Consistency improves the detection of AI-generated images.

This paper presents an approach to detecting AI-generated images that looks beyond low-level artifacts. It introduces Social Gaze Consistency, a semantic cue based on whether the gaze directions of the people in an image are coherent with one another. The authors report improved detection with vision-language models and suggest that the benefit holds across model architectures. The key result is higher balanced accuracy on the COCOAI benchmarks: +3.7 pp on Interaction and +1.3 pp on Person over the original FakeVLM.

In plain English

The authors propose a new way to detect AI-generated images by checking how the people in an image look at each other, a cue they call Social Gaze Consistency. Previous methods relied on low-level visual artifacts such as pixel errors; this approach instead examines whether gaze direction and eye alignment are coherent across the individuals in the image. That makes it possible to flag generated or manipulated images even when the changes are subtle. For builders, incorporating this cue can make detectors more reliable at separating real from fake images, which matters for security, content moderation, and media verification.
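To make the intuition of gaze coherence concrete, here is a toy geometric check. It is not the paper's method, which according to the key figure relies on paired-edit data, caption supervision, and fine-tuning. The function names, the 2D coordinates, and the 15-degree tolerance are all hypothetical choices made for illustration only.

```python
import math

def gaze_error_deg(origin, gaze_dir, target):
    """Angle in degrees between a gaze direction and the line origin -> target."""
    to_target = (target[0] - origin[0], target[1] - origin[1])
    dot = gaze_dir[0] * to_target[0] + gaze_dir[1] * to_target[1]
    norm = math.hypot(*gaze_dir) * math.hypot(*to_target)
    if norm == 0:
        raise ValueError("gaze direction and target offset must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

def mutual_gaze_consistent(eye_a, dir_a, eye_b, dir_b, tol_deg=15.0):
    """True if A looks toward B and B looks toward A, within tol_deg.

    tol_deg is an arbitrary illustrative threshold, not a value from the paper.
    """
    return (gaze_error_deg(eye_a, dir_a, eye_b) <= tol_deg
            and gaze_error_deg(eye_b, dir_b, eye_a) <= tol_deg)

# Hypothetical scene: two people facing each other along the x-axis.
facing = mutual_gaze_consistent((0, 0), (1, 0), (10, 0), (-1, 0))
# Same scene, but the second person looks straight up instead.
mismatched = mutual_gaze_consistent((0, 0), (1, 0), (10, 0), (0, 1))
```

In this sketch `facing` is consistent and `mismatched` is not. A scene that is supposed to show two people in conversation but fails such a check is the kind of semantic inconsistency the cue targets.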

Novelty

8.0/10

Introducing Social Gaze Consistency as a new detection axis is a significant advance for AI-generated image detection.

Reliability

7.5/10

The paper provides solid experimental validation across multiple datasets and demonstrates consistent improvements in model performance.

Deep reliability assessment

The methodology supports Social Gaze Consistency as a novel detection cue for AI-generated images, but claims of universal applicability across all generative models may be overstated. The reported gains come from a limited set of benchmarks and may not generalize to all contexts, for example images without people or without visible eyes.

Reproducibility

Planned: the authors state that they will release the code upon acceptance to facilitate reproducibility.

Discussion questions

1. How might the reliance on gaze consistency as a detection cue limit the model's effectiveness in diverse scenarios?
2. What are the implications of this research for developers of AI-generated content detection tools in real-world applications?
3. What specific conditions or evidence would contradict the effectiveness of Social Gaze Consistency in detecting AI-generated images?

Key figure

Figure 1 illustrates the three coupled components of the proposed method: paired-edit data construction, Block-Compositional Caption Supervision, and a mixed fine-tuning protocol.

Benchmark results

COCOAI Interaction | balanced accuracy: 71.5 | vs FakeVLM origin: +3.7 pp | SOTA

COCOAI Person | balanced accuracy: 84.3 | vs FakeVLM origin: +1.3 pp | SOTA
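For readers unfamiliar with the metric, the sketch below shows the standard definition of balanced accuracy for a binary real-versus-generated task: the mean of the recall on each class. It also backs out the baseline score implied by each row, assuming "+N pp" means the reported score minus the FakeVLM origin score. The toy labels and the helper names are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = AI-generated, 0 = real)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("need at least one example of each class")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)

def implied_baseline(score, delta_pp):
    """Baseline score if `score` is `delta_pp` percentage points above it (assumption)."""
    return round(score - delta_pp, 1)

# Hypothetical toy labels: 4 generated images, 2 real ones.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # 0.5 * (3/4 + 1/2)

# Baselines implied by the two rows above, under the stated assumption.
baseline_interaction = implied_baseline(71.5, 3.7)
baseline_person = implied_baseline(84.3, 1.3)
```

Because each class contributes equally regardless of its size, balanced accuracy is less sensitive to class imbalance than plain accuracy, which is useful when real and generated images are not evenly represented.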