VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading
Jinzhou Wu, Zhengwu Ma, Jixing Li, Baoping Tang, Zitong Lu
Key claim
Multimodal pretraining selectively enhances human-like language representations.
In plain English
This study examines how multimodal pretraining affects a language model's alignment with human processing during natural reading. Its key finding is that multimodal training may not improve human-like text processing across the board, but it can selectively improve alignment where visual semantic content is stronger.
The paper meaningfully extends prior work by isolating the effects of multimodal training on language models, and the resulting finding is a significant new one for the field.
The study uses a controlled experimental framework with fMRI and eye-tracking data, supporting its claims with solid evidence.