2026-05-27 | agents, vision, scaling, multimodal

Ω-QVLA: Robust Quantization for Vision-Language-Action Models via Composite Rotation and Per-step Scaling

Xinyu Wang, Mingze Li, Sicheng Lyu, Dongxiu Liu, Kaicheng Yang, Ziyu Zhao, Yufei Cui, Xiao-Wen Chang, Peng Lu

Key claim

Omega-QVLA uniformly quantizes an entire VLA model, both the language backbone and the DiT action head, to W4A4 precision without retraining.

The Omega-QVLA framework allows for efficient on-device deployment of Vision-Language-Action models by uniformly quantizing both the language and action components. It achieves high task success rates while significantly reducing memory usage, demonstrating its effectiveness in real-world manipulation tasks.

In plain English

The authors developed Omega-QVLA, a new framework that allows for efficient compression of Vision-Language-Action (VLA) models, which combine visual perception, language understanding, and action control. Unlike previous methods that only partially quantized these models or used mixed precision, Omega-QVLA uniformly quantizes both the language and action components to a lower precision, making it more stable and effective. This results in high task success rates while significantly reducing the memory required to run these models on devices. Builders should care because this advancement enables the deployment of complex AI models on resource-constrained devices, opening up new possibilities for real-world applications in areas like robotics and interactive systems.

Novelty

8.0/10

Omega-QVLA introduces a novel training-free quantization framework that effectively compresses both the language backbone and action head, challenging existing assumptions in the field.

Reliability

8.0/10

The paper provides strong experimental results on multiple models and includes a comparison to FP16 references, supporting its claims with solid evidence.

Deep reliability assessment

The methodology supports the claim that Ω-QVLA can uniformly quantize both the language backbone and the DiT action head of a VLA model to W4A4 precision without sacrificing task performance. However, the claim that it overturns the belief that the DiT action head is too sensitive to uniformly quantize may be overclaimed without broader validation across more diverse models and tasks.
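For readers unfamiliar with the notation, W4A4 means that both weights and activations are stored on a 4-bit integer grid. The sketch below shows plain symmetric 4-bit quantization of a small vector. It is a generic illustration, not the paper's procedure: the function names, the per-tensor scale, and the sample values are all assumptions made for the example.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization (illustrative): returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Signed 4-bit integers span [-8, 7]; dividing by 7 keeps the grid
    # symmetric around zero. Guard against an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to exercise the functions.
weights = [0.02, -0.15, 0.40, -0.70, 0.05]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With round-to-nearest, the reconstruction error of each in-range value is at most half a quantization step (`scale / 2`). Because a single large value sets the step size for every other entry, outliers are what make 4-bit quantization difficult. Storing 4 bits instead of 16 per weight cuts weight storage by roughly 4x, before accounting for the scale factors.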

Reproducibility

No open source code or dataset is explicitly mentioned in the paper.

Discussion questions

1. How does the composite SVD·Hadamard rotation specifically address the sensitivity of the DiT action head to quantization?
2. What are the practical implications of deploying Ω-QVLA on real-world robotic systems in terms of latency and energy efficiency?
3. What specific scenarios or model architectures would falsify the claim that Ω-QVLA can uniformly quantize the entire VLA model without performance loss?

Key figure

Figure 1 illustrates the overall quantization pipeline of Ω-QVLA, highlighting the transformation from W16A16 to W4A4 precision using two-level rotation and per-step scaling.
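To give a feel for why rotation helps before quantization, the sketch below applies a plain orthonormal Hadamard rotation to an activation vector and a weight vector. It covers only the general idea behind rotation-based quantization. It does not reproduce the paper's composite SVD·Hadamard rotation or its per-step scaling, and every name and value in it is an assumption chosen for illustration.

```python
import math


def hadamard(n):
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    norm = math.sqrt(n)
    return [[x / norm for x in row] for row in h]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical activation with a single outlier channel, and a weight row.
x = [8.0, 0.1, -0.1, 0.2]
w = [0.5, -0.3, 0.2, 0.1]

H = hadamard(4)
x_rot = matvec(H, x)
w_rot = matvec(H, w)

peak_before = max(abs(v) for v in x)
peak_after = max(abs(v) for v in x_rot)
```

Because `H` is orthogonal, `dot(x_rot, w_rot)` equals `dot(x, w)` up to floating-point error, so the layer's output is unchanged. The rotation spreads the outlier across all channels, so `peak_after` is smaller than `peak_before` and a 4-bit grid fitted to the rotated vector wastes less of its range on a single value.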

Benchmark results

LIBERO average task success rate: 87.8 (vs. FP16 reference: +0.8%; SOTA)

LIBERO average task success rate: 98 (vs. FP16 reference: +0.9%; SOTA)