2026-06-04 | agents | vision | data

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu, Huaxiu Yao, Zhiwu Lu, Mingyu Ding

Key claim

TempoVLA enables dynamic speed control for robot manipulation.

TempoVLA lets a robot manipulation policy run at different execution speeds during different phases of a task. The key reported result is flexible speed control without loss of motion accuracy. The authors also report improved performance, which they attribute to better data utilization.

In plain English

The authors developed a system called TempoVLA that lets robots move at different speeds depending on how risky the current part of the task is. Unlike previous models, which ran at a single fixed speed, TempoVLA adjusts its speed dynamically, speeding up in safe situations and slowing down when precision is needed. It does this with Variable-Speed Trajectory Augmentation, which retimes demonstrations by splitting or merging their actions, combined with a speed input to the policy. Builders should care because this flexibility can make robotic operations more efficient and safer in real-world applications.

Novelty

8.0/10

The introduction of TempoVLA and Variable-Speed Trajectory Augmentation represents a significant advancement in controlling execution speed in robot manipulation.

Reliability

7.5/10

The paper provides experimental validation in both simulation and real-world tasks, supporting its claims with solid evidence.

Deep reliability assessment

The methodology supports the claim that action-magnitude retiming plus explicit speed conditioning can give a VLA controllable execution speed on the reported simulation and real-world manipulation tasks. The broader claims, that the approach is lightweight and applicable to all existing VLAs and that VLM-driven dynamic speed scheduling reliably improves performance, are overclaimed: the provided excerpt does not cover broader architectures or tasks, does not address safety constraints, and gives no quantitative evidence for them.

Reproducibility

No open-source code or project URL is mentioned in the provided abstract, introduction, conclusion, or visible paper text. The paper mentions experiments on LIBERO and real-world tasks, but no released dataset, implementation, or repository details are provided here.

Discussion questions

1. Does scaling action magnitude truly preserve task semantics under contact, dynamics, actuator limits, and closed-loop feedback delays, or does it only work for quasi-static manipulation regimes?
2. For robotics builders, is it safer and simpler to implement speed control inside the policy as TempoVLA proposes, or at the motion-controller level where collision checking, torque limits, and compliance are already handled?
3. What result would falsify TempoVLA: failure to hit commanded speeds, degraded success at 1x after VSTA, poor transfer to a different VLA decoder family, or unsafe behavior during contact-rich tasks at high speed?

Key figure

Figure 1 shows TempoVLA using Variable-Speed Trajectory Augmentation to split or merge demonstration actions, then conditioning a VLA policy on a scalar speed so that one policy produces slower or faster motion trails while the low-level controller is unchanged.
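
The split-or-merge idea in Figure 1 can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retime_actions`, the restriction to integer speed factors (k or 1/k), and the assumption that every action dimension is a relative displacement (so that summing or dividing is meaningful) are all hypothetical choices made for illustration. Real action vectors often include dimensions such as a gripper command that would need separate handling.

```python
import numpy as np


def retime_actions(actions: np.ndarray, speed: float) -> np.ndarray:
    """Retime a (T, D) sequence of delta actions (hypothetical helper).

    speed = k   (integer k >= 1): merge every k consecutive actions by summing,
                giving fewer, larger steps.
    speed = 1/k (integer k >= 1): split each action into k equal parts,
                giving more, smaller steps.
    The total displacement over the trajectory is unchanged in both cases.
    """
    actions = np.asarray(actions, dtype=float)
    if actions.ndim != 2:
        raise ValueError("actions must have shape (T, D)")
    if speed <= 0:
        raise ValueError("speed must be positive")

    if speed >= 1:
        k = int(round(speed))
        if not np.isclose(speed, k):
            raise ValueError("this sketch only supports integer speed-ups")
        # Pad with zero actions so the length is a multiple of k.
        pad = (-len(actions)) % k
        padded = np.concatenate([actions, np.zeros((pad, actions.shape[1]))])
        return padded.reshape(-1, k, actions.shape[1]).sum(axis=1)

    k = int(round(1.0 / speed))
    if not np.isclose(1.0 / speed, k):
        raise ValueError("this sketch only supports slow-downs of the form 1/k")
    # Each original action becomes k consecutive smaller actions.
    return np.repeat(actions / k, k, axis=0)


if __name__ == "__main__":
    demo = np.array([[1.0, 0.0], [1.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
    for speed in (0.5, 1.0, 2.0):
        retimed = retime_actions(demo, speed)
        # A training example would pair each retimed trajectory with its
        # scalar speed label, which the policy receives as an extra input.
        print(speed, retimed.shape, retimed.sum(axis=0))
```

Under these assumptions, each retimed copy of a demonstration would be labeled with its speed factor and used as an additional training trajectory, which is one plausible reading of how a single policy could be conditioned on a scalar speed while the low-level controller stays unchanged.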