2026-06-03 data

RIDE: An Open Dataset and Benchmark for Train Delay Prediction

Clément Elliker, Mathis Le Bail, Clément Mantoux, Jesse Read, Sonia Vanier

Key claim

On the RIDE benchmark, learning-based models significantly outperform non-learning models at predicting train delays.

The authors present RIDE, a large-scale dataset and benchmark for train delay prediction in Belgium. Their key finding is that learning-based methods, particularly graph neural networks, outperform traditional non-learning models. The benchmark itself is intended as a new standard for evaluation in this domain.
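
To make the term "non-learning model" concrete, here is a minimal sketch of one such baseline and how it could be scored. Everything in it is an assumption for illustration: the persistence rule (the next delay equals the current delay), the mean absolute error metric, and the toy numbers are not taken from the paper or from the RIDE data, and the names `toy_events` and `persistence_baseline` are hypothetical.

```python
from statistics import mean

# Toy, made-up observations (NOT from RIDE): each tuple is
# (delay at the last observed stop, delay at the next stop), in minutes.
toy_events = [(2.0, 3.0), (0.0, 0.0), (5.0, 4.0), (1.0, 3.0)]


def persistence_baseline(current_delay: float) -> float:
    """Hypothetical non-learning baseline: carry the current delay forward unchanged."""
    return current_delay


def mean_absolute_error(predictions, targets):
    """Average absolute gap between predicted and observed delays, in minutes."""
    return mean(abs(p - t) for p, t in zip(predictions, targets))


# Predict the next-stop delay from the current delay, then score the predictions.
predictions = [persistence_baseline(current) for current, _ in toy_events]
targets = [observed_next for _, observed_next in toy_events]
baseline_mae = mean_absolute_error(predictions, targets)

print(f"persistence MAE: {baseline_mae:.2f} min")  # prints "persistence MAE: 1.00 min"
```

A learned model would be scored by the same function on the same events, which is the kind of like-for-like comparison a shared benchmark makes possible.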

In plain English

The authors created a new dataset called RIDE to help predict train delays more accurately. The dataset includes millions of train events and weather records. Unlike previous work, RIDE also standardizes how predictions are made and evaluated, which makes it much easier to compare methods and see which models work best. Builders should care because this framework can lead to better train scheduling and improved passenger experiences.

Novelty

8.0/10

The introduction of a nationwide dataset and benchmark for train delay prediction significantly extends the resources available in the field.

Reliability

8.0/10

The paper provides a comprehensive evaluation framework with strong baselines and a standardized approach, supporting its claims.