2026-05-26 data

Transfer Learning using 66 Diseases for Disease Forecasting Applications

Lauren J Beesley, Alexander C Murph, Dave Osthus, Lauren A Castro


Key claim

Incorporating diverse data streams enhances disease forecasting accuracy.

In plain English

This paper explores integrating multiple data streams for forecasting infectious diseases and reports that doing so improves forecast performance in 84.9% of the evaluated time series and model structures. It stresses data quality and relevance: adding irrelevant data can harm forecasts. A key contribution is a publicly available database for the forecasting community.

Novelty

8.0/10

The paper significantly extends existing methods by synthesizing data from multiple diseases, which is a meaningful advancement in disease forecasting.

Reliability

7.5/10

The findings are supported by a large dataset and demonstrate solid performance improvements across various models, though the quality of data is highlighted as a limitation.

Deep reliability assessment

The methodology supports a retrospective finding: adding cross-stream or cross-disease training data often improves forecasts across the 20 disease data streams and model structures the paper evaluates. It does not support a blanket claim that “more disease data is always better,” since the authors themselves report negative transfer when added streams are too dissimilar to the target.

Reproducibility

Dataset: yes, the paper claims a publicly available compiled database spanning 66 infectious diseases and multiple data streams, but no dataset URL is visible in the provided excerpt. Code: no open-source code repository is mentioned in the provided abstract, introduction, conclusion, or visible references.

Discussion questions

1. Does cross-disease transfer work because of shared epidemiological dynamics, or because the models are learning generic seasonality/reporting artifacts that may fail under regime shifts?
2. For builders deploying public-health forecasting systems in SEA, how should we decide which external disease streams are similar enough to include without introducing negative transfer?
3. What prospective evaluation would falsify the paper’s main result: failure to outperform single-stream baselines during new outbreaks, across unseen geographies, or under changed surveillance/reporting systems?

Key figure

The key conceptual diagram is a transfer-learning setup comparing progressively broader training pools: a single target data stream, same-disease streams, same-transmission-mode diseases, and all available disease streams.
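
The four training pools described above can be sketched in a few lines. This is only an illustration of the nesting, not the authors' code: the `Stream` record, the `training_pools` helper, and the example stream names are all hypothetical, and the sketch assumes every stream of a given disease shares one transmission mode (so the pools are nested).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """Hypothetical record describing one surveillance data stream."""
    name: str
    disease: str
    transmission_mode: str


def training_pools(target, streams):
    """Return the four progressively broader training pools for `target`.

    Assumes `target` is itself one of `streams`.
    """
    return {
        "target_only": [target],
        "same_disease": [s for s in streams if s.disease == target.disease],
        "same_transmission_mode": [
            s for s in streams
            if s.transmission_mode == target.transmission_mode
        ],
        "all_streams": list(streams),
    }


# Made-up streams, for illustration only.
streams = [
    Stream("flu_ili", "influenza", "respiratory"),
    Stream("flu_hosp", "influenza", "respiratory"),
    Stream("rsv_hosp", "rsv", "respiratory"),
    Stream("dengue_cases", "dengue", "vector-borne"),
]
target = streams[0]
pools = training_pools(target, streams)
pool_sizes = {label: len(members) for label, members in pools.items()}
```

Fitting the same model structure on each pool and scoring it on the target stream alone would then give the kind of comparison the figure describes.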

Benchmark results

~20 evaluated disease data streams drawn from a compiled corpus of 66 infectious diseases and multiple surveillance data sources
Percentage of time series and model structures where adding other data streams improved forecast performance: 84.9
vs. single data stream forecasting models: not reported
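
The headline metric is a share of (time series, model structure) pairs in which the pooled model beat the single-stream model. A minimal sketch of that calculation follows. The error values are made up, the `improvement_rate` helper is hypothetical, and the sketch assumes a lower-is-better error score, since this summary does not say which scoring rule the paper uses.

```python
def improvement_rate(errors_single, errors_pooled):
    """Percentage of (series, model) pairs where the pooled error is lower.

    Both arguments map (series, model) keys to an error score where
    lower is better. Only keys present in both mappings are compared.
    """
    shared = errors_single.keys() & errors_pooled.keys()
    if not shared:
        raise ValueError("no (series, model) pairs in common")
    wins = sum(errors_pooled[k] < errors_single[k] for k in shared)
    return 100.0 * wins / len(shared)


# Made-up error scores for two streams and two model structures.
single = {
    ("stream_a", "model_1"): 1.20,
    ("stream_a", "model_2"): 0.95,
    ("stream_b", "model_1"): 2.10,
    ("stream_b", "model_2"): 1.80,
}
pooled = {
    ("stream_a", "model_1"): 1.05,
    ("stream_a", "model_2"): 0.99,  # pooled training hurt here
    ("stream_b", "model_1"): 1.70,
    ("stream_b", "model_2"): 1.60,
}
rate = improvement_rate(single, pooled)  # 3 of 4 pairs improve -> 75.0
```

A win rate like this says how often pooling helped, not by how much, so it is worth reading alongside the size of the error changes when those are available.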