2026-05-22
data

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

Anastasiia Sedova, Natalie Schluter, Skyler Seto, Maartje ter Hoeve

Key claim

LINK improves cross-lingual knowledge transfer with lexical substitutions.

The LINK method enhances cross-lingual knowledge transfer by replacing words in high-resource training data with their translations. The approach requires only a bilingual vocabulary and is reported to yield significant improvements on downstream tasks, reaching equivalent performance with up to a 2x speedup in training time.

Novelty

7.5/10

The proposed LINK method introduces a novel data-level intervention for cross-lingual knowledge transfer.

Reliability

8.0/10

The evaluation across multiple languages and model sizes demonstrates solid methodology and results.

Deep reliability assessment

The methodology supports the claim that simple lexical interventions can improve cross-lingual knowledge transfer, but the claim of broad applicability to all low-resource languages may be overstated, given how much vocabulary availability and language characteristics vary.

Reproducibility

No open source code or dataset is provided, which limits the ability to directly reproduce the results.

Discussion questions

How does the reliance on bilingual vocabularies impact the method's effectiveness across truly low-resource languages with limited lexical resources?
What are the practical implications for integrating this method into existing multilingual language model training pipelines?
What specific scenarios or data conditions would demonstrate the ineffectiveness of the LINK method?

Key figure

Figure 1 illustrates the LINK method: randomly selected words in a high-resource language corpus are replaced with their translations from a bilingual vocabulary to facilitate cross-lingual knowledge transfer.
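
The substitution step described for Figure 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `link_substitute`, the `rate` and `seed` parameters, whitespace tokenization, lowercase lookup, and the toy English-German vocabulary are all assumptions made for the example. The paper may select and replace words differently.

```python
import random


def link_substitute(tokens, bilingual_vocab, rate=0.3, seed=0):
    """Replace a random subset of in-vocabulary tokens with a translation.

    Hypothetical sketch: `rate` is the assumed probability of replacing a
    token that has an entry in `bilingual_vocab` (source word -> list of
    target-language translations).
    """
    rng = random.Random(seed)  # seeded for reproducible substitutions
    out = []
    for tok in tokens:
        translations = bilingual_vocab.get(tok.lower())
        if translations and rng.random() < rate:
            # Swap the high-resource word for one of its translations.
            out.append(rng.choice(translations))
        else:
            # Keep words with no vocabulary entry or not selected.
            out.append(tok)
    return out


# Toy vocabulary, invented for illustration only.
vocab = {"cat": ["Katze"], "drinks": ["trinkt"], "milk": ["Milch"]}
sentence = "The cat drinks milk".split()

# rate=1.0 replaces every word that has a vocabulary entry.
print(" ".join(link_substitute(sentence, vocab, rate=1.0)))
# prints "The Katze trinkt Milch"
```

Words without a vocabulary entry ("The" here) pass through unchanged, so in this sketch the coverage of the bilingual vocabulary directly bounds how much of the corpus can be altered.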
