Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech
Rez Samantha Z. Floresca, Edric Castel C. Hao, Hannah Grachiella Buñales, Chelsea Dominique E. Temprosa, Georgianna Z. Reyes, Kervin Gabriel L. Chua
Read on arXiv →Key claim
Bilingual fine-tuning eliminates cross-lingual performance degradation.
This paper presents the first evaluation of transformer-based models for dementia detection in Filipino speech, highlighting the importance of bilingual fine-tuning. The key finding is that bilingual fine-tuning significantly improves model performance, achieving a Macro-F1 score of 0.969-0.973, demonstrating the necessity of linguistic coverage in training.
In plain English
This paper presents the first evaluation of transformer-based models for dementia detection in Filipino speech, highlighting the importance of bilingual fine-tuning. The key finding is that bilingual fine-tuning significantly improves model performance, achieving a Macro-F1 score of 0.969-0.973, demonstrating the necessity of linguistic coverage in training.
This work introduces a novel application of transformer models for dementia detection in a bilingual context, addressing a significant gap in the literature.
The study employs a systematic evaluation with multiple model families and a well-constructed bilingual dataset, supporting its claims with robust experimental results.
Deep reliability assessment
The methodology supports the claim that bilingual fine-tuning can significantly reduce cross-lingual performance gaps in dementia detection models, but it may overclaim the generalizability of these findings to other low-resource languages without further validation on larger, organically produced datasets.
Reproducibility
Yes, the paper mentions that the code is publicly available at https://github.com/rezsam09/Filipino-English-Dementia-Classification.
Discussion questions
- 1.How does the reliance on manually translated datasets affect the validity of the cross-lingual performance claims?
- 2.What are the practical implications of these findings for developing dementia detection tools in other low-resource, code-switched languages?
- 3.What specific conditions or findings would falsify the claim that bilingual fine-tuning is the most effective strategy for stable cross-lingual performance?
Key figure
Figure 1 provides an overview of the experimental pipeline across datasets, model families, training configurations, and evaluation settings.