2026-05-25datacode

Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech

Rez Samantha Z. Floresca, Edric Castel C. Hao, Hannah Grachiella Buñales, Chelsea Dominique E. Temprosa, Georgianna Z. Reyes, Kervin Gabriel L. Chua

PDF preview unavailable

Read on arXiv →

Key claim

Bilingual fine-tuning eliminates cross-lingual performance degradation.

This paper presents the first evaluation of transformer-based models for dementia detection in Filipino speech, highlighting the importance of bilingual fine-tuning. The key finding is that bilingual fine-tuning significantly improves model performance, achieving a Macro-F1 score of 0.969-0.973, demonstrating the necessity of linguistic coverage in training.

In plain English

Novelty

8.0/10

This work introduces a novel application of transformer models for dementia detection in a bilingual context, addressing a significant gap in the literature.

Reliability

8.0/10

The study employs a systematic evaluation with multiple model families and a well-constructed bilingual dataset, supporting its claims with robust experimental results.

Deep reliability assessment

The methodology supports the claim that bilingual fine-tuning can significantly reduce cross-lingual performance gaps in dementia detection models, but it may overclaim the generalizability of these findings to other low-resource languages without further validation on larger, organically produced datasets.

Reproducibility

Yes, the paper mentions that the code is publicly available at https://github.com/rezsam09/Filipino-English-Dementia-Classification.

Discussion questions

1.How does the reliance on manually translated datasets affect the validity of the cross-lingual performance claims?
2.What are the practical implications of these findings for developing dementia detection tools in other low-resource, code-switched languages?
3.What specific conditions or findings would falsify the claim that bilingual fine-tuning is the most effective strategy for stable cross-lingual performance?

Key figure

Figure 1 provides an overview of the experimental pipeline across datasets, model families, training configurations, and evaluation settings.

Benchmark results

DementiaBank-derived transcriptsMacro-F1: 0.973vs BERT+0.021SOTA

GitHub1 repo

rezsam09/Filipino-English-Dementia-ClassificationOfficial