Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora
Idris Abdulmumin, Mokgadi Penelope Matloga, Tadesse Destaw Belay, Botshelo Kondowe, Letlhogonolo Mohleleng, Hareaipha Nkopo Letsoalo, Shamsuddeen Hassan Muhammad, Vukosi Marivate
Key claim
Temporal simultaneity significantly affects annotation quality.
In plain English
This paper presents a new sentiment dataset for Setswana and analyzes the decline in inter-annotator agreement over time. A key finding is that tweets labeled by different annotators within one minute of each other achieve much higher agreement than tweets whose labels were assigned further apart, which highlights the importance of temporal factors in annotation quality.
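The near-versus-far comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes a hypothetical record (`AnnotationPair`) holding two annotators' labels for one tweet plus the time gap between their annotations. It also uses pairwise Cohen's κ, whereas the paper's exact agreement statistic and gap definition may differ. The helper names and toy labels are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnnotationPair:
    # Hypothetical record layout (not the paper's released format):
    # two annotators' labels for one tweet and the absolute time gap,
    # in seconds, between their two annotations.
    label_a: str
    label_b: str
    gap_seconds: float


def cohen_kappa(pairs):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), over a list of AnnotationPair."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one annotation pair")
    # Observed agreement: share of tweets where both labels match.
    p_o = sum(p.label_a == p.label_b for p in pairs) / n
    # Chance agreement from each annotator's own label distribution.
    counts_a = Counter(p.label_a for p in pairs)
    counts_b = Counter(p.label_b for p in pairs)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_o - p_e) / (1.0 - p_e)


def kappa_by_gap(pairs, threshold_seconds=60.0):
    """Return (kappa for gaps <= threshold, kappa for gaps > threshold)."""
    near = [p for p in pairs if p.gap_seconds <= threshold_seconds]
    far = [p for p in pairs if p.gap_seconds > threshold_seconds]
    return cohen_kappa(near), cohen_kappa(far)


# Toy data invented for illustration; not taken from the paper.
toy = [
    AnnotationPair("positive", "positive", 12),
    AnnotationPair("negative", "negative", 30),
    AnnotationPair("neutral", "neutral", 45),
    AnnotationPair("positive", "neutral", 50),
    AnnotationPair("positive", "negative", 3600),
    AnnotationPair("neutral", "positive", 7200),
    AnnotationPair("negative", "negative", 5400),
    AnnotationPair("neutral", "neutral", 9000),
]
near_k, far_k = kappa_by_gap(toy)
print(f"near: {near_k:.3f}  far: {far_k:.3f}")  # near: 0.636  far: 0.273
```

The 60-second default mirrors the one-minute cutoff in the finding. With three annotators, each tweet yields three such pairs. A multi-rater statistic such as Fleiss' κ is a common alternative to averaging pairwise Cohen's κ.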
The paper contributes both a new dataset and evidence about how annotation quality changes over time, which is significant for sentiment analysis in African languages.
The study supports its claims with several analyses and with benchmarks against established models.
Deep reliability assessment
The timestamped, per-annotation analysis supports a strong association between shorter inter-annotator time gaps and higher agreement on this Setswana Twitter sentiment dataset. Causal claims should be treated cautiously because temporal simultaneity may proxy for batch difficulty, shared context, annotator availability, or other unmeasured campaign effects, and the annotator pool is only three people from one institution.
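One way to probe the batch-difficulty concern is to compare near and far pairs within the same batch, so that each comparison holds the batch's tweets roughly constant. The sketch below is hypothetical and not from the paper. It assumes records shaped as `(batch_id, label_a, label_b, gap_seconds)` and uses pairwise Cohen's κ. A positive within-batch difference would argue against batch difficulty alone explaining the effect. It would still not rule out annotator availability or other within-batch factors.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty sequences of equal length")
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_o - p_e) / (1.0 - p_e)


def within_batch_gap_effect(records, threshold_seconds=60.0, min_per_bin=2):
    """Return {batch_id: kappa_near - kappa_far} for batches with enough pairs.

    records: iterable of hypothetical (batch_id, label_a, label_b, gap_seconds)
    tuples, one per annotator pair per tweet.
    """
    # For each batch, keep parallel label lists for near and far pairs.
    bins = defaultdict(lambda: {"near": ([], []), "far": ([], [])})
    for batch_id, label_a, label_b, gap in records:
        key = "near" if gap <= threshold_seconds else "far"
        la, lb = bins[batch_id][key]
        la.append(label_a)
        lb.append(label_b)

    effects = {}
    for batch_id, b in bins.items():
        (na, nb), (fa, fb) = b["near"], b["far"]
        if len(na) < min_per_bin or len(fa) < min_per_bin:
            continue  # too few pairs in one bin to compare within this batch
        effects[batch_id] = cohen_kappa(na, nb) - cohen_kappa(fa, fb)
    return effects
```

Batches with too few pairs in either bin are skipped because κ computed on a handful of items is very noisy. The `min_per_bin` cutoff is an arbitrary illustrative choice.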
Reproducibility
Partially. The paper states that the dataset, per-annotation timestamps, and analysis code are publicly released, but the supplied text gives no repository or dataset URL. Because the dataset is distributed under a controlled-access NOODL license, obtaining it may require approval rather than a direct download.
Discussion questions
1. Does temporal simultaneity genuinely improve annotation quality, or does it merely correlate with easier tweets, better annotator focus, or periods when annotators interpreted the labels more consistently?
2. For builders running low-resource annotation projects, is it better to enforce synchronized annotation windows, add calibration meetings, increase annotator diversity, or spend the budget on adjudication and gold checks?
3. What result would falsify the paper's main claim? For example, would it count as falsification if a new multi-language annotation campaign that controlled for item difficulty and annotator identity found no κ improvement from synchronized labeling?
Key figure
No Figure 1 or architectural diagram is provided in the supplied excerpt; the key setup is a timestamped three-annotator sentiment-labeling campaign over multiple tweet batches.