2025-09-08 · data

Are Targeted Data Poisoning Attacks as Effective as We Think?

William Xu, Chenyu Zhang, Yihan Wang, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu, Yiwei Lu

Key claim

Identifying the hardest samples to poison improves both attack evaluation and defense.

The paper presents an approach for identifying the easiest and hardest samples to poison in targeted data poisoning attacks. Because the approach uses information from the clean model, it enables better evaluation of attack effectiveness and supports proactive defenses against vulnerabilities. A key result is that samples can be reliably stratified by poisoning vulnerability.

Novelty

8.0/10

This work introduces a new evaluation framework focusing on the hardest samples to poison.

Reliability

7.5/10

The methodology is solid, using both coarse- and fine-grained evaluations.

Deep reliability assessment

The methodology supports the claim that average attack success rates can obscure true worst-case effectiveness: it shows significant variance in poisoning difficulty across samples. However, the paper may overclaim how well its metrics generalize across datasets, particularly more complex ones such as TinyImageNet.
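
To make the point about averages concrete, here is a minimal Python sketch of how a mean attack success rate (ASR) can hide the spread across individual targets. It is not the paper's code or data: the function names, the target labels, and the toy outcomes are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def per_sample_asr(outcomes):
    """Return each target's ASR: the fraction of attack trials that succeeded."""
    return {target: sum(trials) / len(trials) for target, trials in outcomes.items()}


def summarize(asr_by_target):
    """Report the mean ASR alongside the lowest and highest per-target ASR."""
    rates = sorted(asr_by_target.values())
    return {"mean": mean(rates), "min": rates[0], "max": rates[-1]}


# Hypothetical toy outcomes (NOT results from the paper):
# 1 = the poisoned model misclassified the target, 0 = the attack failed.
toy_outcomes = {
    "target_a": [1, 1, 1, 1],
    "target_b": [1, 1, 0, 1],
    "target_c": [0, 0, 0, 0],
    "target_d": [0, 1, 0, 0],
}

asr = per_sample_asr(toy_outcomes)
summary = summarize(asr)
hardest = min(asr, key=asr.get)  # target with the lowest ASR
easiest = max(asr, key=asr.get)  # target with the highest ASR

print(summary, "hardest:", hardest, "easiest:", easiest)
```

In this toy setup the mean ASR is moderate even though one target is never poisoned and another is always poisoned, which is the kind of variance a single average conceals.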

Reproducibility

No. The paper does not provide open-source code or datasets for reproduction.

Discussion questions

How might the findings change if applied to different model architectures or datasets?
What specific strategies can builders implement to proactively defend against identified vulnerabilities?
If a new metric were developed that consistently outperformed EPA and DPS, how would that impact the conclusions drawn in this paper?

Key figure

Figure 1 shows a histogram of attack success rates (ASR) for gradient matching on 100 test samples in the class 'plane' from CIFAR-10, illustrating the variance in attack effectiveness across samples.
