Normal Guidance is what Attention Needs
Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes
Key claim
Normal Guidance improves attention-based multiple-instance learning (MIL) for medical imaging.
This paper studies how to train classifiers for 3D medical images when each scan carries only a single binary label. The proposed Normal Guidance technique improves slice-level localization by attention-based methods. On that task it outperforms state-of-the-art techniques, and it remains competitive in whole-scan classification.
In plain English
Normal Guidance is a regularization technique layered onto existing attention-based methods in medical imaging. It extends those methods rather than replacing their architecture.
The study is supported by experiments on four datasets (one semi-synthetic, plus Head, Chest, and Abdomen CT), which show localization improvements over existing methods.
Deep reliability assessment
The experiments support that, for these CT weak-supervision benchmarks using frozen slice encoders plus MIL heads, adding a Gaussian-shaped attention regularizer improves slice-level localization AUROC while preserving scan-level AUROC. The paper would overclaim if read as proving attention is a faithful causal explanation or that bell-shaped priors will generalize to pathologies with off-center, multi-focal, or non-contiguous evidence.
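Both metrics in the claim above are AUROC, computed at different granularities. The sketch below rests on a few assumptions. Slice-level localization is scored by treating each slice's attention weight as that slice's score against its annotation. How the paper aggregates slices across scans (pooled versus per-scan averages) is not stated here. The toy values are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive outscores a
    randomly chosen negative (ties count one half); O(P*N), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Slice-level localization: attention weights over the ordered slices of a
# positive scan, scored against per-slice annotations (hypothetical values).
attention = [0.05, 0.10, 0.35, 0.30, 0.15, 0.05]
slice_labels = [0, 0, 1, 1, 0, 0]
print(auroc(attention, slice_labels))  # 1.0

# Scan-level classification: predicted probabilities vs. whole-scan labels.
scan_probs = [0.9, 0.6, 0.5, 0.2]
scan_labels = [1, 0, 1, 0]
print(auroc(scan_probs, scan_labels))  # 0.75
```

A model can score well on the second metric while its attention scores poorly on the first. This is why the claim reports the two separately.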
Reproducibility
Code: yes, the paper links github.com/tufts-ml/normal-guidance. Datasets: partially reproducible. The paper evaluates on semi-synthetic, Head CT, Chest CT, and Abdomen CT datasets with over 4M slices. Obtaining the medical datasets, reproducing the preprocessing, and getting slice-level annotations may require following the paper's data instructions and securing access permissions.
Discussion questions
1. Is the core gain really from learning better visual evidence, or from encoding an anatomical prior that lesions are usually contiguous and near the volume center?
2. For builders deploying weakly supervised medical imaging systems, should this be treated as an interpretability improvement, or only as a localization heuristic that still needs clinician validation?
3. What result would falsify the claim that Normal Guidance generalizes? A direct test is whether it still performs strongly on datasets where positive slices are intentionally off-center, multi-modal, sparse, or distributed near the scan edges.
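One way to probe questions 1 and 3 is to stratify annotated test scans by the layout of their positive slices. Localization AUROC can then be compared with and without Normal Guidance within each stratum. The sketch below assigns a scan to a stratum from its slice labels. The stratum names, the `center_tol` threshold, and the helper names are hypothetical choices, not part of the paper.

```python
def positive_runs(slice_labels):
    """Return (start, end) index pairs of contiguous runs of positive slices."""
    runs, start = [], None
    for i, y in enumerate(slice_labels):
        if y == 1 and start is None:
            start = i
        elif y == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(slice_labels) - 1))
    return runs


def stratum(slice_labels, center_tol=0.25):
    """Hypothetical grouping of a scan by how well its positive slices match a
    single centered bell: 'negative', 'multi-focal', 'off-center', or
    'central-contiguous'. center_tol is a fraction of the scan length."""
    runs = positive_runs(slice_labels)
    if not runs:
        return "negative"
    if len(runs) > 1:
        return "multi-focal"
    n = len(slice_labels)
    positions = [i for i, y in enumerate(slice_labels) if y == 1]
    mean_pos = sum(positions) / len(positions)
    # Distance of the positive run's center from the scan midpoint, relative to scan length.
    offset = abs(mean_pos - (n - 1) / 2) / n
    return "central-contiguous" if offset <= center_tol else "off-center"


print(stratum([0, 0, 1, 1, 0, 0]))        # central-contiguous
print(stratum([1, 1, 0, 0, 0, 0, 0, 0]))  # off-center
print(stratum([1, 0, 0, 0, 1]))           # multi-focal
```

If localization gains appear only in the central-contiguous stratum, that would suggest the anatomical-prior explanation in question 1. Gains that hold in the off-center and multi-focal strata would count against it.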
Key figure
The key architectural idea is to train an attention-based or transformer-based MIL model while regularizing its learned slice-attention distribution toward a fitted bell-shaped Normal distribution over ordered axial slices.
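The following is a minimal plain-Python sketch of that idea, not the authors' implementation (which is at the linked repository). It makes three assumptions:

- The Normal is fitted by moment-matching the mean and variance of the slice index under the attention weights.
- The fitted Normal is evaluated at each slice index and renormalized.
- The penalty is KL(attention ‖ fitted Normal).

The paper may fit the Normal or measure the divergence differently. All function names and the `min_std` floor are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(slice_embeddings, attention_logits):
    """Attention-based MIL pooling: the scan embedding is the attention-weighted
    sum of per-slice embeddings (e.g. from a frozen slice encoder)."""
    weights = softmax(attention_logits)
    dim = len(slice_embeddings[0])
    pooled = [sum(w * emb[d] for w, emb in zip(weights, slice_embeddings))
              for d in range(dim)]
    return pooled, weights


def fitted_normal(weights, min_std=0.5):
    """ASSUMPTION: moment-match a Normal to the attention mass over ordered
    slice indices 0..S-1, evaluate it at each index, and renormalize."""
    idx = range(len(weights))
    mu = sum(i * w for i, w in zip(idx, weights))
    var = sum(w * (i - mu) ** 2 for i, w in zip(idx, weights))
    sigma = max(math.sqrt(var), min_std)  # hypothetical floor to avoid a degenerate spike
    dens = [math.exp(-0.5 * ((i - mu) / sigma) ** 2) for i in idx]
    total = sum(dens)
    return [d / total for d in dens]


def normal_guidance_penalty(weights, eps=1e-12):
    """ASSUMPTION: KL(attention || fitted Normal). Small when attention is already
    bell-shaped over slices, large when it is split across distant slices."""
    target = fitted_normal(weights)
    return sum(w * math.log((w + eps) / (t + eps)) for w, t in zip(weights, target))


# Toy attention patterns over 5 ordered slices (hypothetical values).
bell = [0.05, 0.20, 0.50, 0.20, 0.05]
bimodal = [0.45, 0.05, 0.00, 0.05, 0.45]
print(round(normal_guidance_penalty(bell), 3))     # small
print(round(normal_guidance_penalty(bimodal), 3))  # much larger
```

In training, the penalty would be added to the scan-level loss with a weight, for example `loss = bce + lam * normal_guidance_penalty(weights)`, where `lam` is a hypothetical hyperparameter. An autodiff framework would likely treat the fitted target as a constant (stop-gradient), so that attention is pulled toward the bell shape rather than the bell toward the attention. That choice is also an assumption, since this pure-Python sketch differentiates nothing.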