2026-05-26 · vision, data, code

Normal Guidance is what Attention Needs

Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

Key claim

Normal Guidance improves attention-based MIL for medical imaging.

The paper trains classifiers for 3D medical images using a single binary label per scan. Its Normal Guidance technique regularizes attention-based multiple-instance learning (MIL) methods so that slice-level localization improves over the compared baselines, while whole-scan classification performance remains competitive.

In plain English

Each 3D scan comes with only one yes/no label, yet the model is also expected to indicate which slices contain the finding. Normal Guidance nudges the model's attention over the ordered slices toward a bell-shaped curve. In the reported experiments this makes the highlighted slices line up better with the true positive slices, without hurting the scan-level prediction.

Novelty
7.5/10

The introduction of Normal Guidance as a regularization technique represents a meaningful extension to existing attention-based methods in medical imaging.

Reliability
8.0/10

The study is supported by extensive experiments across multiple datasets, demonstrating solid performance improvements over existing methods.

Deep reliability assessment

The experiments support that, for these CT weak-supervision benchmarks using frozen slice encoders plus MIL heads, adding a Gaussian-shaped attention regularizer improves slice-level localization AUROC while preserving scan-level AUROC. The paper would overclaim if read as proving attention is a faithful causal explanation or that bell-shaped priors will generalize to pathologies with off-center, multi-focal, or non-contiguous evidence.
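The "frozen slice encoders plus MIL heads" setup mentioned above can be pictured with a small attention-pooling sketch. This is not the paper's code: it assumes the commonly used tanh-attention form of attention-based MIL, and the parameter names (`V`, `w`, `clf_w`, `clf_b`) are hypothetical. The heads evaluated in the paper may differ.

```python
import numpy as np

def abmil_forward(H, V, w, clf_w, clf_b):
    """Attention-based MIL pooling for one scan (illustrative sketch).

    H      : (n_slices, d) slice embeddings from a frozen encoder
    V, w   : attention parameters, shapes (d, k) and (k,)  [hypothetical names]
    clf_w  : (d,) weights of a linear scan-level classifier [hypothetical]
    clf_b  : scalar bias of that classifier                 [hypothetical]
    Returns the scan-level probability and the per-slice attention weights.
    """
    scores = np.tanh(H @ V) @ w          # one unnormalized score per slice
    scores = scores - scores.max()       # subtract max for numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum()             # softmax over slices, sums to 1
    z = attn @ H                         # attention-weighted scan embedding, shape (d,)
    logit = z @ clf_w + clf_b
    prob = 1.0 / (1.0 + np.exp(-logit))  # scan-level probability
    return prob, attn
```

Only the scan-level probability is supervised by the binary label. The attention vector `attn` is what gets read off as a slice-level localization score, which is why the caveat about attention not being a causal explanation matters.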

Reproducibility

Code: yes, the paper links github.com/tufts-ml/normal-guidance. Datasets: partially reproducible; the paper evaluates on semi-synthetic, Head CT, Chest CT, and Abdomen CT datasets with over 4M slices, but medical dataset access, preprocessing, and slice-level annotation availability may require following the paper-specific data instructions and permissions.

Discussion questions

  1. Is the core gain really from learning better visual evidence, or from encoding an anatomical prior that lesions are usually contiguous and near the volume center?
  2. For builders deploying weakly supervised medical imaging systems, should this be treated as an interpretability improvement, or only as a localization heuristic that still needs clinician validation?
  3. What result would falsify Normal Guidance: strong performance on datasets where positive slices are intentionally off-center, multi-modal, sparse, or distributed near the scan edges?

Key figure

The key architectural idea is to train an attention-based or transformer-based MIL model while regularizing its learned slice-attention distribution toward a fitted bell-shaped Normal distribution over ordered axial slices.
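One way to read "regularizing the slice-attention distribution toward a fitted Normal" is as a penalty added to the classification loss. The sketch below is a guess at the general shape, not the paper's formulation: it assumes the Normal is fitted by moment matching on the slice index and that the penalty is a KL divergence from attention to the fitted curve. The actual fit, divergence, and weighting are described in the paper and its repository.

```python
import numpy as np

def fit_normal_to_attention(attn):
    """Moment-match a Normal over slice indices (assumed fitting rule)."""
    idx = np.arange(len(attn), dtype=float)
    mu = np.sum(attn * idx)                      # attention-weighted mean slice
    var = np.sum(attn * (idx - mu) ** 2)         # attention-weighted variance
    return mu, np.sqrt(max(var, 1e-6))           # floor avoids a zero-width curve

def discretized_normal(n, mu, sigma):
    """Normal density evaluated at slice indices 0..n-1, renormalized to sum to 1."""
    idx = np.arange(n, dtype=float)
    log_p = -0.5 * ((idx - mu) / sigma) ** 2
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def normal_guidance_penalty(attn, eps=1e-12):
    """KL(attn || fitted Normal); the KL direction is an assumption of this sketch."""
    mu, sigma = fit_normal_to_attention(attn)
    target = discretized_normal(len(attn), mu, sigma)
    return float(np.sum(attn * (np.log(attn + eps) - np.log(target + eps))))

# Bell-shaped attention is barely penalized; two separated peaks are penalized more.
bell = discretized_normal(40, mu=20.0, sigma=4.0)
two_peaks = 0.5 * discretized_normal(40, 8.0, 2.0) + 0.5 * discretized_normal(40, 32.0, 2.0)
print(normal_guidance_penalty(bell), normal_guidance_penalty(two_peaks))
```

Under these assumptions the penalty is near zero when attention already looks like a single bell and grows when attention splits into separated peaks, which is also why multi-focal or non-contiguous pathologies are the natural stress test for the method.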

Benchmark results

Dataset         AUROC   Compared baseline                                          Baseline AUROC   Gain     Status
Semi-Synthetic  0.706   Smooth Operator ABMIL (best non-Normal-Guidance baseline)  0.693            +0.013   SOTA
Head CT         0.871   Centered Gaussian (image-free)                             0.850            +0.021   SOTA
Chest CT        0.869   Centered Gaussian (image-free)                             0.780            +0.089   SOTA
Abdomen CT      0.663   Centered Gaussian (image-free)                             0.573            +0.090   SOTA
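To see why an image-free Centered Gaussian baseline can be competitive in the results above, it helps to score slices by position alone and compute a slice-level AUROC. The sketch below is illustrative only: the `width_fraction` parameter, the toy label layouts, and the per-scan evaluation are assumptions, not the paper's protocol.

```python
import numpy as np

def centered_gaussian_scores(n_slices, width_fraction=1.0 / 6.0):
    """Score each slice by a Gaussian centered on the middle of the scan.

    No image content is used. width_fraction is a hypothetical choice.
    """
    idx = np.arange(n_slices, dtype=float)
    center = (n_slices - 1) / 2.0
    sigma = width_fraction * n_slices
    return np.exp(-0.5 * ((idx - center) / sigma) ** 2)

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

n = 30
scores = centered_gaussian_scores(n)

central = np.zeros(n, dtype=bool)
central[12:18] = True          # toy scan: positive slices in the middle

edge = np.zeros(n, dtype=bool)
edge[0:6] = True               # toy scan: positive slices at one end

print(auroc(scores, central), auroc(scores, edge))
```

On the toy scan with central positives the position-only score ranks every positive slice above every negative one, while on the scan with edge positives it falls below chance. Because AUROC depends only on ranking, the Gaussian width does not change the result within a single scan. This is the concern behind the first discussion question: part of any localization gain may reflect where findings tend to sit rather than what the images show.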
GitHub (1 repo): tufts-ml/normal-guidance (official)