Conformal Prediction
for Long-Tailed Classification



Joseph Salmon

IMAG, Univ Montpellier, CNRS, Inria, Montpellier, France

Pl@ntnet Consortium

Joint work with


Tiffany Ding

UC Berkeley

Jean-Baptiste Fermanian

Inria


and all the team from


Pl@ntNet: ML for citizen science


A citizen science platform using machine learning to help people identify plants with their mobile phones

Pl@ntNet: usage and popularity


  • 25M+ users
  • 200+ countries
  • Up to 2M images uploaded/day
  • 75K+ species (=labels)
  • 1B+ total images
  • 10M+ labeled / validated

Pl@ntNet and long-tail



Pl@ntNet-300K dataset (Garcin et al., 2021): a baby Pl@ntNet, available on Zenodo


  • ~300,000 images spanning 1,000+ plant species
  • Classification setting: \(X_i\) are images, \(Y_i\) are labels=species
  • Note the long tail: many rarely collected species (notably endangered ones)

Conformal Prediction (Vovk et al., 2005)



Goal: Propose the most probable classes with confidence level \(1-\alpha\)



Main idea: Return classes with predicted score above a threshold:

\[ \mathcal{C}_{\alpha}(X) = \big\{ y : s(X,y) \geq t_\alpha \big\} \]



Key assumption: The data \((X_i, Y_i)\) are exchangeable



Question: How to select \(t_\alpha\) to get statistical guarantees?

Optimal sets (Sadinle et al., 2019)


Marginal coverage targets:

\[\mathbb{P}\big[ Y \in \mathcal{C}_{\alpha}(X) \big ] \geq 1 - \alpha.\]

Class conditional coverage targets:

\[\forall y,\quad \mathbb{P}\big[ Y \in \mathcal{C}_\alpha(X) | Y=y \big ] \geq 1 - \alpha.\]

Theorem (Informal)

The set of minimum expected size with marginal coverage of at least \(1-\alpha\) is:

\[ \mathcal{C}_{\alpha}(x) = \left\{ y : p(y|x) \geq t_\alpha \right\} \]

Theorem (Informal)

The set of minimum expected size with class-conditional coverage of at least \(1-\alpha\) is:

\[ \mathcal{C}_{\alpha}(x) = \left\{ y : p(y|x) \geq t_\alpha^{y} \right\} \]


In practice: consider a conformal score \(s(x,y) = \hat{p}(y|x)\) (= softmax score)


Marginal: calibrate \(t_{\alpha}\) on the whole calibration set \((X_i, Y_i)_{i=1}^n\)

Conditional: calibrate \(t_{\alpha}^y\) only on \((X_i, Y_i)\) such that \(Y_i = y\)
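
The two calibration recipes above can be sketched as follows. This is a minimal split-conformal sketch, not the authors' implementation: the function names, the NumPy array layout (`probs` of shape `(n, K)` holding softmax scores, integer `labels`) and the choice of the \(\lfloor \alpha(n+1) \rfloor\)-th smallest true-class score as threshold are assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Threshold t so that {y : s(x, y) >= t} covers with prob. >= 1 - alpha
    under exchangeability. cal_scores are true-class scores s(X_i, Y_i)."""
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))  # calibration points allowed strictly below t
    if k < 1:
        return -np.inf  # too few calibration points: every class is kept
    return np.sort(cal_scores)[k - 1]  # k-th smallest true-class score

def marginal_threshold(probs, labels, alpha):
    """One threshold t_alpha calibrated on the whole calibration set."""
    true_scores = probs[np.arange(len(labels)), labels]
    return conformal_threshold(true_scores, alpha)

def classwise_thresholds(probs, labels, alpha):
    """One threshold t_alpha^y per class, using only points with Y_i = y."""
    true_scores = probs[np.arange(len(labels)), labels]
    return np.array([
        conformal_threshold(true_scores[labels == y], alpha)
        for y in range(probs.shape[1])
    ])

def prediction_sets(probs, thresholds):
    """Boolean (n, K) membership matrix; thresholds is a scalar or a (K,) vector."""
    return probs >= thresholds
```

In this sketch, a class with fewer than \(1/\alpha - 1\) calibration examples gets threshold \(-\infty\) and is therefore included in every set, which hints at why class-conditional calibration produces large sets on long-tailed data.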

Targeting Macro-Coverage


Goal: Average coverage across all classes (better for long-tail!)

\[ \text{MacroCoverage} = \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \mathbb{P}\big[ Y \in \mathcal{C}(X) \, | \, Y = y \big] \]
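
To make the difference with marginal coverage concrete, here is a hedged sketch of the empirical counterparts on a labeled test set. The function names and the convention of ignoring classes absent from the test set are assumptions for illustration; `sets` is a boolean `(n, K)` membership matrix.

```python
import numpy as np

def marginal_coverage(sets, labels):
    """Fraction of test points whose true label is in the prediction set."""
    return sets[np.arange(len(labels)), labels].mean()

def class_coverages(sets, labels):
    """Per-class coverage; NaN for classes with no test example."""
    n_classes = sets.shape[1]
    covered = sets[np.arange(len(labels)), labels].astype(float)
    hits = np.bincount(labels, weights=covered, minlength=n_classes)
    counts = np.bincount(labels, minlength=n_classes)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, hits / counts, np.nan)

def macro_coverage(sets, labels):
    """Unweighted average of per-class coverages (absent classes ignored)."""
    return np.nanmean(class_coverages(sets, labels))

# Toy example: the frequent class 0 is always covered, the rare class 1 never is.
sets = np.array([[True, False], [True, False], [True, True], [True, False]])
labels = np.array([0, 0, 0, 1])
print(marginal_coverage(sets, labels), macro_coverage(sets, labels))
```

In the toy example the frequent class dominates the marginal figure, while the macro average weights the rare class equally.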

Theorem (Informal)

The set of minimum expected size with Macro-Coverage of at least \(1-\alpha\) is:

\[ \mathcal{C}_{\alpha}(x) = \left\{ y : \frac{p(y|x)}{p(y)} \geq t_\alpha \right\} \]
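
A short sketch of where the ratio comes from, assuming set size is measured in expectation (an assumption, since the slide only says "minimum size"):

\[ \mathbb{E}\,|\mathcal{C}(X)| = \sum_{y \in \mathcal{Y}} \int \mathbf{1}\{y \in \mathcal{C}(x)\}\, p(x)\, dx, \qquad \text{MacroCoverage} = \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \int \mathbf{1}\{y \in \mathcal{C}(x)\}\, p(x|y)\, dx \]

Including a pair \((x, y)\) costs \(p(x)\) in size and gains \(p(x|y)/|\mathcal{Y}|\) in coverage, so a Neyman-Pearson-type argument keeps the pairs with the largest ratio, and by Bayes' rule:

\[ \frac{p(x|y)}{p(x)} = \frac{p(y|x)}{p(y)} \]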

Prevalence Adjusted Softmax: consider a conformal score \(s(x,y) = \frac{\hat{p}(y|x)}{\hat{p}(y)}\)
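
A minimal sketch of the prevalence-adjusted score, under stated assumptions: \(\hat{p}(y)\) is estimated from training label frequencies, the additive `smoothing` term (hypothetical, added only to avoid division by zero for unseen classes) is not from the slides, and a single threshold is calibrated on the whole calibration set as in the marginal recipe. All function names are illustrative.

```python
import numpy as np

def class_priors(train_labels, n_classes, smoothing=1.0):
    """Estimate p_hat(y) from label counts (smoothing is an illustrative choice)."""
    counts = np.bincount(train_labels, minlength=n_classes) + smoothing
    return counts / counts.sum()

def pas_scores(probs, priors):
    """Prevalence-adjusted softmax: s(x, y) = p_hat(y|x) / p_hat(y)."""
    return probs / priors  # priors of shape (K,) broadcast over the rows of probs

def pas_threshold(cal_probs, cal_labels, priors, alpha):
    """Single threshold on the adjusted true-class scores of the calibration set."""
    scores = pas_scores(cal_probs, priors)
    true_scores = scores[np.arange(len(cal_labels)), cal_labels]
    k = int(np.floor(alpha * (len(true_scores) + 1)))
    return -np.inf if k < 1 else np.sort(true_scores)[k - 1]

def pas_sets(probs, priors, threshold):
    """Boolean (n, K) membership matrix {y : s(x, y) >= threshold}."""
    return pas_scores(probs, priors) >= threshold
```

Dividing by the prior raises the score of rare classes relative to frequent ones, so a rare species with a modest softmax score can still enter the prediction set.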

Experiments on Pl@ntNet-300K



References



Garcin, C., Joly, A., Bonnet, P., Lombardo, J.-C., Affouard, A., Chouet, M., Servajean, M., Lorieul, T., & Salmon, J. (2021). Pl@ntNet-300K: A plant image dataset with high label ambiguity and a long-tailed distribution. NeurIPS Datasets and Benchmarks 2021.
Sadinle, M., Lei, J., & Wasserman, L. (2019). Least ambiguous set-valued classifiers with bounded error levels. Journal of the American Statistical Association, 114(525), 223–234.
Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic learning in a random world. Springer.