Citizen science & machine learning for plant identification: challenges from Pl@ntNet
Joseph Salmon
IMAG, Univ Montpellier, CNRS, Inria, Montpellier, France
Consortium Pl@ntnet
A citizen science platform using machine learning to help people identify plants with their mobile phones
Alexis Joly
Primary investigator, INRIA
ResearchGate
Pierre Bonnet
Primary investigator, CIRAD
ResearchGate
Hervé Goëau
Researcher, CIRAD
ResearchGate
Antoine Affouard
Backend & Staff engineer, INRIA
LinkedIn
Jean-Christophe Lombardo
AI engineer, INRIA
LinkedIn
Mathias Chouet
Backend engineer, CIRAD
GitHub
Thomas Paillot
Front-end engineer, INRIA
LinkedIn
Rémi Palard
Geo & Fullstack engineer, CIRAD
LinkedIn
Vanessa Hequet
Botanist, IRD
LinkedIn
Murielle Simo-Droissart
Botanist, IRD
ResearchGate
Théo Simoes
Backend engineer, INRAE
LinkedIn
Jean-Marc Sadaillan
Project manager, INRAE
LinkedIn
Christophe Botella
Researcher, INRIA
ResearchGate
Joseph Salmon
Researcher, INRIA
Website
Benjamin Bourel
Researcher, INRIA
Website
Théo Larcher
PhD candidate, INRIA
LinkedIn
Giulio Martellucci
PhD candidate, INRIA
LinkedIn
Raphaël Benerradi
PhD candidate, INRIA
LinkedIn
Ilyass Moummad
Post-doc, INRIA
Personal website | Google Scholar
Note: I am mostly innocent; I started working with the Pl@ntNet team in 2020
We need you: come and help us improve it!
@ Rkitko (Wikimedia Commons)
© RBG Kew
© Benoît Janichon
Guizotia abyssinica (L.f.) Cass.
© Patrice SIROT
Diascia rigescens E.Mey. ex Benth.
© Borquez Vicent
Lapageria rosea Ruiz & Pav.
© Наталья
Casuarina cunninghamiana Miq. LC
© Annette Bejany
Guizotia abyssinica (L.f.) Cass.
© A Lee
Diascia rigescens E.Mey. ex Benth.
© Daniel Barthelemy
Lapageria rosea Ruiz & Pav.
© Campos Ignacio
Casuarina cunninghamiana Miq. LC
© stefano mazzotti
Cirsium rivulare (Jacq.) All.
© buqa Jarmil
Chaerophyllum aromaticum L.
© Walter Reider
Adenostyles leucophylla Rchb.
© furs
Petrosedum montanum (Songeon & E.P.Perrier) Grulich
© Rene Weck
Cirsium tuberosum (L.) All.
© Jcm Arthur
Chaerophyllum temulum L.
© pierre Lamy
Adenostyles alpina (L.) Bluff & Fingerh.
© Wolfi 41
Petrosedum rupestre (L.) P.V.Heath

© Dieter Wagner
© David Eickhoff − EOL
© Patrick Cartier
© Maximilien Perrin
Limitations of popular datasets:
Release a dataset with features similar to those of the Pl@ntNet dataset, to foster research in plant identification
\(\implies\) Pl@ntNet-300K (Garcin et al., 2021)
“The collective behavior induced by frictionless research exchange is the emergent superpower driving many events that are so striking today.” (Donoho, 2024)
Note: long tail preserved by genera subsampling
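One way to read this note: if whole genera are sampled rather than individual species, every species of a retained genus is kept, whether common or rare, so the class imbalance survives the subsampling. The sketch below illustrates that idea only; the function name, the `fraction` parameter and the toy taxonomy are assumptions, not the actual Pl@ntNet-300K construction script.

```python
import random

def subsample_by_genus(species_to_genus, fraction, seed=0):
    """Keep every species belonging to a random subset of genera.

    species_to_genus: dict mapping a species name to its genus.
    fraction: share of genera to retain (hypothetical parameter).
    """
    genera = sorted(set(species_to_genus.values()))
    rng = random.Random(seed)
    n_kept = max(1, round(fraction * len(genera)))
    kept_genera = set(rng.sample(genera, n_kept))
    # A retained genus brings all of its species, frequent or rare.
    return {s for s, g in species_to_genus.items() if g in kept_genera}

# Toy taxonomy built from species names shown on the slides.
taxonomy = {
    "Cirsium rivulare": "Cirsium",
    "Cirsium tuberosum": "Cirsium",
    "Chaerophyllum aromaticum": "Chaerophyllum",
    "Chaerophyllum temulum": "Chaerophyllum",
    "Adenostyles alpina": "Adenostyles",
    "Petrosedum rupestre": "Petrosedum",
}
kept_species = subsample_by_genus(taxonomy, fraction=0.5)
```

Because selection happens at the genus level, visually similar species of the same genus stay together in the subsample.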
Characteristics:
Zenodo, 1 click download
https://zenodo.org/record/5645731
Code to train models
https://github.com/plantnet/PlantNet-300K
Tiffany Ding
UC Berkeley
within

Jean-Baptiste Fermanian
Inria
“Conformal Prediction for Long-Tailed Classification”
T. Ding, J.-B. Fermanian and J. Salmon
ICLR 2026
Elements to help guide the users
For an input image \(X\), propose the most probable classes \(y\) with confidence level \(1-\alpha\) (with small \(\alpha\))
\[ \mathcal{C}_{\alpha}(X) = \big\{ y : s(X,y) \geq t_\alpha \big\} \]
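Conformal prediction: sets \(t_{\alpha}\) as the \(\alpha\)-quantile of the true-label scores \(s(X_i, Y_i)\) on a calibration set (a lower quantile, since the set keeps the classes whose score is at least \(t_\alpha\))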
Marginal coverage targets:
\[\mathbb{P}\big[ Y \in \mathcal{C}_{\alpha}(X) \big ] \geq 1 - \alpha.\]
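A minimal sketch of the calibration step, in plain Python. It assumes a score where higher means more plausible (for instance a softmax probability), so the threshold is a lower order statistic of the true-label calibration scores, taken at rank \(\lfloor \alpha (n+1) \rfloor\), the usual finite-sample adjustment in split conformal prediction. Function names and toy scores are illustrative and do not come from the Pl@ntNet code base.

```python
import math

def calibrate_threshold(true_scores, alpha):
    """Threshold t so that {y : s(x, y) >= t} targets marginal coverage 1 - alpha.

    true_scores: scores s(X_i, Y_i) of the true labels on the calibration set.
    """
    n = len(true_scores)
    k = math.floor(alpha * (n + 1))  # rank of the order statistic used as threshold
    if k < 1:
        return -math.inf  # too few calibration points: keep every class
    return sorted(true_scores)[k - 1]

def prediction_set(scores, t):
    """Classes whose score is at least t. scores: dict class -> s(x, class)."""
    return {y for y, s in scores.items() if s >= t}

# Toy usage with made-up scores (not Pl@ntNet outputs).
calibration_scores = [i / 100 for i in range(1, 100)]
t = calibrate_threshold(calibration_scores, alpha=0.25)
candidates = prediction_set(
    {"Cirsium rivulare": 0.6, "Cirsium tuberosum": 0.3, "Other": 0.1}, t
)
```

With exchangeable calibration and test points, a new true-label score falls below this threshold with probability at most \(\alpha\), which is what the marginal target asks for.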
Class conditional coverage targets:
\[\forall y,\quad \mathbb{P}\big[ Y \in \mathcal{C}_\alpha(X) | Y=y \big ] \geq 1 - \alpha.\]
The optimal set of minimum size and marginal coverage of at least \(1-\alpha\) is: \[ \mathcal{C}_{\alpha}(x) = \left\{ y : p(y|x) \geq t_\alpha \right\} \]
The optimal set of minimum size and conditional coverage of at least \(1-\alpha\) is: \[ \mathcal{C}_{\alpha}(x) = \left\{ y : p(y|x) \geq t_\alpha^{y} \right\} \]
Marginal:
calibrate \(t_{\alpha}\) on whole calibration set \((X_i, Y_i)_{i=1}^n\)
Conditional:
calibrate \(t_{\alpha}^y\) only on \((X_i, Y_i)\) such that \(Y_i = y\)
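The difference between the two calibrations can be sketched as follows, again assuming higher scores are more plausible and using the rank \(\lfloor \alpha (n_y+1) \rfloor\) within each class. The helper names are hypothetical. The toy data also hints at why class-wise calibration is delicate on long-tailed data: a class with very few calibration examples gets an uninformative threshold and ends up in every prediction set.

```python
import math
from collections import defaultdict

def lower_quantile(values, alpha):
    """Order statistic of rank floor(alpha * (n + 1)), or -inf if that rank is 0."""
    k = math.floor(alpha * (len(values) + 1))
    return sorted(values)[k - 1] if k >= 1 else -math.inf

def classwise_thresholds(true_scores, labels, alpha):
    """One threshold per class, calibrated only on examples with Y_i = y."""
    by_class = defaultdict(list)
    for s, y in zip(true_scores, labels):
        by_class[y].append(s)
    return {y: lower_quantile(v, alpha) for y, v in by_class.items()}

def classwise_set(scores, thresholds):
    """Classes unseen at calibration default to -inf, i.e. are always included."""
    return {y for y, s in scores.items() if s >= thresholds.get(y, -math.inf)}

# Toy calibration set: nine examples of a common class, one of a rare class.
true_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.4]
labels = ["common"] * 9 + ["rare"]
thresholds = classwise_thresholds(true_scores, labels, alpha=0.25)
pred = classwise_set({"common": 0.15, "rare": 0.01}, thresholds)
```

Here the rare class has a single calibration point, so its threshold is \(-\infty\) and it is included even with a score of 0.01.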
\[ \text{MacroCoverage} = \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \mathbb{P}\big( y \in \mathcal{C}(X) \, | \, Y = y \big) \]
The optimal set of minimum size and Macro-Coverage of at least \(1-\alpha\) is: \[ \mathcal{C}_{\alpha}(x) = \left\{ y : \frac{p(y|x)}{p(y)} \geq t_\alpha \right\} \]
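To make the definition concrete, here is a small sketch of an empirical macro-coverage and of the prior-adjusted rule \(p(y|x)/p(y) \geq t\). It assumes the class priors \(p(y)\) are given (in practice they would be estimated, for example from training frequencies), and it averages only over classes present in the evaluation labels. All names and numbers are illustrative.

```python
from collections import Counter

def macro_coverage(pred_sets, labels):
    """Average over observed classes of the per-class empirical coverage."""
    hits, counts = Counter(), Counter()
    for pred, y in zip(pred_sets, labels):
        counts[y] += 1
        hits[y] += int(y in pred)
    return sum(hits[y] / counts[y] for y in counts) / len(counts)

def prior_adjusted_set(probs, priors, t):
    """Keep classes whose posterior-to-prior ratio is at least t."""
    return {y for y, p in probs.items() if p / priors[y] >= t}

# Always predicting the frequent class "a" covers 3 of 4 points marginally,
# but only one class out of two.
labels = ["a", "a", "a", "b"]
pred_sets = [{"a"}, {"a"}, {"a"}, {"a"}]
macro = macro_coverage(pred_sets, labels)

adjusted = prior_adjusted_set({"a": 0.9, "b": 0.1}, {"a": 0.75, "b": 0.25}, t=0.3)
```

In this toy case the marginal coverage is 0.75 while the macro-coverage is 0.5. Dividing by the prior lets the rare class "b" enter the set at a threshold where its raw probability of 0.1 would not.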
Given user-chosen class weights \(\omega(y)\) for \(y \in \mathcal{Y}\) that sum to one, we can define the \(\omega\)-weighted macro-coverage as
\[ \begin{align} \mathrm{MacroCov}_{\omega}(\mathcal{C}) = \sum_{y \in \mathcal{Y}} \omega(y) \mathbb{P}(Y \in \mathcal{C}(X) \mid Y = y). \end{align} \]
The optimal set of minimum size and \(\omega\)-weighted Macro-Coverage of at least \(1-\alpha\) is: \[ \begin{align} \mathcal{C}^*(x) = \left\{ y \in \mathcal{Y} : \omega(y) \dfrac{p(y|x)}{p(y)} \geq t\right\}, \end{align} \]
\[ \omega(y) = \begin{cases} \frac{\gamma}{W} & \text{if } y \in \mathcal{Y}_{\text{at-risk}} \quad (\text{with } W = \gamma|\mathcal{Y}_{\text{at-risk}}| + |\mathcal{Y} \setminus \mathcal{Y}_{\text{at-risk}}|)\\ \frac{1}{W} & \text{otherwise}, \end{cases} \]
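The weights above can be computed and plugged into the weighted rule as in this sketch. It assumes the at-risk classes form a subset of the label set and that \(\gamma \geq 1\) is chosen by the user; function names and toy values are hypothetical.

```python
def at_risk_weights(classes, at_risk, gamma):
    """omega(y) = gamma / W for at-risk classes and 1 / W otherwise."""
    W = gamma * len(at_risk) + (len(classes) - len(at_risk))
    return {y: (gamma / W if y in at_risk else 1 / W) for y in classes}

def weighted_set(probs, priors, weights, t):
    """Keep classes with omega(y) * p(y|x) / p(y) >= t."""
    return {y for y in probs if weights[y] * probs[y] / priors[y] >= t}

# Three classes, "c" flagged as at-risk and given twice the weight of the others.
weights = at_risk_weights(["a", "b", "c"], at_risk={"c"}, gamma=2)

# Posterior equal to prior for every class: only the weights break the tie.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
priors = {"a": 0.5, "b": 0.25, "c": 0.25}
selected = weighted_set(probs, priors, weights, t=0.5)
```

By construction the weights sum to one, and with \(\gamma = 1\) they reduce to the uniform weights \(1/|\mathcal{Y}|\) of the unweighted macro-coverage.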

Tanguy Lefort
Now at Seenovate
within

Benjamin Charlier
INRAE
“Cooperative learning of Pl@ntNet’s Artificial Intelligence algorithm:
how does it work and how can we improve it?”
T. Lefort et al.
Methods in Ecology and Evolution, 2025

Images from users… so are the labels!
But users can be wrong or not experts
Several labels can be available per image!


Link: https://identify.plantnet.org/weurope/observations/1012500059
Weighting scheme: weight each user's vote by the number of species that user has identified
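A weighted vote of this kind can be sketched as below. This simplified version uses the raw count of species identified by each user as the weight; the platform may apply a transformation of that count or extra validity rules, which are not modelled here. User ids and counts are made up.

```python
from collections import defaultdict

def weighted_vote(votes, n_species_identified):
    """Aggregate several labels proposed for one observation.

    votes: list of (user, species) pairs.
    n_species_identified: dict user -> number of species that user has identified.
    Returns (winning species, its share of the total weight).
    """
    totals = defaultdict(float)
    for user, species in votes:
        totals[species] += n_species_identified.get(user, 0)
    total = sum(totals.values())
    if total == 0:
        return None, 0.0  # no informative vote
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total

votes = [
    ("u1", "Cirsium rivulare"),
    ("u2", "Cirsium tuberosum"),
    ("u3", "Cirsium tuberosum"),
]
expertise = {"u1": 30, "u2": 5, "u3": 5}
label, share = weighted_vote(votes, expertise)
```

In this toy example a single experienced user outweighs two less experienced ones, whereas a plain majority vote would have picked the other species.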
Dataset release:
Code release:
Future work