Context
Sustainable food production is one of the main challenges facing our world today. At INRAE (Institut national de recherche pour l'agriculture, l'alimentation et l'environnement), experts from different fields come together to study production methods. They include biologists and agronomists (domain experts) as well as statisticians and mathematicians (modelers), who produce models that simulate complex biological and agricultural processes.
The internship work is part of a research project [1] which aims to develop, in collaboration with stakeholders in the cereal sector, a tool that predicts the quality of wheat from data collected each year in order to better anticipate the difficulties caused by climate events. A knowledge base is built using an existing ontology, from which Bayesian networks are generated (Figure 1) and then refined with expert knowledge.
Figure 1: An example of a learned graphical model (a Bayesian network).
Currently, the learning models are developed offline and then presented to domain experts in a scheduled meeting. Through discussion, experts give feedback on which variables to consider, which dependencies are correct, and which need to be modified. Although experts find it very useful to be involved in the modeling process, a number of issues currently hamper the collaboration between modelers and domain experts: (a) giving feedback about the correctness of the model is manual and tedious (the modeler takes notes and then goes back to the code to implement the changes), and existing model quality indicators can be hard for the experts to interpret; and (b) the process is time-consuming and error-prone.
Objectives of the internship
The aim of this internship is to facilitate the dialogue between domain experts in agronomy and food technology and the model builders through an online visualization system. The selected student will perform the following tasks:
- conduct a literature review on visualization / explanation methods for Bayesian networks, and on expert knowledge elicitation interfaces;
- follow a user-centered design methodology to design and implement an online visualization system that provides:
  - a graphical representation of the learned model (e.g., a node-link diagram or an adjacency matrix);
  - a dashboard containing a set of indicators, such as entropy, confidence, precision, and robustness, that give domain experts interpretable information about the quality of the model;
  - user interactions that allow experts to edit the graphical model based on their own expertise (e.g., to order or group variables, add constraints, etc.); and
  - a history of the different iterations of the model, allowing for easy comparison between versions;
- evaluate the developed prototype with domain experts and modelers.
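As a rough illustration of the data such a system might manipulate, the sketch below builds an adjacency matrix for a toy network, computes Shannon entropy as one possible dashboard indicator, and lists the edges that differ between two model iterations. It is only a sketch under stated assumptions: the variable names, the edges, and the helper functions (`adjacency_matrix`, `entropy`, `diff_versions`) are hypothetical and are not taken from the project's existing implementation.

```python
import math

# Hypothetical variables and dependencies, invented for illustration only.
nodes = ["rainfall", "protein_content", "dough_strength", "bread_volume"]
edges_v1 = {
    ("rainfall", "protein_content"),
    ("protein_content", "dough_strength"),
    ("dough_strength", "bread_volume"),
}


def adjacency_matrix(nodes, edges):
    """Return a 0/1 matrix where entry [i][j] is 1 if nodes[i] -> nodes[j]."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for parent, child in edges:
        matrix[index[parent]][index[child]] = 1
    return matrix


def entropy(distribution):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


def diff_versions(old_edges, new_edges):
    """Edges added and removed between two iterations of the model."""
    return {
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
    }


# A hypothetical expert edit: drop one learned edge, add a direct dependency.
edges_v2 = (edges_v1 - {("rainfall", "protein_content")}) | {
    ("protein_content", "bread_volume")
}

matrix_v1 = adjacency_matrix(nodes, edges_v1)
changes = diff_versions(edges_v1, edges_v2)
# Entropy of a uniform distribution over four states (maximal uncertainty).
uniform_entropy = entropy([0.25, 0.25, 0.25, 0.25])
```

Storing each iteration as a set of directed edges keeps the version comparison to two set differences; a real prototype would likely also need to track parameters and expert-imposed constraints.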
The selected student will build on existing work from the ANR EVAGRAIN project. The following resources are available for an immediate start of this internship: data about the different wheat and flour quality tests; a learning model implementation in Python; two preliminary interactive visualization prototypes [4,5]; video recordings of online exchanges and discussions between modelers and domain experts when confronted with a learning model; and access to domain experts to gather user requirements and feedback.
Required and desirable skills
Required skills include: web development; programming (JavaScript, Python, or other); user-centered design; information visualization. An interest in working with real-world data and domain experts is also expected. Knowledge of machine learning is not required, but experience is a plus.
Work environment
- Supervisors: send a CV and motivation letter to Nadia Boukhelifa nadia.boukhelifa@inrae.fr (INRAE, Univ. Paris-Saclay) and Anastasia Bezerianos anastasia.bezerianos@universite-paris-saclay.fr (Polytech, Univ. Paris-Saclay)
- Internship duration: 6 months, starting in February or March 2024
- Location: AgroParisTech Saclay Campus, 22 place de l'Agronomie, 91120 Palaiseau
- Allowance: around 600 euros per month
