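Des données aux systèmes : étude des liens entre données d’apprentissage et biais de performance genrés dans les systèmes de reconnaissance automatique de la parole [From data to systems: a study of the links between training data and gendered performance biases in automatic speech recognition systems]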

Abstract : Machine learning systems contribute to the reproduction of social inequalities, both because of the data they use and for lack of critical approaches, thus feeding a discourse on the "biases of artificial intelligence". This thesis aims to contribute to collective thinking on the biases of automatic systems by investigating the existence of gender biases in automatic speech recognition (ASR) systems. Thinking critically about the impact of systems requires taking into account both the notion of bias (linked to the architecture, or to the system and its data) and that of discrimination, defined at the level of each country's legislation. A system is considered discriminatory when it treats people differently on the basis of criteria defined as breaking the social contract. In France, sex and gender identity are among the 23 criteria protected by law. Based on theoretical considerations on the notions of bias, and in particular on predictive (or performance) bias and selection bias, we propose a set of experiments to try to understand the links between selection bias in the training data and the predictive bias of the system. We base our work on the study of an HMM-DNN system trained on a French media corpus and an end-to-end system trained on English audiobooks. We observe that a significant gender selection bias in the training data contributes only partially to the predictive bias of the ASR system, but that the latter nevertheless emerges when the speech data contain different utterance situations and speaker roles. This work has also led us to question the representation of women in speech data and, more generally, to rethink the links between theoretical conceptions of gender and ASR systems.
Document type :
Theses

https://theses.hal.science/tel-03770207
Contributor : ABES STAR
Submitted on : Tuesday, September 6, 2022 - 11:27:24 AM
Last modification on : Thursday, September 8, 2022 - 3:06:13 AM

File

GARNERIN_2022_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03770207, version 1

Citation

Mahault Garnerin. Des données aux systèmes : étude des liens entre données d’apprentissage et biais de performance genrés dans les systèmes de reconnaissance automatique de la parole. Linguistics. Université Grenoble Alpes [2020-..], 2022. French. ⟨NNT : 2022GRALL006⟩. ⟨tel-03770207⟩
