Investigating techniques for a deeper understanding of Neural Machine Translation (NMT) systems through data filtering and fine-tuning strategies
Conference paper. Year: 2023

Abstract

For this biomedical shared task, we implemented data filters to improve the selection of relevant fine-tuning data from the available training sources. Specifically, we used textometric analysis to detect repeated segments in the test set and then used those segments to refine the training data for fine-tuning the mBART-50 baseline model. This approach pursues three objectives: developing a practical fine-tuning strategy for training in-domain biomedical fr<>en models, defining criteria for filtering in-domain training data, and comparing model predictions and fine-tuning data against the test set to gain deeper insight into how Neural Machine Translation (NMT) systems work.
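
The filtering idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the textometric tool, the segment length, or the frequency threshold, so everything here is an assumption made for illustration: whitespace tokenization, trigrams (`n=3`), a minimum of two occurrences (`min_count=2`), and the hypothetical helper names `repeated_segments` and `filter_training_pairs`. It is not the authors' pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token window as a tuple."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def repeated_segments(test_sentences, n=3, min_count=2):
    """Collect n-grams occurring at least min_count times in the test set.

    Assumption: lowercased whitespace tokens stand in for a real
    textometric analysis of repeated segments.
    """
    counts = Counter()
    for sentence in test_sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return {gram for gram, count in counts.items() if count >= min_count}


def filter_training_pairs(pairs, segments, n=3):
    """Keep (source, target) pairs whose source shares an n-gram with segments."""
    kept = []
    for source, target in pairs:
        if any(g in segments for g in ngrams(source.lower().split(), n)):
            kept.append((source, target))
    return kept


# Toy data, invented for this example only.
test_set = [
    "the patient was treated with insulin",
    "each patient was treated with antibiotics",
    "no adverse events were reported",
]
train_pairs = [
    ("the child was treated with aspirin", "l'enfant a été traité avec de l'aspirine"),
    ("the weather is nice today", "il fait beau aujourd'hui"),
]

segments = repeated_segments(test_set)
selected = filter_training_pairs(train_pairs, segments)
```

With this toy data, only the first training pair is retained, because its source side contains the trigram "was treated with", which recurs in the test set. The retained pairs would then serve as candidate fine-tuning data; the thresholds would need tuning on real corpora.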
Main file
WMT_biomedical_2023_CLILLF-3.pdf (852.23 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04306041, version 1 (27-11-2023)

Identifiers

  • HAL Id: hal-04306041, version 1

Cite

Lichao Zhu, Maria Zimina-Poirot, Maud Bénard, Behnoosh Namdarzadeh, Nicolas Ballier, et al. Investigating techniques for a deeper understanding of Neural Machine Translation (NMT) systems through data filtering and fine-tuning strategies. 8th World Machine Translation, Association for Computational Linguistics, Dec 2023, Singapore. pp. 275-281. ⟨hal-04306041⟩