Easy-to-use combination of POS and BERT model for domain-specific and misspelled terms - Information, Langue Ecrite et Signée
Conference paper, Year: 2021

Easy-to-use combination of POS and BERT model for domain-specific and misspelled terms

Abstract

In this paper, we present BERT-POS, a simple method for encoding syntax into BERT embeddings, based on Part-Of-Speech (POS) tags and requiring neither retraining nor fine-tuning data. Although fine-tuning is the most popular way to apply BERT models to domain datasets, it remains expensive in terms of training time, computing resources, training-data selection, and retraining frequency. Our alternative works at the preprocessing level and relies on POS tagging the input sentences. It gives interesting results for word similarity on out-of-vocabulary items, both domain-specific words and misspellings. The experiments were conducted on French, but we believe the results would be similar for other languages.
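To make the preprocessing idea concrete, here is a minimal sketch of POS-augmented sentence encoding. It assumes spaCy's French model for POS tagging, CamemBERT as the French BERT, and a simple token/POS interleaving as the combination scheme; these choices, the `pos_augment` helper, and the example sentence are illustrative assumptions, not the paper's exact method.

```python
# Sketch: POS-augmented preprocessing before BERT encoding (assumed scheme).
import spacy
import torch
from transformers import AutoTokenizer, AutoModel

nlp = spacy.load("fr_core_news_sm")            # French POS tagger (assumed choice)
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModel.from_pretrained("camembert-base")

def pos_augment(sentence: str) -> str:
    """Rewrite a sentence so each token is followed by its POS tag (hypothetical scheme)."""
    doc = nlp(sentence)
    return " ".join(f"{tok.text} {tok.pos_}" for tok in doc)

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a (possibly POS-augmented) sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# A misspelled word ("déclanché") still carries its syntactic role via the POS tag.
raw = "Le disjoncteur différentiel a déclanché hier soir."
augmented = pos_augment(raw)
print(augmented)
print(torch.cosine_similarity(embed(raw), embed(augmented), dim=0))
```

Under this assumed scheme, out-of-vocabulary or misspelled tokens keep an explicit syntactic signal next to them, which is the kind of preprocessing-level information the paper reports as helpful for word-similarity comparisons.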
Main file
paper132.pdf (1.45 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03474696, version 1 (10-12-2021)

Identifiers

  • HAL Id: hal-03474696, version 1

Cite

Alexandra Benamar, Meryl Bothua, Cyril Grouin, Anne Vilnat. Easy-to-use combination of POS and BERT model for domain-specific and misspelled terms. NL4IA Workshop Proceedings, Nov 2021, Milan, Italy. ⟨hal-03474696⟩
298 views
1160 downloads
