Combining Deep Learning with Domain Adaptation and Filtering Techniques for Speech Recognition in Noisy Environments

Velásquez Martínez, Emmanuel de J.; Becerra Sánchez, Aldonso; de la Rosa Vargas, José I.; González Ramírez, Efrén; Rodarte Rodríguez, Armando; Zepeda Valles, Gustavo

dc.contributor	31249	en_US
dc.contributor.advisor	Escalante García Nivia I.	en_US
dc.contributor.advisor	Olvera González J. Ernesto	en_US
dc.contributor.other	https://orcid.org/0000-0002-7337-8974	en_US
dc.coverage.spatial	Global	en_US
dc.creator	Velásquez Martínez, Emmanuel de J.
dc.creator	Becerra Sánchez, Aldonso
dc.creator	de la Rosa Vargas, José I.
dc.creator	González Ramírez, Efrén
dc.creator	Rodarte Rodríguez, Armando
dc.creator	Zepeda Valles, Gustavo
dc.date.accessioned	2023-11-06T19:36:26Z
dc.date.available	2023-11-06T19:36:26Z
dc.date.issued	2023-10-22
dc.identifier	info:eu-repo/semantics/acceptedVersion	en_US
dc.identifier.isbn	979-8-3503-3688-7	en_US
dc.identifier.uri	http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/3435
dc.identifier.uri	http://dx.doi.org/10.48779/ricaxcan-266
dc.description.abstract	Speech recognition is a common task in many everyday user systems; however, its effectiveness is limited in noisy environments such as moving vehicles, homes with ambient noise, and mobile phones. This work proposes combining deep learning techniques with domain adaptation and Wavelet Transform (WT) based filtering to remove both stationary and non-stationary noise from speech signals in automatic speech recognition (ASR) and speaker identification tasks. It demonstrates how a deep neural network model with domain adaptation based on Optimal Transport can be trained to mitigate different types of noise. Evaluations were conducted using Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ). The WT was applied as a second processing stage to the speech signal enhanced by the deep neural network, yielding average improvements of 20% in STOI and 9% in PESQ compared to the noisy signal. The complete process was evaluated on a pre-trained ASR system, achieving an overall decrease in word error rate (WER) of 14.24% and an average speaker identification accuracy of 99%. The proposed approach therefore improves speech recognition performance on noisy speech.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.relation.isbasedon	UAZ-2022 38599	en_US
dc.relation.uri	generalPublic	en_US
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.source	IEEE International Autumn Meeting on Power, Electronics and Computing (Ixtapa, Méx.), México	en_US
dc.subject.classification	ENGINEERING AND TECHNOLOGY [7]	en_US
dc.subject.other	Deep Learning	en_US
dc.subject.other	Domain Adaptation	en_US
dc.subject.other	Filtering	en_US
dc.title	Combining Deep Learning with Domain Adaptation and Filtering Techniques for Speech Recognition in Noisy Environments	en_US
dc.type	info:eu-repo/semantics/conferenceProceedings	en_US
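The abstract describes a Wavelet Transform filter applied as a second stage to the network-enhanced speech, but it does not state the wavelet family, the decomposition depth, or the thresholding rule. The sketch below is therefore only an illustration of wavelet-domain denoising in general, not the authors' configuration: it assumes a Haar wavelet, a fixed number of decomposition levels, and the universal soft threshold of Donoho and Johnstone with the noise level estimated from the finest-scale detail coefficients. All function names (`haar_dwt`, `haar_idwt`, `soft_threshold`, `wavelet_denoise`) are hypothetical.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)


# Hypothetical helper: one level of the orthonormal Haar DWT (len(x) must be even).
def haar_dwt(x):
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


# Hypothetical helper: invert one level of the orthonormal Haar DWT.
def haar_idwt(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


# Soft thresholding: shrink a coefficient toward zero by lam, zeroing small ones.
def soft_threshold(c, lam):
    return math.copysign(max(abs(c) - lam, 0.0), c)


def wavelet_denoise(signal, levels=3):
    """Denoise a 1-D signal with a multi-level Haar DWT and universal soft threshold.

    Assumption (not from the paper): Haar wavelet, universal threshold,
    and len(signal) divisible by 2**levels.
    """
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")

    # Forward transform: details[0] is the finest scale, details[-1] the coarsest.
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)

    # Noise estimate: median absolute finest-scale detail divided by 0.6745.
    sigma = median(abs(c) for c in details[0]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(n))

    # Shrink only the detail coefficients; the coarse approximation is kept.
    details = [[soft_threshold(c, lam) for c in d] for d in details]

    # Inverse transform, from the coarsest level back to the finest.
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

Because the threshold scales with the estimated noise level, a signal whose finest-scale detail coefficients are all zero passes through unchanged. In practice a wavelet library would replace the hand-written Haar transform, and the wavelet and threshold would be tuned to speech rather than fixed as above.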