DSpace Repository

Combining Deep Learning with Domain Adaptation and Filtering Techniques for Speech Recognition in Noisy Environments

dc.contributor 31249 en_US
dc.contributor.advisor Escalante García, Nivia I. en_US
dc.contributor.advisor Olvera González, J. Ernesto en_US
dc.contributor.other https://orcid.org/0000-0002-7337-8974 en_US
dc.coverage.spatial Global en_US
dc.creator Velásquez Martínez, Emmanuel de J.
dc.creator Becerra Sánchez, Aldonso
dc.creator de la Rosa Vargas, José I.
dc.creator González Ramírez, Efrén
dc.creator Rodarte Rodríguez, Armando
dc.creator Zepeda Valles, Gustavo
dc.date.accessioned 2023-11-06T19:36:26Z
dc.date.available 2023-11-06T19:36:26Z
dc.date.issued 2023-10-22
dc.identifier info:eu-repo/semantics/acceptedVersion en_US
dc.identifier.isbn 979-8-3503-3688-7 en_US
dc.identifier.uri http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/3435
dc.identifier.uri http://dx.doi.org/10.48779/ricaxcan-266
dc.description.abstract Speech recognition is a common task in many everyday user systems; however, its effectiveness is limited in noisy environments such as moving vehicles, homes with ambient noise, and mobile phones, among others. This work proposes combining deep learning techniques with domain adaptation and filtering based on the Wavelet Transform to eliminate both stationary and non-stationary noise from speech signals in automatic speech recognition (ASR) and speaker identification tasks. It shows how a deep neural network model with domain adaptation based on Optimal Transport can be trained to mitigate different types of noise. Evaluations were conducted using Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ). The Wavelet Transform (WT) was applied as a second filtering stage to the speech signal already enhanced by the deep neural network, yielding average improvements of 20% in STOI and 9% in PESQ relative to the noisy signal. When evaluated on a pre-trained ASR system, the process achieved an overall decrease in word error rate (WER) of 14.24%, while speaker identification reached an average accuracy of 99%. Thus, the proposed approach significantly improves speech recognition performance by addressing the problem of noisy speech. en_US
dc.language.iso eng en_US
dc.publisher IEEE en_US
dc.relation.isbasedon UAZ-2022 38599 en_US
dc.relation.uri generalPublic en_US
dc.rights Attribution 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/us/ *
dc.source IEEE International Autumn Meeting on Power, Electronics and Computing (Ixtapa, Méx.), México en_US
dc.subject.classification INGENIERIA Y TECNOLOGIA [7] en_US
dc.subject.other Deep Learning en_US
dc.subject.other Domain Adaptation en_US
dc.subject.other Filtering en_US
dc.title Combining Deep Learning with Domain Adaptation and Filtering Techniques for Speech Recognition in Noisy Environments en_US
dc.type info:eu-repo/semantics/conferenceProceedings en_US
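The abstract describes a second processing stage in which the Wavelet Transform (WT) filters the speech signal already enhanced by the deep neural network. The record does not say which wavelet, decomposition depth, or thresholding rule the authors used, so the following is only a minimal sketch under assumptions of our own: an orthonormal Haar wavelet, four decomposition levels, and soft thresholding of the detail coefficients with the Donoho and Johnstone universal threshold. All function names (`haar_step`, `haar_inverse_step`, `soft_threshold`, `wavelet_denoise`, `mse`) are hypothetical, and the demo uses a synthetic sine wave plus Gaussian noise rather than speech.

```python
import math
import random
import statistics
from typing import List, Optional, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) halves."""
    approx = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_step by rebuilding and interleaving each sample pair."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) * INV_SQRT2)
        out.append((a - d) * INV_SQRT2)
    return out


def soft_threshold(c: float, t: float) -> float:
    """Shrink c toward zero by t; coefficients with |c| <= t become 0."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(signal: List[float], levels: int = 4,
                    threshold: Optional[float] = None) -> List[float]:
    """Hypothetical WT filtering stage (assumed, not the authors' method):
    Haar DWT, soft-threshold the detail bands, inverse DWT.
    The coarsest approximation band is left untouched."""
    n = len(signal)
    if levels < 1 or n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details: List[List[float]] = []  # details[0] is the finest scale
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    if threshold is None:
        # Universal threshold: noise sigma estimated as
        # median(|finest detail coefficients|) / 0.6745, times sqrt(2 ln n).
        sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
        threshold = sigma * math.sqrt(2.0 * math.log(n))
    shrunk = [[soft_threshold(c, threshold) for c in d] for d in details]
    for d in reversed(shrunk):  # rebuild from the coarsest scale to the finest
        approx = haar_inverse_step(approx, d)
    return approx


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1024
    clean = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    noisy = [s + rng.gauss(0.0, 0.3) for s in clean]
    denoised = wavelet_denoise(noisy)
    print(f"MSE noisy:    {mse(noisy, clean):.4f}")
    print(f"MSE denoised: {mse(denoised, clean):.4f}")
```

Soft thresholding is used here because its shrinkage function is continuous, which tends to leave fewer abrupt artifacts in the reconstructed waveform than hard thresholding. For real audio, a library such as PyWavelets with a smoother wavelet family would be the more usual choice than this pure-Python Haar transform.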


Except where otherwise noted, this item's license is described as Attribution 3.0 United States
