Gender classification and speaker identification using machine learning algorithms

Velásquez Martínez, Emmanuel de J.; Becerra Sánchez, Aldonso; De La Rosa Vargas, José I.; González Ramírez, Efrén; Zepeda Valles, Gustavo; Rodarte Rodríguez, Armando; Escalante García, Nivia I.; Olvera González, J. Ernesto

dc.contributor	31249	en_US
dc.contributor.other	https://orcid.org/0000-0002-7337-8974	en_US
dc.coverage.spatial	Global	en_US
dc.creator	Velásquez Martínez, Emmanuel de J.
dc.creator	Becerra Sánchez, Aldonso
dc.creator	De La Rosa Vargas, José I.
dc.creator	González Ramírez, Efrén
dc.creator	Zepeda Valles, Gustavo
dc.creator	Rodarte Rodríguez, Armando
dc.creator	Escalante García, Nivia I.
dc.creator	Olvera González, J. Ernesto
dc.date.accessioned	2023-10-30T18:58:06Z
dc.date.available	2023-10-30T18:58:06Z
dc.date.issued	2022-11-15
dc.identifier	info:eu-repo/semantics/acceptedVersion	en_US
dc.identifier.uri	http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/3430
dc.identifier.uri	http://dx.doi.org/10.48779/ricaxcan-261
dc.description.abstract	Speech is a biological feature unique to each person and is commonly used for speaker identification in applications such as home automation, transaction authentication, health, and access control. This work compares gender classification and speaker identification experiments to determine which machine learning algorithm performs best when Mel frequency cepstral coefficients (MFCC) are used as the descriptive speech features. The algorithms implemented were logistic regression, random forest, k-nearest neighbors, and a neural network, each evaluated with accuracy, specificity, sensitivity, and area under the curve (AUC). Random forest and k-nearest neighbors performed best, reaching an AUC of 1, which indicates that the models classify robustly both isolated samples and complete audio files. These results open the way to further experiments that apply the same MFCC features to audio containing environmental noise, in order to measure how these classification algorithms perform under such conditions. The experimental approach proposed here is intended for future use in other areas where MFCC describe the voice for other classification tasks.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE Xplore	en_US
dc.relation	https://ieeexplore.ieee.org/Xplore/home.jsp	en_US
dc.relation.isbasedon	UAZ-2022-38599 Design of robust schemes for speech recognition and End-to-End (E2E) systems: use of new cost functions and noise-removal algorithms	en_US
dc.relation.uri	generalPublic	en_US
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.source	Congreso Internacional de Mecatrónica Control e Inteligencia Artificial (CIMCIA), UNAM, FESC, Estado de México, 2022	en_US
dc.subject.classification	INGENIERIA Y TECNOLOGIA [7]	en_US
dc.subject.other	Gender classification	en_US
dc.subject.other	machine learning algorithms	en_US
dc.subject.other	MFCC	en_US
dc.subject.other	speaker identification	en_US
dc.title	Gender classification and speaker identification using machine learning algorithms	en_US
dc.type	info:eu-repo/semantics/conferenceProceedings	en_US
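
The abstract names four evaluation metrics: accuracy, specificity, sensitivity, and area under the curve (AUC). The following is a minimal sketch of how these could be computed for a binary task such as gender classification, assuming labels coded as 0/1 and a classifier that outputs one score per sample. The function names and the toy data are hypothetical and are not taken from the paper; the multi-class speaker identification case would need an extension such as one-vs-rest averaging, which is not shown.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True negative rate: tn / (tn + fp)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample outscores
    a random negative sample; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): true labels and classifier scores.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold assumed at 0.5

    print("accuracy   ", round(accuracy(y_true, y_pred), 3))
    print("sensitivity", round(sensitivity(y_true, y_pred), 3))
    print("specificity", round(specificity(y_true, y_pred), 3))
    print("AUC        ", round(roc_auc(y_true, scores), 3))
```

Under this pairwise definition, an AUC of 1, as reported for random forest and k-nearest neighbors, means every positive sample received a higher score than every negative sample in the evaluated data.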