Authors
Gurpreet Kaur, Dr. Ranjeet Kumar Singh
Abstract
Spoken language is the natural medium of human communication. It carries linguistic information (accent, etc.), information about the speaker (emotions, etc.), and information about the environment (background noise, etc.). The human ability to extract and decode this information automatically has inspired researchers to study distinct aspects of spoken language, including the recognition of accent or changed accent and the recognition of emotion or gender. A “voiceprint” is a representation of the acoustic frequency spectrum of speech that contains the significant features used to recognize a speaker. The voiceprint of an individual has the distinct qualities of uniqueness, durability, and strength. Beyond physiological differences, every speaker also has characteristic speaking features, such as a specific accent or intonation style. Apprehensive speech is disguised speech recorded while the speaker is under the influence of a threat, nervousness, or a similar state; it occurs in criminal contexts such as fraud or spam calls. This paper focuses on extracting information about an individual that is observable in the speech signal, such as emotional state, intentional accent change, and belligerence, which can give the investigator better clues for differentiating voices. Some external factors (environmental noise, emotions, etc.) reduce the effectiveness of speaker identification. However, basic components of the original voice, such as the formant frequencies in the voiceprint, remain unchanged and support recognition even when an apprehensive voice is used. The intonation pattern of the formants in a speaker’s original voice is almost the same as the intonation pattern of the formants in that speaker’s deliberately apprehensive voice.
Keywords: Apprehensive Voice, Disguised Voice, Speaker Recognition, Voiceprint, Intonation Pattern.
Introduction
The human ear can receive and decipher spoken language. Among its other functions, it allows people to be identified by their voices (Sharma and Bansal, 2013). Forensic auditory analysis has long been a subject of methodological and scientific discussion. Criminals worldwide are increasingly willing to disguise their voices to hide their identity, particularly in cases of extortion, threatening calls, and emergency calls to the police (Zhang and Tan, 2007). Every individual has a unique voice: no two voices are the same, owing to physiological differences, so the individuality of a voice can be used to verify a person’s identity. Beyond these physiological differences, every speaker has characteristic speaking features, such as a specific accent, intonation style, or rhythm, or a disorder that affects speech and causes tremors (Zheng and Li, 2017). A speaker has many ways to manipulate his or her voice to deceive an automatic recognition system or even the human ear (Perrot, Aversano, and Chollet, 2007).
Biometric access control systems, automatic speaker recognition systems, and forensic auditory analysis are examples of applications of speaker and voice recognition (Lal and Nath N. J., 2015). Biometric security systems are built on unique human features such as voice and fingerprints. By detecting a user’s particular behavioral or physiological features, they add a barrier against unauthorized access to data. Such systems are more reliable than standard traditional methods.
Demand is growing for speaker identification that models speakers’ vocal tract features, including the effects of conditions such as illness, to provide more secure access to financial or other sensitive information. Speaker verification adds a further barrier against unauthorized access to data and strengthens the protection provided by personal identification (Li, Yang, and Dai, 2014).
An apprehensive voice is one that reflects fear, anger, anxiety, or nervousness, or that trembles because of illness, and that as a result disguises the speaker’s voice, intentionally or unintentionally. Apprehensive speech is disguised speech recorded while the speaker is under the influence of threat, fear, anger, nervousness, etc., and it therefore falls under the category of disguised voice. Such voices occur in criminal and illicit activity such as fraud or spam calls and threatening calls, and also during the collection of voice samples from suspects. In forensic science, a method for differentiating apprehensive voices from normal voices would give the investigator better indications during an investigation. Voice disguise is an intentional act by a speaker to alter, distort, deviate, or manipulate the normal voice in order to hide or falsify identity (Klevans & Rodman, 1998). In acoustic analysis as well as in forensic science, speaker recognition techniques are indispensable for speaker identification. Speaker recognition based on voiceprints is the recognition of a speaker’s identity from his or her voiceprint. Various studies have proposed that the voiceprint of an individual has the distinct qualities of uniqueness, durability, and strength, and that it remains stable throughout adulthood except in the case of certain disorders. It has also been suggested that no two individuals have the same voice, owing to physiological differences, and that every speaker has characteristic speaking features beyond those differences, such as a specific accent, intonation style, or rhythm, or a disorder that affects speech and causes tremors. Intonation patterns are the patterns of variation produced by the rise and fall of the pitch of the voice. 
These intonation patterns also aid speaker recognition when the voice is disguised or apprehensive.
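The rise and fall of pitch described above can be illustrated with a short sketch. It is not the authors’ method: it assumes Python, a synthetic two-step tone in place of recorded speech, and a basic autocorrelation pitch estimator. The function names, the 8000 Hz sampling rate, the 400-sample frame, and the 75–400 Hz search range are all illustrative assumptions. The sketch estimates a pitch contour for a "normal" tone and for a pitch-raised version, then measures how similar the two contour shapes are.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Return the F0 (Hz) of one frame via autocorrelation, or 0.0 if none is found."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove any DC offset
    lag_min = int(sample_rate / f_max)         # shortest period searched
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def pitch_contour(signal, sample_rate, frame_len=400):
    """Split the signal into non-overlapping frames and estimate F0 for each."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def contour_correlation(a, b):
    """Pearson correlation between two equal-length pitch contours."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def synth_tone(freqs_hz, sample_rate=8000, samples_per_step=2000):
    """Hypothetical test signal: one sine segment per frequency, back to back."""
    signal = []
    for f in freqs_hz:
        signal.extend(
            math.sin(2 * math.pi * f * i / sample_rate)
            for i in range(samples_per_step)
        )
    return signal


SAMPLE_RATE = 8000  # assumed sampling rate in Hz

# "Normal" tone rises from 160 Hz to 200 Hz; the "raised" tone follows the
# same rising shape at a higher pitch (200 Hz to 250 Hz).
normal_contour = pitch_contour(synth_tone([160, 200]), SAMPLE_RATE)
raised_contour = pitch_contour(synth_tone([200, 250]), SAMPLE_RATE)
similarity = contour_correlation(normal_contour, raised_contour)

print(normal_contour)
print(raised_contour)
print(round(similarity, 3))
```

In this toy case the two contours differ in absolute pitch but share the same rising shape, so their correlation is high. With real recordings, the frames would come from the sampled speech signal, and unvoiced or silent frames would need to be excluded before contours are compared.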
References
Zhang, C., and T. Tan. “Voice Disguise and Automatic Speaker Recognition.” Forensic Science International, vol. 175, 2007, pp. 118–122.
Didla, G. S., and H. Hollien. “Voice Disguise and Speaker Identification.” Proceedings of Meetings on Acoustics, vol. 25, Acoustical Society of America, 2–6 November 2015.
Gangamohan, P., et al. “Analysis of Emotional Speech—A Review.” Toward Robotic Socially Believable Behaving Systems - Volume I Intelligent Systems Reference Library, 2016, pp. 205–238.
George, A. M., et al. “Detection of Voice Disguise by Various Disguising Factors.” International Journal of Innovative Research in Computer and Communication Engineering, vol. 3, no. 8, August 2015.
Künzel, H. J. “Effects of Voice Disguise on Speaking Fundamental Frequency.” International Journal of Speech, Language and the Law, vol. 7, no. 2, 2000, pp. 150–179.
Künzel, Hermann J., et al. “Effect of Voice Disguise on the Performance of a Forensic Automatic Speaker Recognition System.” ODYSSEY04: The Speaker and Language Recognition Workshop, Toledo, Spain, 2004.
Kurian, S., et al. “Recognition of Electronic Disguised Voices by the Means of MFCC.” International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 5, no. 6, June 2016.
Li, Dongdong, et al. “Cost-Sensitive Learning for Emotion Robust Speaker Recognition.” The Scientific World Journal, vol. 2014, 2014, pp. 1–9.
Lal, Lini T., et al. “Identification of Disguised Voices Using Feature Extraction and Classification.” International Journal of Engineering Research and General Science, vol. 3, no. 2, part 2, March–April 2015. ISSN 2091-2730.
Perrot, Chollet, et al. “Detection and Recognition of Voice Disguise.” IAFPA: International Association for Forensic Phonetics and Acoustics Conference, 2007.
Perrot, Patrick, et al. “Voice Disguise and Automatic Detection: Review and Perspectives.” Lecture Notes in Computer Science Progress in Nonlinear Speech Processing, 2007, pp. 101–117.
Pohjalainen, Jouni, and Paavo Alku. “Automatic Detection of Anger in Telephone Speech with Robust Autoregressive Modulation Filtering.” 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
Sharma, V., and P. K. Bansal. “A Review on Speaker Recognition Approaches and Challenges.” International Journal of Engineering Research & Technology (IJERT), vol. 2, no. 5, May 2013.
Tan, Tiejun. “The Effect of Voice Disguise on Automatic Speaker Recognition.” 3rd International Congress on Image and Signal Processing (CISP2010), vol. 8, 16–18 October 2010.
Polzehl, Tim, et al. “Anger Recognition in Speech Using Acoustic and Linguistic Cues.” Speech Communication, vol. 53, nos. 9–10, November–December 2011, pp. 1198–1209.
How to cite this article?
APA Style: Kaur, G., & Singh, R. K. (2020). Speaker Recognition: On the Basis of their Habitual and Apprehensive Voice. Academic Journal of Forensic Sciences, 03(01), 12–19.