Completed research project

Title / Titel Forensic Phonetic Speaker Identification Based on Temporal Evidence
Summary / Zusammenfassung Everyday experience tells us that it is typically possible to identify a speaker solely on the basis of his or her voice (e.g. when someone starts a phone call with a simple 'hi' or when people talk in a different room). Such observations reveal that speakers carry individual features in their voices by which they can be identified to a considerable degree. The present project aims to study the role of temporal characteristics of the speech signal in speaker identification. The study will pay particular attention to possible applications of the results in forensic phonetics, the field in which phonetic knowledge is applied to legal cases where the identity of the speaker in a recording is disputed.

We start from the observation that the acoustic speech signal is made up of dynamic processes resulting from the movements of the articulators. It has been demonstrated in other scientific domains that humans can be identified on the basis of their movements alone, e.g. by the way they walk. Our working hypothesis is that the movements of the organs of speech (e.g. jaw, lips or tongue) can be as idiosyncratic as human gait, and that idiosyncratic ways of moving the organs of speech leave individual temporal characteristics in the acoustic speech signal. We will therefore study numerous durational parameters in speech, ranging from segment durations (e.g. the durations of consonants and vowels) through syllable and word durations to prosodic durations (e.g. durational characteristics of intonation). In the first year of the project we aim to identify the temporal measures of speech that are most speaker-idiosyncratic. In the second year we will test these measures against within-speaker variability (e.g. different types of voice disguise). In year three we will use behavioral experimental methods to test whether the measures we have identified as most speaker-idiosyncratic are perceptually salient (i.e. whether listeners can identify a speaker solely on the basis of certain temporal voice characteristics).
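To make the notion of a durational parameter concrete, the following minimal sketch computes three measures from a sequence of labelled interval durations. The summary above does not name specific measures; the proportion of vocalic time, a rate-normalized standard deviation and a normalized pairwise variability index are used here purely as assumed, illustrative examples of duration-based metrics. The function names and the interval data are hypothetical.

```python
from statistics import mean, pstdev


def percent_v(intervals):
    """Share of the total duration that is vocalic, in percent."""
    total = sum(d for _, d in intervals)
    vocalic = sum(d for label, d in intervals if label == "V")
    return 100.0 * vocalic / total


def varco(durations):
    """Rate-normalized variability: 100 * standard deviation / mean."""
    return 100.0 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalized pairwise variability index over successive intervals."""
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100.0 * mean(terms)


# Hypothetical utterance: (interval type, duration in milliseconds).
# "C" marks a consonantal interval, "V" a vocalic interval.
intervals = [("C", 100), ("V", 100), ("C", 300), ("V", 100)]

consonantal = [d for label, d in intervals if label == "C"]
vocalic = [d for label, d in intervals if label == "V"]

print("percent V:", percent_v(intervals))
print("Varco (consonantal):", varco(consonantal))
print("nPVI (consonantal):", npvi(consonantal))
```

A comparison between speakers would compute such values per speaker over many utterances and ask whether the differences between speakers exceed the variation within each speaker.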

It is quite possible that some speaker-idiosyncratic temporal features will turn out to be perceptually salient and others not. We argue that the salient temporal features will help us explain how human listeners identify speakers on the basis of their voice. Non-salient features, however, may be less prone to within-speaker variability such as voice disguise, as they should be difficult for speakers to control. Such features may thus be of high value for the acoustic voice identification of non-cooperative speakers (i.e. speakers not wishing to be identified), who are typically encountered in forensic circumstances.
Publications / Publikationen Dellwo, V., Fourcin, A., Abberton, E. (2007). Rhythmical classification based on voice parameters. International Congress of Phonetic Sciences (ICPhS) [Online]

Dellwo, V., Huckvale, M., Ashby, M. (2007). How is individuality expressed in voice? An introduction to speech production & description for speaker classification. In: Mueller, C.

Dellwo, V. and Koreman, J. (2008) How speaker idiosyncratic is acoustically measurable speech rhythm? Abstract presented at the annual IAFPA meeting 2008, Lausanne/Switzerland.

Dellwo, V., Ramyead, S., and Dankovicova, J. (2009) The influence of voice disguise on temporal characteristics of speech. Abstract presented at the annual IAFPA meeting 2009, Cambridge/UK.

Fourcin, A. and Dellwo, V. (2009) Rhythmic classification of languages based on voice timing. UCL Eprints, London, UK.

Dellwo, Volker; Leemann, Adrian; Kolly, Marie-José (2015). The recognition of read and spontaneous speech in local vernacular: The case of Zurich German. Journal of Phonetics, 48:13-28.

Kolly, Marie-José; Leemann, Adrian; Dellwo, Volker (2014). Foreign accent recognition based on temporal information contained in lowpass-filtered speech. In: Interspeech 2014, Singapore, 14 September 2014 - 18 September 2014.

Mixdorff, Hansjörg; Leemann, Adrian; Dellwo, Volker (2014). The influence of speech rate on Fujisaki model parameters. EURASIP Journal on Audio, Speech, and Music Processing, 2014(33):online.

Dellwo, Volker; Hove, Ingrid; Leemann, Adrian; Kolly, Marie-José (2014). Verbrecherjagd mit gesprochener Sprache : Möglichkeiten und Grenzen der forensischen Phonetik. Kriminalistik, (2):90-97.

Kolly, Marie-José; Dellwo, Volker (2014). Cues to linguistic origin: The contribution of speech temporal information to foreign accent recognition. Journal of Phonetics, 42:12-23.

Leemann, Adrian; Kolly, Marie-José; Dellwo, Volker (2014). Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison. Forensic Science International, 238:59-67.

Leemann, Adrian; Mixdorff, Hansjörg; O'Reilly, Maria; Kolly, Marie-José; Dellwo, Volker (2014). Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison. International Journal of Speech, Language and the Law, 21(2):343-370.

Dellwo, Volker; Fourcin, Adrian (2013). Rhythmic characteristics of voice between and within languages. TRANEL - Travaux neuchâtelois de linguistique, 59:87-107.

Friedrichs, Daniel; Dellwo, Volker (2013). Rhythmische Variabilität bei synchronem Sprechen und ihre Bedeutung für die forensische Sprecheridentifizierung. TRANEL - Travaux neuchâtelois de linguistique, 59:149-166.

Schmid, Stephan; Dellwo, Volker (2013). Sprachrhythmus bei bilingualen Sprechern. TRANEL - Travaux neuchâtelois de linguistique, 59:109-126.

Dellwo, Volker; Leemann, Adrian; Kolly, Marie-José (2012). Speaker idiosyncratic rhythmic features in the speech signal. In: Interspeech 2012, Portland (OR), USA, 9 September 2012 - 13 September 2012, 1-4.

Dellwo, Volker (2009). Choosing the right rate normalization method for measurements of speech rhythm. In: Schmid, Stephan; Schwarzenbach, Michael; Studer-Joho, Dieter. La dimensione temporale del parlato. Torriana: EDK, 13-32.

Keywords / Suchbegriffe forensic phonetics, time-domain, temporal parameters, speech prosody, speech rhythm, speech timing
Project leadership and contacts / Projektleitung und Kontakte
Prof. Dr. Volker Dellwo (Project Leader)
Prof. Dr. Stephan Schmid  
Dr. Adrian Leemann  
Marie-José Kolly  
Funding source(s) / Unterstützt durch Universität Zürich (position pursuing an academic career), SNF (Personen- und Projektförderung)
In collaboration with / In Zusammenarbeit mit
Dr. Ingrid Hove, Phonetisches Labor, Universität Zürich
Dr. Uwe Reichel, Institut für Phonetik, Universität München
Prof. Dr.-Ing. habil. Hansjörg Mixdorff, Beuth-Hochschule für Technik Berlin (University of Applied Sciences), FB Informatik und Medien
Duration of Project / Projektdauer Aug 2010 to May 2015