Speaking loud, speaking high: non-linearities in voice strength and vocal register variations
Christophe d'Alessandro
LIMSI-CNRS, Orsay, France
20/06/13 NOLISP 2013

Content of the talk
• Introduction: voice quality
• 1. Voice quality dimensions
• 2. Source/filter model in time and frequency
• 3. Non-linearities: voice quality dimensions vs. voice acoustic parameters, using synthesis
• Application: performative synthesis

Voice, Speech, Singing, Meaning and Expression
Functions of voice in communication:
1. Linguistic and pragmatic functions: to convey linguistic meaning (ideas, concepts, facts …), to perform speech acts (command, promise …). Mainly associated with phonemes and words (double articulation). Noted using writing.
2. Expressive function: to make audible attitudes, feelings, emotions, personality, mood. Speech beyond (or below) linguistic meaning. Mainly associated with prosody and voice quality. Difficult to note using writing. "The music of speech".
3. Musical function: singing, a non-linguistic but highly structured form of communication.

Voice Quality: a prosodic feature?
• Prosodic parameters are usually restricted to pitch, duration, pauses and some sort of intensity parameter.
• But intonation and voice quality are linked (e.g. voice registers).
• In some languages, voice quality has a phonological status (e.g. strangled tones of Vietnamese).
• In all languages, voice quality has a pragmatic function.
• Synthesis of expressive speech has demonstrated that convincing, natural-sounding results are impossible to obtain without dealing with voice quality parameters.

Voice Quality: expression of emotions?
• Vocal expression of emotions and attitudes is one of the main domains of application for voice quality studies.
• Although it has been studied for a long time in psychology, it can be considered an emerging research domain in many areas of speech communication: speech recognition and synthesis, but also speech coding.
• Voice quality is crucial for singing, theatre and other aesthetic vocalizations.

Questions related to Voice Quality
• Voice quality is still a rather fuzzy concept: what is the timbre of a voice?
• What are the domains of variation of everyday speech?
• How to measure and quantify voice quality dimensions like vocal effort, vocal tension or noise in the voice?
• What are the physical and perceptual correlates of voice quality?
• What are the relationships between voice quality and other aspects of prosody?

Voice quality dimensions
Ø A promenade in the landscape of voice quality, speech, singing.
Ø Phonation dimensions
Ø Vocal tract dimensions

Voice quality dimensions
• Syllabic or sentence-level voice quality
• Dimensions are often defined according to production (instead of perception)
• Based on settings of respiration, articulation and phonation

Speech production model
Four main parts:
1. Respiration
2. Phonation
3. Articulation
4. Radiation

Speech production model
Voice quality is in:
1. Respiration: laughter, subglottal pressure
2. Phonation: phonation types, voice registers, effort, tension, voicing, noise
3. Articulation: smile, rounding, rate, strength, front/back, vocal tract length

Voice quality dimensions: examples (1)
1 female speaker, 1 sentence: "Il est sorti avant le jour." with various vocal qualities.
Dimension | Sound examples
Breathiness | Whisper, Semivoiced, Voiced, Pressed 1, Pressed 2, Pressed 3
Nasalisation | Nasal 2, Nasal 1, Modal, Denasalized 1, Denasalized 2
Roughness | Modal, Rough 1, Rough 2, Rough 3
Creakiness | Modal, Creaky 1, Creaky 2, Creaky 3
Vocal tract length | Long 2, Long 1, Modal, Short 1, Short 2
Tension | Lax, Modal, Tense 1, Tense 2, Tense 3
Lips | Rounded 2, Rounded 1, Modal, Retracted 1, Retracted 2
Pitch | Low 2, Low 1, Modal, High 1, High 2, High 3, High 4, High 5, High 6
Loudness | Weak, Modal, Loud 1, Loud 2, Loud 3, Loud 4, Loud 5, Loud 6, Loud 6b, Loud 7, Loud 7b
Laughs | Laugh 1, Laugh 2, Laugh 3, Laugh 4, Laugh 5, Laugh 6, Laugh 7, Laugh 8
Smiling | Smiling 1, Smiling 2, Smiling 3
Others | 1, 2, 3, 4, 5, 6

Voice quality dimensions: examples (2)
1 male speaker, 1 sentence: "She (has) left for a great party today" with various vocal qualities.
Dimension | Sound examples
Modal voice | modal 1, modal 2, modal 3, modal 4, modal 5, modal 6
Nasalization | nasal 1, nasal 2, nasal 3, nasal 4
Roughness/Creakiness | rough 1, rough 2, rough 3, creaky 1, creaky 2, creaky 3
Vocal tract | short 1, short 2, long 1, long 2, opened, closed 1, closed 2
Tension | relax 1, relax 2, tensed 1, tensed 2, tensed 3, tensed 4, tensed 5, tensed 6
Lip protrusion | round 1, round 2, smile 1, smile 2
Pitch | low 1, low 2, low 3, high 1, high 2, high 3, high 4
Loudness (1) | whisper, soft 1, soft 2, soft 3, soft 4, soft 5
Loudness (2) | loud 1, loud 2, loud 3, loud 4, loud 5, loud 6, strong, shout 1, shout 2, shout 3
Laughs | laugh 1, laugh 2, laugh 3, laugh 4
Others | left, central, right, clear, yawn, theatrical, ominous 1, ominous 2, mysterious, dark

Phonation types
The three main sources of sound in the larynx are (Catford, 1977):
1. Vocal fold vibration (voiced speech)
2. Turbulent noise produced through open vocal folds (unvoiced speech)
3. Ventricular band vibrations (ventricular speech)
These sources can also be combined:
4. Mixtures of voiced, noisy and ventricular phonation types

Phonation types: sound examples
1. Vocal fold vibration (voiced speech)
2. Turbulent noise produced through open vocal folds (unvoiced speech)
3. Ventricular band vibrations (ventricular speech)
4. Mixtures of voiced, noisy and ventricular phonation types
5. Polyphonic voice (ventricular + vocal folds)

Main voice quality dimensions
Four main dimensions:
1. Voice registers (voice "mechanisms"): creak, modal, falsetto, whistle
2. Noise: breathiness, hoarseness
3. Pressure: pressed/lax voice, "strangled" tones
4. Effort: accentuation, force

Voice registers
Phonation type | Description | Production
Creak | Very low frequency, periodic air pulses | Mechanism 0 of vocal fold vibration. Thick and heavy vocal folds, low sub-glottal pressure, low mean flow
Modal | Usual voice for most males and low-pitched females, low to medium F0 register | Mechanism 1 of vocal fold vibration. Thick and heavy vocal folds vibrating along their whole length
Falsetto | Usual voice for high-pitched females, high F0 register | Mechanism 2 of vocal fold vibration. Thin and light vocal folds, vibrating along about 2/3 of their anterior length

Ventricular phonation
Phonation type | Description | Production
Ventricular | A harsh quality, with a lot of aperiodicities, low F0 | Produced between the ventricular bands, or "false vocal folds"
Ventricular creak | Very low frequency, periodic air pulses | Ventricular band vibration, low sub-glottal pressure, low mean flow

Aperiodicities
Phonation type | Description | Production
Breath | Unvoiced speech phonation | Glottis wide open, high mean flow
Breathy voice | A mixture of breath and voice | Incomplete fold closure. High mean flow. Glottal chink
Whisper | Unvoiced speech phonation | Narrowed opening compared to breath, low mean flow
Whispery voice | A mixture of whisper and voice | Incomplete fold closure. Low mean flow. Narrow glottal chink
Hoarse voice | Irregular, rough voice quality | A voice with structural aperiodicities, jitter or shimmer
Multiphony | A voice with multiple F0 and/or sub-harmonics | Dissymmetric vibration of the vocal folds, or combination of ventricular and voiced vibrations

Lax-tense voice
Phonation type | Description | Production
Tense | A hard or sharp quality, audible glottal formant | Adduction of the posterior part of the vocal folds
Lax | A relaxed, soft voice quality | Abduction of the posterior part of the vocal folds

Vocal effort
Phonation type | Description | Production
Loud | A strong voice, with much vocal force | High sub-glottal pressure, high tension of the vocal folds, moderate flow, high voicing amplitude
Flow voice | A strong voice, with high amplitude of voicing and flow | Normal sub-glottal pressure, tension of the vocal folds, high flow, high voicing amplitude
Weak | A weak voice, without vocal force | Low sub-glottal pressure, low tension of the vocal folds, low flow, low voicing amplitude

The voice registers dimension
Voice registers depend on the underlying voice mechanism:
Mechanism 0: vocal fry (creaky voice), very low F0, thick and heavy vocal folds, low sub-glottal pressure, low mean flow.
Mechanism I: modal voice, usual voice for males and low-pitched females, low to medium F0 register. Thick and heavy vocal folds vibrating along their whole length.
Mechanism II: falsetto voice, usual voice for high-pitched females, high F0. Thin and light vocal folds, vibrating along about 2/3 of their anterior length.
Mechanism III: whistle.
Very high pitch, mostly children and possibly females.

The voice registers dimension
[Figure: modal voice and falsetto voice (Henrich and Castellengo, 2001; after Vennard, 1967)]

The voice registers dimension
[Figure: glissando, baritone (Henrich, 2001)]

The voice registers dimension
[Figure: glissando, countertenor (Henrich, 2001)]

The noise dimension
Represents the relative amount of noise in the speech signal.
1. Additive noises. Whispery voice, breathy voice. Turbulent flow at the glottal constriction.
2. Structural noises. Hoarseness, roughness:
   1. Jitter: a random fluctuation of the duration of fundamental periods;
   2. Shimmer: a random fluctuation of amplitude for successive periods.

The noise dimension: sound examples
1. Additive noises.
   1. Whispery voice. Narrow glottis.
   2. Breathy voice. Wide glottis, voicing constriction.
2. Structural noises. Hoarseness, roughness.

The pressed/lax dimension
The vocal folds can be pressed together more or less strongly at their posterior extremities (arytenoid cartilages):
1. Pressed voice: sometimes called "tense" or "sharp" voice quality.
2. Lax voice: if the arytenoids are separated, a chink is created at the posterior part of the glottis.
Note that this pressed quality may be relatively independent of the vocal effort.

The pressed/lax dimension: sound examples
1. Pressed voice
2. Lax voice

The vocal effort dimension
– Important for stress and accentuation
– Important for emotion, affect, attitude etc.
– Loudness = spectral balance and voice amplitude.
– Results from tension and stiffness of the vocal folds and high sub-glottal pressure

The vocal effort dimension: sound examples
1. Speech
   1. Soft
   2. Loud
   3. Shouting
2. Singing
3. Emotions…

Voice range profile (phonetogram)
[Figure: voice range profile (Sulter, Wit, Schutte & Miller, 1994)]

Vocal tract settings
– Important for emotion, affect, attitude etc.
– Important for styles
– Co-variation with source
– Very few systematic acoustic studies

The vocal tract dimension: sound examples
1. Smiling
2. Rounding
3. Bite block
4. Lengthening
5. Shortening
6. Yawning

Conclusions on voice quality dimensions
• About 4 main dimensions for phonation (+ pitch/F0)
• Vocal tract dimensions of voice quality mostly unknown
• Respiration dimension of voice quality mostly unknown

Modelling the voice source
Ø Voice source signal models
Ø Time-domain and spectral parameters
Ø Physical model and signal models

Glottal flow models: time domain
Examples:
• Rosenberg C (Rosenberg, 1971)
• LF (Liljencrants & Fant, 1985)
• Klatt (Klatt & Klatt, 1990)
• R++ (Veldhuis, 1998)

Glottal flow models
[Figure: glottal pulse shapes for KLGLOTT88 (Klatt & Klatt, JASA 1990), Rosenberg C (Rosenberg, JASA 1971) and the LF model (Liljencrants, Fant, Lin, KTH STL, 1985)]

A unified set: 5 time-domain parameters (Doval, d'Alessandro & Henrich, Acta Acustica 2006)
• T0, fundamental period
• Av, amplitude of voicing
• Oq, open quotient
• αm, asymmetry coefficient (equivalent to speed quotient)
• Qa, return phase quotient
Other parameters of interest:
• J, total flow of a single pulse
• E, negative peak amplitude of the glottal flow derivative

Time-domain equations
In the case of Qa = 0 (abrupt closure), the glottal flow models (GFM) can all be expressed through a normalized glottal flow model:
[Equation: glottal flow as a function of the normalized model ng(x, αm)]
ng(x, αm) depends on the model.

Glottal flow models: frequency domain
Glottal flow: [equation]
Glottal flow derivative: [equation]
Ng(x, αm): Fourier transform of ng(x, αm)
N'g(x, αm): Fourier transform of n'g(x, αm)
These two functions depend on the model.

Glottal flow models: spectral description
Doval, d'Alessandro, Henrich (2006)
"Glottal formant":
$F_g = \frac{1}{2\pi O_q T_0}\sqrt{\frac{e_n(\alpha_m)}{j_n(\alpha_m)}} = \frac{1}{2\pi}\sqrt{\frac{E}{J}}$
$A_g = A_v\sqrt{e_n(\alpha_m)\,j_n(\alpha_m)} = \sqrt{E\,J}$
Spectral slope:
$F_a \approx \frac{1}{2\pi Q_a (1 - O_q)\,T_0}$
$A_a = \frac{E}{2\pi F_a} = A_g\,\frac{F_g}{F_a}$

Spectral / Time domain: open quotient, asymmetry
[Figure: effect of open quotient and asymmetry in the time and spectral domains]

Spectral / Time domain: spectral tilt
[Figure: effect of E and spectral tilt]

Causal-Anticausal linear voice source model (CALM)
Doval, d'Alessandro, Henrich (2003)
[Figure panels: anticausal filter; causal filter; convergence region for a stable CALM; glottal pulse (CALM vs. R++); frequency response]

Voice quality dimensions and acoustic parameters
Ø Non-linear relationships between parameters of the acoustic model and voice quality dimensions
Ø General relationships
Ø Speaking low-high
Ø Speaking soft-loud

Voice quality dimensions and source parameters
Dimension | Time domain | Spectral domain
Registers | F0, open quotient | T0, glottal formant
Noise | Noise, jitter, shimmer | Noise, harmonic widths
Tension | Open quotient | Glottal formant
Force | Closure, peak flow | Spectral tilt, amplitude

Voice quality dimensions and source parameters
Parameter | Description | Duality | Main effect on phonation
Time domain parameters
Av | Amplitude of voicing | E, Ags | Flow
Oq | Open quotient | Fg | Tenseness
αm | Asymmetry | Bg, Sq | Tenseness, loudness
Qa | Return phase | Fa | Loudness
Alternative time domain parameters
E | Derivative peak | Av | Flow, loudness
SPL | Sound pressure level | Av, E | Flow, loudness
Sq | Speed quotient | Bg, αm | Tenseness, loudness
Rd | Amplitude quotient (Fant) | Av, E, F0 | Loudness, tenseness
Aq | Amplitude quotient (Alku) | Av, E | Loudness, tenseness

Voice quality dimensions and source parameters
Parameter | Description | Duality | Main effect on phonation
Spectral parameters
Fg | Glottal formant frequency | Oq | Tenseness
Bg | Glottal formant bandwidth | Sq, αm | Tenseness, loudness
Fa | Spectral tilt frequency | Qa, Tl | Loudness
Ags | Glottal formant amplitude | Av, SPL | Flow
Alternative spectral parameters
H1*-H2* | 1st and 2nd harmonic amplitude difference | Oq, αm | Tenseness
H1*-F3* | 1st harmonic to 3rd formant amplitude difference | Tl, Qa, Fa | Loudness
Tl | Spectral tilt | Qa | Loudness
HRF | Harmonic richness factor | Qa | Loudness

Voice quality dimensions and source parameters
Parameter | Description | Main effect on phonation
Aperiodicities
Jitter | Period-to-period frequency variations | Roughness
Shimmer | Period-to-period amplitude variations | Roughness
PAPR | Periodic-aperiodic ratio | Breathiness, whisper
LoV | Limit of voicing | Breathiness, whisper
NTL | Noise spectral tilt | Breathiness, whisper
IHN | Inter-harmonic noise | Breathiness, whisper

Speaking high
Low-high dimension
• Voice registers: signal changes with pitch height (open quotient, amplitude)
• Phonetogram: SPL dependence on pitch height
• Formant tuning: vocal tract changes with pitch height
• Fg tuning: open quotient

Speaking loud
Soft-loud dimension
• Voice spectral tilt changes
• SPL changes
• Noise in the source
• Formant tuning: vowel opening
• F0 rise
• F0 contour
• F1 tuning (Liénard)
• Fg tuning
• Peakiness, impulsiveness

Application to performative synthesis
• Formant + CALM source
• Real-time control
• Including non-linear source-filter interactions
• Including a phonetogram
• DEMO: Cantor Digitalis

Acknowledgements
The contributions of Sylvain Le Beux, Nicolas d'Alessandro, Lionel Feugère, Boris Doval and Olivier Perrotin to the Cantor Digitalis are gratefully acknowledged.
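
Supplementary sketch: glottal formant from time-domain parameters
The link between the time-domain parameters (T0, Av, Oq) and the glottal formant (Fg, Ag) described in the source-model slides can be illustrated numerically. The sketch below is not taken from the talk. It assumes a KLGLOTT88-type normalized pulse n(x) = (27/4) x^2 (1 - x) with abrupt closure (Qa = 0), the form Ug(t) = Av n(t / (Oq T0)), and the relations Fg = (1/2π) sqrt(E/J) and Ag = sqrt(E J). The function names are hypothetical.

```python
import math

# Assumed normalized KLGLOTT88-type pulse on [0, 1]; peak value 1 at x = 2/3.
def n_g(x):
    return 6.75 * x * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

E_N = 6.75        # e_n = -n'(1): normalized negative peak of the derivative
J_N = 9.0 / 16.0  # j_n = integral of n over [0, 1]: normalized total flow

def glottal_flow(t, f0, av, oq):
    """Glottal flow at time t (seconds), assuming Ug(t) = Av * n(t / (Oq * T0))."""
    t0 = 1.0 / f0
    return av * n_g((t % t0) / (oq * t0))

def source_parameters(f0, av, oq):
    """Return E, J, Fg and Ag for the assumed pulse shape."""
    t0 = 1.0 / f0
    e = av * E_N / (oq * t0)   # negative peak of the flow derivative
    j = av * J_N * oq * t0     # total flow of a single pulse
    fg = math.sqrt(e / j) / (2.0 * math.pi)  # glottal formant frequency (Hz)
    ag = math.sqrt(e * j)                    # glottal formant amplitude
    return {"E": e, "J": j, "Fg": fg, "Ag": ag}

if __name__ == "__main__":
    for oq in (0.4, 0.6, 0.8):
        p = source_parameters(f0=100.0, av=1.0, oq=oq)
        print(f"Oq={oq:.1f}  Fg={p['Fg']:.1f} Hz  E={p['E']:.1f}")
```

Under these assumptions Fg scales as F0 / Oq, so Fg is about 110 Hz for F0 = 100 Hz and Oq = 0.5, and it rises as Oq decreases. This is consistent with the Oq / Fg / tenseness duality listed in the parameter tables.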
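
Supplementary sketch: measuring jitter and shimmer
The noise-dimension slides define jitter as a fluctuation of the duration of fundamental periods and shimmer as a fluctuation of amplitude over successive periods. One common way to quantify both, assumed here and not stated in the talk, is the "local" relative measure: the mean absolute difference between consecutive values divided by the mean value. The function names are hypothetical, and the input is assumed to be a list of already extracted period durations or per-period peak amplitudes.

```python
def relative_perturbation(values):
    """Mean absolute consecutive difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def local_jitter(periods):
    """Relative period-to-period duration variation (assumed definition)."""
    return relative_perturbation(periods)

def local_shimmer(amplitudes):
    """Relative period-to-period amplitude variation (assumed definition)."""
    return relative_perturbation(amplitudes)

if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds, illustrative values
    amplitudes = [1.00, 0.95, 1.02, 0.98]        # arbitrary units, illustrative
    print(f"jitter  = {100 * local_jitter(periods):.2f} %")
    print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
```

A perfectly periodic signal gives zero for both measures. Larger values correspond to the structural noise that the slides associate with hoarseness and roughness.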