
Speaking loud, speaking high:
non-linearities in voice strength
and vocal register variations
Christophe d’Alessandro
LIMSI-CNRS
Orsay, France
20/06/13
NOLISP 2013
Content of the talk
•  Introduction: voice quality
•  1. Voice quality dimensions
•  2. source/filter model in time and
frequency
•  3. Non-linearities : voice quality dimension
vs voice acoustic parameters, using
synthesis
•  Application: performative synthesis
Voice, Speech, Singing, Meaning and Expression
Functions of voice in communication:
1.  Linguistic and pragmatic functions: to
convey linguistic meaning (ideas, concepts,
facts …), to perform speech acts (command,
promise …). Mainly associated with phonemes
and words (double articulation). Can be noted
in writing.
2.  Expressive function: to make audible
attitudes, feelings, emotions, personality,
mood. Speech beyond (or below) linguistic
meaning. Mainly associated with prosody and
voice quality. Difficult to note in writing.
“The music of speech”
3.  Musical function: singing, non-linguistic but
highly structured communication
Voice Quality: a prosodic feature ?
•  Prosodic parameters are usually restricted
to pitch, duration, pauses and some sort of
intensity parameter.
•  But intonation and voice quality are linked
(e.g. voice registers)
•  In some languages, voice quality has a
phonological status (e.g. strangled tones of
Vietnamese)
• In all languages voice quality has a pragmatic
function
•  Synthesis of expressive speech has
demonstrated that convincing natural sounding
results are impossible to obtain without
dealing with voice quality parameters
Voice Quality: expression of emotions ?
• Vocal expression of emotions and attitudes is
one of the main domains of application for
voice quality studies.
•  Although it has been studied for a long time
in psychology, it can be considered as an
emerging research domain in many areas of
speech communication : speech recognition
and synthesis, but also speech coding.
• Voice quality is crucial for singing, theatre
and other aesthetic vocalizations.
Questions related to Voice Quality
• Voice quality is still a rather fuzzy concept:
what is the timbre of a voice?
• What are the domains of variation of
everyday speech?
• How to measure and quantify voice quality
dimensions like vocal effort, vocal tension or
noise in the voice?
• What are the physical and perceptive
correlates of voice quality?
• What are the relationships between voice
quality and other aspects of prosody?
Voice quality dimensions
Ø A promenade in the landscape of
voice quality, speech, singing.
Ø Phonation dimensions
Ø Vocal tract dimensions
20/06/13
NOLISP 2013
7
Voice quality dimensions
•  Syllabic or sentence-level voice
quality
•  Dimensions are often defined
according to production
(instead of perception)
•  Based on settings of
respiration, articulation and
phonation
Speech production model
Four main parts:
1.  Respiration
2.  Phonation
3.  Articulation
4.  Radiation
Speech production model
Voice quality is in:
1.  Respiration: laughter, subglottal pressure
2.  Phonation: phonation types, voice registers,
effort, tension, voicing, noise
3.  Articulation: smile, rounding, rate,
strength, front/back, vocal tract length
Voice quality dimensions: examples (1)
1 female speaker, 1 sentence: "Il est sorti avant le jour." (“He went out before daybreak.”) with various vocal qualities.
Sound examples, by dimension:
•  Breathiness: Whisper, Semivoiced, Voiced, Pressed 1, Pressed 2, Pressed 3
•  Nasalisation: Nasal 2, Nasal 1, Modal, Denasalized 1, Denasalized 2
•  Roughness: Modal, Rough 1, Rough 2, Rough 3
•  Creakiness: Modal, Creaky 1, Creaky 2, Creaky 3
•  Vocal tract length: Long 2, Long 1, Modal, Short 1, Short 2
•  Tension: Lax, Modal, Tense 1, Tense 2, Tense 3
•  Lips: Rounded 2, Rounded 1, Modal, Retracted 1, Retracted 2
•  Pitch: Low 2, Low 1, Modal, High 1 to High 6
•  Loudness: Weak, Modal, Loud 1 to Loud 6, Loud 6b, Loud 7, Loud 7b
•  Laughs: Laugh 1 to Laugh 8
•  Smiling: Smiling 1, Smiling 2, Smiling 3
•  Others: 1 to 6
Voice quality dimensions: examples (2)
1 male speaker, 1 sentence:
"She (has) left for a great party today" with various vocal qualities.
Sound examples, by dimension:
•  Modal voice: modal 1 to modal 6
•  Nasalization: nasal 1 to nasal 4
•  Roughness/Creakiness: rough 1 to rough 3, creaky 1 to creaky 3
•  Vocal tract: short 1, short 2, long 1, long 2, opened, closed 1, closed 2
•  Tension: relax 1, relax 2, tensed 1 to tensed 6
•  Lip protrusion: round 1, round 2, smile 1, smile 2
•  Pitch: low 1 to low 3, high 1 to high 4
•  Loudness (1): whisper, soft 1 to soft 5
•  Loudness (2): loud 1 to loud 6, strong, shout 1 to shout 3
•  Laughs: laugh 1 to laugh 4
•  Others: left, central, right, clear, yawn, theatrical, ominous 1, ominous 2, mysterious, dark
Phonation types
The three main sources of sound in
the larynx (Catford, 1977), and their mixtures:
1.  vocal fold vibration (voiced speech)
2.  turbulent noise produced through
open vocal folds (unvoiced speech)
3.  ventricular band vibrations
(ventricular speech)
4.  Mixtures of voiced, noisy and
ventricular phonation types
Phonation types
Sound examples:
1.  vocal fold vibration (voiced speech)
2.  turbulent noise produced through open
vocal folds (unvoiced speech)
3.  ventricular band vibrations (ventricular
speech)
4.  Mixtures of voiced, noisy and ventricular
phonation types
5.  Polyphonic voice (ventricular + vocal folds)
Main voice quality dimensions
Four main dimensions:
1.  Voice registers (voice “mechanisms”):
creak, modal, falsetto, whistle
2.  Noise: breathiness, hoarseness
3.  Pressure: pressed/lax voice,
“strangled” tones
4.  Effort: accentuation, force
Voice registers
Phonation type | Description | Production
Creak | Very low frequency, periodic air pulses | Mechanism 0 of vocal fold vibration. Thick and heavy vocal folds, low sub-glottal pressure, low mean flow
Modal | Usual voice for most males and low-pitched females, low to medium F0 register | Mechanism 1 of vocal fold vibration. Thick and heavy vocal folds vibrating along their whole length
Falsetto | Usual voice for high-pitched females, high F0 register | Mechanism 2 of vocal fold vibration. Thin and light vocal folds, vibrating along about 2/3 of their anterior length
Ventricular phonation
Phonation type | Description | Production
Ventricular | A harsh quality, with a lot of aperiodicities, low F0 | Produced between the ventricular bands, or “false vocal folds”
Ventricular creak | Very low frequency, periodic air pulses | Ventricular band vibration, low sub-glottal pressure, low mean flow
Aperiodicities
Phonation type | Description | Production
Breath phonation | Unvoiced speech | Glottis wide open, high mean flow
Breathy voice | A mixture of breath and voice | Incomplete fold closure. High mean flow. Glottal chink
Whisper | Unvoiced speech | Narrowed opening compared to breath phonation, low mean flow
Whispery voice | A mixture of whisper and voice | Incomplete fold closure. Low mean flow. Narrow glottal chink
Hoarse voice | Irregular, rough quality | A voice with structural aperiodicities, jitter or shimmer
Multiphony | A voice with multiple F0 and/or sub-harmonics | Dissymmetric vibration of the vocal folds, or combination of ventricular and voiced vibrations
Lax-tense voice
Phonation type | Description | Production
Tense | A hard or sharp quality, audible glottal formant | Adduction of the posterior part of the vocal folds
Lax | A relaxed, soft voice quality | Abduction of the posterior part of the vocal folds
Vocal effort
Phonation type | Description | Production
Loud | A strong voice, with much vocal force | High sub-glottal pressure, high tension of the vocal folds, moderate flow, high voicing amplitude
Flow voice | A strong voice, with high amplitude of voicing and flow | Normal sub-glottal pressure, tension of the vocal folds, high flow, high voicing amplitude
Weak | A weak voice, without vocal force | Low sub-glottal pressure, low tension of the vocal folds, low flow, low voicing amplitude
The voice registers dimension
Voice registers depend on the underlying voice
mechanism:
Mechanism 0: vocal fry (creaky voice), very low F0, thick
and heavy vocal folds, low sub-glottal pressure, low
mean flow
Mechanism I: modal voice, usual voice for males and
low-pitched females, low to medium F0 register. Thick
and heavy vocal folds vibrating along their whole
length
Mechanism II: falsetto voice, usual voice for
high-pitched females, high F0. Thin and light vocal folds,
vibrating along about 2/3 of their anterior length
Mechanism III: whistle. Very high pitch, mostly children
and possibly females
The voice registers dimension
(Henrich and Castellengo, 2001)
Modal voice
Falsetto voice
(after Vennard, 1967)
The voice registers dimension
Glissando (baritone)
(Henrich, 2001)
The voice registers dimension
Glissando (countertenor)
(Henrich, 2001)
The noise dimension
Represents the relative amount of noise
in the speech signal.
1.  Additive noises. Whispery voice,
breathy voice. Turbulent flow at the
glottal constriction.
2.  Structural noises. Hoarseness,
roughness:
1.  Jitter: This is a random fluctuation of
the duration of fundamental periods;
2.  Shimmer: This is a random fluctuation of
amplitude for successive periods.
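The two structural-noise measures above can be sketched numerically. This is a minimal illustration, assuming the widely used "local" convention (mean absolute difference between successive cycles, divided by the mean value); the slide does not say which formula the author uses, and the function names are hypothetical.

```python
def mean_abs_successive_diff(values):
    """Mean absolute difference between consecutive values."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def local_jitter(periods):
    """Relative cycle-to-cycle variation of the fundamental period.

    `periods` holds successive period durations (seconds).
    Assumed convention: mean |T[i+1] - T[i]| divided by the mean period.
    """
    return mean_abs_successive_diff(periods) / (sum(periods) / len(periods))


def local_shimmer(amplitudes):
    """Relative cycle-to-cycle variation of the peak amplitude.

    `amplitudes` holds one amplitude value per successive period.
    Same assumed convention as for jitter.
    """
    return mean_abs_successive_diff(amplitudes) / (sum(amplitudes) / len(amplitudes))


# Example: periods alternating between 10 ms and 11 ms,
# amplitudes alternating between 1.0 and 0.9.
jitter = local_jitter([0.010, 0.011, 0.010, 0.011])
shimmer = local_shimmer([1.0, 0.9, 1.0, 0.9])
```

A perfectly periodic voice gives zero for both measures; hoarse or rough voices give larger values.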
The noise dimension
Sound examples
1.  Additive noises.
1.  Whispery voice. narrow glottis
2.  breathy voice. Wide glottis, voicing
constriction.
2.  Structural noises. Hoarseness,
roughness:
The pressed/lax dimension
The vocal folds can be pressed together
more or less strongly at their
posterior extremities (arytenoids
cartilages):
1.  Pressed voice: sometimes called
“tense” or “sharp” voice quality
2.  Lax voice: if the arytenoids are
separated, a chink is created at the
posterior part of the glottis.
Note that this pressed quality may be
relatively independent of the vocal
effort.
The pressed/lax dimension
Sound Examples
1.  Pressed voice: sometimes called
“tense” or “sharp” voice quality
2.  Lax voice: if the arytenoids are
separated, a chink is created at
the posterior part of the glottis.
Note that this pressed quality may be
relatively independent of the vocal
effort.
The vocal effort dimension
–  important for stress and
accentuation
–  important for emotion, affect,
attitude etc
–  Loudness = spectral balance and voice
amplitude.
–  Results of tension and stiffness of
the vocal folds, high sub-glottal
pressure
The vocal effort dimension
Sound examples
1.  Speech
1.  Soft
2. Loud
3. shouting
2. Singing
3. Emotions…
Voice range profile (phonetogram)
(Sulter, Wit, Schutte & Miller, 1994)
Vocal tract settings
–  important for emotion, affect,
attitude etc
–  Important for styles
–  Co-variation with source
–  Very few systematic acoustic studies
The vocal tract dimension
Sound examples
1.  Smiling
2. Rounding
3. Bite block
4. Lengthening
5. Shortening
6. yawning
Conclusions on voice quality
dimensions
•  About 4 main dimensions for
phonation (+ pitch/f0)
•  Vocal tract dimensions of voice
quality mostly unknown
•  Respiration dimension of voice
quality mostly unknown
Modelling the voice source
Ø Voice source signals models
Ø Time-domain and spectral parameters
Ø Physical model and signal models
20/06/13
NOLISP 2013
35
Glottal flow models : time domain
Examples:
Rosenberg C
(Rosenberg, 1971)
LF
(Liljencrants & Fant, 1985)
Klatt
(Klatt & Klatt, 1990)
R++
(Veldhuis, 1998)
Glottal flow models
KLGLOTT88 (Klatt & Klatt, JASA 1990)
Rosenberg C (Rosenberg, JASA 1971)
LF model (Liljencrants, Fant, Lin, KTH-STL, 1985)
A unified set: 5 time-domain parameters
(Doval, d’Alessandro & Henrich, Acta
Acustica 2006)
•  T0, fundamental period
•  Av, voiced amplitude
•  Oq, open quotient
•  αm, asymmetry coefficient
(equivalent to speed quotient)
•  Qa, return phase quotient
Other parameters of interest:
•  J, total flow of a single pulse
•  E, negative peak amplitude of
the glottal flow derivative
Time-domain equations
In the case of Qa = 0 (abrupt closure),
the glottal flow models (GFMs) can all
be expressed through a normalized
glottal flow model ng(x, αm).
ng(x, αm) depends on the model.
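As an illustration of the abrupt-closure case, one pulse can be generated by scaling a normalized shape with the time-domain parameters. This is only a sketch under two assumptions that are not stated on the slide: the scaling is taken to be u(t) = Av · ng(t / (Oq · T0)) during the open phase and zero afterwards, and ng is taken to be a KLGLOTT88-style cubic with unit peak (for which the asymmetry is fixed at 2/3). Function names are hypothetical.

```python
def klglott88_shape(x):
    """Normalized cubic pulse on [0, 1] (assumed KLGLOTT88-style shape).

    Peaks at 1.0 for x = 2/3 and returns to 0 at x = 1 (abrupt closure).
    """
    if x < 0.0 or x > 1.0:
        return 0.0
    return 6.75 * x * x * (1.0 - x)


def glottal_flow_period(f0, oq, av, fs):
    """One fundamental period of glottal flow, sampled at fs (Hz).

    f0: fundamental frequency (Hz), so T0 = 1 / f0
    oq: open quotient, fraction of the period during which the glottis is open
    av: amplitude of voicing (peak flow)
    """
    t0 = 1.0 / f0
    n_samples = int(round(fs * t0))
    open_duration = oq * t0
    return [av * klglott88_shape((i / fs) / open_duration) for i in range(n_samples)]


# Example: 100 Hz voice, open 60 % of the period, peak flow 2.0.
pulse = glottal_flow_period(f0=100.0, oq=0.6, av=2.0, fs=48000.0)
```

Lowering `oq` shortens the open phase without changing the peak flow, which is the kind of change associated with a more pressed voice.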
Glottal flow models : frequency
domain
Glottal flow:
Glottal flow derivative:
Ng(x, αm): Fourier transform of ng(x, αm)
N’g(x, αm): Fourier transform of n’g(x, αm)
These two functions depend on the model
Glottal flow models : spectral
description
Doval, d’Alessandro, Henrich (2006)
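“Glottal formant”:
\[ F_g = \frac{1}{2\pi O_q T_0}\sqrt{\frac{e_n(\alpha_m)}{j_n(\alpha_m)}} = \frac{1}{2\pi}\sqrt{\frac{E}{J}} \]
\[ A_g = A_v \sqrt{e_n(\alpha_m)\, j_n(\alpha_m)} = \sqrt{E J} \]
Spectral slope:
\[ F_a \approx \frac{1}{2\pi Q_a (1 - O_q) T_0} \]
\[ A_a = \frac{E}{2\pi F_a} = A_g \frac{F_g}{F_a} \]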
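The spectral description can be turned into a small numerical sketch. It assumes the relations read Fg = (1/2π)·√(E/J), Ag = √(E·J), Fa ≈ 1/(2π·Qa·(1 − Oq)·T0) and Aa = E/(2π·Fa), with E the negative peak of the flow derivative and J the total flow of one pulse; the function names are hypothetical.

```python
import math


def glottal_formant(e, j):
    """Glottal formant frequency Fg (Hz) and amplitude Ag from E and J."""
    fg = math.sqrt(e / j) / (2.0 * math.pi)
    ag = math.sqrt(e * j)
    return fg, ag


def spectral_tilt(e, qa, oq, t0):
    """Spectral tilt cut-off Fa (Hz) and amplitude Aa.

    Requires qa > 0: with abrupt closure (Qa = 0) there is no
    additional tilt and Fa is undefined.
    """
    fa = 1.0 / (2.0 * math.pi * qa * (1.0 - oq) * t0)
    aa = e / (2.0 * math.pi * fa)
    return fa, aa


# Example values (illustrative only): T0 = 10 ms, Oq = 0.6, Qa = 0.05.
E, J = 1125.0, 0.003375
fg, ag = glottal_formant(E, J)
fa, aa = spectral_tilt(E, qa=0.05, oq=0.6, t0=0.01)
```

Because Ag·Fg = E/(2π), the two expressions for Aa agree: a larger E raises both asymptotes, while a shorter open phase moves Fg upward.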
Spectral / Time domain: open quotient,
asymmetry
Spectral / Time domain: spectral tilt
Effect of E and
Spectral tilt
Causal-Anticausal linear voice
source model (CALM)
Doval, d’Alessandro, Henrich (2003)
Figure panels: anticausal filter; causal filter; convergence region for a stable CALM; glottal pulse (CALM vs. R++); frequency response.
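The causal/anticausal idea can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: it assumes a second-order resonator for the glottal formant, run backwards in time (the standard way to obtain a stable anticausal response), followed by a causal first-order low-pass for the spectral tilt. Filter orders, coefficient formulas and function names are assumptions.

```python
import math


def resonator_coeffs(fc, bw, fs):
    """Denominator coefficients of a two-pole resonator (centre fc, bandwidth bw)."""
    r = math.exp(-math.pi * bw / fs)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = r * r
    return a1, a2


def causal_two_pole(x, a1, a2):
    """y[n] = x[n] - a1*y[n-1] - a2*y[n-2], computed forward in time."""
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn - a1 * y1 - a2 * y2)
    return y


def anticausal_two_pole(x, a1, a2):
    """Same filter run backwards: the response precedes the input."""
    return causal_two_pole(x[::-1], a1, a2)[::-1]


def causal_one_pole_lowpass(x, fa, fs):
    """First-order low-pass with cut-off fa, standing in for spectral tilt."""
    a = math.exp(-2.0 * math.pi * fa / fs)
    y, prev = [], 0.0
    for xn in x:
        prev = (1.0 - a) * xn + a * prev
        y.append(prev)
    return y


def calm_like_pulse(n, closure_index, fg, bg, fa, fs):
    """Open phase from the anticausal resonator, return phase from the low-pass."""
    impulse = [0.0] * n
    impulse[closure_index] = 1.0
    a1, a2 = resonator_coeffs(fg, bg, fs)
    return causal_one_pole_lowpass(anticausal_two_pole(impulse, a1, a2), fa, fs)


pulse = calm_like_pulse(n=400, closure_index=200, fg=100.0, bg=80.0, fa=1000.0, fs=16000.0)
```

The anticausal part builds up before the closure instant and stops there, while the causal low-pass smooths the closure into a short return phase.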
Voice quality dimensions
and acoustic parameters
Ø Non-linear relationships between
parameters of the acoustic model and
voice quality dimensions
Ø General relationships
Ø Speaker low-high
Ø Speaking soft-loud
20/06/13
NOLISP 2013
45
Voice quality dimensions and
source parameters
Dimension | Time domain | Spectral domain
Registers | F0, open quotient | T0, glottal formant
Noise | Noise, jitter, shimmer | Noise, harmonic widths
Tension | Open quotient | Glottal formant
Force | Closure, peak flow | Spectral tilt, amplitude
Voice quality dimensions
and source parameters
Parameter | Description | Duality | Main effect on phonation
Time domain parameters
Av | Amplitude of voicing | E, Ags | Flow
Oq | Open quotient | Fg | Tenseness
αm | Asymmetry | Bg, Sq | Tenseness, loudness
Qa | Return phase | Fa | Loudness
Alternative time domain parameters
E | Derivative peak | Av | Flow, loudness
SPL | Sound pressure level | Av, E | Flow, loudness
Sq | Speed quotient | Bg, αm | Tenseness, loudness
Rd | Amplitude quotient (Fant) | Av, E, F0 | Loudness, tenseness
Aq | Amplitude quotient (Alku) | Av, E | Loudness, tenseness
Voice quality dimensions
and source parameters
Parameter | Description | Duality | Main effect on phonation
Spectral parameters
Fg | Glottal formant frequency | Oq | Tenseness
Bg | Glottal formant bandwidth | Sq, αm | Tenseness, loudness
Fa | Spectral tilt frequency | Qa, Tl | Loudness
Ags | Glottal formant amplitude | Av, SPL | Flow
Alternative spectral parameters
H1*-H2* | 1st and 2nd harmonic amplitude difference | Oq, αm | Tenseness
H1*-F3* | 1st harmonic to 3rd formant amplitude difference | Tl, Qa, Fa | Loudness
Tl | Spectral tilt | Qa | Loudness
HRF | Harmonic richness factor | Qa | Loudness
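As an example of one of these spectral measures, the difference between the first two harmonic amplitudes can be estimated from a signal whose F0 is known. The sketch below computes the uncorrected H1-H2 in dB with a direct single-frequency DFT; the asterisks in H1*-H2* conventionally denote a correction for vocal tract formants, which is not attempted here. Function names are hypothetical.

```python
import cmath
import math


def harmonic_amplitude(signal, freq, fs):
    """Amplitude of the sinusoidal component at `freq` (Hz).

    Direct DFT at a single frequency; exact when the signal spans
    a whole number of periods of `freq`.
    """
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / fs) for k, s in enumerate(signal))
    return 2.0 * abs(acc) / n


def h1_minus_h2_db(signal, f0, fs):
    """Level difference (dB) between the first and second harmonics."""
    h1 = harmonic_amplitude(signal, f0, fs)
    h2 = harmonic_amplitude(signal, 2.0 * f0, fs)
    return 20.0 * math.log10(h1 / h2)


# Example: synthetic signal whose second harmonic has half the amplitude
# of the first, over exactly 10 periods of a 100 Hz fundamental.
fs, f0 = 8000.0, 100.0
signal = [
    math.sin(2.0 * math.pi * f0 * k / fs) + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * k / fs)
    for k in range(800)
]
difference_db = h1_minus_h2_db(signal, f0, fs)
```

In the table's terms, a larger H1-H2 goes with a laxer voice (larger open quotient) and a smaller one with a tenser voice.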
Voice quality dimensions
and source parameters
Parameter | Description | Main effect on phonation
Aperiodicities
Jitter | Period-to-period frequency variations | Roughness
Shimmer | Period-to-period amplitude variations | Roughness
PAPR | Periodic-aperiodic ratio | Breathiness, whisper
LoV | Limit of voicing | Breathiness, whisper
NTL | Noise spectral tilt | Breathiness, whisper
IHN | Inter-harmonic noise | Breathiness, whisper
Speaking high
Low-high dimension
•  Voice registers: signal changes with
pitch height (open quotient,
amplitude)
•  Phonetogram: SPL dependence on
pitch height
•  Formant tuning: vocal tract changes
with pitch height
•  Fg tuning: open quotient
Speaking loud
Soft-loud dimension
•  Voice spectral tilt changes
•  SPL changes
•  Noise in the source
•  Formant tuning: vowel opening
•  F0 rise
•  F0 contour
•  F1 tuning (Liénard)
•  Fg tuning
•  Peakiness, impulsiveness
Application to
Performative synthesis
•  Formant + CALM source
•  Real-time control
•  Including non-linear source-filter
interactions
•  Including a phonetogram
•  DEMO : Cantor Digitalis
Acknowledgements
The contributions of Sylvain Le Beux, Nicolas
d’Alessandro, Lionel Feugère, Boris
Doval and Olivier Perrotin
to the Cantor Digitalis
are gratefully acknowledged.