From Sound to Sense and back again:
The integration of lexical and speech processes
David Gow, Massachusetts General Hospital
Bob McMurray, Dept. of Brain and Cognitive Sciences, University of Rochester
The Speech Chain
Complex computations from sound to sense must be broken up for study.
Assume intermediate representations:
Phonemes…
Words…
Syntactic Phrases…
[Diagram labels: Sense, Sound]
The Standard Paradigm
[Diagram: levels of representation, labeled Sense, Phonology, Words, Phonemes*, Sound]
Delimited fields of study.
• Speech Perception
• Spoken Word Recognition
• Phonology
Phonemes* essential
* or other sublexical category
Why?
Categorical Perception (CP)
Continuous Acoustic Detail (CAD) => Discrete Categories
Does CAD affect speech categorization?
[Figure: identification (% /pa/, 0 to 100) and discrimination functions across a VOT continuum from B to P]
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.
Categorical Perception (CP)
Defined fundamental computational problems.
CP is output of
• Speech perception
Input to
• Phonology
• Word recognition.
[Diagram labels: Sense, Phonology, Words, Phonemes, Sound]
But…
• Not all speech contrasts are categorical.
• Lots of tasks show non-categorical perception.
Fry, Abramson, Eimas & Liberman (1962); Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977); Hary & Massaro (1982); Pisoni, Aslin, Perey & Hennessy (1982); Healy & Repp (1982); Massaro & Cohen (1983); Miller (1997); Samuel (1997)…
Why has the Standard Paradigm persisted?
Categorical Perception is about phonetic classification.
The minimal computational problem: compute meaning from sound.
CP tasks don’t necessarily tap a stage of this problem.
Lexical activation… seems a good bet.
[Diagram labels: Sense, Words, ?, CP, Sound]
Why has the Standard Paradigm persisted?
Even when continuous acoustic detail affects word recognition, it is seen as outside of core word recognition.
Example: Word Segmentation
• Vowel Length
• Stress/Meter
• Coarticulation
These cue an extra-segmental process.
[Diagram labels: Words, Phonemes, CAD, Segmentation, Word Recognition]
Does continuous acoustic detail affect interpretation via core word-recognition processes?
• No. Standard Paradigm is fine… Sublexical filter (phonemes).
• Yes. Hmm…
Need to use stimuli with:
• Precise control over CAD
Need to use tasks that:
• reflect only minimal computational problem: meaning.
• are sensitive to acoustic detail.
Visual World Paradigm
• Subjects hear spoken language and manipulate objects in a visual world.
• Visual world includes set of objects with interesting linguistic properties (names).
• Eye-movements to each object are monitored throughout the task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy (1995)
Allopenna, Magnuson & Tanenhaus (1998)
• Meaning based, natural task: Subjects must interpret speech to perform task.
• Fixation probability maps onto dynamics of lexical activation.
• Context is controlled: meaning → lexical activation.
• Eye-movements fast and timelocked to speech.
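As a rough illustration of how fixation probability is obtained, the sketch below computes, for each time bin, the proportion of trials on which each object is being fixated. The data layout (one fixated-object label per time bin per trial), the object names, and the toy trials are all assumptions made for the example, not the authors' analysis pipeline.

```python
from collections import Counter

def fixation_proportions(trials, objects, n_bins):
    """For each object, return the proportion of trials fixating it in each time bin.

    trials: list of sequences; trials[i][t] is the object fixated on trial i at bin t
            (hypothetical format; "none" means no object is fixated).
    """
    curves = {obj: [0.0] * n_bins for obj in objects}
    for t in range(n_bins):
        counts = Counter(trial[t] for trial in trials)
        for obj in objects:
            curves[obj][t] = counts[obj] / len(trials)
    return curves

# Four made-up trials, four time bins each.
trials = [
    ["none", "bear", "bear", "bear"],
    ["none", "pear", "bear", "bear"],
    ["none", "none", "pear", "bear"],
    ["lamp", "bear", "bear", "bear"],
]
curves = fixation_proportions(trials, ["bear", "pear", "lamp"], n_bins=4)
print(curves["bear"])  # target curve over time
print(curves["pear"])  # competitor curve over time
```

Plotting such curves against time since word onset gives the kind of target and competitor trajectories that are read as a trace of lexical activation.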
Does continuous acoustic detail affect interpretation?
Is lexical activation sensitive to continuous acoustic detail?
McMurray, Tanenhaus & Aslin (2003)
Combine tools of
• speech perception: 9-step VOT continuum.
• spoken word recognition: visual world paradigm.
Methods
[Trial sequence: a moment to view the items; 500 ms later, the spoken word ("Bear"); 200 ms. Repeat 1080 times…]
Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
[Figure: fixation proportion (0 to 0.9) over time (0 to 1600 ms) for VOT = 0, by response, built up from trials 1 to 5]
Predictions
What would lexical sensitivity to CAD look like?
Systematic effect on competitor dynamics: fixations to the competitor.
[Figure: schematic fixation proportion over time for target and competitor, contrasting a Gradient Effect with Categorical Results]
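The contrast between the two predictions can be sketched numerically. The functions below are purely illustrative: the boundary, baseline, and gain values are invented, and "closeness to the boundary" is just one simple way to express a gradient effect.

```python
BOUNDARY_MS = 20.0  # assumed category boundary, for illustration only

def categorical_prediction(vot_ms, baseline=0.05):
    """Categorical account: within a category, VOT does not change competitor looks."""
    return baseline

def gradient_prediction(vot_ms, baseline=0.05, gain=0.004):
    """Gradient account: competitor looks grow as VOT approaches the boundary."""
    closeness = max(0.0, BOUNDARY_MS - abs(vot_ms - BOUNDARY_MS))
    return baseline + gain * closeness

# Tokens that would all be labeled /b/ under the assumed boundary.
b_side = [0, 5, 10, 15]
categorical = [categorical_prediction(v) for v in b_side]
gradient = [gradient_prediction(v) for v in b_side]
print(categorical)
print(gradient)
```

The key design point is that both accounts are compared on tokens with the same response, so any difference in competitor fixations cannot be attributed to a change in category label.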
Results
[Figure: competitor fixations (0 to 0.16) over time since word onset (0 to 2000 ms), in two panels by response; one curve per VOT step: 0, 5, 10, 15 ms and 20, 25, 30, 35, 40 ms]
Task?
Phoneme ID (response options: P, L, B, Sh)
Not part of minimal computational problem.
Same stimuli in metalinguistic task…
…more categorical pattern of fixations.
Continuous acoustic detail is not helpful in metalinguistic tasks…
Summary
Word recognition shows gradient sensitivity to continuous acoustic detail.
Not extra-segmental: VOT
CAD affects higher-level processes.
Consistent with other studies:
Andruski, Blumstein & Burton (1994)
Marslen-Wilson & Warren (1994)
Utman, Blumstein & Burton (2000)
Dahan, Magnuson, Tanenhaus & Hogan (2001)
McMurray, Clayards, Aslin & Tanenhaus (2004)
McMurray, Aslin, Tanenhaus, Spivey & Subik (in prep)
The Standard Paradigm?
CAD affects higher-level processes.
From other work:
Lexical activation influences sublexical representations.
Samuel & Pitt (2003); Magnuson, McMurray, Tanenhaus & Aslin (2003); Samuel (1997); Elman & McClelland (1988)
Phonological regularity affects signal interpretation.
Massaro & Cohen (1983); Halle, Segui, Frauenfelder & Meunier (1998); Pitt (1998); Dupoux, Kakehi, Hirose, Pallier & Mehler (1999)
[Diagram labels: Sense, Phonology, Words, Phonemes, Continuous Acoustic Detail]
Perhaps interaction and integration make sense.
Do they help solve sticky problems?
YES
[Diagram labels: Sense, Phonology, Words, Phonemes, Continuous Acoustic Detail]
The Emerging Paradigm
Integration of work in:
• spoken word recognition
• speech perception
• phonology
New computations simplify old problems and solve new ones.
• Cognitive processes: Lexical activation & competition.
• Perceptual processes: sensitivity to CAD & perceptual grouping.
CAD is helpful in language comprehension.
• Word segmentation
• Coping with lawful variability due to assimilation
Combination of approaches helps solve both problems.
Lexical Segmentation
Some lexical processes can’t work in the Standard Paradigm
The SWR Solution
[Figure: phonetic transcription of the input "active department", overlaid with the words that match parts of it]
Matching words: active; act of; a; department; dip art mint; part; depart in; are; par
Standard Paradigm: Template matching overgenerates
Frauenfelder & Peeters (1990)
[Figure: candidates succeed, suck, and seed for a single input, plotted by cycle]
• Overgeneration resolved through competition in TRACE (McClelland & Elman, 1986)
Problem: What if the speaker is trying to say “suck seeds”?
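The overgeneration problem can be made concrete with a toy template matcher. This is a minimal sketch, not TRACE: the three-word lexicon and the ASCII stand-ins for phonemic forms ("sAksid" and so on) are hypothetical, and the matcher simply reports every word whose form occurs anywhere in the input.

```python
# Hypothetical lexicon: word -> rough ASCII stand-in for its phonemic form.
LEXICON = {"succeed": "sAksid", "suck": "sAk", "seed": "sid"}

def embedded_words(signal, lexicon):
    """Template matching: every (position, word) whose form occurs in the signal."""
    hits = []
    for start in range(len(signal)):
        for word, form in lexicon.items():
            if signal.startswith(form, start):
                hits.append((start, word))
    return hits

def parses(signal, lexicon):
    """All ways of tiling the whole signal with lexicon words."""
    if not signal:
        return [[]]
    out = []
    for word, form in lexicon.items():
        if signal.startswith(form):
            out += [[word] + rest for rest in parses(signal[len(form):], lexicon)]
    return out

print(embedded_words("sAksid", LEXICON))
print(parses("sAksid", LEXICON))
```

Both complete parses fit the input equally well, so something beyond template matching (competition, or cues in the signal) has to choose between them, and a fixed preference for the longer word fails when the speaker intended two words.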
The Speech Solution
Cues shown to affect segmentation:
• Initial strong syllable
• Initial lengthening
• Increased aspiration
• Increased glottalization
Lehiste, 1960; Garding, 1967; Lehiste, 1972; Umeda, 1975; Nakatani & Dukes, 1977; Nakatani & Schaffer, 1978; Cutler & Norris, 1988…
Implied processing model requires separate segmentation process.
Problem: cues are subtle and varied, extra-segmental processes are inelegant.
[Diagram labels: Words, Phonemes, CAD, Segmentation, Recognition]
Is there a better mechanism?
Gow & Gordon (1995)
The proposal had a strange syntax that nobody liked.
^
Syntax: GRAMMAR primed
Tax: INCOME inhibited

The proposal had a strange sin tax that nobody liked.
^
Syntax: GRAMMAR primed
Tax: INCOME primed
• CAD affects interpretation.
• CAD does not trigger segmentation.
Good Start Model
• Observation: All segmentation cues happen to enhance word-initial features.
• Strengthened cues facilitate activation, making intended words stronger competitors.
Incorporating CAD:
• Solves overgeneration problem.
• No extra-segmental segmentation process.
Gow & Gordon (1995)
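The idea that stronger word-initial cues make a word a stronger competitor can be sketched as follows. This is an illustrative toy, not the published model: the match scores, cue strengths, the `floor` parameter, and the use of simple normalization as a stand-in for lexical competition are all assumptions.

```python
def good_start_activation(candidates, onset_cue, floor=0.2):
    """Relative activation of candidate words.

    candidates: {word: (start_index, match_score)}  (hypothetical values)
    onset_cue:  {start_index: strength in 0..1} of word-initial cues
                (lengthening, aspiration, ...) found at that point in the signal
    Each word's bottom-up support is scaled by the cue strength at its onset,
    then activations are normalized so that words compete for a fixed total.
    """
    raw = {
        word: match * (floor + onset_cue.get(start, 0.0))
        for word, (start, match) in candidates.items()
    }
    total = sum(raw.values())
    return {word: value / total for word, value in raw.items()}

candidates = {"syntax": (0, 1.0), "tax": (3, 0.5)}

# "syntax": weak onset cues at the embedded position of "tax".
one_word = good_start_activation(candidates, {0: 1.0, 3: 0.1})
# "sin tax": strong word-initial cues at the onset of "tax".
two_words = good_start_activation(candidates, {0: 1.0, 3: 0.9})

print(round(one_word["tax"], 3), round(two_words["tax"], 3))
```

In this sketch no separate segmentation routine exists: the same activation and competition step does the work, and the acoustic detail only changes how strongly each candidate enters it.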
Summary
When continuous acoustic detail affects lexical activation, speech and SWR models can be integrated and simplified
Assimilation
The emerging paradigm reframes computational problems
Redefining Computational Problems
English coronal place assimilation
/coronal # labial/ → [labial # labial]
/coronal # velar/ → [velar # velar]
Standard Paradigm: Change is
• discrete
• phonemically neutralizing
[ɡɹim] # berries: nonword?
[ɹaɪp] # berries: ripe berries? right berries?
Standard Paradigm solution: Phonological inference (Gaskell & Marslen-Wilson, 1996; 1998; 2001)
Knowledge driven inference:
If [labial # labial] infer /coronal # labial/
• greem beans → green (Gaskell & Marslen-Wilson, 1996; Gow, 2001)
• ripe berries → right, ripe (Gaskell & Marslen-Wilson, 2001; Gow, 2002)
Moreover: Assimilation effects dissociated from linguistic knowledge (Gow & Im, in press)
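The knowledge-driven inference rule can be written out as a toy procedure to show where it leaves ambiguity. The segment tables, the ASCII transcriptions, and the three-word lexicon are hypothetical and cover only the labial case from the rule above.

```python
# Hypothetical tables for the labial case only.
LABIAL_TO_CORONAL = {"m": "n", "p": "t", "b": "d"}
LABIALS = set("pbm")

# Hypothetical lexicon keyed by rough ASCII transcriptions.
LEXICON = {"gri:n": "green", "raIp": "ripe", "raIt": "right"}

def underlying_candidates(surface, next_initial):
    """Apply the rule: if [labial # labial], also consider /coronal # labial/."""
    forms = [surface]
    final = surface[-1]
    if final in LABIAL_TO_CORONAL and next_initial in LABIALS:
        forms.append(surface[:-1] + LABIAL_TO_CORONAL[final])
    return forms

def recognize(surface, next_initial, lexicon=LEXICON):
    """Words compatible with the surface form after applying the inference rule."""
    return [lexicon[f] for f in underlying_candidates(surface, next_initial)
            if f in lexicon]

print(recognize("gri:m", "b"))  # "greem beans"
print(recognize("raIp", "b"))   # "ripe berries"
print(recognize("raIp", "d"))   # no labial context, rule does not apply
```

When the surface form is a nonword the rule recovers a single word, but when both the surface form and the inferred form are words, a discrete rule of this kind cannot say which one was intended.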
Assimilation Produces CAD
Assimilatory modification is acoustically continuous
[Figure: F2 and F3 transitions in /æC/ contexts; frequency (Hz) by pitch period for coronal, assimilated, and labial tokens]
This is not discrete feature change!
Regressive Context Effects
[Instruction: "Select the cat/p box"]
[Figure: subject hears assimilated non-coronal (cat/p box); fixation proportion (0 to 0.6) over time (0 to 1600 ms) for Coronal (cat) and Non-Coronal (cap)]
[Figure: subject hears assimilated non-coronal (cat/p drawing); fixation proportion (0 to 0.6) over time (0 to 1600 ms) for Coronal (cat) and Non-Coronal (cap)]
Progressive Context Effects
[Figure: looks to final non-coronal (box); fixation proportion (0 to 0.7) over time (0 to 1600 ms) for Assim Non-Coronal and Coronal Non-Coronal]
Progressive effect in the same experiment
Assimilation: Use of CAD
Assimilation is resolved through phonological context.
Partially-assimilated items show
• regressive context effects (Gow, 2002; 2003)
• progressive context effects (Gow, 2001; 2003)
Fully assimilated items show neither* (Gaskell & Marslen-Wilson, 2001; Gow, 2002; 2003)
assimilation # context
Infinite regress (eternal ambiguity)… or something more interesting?
Continuous acoustic detail is subject to basic perceptual processes
A Perceptual Account
Feature cue parsing (Gow, 2003)
[Figure: spectrogram, 0 to 3000 Hz, 0 to 0.760454 s, with segment labels]
• Features encoded by multiple cues that are integrated
• Assimilation creates cues consistent with multiple places
• Extract feature cues
• Group feature cues by similarity and resolve ambiguity
example: eight….
Extracted cues (lowercase: cues in the assimilated segment; uppercase: place of the following segment):
catp # box: [cor] [lab] [LAB]
catp # drawing: [cor] [lab] [COR]
catp #: [cor] [lab]
After grouping:
catp # box: [cor] | [lab] [LAB]
catp # drawing: [lab] | [cor] [COR]
catp #: [cor] [lab]
Progressive and regressive effects fall out of grouping
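The grouping step can be sketched in a few lines. This is a deliberately reduced illustration, not the model in Gow (2003): similarity is simplified to identity of the place label, and the cue strengths are made-up numbers.

```python
def parse_feature_cues(segment_cues, context_place=None):
    """Group place cues from an assimilated segment with a similar following segment.

    segment_cues:  {place: strength} extracted from the assimilated segment
                   (hypothetical values)
    context_place: place of the following segment, or None if there is none
    Returns (kept, moved): cues still attributed to the assimilated segment,
    and the total cue strength regrouped with the following segment.
    """
    kept, moved = {}, 0.0
    for place, strength in segment_cues.items():
        if place == context_place:
            moved += strength      # grouped with the context segment
        else:
            kept[place] = strength # remains with the assimilated segment
    return kept, moved

catp = {"cor": 0.5, "lab": 0.5}

print(parse_feature_cues(catp, "lab"))  # catp # box
print(parse_feature_cues(catp, "cor"))  # catp # drawing
print(parse_feature_cues(catp, None))   # catp # (no following context)
```

In the toy, the cues left behind determine how the assimilated segment is heard (the regressive effect), and the cues that move add support for the following segment (the progressive effect); with no context, both places remain and the item stays ambiguous.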
Summary
SWR problem (eternal ambiguity) replaced by simpler perceptual problem.
CAD important in solution: processing obstacle facilitates perception.
Integration of continuous perceptual features facilitates higher-level processes.
Facilitation via core word-recognition mechanisms; no extra-segmental routines required.
The Standard Paradigm
• Created artificial boundaries that misframed issues.
• Continuous acoustic detail is variability to be conquered.
The basis of the standard paradigm is undercut.
• Meaning-based processes are affected by CAD.
• CAD is an essential component of word recognition.
The Emerging Paradigm
• Emphasis on methodologies that tap the minimal computational problem: meaning.
• Stresses integration of speech and spoken word recognition, questions methods and theory.
• Continuous acoustic detail is useful signal, not noise.