UvA-DARE (Digital Academic Repository)
From onset to entropy : spelling-pronunciation patterns in six languages
Borgwaldt, S.R.
Citation for published version (APA):
Borgwaldt, S. R. (2003). From onset to entropy : spelling-pronunciation patterns in six languages
When ideas are lacking, words come readily to hand.
Johann Wolfgang von Goethe
ONSET ENTROPY MATTERS
SPELLING-TO-SOUND ENTROPY COUNTS
CHAPTER 4
This chapter is based on the article "Onset Entropy Matters - Letter-To-Phoneme
Mappings in Six Languages" (in revision) co-authored with F. Hellwig and A.M.B. de Groot.
4.1 Introduction
Alphabetic orthographies, although all based on the principle of letter-phoneme
correspondences, deviate from this principle to varying degrees. Languages with
transparent orthographies, like Italian, have quite predictable spelling-to-sound
correspondences. In opaque orthographies, like the English one, spelling-to-sound
correspondences are often very ambiguous.
The ambiguity of spelling-to-sound mappings can in principle be captured
in various ways. Rule-based or analogy-based approaches can be used, the
letter level, grapheme level or sublexical unit level can be analyzed, and the
degree of (un)ambiguity can for example be expressed as a continuous or as a
dichotomous variable. Recent investigations into the (ir)regularity of alphabetic
orthographies have focused on expressing the ambiguity in terms of entropy
values. The entropy concept was introduced into information theory by Shannon
(1948). Expressing (un)ambiguities in spelling-sound mappings as entropy values
results in continuous variables, starting at zero for a totally unambiguous,
predictable correspondence between spelling and sound patterns, and increasing
with increasing degrees of uncertainty.
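For reference, the standard Shannon entropy of the pronunciations of a single letter can be written as follows. This is a sketch rather than a quotation from the thesis: the symbols $\ell$ (a letter), $\phi_i$ (its $i$-th possible phoneme), and $n$ (the number of phonemes it maps onto) are notation introduced here for illustration, and the base-2 logarithm is an assumption that is consistent with the worked values reported in Table 4.3 of this chapter.

```latex
H(\ell) = -\sum_{i=1}^{n} p(\phi_i \mid \ell)\,\log_2 p(\phi_i \mid \ell)
```

If a letter always maps onto the same phoneme, the single probability is 1 and $H(\ell) = 0$; the value grows as the probability mass is spread over more pronunciations.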
In Chapter 3 we presented a cross-linguistic corpus-based analysis of
word-initial bi-directional spelling-sound correspondences for Dutch, English,
French, German, Hungarian, and Italian. We computed the ambiguities for various
grain sizes and expressed them as entropy values. Then we averaged these values
across all word-initial letters, in order to generate one overall entropy value for
each language. These metrics allowed us to rank these six languages on the
continuum from opaque to transparent orthographies.
From a psycholinguistic point of view, the degree of spelling-to-sound
ambiguity is known as one of the factors that affect reading performance. Words
that are pronounced in a deviating way, like YACHT, /jɒt/, AISLE, /aɪl/, or PINT,
/paɪnt/, are read more slowly than regularly pronounced words like CAT, /kæt/, PILE,
/paɪl/, or HINT, /hɪnt/.
Using the entropy paradigm described in more detail in Chapter 3, the goal
of this chapter is to expand and complement previous cross-linguistic research on
spelling-sound relations. We will first investigate the distribution of entropy
values in the six languages studied in more detail, concentrating on the
(un)predictability of the languages' vowels and consonants. Then we will explore whether
the ambiguities between word-initial letters and phonemes, generated in the
previous corpus study and stated in entropy values, correlate with human
word-processing performance. If the generated entropy metrics reflect the impact of
spelling-to-sound ambiguities in word processing, in other words, if our
calculations based on language structures do possess descriptive adequacy for
language processing, then we might expect correlations between the entropy
values and reaction times in tasks that reflect word processing. If, on the other
hand, higher letter-based entropy values do not correspond to longer response
latencies in word-processing tasks, this might be a clue that the grain size of word
processing is not letter-based, but relies on larger grain sizes such as graphemes,
or on sublexical units like rimes, or even on morphemes.
First we will investigate the distribution of entropy values in the six
languages studied in more detail, and then we will examine the impact of the generated
entropy variables on word recognition performance.
Comparing Dutch, English, French, German, Hungarian, and Italian
according to their word-initial letter-to-phoneme entropy values resulted in the
order depicted in Figure 4.1.
[Figure 4.1 image not reproduced: averaged entropy values (vertical axis, roughly 0.1 to 0.6) for Dutch, English, French, German, Hungarian, and Italian.]
Figure 4.1: The relative position of the six languages examined in terms of word-initial letter-to-phoneme mappings, expressed in averaged entropy values.
If an orthography were fully transparent at the letter-phoneme level, that is,
if any letter always mapped onto the same phoneme, its entropy value in Figure
4.1 would be zero. No orthography investigated here is that transparent. The
results show clearly that when comparing word-initial letter-to-phoneme
mappings, English has the most ambiguous orthography, followed by, in
descending order, French, German, Dutch, and Italian, with Hungarian having the
most predictable orthography of the six languages examined.
As discussed in more detail in Chapter 3, the entropy value for the
mappings between the first letter and the first phoneme can be considered to
express three distinct ambiguity components, rolled into one. One component is
the degree of letter-to-phoneme ambiguity, as in, for example, the German letter <v>,
which is pronounced like /f/ in VATER, but like /v/ in VASE. However, high
letter-to-phoneme entropy values could in principle also reflect letter-grapheme
complexities (e.g., the English letter <p>, which can be part of the unambiguously
pronounced grapheme <ph> as in PHILOLOGY). The degree of letter-grapheme
ambiguity constitutes another component of the overall ambiguity of first letter to
first phoneme mappings. The third component that contributes to ambiguous
mappings at the first letter to first phoneme level in isolation is the
context-sensitivity of some spelling patterns. An example is the word-initial <c> in
English and Dutch, whose pronunciation is quite reliably determined by the
subsequent vowels (/s/ in words like CEILING and CITY, i.e., before front vowels,
and /k/ in words like CAR and COPE, i.e., before back vowels).
Computations showed that also when taking larger contexts, like the first
two or three letters, into account and calculating the ambiguity of
letter-to-phoneme mappings, the pattern of relative ambiguity of the six languages as
compared to one another remained about the same (see Chapter 3). At a grain size
of three letters, English still remained more ambiguous than the other
orthographies, whereas French, Dutch, and German approximated one another and
Hungarian and Italian showed almost no ambiguities anymore. The drop in Dutch
was slightly smaller than in the other languages. This finding suggests that in
Dutch the word-initial letter-to-phoneme entropy value has a strong
letter-to-phoneme ambiguity component, whereas in French, German, Italian, and
Hungarian letter-grapheme complexity and further disambiguating context also
contribute strongly to the word-initial letter-to-phoneme entropy values.
Examples of ambiguous pronunciations of letters when looking at larger
grain sizes in Dutch mostly concern loan words in the corpus. In Dutch these are
often pronounced according to the rules of the language they originate from. For
example, the pronunciation of <ou> in the French loan word OUVERTURE, /uvɛrtyːrə/,
deviates from the otherwise completely regular pronunciation of <ou> at the
beginning of native Dutch words as, for example, in OUD, /ɑut/. Spelling-to-sound
ambiguities observed for <ou> in our corpus analysis originate exclusively from
the existence of loan words in the corpus. As discussed in Chapter 1, analyzing
native and non-native words in the same way, regardless of their etymology,
might be somewhat debatable from a linguist's point of view. However, from a
psycholinguistic perspective we can afford to take a strictly synchronic point of
view, and neglect all diachronic processes and etymological reasons that may have
caused the ambiguities discovered. As this example demonstrates, to ensure truly
comparable results in a corpus-based cross-linguistic comparison, it is
important that the corpora used cover comparable vocabularies. If corpora for
some languages contained loan words and corpora for other languages contained
only native words, results could be biased.
4.2 Spelling-to-Sound Ambiguities: Vowels versus Consonants
Earlier investigations into the nature of the English orthography have shown that
especially the unpredictable pronunciation of vowels in isolation contributes to its
high ambiguity (Brown & Besner, 1987; Treiman et al., 1995). Reasons for the
larger ambiguity of vowels as compared to the ambiguity of consonants are mostly
historical and lie in the imbalance between the number of vowel phonemes and the
number of vowel letters. English, a language with over 20 vowel phonemes,
monophthongs and diphthongs, faces some obvious difficulties in expressing them
with the six vowel letters the Roman alphabet provides: <a>, <e>, <i>, <o>, <u>,
and <y>.¹
Note that the phonemic status of diphthongs is somewhat controversial.
Whereas the majority of phonological theories assume diphthongs to be
independent vowel phonemes, there are also frameworks that assume that
diphthongs are just a sequence of two monophthongs, and therefore have no
phoneme status. In this thesis, however, all diphthongs are assumed to be separate
phonemes (for a detailed discussion, see, e.g., Kohler, 1995; Ternes, 1997).
A comparison of the six languages' consonant phoneme and vowel
phoneme inventories is presented in Table 4.1. In this table all letters with
diacritics such as accents or umlauts were counted as separate letters. Ambiguous
letters, such as <y>, which in German can denote vowels, as in YPSILON, as well as
consonants, as in YOGHURT, were classified in this analysis according to the type
of phoneme they represent most often.
1 The status of hybrid, ambiguous letters that can denote vowels as well as consonants
will be discussed in more detail in the next section.
2 Analogously, the phoneme status of affricates, that is, consonants like /tʃ/ in CHAIR,
consisting of a plosive followed by a fricative, is also controversial according to some phonological
theories. In this thesis, however, affricates are considered to be independent phonemes, leading to
relatively large phoneme inventories reported for the six languages as compared with other
analyses.
Table 4.1: Letter and phoneme inventories, split up into vowels and consonants.

Language    Letters  Phonemes  Consonant  Consonant  Vowel    Vowel
                               Letters    Phonemes   Letters  Phonemes
Dutch       30       41        20         22         10       19
English     27       46        20         24         7        22
French      33       36        21         20         12       16
German      29       43        20         24         9        19
Hungarian   33       62        19         48         14       14
Italian     23       50        18         43         5        7
Comparing the letter and phoneme inventories in Table 4.1 might suggest
at first sight the existence of only few spelling-to-sound ambiguities for languages
like French with a relatively moderate imbalance between the letter and phoneme
inventories, and more ambiguities for languages like Hungarian, where the
number of phonemes is almost twice the number of letters to map onto. However,
as the entropy calculations in Figure 4.1 show, this would be a misleading
assumption. The creation of multi-letter graphemes leads to an expansion of a
language's grapheme inventory and may solve existing letter-phoneme
imbalances. Additionally, the mismatch between the overall phoneme and letter
inventories reported in Table 4.1 might be misleading, as in a specific position, for
example, word-initially, the phoneme-letter ratio might be more balanced in some
languages. On the other hand, even inventories that at first sight seem quite balanced might
display rather ambiguous mappings, for example due to (partly) silent letters, like
<h> in English (e.g., HOUR, /aʊə/) and French (e.g., HEURE, /œr/).
In order to further investigate the relative ambiguity of word-initial
letter-to-phoneme mappings, and to provide a more accurate account of the distinct
characteristics of vowels and consonants, we first computed the absolute number
of word-initial letter-to-phoneme mappings. Then we calculated the average
number of phonemes that the letters denote across languages in a word-initial
position by dividing the number of word-initial letter-to-phoneme mappings by
the number of word-initial letters per language.
The results of these calculations are shown in Table 4.2.
Table 4.2: Word-initial letter-to-phoneme mappings.

Language    # of word-initial letter-to-    Phoneme average
            phoneme mappings                per letter
Dutch       80                              2.666666
English     105                             3.888888
French      94                              2.848484
German      77                              2.655172
Italian     50                              2.173913
Hungarian   40                              1.212121
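
The division behind Table 4.2 can be sketched in a few lines of Python. The mapping counts are those of Table 4.2; taking the overall letter inventories of Table 4.1 as the divisor is an assumption made here because those totals reproduce the reported averages. The names `MAPPINGS`, `LETTERS`, and `mean_phonemes_per_letter` are illustrative and do not come from the thesis.

```python
# Word-initial letter-to-phoneme mapping counts (Table 4.2).
MAPPINGS = {"Dutch": 80, "English": 105, "French": 94,
            "German": 77, "Italian": 50, "Hungarian": 40}

# Letter inventory sizes (Table 4.1); assumed here to be the divisor.
LETTERS = {"Dutch": 30, "English": 27, "French": 33,
           "German": 29, "Italian": 23, "Hungarian": 33}


def mean_phonemes_per_letter(language: str) -> float:
    """Average number of phonemes a word-initial letter maps onto."""
    return MAPPINGS[language] / LETTERS[language]


if __name__ == "__main__":
    # Print one line per language, in the order of Table 4.2.
    for language in MAPPINGS:
        print(f"{language:<10} {mean_phonemes_per_letter(language):.6f}")
```

As footnote 3 of the chapter points out, this is an unweighted average: rare and frequent word-initial letters count equally.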
In order to investigate the relative contributions of a language's vowels and
consonants to the overall orthographic transparency expressed as entropy values,
we calculated separate average entropy values for the language's vowel and
consonant letters. In these calculations we took the existence of ambiguous letters
that denote vowels as well as consonants into account. We did this by adding all
entropy values for the separate letter-to-phoneme mappings, differentiating
between mappings to vowel phonemes and to consonant phonemes, and then
dividing the sum by the number of letters, thus obtaining an average value for the
ambiguity of vowels versus consonants. To give an example, the entropy values of the
letter-to-phoneme mappings of English word-initial <u>, which maps to a consonant,
as in UNIFORM, or to a vowel, as in UNDER, were split up and counted separately.
In this example the overall entropy value of <u>, H(u), equals
0.695379. Of this value, 0.340729 contributed to the vowel entropy, and
0.354650 to the consonant entropy. This is illustrated in Table 4.3.
Table 4.3: Vowel and consonant entropy values for an ambiguous initial letter.

1st letter  1st phoneme  example   probability  H value for V  H value for C
u           ɜː           urban     0.015081     0.091258
u           ʊə           urdu      0.002320     0.020305
u           ə            until     0.004640     0.035970
u           ʊ            umlaut    0.001160     0.011313
u           ʌ            ugly      0.864269     0.181883
total (V)                          0.887471     0.340729
u           j            uniform   0.112529                    0.354650
total (C)                          0.112529                    0.354650
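
The split of H(u) into a vowel part and a consonant part can be reproduced with a short Python sketch. It assumes that each mapping contributes -p * log2(p) to the letter's entropy, which matches the per-row values in Table 4.3 to within rounding. The probabilities are copied from that table; the names `U_MAPPINGS`, `entropy_term`, and `split_entropy` are illustrative.

```python
import math

# Word-initial English <u>: (example word, probability, phoneme class).
# Probabilities are the values listed in Table 4.3; "V" marks a mapping
# onto a vowel phoneme and "C" a mapping onto a consonant phoneme.
U_MAPPINGS = [
    ("urban", 0.015081, "V"),
    ("urdu", 0.002320, "V"),
    ("until", 0.004640, "V"),
    ("umlaut", 0.001160, "V"),
    ("ugly", 0.864269, "V"),
    ("uniform", 0.112529, "C"),
]


def entropy_term(p: float) -> float:
    """Contribution of one mapping with probability p (assumed base 2)."""
    return -p * math.log2(p)


def split_entropy(mappings):
    """Sum the entropy contributions separately for vowels and consonants."""
    totals = {"V": 0.0, "C": 0.0}
    for _example, p, phoneme_class in mappings:
        totals[phoneme_class] += entropy_term(p)
    return totals


totals = split_entropy(U_MAPPINGS)
overall = totals["V"] + totals["C"]  # overall entropy of <u>

if __name__ == "__main__":
    print(f"vowel part:     {totals['V']:.4f}")
    print(f"consonant part: {totals['C']:.4f}")
    print(f"H(u):           {overall:.4f}")
```

Because the tabulated probabilities are rounded to six decimals, the recomputed sums agree with the reported 0.340729, 0.354650, and 0.695379 only up to small rounding differences.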
The results of these calculations, that is, the averaged entropy values for
vowels and consonants, are presented in Figure 4.2.
[Figure 4.2 image not reproduced: the six languages (Dutch, English, French, German, Hungarian, Italian) plotted by vowel entropy (horizontal axis) against consonant entropy (vertical axis).]
Figure 4.2: Word-initial letter feedforward entropy values averaged separately for vowels and consonants.
In this figure we can see striking differences between vowel entropy values
and consonant entropy values, reflecting the obvious imbalance between vowel
phonemes and vowel letters. For vowels, English is the language with the most
ambiguous orthography-to-phonology mappings, followed by, in decreasing order,
German, Dutch, French, Italian, and Hungarian. Of the six languages, Hungarian
is the only one that shows no ambiguities for vowels. The unambiguity of
Hungarian vowels and the low ambiguity of Italian vowels presumably result
from the fact that neither Hungarian nor Italian contains diphthongs in the vowel
3 Note that in the computations above the relative frequency of word-initial letters was
not taken into account, that is, we averaged across letters without taking into account that certain
letters might occur in word-initial position only very rarely, whereas other letters might occur
very often in that position.
phoneme inventory. Diphthongs are often denoted with digraphs, and would
therefore cause ambiguity at the first letter to first phoneme level. In both
languages, Hungarian and Italian, the numbers of vowel letters and vowel
phonemes are quite balanced, as has been shown in Table 4.1: In Hungarian all
vowel phonemes are differentiated by diacritics, e.g., <u>, <ú>, <ü>, and <ű>. In
this way every vowel phoneme is denoted by a distinct vowel letter. In Italian only
two vowel letters, <e> and <o>, can correspond to two different phonemes.
For consonants, the pattern changes: Here, French is the language with the
most ambiguous orthography-to-phonology mappings, followed by, in decreasing
order, English, German, Hungarian, Italian, and Dutch. Two languages, Italian and
Hungarian, showed greater entropy values for consonants than for vowels.
Examples of ambiguous consonant letter to consonant phoneme mappings in
Hungarian followed from the existence of digraphs and trigraphs in Hungarian.
For example, word-initial <s> can be part of the digraph <sz>, denoting the
fricative /s/, and word-initial <d> can be part of the trigraph <dzs>, denoting the
affricate /dʒ/. In Italian, as in Hungarian, ambiguous consonant letter to
consonant phoneme mappings resulted from the existence of digraphs and
trigraphs, or, as in Dutch or English, from the existence of context-dependent
phonemes. For example, analogous to the contextual dependency of the letter <c>
in English, in Italian the pronunciation of the letters <c> and <g> also depends on
the following letters.
The results presented above show the differently distributed
orthography-to-phonology ambiguities for vowels and consonants in the six languages
investigated. In addition to providing more differentiated descriptions of
languages' orthographic transparency, the obtained results might be useful for
further validating reading models that propose separate phonological activation
processes for vowels versus consonants. In Berent and Perfetti's two-cycles model
of phonology (1995), based on autosegmental theories of phonology (e.g.,
Goldsmith, 1990), an English word's pronunciation is derived using two distinct
processes. First, consonant phonemes are assembled by an automatic
computational mechanism, and, later, vowel phonemes are added in a separate
slower process. Empirical support for this hypothesis was presented in backward
masking, naming, and fast priming paradigms, where consonant-preserving
conditions produced higher accuracies than vowel-preserving conditions at very
brief presentations (Berent & Perfetti, 1995; Lee, Rayner, & Pollatsek, 2001; Lee,
Rayner, & Pollatsek, 2002). However, Colombo, Zorzi, Cubelli, and Brivio (2003)
found the opposite pattern of results for Italian, that is, they obtained a processing
priority for vowels in Italian. They argued that differences in consonant-vowel
processing across languages might suggest that this process is not a structural
hypothesis (i.e., that consonants and vowels are represented phonologically in
different ways) but just reflects language-specific characteristics (i.e., different
C-V properties such as spelling-to-sound ambiguity), and is, as such, only a
statistical hypothesis. In the statistical analyses of spelling-to-sound ambiguities
reported above the finding emerged that especially in English many letters were
ambiguous in terms of consonant-vowel status, that is, they could denote
consonants as well as vowels. Therefore it seems unlikely that inherently
ambiguous items could be encoded as distinct linguistic entities, as the Berent
and Perfetti model predicts. We suggest, in line with Lee et al. (2001), that the
results obtained by Berent and Perfetti can best be explained in terms of a
statistical hypothesis. Our results presented above can be used to predict the
behavior of the other languages in terms of their consonant-vowel encoding.
To summarize, the distinct vowel-consonant measures and the onset
entropy calculations seem to provide quite reliable estimations of the languages'
orthographic transparency, resulting in rankings that are in line with earlier
descriptions of the languages' spelling and pronunciation patterns (cf. Carney,
1994; Eisenberg, 1998; Maraschio, 1993; Nunn, 1998; Peereman & Content,
1999; Siptar & Törkenczy, 2000). As further analyses revealed, the six languages
analyzed displayed different characteristics in terms of vowel versus consonant
ambiguity. These characteristics can explain language-specific behavior of
phonological encoding during the reading process.
In order to test whether the degree of (un)ambiguity of the
letter-to-phoneme correspondences affects actual language processing, we proceeded to
investigate the influence of the onset entropy values on naming latencies. If,
instead of relying on letter-phoneme mappings, language users rely exclusively on
other, larger, functional reading units like graphemes, sublexical units, or
morphemes, they might not be impaired by ambiguities on a letter level.
4.3 The Role of Onset Entropy in Word Naming
Spelling-to-sound ambiguities are known to influence reaction times. The choice
of spelling-to-sound variables with the goal to use them as predictors for human
performance should aim at capturing ambiguities that influence human
performance. As a consequence of this principle, an analysis of the sub-parts of
words should concentrate on those sub-parts whose spelling-sound irregularities
influence human performance the most.
While for this reason the majority of researchers (e.g., Glushko, 1979;
Ziegler et al., 1996, 1997a) have focused on rime analyses (of monosyllabic
words), another candidate for a detailed investigation of spelling-to-sound
mappings is the beginning of a word. At least some models of reading, for
instance, dual-route models (Coltheart, 1978), assume serial processing, from left
to right, or contain serial parsing components. Such processing consequently
predicts a larger impact on reaction times by ambiguities at the beginning of a
word than by ambiguities at the end of a word. This prediction is supported by
human performance in word naming tasks, resulting in a positional regularity
effect, showing that effects of irregularity decrease the later the position of the
irregularity in the word to be read. Ambiguities at the beginning of a word cause a
larger delay than ambiguities at the end of a word, because in the latter case the
pronunciation output from the lexical route, which operates in parallel to the
nonlexical route, may already have been delivered by the system before the
ambiguity is actually encountered (Coltheart & Rastle, 1994, but see Zorzi, 2000).
The positional regularity effect has been observed in studies by Coltheart
and Rastle (1994) and has been replicated by Cortese (1998) and Rastle and
Coltheart (1999b). Further empirical evidence to support this prediction is
provided, for example, by Treiman et al. (1995), who showed that ambiguities in
(consonantal) onsets accounted for more of the variance in reaction times than
ambiguities in (consonantal) codas.
That the word's onset plays a crucial role in lexical access tasks has been
demonstrated in research on spoken word recognition. For instance, studies testing
the cohort model of spoken word recognition (Marslen-Wilson & Welsh, 1978;
Marslen-Wilson, 1987; Marslen-Wilson & Zwitserlood, 1989) predict sequential
processing and stress the importance of word onsets. In addition, there is
considerable evidence that also in written word recognition tasks word onsets are
salient and disproportionately important identification units (cf. Cutler, 1982;
Rayner & Pollatsek, 1989; Smith & Silverberg, in revision). An inherent
advantage of looking at word-initial spelling-sound relations lies in the relative
stability of pronunciation of the word's onsets. A study by Greenberg, Carvey,
Hitchcock, and Chang (2002), comparing pronunciation in spontaneous speech
with canonical pronunciation, revealed that syllable nuclei or codas were much
more likely to be deleted than syllable onsets, which, in contrast, were almost
always realized.
In order to test the impact of onset entropy values cross-linguistically, we
correlated the onset letter-phoneme entropy values, that is, the entropy values for
the mappings between the first letter and first phoneme, with reaction times of
large-scale word naming studies carried out in three of the six languages. The
languages were Italian, with a transparent orthography, Dutch, with an
intermediate orthography, and English, with an opaque orthography. The Italian
study reports mean naming latencies of 626 Italian nouns, collected from 30
participants (Barca, Burani, & Arduino, 2003). The Dutch and English study
reports mean naming latencies of 440 Dutch and English nouns, collected from 40
participants for each language (De Groot et al., 2002).
This meant that each word was assigned the entropy variable for its first
letter, expressing the general degree of the letter's ambiguity, independent of the
probability of this specific mapping. For example, all words starting with the
letter <a> received the same entropy value, regardless of (the probability of) their
pronunciation. This is illustrated in Table 4.5.
Table 4.5: Entropy values assigned to the words - independent of the probability
of the specific mapping.

English sample from the words in De Groot et al. (2002)

Word       1st phoneme  probability of the mapping  entropy value
abuse      /ə/          0.387927                    0.482669
action     /æ/          0.375264                    0.482669
advantage  /ə/          0.387927                    0.482669
age        /eɪ/         0.042634                    0.482669
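
The assignment rule illustrated in Table 4.5 amounts to a lookup keyed on the first letter only. The following Python sketch shows this under stated assumptions: the two entropy values are the English values quoted in this chapter (<a> from Table 4.5, <u> from the worked example for Table 4.3), while the names `ONSET_ENTROPY` and `onset_entropy` are hypothetical.

```python
# Entropy per word-initial letter (English), as quoted in this chapter:
# <a> from Table 4.5, <u> from the worked example for Table 4.3.
ONSET_ENTROPY = {"a": 0.482669, "u": 0.695379}


def onset_entropy(word: str) -> float:
    """Entropy of the word's first letter, ignoring how the word is pronounced."""
    return ONSET_ENTROPY[word[0].lower()]


# The four sample words of Table 4.5 all start with <a>.
words = ["abuse", "action", "advantage", "age"]
assigned = {word: onset_entropy(word) for word in words}

if __name__ == "__main__":
    for word, value in assigned.items():
        print(f"{word:<10} {value:.6f}")
```

All four sample words receive the same value even though their first phonemes and mapping probabilities differ, which is exactly the property the table is meant to show.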
As the naming latencies in the reported experiments had been recorded with
a voice key, it is likely that the recorded latencies were sensitive to specific articulatory features of
the words' beginnings, such as sound intensity (Kessler, Treiman, & Mullennix,
2002, in revision; Rastle & Davis, 2002). In our study we wanted to remove such
voice key effects and other possibly confounding effects from the effects that we
were primarily interested in, that is, the effects of entropy.
Of the onset variables studied by De Groot et al. (2002), two correlated
significantly with the present entropy variables, namely, consonant cluster
structure and sound intensity. Consonant cluster structure was coded in terms of
word-initial consonant phonemes, ranging from 0 (for words starting with a
vowel) to 3 (for words starting with three consonants). The sound intensity
variable was intended to attenuate some inadvertent voice key effects by coding
the initial phonemes according to their sound intensity using the
speech-sound-intensity classification scheme presented by Fry (1979, in Crystal, 1987, p. 134),
which classifies English speech sounds in terms of their average intensity in
decibels. Scores were assigned to the stimulus words depending upon the intensity
level of their initial phoneme (see De Groot et al., 2002, for detailed descriptions
of the variables ONS1, the consonant cluster variable, and INT1, the sound
intensity variable).
We removed the effects of these two confounding variables by computing
partial correlations between the mean reaction times and the entropy values of the
initial letter-to-phoneme mappings, partialling out consonant cluster structure and
sound intensity in all three languages.
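
The partialling-out step can be illustrated with a small, self-contained Python sketch. It is not the authors' analysis script: the recursive formula for a second-order partial correlation is a standard textbook identity, the toy numbers are invented purely to exercise the code, and the names `pearson`, `partial_one`, and `partial_two` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_one(x, y, z):
    """Correlation of x and y with one covariate z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_two(x, y, z, w):
    """Correlation of x and y with two covariates z and w partialled out."""
    rxy_z = partial_one(x, y, z)
    rxw_z = partial_one(x, w, z)
    ryw_z = partial_one(y, w, z)
    return (rxy_z - rxw_z * ryw_z) / math.sqrt(
        (1 - rxw_z ** 2) * (1 - ryw_z ** 2))


# Invented toy data (NOT taken from the naming studies cited in the text).
entropy = [0.1, 0.5, 0.3, 0.9, 0.2, 0.7]
ons1 = [0, 1, 2, 1, 0, 3]   # stand-in for the consonant cluster variable
int1 = [3, 1, 4, 1, 5, 9]   # stand-in for the sound intensity variable
# Latencies built as an exact linear function of the three predictors, so
# the partial correlation with entropy should be numerically close to 1.
rt = [500 + 100 * e + 10 * o + 2 * i
      for e, o, i in zip(entropy, ons1, int1)]

r = partial_two(rt, entropy, ons1, int1)

if __name__ == "__main__":
    print(f"partial r(RT, entropy | ONS1, INT1) = {r:.6f}")
```

With real data, the same value can be obtained by regressing both the reaction times and the entropy values on the two covariates and correlating the residuals; the recursive form is used here only to keep the sketch free of external libraries.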
For all three languages, of varying orthographic transparency, the partial
correlations were significant: The higher the onset entropy, the longer the reaction
times. This is shown in Table 4.6.
Table 4.6: Partial correlations between letter-phoneme entropy values and
naming latencies in three languages of varying orthographic transparency.

Language   Partial correlation coefficient   p-value (two-tailed)
           (letter-phoneme entropy - RTs)
Italian    0.34                              <0.001
Dutch      0.39                              <0.001
English    0.30                              <0.001
It is a well-known fact that spelling-to-sound ambiguities do affect reaction
times. However, the majority of relevant earlier studies could only demonstrate
the impact of spelling-to-sound ambiguities for languages with a rather opaque
orthography, for example English (Treiman et al., 1995) and French (Lange &
Content, 1999). Therefore, the most noteworthy aspect of the present finding is the
fact that the effects of word-initial letter-to-phoneme ambiguity can also be shown
for languages with a very transparent orthography, as we demonstrated for the
Italian data set.
4.4 Conclusion
In this chapter we investigated ambiguities in word-initial letter-to-phoneme
mappings in six languages in detail, concentrating on differences in vowel versus
consonant entropy values, and demonstrated significant correlations between the
descriptive statistics and naming latencies in three of these languages.
All orthographies examined deviate to various degrees from the "ideal"
one-to-one mapping between letters and phonemes. In line with earlier
comparisons between a subset of these languages (Martensen et al., 2000; Van den
Bosch et al., 1995; Ziegler et al., 1997a) we have found that in terms of overall
spelling-to-sound relations examined at the word-initial letter-phoneme level,
English has the most ambiguous orthography, followed by (in decreasing order)
French, German, Dutch, Italian, and Hungarian. The pattern changes slightly when
consonant and vowel letters are analyzed separately. For vowels, English remains
the language with the most ambiguous letter-to-sound relations, followed by (in
decreasing order) German, Dutch, French, Italian, and Hungarian. Hungarian
shows completely unambiguous vowel letter-to-vowel phoneme mappings. For
consonants, French shows the highest letter-to-phoneme ambiguity, followed by
(in decreasing order) English, German, Hungarian, Italian, and Dutch. None of the
studied orthographies displayed completely unambiguous mappings between
consonant letters and consonant phonemes.
Deviations from a one-to-one mapping between word-initial
letters and phonemes, as expressed in entropy values, had a clear effect on naming.
Even in very transparent orthographies (like Italian) and also in languages of
intermediate (Dutch) and opaque (English) orthographic transparency, these
ambiguities influence reaction times in naming tasks. This suggests that the
ambiguity of letter-phoneme mappings cannot be ignored in favor of an exclusive
focus on larger grain sizes like sublexical units or graphemes, on which most
current research is focusing.
Additional support for this view comes from the "whammy effect"
demonstrated by Rastle and Coltheart (1998). They studied nonword naming,
assuming a dual route model, and concluded that the functional reading unit of the
indirect route is the letter, as it took their participants longer to name nonwords
with phonemes mapping to digraphs or trigraphs than nonwords with a one-to-one
mapping of letters to phonemes. That is, monosyllabic nonwords consisting of five
letters and five phonemes, such as TRUSP, caused shorter naming times than
monosyllabic nonwords consisting of five letters but three phonemes, such as
FOOCE. These findings provide a serious challenge for reading theories that
postulate only sublexical units as functional reading units, and for models of
reading that do not allow for any graded effects of ambiguities (see also Lange &
Content, 1999).
The entropy paradigm as described in Chapter 3 seems to be not only a
method for gaining insight into languages' overall orthographic transparency;
it also provides a way to classify single words according to the degree of
spelling-to-sound ambiguity of their word-initial letters, a variable that, as demonstrated here,
correlates significantly with naming latencies.
To summarize, the research presented here leads us to the conclusion that
onset entropy values provide a valid basis for assessments of orthographic
transparency. An interesting future line of research we will pursue is to see
whether and how the reverse entropy values, that is, in sound-to-spelling direction,
relate to the statistical and behavioral data. As letter-phoneme mappings are
bi-directional in nature, we can perform all entropy calculations in the reverse
direction, that is, from phonemes to letters. Analogously to the correlations
between letter-to-phoneme entropies and reaction times, reported in this chapter,
we can then explore the influence of phoneme-to-letter entropies on reaction
times, in order to investigate possible feedback effects.