L2 production & perception
Discussion Topic 4:
Learning to produce & perceive the L2
- Flege (1993) examined Chinese Ss’ production and perception of vowel duration as a cue to the word-final
English /t/-/d/ distinction, yielding a correlation of r = 0.54, p > 0.01;
- Research examining word-initial stops (Flege & Schmidt, 1995; Schmidt & Flege, 1995) focused on native
Spanish Ss’ production of VOT in English /p/ and the location of the “best” /p/ in continua differing in
VOT; r = 0.54, p < 0.01. In a post-hoc analysis, Ss were divided according to overall degree of foreign
accent. A significant production-perception correlation was obtained for “proficient” Ss (i.e., those with
relatively mild foreign accents, r = 0.49, p > 0.01) but not for less proficient Ss (r = -0.004);
- Flege et al. (1997) examined the production and perception of English vowels by 20 native speakers
each of German, Spanish, Korean and Mandarin. The measure of perception was the size of the shift
from one vowel category to another based on changes in F1 frequency. The measure of production was
the size of F1 differences produced in pairs of English vowels. A production-perception correlation of r =
0.53 was obtained for English /i/ and /ɪ/, and a correlation of r = 0.52 for /ɛ/ and /æ/.
The relation between the production and perception of L2 sounds has attracted a lot of
attention over the years, and generated even more confusion. Here I lay out the SLM position after
providing some background information. Then I summarize some relevant data, concluding
this post with a brief description of what I consider to be the most appropriate way to
assess the production-perception relation in L2 speech learning.
Work focusing on infant development suggests that infants begin showing an influence of
the surrounding linguistic environment on perception somewhat before showing an
influence of the ambient language on their vocal output. Research on (monolingual)
speech and language development indicates that young children's perception of L1
segments generally "leads" their production of those segments. However, adult-child
perceptual differences are not evident to the casual observer, and children's articulation of
segments continues to be refined long beyond the point that their productions are
Whatever its time course, the alignment of production and perception presupposes error correction mechanisms. One mechanism enables
children to modify their production so that their vocal output corresponds to the perceptual representations
they have developed (which, in turn, are based on what they have heard over a fairly long period of time). The
other error correction mechanism helps guide speech in real time via self-hearing. Using these mechanisms, L1-learning
children eventually stop "misarticulating" L1 sounds. L1 phonetic categories continue to develop but,
eventually, these too reach completion so that, at some point during adolescence, the children become mature
"speaker-hearers" of their L1.
The neural representations used in the auditory processing and articulation of segments
are localized in different regions of the brain, but are connected to one another directly
and via higher processing centers. At a neural level, production and perception show a
mutual influence, which depends on the level of processing and the type of stimulation.
The child's need to align patterns of production and perception is demonstrated by
cross-language research, which shows that although languages may differ in the
phonetic specification of speech sounds, they always show co-ordinated patterns of
production and perception. As an example, Spanish and English both have /t/, but the
segment is realized with longer VOT in English than in Spanish. Correspondingly, English-speaking
adults require longer VOT values to identify stimuli as /t/ (as opposed to /d/) than
Spanish-speaking adults do.
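The cross-language boundary difference can be illustrated numerically. The sketch below is a minimal example, assuming the phoneme boundary is defined as the VOT value at which /t/ responses cross 50% (the cross-over from predominantly /d/ to predominantly /t/ judgments), located by linear interpolation between adjacent continuum steps; fitting a logistic or probit curve is a common alternative. The function name `phoneme_boundary` and all response proportions are hypothetical, not data from any study.

```python
from typing import Sequence


def phoneme_boundary(vot_ms: Sequence[float], prop_t: Sequence[float],
                     criterion: float = 0.5) -> float:
    """Return the VOT (ms) at which the proportion of /t/ responses first
    reaches `criterion`, interpolating linearly between continuum steps.

    Assumes `vot_ms` is sorted in increasing order.
    """
    if len(vot_ms) != len(prop_t) or len(vot_ms) < 2:
        raise ValueError("need two or more paired VOT steps and proportions")
    for i in range(1, len(vot_ms)):
        lo, hi = prop_t[i - 1], prop_t[i]
        if lo < criterion <= hi:
            # The criterion falls between steps i-1 and i: interpolate.
            frac = (criterion - lo) / (hi - lo)
            return vot_ms[i - 1] + frac * (vot_ms[i] - vot_ms[i - 1])
    raise ValueError("identification function never crosses the criterion")


# Hypothetical proportions of /t/ responses along a 0-60 ms VOT continuum.
steps = [0, 10, 20, 30, 40, 50, 60]
english_listeners = [0.00, 0.02, 0.10, 0.40, 0.80, 0.98, 1.00]
spanish_listeners = [0.05, 0.30, 0.70, 0.95, 1.00, 1.00, 1.00]

print(round(phoneme_boundary(steps, english_listeners), 1))  # 32.5
print(round(phoneme_boundary(steps, spanish_listeners), 1))  # 15.0
```

With these made-up proportions, the listeners labeled "English" place the boundary at a longer VOT than those labeled "Spanish", which is the direction of the difference described above.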
It is uncertain how long it takes children learning an L1 to align production and perception. Here too the results
of cross-language research are relevant. Flege and Eefting (1987a,b, 1988) showed that children learning
Spanish and English as an L1 differed from adult monolingual speakers of Spanish and English in much the
identification experiment, showed phoneme boundaries (i.e., cross-overs from predominantly /d/ to /t/
judgments) at shorter VOT values than adults did. Even though they had not yet reached adult-like levels, the
children's production and perception were aligned. Perhaps production changes little by little as perception
changes so that there is never a large gap between production and perception.
When researchers talk about the "attunement" of infants and children to their L1, they are
usually referring to a gradual modification of auditory perceptual representations. It is
generally assumed that these perceptual representations ("phonetic categories" in the
SLM framework) guide the development of articulatory motor plans that eventually can be
used by children to reproduce the sounds they have heard.
Once the L1 phonetic system has been fully established, error correction mechanisms are less important. For
adults who become profoundly deaf as a result of taking ototoxic drugs, the ability to produce L1 segments in
a native-like fashion is affected only minimally, and even then not right away. These unfortunate individuals have little
need, it seems, for feedback to maintain the correct articulatory patterns they established as children.
2. L2 speech learning.
The SLM proposes that all of the capacities that were used in successful L1 speech development -- including
the ability to align production and perception -- remain intact and accessible to learners of an L2. More
specifically, the SLM proposes that, as in L1 learning, perception generally "leads" production. Accurate
perception does not entail accurate production; however, accurate production requires accurate perception.
In an invited talk presented at the ICPhS meeting held in San Francisco, I (Flege, 1999a) reviewed research
comparing the production and perception of phonetic segments in an L2. In this talk, I summarized research in
which the relation between segmental production and perception was evaluated through correlational analyses.
All of these studies, including those summarized above, yielded moderate correlations of about r = 0.50.
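In the correlational studies summarized above, each subject contributes one production score and one perception score, and the two are correlated across subjects. A minimal Python sketch of that computation follows, including the kind of post-hoc subgroup split used in the VOT studies. The scores, the accent labels, and the function names (`pearson_r`, `r_by_subgroup`) are invented for illustration.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


def r_by_subgroup(x: Sequence[float], y: Sequence[float],
                  groups: Sequence[str]) -> Dict[str, float]:
    """Compute r separately within each subgroup (e.g., a post-hoc split by accent)."""
    result = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        result[label] = pearson_r([x[i] for i in idx], [y[i] for i in idx])
    return result


# Hypothetical data: VOT produced in English /p/ and VOT of each subject's "best" /p/ (ms).
produced_vot = [22, 35, 41, 48, 55, 60, 30, 44]
best_p_vot = [30, 38, 52, 45, 62, 58, 50, 40]
accent = ["mild", "mild", "mild", "mild", "strong", "strong", "strong", "strong"]

print(round(pearson_r(produced_vot, best_p_vot), 2))
print(r_by_subgroup(produced_vot, best_p_vot, accent))
```

Significance testing is omitted; with only four hypothetical subjects per subgroup, the within-group r values would be very unstable in any case.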
In my ICPhS talk, I offered several explanations as to why L2 research does not yield higher production-perception correlations:
1. Accurate perception does not entail accurate production. I can perceptually distinguish Italian trilled /r/ from
other variants of /r/ but, to my great embarrassment, cannot produce trills. (I take consolation in the fact that
trills are learned late by most Italian children and not at all by some Italians, including my late father-in-law,
who used a uvular /r/ instead of a trilled Italian /r/.)
2. The “transportation” of properties from (perceptual) phonetic categories to phonetic implementation rules
takes time. This observation implies that if two groups differed in perception but not in production at Time 1 of a
longitudinal study, they might differ in production at Time 2.
3. The measures of segmental production and/or perception submitted to correlation analyses are limited by
4. Production and perception are inherently incommensurable, which makes comparison difficult. Flege (1999a)
cited a study examining the perception and production of phonemic length contrasts in Swedish. The
phonetic dimension of interest in both domains was overall vowel duration, seemingly a commensurable
dimension, and a higher correlation (r = 0.70) was obtained.
These considerations suggest that correlation analysis does not provide the best method to evaluate the
contingency at the heart of the SLM hypothesis, i.e., that production accuracy cannot be greater than
perceptual accuracy.
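A correlation coefficient is a poor test of this contingency because the SLM allows subjects who perceive a sound accurately but produce it poorly, and forbids only the reverse. The sketch below is a hypothetical illustration, not the analysis used in any study cited here. It classifies subjects as accurate or inaccurate in each domain using arbitrary cutoffs and counts the one cell the hypothesis predicts should be empty. The scores, cutoffs, and function names are all assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def classify(prod: Sequence[float], perc: Sequence[float],
             prod_cutoff: float, perc_cutoff: float) -> Counter:
    """Cross-tabulate subjects as accurate ('ok') or not ('poor') in each domain."""
    table: Counter = Counter()
    for p, q in zip(prod, perc):
        key: Tuple[str, str] = ("prod_ok" if p >= prod_cutoff else "prod_poor",
                                "perc_ok" if q >= perc_cutoff else "perc_poor")
        table[key] += 1
    return table


def contingency_violations(table: Counter) -> int:
    """Subjects with accurate production but inaccurate perception: the cell
    that 'production accuracy cannot exceed perceptual accuracy' predicts is empty."""
    return table[("prod_ok", "perc_poor")]


# Hypothetical percent-correct scores for eight subjects; cutoffs are arbitrary.
production = [90, 85, 40, 30, 88, 50, 20, 95]
perception = [92, 80, 85, 35, 90, 88, 30, 97]

table = classify(production, perception, prod_cutoff=75, perc_cutoff=75)
print(dict(table))
print(contingency_violations(table))  # 0
```

A moderate r can arise whether or not this cell is empty, so counting it directly addresses the hypothesis more squarely.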
In the ICPhS talk, I cited a study that used a more appropriate form of analysis. The study in question
examined the production and perception of English vowels by native speakers of Italian living in Canada. Most of
the Ss examined succeeded in producing most English vowels in a readily identifiable fashion.
One exception was English /ʌ/, which was produced inaccurately by 31 of the 72 native Italian Ss. Given that
the Italian vowel that is perceptually closest to English /ʌ/ is /a/, a categorial discrimination test was used to
evaluate the discrimination of English /ʌ/ from Italian /a/.
The Ss who produced English /ʌ/ accurately were found to discriminate English /ʌ/ from Italian /a/ significantly better
than the 31 Italian Ss who produced /ʌ/ poorly. A difference in production accuracy was not obtained, however,
when the Ss were divided into subgroups based on the ability to discriminate English /ʌ/ from English /æ/ or the
ability to discriminate English /ʌ/ from English /ɑ/.
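The subgroup logic of this study can be sketched as follows. This sketch rests on several assumptions and does not reconstruct the original analysis. The scores are hypothetical, production accuracy is expressed as a percentage with an arbitrary 75% cutoff, discrimination is a proportion correct, and groups are compared with Welch's t statistic. The function names are also hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent groups (variances need not be equal)."""
    # statistics.variance uses the n - 1 (sample) denominator.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_groups(in_group: Sequence[bool],
                   scores: Sequence[float]) -> Tuple[float, float, float]:
    """Split subjects by a True/False flag; return (mean_true, mean_false, Welch's t)."""
    a = [s for flag, s in zip(in_group, scores) if flag]
    b = [s for flag, s in zip(in_group, scores) if not flag]
    return mean(a), mean(b), welch_t(a, b)


# Hypothetical production accuracy for English /ʌ/ (percent), eight subjects.
production_score = [85, 90, 80, 88, 40, 35, 50, 45]
produced_ok = [s >= 75 for s in production_score]  # arbitrary cutoff

# Hypothetical /ʌ/-/a/ discrimination scores (proportion correct).
disc_uh_vs_italian_a = [0.90, 0.85, 0.88, 0.92, 0.60, 0.70, 0.65, 0.58]
# Hypothetical good/poor classification on /ʌ/-/æ/ discrimination.
good_uh_vs_english_ae = [True, False, True, False, True, False, True, False]

# Analysis 1: split by production accuracy, compare /ʌ/-/a/ discrimination.
print(compare_groups(produced_ok, disc_uh_vs_italian_a))
# Analysis 2: split by /ʌ/-/æ/ discrimination, compare /ʌ/ production.
print(compare_groups(good_uh_vs_english_ae, production_score))
```

The made-up data are constructed to mirror the pattern reported above. Splitting by production accuracy yields a large difference in /ʌ/-/a/ discrimination. Splitting by /ʌ/-/æ/ discrimination yields essentially no difference in production.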
is no further need for the “internal communication” between production and perception. As a result, the psycho-
grammar which served to align production and perception in L1 speech acquisition “falls into disrepair because
of disuse”. Use it or lose it. Bever suggested, however, that losing the capacity to align production and
perception is not inevitable so long as “one is continually learning a new language.”