Context Effect in the Categorical Perception of Mandarin Tones




Fei Chen1 & Gang Peng1,2

Received: 14 November 2014 / Revised: 6 March 2015 / Accepted: 21 April 2015 © Springer Science+Business Media New York 2015

Abstract The categorical perception of tones is based not only on word-internal F0 cues but also on external F0 cues in the contexts. The present study focuses on the effects of different types of preceding contexts on Mandarin tone perception. In the experiment, subjects were required to identify a target tone with the preceding context. The target tone was from a tone continuum ranging from Mandarin Tone 1 (high-level tone) to Tone 2 (mid-rising tone). It was preceded by four types of contexts (normal speech, reversal speech, fine-structure sound, and non-speech) with different mean F0 values. Results indicate that the categorical perception of Mandarin tones is influenced only by the normal speech context, and the effect is contrastive.

For instance, in a normal speech context with a higher mean F0, the following tone is more likely to be perceived as a lower-frequency tone (Tone 2), whereas with a lower mean F0, the following tone is more likely to be perceived as a higher-frequency tone (Tone 1). These findings suggest that Mandarin tone normalization is mediated by speech-specific processes and that the speech context needs to be intelligible.

Keywords Mandarin tone · Context effect · Categorical perception · Speech-specific mechanisms

1 Introduction

Tone languages such as Mandarin use pitch patterns to distinguish lexical meanings [1], and fundamental frequency (F0) is the most important physical correlate of pitch. As can be seen in Fig. 1, Mandarin Chinese has four different lexical tones: high-level tone (Tone 1), mid-rising tone (Tone 2), low-falling-then-rising tone (Tone 3), and high-falling tone (Tone 4). The same syllable “ma” with different lexical tones will have very different meanings, e.g., “mother” (Tone 1), “hemp” (Tone 2), “horse” (Tone 3), and “scold” (Tone 4) respectively.

However, it is worth noting that in actual speech the exact F0 values of lexical tones are highly variable across utterances and across talkers. Abundant literature has shown that there is a great deal of inter- and intra-talker variability in speech production, giving rise to varied F0 realizations of tone [2]. For instance, the same word uttered by different talkers (i.e., inter-talker variability) or by the same talker on different occasions (i.e., intra-talker variability) may differ significantly in terms of their acoustic properties. How then do listeners deal with such inter- and intra-talker differences in F0?

The term “tone normalization” has been used to describe the processes by which listeners recognize the same tone produced by different talkers or the same talker in different conditions [3]. There are two types of cues mainly used in these processes – word-internal cues and word-external cues. Wang [1] discussed various word-internal cues that might be available during tone normalization, including F0, duration, intensity profile, voice quality, and other relevant acoustic cues of a word. All of these acoustic cues contain useful information about the tone category, among which F0 is the most important cue for tone perception. As for word-external cues, they mainly refer to acoustic cues in the context, and are thus also known as “contextual cues”. Often, listeners make use of both word-internal F0 cues and contextual F0 cues for tone normalization [3–6]. But on some occasions, contextual cues with information about a talker’s F0 range can be more crucial, such as when the stimuli to be categorized are highly ambiguous because they are located close to the perceptual boundaries of two tones.

* Gang Peng

Fei Chen

1 Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

2 Department of Linguistics and Modern Languages, and Joint Research Centre for Language & Human Complexity, The Chinese University of Hong Kong, Hong Kong SAR, China

J Sign Process Syst
DOI 10.1007/s11265-015-1008-2

1.1 The Effect of Speech Context on Lexical Tone


The speech context, which carries cues to a talker’s F0, exerts an important influence on the perception of target lexical tones. These tones fall into two categories: level tones, which vary in F0 height but have similar contours, and contour tones, which can be differentiated in terms of both F0 height and F0 direction.

Comparatively speaking, the effects of the speech context are most evident on level tones. Studies [3–5] have clearly demonstrated that the perception of Cantonese level tones is context-dependent. Francis et al. [3] found that target stimuli were more likely to be perceived as low-level tones when positioned in a synthesized context with high F0, whereas the same set of stimuli were perceived as high-level tones in a synthesized context with low F0. Moreover, listeners’ tonal judgments were proportional to the degree of frequency shift.

Wong and Diehl [4] used the three level tones from Cantonese (Tone 1: high-level tone, Tone 3: mid-level tone, and Tone 6: low-level tone) as target stimuli. They asked listeners to judge the identity of these tones with speech contexts which were manipulated to differ in mean F0 height. They found that the same target stimuli were identified as Tone 1 (high-level) 99.5 % of the time when in a lowered F0 context, Tone 6 (low-level) 95.8 % of the time when in a raised F0 context, and Tone 3 (mid-level) 91.9 % of the time when the context had an intermediate mean F0. Zhang et al. [5] further demonstrated that raised or lowered speech context conditions could lead to similar contrastive effects, regardless of whether the F0 contour was preserved or flattened.
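The contrastive pattern described above can be illustrated with a toy sketch in which a target's F0 is judged relative to the mean F0 of the preceding context rather than in absolute terms. This is only an illustration under stated assumptions, not a model proposed in the studies cited: the function names, the semitone scale, the 1.5-semitone band, and all F0 values are hypothetical choices made for the example.

```python
import math


def semitones_re(f0_hz: float, ref_hz: float) -> float:
    """Distance of f0_hz from ref_hz in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def classify_level_tone(target_f0_hz: float,
                        context_mean_f0_hz: float,
                        band_st: float = 1.5) -> str:
    """Toy contrastive rule (hypothetical): label the target by its height
    relative to the context mean F0, not by its absolute F0.

    band_st is an assumed half-width, in semitones, of the 'mid' region.
    """
    rel = semitones_re(target_f0_hz, context_mean_f0_hz)
    if rel > band_st:
        return "high-level"
    if rel < -band_st:
        return "low-level"
    return "mid-level"


# The same physical target (hypothetical 200 Hz) in three hypothetical contexts:
# a lowered context, an intermediate context, and a raised context.
target = 200.0
for context_mean in (160.0, 200.0, 250.0):
    print(context_mean, classify_level_tone(target, context_mean))
```

Under this rule the same 200 Hz target is labelled high-level after the lowered context, mid-level after the intermediate one, and low-level after the raised one, which mirrors the direction of the effect reported for Cantonese level tones. The hard thresholds are a simplification; the identification percentages reported above come from listeners, not from a rule of this kind.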

In contrast to level tone perception, however, much more mixed results have been found for the influence of speech context on contour tone perception. Some results showed no significant context effect on Mandarin tone perception. Leather [7] tested the perception of syllables produced with Mandarin Tone 1 and Tone 2 following natural spoken sentences.