Speech Perception#

Based on Ch. 5 of Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.


Programming Environment#

import numpy  as np
import pandas as pd

Speech perception is the active, intentional perception of speech sounds as opposed to the meaning of speech.

Where is the detection of mispronunciations or speech errors more prevalent?

  • word-initial or -medial

  • vowels or consonants

  • nouns and verbs or grammatical words

Speech perception is shaped by general properties of the auditory system that determine:

  • what can and cannot be heard

  • what cues will be recoverable in particular segmental contexts

  • how adjacent sounds will influence each other

The cochlea’s nonlinear frequency scale probably underlies the fact that no language distinguishes fricatives on the basis of frequency components above 6000 Hz.
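The compression of high frequencies can be illustrated with an auditory frequency scale. The sketch below uses Traunmüller's (1990) approximation of the Bark scale, which is an assumption introduced here for illustration and is not taken from the notes above; the function name `hz_to_bark` is likewise hypothetical. It compares the auditory size of a 1000 Hz step at low and at high frequencies.

```python
def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz.

    Uses Traunmüller's (1990) formula; this choice of scale is an
    assumption made for illustration, not something the notes specify.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# The same 1000 Hz step measured in Bark at two places on the scale
low_step = hz_to_bark(2000) - hz_to_bark(1000)
high_step = hz_to_bark(7000) - hz_to_bark(6000)

print(round(low_step, 2), round(high_step, 2))  # 4.48 0.74
```

Under this approximation, a 1000 Hz difference above 6000 Hz spans well under one Bark, whereas the same difference between 1000 and 2000 Hz spans more than four, which is consistent with high-frequency spectral detail being a poor basis for contrast.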

What affects what we hear?

  1. The nature of the auditory system constrains our perception of speech sounds

    1. The nonlinearity of the cochlea’s frequency scale: no language distinguishes fricatives on the basis of frequency components above 6000 Hz

    2. Voice onset time (VOT): aspirated vs. unaspirated stops

    3. Compensation for Coarticulation

  2. Phonetic knowledge applied to speech sounds affects their perception

    1. Categorical Perception (Johnson and Ralston 1994; Remez et al 1981: sine wave analogs; Best 1995; Flege 1995)

      • Categorical Magnets (Kuhl et al 1992)

    2. Coherence

      • DUPLEX PERCEPTION

      • MCGURK EFFECT

  3. Lexical (word; morpheme) knowledge applied to speech sounds affects their perception

    • slips of the ear (Bond 1999)

    • WORD MAGNETS (Ganong 1980)

    • PHONEME RESTORATION (Warren 1970; Samuel 1991)

    • (Elman and McClelland 1988)


Measuring Perceptual Similarity via Multidimensional Scaling#

Data from Miller, George & Patricia Nicely. (1955). “An analysis of perceptual confusions among some English consonants.”

Mathematical approach from Shepard, Roger. (1972). “Psychological Representation of Speech Sounds”.

cm = np.array([
  [199,  0, 46,  1,  4,  0,  0,14],
  [  3,177,  1, 29,  0,  4,  0,22],
  [ 85,  2,114,  0, 10,  0,  0,21],
  [  0, 64,  0,105,  0, 18,  0,17],
  [  5,  0, 38,  0,170,  0,  0,15],
  [  0,  4,  0, 22,  0,132, 17,49],
  [  0,  0,  0,  4,  0,  8,189,59],
])
cm
array([[199,   0,  46,   1,   4,   0,   0,  14],
       [  3, 177,   1,  29,   0,   4,   0,  22],
       [ 85,   2, 114,   0,  10,   0,   0,  21],
       [  0,  64,   0, 105,   0,  18,   0,  17],
       [  5,   0,  38,   0, 170,   0,   0,  15],
       [  0,   4,   0,  22,   0, 132,  17,  49],
       [  0,   0,   0,   4,   0,   8, 189,  59]])
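# six fricatives plus the stop 'd'; 'th' and 'dh' stand for the voiceless and voiced dental fricatives
fricatives=['f','v','th','dh','s','z','d']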
cmdf = pd.DataFrame(
  data   =cm,
  index  =fricatives,
  columns=fricatives+['other'],
)
cmdf['total']=cmdf.sum(axis=1)
cmdf
      f    v   th   dh    s    z    d  other  total
f   199    0   46    1    4    0    0     14    264
v     3  177    1   29    0    4    0     22    236
th   85    2  114    0   10    0    0     21    232
dh    0   64    0  105    0   18    0     17    204
s     5    0   38    0  170    0    0     15    228
z     0    4    0   22    0  132   17     49    224
d     0    0    0    4    0    8  189     59    260
# # submatrices
# for i in range(7):
#   for j in range(7):
#     if i!=j:
#       print()
#       c=cmdf.iloc[[i,j],[i,j]]
#       c.iloc[0]=c.iloc[0]/cmdf.iloc[i,-1]
#       c.iloc[1]=c.iloc[1]/cmdf.iloc[j,-1]
#       print(c.round(2))

The Shepard similarity between category \(i\) and category \(j\), where \(p_{ij}\) is the proportion of presentations of \(i\) that were identified as \(j\), is

\( \begin{aligned} \text{Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{p_{ii}+p_{jj}} \end{aligned} \)

\( \begin{aligned} \text{Johnson approximation of Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{2} \end{aligned} \)

By Shepard’s Law, similarity falls off exponentially with perceptual distance, so the perceptual distance \(d_{ij}\) is the negative natural logarithm of the similarity:

\( \begin{aligned} \text{perceptual distance}\,\,\, d_{ij}=-\ln(S_{ij}) \iff e^{-d_{ij}}=S_{ij} \end{aligned} \)
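As a worked example of the formulas above, the sketch below computes the similarity and distance for a single pair, 'f' and 'th', using counts copied from the confusion matrix. The helper names `shepard_similarity` and `perceptual_distance` are hypothetical and introduced only for illustration.

```python
import math

# Counts copied from the confusion matrix above
# (rows = consonant presented, columns = listener response)
f_as_f, f_as_th, f_total = 199, 46, 264
th_as_th, th_as_f, th_total = 114, 85, 232

# Convert counts to proportions p_ij
p_ff = f_as_f / f_total
p_fth = f_as_th / f_total
p_thth = th_as_th / th_total
p_thf = th_as_f / th_total


def shepard_similarity(p_ij, p_ji, p_ii, p_jj):
    """Confusions between i and j, scaled by the correct responses."""
    return (p_ij + p_ji) / (p_ii + p_jj)


def perceptual_distance(similarity):
    """Shepard's Law inverted: d = -ln(S)."""
    return -math.log(similarity)


S = shepard_similarity(p_fth, p_thf, p_ff, p_thth)
d = perceptual_distance(S)
S_avg = (p_fth + p_thf) / 2  # the averaged variant given above

print(round(S, 3), round(d, 3), round(S_avg, 3))  # 0.434 0.834 0.27
```

The two similarity variants differ only in the denominator: the first scales the confusions by how often both sounds were identified correctly, while the second simply averages the two confusion proportions.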

# proportions
ps=cmdf.drop(columns='total').div(cmdf.total,axis=0)
ps.round(2)
       f     v    th    dh     s     z     d  other
f   0.75  0.00  0.17  0.00  0.02  0.00  0.00   0.05
v   0.01  0.75  0.00  0.12  0.00  0.02  0.00   0.09
th  0.37  0.01  0.49  0.00  0.04  0.00  0.00   0.09
dh  0.00  0.31  0.00  0.51  0.00  0.09  0.00   0.08
s   0.02  0.00  0.17  0.00  0.75  0.00  0.00   0.07
z   0.00  0.02  0.00  0.10  0.00  0.59  0.08   0.22
d   0.00  0.00  0.00  0.02  0.00  0.03  0.73   0.23
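# similarities: S_ij = (p_ij + p_ji) / (p_ii + p_jj)
P=ps[fricatives].to_numpy()   # 7x7 block of proportions; the 'other' column is dropped
diag=np.diag(P)               # proportion of correct responses p_ii
Ss=(P+P.T)/(diag[:,None]+diag[None,:])
# zero similarities are clipped to 1e-10 so that the log taken below stays finite
Ssdf=pd.DataFrame(data=Ss,index=fricatives,columns=fricatives).clip(lower=1e-10)
Ssdf.round(3)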
        f      v     th     dh      s      z      d
f   1.000  0.008  0.434  0.003  0.025  0.000  0.000
v   0.008  1.000  0.010  0.345  0.000  0.026  0.000
th  0.434  0.010  1.000  0.000  0.170  0.000  0.000
dh  0.003  0.345  0.000  1.000  0.000  0.169  0.012
s   0.025  0.000  0.170  0.000  1.000  0.000  0.000
z   0.000  0.026  0.000  0.169  0.000  1.000  0.081
d   0.000  0.000  0.000  0.012  0.000  0.081  1.000
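# distances: d_ij = -ln(S_ij)
# pairs that were never confused sit at the floor value -ln(1e-10) = 23.026 set by the clip above
ds=-np.log(Ssdf)
# the diagonal is -ln(1) = -0.0; clipping keeps it from displaying as -0.000
ds.clip(lower=1e-10).round(3)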
         f       v      th      dh       s       z       d
f    0.000   4.773   0.834   5.814   3.700  23.026  23.026
v    4.773   0.000   4.570   1.064  23.026   3.650  23.026
th   0.834   4.570   0.000  23.026   1.774  23.026  23.026
dh   5.814   1.064  23.026   0.000  23.026   1.779   4.391
s    3.700  23.026   1.774  23.026   0.000  23.026  23.026
z   23.026   3.650  23.026   1.779  23.026   0.000   2.513
d   23.026  23.026  23.026   4.391  23.026   2.513   0.000
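The distance matrix above is the input that multidimensional scaling needs. As a minimal sketch, the block below rebuilds the distances from the confusion counts and applies classical (Torgerson) metric MDS using only NumPy. Classical MDS is chosen here for simplicity and is not claimed to be the procedure used in the cited sources; the function name `classical_mds` is hypothetical. Note that the floor value 23.026 is an artifact of the 1e-10 clip, so the resulting layout depends heavily on that constant.

```python
import numpy as np

labels = ['f', 'v', 'th', 'dh', 's', 'z', 'd']
counts = np.array([
    [199,   0,  46,   1,   4,   0,   0, 14],
    [  3, 177,   1,  29,   0,   4,   0, 22],
    [ 85,   2, 114,   0,  10,   0,   0, 21],
    [  0,  64,   0, 105,   0,  18,   0, 17],
    [  5,   0,  38,   0, 170,   0,   0, 15],
    [  0,   4,   0,  22,   0, 132,  17, 49],
    [  0,   0,   0,   4,   0,   8, 189, 59],
], dtype=float)

# Row proportions over all responses (including 'other'), then the 7x7 block
P = (counts / counts.sum(axis=1, keepdims=True))[:, :7]
diag = np.diag(P)
S = (P + P.T) / (diag[:, None] + diag[None, :])   # Shepard similarity
D = -np.log(np.clip(S, 1e-10, None))              # perceptual distance
np.fill_diagonal(D, 0.0)


def classical_mds(D, k=2):
    """Embed an n x n distance matrix in k dimensions (Torgerson scaling)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    top = np.clip(eigvals[order], 0.0, None)   # guard against negative eigenvalues
    return eigvecs[:, order] * np.sqrt(top)


coords = classical_mds(D, k=2)
for label, (x, y) in zip(labels, coords):
    print(f"{label:>2}  {x:8.3f}  {y:8.3f}")
```

Each printed row gives one consonant's position on the two recovered dimensions. Because eigenvectors are only defined up to sign, the axes may come out mirrored between runs or library versions, so only the relative positions are meaningful.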

Resources#

Casey Connor

  • [Y] Casey Connor. (14 Jan 2022). “Part 7/5 of Psychoacoustics / Audio Illusions”. YouTube.

  • [Y] Casey Connor. (14 Jan 2022). “Part 6/5 of Psychoacoustics / Audio Illusions”. YouTube.

  • [Y] Casey Connor. (18 Apr 2020). “42 Audio Illusions & Phenomena! - Part 5/5 of Psychoacoustics”. YouTube.

  • [Y] Casey Connor. (12 Apr 2020). “42 Audio Illusions & Phenomena! - Part 4/5 of Psychoacoustics”. YouTube.

  • [Y] Casey Connor. (08 Apr 2020). “42 Audio Illusions & Phenomena! - Part 3/5 of Psychoacoustics”. YouTube.

  • [Y] Casey Connor. (28 Mar 2020). “42 Audio Illusions & Phenomena! - Part 2/5 of Psychoacoustics”. YouTube.

  • [Y] Casey Connor. (26 Mar 2020). “42 Audio Illusions & Phenomena! - Part 1/5 of Psychoacoustics”. YouTube.

The Ling Space

  • [Y] The Ling Space. (24 Jun 2015). “Phonological Illusions”. YouTube.

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=067d412ef7ad42401bb89bb826f5aea31a26d40b


Figures#

  • [W] Deutsch, Diana (1938-) [Illusions]

  • [W] McGurk, Harry (1936-1998)

  • [W] Shepard, Roger (1929-2022)


Terms#


Bibliography#

Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.