Speech Perception

Based on Ch. 5 of Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.

Programming Environment#

import numpy  as np
import pandas as pd

Speech perception is the active, intentional perception of speech sounds as opposed to the meaning of speech.

Is the detection of mispronunciations or speech errors more prevalent in:

word-initial or word-medial position?
vowels or consonants?
nouns and verbs or grammatical words?

Speech perception is shaped by general properties of the auditory system that determine:

what can and cannot be heard
what cues will be recoverable in particular segmental contexts
how adjacent sounds will influence each other

The cochlea’s nonlinear frequency scale probably underlies the fact that no language distinguishes fricatives on the basis of frequency components above 6000 Hz.

What affects what we hear?

The nature of the auditory system constrains our perception of speech sounds
1. Nonlinearity of the cochlea’s frequency scale: no language distinguishes fricatives on the basis of frequency components above 6000 Hz
2. Voice onset time (VOT): aspirated vs. unaspirated stops
3. Compensation for coarticulation
Phonetic knowledge applied to speech sounds affects their perception
1. Categorical perception (Johnson and Ralston 1994; Remez et al. 1981: sine-wave analogs; Best 1995; Flege 1995)
  - Categorical magnets (Kuhl et al. 1992)
2. Coherence
  - DUPLEX PERCEPTION
  - MCGURK EFFECT
Lexical (word; morpheme) knowledge applied to speech sounds affects their perception
- SLIPS OF THE EAR (Bond 1999)
- WORD MAGNETS (Ganong 1980)
- PHONEME RESTORATION (Warren 1970; Samuel 1991)
- (Elman and McClelland 1988)

Measuring Perceptual Similarity via Multidimensional Scaling#

Data from Miller, George & Patricia Nicely. (1955). “An analysis of perceptual confusions among some English consonants.”

Mathematical approach from Shepard, Roger. (1972). “Psychological Representation of Speech Sounds”.

cm = np.array([
  [199,  0, 46,  1,  4,  0,  0,14],
  [  3,177,  1, 29,  0,  4,  0,22],
  [ 85,  2,114,  0, 10,  0,  0,21],
  [  0, 64,  0,105,  0, 18,  0,17],
  [  5,  0, 38,  0,170,  0,  0,15],
  [  0,  4,  0, 22,  0,132, 17,49],
  [  0,  0,  0,  4,  0,  8,189,59],
])
cm

array([[199,   0,  46,   1,   4,   0,   0,  14],
       [  3, 177,   1,  29,   0,   4,   0,  22],
       [ 85,   2, 114,   0,  10,   0,   0,  21],
       [  0,  64,   0, 105,   0,  18,   0,  17],
       [  5,   0,  38,   0, 170,   0,   0,  15],
       [  0,   4,   0,  22,   0, 132,  17,  49],
       [  0,   0,   0,   4,   0,   8, 189,  59]])

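# consonant labels: six fricatives plus the stop 'd' (the variable name is kept for the cells below)
fricatives=['f','v','th','dh','s','z','d']
# rows = consonant presented, columns = consonant reported ('other' = any other response)
cmdf = pd.DataFrame(
  data   =cm,
  index  =fricatives,
  columns=fricatives+['other'],
)
# number of presentations of each consonant
cmdf['total']=cmdf.sum(axis=1)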
cmdf

	f	v	th	dh	s	z	d	other	total
f	199	0	46	1	4	0	0	14	264
v	3	177	1	29	0	4	0	22	236
th	85	2	114	0	10	0	0	21	232
dh	0	64	0	105	0	18	0	17	204
s	5	0	38	0	170	0	0	15	228
z	0	4	0	22	0	132	17	49	224
d	0	0	0	4	0	8	189	59	260

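# # submatrices: 2x2 block of response proportions for each pair (i, j)
# for i in range(7):
#   for j in range(7):
#     if i!=j:
#       print()
#       # divide each of the two rows by its own row total (no in-place edit of a slice)
#       c=cmdf.iloc[[i,j],[i,j]].div(cmdf['total'].iloc[[i,j]],axis=0)
#       print(c.round(2))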

The Shepard similarity between category \(i\) and category \(j\) is

\( \begin{aligned} \text{Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{p_{ii}+p_{jj}} \end{aligned} \)

\( \begin{aligned} \text{Johnson approximation of Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{2} \end{aligned} \)

By Shepard’s law, similarity falls off exponentially with perceptual distance, so the perceptual distance \(d_{ij}\) is the negative natural logarithm of the similarity:

\( \begin{aligned} \text{perceptual distance}\,\,\, d_{ij}=-\ln(S_{ij}) \iff e^{-d_{ij}}=S_{ij} \end{aligned} \)
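As a worked check of the Shepard similarity and the distance formula, the sketch below computes both for the pair /f/ and /th/ by hand, using only the counts from the confusion matrix on this page. The variable names are illustrative and not part of the notebook.

```python
import math

# Counts from the confusion matrix on this page
# (rows = consonant presented, columns = consonant reported).
n_f_f, n_f_th, total_f = 199, 46, 264      # /f/ presented
n_th_th, n_th_f, total_th = 114, 85, 232   # /th/ presented

# Response proportions p_ij = count / row total
p_ff, p_fth = n_f_f / total_f, n_f_th / total_f
p_thth, p_thf = n_th_th / total_th, n_th_f / total_th

# Shepard similarity: confusions in both directions over correct responses
S = (p_fth + p_thf) / (p_ff + p_thth)

# Perceptual distance under Shepard's law
d = -math.log(S)

print(round(S, 3), round(d, 3))  # 0.434 0.834
```

The pair is confused often in both directions, so its similarity is high and its distance is small relative to pairs that are rarely confused.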

# proportions: divide each row by its total, so each row (including 'other') sums to 1
ps=cmdf.drop(columns='total').div(cmdf.total,axis=0)
ps.round(2)

	f	v	th	dh	s	z	d	other
f	0.75	0.00	0.17	0.00	0.02	0.00	0.00	0.05
v	0.01	0.75	0.00	0.12	0.00	0.02	0.00	0.09
th	0.37	0.01	0.49	0.00	0.04	0.00	0.00	0.09
dh	0.00	0.31	0.00	0.51	0.00	0.09	0.00	0.08
s	0.02	0.00	0.17	0.00	0.75	0.00	0.00	0.07
z	0.00	0.02	0.00	0.10	0.00	0.59	0.08	0.22
d	0.00	0.00	0.00	0.02	0.00	0.03	0.73	0.23

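# similarities: Shepard similarity for every pair of consonants ('other' is not used)
Ss=np.array([
  (ps.iloc[i,j]+ps.iloc[j,i])/(ps.iloc[i,i]+ps.iloc[j,j])
  for j in range(7)
  for i in range(7)
]).reshape(7,7)  # the formula is symmetric in i and j, so the loop order does not matter
# floor zero similarities at 1e-10 so that the logarithm taken later stays finite
Ssdf=pd.DataFrame(data=Ss,index=fricatives,columns=fricatives).clip(lower=1e-10)
Ssdf.round(3)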

	f	v	th	dh	s	z	d
f	1.000	0.008	0.434	0.003	0.025	0.000	0.000
v	0.008	1.000	0.010	0.345	0.000	0.026	0.000
th	0.434	0.010	1.000	0.000	0.170	0.000	0.000
dh	0.003	0.345	0.000	1.000	0.000	0.169	0.012
s	0.025	0.000	0.170	0.000	1.000	0.000	0.000
z	0.000	0.026	0.000	0.169	0.000	1.000	0.081
d	0.000	0.000	0.000	0.012	0.000	0.081	1.000
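The same similarity matrix can also be computed without an explicit double loop. The sketch below is an alternative formulation using NumPy broadcasting, not the notebook's own cell; the names `counts`, `P`, and `S` are assumptions introduced for illustration.

```python
import numpy as np

# Confusion counts: 7 presented consonants (rows) by 8 responses
# (7 consonants + "other"), copied from the matrix on this page.
counts = np.array([
    [199,   0,  46,   1,   4,   0,   0, 14],
    [  3, 177,   1,  29,   0,   4,   0, 22],
    [ 85,   2, 114,   0,  10,   0,   0, 21],
    [  0,  64,   0, 105,   0,  18,   0, 17],
    [  5,   0,  38,   0, 170,   0,   0, 15],
    [  0,   4,   0,  22,   0, 132,  17, 49],
    [  0,   0,   0,   4,   0,   8, 189, 59],
], dtype=float)

# Row proportions (each row sums to 1), then drop the "other" column.
P = (counts / counts.sum(axis=1, keepdims=True))[:, :7]

# Numerator: p_ij + p_ji for every pair. Denominator: p_ii + p_jj.
diag = np.diag(P)
S = (P + P.T) / (diag[:, None] + diag[None, :])

print(S.round(3))
```

Adding the matrix to its transpose gives the two-way confusion term for every pair at once, and the outer sum of the diagonal gives the matching denominators.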

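# distances: d = -ln(S); pairs that were never confused sit at the floor, -ln(1e-10) = 23.026
ds=-np.log(Ssdf)
# the clip only tidies the display: it turns the -0.0 on the diagonal into 0.000
ds.clip(lower=1e-10).round(3)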

	f	v	th	dh	s	z	d
f	0.000	4.773	0.834	5.814	3.700	23.026	23.026
v	4.773	0.000	4.570	1.064	23.026	3.650	23.026
th	0.834	4.570	0.000	23.026	1.774	23.026	23.026
dh	5.814	1.064	23.026	0.000	23.026	1.779	4.391
s	3.700	23.026	1.774	23.026	0.000	23.026	23.026
z	23.026	3.650	23.026	1.779	23.026	0.000	2.513
d	23.026	23.026	23.026	4.391	23.026	2.513	0.000
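The distance table can be turned into a low-dimensional perceptual map. The sketch below uses classical (Torgerson) multidimensional scaling written directly in NumPy; this is a stand-in chosen for brevity and is not necessarily the scaling procedure used by Shepard or Johnson. The function name `classical_mds` and the choice of two dimensions are assumptions. Note that the distances of about 23.026 come from the 1e-10 floor rather than from the data, so they dominate the layout and the coordinates should be read with caution.

```python
import numpy as np

labels = ['f', 'v', 'th', 'dh', 's', 'z', 'd']
counts = np.array([
    [199,   0,  46,   1,   4,   0,   0, 14],
    [  3, 177,   1,  29,   0,   4,   0, 22],
    [ 85,   2, 114,   0,  10,   0,   0, 21],
    [  0,  64,   0, 105,   0,  18,   0, 17],
    [  5,   0,  38,   0, 170,   0,   0, 15],
    [  0,   4,   0,  22,   0, 132,  17, 49],
    [  0,   0,   0,   4,   0,   8, 189, 59],
], dtype=float)

# Rebuild the distance matrix: proportions -> similarities -> distances.
P = (counts / counts.sum(axis=1, keepdims=True))[:, :7]
diag = np.diag(P)
S = np.clip((P + P.T) / (diag[:, None] + diag[None, :]), 1e-10, None)
D = -np.log(S)
np.fill_diagonal(D, 0.0)  # replace -0.0 on the diagonal with an exact 0

def classical_mds(D, k=2):
    """Classical MDS: embed n points in k dimensions from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)    # guard against negative eigenvalues
    return evecs[:, order] * np.sqrt(lam)

coords = classical_mds(D, k=2)
for lab, (x, y) in zip(labels, coords):
    print(f"{lab:>2}  {x:8.3f}  {y:8.3f}")
```

Each printed row gives one consonant's position on the two recovered dimensions; the signs and orientation of the axes are arbitrary, so only relative positions are meaningful.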

Resources#

Casey Connor

[Y] Casey Connor. (14 Jan 2022). “Part 7/5 of Psychoacoustics / Audio Illusions”. YouTube.
[Y] Casey Connor. (14 Jan 2022). “Part 6/5 of Psychoacoustics / Audio Illusions”. YouTube.
[Y] Casey Connor. (18 Apr 2020). “42 Audio Illusions & Phenomena! - Part 5/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (12 Apr 2020). “42 Audio Illusions & Phenomena! - Part 4/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (08 Apr 2020). “42 Audio Illusions & Phenomena! - Part 3/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (28 Mar 2020). “42 Audio Illusions & Phenomena! - Part 2/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (26 Mar 2020). “42 Audio Illusions & Phenomena! - Part 1/5 of Psychoacoustics”. YouTube.

The Ling Space

[Y] The Ling Space. (24 Jun 2015). “Phonological Illusions”. YouTube.

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=067d412ef7ad42401bb89bb826f5aea31a26d40b

Figures#

[W] Deutsch, Diana (1938-) [Illusions]
[W] McGurk, Harry (1936-1998)
[W] Shepard, Roger (1929-2022)

Terms#

[W] Allophone
[W] Auditory Cortex
[W] Auditory Illusion
[W] Auditory Perception (Hearing)
[W] Broca’s Area
[W] Categorical Perception
[W] Coarticulation
[W] Common Coding Theory
[W] Confusion Matrix
[W] Dichotic Listening
[W] Duplex Perception
[W] Exemplar Theory
[W] Ganong Effect
[W] Haskins Laboratories
[W] Hierarchical Cluster Analysis
[W] Levenshtein Distance
[W] McGurk Effect
- https://www.youtube.com/watch?v=2k8fHR9jKVM
- https://www.youtube.com/watch?v=kzo45hWXRWU
[W] Motor Theory of Speech Perception
[W] Multidimensional Scaling (MDS)
[W] Multisensory Integration
[W] Percept
[W] Perception
[W] Phantom Word Illusion
- Diana Deutsch
- https://www.youtube.com/watch?v=muCPjK4nGY4
[W] Phoneme
[W] Phoneme Restoration Effect
- https://www.youtube.com/watch?v=kbzL9PxtFf0
- https://www.youtube.com/watch?v=ZyvyGMkzNQc
[W] Precedence Effect
[W] Psychology of Music
[W] Sapir-Whorf Hypothesis
[W] Shepard Tone
- https://vimeo.com/34749754
- https://www.youtube.com/watch?v=BzNzgsAE4F0
- https://www.youtube.com/watch?v=kzo45hWXRWU
[W] Sensory Cue
[W] Signal-to-Noise Ratio (SNR)
[W] Sound Localization
[W] Speech-to-Song Illusion
- Diana Deutsch
- https://www.youtube.com/watch?v=kbzL9PxtFf0
[W] Speech Perception
[W] Speech Segmentation
[W] Speech Shadowing
[W] Speech Synthesis
[W] Stimulus
[W] Triangulation
[W] Tritone
[W] Tritone Paradox
- Diana Deutsch
- https://www.youtube.com/watch?v=kbzL9PxtFf0
- https://www.youtube.com/watch?v=kzo45hWXRWU
[W] Voice Onset Time (VOT)
[W] Wernicke’s Area

Bibliography#

Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.
