Speech Perception#
Based on Ch. 5 of Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.
Programming Environment#
import numpy as np
import pandas as pd
Speech perception is the active, intentional perception of speech sounds themselves, as opposed to the meaning the speech conveys.
The detection of mispronunciations or speech errors: is it more prevalent in

- word-initial or word-medial position?
- vowels or consonants?
- nouns and verbs, or grammatical words?
Speech perception is shaped by general properties of the auditory system that determine

- what can and cannot be heard
- what cues will be recoverable in particular segmental contexts
- how adjacent sounds will influence each other

For example, the cochlea’s nonlinear frequency scale probably underlies the fact that no language distinguishes fricatives on the basis of frequency components above 6000 Hz.
What affects what we hear?

- The nature of the auditory system constrains our perception of speech sounds
  - the nonlinearity of the cochlea’s frequency scale: no language distinguishes fricatives on the basis of frequency components above 6000 Hz
  - VOT (voice onset time): aspirated vs. unaspirated stops
  - Compensation for Coarticulation
- Phonetic knowledge applied to speech sounds affects their perception
  - Categorical Perception (Johnson and Ralston 1994; Remez et al. 1981: sine-wave analogs; Best 1995; Flege 1995); see the VOT identification sketch after this list
  - Perceptual Magnets (Kuhl et al. 1992)
  - Coherence
    - Duplex Perception
    - McGurk Effect
- Lexical (word, morpheme) knowledge applied to speech sounds affects their perception
  - Slips of the Ear (Bond 1999)
  - Word Magnets (Ganong 1980)
  - Phoneme Restoration (Warren 1970; Samuel 1991)
  - Lexically mediated compensation for coarticulation (Elman and McClelland 1988)
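As an illustration of categorical perception, here is a hypothetical sketch (not data from Johnson 2012): identification of a synthetic stop along a VOT continuum does not change gradually with VOT; responses jump from one category to the other near a boundary. The labeling function below is modeled as a logistic curve, with the ~25 ms boundary and the slope chosen purely for illustration.

# Hypothetical VOT continuum and logistic identification function; the 25 ms
# boundary and the slope are illustrative choices, not measured values.
vot = np.arange(0, 65, 5)                          # VOT steps in ms
p_aspirated = 1 / (1 + np.exp(-(vot - 25) / 3))    # proportion of "aspirated stop" responses
pd.DataFrame({'VOT (ms)': vot, 'p(aspirated)': p_aspirated.round(2)})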
Measuring Perceptual Similarity via Multidimensional Scaling#
Data from Miller, George & Patricia Nicely. (1955). “An analysis of perceptual confusions among some English consonants.”
Mathematical approach from Shepard, Roger. (1972). “Psychological Representation of Speech Sounds”.
# Confusion counts: rows index the presented consonant, columns the response
# category (the same seven consonants plus an 'other' column)
cm = np.array([
    [199,   0,  46,   1,   4,   0,   0,  14],
    [  3, 177,   1,  29,   0,   4,   0,  22],
    [ 85,   2, 114,   0,  10,   0,   0,  21],
    [  0,  64,   0, 105,   0,  18,   0,  17],
    [  5,   0,  38,   0, 170,   0,   0,  15],
    [  0,   4,   0,  22,   0, 132,  17,  49],
    [  0,   0,   0,   4,   0,   8, 189,  59],
])
cm
array([[199,   0,  46,   1,   4,   0,   0,  14],
       [  3, 177,   1,  29,   0,   4,   0,  22],
       [ 85,   2, 114,   0,  10,   0,   0,  21],
       [  0,  64,   0, 105,   0,  18,   0,  17],
       [  5,   0,  38,   0, 170,   0,   0,  15],
       [  0,   4,   0,  22,   0, 132,  17,  49],
       [  0,   0,   0,   4,   0,   8, 189,  59]])
fricatives = ['f', 'v', 'th', 'dh', 's', 'z', 'd']
cmdf = pd.DataFrame(
    data=cm,
    index=fricatives,
    columns=fricatives + ['other'],
)
cmdf['total'] = cmdf.sum(axis=1)
cmdf
|    |   f |   v |  th |  dh |   s |   z |   d | other | total |
|----|-----|-----|-----|-----|-----|-----|-----|-------|-------|
| f  | 199 |   0 |  46 |   1 |   4 |   0 |   0 |    14 |   264 |
| v  |   3 | 177 |   1 |  29 |   0 |   4 |   0 |    22 |   236 |
| th |  85 |   2 | 114 |   0 |  10 |   0 |   0 |    21 |   232 |
| dh |   0 |  64 |   0 | 105 |   0 |  18 |   0 |    17 |   204 |
| s  |   5 |   0 |  38 |   0 | 170 |   0 |   0 |    15 |   228 |
| z  |   0 |   4 |   0 |  22 |   0 | 132 |  17 |    49 |   224 |
| d  |   0 |   0 |   0 |   4 |   0 |   8 | 189 |    59 |   260 |
# # pairwise 2x2 submatrices of the confusion matrix, with each row
# # normalized by that stimulus's total number of presentations
# for i in range(7):
#     for j in range(7):
#         if i != j:
#             print()
#             c = cmdf.iloc[[i, j], [i, j]]
#             c.iloc[0] = c.iloc[0] / cmdf.iloc[i, -1]
#             c.iloc[1] = c.iloc[1] / cmdf.iloc[j, -1]
#             print(c.round(2))
The Shepard similarity between category \(i\) and category \(j\) is
\( \begin{aligned} \text{Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{p_{ii}+p_{jj}} \end{aligned} \)
\( \begin{aligned} \text{Johnson approximation of Shepard similarity}\,\,\, S_{ij}=\frac{p_{ij}+p_{ji}}{2} \end{aligned} \)
Perceptual distance \(d_{ij}\) according to Shepard’s Law (that similarity is exponentially related to perceptual distance)
\( \begin{aligned} \text{perceptual distance}\,\,\, d_{ij}=-\ln(S_{ij}) \iff e^{-d_{ij}}=S_{ij} \end{aligned} \)
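As a quick check (not part of the original notebook), the Shepard formula can be evaluated by hand for one pair, (f, th), straight from the raw counts and row totals above; the result matches the similarity and distance tables computed below.

# Worked example for the pair (f, th), using the raw counts and row totals
p_f_th  = 46 / 264    # P(respond "th" | stimulus "f")
p_th_f  = 85 / 232    # P(respond "f"  | stimulus "th")
p_f_f   = 199 / 264   # P(correct      | stimulus "f")
p_th_th = 114 / 232   # P(correct      | stimulus "th")
S_f_th = (p_f_th + p_th_f) / (p_f_f + p_th_th)
S_f_th, -np.log(S_f_th)    # approximately (0.434, 0.834)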
# response proportions: each row of counts divided by its row total
ps = cmdf.drop(columns='total').div(cmdf.total, axis=0)
ps.round(2)
|    |    f |    v |   th |   dh |    s |    z |    d | other |
|----|------|------|------|------|------|------|------|-------|
| f  | 0.75 | 0.00 | 0.17 | 0.00 | 0.02 | 0.00 | 0.00 |  0.05 |
| v  | 0.01 | 0.75 | 0.00 | 0.12 | 0.00 | 0.02 | 0.00 |  0.09 |
| th | 0.37 | 0.01 | 0.49 | 0.00 | 0.04 | 0.00 | 0.00 |  0.09 |
| dh | 0.00 | 0.31 | 0.00 | 0.51 | 0.00 | 0.09 | 0.00 |  0.08 |
| s  | 0.02 | 0.00 | 0.17 | 0.00 | 0.75 | 0.00 | 0.00 |  0.07 |
| z  | 0.00 | 0.02 | 0.00 | 0.10 | 0.00 | 0.59 | 0.08 |  0.22 |
| d  | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.03 | 0.73 |  0.23 |
# Shepard similarities S_ij = (p_ij + p_ji) / (p_ii + p_jj);
# clip zeros to a tiny value so the log below stays finite
Ss = np.array([
    (ps.iloc[i, j] + ps.iloc[j, i]) / (ps.iloc[i, i] + ps.iloc[j, j])
    for i in range(7)
    for j in range(7)
]).reshape(7, 7)
Ssdf = pd.DataFrame(data=Ss, index=fricatives, columns=fricatives).clip(lower=1e-10)
Ssdf.round(3)
|    |     f |     v |    th |    dh |     s |     z |     d |
|----|-------|-------|-------|-------|-------|-------|-------|
| f  | 1.000 | 0.008 | 0.434 | 0.003 | 0.025 | 0.000 | 0.000 |
| v  | 0.008 | 1.000 | 0.010 | 0.345 | 0.000 | 0.026 | 0.000 |
| th | 0.434 | 0.010 | 1.000 | 0.000 | 0.170 | 0.000 | 0.000 |
| dh | 0.003 | 0.345 | 0.000 | 1.000 | 0.000 | 0.169 | 0.012 |
| s  | 0.025 | 0.000 | 0.170 | 0.000 | 1.000 | 0.000 | 0.000 |
| z  | 0.000 | 0.026 | 0.000 | 0.169 | 0.000 | 1.000 | 0.081 |
| d  | 0.000 | 0.000 | 0.000 | 0.012 | 0.000 | 0.081 | 1.000 |
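For comparison, a sketch of the Johnson approximation from the formula above (not part of the original computation): it simply averages the two confusion proportions for each pair, without normalizing by the correct-response proportions on the diagonal.

# Johnson approximation S_ij = (p_ij + p_ji) / 2, over the 7x7 consonant block
P = ps.iloc[:, :7].values                      # drop the 'other' column
Ss_johnson = (P + P.T) / 2
pd.DataFrame(Ss_johnson, index=fricatives, columns=fricatives).round(3)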
# perceptual distances d_ij = -ln(S_ij), per Shepard's law
ds = -np.log(Ssdf)
ds.clip(lower=1e-10).round(3)
|    |      f |      v |     th |     dh |      s |      z |      d |
|----|--------|--------|--------|--------|--------|--------|--------|
| f  |  0.000 |  4.773 |  0.834 |  5.814 |  3.700 | 23.026 | 23.026 |
| v  |  4.773 |  0.000 |  4.570 |  1.064 | 23.026 |  3.650 | 23.026 |
| th |  0.834 |  4.570 |  0.000 | 23.026 |  1.774 | 23.026 | 23.026 |
| dh |  5.814 |  1.064 | 23.026 |  0.000 | 23.026 |  1.779 |  4.391 |
| s  |  3.700 | 23.026 |  1.774 | 23.026 |  0.000 | 23.026 | 23.026 |
| z  | 23.026 |  3.650 | 23.026 |  1.779 | 23.026 |  0.000 |  2.513 |
| d  | 23.026 | 23.026 | 23.026 |  4.391 | 23.026 |  2.513 |  0.000 |
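The distance matrix above is exactly what multidimensional scaling takes as input. The notebook stops here, so the following is a minimal sketch of one way to continue, assuming scikit-learn is available: metric MDS with a precomputed dissimilarity matrix recovers a 2-D configuration whose inter-point distances approximate the perceptual distances.

# A sketch (assumes scikit-learn): embed the perceptual distances in 2-D
from sklearn.manifold import MDS

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(ds.values)          # 7 x 2 configuration
pd.DataFrame(coords, index=fricatives, columns=['dim1', 'dim2']).round(2)

In such a configuration, frequently confused pairs like f and th, or v and dh, should land close together, while pairs that were essentially never confused end up far apart.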
Resources#
Casey Connor
[Y] Casey Connor. (14 Jan 2022). “Part 7/5 of Psychoacoustics / Audio Illusions”. YouTube.
[Y] Casey Connor. (14 Jan 2022). “Part 6/5 of Psychoacoustics / Audio Illusions”. YouTube.
[Y] Casey Connor. (18 Apr 2020). “42 Audio Illusions & Phenomena! - Part 5/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (12 Apr 2020). “42 Audio Illusions & Phenomena! - Part 4/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (08 Apr 2020). “42 Audio Illusions & Phenomena! - Part 3/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (28 Mar 2020). “42 Audio Illusions & Phenomena! - Part 2/5 of Psychoacoustics”. YouTube.
[Y] Casey Connor. (26 Mar 2020). “42 Audio Illusions & Phenomena! - Part 1/5 of Psychoacoustics”. YouTube.
The Ling Space
[Y] The Ling Space. (24 Jun 2015). “Phonological Illusions”. YouTube.
Figures#
Terms#
[W] Allophone
[W] Auditory Cortex
[W] Auditory Illusion
[W] Auditory Perception (Hearing)
[W] Broca’s Area
[W] Categorical Perception
[W] Coarticulation
[W] Common Coding Theory
[W] Confusion Matrix
[W] Dichotic Listening
[W] Duplex Perception
[W] Exemplar Theory
[W] Ganong Effect
[W] Haskins Laboratories
[W] Hierarchical Cluster Analysis
[W] Levenshtein Distance
[W] McGurk Effect
[W] Motor Theory of Speech Perception
[W] Multidimensional Scaling (MDS)
[W] Multisensory Integration
[W] Percept
[W] Perception
[W] Phantom Word Illusion
[W] Phoneme
[W] Phoneme Restoration Effect
[W] Precedence Effect
[W] Psychology of Music
[W] Sapir-Whorf Hypothesis
[W] Shepard Tone
[W] Sensory Cue
[W] Signal-to-Noise Ratio (SNR)
[W] Sound Localization
[W] Speech-to-Song Illusion
[W] Speech Perception
[W] Speech Segmentation
[W] Speech Shadowing
[W] Speech Synthesis
[W] Stimulus
[W] Triangulation
[W] Tritone
[W] Tritone Paradox
[W] Voice Onset Time (VOT)
[W] Wernicke’s Area
Bibliography#
Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.