Auditory Phonetics#

Based on Ch. 4 of Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.

The study of speech sound perception

Programming Environment#

Summary of formulae#

\( \begin{aligned} \text{dB SPL} =20\log_{10}\left(\frac{x}{20\,\text{μPa}}\right) \end{aligned} \) where \(x\gt0\) is the sound pressure in μPa

\( \text{sone}=2^{\frac{\text{dB}-40}{10}} \) where \(\text{dB}\ge40\)

\( \begin{aligned} \text{bark}=6\sinh^{-1}\left(\frac{f}{600}\right) \end{aligned} \) where \(f\) is the acoustic frequency in Hz

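import numpy as np

def dBSPL (x : float) -> float:
  """Sound pressure x in μPa -> level in dB SPL (re 20 μPa)."""
  assert x > 0, 'x > 0'
  return 20*np.log10(x/20)

def sone (dB : float) -> float:
  """Level in dB SPL -> perceived loudness in sones (used here for dB >= 40)."""
  assert dB >= 40, 'dB >= 40'
  return 2**((dB-40)/10)

def bark (freq : float) -> float:
  """Acoustic frequency in Hz -> auditory frequency in Bark."""
  return 6*np.arcsinh(freq/600)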

Basic Audition#

The human auditory system is a low-fidelity system:

amplitude is compressed
frequency is warped
adjacent sounds may blend into one another

listeners perceive auditory objects (mental), not acoustic objects (physical)

Peripheral Auditory System

the part of the auditory system that is not in the brain
translates acoustic signals into neural signals
performs amplitude compression and a kind of Fourier analysis of the signal
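
To make "a kind of Fourier analysis" concrete, here is a minimal sketch of a discrete Fourier transform that separates a two-component signal into its frequencies. The test signal, sample rate, and the helper name `dft_magnitudes` are assumptions chosen for illustration; this is not a model of the cochlea, only of the decomposition it approximates.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        total = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        mags.append(abs(total) / n)
    return mags

# Hypothetical test signal: 100 Hz and 300 Hz components sampled at 1600 Hz.
sample_rate = 1600
n_samples = 160  # 0.1 s of signal, so each bin is 10 Hz wide
signal = [math.sin(2 * math.pi * 100 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 300 * t / sample_rate)
          for t in range(n_samples)]

mags = dft_magnitudes(signal)
bin_width = sample_rate / n_samples

# Bins holding the two strongest components, converted to Hz.
peaks = sorted(sorted(range(len(mags)), key=lambda k: mags[k])[-2:])
peak_freqs = [k * bin_width for k in peaks]
print(peak_freqs)
```

The two strongest bins fall at the frequencies that were mixed into the signal, which is the sense in which the analysis recovers the components of a complex wave.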

anatomy of the PAS

ear, outer
- ear canal
- ear drum
ear, middle
- incus
- malleus
- stapes
ear, inner
- auditory nerve
- basilar membrane
- cochlea

the basilar membrane is thinner at its beginning (the base, nearest the middle ear) and thicker at its end (the apex)

the thin end responds to high-frequency components in the acoustic signal

the thick end responds to low-frequency components in the acoustic signal

Perceived Loudness#

measured as air pressure fluctuation, the most physically violent sounds are on the order of 10^7 (ten million) times more violent than the least violent sounds

measured as perceived loudness, the loudest sounds are only on the order of 10^3 (one thousand) times louder than the quietest sounds

Typical Experience	Absolute Air Pressure Fluctuations [μPa]	Acoustic Intensity [dB SPL]	Perceived Loudness [sones]
absolute threshold	\(2\times10\)	\(0.0\times10^2\)
faint whisper	\(2\times10^2\)	\(0.2\times10^2\)
quiet office	\(2\times10^3\)	\(0.4\times10^2\)	\(2^0\)
conversation	\(2\times10^4\)	\(0.6\times10^2\)	\(2^2\)
city bus	\(2\times10^5\)	\(0.8\times10^2\)	\(2^4\)
subway train	\(2\times10^6\)	\(1.0\times10^2\)	\(2^6\)
loud thunder	\(2\times10^7\)	\(1.2\times10^2\)	\(2^8\)
pain and damage	\(2\times10^8\)	\(1.4\times10^2\)	\(2^{10}\)
RANGE	\(10^7\)	\(\sim10^2\)	\(2^{10}\approx10^3\)
TIMES GREATER PER STEP	\(10\)		\(2^2\)
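
The table can be regenerated from the two formulas in the summary. The sketch below is self-contained and uses the standard `math` module rather than NumPy; the names `db_spl` and `rows` are illustrative, and the sone column is left empty below 40 dB SPL, matching the restriction stated for the sone formula.

```python
import math

def db_spl(pressure_upa):
    """Sound pressure in μPa -> level in dB SPL (re 20 μPa)."""
    return 20 * math.log10(pressure_upa / 20)

def sone(db):
    """Level in dB SPL -> loudness in sones (used only for db >= 40)."""
    return 2 ** ((db - 40) / 10)

rows = []
for exponent in range(1, 9):
    pressure = 2 * 10 ** exponent          # 2e1 ... 2e8 μPa
    level = round(db_spl(pressure), 6)     # rounding guards against float noise
    loudness = sone(level) if level >= 40 else None
    rows.append((pressure, level, loudness))

for pressure, level, loudness in rows:
    loudness_text = "" if loudness is None else f"{loudness:g}"
    print(f"{pressure:>12.0e}  {level:>5.0f}  {loudness_text:>6}")
```

Each tenfold step in pressure adds 20 dB and, above 40 dB SPL, multiplies the loudness in sones by four.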

Decibel Scale#

Expressing the amplitude of a sound wave

acoustic energy as pressure
electrical energy as voltage

The decibel scale is a way of expressing sound amplitude that correlates better with perceived loudness than raw pressure does.

The decibel scale provides an approximation to the nonlinearity of human loudness sensation.

The relative loudness of a sound is measured in terms of acoustic/sound intensity.

Acoustic/sound intensity is proportional to the square of the pressure amplitude; the decibel scale expresses ratios of intensity on a logarithmic scale.

Acoustic intensity is the amount of acoustic power exerted by the sound wave’s pressure fluctuation per unit of area \([\text{W cm}^{-2}]\)

Consider a sound with average pressure amplitude \(x\).

The intensity of \(x\) relative to a reference sound with pressure amplitude \(r\) is the power ratio

\( \begin{aligned} \frac{x^2}{r^2} =\left(\frac{x}{r}\right)^2 \end{aligned} \)

A bel is the base 10 logarithm of this power ratio.

\( \begin{aligned} \text{B}=\log_{10}\left(\frac{x}{r}\right)^2 \end{aligned} \) where \( \begin{aligned} \frac{x}{r}\gt0 \end{aligned} \)

The bel is a larger unit than is convenient, so levels are usually expressed in decibels.

A decibel is one-tenth of a bel.

\( \begin{aligned} \text{dB} =10\log_{10}\left(\frac{x}{r}\right)^2 =20\log_{10}\left(\frac{x}{r}\right) \end{aligned} \) where \( \begin{aligned} \frac{x}{r}\gt0 \end{aligned} \)
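
As a quick numerical check of the identity above, the following sketch computes the level both ways, from the power ratio and from the amplitude ratio. The function names are illustrative, and the pressures are arbitrary example values.

```python
import math

def db_from_power_ratio(x, r):
    """10 * log10 of the squared amplitude ratio (the power ratio)."""
    return 10 * math.log10((x / r) ** 2)

def db_from_amplitude_ratio(x, r):
    """20 * log10 of the amplitude ratio."""
    return 20 * math.log10(x / r)

r = 20.0                      # reference amplitude (arbitrary for this check)
for x in (40.0, 200.0, 2000.0):
    a = db_from_power_ratio(x, r)
    b = db_from_amplitude_ratio(x, r)
    print(f"x/r = {x / r:g}: {a:.4f} dB vs {b:.4f} dB")

# Doubling the amplitude adds the same number of dB at any starting level.
doubling_gain = db_from_amplitude_ratio(2.0, 1.0)
print(f"doubling the amplitude adds {doubling_gain:.2f} dB")
```

Because the scale is logarithmic, multiplying the amplitude by a fixed factor always adds a fixed number of decibels: roughly 6 dB per doubling and exactly 20 dB per factor of ten.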

There are two common choices for the reference level \(r\) used in dB measurements.

1. dB SPL (sound pressure level)

20 μPa is the typical absolute threshold (i.e., lowest audible pressure fluctuation) of a 1 kHz tone.

When this reference value is used the units are dB SPL (sound pressure level).

\( \begin{aligned} \text{dB SPL} =20\log_{10}\left(\frac{x}{20\,\text{μPa}}\right) \end{aligned} \) where \(x\gt0\)

2. dB SL (sensation level)

Rather than use the absolute threshold for a 1 kHz tone as the reference for all frequencies, the loudness of a tone is measured relative to the typical absolute threshold level for a tone at that frequency.

When this approach is employed the units are dB SL (sensation level).
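
The two reference choices differ only in the value of \(r\). The sketch below makes that explicit; the 200 μPa threshold is a hypothetical value picked for easy arithmetic, not a measured threshold for any particular frequency, and the function names are illustrative.

```python
import math

def db_re(pressure_upa, reference_upa):
    """Level in dB of a pressure relative to a reference pressure (both μPa)."""
    return 20 * math.log10(pressure_upa / reference_upa)

def db_spl(pressure_upa):
    """Fixed reference: 20 μPa, the typical threshold for a 1 kHz tone."""
    return db_re(pressure_upa, 20.0)

def db_sl(pressure_upa, threshold_upa):
    """Frequency-specific reference: the threshold at the tone's own frequency."""
    return db_re(pressure_upa, threshold_upa)

tone_pressure = 2000.0          # μPa
hypothetical_threshold = 200.0  # μPa, assumed threshold at the tone's frequency

spl = db_spl(tone_pressure)
sl = db_sl(tone_pressure, hypothetical_threshold)
print(f"{spl:.1f} dB SPL, {sl:.1f} dB SL")

# Equivalent view: dB SL is dB SPL minus the threshold expressed in dB SPL.
sl_by_subtraction = spl - db_spl(hypothetical_threshold)
```

A tone at a frequency where hearing is less sensitive therefore has a lower dB SL than dB SPL, since its threshold lies above 20 μPa.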

In speech analysis programs amplitude may be expressed in dB relative to

the largest value that can be taken by a sample in the digital speech waveform, in which case the amplitude values are negative numbers.
the smallest value that can be taken by a sample in the digital speech waveform, in which case the amplitude values are positive numbers.

These reference levels are used when one does not need to know the absolute dB SPL value of the signal.
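
A minimal sketch of the two digital reference levels, assuming signed 16-bit samples (largest magnitude 32767, smallest nonzero magnitude 1). The bit depth and the function names are assumptions for illustration; actual speech analysis programs may use other sample formats.

```python
import math

FULL_SCALE = 32767   # assumed: largest magnitude of a signed 16-bit sample
SMALLEST_STEP = 1    # assumed: smallest nonzero sample magnitude

def db_re_largest(sample):
    """dB relative to the largest sample value: 0 at full scale, else negative."""
    return 20 * math.log10(abs(sample) / FULL_SCALE)

def db_re_smallest(sample):
    """dB relative to the smallest sample value: 0 at one step, else positive."""
    return 20 * math.log10(abs(sample) / SMALLEST_STEP)

for sample in (1, 328, 3277, 32767):   # nonzero example amplitudes
    print(f"{sample:>6}: {db_re_largest(sample):8.2f} dB re largest, "
          f"{db_re_smallest(sample):7.2f} dB re smallest")

# The two scales differ by a constant offset: the dynamic range of the format.
offset = db_re_smallest(3277) - db_re_largest(3277)
```

Either way the values are relative to the sample format rather than to 20 μPa, so they cannot be read as dB SPL without a calibration measurement.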


Sone Scale#

\( \begin{aligned} \text{sone}=2^{\frac{\text{dB}-40}{10}} \end{aligned} \) where \(\text{dB}\ge40\)

The sone scale shows listeners’ judgments of relative loudness scaled so that

a sound about as loud as a quiet office (\(2\times10^3\) μPa) is unity
a sound that is subjectively half as loud as that has a value of one half
a sound that is subjectively twice as loud as that has a value double that
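
The formula encodes a simple rule: above 40 dB, every additional 10 dB doubles the loudness in sones. The sketch below checks this and also inverts the formula; `db_from_sone` is an illustrative helper obtained by solving the formula for dB, not something defined in the source.

```python
import math

def sone(db):
    """Level in dB -> loudness in sones (formula used here for db >= 40)."""
    if db < 40:
        raise ValueError("formula is used here only for dB >= 40")
    return 2 ** ((db - 40) / 10)

def db_from_sone(loudness):
    """Inverse of sone(): loudness in sones (>= 1) -> level in dB."""
    return 40 + 10 * math.log2(loudness)

levels = [40, 50, 60, 70, 80]
loudnesses = [sone(db) for db in levels]
for db, s in zip(levels, loudnesses):
    print(f"{db} dB -> {s:g} sones")

# Ratio between successive 10 dB steps.
step_ratios = [b / a for a, b in zip(loudnesses, loudnesses[1:])]
```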

[Figure: perceived loudness in sones plotted against sound pressure in μPa]

This graph shows that

for soft sounds, small changes in pressure result in large changes in perceived loudness
for loud sounds, large changes in pressure result in small changes in perceived loudness

At \(10^5\) μPa, how does the perceived loudness change when the sound pressure is increased by \(10^5\) μPa?

The perceived loudness increases by about 50%.

print(sone(dBSPL(1e5)))
print(sone(dBSPL(2e5)))
print(sone(dBSPL(2e5))-sone(dBSPL(1e5)))
print(sone(dBSPL(2e5))/sone(dBSPL(1e5)))

10.54100128020249
16.0
5.45899871979751
1.5178823694908652

At \(2\times10^6\) μPa, how does the perceived loudness change when the sound pressure is increased by \(10^5\) μPa?

The perceived loudness increases by about 3%.

print(sone(dBSPL(2e6)))
print(sone(dBSPL(2e6+1e5)))
print(sone(dBSPL(2e6+1e5))-sone(dBSPL(2e6)))
print(sone(dBSPL(2e6+1e5))/sone(dBSPL(2e6)))

64.0
65.90785888988565
1.9078588898856452
1.0298102951544632

Perceived loudness varies as a function of frequency.

The human auditory system is most sensitive to sounds that have frequencies between 2 and 5 kHz.

Sensitivity falls quickly above 10 kHz.

Bark Scale#

The auditory system (namely, the basilar membrane of the cochlea) performs a Fourier analysis of incoming sounds.

However, the auditory system’s frequency response is not linear.

The bark scale is proportional to a scale of perceived pitch called the Mel scale and to distance along the basilar membrane.

\( \begin{aligned} \text{bark}=6\sinh^{-1}\left(\frac{f}{600}\right) \end{aligned} \) where \(f\) is the acoustic frequency in Hz

[Figure: auditory frequency in Bark plotted against acoustic frequency in Hz]

This graph shows that the auditory system is more sensitive to changes in acoustic frequency at the low end of the audible frequency range than at the high end.

At 0.5 kHz, how does the auditory frequency change when the acoustic frequency increases by 0.5 kHz?

The auditory frequency increases by about 70%.

print(bark(0.5e3))
print(bark(1.0e3))
print(bark(1.0e3)-bark(0.5e3))
print(bark(1.0e3)/bark(0.5e3))

4.550916823162454
7.702773976459156
3.151857153296702
1.692576304021848

At 5 kHz, how does the auditory frequency change when the acoustic frequency increases by 0.5 kHz?

The auditory frequency increases by about 3%.

print(bark(5.0e3))
print(bark(5.5e3))
print(bark(5.5e3)-bark(5.0e3))
print(bark(5.5e3)/bark(5.0e3))

16.901948584952663
17.470097475009343
0.5681488900566798
1.033614401747884
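
The same point can be seen from the other direction by asking how many Hz one Bark spans at different places on the scale. The sketch below inverts the formula algebraically (\(f=600\sinh(\text{bark}/6)\)); the names `hz_from_bark` and `bark_width_hz` are illustrative helpers, not part of the source.

```python
import math

def bark(freq_hz):
    """Acoustic frequency in Hz -> auditory frequency in Bark."""
    return 6 * math.asinh(freq_hz / 600)

def hz_from_bark(z):
    """Inverse of bark(): auditory frequency in Bark -> acoustic frequency in Hz."""
    return 600 * math.sinh(z / 6)

def bark_width_hz(center_hz):
    """Width in Hz of a one-Bark band centered (in Bark) on center_hz."""
    z = bark(center_hz)
    return hz_from_bark(z + 0.5) - hz_from_bark(z - 0.5)

for center in (500, 1000, 5000):
    print(f"one Bark around {center} Hz spans about {bark_width_hz(center):.0f} Hz")

width_low = bark_width_hz(500)
width_high = bark_width_hz(5000)
```

One Bark covers far more Hz at 5 kHz than at 0.5 kHz, which is another way of saying that a fixed step in Hz is a smaller auditory step at high frequencies.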

Figures#

[W] Bell, Alexander Graham (1847-1922)
[W] Fletcher, Harvey (1884-1981)

Terms#

[W] Absolute Threshold of Hearing
[W] Acoustic Engineering
[W] Acoustic Impedance
[W] Acoustic Intensity (Sound Intensity)
[W] Acoustic Power (Sound Power)
[W] Acoustic Pressure (Sound Pressure)
[W] Acoustics
[W] Anechoic Chamber
[W] Audiogram
[W] Auditory Model
[W] Auditory Phonetics
[W] Auditory Spectrogram (Cochleagram)
[W] Auditory Spectrum
[W] Auditory System
[W] Bark Scale
[W] Bel
[W] Cent
[W] Cochlea
[W] Cochleagram (Auditory Spectrogram)
[W] Computational Auditory Scene Analysis
[W] Correlogram
[W] Decibel
[W] Ear
[W] Eigenfrequency (Natural Frequency)
[W] Equal-Loudness Contour
[W] Frequency Response
[W] Hearing Level
[W] Hearing Range
[W] Incus
[W] Inner Ear
[W] Logarithmic Scale
[W] Loudness
[W] Loudness Compensation
[W] Malleus
[W] Mel Scale
[W] Multidimensional Scaling (MDS)
[W] Natural Frequency (Eigenfrequency)
[W] Neper
[W] Normal Mode
[W] Ossicles
[W] Outer Ear
[W] Pascal
[W] Perilymph
[W] Peripheral Auditory System
[W] Phon
[W] Pitch
[W] Psychoacoustics
[W] Psychophysics
[W] Resonance
[W] Root Mean Square
[W] Sone
[W] Sound
[W] Sound Energy
[W] Sound Energy Density
[W] Sound Intensity (Acoustic Intensity)
[W] Sound Power (Acoustic Power)
[W] Sound Pressure (Acoustic Pressure)
[W] Speed of Sound
[W] Stapes
[W] Threshold of Pain
[W] Voice Onset Time (VOT)

Bibliography#

Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.