Auditory Phonetics

Based on Ch. 4 of Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.

The study of speech sound perception


Programming Environment

import numpy             as np
import pandas            as pd
import matplotlib.pyplot as plt

Summary of formulae

\( \begin{aligned} \text{dB SPL} =20\log_{10}\left(\frac{x}{20\,\text{μPa}}\right) \end{aligned} \) where \(x\gt0\)

\( \text{sone}=2^{\frac{\text{dB}-40}{10}} \) where \(\text{dB}\ge40\)

\( \begin{aligned} \text{bark}=6\sinh^{-1}\left(\frac{f}{600}\right) \end{aligned} \)

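def dBSPL (x : float) -> float:
  """Acoustic intensity [dB SPL] of pressure amplitude x [μPa], re 20 μPa."""
  assert x > 0, 'x must be greater than 0'
  return 20*np.log10(x/20)

def sone (dB : float) -> float:
  """Perceived loudness [sones] of intensity dB [dB SPL]; formula used for dB >= 40."""
  assert dB >= 40, 'dB must be at least 40'
  return 2**((dB-40)/10)

def bark (freq : float) -> float:
  """Auditory frequency [barks] of acoustic frequency freq [Hz]."""
  return 6*np.arcsinh(freq/600)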

Basic Audition

The human auditory system is low-fidelity:

  • amplitude is compressed

  • frequency is warped

  • adjacent sounds may blend into one another

listeners perceive auditory objects (mental), not acoustic objects (physical)

Peripheral Auditory System

  • the part of the auditory system that is not in the brain

  • translates acoustic signals into neural signals

  • performs amplitude compression and a kind of Fourier analysis of the signal

anatomy of the PAS

  • ear, outer

    • ear canal

    • ear drum

  • ear, middle

    • incus

    • malleus

    • stapes

  • ear, inner

    • auditory nerve

    • basilar membrane

    • cochlea

the basilar membrane is thinner at the beginning (the base, near the oval window) and thicker at the end (the apex)

the thin end responds to high-frequency components in the acoustic signal

the thick end responds to low-frequency components in the acoustic signal


Perceived Loudness

the most intense sounds we encounter have pressure fluctuations on the order of \(10^7\) (ten million) times greater than those of the faintest audible sounds

the loudest sounds are on the order of \(10^3\) (one thousand) times louder than the quietest sounds

| Typical Experience | Absolute Air Pressure Fluctuations [μPa] | Acoustic Intensity [dB SPL] | Perceived Loudness [sones] |
|---|---|---|---|
| absolute threshold | \(2\times10^1\) | \(0.0\times10^2\) | |
| faint whisper | \(2\times10^2\) | \(0.2\times10^2\) | |
| quiet office | \(2\times10^3\) | \(0.4\times10^2\) | \(2^0\) |
| conversation | \(2\times10^4\) | \(0.6\times10^2\) | \(2^2\) |
| city bus | \(2\times10^5\) | \(0.8\times10^2\) | \(2^4\) |
| subway train | \(2\times10^6\) | \(1.0\times10^2\) | \(2^6\) |
| loud thunder | \(2\times10^7\) | \(1.2\times10^2\) | \(2^8\) |
| pain and damage | \(2\times10^8\) | \(1.4\times10^2\) | \(2^{10}\) |
| RANGE | \(10^7\) | \(\sim10^2\) | \(2^{10}\approx10^3\) |
| TIMES GREATER PER STEP | \(10\) | | \(2^2\) |
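As a check on the table, its dB SPL and sone columns can be regenerated from the pressure column with the dB SPL and sone formulas summarized at the top of this page. This is a minimal sketch: the helper names `db_spl` and `sones` are hypothetical stand-ins for the page's `dBSPL` and `sone`, redefined here so the block runs on its own, and the sone formula is applied only at 40 dB SPL and above, matching the blank cells in the table.

```python
import numpy as np

# Pressure column of the table [μPa], from "absolute threshold" to "pain and damage".
pressures = [2e1, 2e2, 2e3, 2e4, 2e5, 2e6, 2e7, 2e8]

def db_spl(x: float) -> float:
    """Acoustic intensity [dB SPL] of pressure amplitude x [μPa], re 20 μPa."""
    return 20 * np.log10(x / 20)

def sones(db: float) -> float:
    """Perceived loudness [sones]; the formula is only used for dB >= 40."""
    return 2 ** ((db - 40) / 10)

for x in pressures:
    db = round(float(db_spl(x)), 6)  # round away floating-point noise
    # The sone column is left blank below 40 dB SPL, as in the table.
    loudness = f"{sones(db):g} sones" if db >= 40 else "(blank)"
    print(f"{x:8.0e} μPa  {db:6.1f} dB SPL  {loudness}")
```

Each row multiplies the pressure by 10, which adds 20 dB and multiplies the loudness by \(2^2 = 4\).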


Decibel Scale

Expressing the amplitude of a sound wave

  • acoustic energy as pressure

  • electrical energy as voltage

The decibel scale is a way of expressing sound amplitude that correlates better with perceived loudness than raw pressure amplitude does.

The decibel scale provides an approximation to the nonlinearity of human loudness sensation.

The relative loudness of a sound is measured in terms of acoustic/sound intensity.

Acoustic (sound) intensity is proportional to the square of the pressure amplitude; the decibel scale expresses intensity on a logarithmic scale.

Acoustic intensity is the amount of acoustic power carried by the sound wave’s pressure fluctuation per unit of area \([\text{W cm}^{-2}]\).


Consider a sound with average pressure amplitude \(x\).

The intensity of \(x\) relative to a reference sound with pressure amplitude \(r\) is the power ratio

\( \begin{aligned} \frac{x^2}{r^2} =\left(\frac{x}{r}\right)^2 \end{aligned} \)

A bel is the base 10 logarithm of this power ratio.

\( \begin{aligned} \text{B}=\log_{10}\left[\left(\frac{x}{r}\right)^2\right] \end{aligned} \) where \( \begin{aligned} \frac{x}{r}\gt0 \end{aligned} \)

The bel is a larger unit than is convenient in practice.

A decibel is one-tenth of a bel.

\( \begin{aligned} \text{dB} =10\log_{10}\left[\left(\frac{x}{r}\right)^2\right] =20\log_{10}\left(\frac{x}{r}\right) \end{aligned} \) where \( \begin{aligned} \frac{x}{r}\gt0 \end{aligned} \)
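A quick numerical check of the identity above: computing decibels from the power ratio \((x/r)^2\) with \(10\log_{10}\) gives the same number as computing them from the amplitude ratio \(x/r\) with \(20\log_{10}\). The function names are hypothetical; this is a sketch, not a library API.

```python
import math

def db_from_amplitudes(x: float, r: float) -> float:
    """Decibels from the amplitude ratio x/r: 20*log10(x/r)."""
    if x / r <= 0:
        raise ValueError("x/r must be positive")
    return 20 * math.log10(x / r)

def db_from_powers(x: float, r: float) -> float:
    """Decibels from the power ratio (x/r)**2: 10*log10((x/r)**2)."""
    if x / r <= 0:
        raise ValueError("x/r must be positive")
    return 10 * math.log10((x / r) ** 2)

for ratio in (2, 10, 100):
    # Both columns agree: doubling the pressure adds about 6.02 dB,
    # ten times the pressure adds 20 dB.
    print(ratio, round(db_from_amplitudes(ratio, 1), 2), round(db_from_powers(ratio, 1), 2))
```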

There are two common choices for the reference level \(r\) used in dB measurements.

1. 20 μPa is the typical absolute threshold (i.e., the lowest audible pressure fluctuation) of a 1 kHz tone.

When this reference value is used, the units are dB SPL (sound pressure level).

\( \begin{aligned} \text{dB SPL} =20\log_{10}\left(\frac{x}{20\,\text{μPa}}\right) \end{aligned} \) where \(x\gt0\)

2. Instead of using the absolute threshold of a 1 kHz tone as the reference for all frequencies, the level of a tone can be measured relative to the typical absolute threshold for a tone at that same frequency.

When this approach is employed, the units are dB SL (sensation level).
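Because \(20\log_{10}(x/t) = 20\log_{10}(x/20\,\text{μPa}) - 20\log_{10}(t/20\,\text{μPa})\), where \(t\) is the threshold pressure at the tone's frequency, a level in dB SL is the level in dB SPL minus that threshold expressed in dB SPL. A minimal sketch; the helper names and the threshold value in the example are hypothetical, not measured data.

```python
import math

def db_spl(x_upa: float) -> float:
    """Level in dB SPL of pressure amplitude x [μPa], re 20 μPa."""
    return 20 * math.log10(x_upa / 20)

def db_sl(x_upa: float, threshold_db_spl: float) -> float:
    """Level in dB SL: dB SPL minus the absolute threshold (in dB SPL) at the tone's frequency.

    The threshold must be supplied by the caller (e.g. measured for the listener);
    it depends on the tone's frequency and on the listener.
    """
    return db_spl(x_upa) - threshold_db_spl

# Hypothetical example: a 60 dB SPL tone (2e4 μPa) at a frequency where the
# threshold is assumed to be 10 dB SPL lies 50 dB above threshold.
print(db_sl(2e4, threshold_db_spl=10.0))
```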

In speech analysis programs, amplitude may be expressed in dB relative to

  • the largest value that a sample in the digital speech waveform can take, in which case the amplitude values are negative numbers (0 dB at the maximum).

  • the smallest nonzero magnitude that a sample in the digital speech waveform can take, in which case the amplitude values are positive numbers.

These reference levels are used when one does not need to know the absolute dB SPL value of the signal.
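A sketch of these two digital reference levels, assuming 16-bit signed samples (largest magnitude 32767, smallest nonzero magnitude 1) and measuring the amplitude of a block of samples as its root-mean-square value. The constants, helper names, and test tone are assumptions for illustration, not the conventions of any particular speech analysis program.

```python
import numpy as np

# Assumptions for illustration: 16-bit signed PCM samples.
FULL_SCALE = 32767   # largest sample magnitude
SMALLEST_STEP = 1    # smallest nonzero sample magnitude (one quantization step)

def rms(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of a block of samples."""
    return float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))

def db_re_max(a: float) -> float:
    """dB relative to the largest sample value: 0 at full scale, negative below it."""
    return float(20 * np.log10(a / FULL_SCALE))

def db_re_min(a: float) -> float:
    """dB relative to the smallest nonzero sample value: positive for a > 1."""
    return float(20 * np.log10(a / SMALLEST_STEP))

# Hypothetical test signal: 1 s of a 440 Hz sine at one tenth of full scale, sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.round(0.1 * FULL_SCALE * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

a = rms(x)
print(round(db_re_max(a), 1), round(db_re_min(a), 1))
```

The two scales differ only by a constant offset, \(20\log_{10}(32767)\approx 90.3\) dB for 16-bit samples, so neither says anything about the absolute dB SPL of the recorded sound.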


fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(12,10));

x=np.linspace(2e1,2e8,1001)
y=20*np.log10(x/20)
ax1.plot(x,y);

for x,y in zip([2e1,2e2,2e3,2e4,2e5,2e6,2e7,2e8],
               [0.0e2,0.2e2,0.4e2,0.6e2,0.8e2,1.0e2,1.2e2,1.4e2]):
  ax1.scatter(x,y);

ax1.set_xlabel('Pressure [μPa]');
ax1.set_ylabel('Acoustic intensity [dB SPL]');
#ax1.set_title('Acoustic intensity [dB SPL] vs Pressure [μPa]');

x=np.linspace(2e1,2e6,1001)
y=20*np.log10(x/20)
ax2.plot(x,y);

for x,y in zip([2e1,2e2,2e3,2e4,2e5,2e6],
               [0.0e2,0.2e2,0.4e2,0.6e2,0.8e2,1.0e2]):
  ax2.scatter(x,y);

ax2.set_xlabel('Pressure [μPa]');
ax2.set_ylabel('Acoustic intensity [dB SPL]');
#ax2.set_title('Acoustic intensity [dB SPL] vs Pressure [μPa]');

x=np.linspace(2e1,2e8,1001)
y=20*np.log10(x/20)
ax3.plot(x,y);

for x,y in zip([2e1,2e2,2e3,2e4,2e5,2e6,2e7,2e8],
               [0.0e2,0.2e2,0.4e2,0.6e2,0.8e2,1.0e2,1.2e2,1.4e2]):
  ax3.scatter(x,y);

ax3.set_xlabel('Pressure [μPa]');
ax3.set_xscale('log');
ax3.set_ylabel('Acoustic intensity [dB SPL]');
#ax3.set_title('Acoustic intensity [dB SPL] vs Pressure [μPa]');

x=np.linspace(2e1,2e6,1001)
y=20*np.log10(x/20)
ax4.plot(x,y);

for x,y in zip([2e1,2e2,2e3,2e4,2e5,2e6],
               [0.0e2,0.2e2,0.4e2,0.6e2,0.8e2,1.0e2]):
  ax4.scatter(x,y);

ax4.set_xlabel('Pressure [μPa]');
ax4.set_xscale('log');
ax4.set_ylabel('Acoustic intensity [dB SPL]');
#ax4.set_title('Acoustic intensity [dB SPL] vs Pressure [μPa]');

fig.suptitle('Acoustic intensity [dB SPL] vs Pressure [μPa]');
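Figure: Acoustic intensity [dB SPL] vs Pressure [μPa]. Top row: linear pressure axis; bottom row: logarithmic pressure axis. Left column: up to \(2\times10^8\) μPa; right column: up to \(2\times10^6\) μPa.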

Sone Scale

\( \begin{aligned} \text{sone}=2^{\frac{\text{dB}-40}{10}} \end{aligned} \) where \(\text{dB}\ge40\)

The sone scale shows listeners’ judgments of relative loudness, scaled so that

  • a sound about as loud as a quiet office (\(2\times10^3\) μPa, i.e., 40 dB SPL) has a value of 1 sone

  • a sound that is subjectively half as loud as that has a value of one half

  • a sound that is subjectively twice as loud as that has a value of 2
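Under the sone formula, every 10 dB increase doubles the loudness, and the formula can be inverted as \(\text{dB} = 40 + 10\log_2(\text{sone})\). A minimal sketch; `db_for_sone` is a hypothetical helper name, and both functions are restricted to the range where the page's formula applies (dB ≥ 40, i.e., sone ≥ 1).

```python
import math

def sone(db: float) -> float:
    """Perceived loudness [sones] of a sound at db [dB SPL], for db >= 40."""
    if db < 40:
        raise ValueError("the sone formula is used here only for dB >= 40")
    return 2 ** ((db - 40) / 10)

def db_for_sone(s: float) -> float:
    """Hypothetical inverse of sone(): the level [dB SPL] giving loudness s, for s >= 1."""
    if s < 1:
        raise ValueError("only defined here for s >= 1 (i.e. dB >= 40)")
    return 40 + 10 * math.log2(s)

for db in (40, 50, 60, 70):
    print(db, sone(db))   # 1.0, 2.0, 4.0, 8.0 sones: each +10 dB doubles the loudness

print(db_for_sone(2.0))   # 50.0: twice as loud as a quiet office
```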

x=np.linspace(40,140,1001)
y=2**((x-40)/10)
plt.plot(x,y);

for x,y in zip([0.4e2,0.6e2,0.8e2,1.0e2,1.2e2,1.4e2],
               [2**0,2**2,2**4,2**6,2**8,2**10]):
  plt.scatter(x,y);

plt.xlabel('Acoustic intensity [dB SPL]');
plt.ylabel('Perceived loudness [sones]');
plt.title('Perceived loudness [sones] vs Acoustic intensity [dB SPL]');
Figure: Perceived loudness [sones] vs Acoustic intensity [dB SPL], from 40 to 140 dB SPL.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(12,10))

x=np.linspace(2e1,2e8,1001)
y=20*np.log10(x/20)
y=2**((y-40)/10)
ax1.plot(x,y);

for x,y in zip([2e3,2e4,2e5,2e6,2e7,2e8],
               [2**0,2**2,2**4,2**6,2**8,2**10]):
  ax1.scatter(x,y);

ax1.set_xlabel('Pressure [μPa]');
ax1.set_ylabel('Perceived loudness [sones]');
#ax1.set_title('Perceived loudness [sones] vs Pressure [μPa]');
ax1.axvline(0.25*2e8);
ax1.axvline(0.50*2e8);
ax1.axvline(0.75*2e8);
ax1.axvline(1.00*2e8);

x=np.linspace(2e1,2e6,1001)
y=20*np.log10(x/20)
y=2**((y-40)/10)
ax2.plot(x,y);

for x,y in zip([2e3,2e4,2e5,2e6],
               [2**0,2**2,2**4,2**6]):
  ax2.scatter(x,y);

ax2.set_xlabel('Pressure [μPa]');
ax2.set_ylabel('Perceived loudness [sones]');
#ax2.set_title('Perceived loudness [sones] vs Pressure [μPa]');
ax2.axvline(0.25*2e6);
ax2.axvline(0.50*2e6);
ax2.axvline(0.75*2e6);
ax2.axvline(1.00*2e6);

x=np.linspace(2e1,2e8,1001)
y=20*np.log10(x/20)
y=2**((y-40)/10)
ax3.plot(x,y);

for x,y in zip([2e3,2e4,2e5,2e6,2e7,2e8],
               [2**0,2**2,2**4,2**6,2**8,2**10]):
  ax3.scatter(x,y);

ax3.set_xlabel('Pressure [μPa]');
ax3.set_xscale('log');
ax3.set_ylabel('Perceived loudness [sones]');
#ax3.set_title('Perceived loudness [sones] vs Pressure [μPa]');
ax3.axvline(0.25*2e8);
ax3.axvline(0.50*2e8);
ax3.axvline(0.75*2e8);
ax3.axvline(1.00*2e8);

x=np.linspace(2e1,2e6,1001)
y=20*np.log10(x/20)
y=2**((y-40)/10)
ax4.plot(x,y);

for x,y in zip([2e3,2e4,2e5,2e6],
               [2**0,2**2,2**4,2**6]):
  ax4.scatter(x,y);

ax4.set_xlabel('Pressure [μPa]');
ax4.set_xscale('log');
ax4.set_ylabel('Perceived loudness [sones]');
#ax4.set_title('Perceived loudness [sones] vs Pressure [μPa]');
ax4.axvline(0.25*2e6);
ax4.axvline(0.50*2e6);
ax4.axvline(0.75*2e6);
ax4.axvline(1.00*2e6);

fig.suptitle('Perceived loudness [sones] vs Pressure [μPa]');
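Figure: Perceived loudness [sones] vs Pressure [μPa]. Top row: linear pressure axis; bottom row: logarithmic pressure axis. Left column: up to \(2\times10^8\) μPa; right column: up to \(2\times10^6\) μPa. Vertical lines mark quarters of each panel's pressure range.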

This graph shows that

  • for soft sounds, small changes in pressure result in large changes in perceived loudness

  • for loud sounds, large changes in pressure result in small changes in perceived loudness
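The compression can be made explicit by substituting the dB SPL formula into the sone formula, with \(x\) in μPa and \(x \ge 2\times10^3\) μPa (so that dB ≥ 40):

\( \text{sone} = 2^{\frac{20\log_{10}(x/20)-40}{10}} = 2^{2\log_{10}(x/2000)} = \left(\frac{x}{2000}\right)^{2\log_{10}2} \approx \left(\frac{x}{2000}\right)^{0.6} \)

Loudness is thus a power of pressure with an exponent below 1: the loudness ratio depends only on the pressure ratio, and a fixed pressure increment is a smaller proportion of a larger pressure. The sketch below checks this algebra numerically; the function names are hypothetical.

```python
import math

EXPONENT = 2 * math.log10(2)   # about 0.602

def sone_from_pressure(x: float) -> float:
    """Sones for pressure x [μPa] via the combined formula (x/2000)**(2*log10(2)), x >= 2000."""
    return (x / 2000) ** EXPONENT

def sone_via_db(x: float) -> float:
    """The same value computed in two steps: dB SPL first, then the sone formula."""
    db = 20 * math.log10(x / 20)
    return 2 ** ((db - 40) / 10)

for x in (2e3, 1e5, 2e5, 2e6, 2.1e6):
    print(f"{x:9.3g} μPa  {sone_from_pressure(x):8.3f}  {sone_via_db(x):8.3f}")

# Loudness ratios from pressure ratios alone:
print((2e5 / 1e5) ** EXPONENT)     # about 1.518: doubling the pressure
print((2.1e6 / 2e6) ** EXPONENT)   # about 1.030: a 5% pressure increase
```

The last two lines reproduce the ratios computed in the two worked examples that follow, without going through decibels.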

At \(10^5\) μPa, how does the perceived loudness change when the sound pressure is increased by \(10^5\) μPa?

The perceived loudness increases by about 50%.

print(sone(dBSPL(1e5)))
print(sone(dBSPL(2e5)))
print(sone(dBSPL(2e5))-sone(dBSPL(1e5)))
print(sone(dBSPL(2e5))/sone(dBSPL(1e5)))
10.54100128020249
16.0
5.45899871979751
1.5178823694908652

At \(2\times10^6\) μPa, how does the perceived loudness change when the sound pressure is increased by \(10^5\) μPa?

The perceived loudness increases by about 3%.

print(sone(dBSPL(2e6)))
print(sone(dBSPL(2e6+1e5)))
print(sone(dBSPL(2e6+1e5))-sone(dBSPL(2e6)))
print(sone(dBSPL(2e6+1e5))/sone(dBSPL(2e6)))
64.0
65.90785888988565
1.9078588898856452
1.0298102951544632

Perceived loudness varies as a function of frequency.

The human auditory system is most sensitive to sounds that have frequencies between 2 and 5 kHz.

Sensitivity falls quickly above 10 kHz.


Bark Scale

The auditory system (namely, the basilar membrane of the cochlea) performs a Fourier analysis of incoming sounds.

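However, the auditory system’s frequency scale is not linear: equal steps in acoustic frequency do not correspond to equal steps in auditory frequency.

The bark scale is approximately proportional to a scale of perceived pitch called the mel scale, and to distance along the basilar membrane.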

\( \begin{aligned} \text{bark}=6\sinh^{-1}\left(\frac{f}{600}\right) \end{aligned} \)
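Solving the bark formula for \(f\) gives the inverse \(f = 600\sinh(\text{bark}/6)\), which is handy for finding the acoustic frequencies that lie at equal steps in barks. A minimal sketch; `hz_from_bark` is a hypothetical helper name. Since `np.arcsinh` and `np.sinh` operate elementwise, both functions accept arrays directly.

```python
import numpy as np

def bark(freq_hz):
    """Acoustic frequency [Hz] -> auditory frequency [barks]: 6*asinh(f/600)."""
    return 6 * np.arcsinh(np.asarray(freq_hz, dtype=float) / 600)

def hz_from_bark(z):
    """Inverse of bark(): auditory frequency [barks] -> acoustic frequency [Hz]: 600*sinh(z/6)."""
    return 600 * np.sinh(np.asarray(z, dtype=float) / 6)

freqs = np.array([100.0, 500.0, 1000.0, 2000.0, 5000.0, 10000.0])
z = bark(freqs)
print(np.round(z, 2))                 # barks for each frequency
print(np.round(hz_from_bark(z), 6))   # round trip recovers the original frequencies

# Acoustic frequencies at 1, 2, ..., 10 barks: the gaps widen as frequency rises.
print(np.round(hz_from_bark(np.arange(1, 11))))
```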

x=np.linspace(1,1e4,1001)
vbark=np.vectorize(bark)
y=vbark(x)

plt.plot(x,y);
plt.xlabel('Acoustic Frequency [kHz]');
plt.xticks(ticks=[int(n*1e4) for n in np.arange(0,1.1,0.1)],labels=range(0,11));
plt.ylabel('Auditory Frequency [barks]');
plt.title('Auditory Frequency vs Acoustic Frequency');
Figure: Auditory Frequency [barks] vs Acoustic Frequency [kHz], from 0 to 10 kHz.

This graph shows that the auditory system is more sensitive to changes in acoustic frequency at the low end of the audible frequency range than at the high end.
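Differentiating the bark formula makes this precise: \(\frac{d\,\text{bark}}{df} = \frac{1}{100\sqrt{1+(f/600)^2}}\) barks per Hz, which shrinks as \(f\) grows; its reciprocal is roughly the number of hertz spanned by one bark near \(f\). A minimal sketch with a hypothetical helper name.

```python
import numpy as np

def barks_per_hz(f):
    """Slope of bark(f) = 6*asinh(f/600), i.e. 1 / (100*sqrt(1 + (f/600)**2)) barks per Hz."""
    f = np.asarray(f, dtype=float)
    return 1 / (100 * np.sqrt(1 + (f / 600) ** 2))

for f in (100, 500, 1000, 5000, 10000):
    slope = barks_per_hz(f)
    # 1/slope is roughly how many Hz one bark spans around f
    print(f"{f:>6} Hz: {slope:.5f} barks/Hz, about {1 / slope:.0f} Hz per bark")
```

Around 500 Hz one bark spans roughly 130 Hz, while around 5 kHz it spans roughly 840 Hz, which is why the same 0.5 kHz step changes the auditory frequency so much more at the low end.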

At 0.5 kHz, how does the auditory frequency change when the acoustic frequency increases by 0.5 kHz?

The auditory frequency increases by about 70%.

print(bark(0.5e3))
print(bark(1.0e3))
print(bark(1.0e3)-bark(0.5e3))
print(bark(1.0e3)/bark(0.5e3))
4.550916823162454
7.702773976459156
3.151857153296702
1.692576304021848

At 5 kHz, how does the auditory frequency change when the acoustic frequency increases by 0.5 kHz?

The auditory frequency increases by about 3%.

print(bark(5.0e3))
print(bark(5.5e3))
print(bark(5.5e3)-bark(5.0e3))
print(bark(5.5e3)/bark(5.0e3))
16.901948584952663
17.470097475009343
0.5681488900566798
1.033614401747884

Figures

  • [W] Bell, Alexander Graham (1847-1922)

  • [W] Fletcher, Harvey (1884-1981)


Terms

  • [W] Absolute Threshold of Hearing

  • [W] Acoustic Engineering

  • [W] Acoustic Impedance

  • [W] Acoustic Intensity (Sound Intensity)

  • [W] Acoustic Power (Sound Power)

  • [W] Acoustic Pressure (Sound Pressure)

  • [W] Acoustics

  • [W] Anechoic Chamber

  • [W] Audiogram

  • [W] Auditory Model

  • [W] Auditory Phonetics

  • [W] Auditory Spectrogram (Cochleagram)

  • [W] Auditory Spectrum

  • [W] Auditory System

  • [W] Bark Scale

  • [W] Bel

  • [W] Cent

  • [W] Cochlea

  • [W] Cochleagram (Auditory Spectrogram)

  • [W] Computational Auditory Scene Analysis

  • [W] Correlogram

  • [W] Decibel

  • [W] Ear

  • [W] Eigenfrequency (Natural Frequency)

  • [W] Equal-Loudness Contour

  • [W] Frequency Response

  • [W] Hearing Level

  • [W] Hearing Range

  • [W] Incus

  • [W] Inner Ear

  • [W] Logarithmic Scale

  • [W] Loudness

  • [W] Loudness Compensation

  • [W] Malleus

  • [W] Mel Scale

  • [W] Multidimensional Scaling (MDS)

  • [W] Natural Frequency (Eigenfrequency)

  • [W] Neper

  • [W] Normal Mode

  • [W] Ossicles

  • [W] Outer Ear

  • [W] Pascal

  • [W] Perilymph

  • [W] Peripheral Auditory System

  • [W] Phon

  • [W] Pitch

  • [W] Psychoacoustics

  • [W] Psychophysics

  • [W] Resonance

  • [W] Root Mean Square

  • [W] Sone

  • [W] Sound

  • [W] Sound Energy

  • [W] Sound Energy Density

  • [W] Sound Intensity (Acoustic Intensity)

  • [W] Sound Power (Acoustic Power)

  • [W] Sound Pressure (Acoustic Pressure)

  • [W] Speed of Sound

  • [W] Stapes

  • [W] Threshold of Pain

  • [W] Voice Onset Time (VOT)


Bibliography

Johnson, Keith. (2012). Acoustic and Auditory Phonetics. 3rd Ed. Wiley-Blackwell.