Acoustic and auditory properties of speech sounds #

[6]

LING 497 Phonetic Analysis: Articulation, Acoustics, Audition

The Pennsylvania State University

Prof. Deborah Morton

Revised

31 May 2023

Programming Environment #

'R version 4.3.0 (2023-04-21)'

'/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library'

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ dplyr::combine()    masks gridExtra::combine()
✖ purrr::discard()    masks scales::discard()
✖ tidyr::extract()    masks magrittr::extract()
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::lag()        masks stats::lag()
✖ purrr::set_names()  masks magrittr::set_names()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Zürich German Vowels #

F1, F2, F3 observations #

A tibble: 17 × 4
vowel	F1	F2	F3
<chr>	<dbl>	<dbl>	<dbl>
i	338.1619	2311.1416	3176.603
ə	480.7557	1652.5341	2390.098
e	424.6613	2151.9277	2750.249
ə	538.1591	1728.5344	2455.532
ɛ	463.3875	1729.5033	2380.507
æ	598.9991	1485.5933	2374.373
y	302.9121	1627.8432	2094.167
i	329.8811	2399.5911	3370.156
ø	353.5354	1654.6924	2219.495
ə	446.4217	1762.7513	2305.845
œ	556.2411	1437.2258	2492.393
u	321.5930	684.0990	2599.445
o	449.0586	642.1006	2626.705
e	548.7221	1450.5669	2537.659
ɒ	713.4208	1564.5015	3090.859
ɛ	554.6995	1966.4626	2496.108
ə	596.0809	1591.8670	2502.881

F1, F2, F3 averages #

# Average each formant over all tokens of the same vowel
zurich_avg <- zurich_obs %>%
  group_by(vowel) %>%
  summarize(F1 = mean(F1), F2 = mean(F2), F3 = mean(F3))
zurich_avg

A tibble: 11 × 4
vowel	F1	F2	F3
<chr>	<dbl>	<dbl>	<dbl>
e	486.6917	1801.2473	2643.954
i	334.0215	2355.3664	3273.379
o	449.0586	642.1006	2626.705
u	321.5930	684.0990	2599.445
y	302.9121	1627.8432	2094.167
æ	598.9991	1485.5933	2374.373
ø	353.5354	1654.6924	2219.495
œ	556.2411	1437.2258	2492.393
ɒ	713.4208	1564.5015	3090.859
ə	515.3543	1683.9217	2413.589
ɛ	509.0435	1847.9829	2438.308

Visualization of the F1-F2 acoustic vowel space #

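# Notebook figure size, in inches
options(repr.plot.width=10, repr.plot.height=10)

# Plot each vowel symbol at its mean (F2, F1). Both axes are reversed so the
# layout mirrors the articulatory vowel chart: front vowels on the left,
# close vowels at the top.
plt <- ggplot(zurich_avg, aes(x=F2, y=F1, label=vowel, color=vowel)) +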
  geom_text(size=10) +
  scale_x_reverse(
      position='bottom',
      breaks=seq(0, 3000, 200)) +
  scale_y_reverse(
      position='left',
      breaks=seq(0, 1000, 100)) +
  labs(
    x='F2 [Hz]\n',
    y='F1 [Hz]\n',
    title='Zurich German [F1 vs F2]') +
  theme(
    legend.position='none',
    plot.title=element_text(hjust=0.5),
    text=element_text(size=20)
  )

suppressWarnings(print(plt))

[Figure: Zurich German vowels plotted in the F1-F2 acoustic space]

Analysis #

[1] Which pairs of vowels lie close together in the plot and are therefore potentially confusable? (Identify at least three pairs.)

the near-open front unrounded vowel [æ] and the open-mid front rounded vowel [œ]
the close-mid front unrounded vowel [e] and the open-mid front unrounded vowel [ɛ]
the close-mid front rounded vowel [ø] and the close front rounded vowel [y]

[2] Examine F3 for each pair of vowels. Do you think F3 might help speakers distinguish the vowel pairs from one another? Why or why not?

A tibble: 6 × 2
vowel	F3
<chr>	<dbl>
e	2643.954
y	2094.167
æ	2374.373
ø	2219.495
œ	2492.393
ɛ	2438.308
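One way to approach question [2] is to put a number on the F3 separation within each pair. The following is a minimal sketch in base R, assuming the mean F3 values from the table above and the three pairings listed under question [1]; the names `f3` and `pairs` are illustrative and do not come from the original notebook.

```r
# Mean F3 (Hz) copied from the averages table above
f3 <- c("e" = 2643.954, "y" = 2094.167, "æ" = 2374.373,
        "ø" = 2219.495, "œ" = 2492.393, "ɛ" = 2438.308)

# Candidate confusable pairs from question [1] (illustrative pairing)
pairs <- data.frame(a = c("æ", "e", "ø"),
                    b = c("œ", "ɛ", "y"),
                    stringsAsFactors = FALSE)

# Absolute F3 difference within each pair, in Hz
pairs$F3_diff_Hz <- unname(abs(f3[pairs$a] - f3[pairs$b]))
print(pairs)
```

With these averages the differences come out at about 118 Hz for [æ]/[œ], 206 Hz for [e]/[ɛ], and 125 Hz for [ø]/[y]. Whether differences of that size are perceptually useful is the point the question asks you to argue.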

[3] Do you think any other acoustic cues underlie the observed vowel differences? If so, which ones?

[4] Do you think an auditory plot would show differences not seen in the acoustic plot? Why or why not?
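To explore question [4], formant values in Hz can be converted to an auditory scale before plotting. The sketch below assumes the Bark approximation of Traunmüller (1990), z = 26.81 f / (1960 + f) - 0.53, without its corrections at the low and high ends of the scale. Other auditory scales (mel, ERB) would serve equally well; the function name `hz_to_bark` and the four-vowel subset are illustrative choices, not part of the original notebook.

```r
# Convert frequency in Hz to Bark (Traunmüller 1990 approximation,
# end corrections omitted; this choice of formula is an assumption)
hz_to_bark <- function(f_hz) {
  26.81 * f_hz / (1960 + f_hz) - 0.53
}

# A few mean formant values (Hz) copied from the averages table above
vowels <- data.frame(
  vowel = c("i", "u", "æ", "œ"),
  F1 = c(334.0215, 321.5930, 598.9991, 556.2411),
  F2 = c(2355.3664, 684.0990, 1485.5933, 1437.2258),
  stringsAsFactors = FALSE
)

# Add auditory-scale columns alongside the acoustic ones
vowels$F1_bark <- hz_to_bark(vowels$F1)
vowels$F2_bark <- hz_to_bark(vowels$F2)
print(vowels)
```

Because the Bark scale compresses high frequencies more than low ones, a given difference in Hz counts for less in the F2 range than in the F1 range. Plotting `F2_bark` against `F1_bark` with the same reversed axes would show how much that changes the spacing of the vowels.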

Quechua Stops #

Salasaca Quechua has a three-way contrast in stop phonemes between voiceless, voiceless aspirated, and voiced stops.

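Measure the voice onset time (VOT) of each stop. Values are in seconds; negative values indicate voicing that begins before the release.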

[p]

[ph]

[b]

[t]

[th]

[d]

[k]

[kh]

[g]

VOT observations #

A tibble: 24 × 3
token	stop	vot
<chr>	<chr>	<dbl>
pungu	p	0.015893
pungu	g	0.033120
patsuk	p	0.009725
sipu	p	0.021879
phaki	ph	0.039716
phaki	k	0.035454
bunga	b	-0.132969
bunga	g	0.021412
wasibi	b	0.000000
taki	t	0.019589
taki	k	0.036247
tuta	t	0.020096
tuta	t	0.012146
thuktu	th	0.032266
thuktu	t	0.015750
dali	d	-0.105084
tshida	d	0.000000
kushni	k	0.035806
wajku	k	0.025784
khata	kh	0.053653
khata	t	0.014999
gan	g	-0.165850
tawga	t	0.015944
tawga	g	0.000000

VOT averages #

A tibble: 9 × 2
stop	vot
<chr>	<dbl>
b	-0.06648450
d	-0.05254200
g	-0.02782950
k	0.03332275
kh	0.05365300
p	0.01583233
ph	0.03971600
t	0.01642067
th	0.03226600
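The averages above can be reproduced from the observations with a grouped mean. The following is a minimal sketch in base R, assuming the 24 VOT values are in seconds and are copied in table order; the names `vot_obs`, `vot_avg`, and `series`, and the grouping of stops into three series, are illustrative additions rather than the notebook's own code.

```r
# VOT observations (seconds), copied row by row from the table above
vot_obs <- data.frame(
  stop = c("p", "g", "p", "p", "ph", "k", "b", "g", "b", "t", "k", "t",
           "t", "th", "t", "d", "d", "k", "k", "kh", "t", "g", "t", "g"),
  vot  = c(0.015893, 0.033120, 0.009725, 0.021879, 0.039716, 0.035454,
           -0.132969, 0.021412, 0.000000, 0.019589, 0.036247, 0.020096,
           0.012146, 0.032266, 0.015750, -0.105084, 0.000000, 0.035806,
           0.025784, 0.053653, 0.014999, -0.165850, 0.015944, 0.000000),
  stringsAsFactors = FALSE
)

# Mean VOT per stop, with a millisecond column for readability
vot_avg <- aggregate(vot ~ stop, data = vot_obs, FUN = mean)
vot_avg$vot_ms <- round(vot_avg$vot * 1000, 1)
print(vot_avg)

# Mean VOT per laryngeal series (the grouping is an assumption based on
# the transcription: b/d/g voiced, ph/th/kh aspirated, p/t/k voiceless)
vot_obs$series <- ifelse(vot_obs$stop %in% c("b", "d", "g"), "voiced",
                  ifelse(vot_obs$stop %in% c("ph", "th", "kh"), "aspirated",
                         "voiceless"))
print(aggregate(vot ~ series, data = vot_obs, FUN = mean))
```

Looking at the individual tokens as well as the means is worthwhile here: several stops transcribed as voiced have a VOT of zero or above, so the per-series means hide considerable overlap.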

Analysis #

[1] If you only saw the VOT patterns without any transcription, would you think there was a three-way contrast? Why or why not?

[2] Do you think that any acoustic cues other than VOT play a role in these contrasts?

Korean Fricatives #

Examine the phonetic correlates of a cross-linguistically unusual voiceless alveolar fricative contrast in Korean:

[sʰ] lenis aspirated fricative

[ss] fortis fricative

Which acoustic cues appear to be relevant to speaker-listeners’ perception in distinguishing the fricatives? Which ones don’t?

Acoustic cue measurements #

A tibble: 2 × 10
token	duration_s	F0_transition_Hz	spectral_peak_Hz	spectral_peak_dB	H1_Hz	H1_dB	H2_Hz	H2_dB	H1_minus_H2_dB
<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
[sʰ]	0.208705	125	4347.11	28.5	235.19	43.6	841.14	39.6	4.0
[ss]	0.213125	128	4449.29	36.6	226.67	46.2	834.28	46.1	0.1
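To see which cues separate the two fricatives, it helps to lay out the difference for each measurement side by side. The sketch below is a hedged illustration in base R using the values from the table above. It assumes the duration values are in seconds (a duration of about 0.21 is implausible in milliseconds), and the names `fric` and `diffs` are illustrative only.

```r
# Measurements copied from the table above; duration assumed to be seconds
fric <- data.frame(
  token = c("[sʰ]", "[ss]"),
  duration_s = c(0.208705, 0.213125),
  F0_transition_Hz = c(125, 128),
  spectral_peak_Hz = c(4347.11, 4449.29),
  spectral_peak_dB = c(28.5, 36.6),
  H1_dB = c(43.6, 46.2),
  H2_dB = c(39.6, 46.1),
  stringsAsFactors = FALSE
)

# H1-H2 recomputed from the two harmonic amplitudes
fric$H1_minus_H2_dB <- fric$H1_dB - fric$H2_dB

# One row per cue: lenis value, fortis value, and their difference
cues <- setdiff(names(fric), "token")
diffs <- data.frame(
  cue = cues,
  lenis = unname(unlist(fric[1, cues])),
  fortis = unname(unlist(fric[2, cues])),
  stringsAsFactors = FALSE
)
diffs$fortis_minus_lenis <- diffs$fortis - diffs$lenis
print(diffs)
```

With one token per category, these differences are only suggestive. Duration and the F0 transition differ very little, while the spectral peak amplitude and H1-H2 differ by several dB.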

Central Arrernte Nasals #

Central Arrernte is an Aboriginal Australian language with nasal phonemes at many places of articulation.

Examine the formant and antiformant properties of each nasal. (Note: A final epenthetic vowel [a] may be heard at the end of consonant-final words pronounced in isolation, but it is not included in the transcription.)

[m] bilabial

[ŋ] velar

[n̪] dental

[n] apicoalveolar

[ɲ] palatal

[ɳ] retroflex

F1, F2, F3 observations #

A tibble: 7 × 5
token	nasal	F1_Hz	F2_Hz	F3_Hz
<chr>	<chr>	<dbl>	<dbl>	<dbl>
[aməŋ]	m	285.2174	1542.619	2431.580
[aməŋ]	ŋ	405.2672	2535.106	2605.047
[an̪ək]	n̪	280.2020	1689.419	2615.093
[aɲək]	ɲ	329.7587	2379.265	3247.250
[anək]	n	322.6723	1699.682	2534.887
[aɳək]	ɳ	349.1756	2380.279	4208.140
[aŋək]	ŋ	344.2634	1173.231	2555.466

F1, F2, F3 averages #

A tibble: 6 × 4
nasal	F1_Hz	F2_Hz	F3_Hz
<chr>	<dbl>	<dbl>	<dbl>
m	285.2174	1542.619	2431.580
n	322.6723	1699.682	2534.887
n̪	280.2020	1689.419	2615.093
ŋ	374.7653	1854.169	2580.257
ɲ	329.7587	2379.265	3247.250
ɳ	349.1756	2380.279	4208.140
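The per-nasal averages can be recomputed from the seven observations, and counting tokens per nasal shows how much weight each average can bear. This is a minimal base R sketch with the values copied from the observations table; `nasal_obs`, `nasal_avg`, and `n_tokens` are illustrative names, not the notebook's own.

```r
# Formant observations (Hz) copied from the table above
nasal_obs <- data.frame(
  nasal = c("m", "ŋ", "n̪", "ɲ", "n", "ɳ", "ŋ"),
  F1_Hz = c(285.2174, 405.2672, 280.2020, 329.7587, 322.6723, 349.1756, 344.2634),
  F2_Hz = c(1542.619, 2535.106, 1689.419, 2379.265, 1699.682, 2380.279, 1173.231),
  F3_Hz = c(2431.580, 2605.047, 2615.093, 3247.250, 2534.887, 4208.140, 2555.466),
  stringsAsFactors = FALSE
)

# Mean F1, F2, F3 per nasal
nasal_avg <- aggregate(cbind(F1_Hz, F2_Hz, F3_Hz) ~ nasal,
                       data = nasal_obs, FUN = mean)

# Number of tokens behind each mean
nasal_avg$n_tokens <- as.vector(table(nasal_obs$nasal)[nasal_avg$nasal])
print(nasal_avg)
```

Only [ŋ] is measured in two tokens, and their F2 values (about 2535 Hz and 1173 Hz) are far apart, so its averaged F2 should be read with caution.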

Terms #

[W] Arrernte
[W] Fortis
[W] Korean phonology
[W] Lenis
[W] Morphophonology
[W] Open Front Rounded Vowel
[W] Quechua
[W] Salasaca
[W] Sandhi
[W] Tone Sandhi
[W] Zürich German

Bibliography #

Breen & Dobson. (2005). [central arrernte nasals].
Fleischer, Jürg & Stephan Schmid. (2006). “Zurich German”. Journal of the International Phonetic Association.
Lee. (1999). [korean fricatives].
Masaquiza & Marlett. (2008). [quechua stops].
