A02 - Graphical Exploration



Revised: 21 May 2023


Programming Environment

packages <- c(
  'COUNT',
  'dcData',
  'esquisse',
  'mosaic',
  'rmarkdown',
  'tidytuesdayR',
  'tidyverse'
)

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Load packages
invisible(lapply(packages, library, character.only = TRUE))

sessionInfo()
Loading required package: msme
Loading required package: MASS
Loading required package: lattice
Loading required package: sandwich
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2
The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.
Attaching package: ‘mosaic’
The following objects are masked from ‘package:dplyr’:

    count, do, tally
The following object is masked from ‘package:Matrix’:

    mean
The following object is masked from ‘package:ggplot2’:

    stat
The following objects are masked from ‘package:stats’:

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var
The following objects are masked from ‘package:base’:

    max, mean, min, prod, range, sample, sum
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
 forcats   1.0.0      stringr   1.5.0
 lubridate 1.9.2      tibble    3.2.1
 purrr     1.0.2      tidyr     1.3.0
 readr     2.1.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 mosaic::count() masks dplyr::count()
 purrr::cross()  masks mosaic::cross()
 mosaic::do()    masks dplyr::do()
 tidyr::expand() masks Matrix::expand()
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 tidyr::pack()   masks Matrix::pack()
 dplyr::select() masks MASS::select()
 mosaic::stat()  masks ggplot2::stat()
 mosaic::tally() masks dplyr::tally()
 tidyr::unpack() masks Matrix::unpack()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.2    forcats_1.0.0      stringr_1.5.0      purrr_1.0.2       
 [5] readr_2.1.4        tidyr_1.3.0        tibble_3.2.1       tidyverse_2.0.0   
 [9] tidytuesdayR_1.0.2 rmarkdown_2.22     mosaic_1.8.4.2     mosaicData_0.20.3 
[13] ggformula_0.10.4   dplyr_1.1.2        Matrix_1.5-4       ggplot2_3.4.3     
[17] esquisse_1.1.2     dcData_0.1.0       COUNT_1.3.4        sandwich_3.0-2    
[21] msme_0.5.3         lattice_0.21-8     MASS_7.3-58.4     

loaded via a namespace (and not attached):
 [1] writexl_1.4.2       tidyselect_1.2.0    IRdisplay_1.1      
 [4] farver_2.1.1        fastmap_1.1.1       tweenr_2.0.2       
 [7] promises_1.2.0.1    labelled_2.11.0     digest_0.6.31      
[10] timechange_0.2.0    mime_0.12           lifecycle_1.0.3    
[13] ellipsis_0.3.2      magrittr_2.0.3      compiler_4.3.0     
[16] rlang_1.1.1         sass_0.4.6          tools_4.3.0        
[19] utf8_1.2.3          data.table_1.14.8   knitr_1.43         
[22] htmlwidgets_1.6.2   curl_5.0.2          shinybusy_0.3.1    
[25] ggstance_0.3.6      xml2_1.3.4          repr_1.1.6         
[28] pbdZMQ_0.3-9        foreign_0.8-84      withr_2.5.0        
[31] shinyWidgets_0.7.6  grid_4.3.0          polyclip_1.10-4    
[34] mosaicCore_0.9.2.1  fansi_1.0.4         xtable_1.8-4       
[37] colorspace_2.1-0    scales_1.2.1        ggridges_0.5.4     
[40] cli_3.6.1           crayon_1.5.2        datamods_1.4.0     
[43] generics_0.1.3      rstudioapi_0.15.0   tzdb_0.4.0         
[46] httr_1.4.6          readxl_1.4.2        cachem_1.0.8       
[49] ggforce_0.4.1       rvest_1.0.3         cellranger_1.1.0   
[52] base64enc_0.1-3     vctrs_0.6.3         jsonlite_1.8.5     
[55] hms_1.1.3           jquerylib_0.1.4     rio_0.5.29         
[58] glue_1.6.2          stringi_1.7.12      gtable_0.3.3       
[61] later_1.3.1         munsell_0.5.0       pillar_1.9.0       
[64] htmltools_0.5.5     IRkernel_1.3.2      reactable_0.4.4    
[67] R6_2.5.1            evaluate_0.21       shiny_1.7.4        
[70] haven_2.5.2         openxlsx_4.2.5.2    httpuv_1.6.11      
[73] bslib_0.5.0         phosphoricons_0.1.2 Rcpp_1.0.10        
[76] zip_2.3.0           uuid_1.1-0          xfun_0.39          
[79] fs_1.6.2            usethis_2.1.6       zoo_1.8-12         
[82] pkgconfig_2.0.3    

glyph (mark, symbol) - the basic graphical unit, often corresponding to a case

  • e.g., a point in a scatter plot, a bar in a bar chart, the curve in a density plot
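To make the glyph/case correspondence concrete, here is a minimal sketch using ggplot2 and the built-in iris data. The object names `p` and `built` are illustrative only; the point is that a point layer draws one glyph per row of the data.

```r
library(ggplot2)

# One point glyph per case (row) of iris
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# layer_data() returns the data ggplot2 computed for a layer, one row per glyph
built <- layer_data(p, 1)

# Compare the number of glyphs with the number of cases
nrow(built) == nrow(iris)
```

Not every glyph corresponds to a single case: a histogram bar, for instance, summarizes many cases at once.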

aesthetic - a visual property of a glyph (e.g., position, size, shape, color, etc.)

  • may be mapped, based on the data values (e.g., sex -> color)

  • may be set, i.e., fixed to an arbitrary non-data value (e.g., color = 'blue')
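The difference between a mapped and a set aesthetic can be sketched in ggplot2 as follows. This is an illustration on the built-in iris data, with Species standing in for the "sex -> color" example; the names `p_mapped` and `p_set` are hypothetical.

```r
library(ggplot2)

# Mapped: color depends on a data value (Species), so a scale and legend are created
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species))

# Set: color is a constant given outside aes(), so no scale or legend is involved
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "blue")

# Colors actually assigned to the glyphs in each plot
table(layer_data(p_mapped, 1)$colour)
unique(layer_data(p_set, 1)$colour)
```

A common slip is writing `aes(color = "blue")`: that maps the constant string "blue" through the color scale, so the points take the first default palette color and a legend appears.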

scale - a mapping that translates data values to aesthetics

frame - the position scale, describing how data are mapped to the coordinate system

  • What are the axis limits?

  • What kind of scale?: linear, logarithmic, etc.
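The two frame questions above (axis limits and kind of scale) correspond to separate ggplot2 controls. A minimal sketch on the diamonds data, assuming the standard `scale_x_log10()`, `scale_y_log10()`, and `coord_cartesian()` functions; the object names are illustrative.

```r
library(ggplot2)

# Default frame: linear position scales, limits taken from the data
p_linear <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)

# Kind of scale: make both position scales logarithmic
p_log <- p_linear + scale_x_log10() + scale_y_log10()

# Axis limits: zoom the frame to 0-3 carats without dropping any cases
p_zoom <- p_linear + coord_cartesian(xlim = c(0, 3))
```

Setting limits through `coord_cartesian()` only changes the visible window, whereas limits set on a scale remove out-of-range cases before any stat is computed.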

guide - an axis or legend that helps the reader translate aesthetics back to data values

  • axis ticks, axis labels; legend; labels
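In ggplot2, guides are drawn automatically from the scales, and their text can be adjusted with `labs()`. The sketch below is one way to do it, assuming the iris measurements are in centimeters as documented in `?iris`; the label strings are my own choices.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  # Titles for the two axis guides and for the color legend
  labs(x = "Sepal length (cm)",
       y = "Petal length (cm)",
       color = "Species") +
  # Position of the legend guide relative to the frame
  theme(legend.position = "bottom")
```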

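facet - a panel showing a subset of the data, with one panel per level of a variable

layer - one set of glyphs drawn from one data set; a graphic may stack several layers

stat - a statistical transformation applied to the data before drawing (e.g., binning, smoothing)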
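A minimal sketch of facet, layer, and stat working together in ggplot2, again on iris. It uses a straight-line fit (`method = "lm"`) rather than the default loess smooth, purely to keep the example quick and free of messages; the object name `p` is illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +                                 # layer 1: one point per case
  geom_smooth(method = "lm", formula = y ~ x) +  # layer 2: a stat fits a line to the cases
  facet_wrap(~Species)                           # facet: one panel per species

length(p$layers)                        # number of layers in the plot
length(unique(layer_data(p, 2)$PANEL))  # number of panels the smooth was computed in
```

Because the stat runs separately within each facet, every panel gets its own fitted line.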

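data('iris')

# Rename columns after the aesthetics they will be mapped to
iris.glyphready <- iris %>%
  rename(x=Sepal.Length,y=Petal.Length,color=Species)
head(iris.glyphready)

# Scatter plot with a loess smooth and confidence band per species
ggplot(data=iris.glyphready,aes(x=x,y=y,color=as.factor(color))) +
  geom_point() +
  geom_smooth(se=TRUE)

# Histogram of petal width
ggplot(data=iris,aes(x=Petal.Width)) +
  geom_histogram(bins=15)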
A data.frame: 6 × 5
      x Sepal.Width     y Petal.Width color
  <dbl>       <dbl> <dbl>       <dbl> <fct>
1   5.1         3.5   1.4         0.2 setosa
2   4.9         3.0   1.4         0.2 setosa
3   4.7         3.2   1.3         0.2 setosa
4   4.6         3.1   1.5         0.2 setosa
5   5.0         3.6   1.4         0.2 setosa
6   5.4         3.9   1.7         0.4 setosa
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
[Figure: scatter plot of y (Petal.Length) against x (Sepal.Length), colored by species, with a loess smooth per species]
[Figure: histogram of Petal.Width with 15 bins]

A02

data('diamonds',package='ggplot2')
diamonds
A tibble: 53940 × 10
carat cut       color clarity depth table price     x     y     z
<dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
0.29  Premium   I     VS2      62.4    58   334  4.20  4.23  2.63
0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
0.23  Very Good H     VS1      59.4    61   338  4.00  4.05  2.39
0.30  Good      J     SI1      64.0    55   339  4.25  4.28  2.73
0.23  Ideal     J     VS1      62.8    56   340  3.93  3.90  2.46
0.22  Premium   F     SI1      60.4    61   342  3.88  3.84  2.33
0.31  Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
0.20  Premium   E     SI2      60.2    62   345  3.79  3.75  2.27
0.32  Premium   E     I1       60.9    58   345  4.38  4.42  2.68
0.30  Ideal     I     SI2      62.0    54   348  4.31  4.34  2.68
0.30  Good      J     SI1      63.4    54   351  4.23  4.29  2.70
0.30  Good      J     SI1      63.8    56   351  4.23  4.26  2.71
0.30  Very Good J     SI1      62.7    59   351  4.21  4.27  2.66
0.30  Good      I     SI2      63.3    56   351  4.26  4.30  2.71
0.23  Very Good E     VS2      63.8    55   352  3.85  3.92  2.48
0.23  Very Good H     VS1      61.0    57   353  3.94  3.96  2.41
0.31  Very Good J     SI1      59.4    62   353  4.39  4.43  2.62
0.31  Very Good J     SI1      58.1    62   353  4.44  4.47  2.59
0.23  Very Good G     VVS2     60.4    58   354  3.97  4.01  2.41
0.24  Premium   I     VS1      62.5    57   355  3.97  3.94  2.47
0.30  Very Good J     VS2      62.2    57   357  4.28  4.30  2.67
0.23  Very Good D     VS2      60.5    61   357  3.96  3.97  2.40
0.23  Very Good F     VS1      60.9    57   357  3.96  3.99  2.42
⋮ (rows 31 to 53910 omitted from the display)
0.70  Premium   E     SI1      60.5    58  2753  5.74  5.77  3.48
0.57  Premium   E     IF       59.8    60  2753  5.43  5.38  3.23
0.61  Premium   F     VVS1     61.8    59  2753  5.48  5.40  3.36
0.80  Good      G     VS2      64.2    58  2753  5.84  5.81  3.74
0.84  Good      I     VS1      63.7    59  2753  5.94  5.90  3.77
0.77  Ideal     E     SI2      62.1    56  2753  5.84  5.86  3.63
0.74  Good      D     SI1      63.1    59  2753  5.71  5.74  3.61
0.90  Very Good J     SI1      63.2    60  2753  6.12  6.09  3.86
0.76  Premium   I     VS1      59.3    62  2753  5.93  5.85  3.49
0.76  Ideal     I     VVS1     62.2    55  2753  5.89  5.87  3.66
0.70  Very Good E     VS2      62.4    60  2755  5.57  5.61  3.49
0.70  Very Good E     VS2      62.8    60  2755  5.59  5.65  3.53
0.70  Very Good D     VS1      63.1    59  2755  5.67  5.58  3.55
0.73  Ideal     I     VS2      61.3    56  2756  5.80  5.84  3.57
0.73  Ideal     I     VS2      61.6    55  2756  5.82  5.84  3.59
0.79  Ideal     I     SI1      61.6    56  2756  5.95  5.97  3.67
0.71  Ideal     E     SI1      61.9    56  2756  5.71  5.73  3.54
0.79  Good      F     SI1      58.1    59  2756  6.06  6.13  3.54
0.79  Premium   E     SI2      61.4    58  2756  6.03  5.96  3.68
0.71  Ideal     G     VS1      61.4    56  2756  5.76  5.73  3.53
0.71  Premium   E     SI1      60.5    55  2756  5.79  5.74  3.49
0.71  Premium   F     SI1      59.8    62  2756  5.74  5.73  3.43
0.70  Very Good E     VS2      60.5    59  2757  5.71  5.76  3.47
0.70  Very Good E     VS2      61.2    59  2757  5.69  5.72  3.49
0.72  Premium   D     SI1      62.7    59  2757  5.69  5.73  3.58
0.72  Ideal     D     SI1      60.8    57  2757  5.75  5.76  3.50
0.72  Good      D     SI1      63.1    55  2757  5.69  5.75  3.61
0.70  Very Good D     SI1      62.8    60  2757  5.66  5.68  3.56
0.86  Premium   H     SI2      61.0    58  2757  6.15  6.12  3.74
0.75  Ideal     D     SI2      62.2    55  2757  5.83  5.87  3.64
?diamonds
options(repr.plot.width=20,repr.plot.height=30)

plt <- ggplot(data=diamonds) +
  geom_point(aes(x=carat,
                 y=price,
                 color=color,
                 #shape=clarity,
                 size=depth,
                 alpha=table)) +
  facet_wrap(clarity~cut,nrow=8) +
  theme(text=element_text(size=20))
plt

#suppressWarnings(print(plt))
[Figure: price against carat for diamonds, faceted by clarity and cut in 8 rows of panels; point color shows color, size shows depth, alpha shows table]
# https://www.kaggle.com/datasets/crawford/80-cereals
cereal <- read_csv('cereal.csv')
cereal
Rows: 77 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): name, mfr, type
dbl (13): calories, protein, fat, sodium, fiber, carbo, sugars, potass, vita...
 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
A spec_tbl_df: 77 × 16
name                                   mfr   type  calories protein   fat sodium fiber carbo sugars potass vitamins shelf weight  cups   rating
<chr>                                  <chr> <chr>    <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>    <dbl> <dbl>  <dbl> <dbl>    <dbl>
100% Bran                              N     C           70       4     1    130  10.0   5.0      6    280       25     3   1.00  0.33 68.40297
100% Natural Bran                      Q     C          120       3     5     15   2.0   8.0      8    135        0     3   1.00  1.00 33.98368
All-Bran                               K     C           70       4     1    260   9.0   7.0      5    320       25     3   1.00  0.33 59.42551
All-Bran with Extra Fiber              K     C           50       4     0    140  14.0   8.0      0    330       25     3   1.00  0.50 93.70491
Almond Delight                         R     C          110       2     2    200   1.0  14.0      8     -1       25     3   1.00  0.75 34.38484
Apple Cinnamon Cheerios                G     C          110       2     2    180   1.5  10.5     10     70       25     1   1.00  0.75 29.50954
Apple Jacks                            K     C          110       2     0    125   1.0  11.0     14     30       25     2   1.00  1.00 33.17409
Basic 4                                G     C          130       3     2    210   2.0  18.0      8    100       25     3   1.33  0.75 37.03856
Bran Chex                              R     C           90       2     1    200   4.0  15.0      6    125       25     1   1.00  0.67 49.12025
Bran Flakes                            P     C           90       3     0    210   5.0  13.0      5    190       25     3   1.00  0.67 53.31381
Cap'n'Crunch                           Q     C          120       1     2    220   0.0  12.0     12     35       25     2   1.00  0.75 18.04285
Cheerios                               G     C          110       6     2    290   2.0  17.0      1    105       25     1   1.00  1.25 50.76500
Cinnamon Toast Crunch                  G     C          120       1     3    210   0.0  13.0      9     45       25     2   1.00  0.75 19.82357
Clusters                               G     C          110       3     2    140   2.0  13.0      7    105       25     3   1.00  0.50 40.40021
Cocoa Puffs                            G     C          110       1     1    180   0.0  12.0     13     55       25     2   1.00  1.00 22.73645
Corn Chex                              R     C          110       2     0    280   0.0  22.0      3     25       25     1   1.00  1.00 41.44502
Corn Flakes                            K     C          100       2     0    290   1.0  21.0      2     35       25     1   1.00  1.00 45.86332
Corn Pops                              K     C          110       1     0     90   1.0  13.0     12     20       25     2   1.00  1.00 35.78279
Count Chocula                          G     C          110       1     1    180   0.0  12.0     13     65       25     2   1.00  1.00 22.39651
Cracklin' Oat Bran                     K     C          110       3     3    140   4.0  10.0      7    160       25     3   1.00  0.50 40.44877
Cream of Wheat (Quick)                 N     H          100       3     0     80   1.0  21.0      0     -1        0     2   1.00  1.00 64.53382
Crispix                                K     C          110       2     0    220   1.0  21.0      3     30       25     3   1.00  1.00 46.89564
Crispy Wheat & Raisins                 G     C          100       2     1    140   2.0  11.0     10    120       25     3   1.00  0.75 36.17620
Double Chex                            R     C          100       2     0    190   1.0  18.0      5     80       25     3   1.00  0.75 44.33086
Froot Loops                            K     C          110       2     1    125   1.0  11.0     13     30       25     2   1.00  1.00 32.20758
Frosted Flakes                         K     C          110       1     0    200   1.0  14.0     11     25       25     1   1.00  0.75 31.43597
Frosted Mini-Wheats                    K     C          100       3     0      0   3.0  14.0      7    100       25     2   1.00  0.80 58.34514
Fruit & Fibre Dates; Walnuts; and Oats P     C          120       3     2    160   5.0  12.0     10    200       25     3   1.25  0.67 40.91705
Fruitful Bran                          K     C          120       3     0    240   5.0  14.0     12    190       25     3   1.33  0.67 41.01549
Fruity Pebbles                         P     C          110       1     1    135   0.0  13.0     12     25       25     2   1.00  0.75 28.02576
⋮ (rows 31 to 47 omitted from the display)
Multi-Grain Cheerios                   G     C          100       2     1    220   2.0  15.0      6     90       25     1   1.00  1.00 40.10596
Nut&Honey Crunch                       K     C          120       2     1    190   0.0  15.0      9     40       25     2   1.00  0.67 29.92429
Nutri-Grain Almond-Raisin              K     C          140       3     2    220   3.0  21.0      7    130       25     3   1.33  0.67 40.69232
Nutri-grain Wheat                      K     C           90       3     0    170   3.0  18.0      2     90       25     3   1.00  1.00 59.64284
Oatmeal Raisin Crisp                   G     C          130       3     2    170   1.5  13.5     10    120       25     3   1.25  0.50 30.45084
Post Nat. Raisin Bran                  P     C          120       3     1    200   6.0  11.0     14    260       25     3   1.33  0.67 37.84059
Product 19                             K     C          100       3     0    320   1.0  20.0      3     45      100     3   1.00  1.00 41.50354
Puffed Rice                            Q     C           50       1     0      0   0.0  13.0      0     15        0     3   0.50  1.00 60.75611
Puffed Wheat                           Q     C           50       2     0      0   1.0  10.0      0     50        0     3   0.50  1.00 63.00565
Quaker Oat Squares                     Q     C          100       4     1    135   2.0  14.0      6    110       25     3   1.00  0.50 49.51187
Quaker Oatmeal                         Q     H          100       5     2      0   2.7  -1.0     -1    110        0     1   1.00  0.67 50.82839
Raisin Bran                            K     C          120       3     1    210   5.0  14.0     12    240       25     2   1.33  0.75 39.25920
Raisin Nut Bran                        G     C          100       3     2    140   2.5  10.5      8    140       25     3   1.00  0.50 39.70340
Raisin Squares                         K     C           90       2     0      0   2.0  15.0      6    110       25     3   1.00  0.50 55.33314
Rice Chex                              R     C          110       1     0    240   0.0  23.0      2     30       25     1   1.00  1.13 41.99893
Rice Krispies                          K     C          110       2     0    290   0.0  22.0      3     35       25     1   1.00  1.00 40.56016
Shredded Wheat                         N     C           80       2     0      0   3.0  16.0      0     95        0     1   0.83  1.00 68.23588
Shredded Wheat 'n'Bran                 N     C           90       3     0      0   4.0  19.0      0    140        0     1   1.00  0.67 74.47295
Shredded Wheat spoon size              N     C           90       3     0      0   3.0  20.0      0    120        0     1   1.00  0.67 72.80179
Smacks                                 K     C          110       2     1     70   1.0   9.0     15     40       25     2   1.00  0.75 31.23005
Special K                              K     C          110       6     0    230   1.0  16.0      3     55       25     1   1.00  1.00 53.13132
Strawberry Fruit Wheats                N     C           90       2     0     15   3.0  15.0      5     90       25     2   1.00  1.00 59.36399
Total Corn Flakes                      G     C          110       2     1    200   0.0  21.0      3     35      100     3   1.00  1.00 38.83975
Total Raisin Bran                      G     C          140       3     1    190   4.0  15.0     14    230      100     3   1.50  1.00 28.59278
Total Whole Grain                      G     C          100       3     1    200   3.0  16.0      3    110      100     3   1.00  1.00 46.65884
Triples                                G     C          110       2     1    250   0.0  21.0      3     60       25     3   1.00  0.75 39.10617
Trix                                   G     C          110       1     1    140   0.0  13.0     12     25       25     2   1.00  1.00 27.75330
Wheat Chex                             R     C          100       3     1    230   3.0  17.0      3    115       25     1   1.00  0.67 49.78744
Wheaties                               G     C          100       3     1    200   3.0  17.0      3    110       25     1   1.00  1.00 51.59219
Wheaties Honey Gold                    G     C          110       2     1    200   1.0  16.0      8     60       25     1   1.00  0.75 36.18756
# Frequency of each value, most common first, for five of the variables
cereal %>%
  count(mfr) %>%
  arrange(desc(n))
cereal %>%
  count(type) %>%
  arrange(desc(n))
cereal %>%
  count(calories) %>%
  arrange(desc(n))
cereal %>%
  count(protein) %>%
  arrange(desc(n))
cereal %>%
  count(fat) %>%
  arrange(desc(n))
A spec_tbl_df: 7 × 2
mfr       n
<chr> <int>
K        23
G        22
P         9
Q         8
R         8
N         6
A         1

A spec_tbl_df: 2 × 2
type      n
<chr> <int>
C        74
H         3

A spec_tbl_df: 11 × 2
calories     n
<dbl>    <int>
110         29
100         17
120         10
90           7
50           3
140          3
70           2
130          2
150          2
80           1
160          1

A spec_tbl_df: 6 × 2
protein     n
<dbl>   <int>
3          28
2          25
1          13
4           8
6           2
5           1

A spec_tbl_df: 5 × 2
fat       n
<dbl> <int>
1        30
0        27
2        14
3         5
5         1
options(repr.plot.width=20,repr.plot.height=10)

# Rating against sugar content, one panel per manufacturer;
# point shape distinguishes the two cereal types (C and H)
plt <- ggplot(data=cereal) +
  geom_point(aes(x=sugars,
                 y=rating,
                 shape=type)) +
  facet_wrap(~mfr,nrow=2) +
  theme(text=element_text(size=30))
plt

#suppressWarnings(print(plt))
[Figure: rating against sugars for cereals, one panel per manufacturer in 2 rows; point shape shows type]