class: center, middle, inverse, title-slide

# R for Data Analysis
## Data Visualization
### Ayush Patel
### 28-Jul-2021

---
layout: true

---
name: Introduction
class: left,middle

.pull-left[

## Find me

[__@ayushbipinpatel__](https://twitter.com/ayushbipinpatel) <img src="" width=5%>

[__@AyushBipinPatel__](https://github.com/AyushBipinPatel) <img src="" width=5%>

[__ayushpatel.netlify.app__](https://ayushpatel.netlify.app/) <img src="" width=5%>

[__ayush.ap58@gmail.com__](mailto:ayush.ap58@gmail.com)<img src="" width=5%>

]

.pull-right[

<img src = "https://images.metmuseum.org/CRDImages/ad/original/57258.jpg">

.small[
Image: [John Biglin in a Single Scull by Thomas Eakins](https://images.metmuseum.org/CRDImages/ad/original/57258.jpg)
]

]

---
class: left, middle

.pull-left[

# Prerequisites

.big[You....]

understand __different types of objects, how to create objects and assign values to objects__.
<br>
know __how to access specific values within an object__.
<br>
know __what a function is and how to use a function__.
<br>
know __basics of data wrangling__.
<br>

]

.pull-right[

<img src = "https://images.metmuseum.org/CRDImages/ad/original/DT84.jpg">

.small[
[Image: Lake George by John Frederick Kensett ](https://images.metmuseum.org/CRDImages/ad/original/DT84.jpg)
]

]

---

# Before we get to it..

.big[Continue in the R Markdown document you used in the last class or create a new one.]

<br>

Load the following libraries and data

```r
library(tidyverse)
library(palmerpenguins)
library(nycflights13)

# subset of the RBI consumer confidence survey used in Activity 2
data_ccs <- read_csv("https://raw.githubusercontent.com/AyushBipinPatel/R-for-Data-Analysis/main/clean_ccs.csv")
```

---
class: top,center

# ggplot2 Package

.left-column[

<img src = "https://raw.githubusercontent.com/rstudio/hex-stickers/master/thumbs/ggplot2.png" width = "350px">
<br>
[ggplot2 logo from RStudio hexstickers repo](https://raw.githubusercontent.com/rstudio/hex-stickers/master/thumbs/ggplot2.png)

]

.column-right[

<img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/ggplot2_exploratory.png" width="350px"/>
<img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/ggplot2_masterpiece.png" width="350px"/>
<br>
[Art by @allisonhorst](https://github.com/allisonhorst)

]

---
class: top, left

# The Thought Process

<img src = "images/concept_map_visualization.jpg" width = "1000px" height = "550px">

---
class: left, middle

# What will you learn?
.pull-left[

Types of geometries and when to use them<br><br>
How to map variables in data to aesthetics of a plot<br><br>
About the theme of a plot<br><br>
Labeling the plot<br><br>

]

.pull-right[

`geom_*()` (density, histogram, bar, point, jitter, col, boxplot)<br><br>
The `aes()` (mapping variables to axes, shape, size, colour, fill and alpha of the geom)<br><br>
Intro to available themes `theme_*()`<br><br>
The `labs()` to add title, subtitle, axes labels and caption to the plot.<br><br>

]

---
class: left
count: false

Components of a plot

.panel1-comp_plot-auto[
```r
* ggplot(data = penguins) # the plot area and data
```
]

.panel2-comp_plot-auto[
<img src="data-viz_files/figure-html/comp_plot_auto_01_output-1.png" width="100%" />
]

---
count: false

Components of a plot

.panel1-comp_plot-auto[
```r
ggplot(data = penguins)+ # the plot area and data
* geom_boxplot(
*   aes(species, body_mass_g,
*       fill = species),
*   alpha = 0.5
* ) # geom and aesthetic
```
]

.panel2-comp_plot-auto[
<img src="data-viz_files/figure-html/comp_plot_auto_02_output-1.png" width="100%" />
]

---
count: false

Components of a plot

.panel1-comp_plot-auto[
```r
ggplot(data = penguins)+ # the plot area and data
  geom_boxplot(
    aes(species, body_mass_g,
        fill = species),
    alpha = 0.5
  )+ # geom and aesthetic
* geom_jitter(
*   aes(species,body_mass_g,
*       colour = species),
*   alpha = 0.5
* ) # another layer with aesthetics
```
]

.panel2-comp_plot-auto[
<img src="data-viz_files/figure-html/comp_plot_auto_03_output-1.png" width="100%" />
]

---
count: false

Components of a plot

.panel1-comp_plot-auto[
```r
ggplot(data = penguins)+ # the plot area and data
  geom_boxplot(
    aes(species, body_mass_g,
        fill = species),
    alpha = 0.5
  )+ # geom and aesthetic
  geom_jitter(
    aes(species,body_mass_g,
        colour = species),
    alpha = 0.5
  )+ # another layer with aesthetics
* theme_bw() # theme
```
]

.panel2-comp_plot-auto[
<img src="data-viz_files/figure-html/comp_plot_auto_04_output-1.png" width="100%" />
]

---
count: false

Components of a plot

.panel1-comp_plot-auto[
```r
ggplot(data = penguins)+ # the plot area and data
  geom_boxplot(
    aes(species, body_mass_g,
        fill = species),
    alpha = 0.5
  )+ # geom and aesthetic
  geom_jitter(
    aes(species,body_mass_g,
        colour = species),
    alpha = 0.5
  )+ # another layer with aesthetics
  theme_bw()+ # theme
* labs(
*   x = "Species",
*   y = "Body mass in grams",
*   title = "Distribution of Mass of penguins"
* ) # labels
```
]

.panel2-comp_plot-auto[
<img src="data-viz_files/figure-html/comp_plot_auto_05_output-1.png" width="100%" />
]

<style>
.panel1-comp_plot-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-comp_plot-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-comp_plot-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

???

explain each layer and aes mapping

---
class: left,top

# Plotting one continuous variable

.pull-left[

<code class ='r hljs remark-code'><span style='background-color:pink'>ggplot</span>(<span style='color:red'>data</span> = <span style='color:CornflowerBlue'>penguins</span>)<span style='background-color:#ffff7f'>+</span><br> <span style='background-color:pink'>geom_density</span>(<span style='background-color:pink'>aes</span>(<span style='color:red'>x</span> = <span style='color:CornflowerBlue'>body_mass_g</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-1-1.png" width="100%" />

]

.pull-right[

<code class ='r hljs remark-code'><span style='background-color:pink'>ggplot</span>(<span style='color:red'>data</span> = <span style='color:CornflowerBlue'>penguins</span>)<span style='background-color:#ffff7f'>+</span><br> <span style='background-color:pink'>geom_histogram</span>(<span style='background-color:pink'>aes</span>(<span style='color:red'>x</span> = <span style='color:CornflowerBlue'>body_mass_g</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-2-1.png" width="100%" />

]

---

# Activity 1

From the `penguins` data:

+ Choose a continuous variable that interests you
+ Write code to create a histogram for that variable
+ Write code to create a density plot for that variable
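---
class: left, top

# A note on histogram bins

When no bin settings are given, `geom_histogram()` falls back to 30 bins and prints a message asking you to pick a better value. A minimal sketch, assuming `flipper_length_mm` as the variable and a bin width of 5 mm (both are illustrative choices, not part of the activity):

```r
library(ggplot2)
library(palmerpenguins)

# binwidth is in the units of the x variable (mm here); 5 is an assumed value
p_hist <- ggplot(data = penguins) +
  geom_histogram(aes(x = flipper_length_mm), binwidth = 5, na.rm = TRUE)

# the same variable drawn as a smoothed density curve
p_dens <- ggplot(data = penguins) +
  geom_density(aes(x = flipper_length_mm), na.rm = TRUE)

p_hist
p_dens
```

Try a few bin widths: too wide hides the shape of the distribution, too narrow makes it noisy.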
---
class: left, top

# Plotting one discrete variable

<code class ='r hljs remark-code'>ggplot(data = penguins)<span style='background-color:#ffff7f'>+</span><br> <span style='background-color:pink'>geom_bar</span>(<span style='background-color:pink'>aes</span>(<span style='color:CornflowerBlue'>x = island</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-3-1.png" width="100%" />

---

# Activity 2

Use `data_ccs`, a subset of the [RBI's consumer confidence survey](https://dbie.rbi.org.in/DBIE/dbie.rbi?site=unitLevelData) (22nd round), 2012-2014.

Write code to create a bar chart for each of any two discrete variables of your liking. You could choose from:

+ city
+ age
+ gender
+ occupation_of_respondent
+ annual_income
+ education_qualification

.....
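---
class: left, top

# What geom_bar() counts

`geom_bar()` counts the rows in each category for you, so it needs only an `x` aesthetic. A minimal sketch on a small made-up data frame: the column names mirror two of the `data_ccs` variables listed in the activity, but `toy_ccs` and all of its values are invented for illustration.

```r
library(ggplot2)

# toy stand-in for data_ccs; every value here is invented
toy_ccs <- data.frame(
  city   = c("Mumbai", "Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"),
  gender = c("Female", "Male", "Male", "Female", "Female", "Male")
)

# one bar per city; bar height is the number of rows for that city
p_city <- ggplot(data = toy_ccs) +
  geom_bar(aes(x = city))

p_city

# the same counts, computed by hand
table(toy_ccs$city)
```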
---
class: left, top

# Plotting two continuous variables

<code class ='r hljs remark-code'>ggplot(data = penguins)+ <br> <span style='background-color:pink'>geom_point</span>(<span style='background-color:pink'>aes</span>(<span style='color:CornflowerBlue'>x = bill_length_mm</span>,<span style='color:CornflowerBlue'>y = bill_depth_mm</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-4-1.png" width="100%" />

---

# Activity 3

View the `penguins` data.

Write code to plot two continuous variables of your choice and interest.
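---
class: left, top

# Missing values in a scatter plot

`geom_point()` cannot draw a row in which either variable is missing, so it drops such rows and warns about it. A minimal sketch, assuming `flipper_length_mm` and `body_mass_g` as the pair of variables (an illustrative choice):

```r
library(ggplot2)
library(palmerpenguins)

# rows where either variable is missing cannot be drawn
n_missing <- sum(is.na(penguins$flipper_length_mm) | is.na(penguins$body_mass_g))
n_missing

# na.rm = TRUE drops those rows silently instead of with a warning
p_scatter <- ggplot(data = penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g), na.rm = TRUE)

p_scatter
```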
---
class: left, top

## Plotting one continuous and one discrete variable

.yscroll[

.pull-left[

<code class ='r hljs remark-code'>penguins %>% <br> group_by(species) %>% <br> summarise(<br> avg_body_mass = mean(body_mass_g,na.rm = TRUE)<br> ) %>% <br> ggplot() <span style='background-color:#ffff7f'>+</span><br> <span style='background-color:pink'>geom_col</span>(<span style='background-color:pink'>aes</span>(<span style='color:CornflowerBlue'>species,avg_body_mass</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-5-1.png" width="100%" />

]

.pull-right[

<code class ='r hljs remark-code'>penguins %>% <br> ggplot() <span style='background-color:#ffff7f'>+</span><br> <span style='background-color:pink'>geom_boxplot</span>(<span style='background-color:pink'>aes</span>(<span style='color:CornflowerBlue'>species,body_mass_g</span>))</code>

<img src="data-viz_files/figure-html/unnamed-chunk-6-1.png" width="100%" />

]

]

---

# Activity 4

Use the `flights` data.

Write code to generate the following plots:

+ A column chart that shows the average departure delay at each origin airport
+ A box plot that shows the distribution of departure delays at each origin airport
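---
class: left, top

# Why na.rm matters in summaries

`geom_col()` plots values you have already summarised, and `mean()` returns `NA` as soon as one value in a group is missing. `dep_delay` in `flights` has missing values, so the same care applies there. A small sketch with invented numbers: `toy_delays` and its airport codes are hypothetical.

```r
library(dplyr)

# invented delays (in minutes) for two hypothetical airports
toy_delays <- tibble(
  origin    = c("AAA", "AAA", "BBB", "BBB"),
  dep_delay = c(10, NA, 4, 6)
)

avg_delays <- toy_delays %>%
  group_by(origin) %>%
  summarise(
    avg_with_na    = mean(dep_delay),                # NA if any value is missing
    avg_without_na = mean(dep_delay, na.rm = TRUE)   # ignores missing values
  )

avg_delays
```

A column chart built from `avg_with_na` would have no bar for the first airport; `avg_without_na` gives a bar for both.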
---
class: left, top

# Aesthetics of geoms

.yscroll[

.panelset[

.panel[.panel-name[__fill__]

```r
penguins %>% 
  ggplot()+
  geom_boxplot(aes(species, body_mass_g, fill = sex))
```

<img src="data-viz_files/figure-html/fill_ex-1.png" width="100%" />

]

.panel[.panel-name[__colour__]

```r
penguins %>% 
  ggplot()+
  geom_point(aes(bill_length_mm, bill_depth_mm, colour = species))
```

<img src="data-viz_files/figure-html/col_ex-1.png" width="100%" />

]

.panel[.panel-name[__shape__]

```r
penguins %>% 
  ggplot()+
  geom_point(aes(bill_length_mm, bill_depth_mm, shape = species))
```

<img src="data-viz_files/figure-html/shp_ex-1.png" width="100%" />

]

.panel[.panel-name[__alpha__]

```r
penguins %>% 
  ggplot()+
  geom_point(aes(bill_length_mm, bill_depth_mm, colour = species), alpha = 0.5)
```

<img src="data-viz_files/figure-html/alp_ex-1.png" width="100%" />

]

.panel[.panel-name[__size__]

```r
penguins %>% 
  ggplot()+
  geom_point(aes(bill_length_mm, bill_depth_mm, size = species), alpha = 0.5)
```

<img src="data-viz_files/figure-html/siz_ex-1.png" width="100%" />

]

]

]

---

# Activity 5

Replicate these two charts. The chart on the left uses `mtcars` and the chart on the right uses `penguins`.

.pull-left[

<img src = "images/replicate1.png" width = "1000px" height = "400px">

]

.pull-right[

<img src = "images/replicate2.png" width = "1000px" height = "400px">

]
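---
class: left, top

# Mapping vs setting an aesthetic

An aesthetic placed inside `aes()` is mapped to a variable and varies with the data; the same aesthetic placed outside `aes()` is set to one fixed value for the whole layer. A minimal sketch, where the colour `"steelblue"` is an arbitrary choice:

```r
library(ggplot2)
library(palmerpenguins)

# mapped: colour varies with species and a legend is drawn
p_mapped <- ggplot(data = penguins) +
  geom_point(aes(bill_length_mm, bill_depth_mm, colour = species), na.rm = TRUE)

# set: every point gets the same fixed colour and no legend is drawn
p_set <- ggplot(data = penguins) +
  geom_point(aes(bill_length_mm, bill_depth_mm), colour = "steelblue", na.rm = TRUE)

p_mapped
p_set
```

In the alpha panel of the Aesthetics of geoms slide, `alpha = 0.5` sits outside `aes()` for the same reason: it is one fixed transparency for all points.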
---
class: left, middle

# Plot Labels

.pull-left[

```
.....
.....   ## Your awesome plot code
..... +
  labs(
    title = "Chart title",
    subtitle = "Chart Subtitle",
    x = "x-axis title",
    y = "y-axis title",
    caption = "Chart caption"
  )
```

]

.pull-right[

<img src="data-viz_files/figure-html/labs_ex-1.png" width="100%" />

]

---

# Activity 6

You have made some good charts during this session.

.big[YAY! Great job]

Pick the two charts you like the most and label them appropriately.

---
class: center, middle
background-image: url("images/background2.jpg")
background-size: cover