class: center, middle, inverse, title-slide

# R for Data Analysis
## Functions and Functional Programming
### Ayush Patel
### 01-Aug-2021

---
layout: true

---
name: Introduction
class: left,middle

.pull-left[

## Find me

[__@ayushbipinpatel__](https://twitter.com/ayushbipinpatel) <img src="" width=5%>

[__@AyushBipinPatel__](https://github.com/AyushBipinPatel) <img src="" width=5%>

[__ayushpatel.netlify.app__](https://ayushpatel.netlify.app/) <img src="" width=5%>

[__ayush.ap58@gmail.com__](ayush.ap58@gmail.com)<img src="" width=5%>

]

.pull-right[

<img src = "https://images.metmuseum.org/CRDImages/ad/original/57258.jpg">

.small[
Image: [John Biglin in a Single Scull by Thomas Eakins](https://images.metmuseum.org/CRDImages/ad/original/57258.jpg)
]

]

---
class: left, middle

.pull-left[

# Prerequisites

.big[You....]

understand __different types of objects, how to create objects and assign values to objects__. <br>

understand __how to access specific values within an object.__ <br>

know __what a function is and how to use a function.__ <br>

know the __basics of data wrangling__ <br>

]

.pull-right[

<img src = "https://images.metmuseum.org/CRDImages/ad/original/DT84.jpg">

.small[
[Image: Lake George by John Frederick Kensett ](https://images.metmuseum.org/CRDImages/ad/original/DT84.jpg)
]

]

---

# Before we get to it..

.big[Continue in the R Markdown document you used in the last class or create a new one.]

<br>

Load the following libraries and data

```r
library(tidyverse)
library(palmerpenguins) # assumption: source of the penguins data used in a later example
```

---
class: middle

# What are we learning today

+ Benefits of writing functions<br>
+ The essentials of writing a function in R<br>
+ The essentials of functional programming in R<br>

---
class: middle

# So, why write functions?

Consider the following situation.

1) You have temperature data for three cities in India for the last 15 days.<br><br>
2) You need to report the mean, median and standard deviation for each city.<br><br>

.big[What do you do??]
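---

# The data used in this example

The slides that follow use three numeric vectors, `srinagar_temp`, `mumbai_temp` and `kochi_temp`, but do not show how they were created. Here is a minimal sketch, assuming the readings are simulated with `rnorm()`. The seed, means and standard deviations are illustrative guesses, so the numbers will not match the printed output.

```r
# Assumption: 15 simulated daily temperatures per city.
# The seed and the mean/sd values are illustrative, not taken from the slides.
set.seed(2021)

srinagar_temp <- rnorm(n = 15, mean = 10, sd = 3)
mumbai_temp   <- rnorm(n = 15, mean = 28, sd = 7)
kochi_temp    <- rnorm(n = 15, mean = 30, sd = 2.5)
```

Any three numeric vectors of length 15 would work just as well for the examples.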
---

## Doing this three (or many) times is repetitive and can cause errors

.yscroll[

.panelset[

.panel[.panel-name[The data]

```
##  [1]  7.457085  6.474358  9.798162  8.947500 10.915646 11.387389  8.959049
##  [8]  7.214760 11.728579  8.205420  7.438257  8.152542 14.299598 10.629734
## [15] 14.545052
```

```
##  [1] 22.45574 36.69053 24.08926 28.79723 26.97196 35.62645 26.57931 25.40484
##  [9] 33.96014 15.97104 30.87243 44.05405 24.44423 18.64079 31.29159
```

```
##  [1] 31.72114 29.95897 29.95450 27.17400 28.99362 33.30066 32.76496 31.50636
##  [9] 35.83318 28.20373 29.36862 30.94485 26.66012 27.88152 28.84280
```

And many more.....

]

.panel[.panel-name[The code]

```r
# Srinagar
mean_srinagar <- mean(srinagar_temp,na.rm = T)
median_srinagar <- median(srinagar_temp,na.rm = T)
sd_srinagar <- sd(srinagar_temp,na.rm = T)

# Mumbai
mean_mumbai <- mean(mumbai_temp,na.rm = T)
median_mumbai <- median(mumbai_temp,na.rm = T)
sd_srinagar <- sd(mumbai_temp,na.rm = T) # copy-paste slip: overwrites sd_srinagar instead of creating sd_mumbai

# Kochi
mean_kochi <- mean(kochi_temp,na.rm = T)
median_kochi <- median(kochi_temp,na.rm = T)
sd_kochi <- sd(kochi_temp,na.rm = T)
```

]

.panel[.panel-name[Making it easier with a function]

```r
city_temp_stats <- function(data_city){
  
  list(city_mean = mean(data_city,na.rm = T),
       city_median = median(data_city,na.rm = T),
       city_sd = sd(data_city,na.rm = T)
       )
}
```

```r
city_temp_stats(srinagar_temp)
```

```
## $city_mean
## [1] 9.743542
## 
## $city_median
## [1] 8.959049
## 
## $city_sd
## [1] 2.484058
```

```r
city_temp_stats(mumbai_temp)
```

```
## $city_mean
## [1] 28.38997
## 
## $city_median
## [1] 26.97196
## 
## $city_sd
## [1] 7.295598
```

```r
city_temp_stats(kochi_temp)
```

```
## $city_mean
## [1] 30.20727
## 
## $city_median
## [1] 29.9545
## 
## $city_sd
## [1] 2.502214
```

]

.panel[.panel-name[Even Better]

We will discuss this at the end but...

```r
list(Srinagar = srinagar_temp,
     Mumbai = mumbai_temp,
     Kochi = kochi_temp) -> records_cities

map(records_cities,city_temp_stats)
```

```
## $Srinagar
## $Srinagar$city_mean
## [1] 9.743542
## 
## $Srinagar$city_median
## [1] 8.959049
## 
## $Srinagar$city_sd
## [1] 2.484058
## 
## 
## $Mumbai
## $Mumbai$city_mean
## [1] 28.38997
## 
## $Mumbai$city_median
## [1] 26.97196
## 
## $Mumbai$city_sd
## [1] 7.295598
## 
## 
## $Kochi
## $Kochi$city_mean
## [1] 30.20727
## 
## $Kochi$city_median
## [1] 29.9545
## 
## $Kochi$city_sd
## [1] 2.502214
```

]

]

]

---
class: middle

# Why use functions:

+ Function names make code more readable <br><br>
+ Changes are easier to handle, because the logic lives in one place <br><br>
+ Errors from copying and pasting become much less likely <br><br>

---
class: middle

# When to write a function?

.big[Rule of thumb from R4DS:]

> If you have copied and pasted a block of code more than twice, write a function.

---
class: middle

# Writing a good function - guidelines

+ The .big[name] is important - "Choose wisely"<br><br>
+ Start with code and then write the function - .big[Convert working code to function]<br><br>
+ Decide how many inputs the operations/code need<br><br>

---
class: middle

# The syntax

```r
name_function <- function(arg1, arg2 = default_val){
  
  # Your operations/code go here
  
}
```

---
class: middle

# (...) The Dot-Dot-Dot argument

+ Sometimes the functions that we write are wrappers around other functions.<br><br>
+ `...` lets us pass arguments through to the wrapped functions without declaring them up front<br><br>
+ It also lets a function accept an arbitrary number of arguments<br><br>

---
class: middle

# Let's explore ...

```r
learning_about_3dots <- function(...){
  
  list(...)
}

learning_about_3dots(na.rm = T,fill = "red")
```

```
## $na.rm
## [1] TRUE
## 
## $fill
## [1] "red"
```

---

# A little more exploration

.yscroll[

```r
explore_dot <- function(x,...){
  
  list(...) # this result is discarded; only the last expression is returned
  
  list(
    x,
    mean(c(1,2,3,5,2,5,NA)),                         # no na.rm, so the result is NA
    mean(c(1,2,3,5,2,5,NA), ...),                    # na.rm = T arrives through ...
    str_c("fefa","fefg", sep = list(...)[["sep"]])   # pick one named argument out of ...
  )
  
}

explore_dot("Dot-Dot-Dot", na.rm = T,sep = "|")
```

```
## [[1]]
## [1] "Dot-Dot-Dot"
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] "fefa|fefg"
```

]

---

# Return Values

.big[The value of the last expression is returned by default if no return() is specified]

.pull-left[

```r
show_return <- function(){
  
  2+2
  
  7*5
  
}

show_return()
```

```
## [1] 35
```

]

.pull-right[

```r
show_return1 <- function(){
  
  2+2 -> a
  
  7*5 -> b
  
  return(list(a,b))
  
}

show_return1()
```

```
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 35
```

]

---

# Pipe-able Functions

Two types of pipe-able functions: .big[transformations and side-effects]

.yscroll[

```r
show_missings <- function(df) {
  n <- sum(is.na(df))
  cat("Missing values: ", n, "\n", sep = "")
  
  invisible(df)
}

show_missings(mtcars)
```

```
## Missing values: 0
```

```r
x <- show_missings(mtcars)
```

```
## Missing values: 0
```

```r
class(x)
```

```
## [1] "data.frame"
```

```r
dim(x)
```

```
## [1] 32 11
```

```r
mtcars %>% 
  show_missings() %>% 
  mutate(mpg = ifelse(mpg < 20, NA, mpg)) %>% 
  show_missings()
```

```
## Missing values: 0
## Missing values: 18
```

.small[code example from R4DS]

]

---
class: middle

# Going functional

---
class: middle

# Why Functionals over Loops

+ R is a functional programming language<br>
+ Less code to write<br>
+ Fewer chances for bugs <br>

---
class: middle

# purrr

```
"
How can you solve the problem for a single element of the list? Once you’ve solved that problem, purrr takes care of generalising your solution to every element in the list.

If you’re solving a complex problem, how can you break it down into bite-sized pieces that allow you to advance one small step towards a solution? With purrr, you get lots of small pieces that you can compose together with the pipe.

This structure makes it easier to solve new problems.
It also makes it easier to understand your solutions to old problems when you re-read your old code.
"
```

.small[from R4DS]

---
class: middle

# How are we going to do this?

+ Look at working with lists - I know. Yes, again<br><br>
+ Apply functions to lists and vectors<br><br>
+ Modify function behaviour<br><br>

---

# Working with Lists

.yscroll[

.panelset[

.panel[.panel-name[Filter Lists]

.big[Selecting by name or index]

```r
ex_list <- list(
  a = c(1,2,5,3,5,2,3),
  names = c("yellow","red"),
  b = c(T,F,F,F,T)
)

ex_list
```

```
## $a
## [1] 1 2 5 3 5 2 3
## 
## $names
## [1] "yellow" "red"
## 
## $b
## [1]  TRUE FALSE FALSE FALSE  TRUE
```

```r
pluck(ex_list,"a")
```

```
## [1] 1 2 5 3 5 2 3
```

.big[Selecting elements that pass a test]

```r
keep(ex_list,is.character)
```

```
## $names
## [1] "yellow" "red"
```

.big[Discarding elements that pass a test]

```r
discard(ex_list,is.character)
```

```
## $a
## [1] 1 2 5 3 5 2 3
## 
## $b
## [1]  TRUE FALSE FALSE FALSE  TRUE
```

]

.panel[.panel-name[Summarise Lists]

.big[Do all elements pass a test?]

```r
every(ex_list, is.logical)
```

```
## [1] FALSE
```

.big[Do some elements pass a test?]

```r
some(ex_list, is.logical)
```

```
## [1] TRUE
```

.big[Does the list contain an element?]

```r
has_element(ex_list, c("yellow","red"))
```

```
## [1] TRUE
```

]

]

]

---

# Applying Functions

.yscroll[

```r
list_df <- list(penguins,mtcars,starwars)

map(list_df,dim)
```

```
## [[1]]
## [1] 344   8
## 
## [[2]]
## [1] 32 11
## 
## [[3]]
## [1] 87 14
```

.big[Making some modifications]

```r
custom_dim <- function(df){
  
  dimension <- dim(df)
  
  return(paste("There are", dimension[1], "rows and", dimension[2],"columns",sep = " "))
}

map(list_df,custom_dim)
```

```
## [[1]]
## [1] "There are 344 rows and 8 columns"
## 
## [[2]]
## [1] "There are 32 rows and 11 columns"
## 
## [[3]]
## [1] "There are 87 rows and 14 columns"
```

```r
map_chr(list_df,custom_dim)
```

```
## [1] "There are 344 rows and 8 columns" "There are 32 rows and 11 columns"
## [3] "There are 87 rows and 14 columns"
```

]

---

# Two or more arguments in a function?

.yscroll[

```r
gen_random_numbers <- function(avg,std_dev,num_obs){
  
  rnorm(n = num_obs,mean = avg,sd = std_dev)
  
}

tibble(
  inp_avg = c(25,35,65),
  inp_std_dev = c(2,1,3),
  inp_num_obs = c(10,20,30)
) -> inputs_gen

# pmap() matches the names in .l to the arguments of .f, so the order does not matter
pmap(.l = list(num_obs = inputs_gen$inp_num_obs,
               std_dev = inputs_gen$inp_std_dev,
               avg = inputs_gen$inp_avg),.f = gen_random_numbers)
```

```
## [[1]]
##  [1] 25.14703 26.51466 18.81805 22.37440 27.87782 19.69114 24.56253 24.01781
##  [9] 28.20675 24.29473
## 
## [[2]]
##  [1] 34.80677 35.42030 35.77739 33.66238 33.63052 34.14867 34.10385 35.27784
##  [9] 35.57692 35.30941 34.90019 33.33757 35.78944 36.49809 36.04121 34.41404
## [17] 34.59161 34.18025 35.37385 35.34764
## 
## [[3]]
##  [1] 64.07813 70.04473 63.75626 65.29929 62.07018 63.08618 67.28970 64.41151
##  [9] 64.35564 69.57638 68.55952 65.57495 66.02470 64.13050 67.80675 61.25854
## [17] 66.28553 66.52511 70.72082 62.57992 57.95174 61.89326 67.83661 62.61786
## [25] 67.68941 70.48677 67.12381 66.00284 64.68834 67.52739
```

]

---

# What happens when there is an error

.yscroll[

.panelset[

.panel[.panel-name[What if there is an error while running]

```r
ex_list
```

```
## $a
## [1] 1 2 5 3 5 2 3
## 
## $names
## [1] "yellow" "red"
## 
## $b
## [1]  TRUE FALSE FALSE FALSE  TRUE
```

```r
map(ex_list,sum,na.rm =T)
```

```
## Error in .Primitive("sum")(..., na.rm = na.rm): invalid 'type' (character) of argument
```

]

.panel[.panel-name[Approach safely]

```r
map(ex_list,safely(sum),na.rm =T)
```

```
## $a
## $a$result
## [1] 21
## 
## $a$error
## NULL
## 
## 
## $names
## $names$result
## NULL
## 
## $names$error
## <simpleError in .Primitive("sum")(..., na.rm = na.rm): invalid 'type' (character) of argument>
## 
## 
## $b
## $b$result
## [1] 2
## 
## $b$error
## NULL
```

]

]

]

---

# Activity 1

Copy the following code

```r
data_for_purr <- list(
  a = c(1,52,2,3,58,2,6,2,NA),
  b = c("Yellow","green","red"),
  c = c(1,0,12,5,1,3,58,3,6)
)
```

Write a function that gives the of the unique values, length and number of missing values for a vector. Apply this function to the object `data_for_purr`.
---
class: center, middle
background-image: url("images/background2.jpg")
background-size: cover