layout: true --- <!-- name: cover --> <!-- class: no-footer, inverse -->
purrr
beyond
map()
(fun)ctional programming in R
result <- purrr::modify(.x =
, .f =
)
Hendrik van Broekhuizen
Predictive Insights
2020-03-07
@hendrikvanb
hendrik@predictiveinsights.net
--- class: title-subtitle # The obligatory preamble .fancy.subtitle[Making sure we are all on the same page] <br/> .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Disclaimer**] - _`purrr` fanatic_ ¬□ _`purrr` expert_ - 15min ≠ enough time ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Admissions**] - I'm an _extreme centrist_ w.r.t. `tidyverse` and `data.table` - I love to ` %>% ` ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Setup**] - Working in RStudio in an `.Rproj` context - Using same set of packages throughout - Using the `mpg` dataset (`ggplot2`) throughout ```r library(data.table) library(tidyverse) ``` ] ] --- name: nutshell class: title-subtitle # The `purrr` package .fancy.subtitle[What is it and why should I care?] .autogrid-left[ <div style="font-size:5em; margin-right: -0.5em; width: 1em; text-align:center; justify-items: center;">
</div> .quotation[ > A complete and consistent functional programming toolkit for R <br/>- `help(purrr)`<br/><br/> > ... to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R <br>- [purrr 0.1.0](https://blog.rstudio.com/2015/09/29/purrr-0-1-0/) ] ] -- .accent.negspace[**`map()` is the poster child, but Narnia lies beyond**] - Other functions get less press - Terse official documentation - Lack of package vignettes - Few "_deep dive_" tutorials and resources online .accent.negspace[**Things get good when you dive in**] - `purrr` offers one of the highest rates of return on investment for any R package --- <!-- name: use-cases --> class: title-subtitle # `purrr`: what is it good for? .fancy.subtitle[Absolutely .strikeout[nothing] .strikeout[everything] lots of stuff] <br/> .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Iterative tasks**] - `lapply`++ - more consistent, more general, more powerful ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Working with lists**] - Yes, even complex, nested lists - It's lists, all the way down ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Creating consistent, robust functions/routines**] - Consistent syntax - Fail loudly - Nice error handling ] ] --- class: title-subtitle # Some tips .fancy.subtitle[Useful things to keep in mind when using purrr] <br/> .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**When not to use `purrr`**] - `lapply()` is the base equivalent to `map()` (sans `purrr` _helpers_ support) - if you’re only using `map()` from purrr, you can skip the additional dependency and use `lapply()` directly - there is no need to map if the operation is already appropriately vectorised .accent.negspace[**Avoiding nasty surprises**] - `map*()` and `modify()` functions always return output of the same length as the input ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Never forget**] - a data frame is simply a list of [consistently typed] vectors of equal length ] ] --- name: Map primer class: inverse, center, middle, no-footer, fancy-title # A `map()` primer <div class="image" style="margin-top: 2em"><img src="../img/map.svg" width="30%" /></div> --- class: title-subtitle # `map()` .fancy.subtitle[Apply to all] .fitgrid[ .definition[ > .large[`map(.x, .f)`] <br> call function `.f` once for each element of vector `.x`; return the result as a list ] <div class="image" style="text-align: center;"><img src="../img/map.svg" height="110px" /></div> ] -- .accent.question[] Get the square of each number from 1 to 5 ```r # function to get square of number my_square <- function(x) x^2 # get square of each number 1:5 and output as list res1 <- 1:5 %>% map(my_square) # direct call res2 <- 1:5 %>% map(~my_square(.)) # for backward compatibility res3 <- 1:5 %>% map(~my_square(.x)) # formula res4 <- 1:5 %>% map(function(x) my_square(x)) # inline anonymous function # test equivalence identical(res1, res2) & identical(res2, res3) & identical(res3, res4) ``` ``` [1] TRUE ``` --- class: title-subtitle # Passing arguments with `...` .fancy.subtitle[Many ways to do the same thing] .fitgrid[ .definition[ > .large[`map(.x, .f, ...)`] <br> passes arguments specified in `...` 
along to `.f` ] <div class="image" style="text-align: center;"><img src="../img/map-arg.svg" height="110px" /></div> ] -- .question[] Use `paste()` to add 'min' as suffix to each number from 1 to 5 ```r # pass arguments along spec1 <- 1:5 %>% map(paste, 'min') # formula specification (two variants) spec2 <- 1:5 %>% map(~paste(., 'min')) spec3 <- 1:5 %>% map(~paste(.x, 'min')) # inline anonymous function specification spec4 <- 1:5 %>% map(function(x) paste(x, 'min')) # test equivalence list(spec2, spec3, spec4) %>% map_lgl(identical, y = spec1) ``` ``` [1] TRUE TRUE TRUE ``` --- class: title-subtitle # Passing arguments: via `...` vs in function .fancy.subtitle[A seemingly subtle, yet important difference] .accent.negspace[**Not all that seems vectorised is...**] - `map()` is only vectorised over its first argument, so arguments passed to `map()` after `.f` will be - passed along as is and - evaluated once .accent.negspace[**What is that supposed to mean?**] - Has implications if you pass arguments to a function via `...` - errors if you pass vectors as arguments to functions that do not accept vectors as arguments - potentially wrong results even if arguments are specified correctly -- .accent.example.negspace.pullup[] ```r # function that multiplies input (arg1) by specified constant (arg2) temp_func <- function(x, constant = 2) { glue::glue('{x} x {constant} = {x*constant}') } # method 1: pass parameterised arg2 directly to map_chr (sample() is evaluated once) 1:5 %>% map_chr(temp_func, constant = sample(1:10, 1)) # method 2: pass parameterised arg2 into inline anonymous function (sample() is evaluated on every call) 1:5 %>% map_chr(function(x) temp_func(x, constant = sample(1:10, 1))) ``` .code-compact[ ``` [1] "1 x 2 = 2" "2 x 2 = 4" "3 x 2 = 6" "4 x 2 = 8" "5 x 2 = 10" [1] "1 x 8 = 8" "2 x 1 = 2" "3 x 7 = 21" "4 x 7 = 28" "5 x 7 = 35" ``` ] --- class: title-subtitle # `map_*()` .fancy.subtitle[Specifying the output format] .autogrid-right.top[ .definition[ > .large[<code>map_*(.x, .f, ...)</code>] <br> call function `.f` once for each 
element of vector `.x`; return the result as an atomic vector of type <code>*</code>; error if impossible ] - `map_chr(.x, .f)`: character - `map_lgl(.x, .f)`: logical - `map_dbl(.x, .f)`: real - `map_int(.x, .f)`: integer - `map_dfr(.x, .f)`: data frame (`bind_rows`) - `map_dfc(.x, .f)`: data frame (`bind_cols`) ] -- <div style = "margin-top: -2em;"></div> .accent.example.pullup[] .code-compact[ ```r 1:5 %>% map_chr(paste, 'min') %>% class() ``` ``` [1] "character" ``` ```r 1:5 %>% map_lgl(function(x) x < 3) %>% class() ``` ``` [1] "logical" ``` ```r 1:5 %>% map_int(function(x) x * 2L) %>% class() ``` ``` [1] "integer" ``` ```r 1:5 %>% map_dfr(function(x) tibble(value = x)) %>% class() ``` ``` [1] "tbl_df" "tbl" "data.frame" ``` ```r 1:5 %>% map_dfc(function(x) data.table(value = x)) %>% class() ``` ``` [1] "data.table" "data.frame" ``` ] --- name: Map variants class: inverse, center, middle, no-footer, fancy-title # Map variants <div class="image" style="margin-top: 2em"><img src="../img/variant_cat.svg" height=400px /></div> --- class: title-subtitle # `walk()` and `modify()` .fancy.subtitle[`map()` has siblings...] 
.fitgrid[ .definition[ > .large[<code>walk(.x, .f, ...)</code>] <br> call function `.f` once for each element of `.x`; return nothing ] .definition[ > .large[<code>modify(.x, .f, ...)</code>] <br> call function `.f` once for each element of `.x`; return the result as an object of the same type as `.x` ] ] -- .fitgrid.top[ .normal[ .accent.example[] .code-compact[ ```r # no output 1:5 %>% walk(paste, 'min') ``` ```r # output, but not what you might have expected 1:5 %>% walk(function(x) x ^ 2) %>% print() ``` ``` [1] 1 2 3 4 5 ``` ```r # proof that walk is actually doing stuff 1:5 %>% walk(function(x) print(x ^ 2)) %>% print() ``` ``` [1] 1 [1] 4 [1] 9 [1] 16 [1] 25 [1] 1 2 3 4 5 ``` ] ] .normal[ .accent.example[] .code-compact[ ```r # obviously a character vector x <- c('1', '2', '3', '4', '5') ``` ```r # try to convert each element to integer using map_dbl x %>% map_dbl(as.integer) ``` ``` [1] 1 2 3 4 5 ``` ```r # try to convert each element to integer using modify x %>% modify(as.integer) ``` ``` [1] "1" "2" "3" "4" "5" ``` ] ] ] --- class: title-subtitle # Why `walk()`? Why `modify()`? .fancy.subtitle[What's the point?] .fitgrid[ .definition[ > .large[<code>walk(.x, .f, ...)</code>] <br> call function `.f` once for each element of `.x`; return nothing ] .definition[ > .large[<code>modify(.x, .f, ...)</code>] <br> call function `.f` once for each element of `.x`; return the result as an object of the same type as `.x` ] ] <br/ style="display:block"> .fitgrid[ <div style="font-size:200%; width:70%; text-align:center; display: grid; grid-column-gap: 1em; grid-template-columns: 1fr 1fr 1fr; justify-items: center;">
</div> <div style="font-size:200%; width:70%; text-align:center; display: grid; grid-column-gap: 1em; grid-template-columns: 1fr 1fr 1fr; justify-items: center;">
</div> ] .fitgrid.top[ .normal[ .accent.negspace[**Just do stuff**] - Some functions just need to do stuff, not necessarily return stuff - E.g.: `cat()`, `message()`, `saveRDS()`, etc - Particularly useful for disk I/O operations - Allows input "passthrough" ] .normal[ .accent.negspace[**Change the content; keep the wrapper**] - Some functions just need to change stuff, not necessarily create stuff - Not everything needs to be coerced - What if input is already of the type we want as output? - Type preservation can be essential - Particularly useful are the `modify_if()` and `modify_at()` variants ] ] --- class: title-subtitle # `map()` variants cheatsheet .fancy.subtitle[Basic rules & the matrix of understanding] .accent.negspace[**map variant rules:**] 1. `map()` returns list; `map_*()` returns vector of type specified 1. `modify()` returns same type as input 1. `walk()` returns nothing 1. Iterate over two inputs with `map2()`, `walk2()`, `modify2()` 1. Iterate over input and index with `imap()`, `imodify()`, `iwalk()` 1. 
Iterate over any number of inputs with `pmap()` and `pwalk()` .accent.negspace[**map variant matrix:**] - map family of functions has orthogonal inputs and outputs - can organise all the family into a matrix, with inputs in the rows and outputs in the columns <table class="flat-table"> <thead> <tr> <th style="text-align:left;"> arguments </th> <th style="text-align:left;"> list </th> <th style="text-align:left;"> atomic </th> <th style="text-align:left;"> preserve type </th> <th style="text-align:left;"> nothing </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> one argument </td> <td style="text-align:left;"> <code>map()</code> </td> <td style="text-align:left;"> <code>map_lgl(), ...</code> </td> <td style="text-align:left;"> <code>modify()</code> </td> <td style="text-align:left;"> <code>walk()</code> </td> </tr> <tr> <td style="text-align:left;"> two arguments </td> <td style="text-align:left;"> <code>map2()</code> </td> <td style="text-align:left;"> <code>map2_lgl(), ...</code> </td> <td style="text-align:left;"> <code>modify2()</code> </td> <td style="text-align:left;"> <code>walk2()</code> </td> </tr> <tr> <td style="text-align:left;"> one argument + index </td> <td style="text-align:left;"> <code>imap()</code> </td> <td style="text-align:left;"> <code>imap_lgl(), ...</code> </td> <td style="text-align:left;"> <code>imodify()</code> </td> <td style="text-align:left;"> <code>iwalk()</code> </td> </tr> <tr> <td style="text-align:left;"> n arguments </td> <td style="text-align:left;"> <code>pmap()</code> </td> <td style="text-align:left;"> <code>pmap_lgl(), ...</code> </td> <td style="text-align:left;"> <code>NA</code> </td> <td style="text-align:left;"> <code>pwalk()</code> </td> </tr> </tbody> </table> --- class: title-subtitle # Using `walk()` .fancy.subtitle[Iteratively write data to disk using `purrr::pwalk()`] .accent.question[] For each manufacturer in the `mpg` dataset, write a `.csv` file to disk containing only the data for that 
manufacturer -- .code-compact[ ```r # check for files (show that there are none) list.files('data/mpg') ``` ``` character(0) ``` ] -- .code-compact[ ```r # create files by taking the mpg df %>% collapsing the data for each manufacturer into a list column %>% walking over the two columns in the df and for each pair (i.e. row of manufacturer and data values) doing: {create path variable to point to the path where the data should be written %>% write the data to disk in .csv format} mpg %>% group_nest(manufacturer, keep = T) %>% pwalk(function(manufacturer, data) { path <- file.path('data/mpg', glue::glue('df_{manufacturer}.csv')) write_csv(data, path) }) ``` ] -- .code-compact[ ```r # check for files again (show that there are now files) list.files('data/mpg') %>% {c(head(., 2), tail(., 2))} ``` ``` [1] "df_audi.csv" "df_chevrolet.csv" "df_toyota.csv" "df_volkswagen.csv" ``` ] --- class: title-subtitle # Using `iwalk()` .fancy.subtitle[Iteratively read data into R using `purrr::iwalk()`] .accent.question[] Read each of the `.csv` files just written to disk into R's global environment as a data frame. Use each file's name (without the `.csv` extension) as the name for its data frame. -- .code-compact[ ```r # check for objects (show that there are none) ls() ``` ``` character(0) ``` ] -- .code-compact[ ```r # get a list of all of the .csv files located in data/mpg %>% using set_names, name each element in this list with its filename sans the .csv extension %>% using iwalk to apply the assign function to each element in the list. 
Specifically, use fread to read the csv file from disk into a data frame and then assign that data frame as a named object to R's global environment (the dot in the patterns is escaped so that it matches a literal '.') list.files('data/mpg', pattern = '\\.csv$', full.names = T) %>% set_names(str_remove(basename(.), '\\.csv$')) %>% iwalk(function(x, i) assign(i, fread(x), .GlobalEnv)) ``` ] -- .code-compact[ ```r # check for objects again (show that there are now objects) ls() %>% {c(head(., 2), tail(., 2))} ``` ``` [1] "df_audi" "df_chevrolet" "df_toyota" "df_volkswagen" ``` ] --- class: title-subtitle # Using `modify_*()` .fancy.subtitle[Conditionally change contents using `purrr::modify_if()`] .question[] For each manufacturer in the `mpg` dataset, express all of the numeric columns as the percentage deviation from the mean -- ```r # define function to express each element in vector as % deviation from mean myfunc <- function(x) x / mean(x, na.rm = T) - 1 ``` -- .fitgrid.top[ .code-compact[ ```r # data.table approach # take mpg %>% convert to data.table %>% group by manufacturer, then use modify_if to target all numeric columns and modify each using the deviation function a <- mpg %>% setDT() %>% .[by = .(manufacturer), j = modify_if(.SD, is.numeric, myfunc)] ``` ] .code-compact[ ```r # dplyr approach # take mpg %>% group by manufacturer %>% use mutate_if to mutate all numeric columns using the deviation function %>% ungroup the data b <- mpg %>% group_by(manufacturer) %>% mutate_if(is.numeric, myfunc) %>% ungroup() ``` ] ] -- .code-compact[ ```r # show that methods produce equivalent outputs all_equal(a, b) ``` ``` [1] TRUE ``` ] --- name: Predicate functionals class: inverse, center, middle, no-footer, fancy-title # Predicate functionals <div style="font-size: 150%;"> .center.fancy.accent[A predicate function is a function that returns either `TRUE` or `FALSE`. Predicate functionals take vector `.x` and predicate function `.f` and do something useful.] 
</div> <div class="image" style="margin-top: 2em"><img src="../img/predicate_cat.svg" height=350px /></div> --- class: title-subtitle # Using `every()` and `some()` .fancy.subtitle[All or .strikeout[nothing] some!] .question[]For which manufacturers in the `mpg` dataset do the city miles per gallon (`cty`) (a) exceed 15mpg on all models and/or (b) 25 mpg on at least some models? -- .pull-left[ .code-compact[ ```r # take mpg %>% group by manufacturer %>% use summarise to create 2 summary columns: all_above_15 captures whether every value of cty > 15, while some_above_25 captures whether some values of cty > 25 %>% ungroup the data mpg %>% group_by(manufacturer) %>% summarise( all_above_15 = every(cty, function(x) x > 15), some_above_25 = some(cty, function(x) x > 25)) %>% ungroup() ``` ] ] -- .pull-right[ ``` # A tibble: 15 x 3 manufacturer all_above_15 some_above_25 <chr> <lgl> <lgl> 1 audi FALSE FALSE 2 chevrolet FALSE FALSE 3 dodge FALSE FALSE 4 ford FALSE FALSE 5 honda TRUE TRUE 6 hyundai TRUE FALSE 7 jeep FALSE FALSE 8 land rover FALSE FALSE 9 lincoln FALSE FALSE 10 mercury FALSE FALSE 11 nissan FALSE FALSE 12 pontiac TRUE FALSE 13 subaru TRUE FALSE 14 toyota FALSE TRUE 15 volkswagen TRUE TRUE ``` ] --- class: title-subtitle count: false # Using `every()` and `some()` .fancy.subtitle[All or .strikeout[nothing] some!] .question[]For which manufacturers in the `mpg` dataset do the city miles per gallon (`cty`) (a) exceed 15mpg on all models and/or (b) 25 mpg on at least some models? 
.pull-left[ .code-compact[ ```r # take mpg %>% group by manufacturer %>% use summarise to create 2 summary columns: all_above_15 captures whether every value of cty > 15, while some_above_25 captures whether some values of cty > 25 %>% ungroup the data mpg %>% group_by(manufacturer) %>% summarise( all_above_15 = every(cty, function(x) x > 15), some_above_25 = some(cty, function(x) x > 25)) %>% ungroup() ``` <br/ style="inline-block"> .accent.negspace[**Bonus:**] ```r # take mpg %>% group by manufacturer %>% filter to keep only data for manufacturers whose models all have cty > 15mpg mpg %>% group_by(manufacturer) %>% filter(every(cty, function(x) x > 15)) ``` ] ] .pull-right[ ``` # A tibble: 15 x 3 manufacturer all_above_15 some_above_25 <chr> <lgl> <lgl> 1 audi FALSE FALSE 2 chevrolet FALSE FALSE 3 dodge FALSE FALSE 4 ford FALSE FALSE 5 honda TRUE TRUE 6 hyundai TRUE FALSE 7 jeep FALSE FALSE 8 land rover FALSE FALSE 9 lincoln FALSE FALSE 10 mercury FALSE FALSE 11 nissan FALSE FALSE 12 pontiac TRUE FALSE 13 subaru TRUE FALSE 14 toyota FALSE TRUE 15 volkswagen TRUE TRUE ``` ] --- name: Other trasformations class: inverse, center, middle, no-footer, fancy-title # Other vector transformations <div class="image" style="margin-top: 2em"><img src="../img/transform_cat.svg" height=200px /></div> --- class: title-subtitle # `reduce()` and `accumulate()` .fancy.subtitle[Collapsing it all or building it up] .fitgrid.top[ .definition[ > .large[<code>reduce(.x, .f, ..., .init)</code>] <br> use function `.f` to combine elements of `.x` by passing the result of each iteration as an initial value to the next iteration; return single result from final iteration ] .definition[ > .large[<code>accumulate(.x, .f, ..., .init)</code>] <br> use function `.f` to combine elements of `.x` by passing the result of each iteration as an initial value to the next iteration; return vector of results from each iteration ] ] -- .fitgrid.top[ .normal[ .accent.example[] .code-compact[ ```r # 
return sum of 1:5 1:5 %>% reduce(`+`) ``` .nocodegap[] ```r 1:5 %>% reduce(function(x, y) x + y) ``` ``` [1] 15 ``` ```r # which numbers appear in the vector 1:5 1:5 %>% reduce(function(x, y) paste(x, 'and', y)) ``` ``` [1] "1 and 2 and 3 and 4 and 5" ``` ] ] .normal[ .accent.example[] .code-compact[ ```r # return each step in cumulative sum of 1:5 1:5 %>% accumulate(`+`) ``` .nocodegap[] ```r 1:5 %>% accumulate(function(x, y) x + y) ``` ``` [1] 1 3 6 10 15 ``` ```r # which numbers appear in each iteration 1:5 %>% accumulate(function(x, y) paste(x, 'and', y)) ``` ``` [1] "1" "1 and 2" "1 and 2 and 3" [4] "1 and 2 and 3 and 4" "1 and 2 and 3 and 4 and 5" ``` ] ] ] --- class: title-subtitle # Why `reduce()`? Why `accumulate()`? .fancy.subtitle[What's the point?] .fitgrid.top[ .definition[ > .large[<code>reduce(.x, .f, ..., .init)</code>] <br> use function `.f` to combine elements of `.x` by passing the result of each iteration as an initial value to the next iteration; return single result from final iteration ] .definition[ > .large[<code>accumulate(.x, .f, ..., .init)</code>] <br> use function `.f` to combine elements of `.x` by passing the result of each iteration as an initial value to the next iteration; return vector of results from each iteration ] ] <br/ style="display:block"> .fitgrid[ <div style="font-size:200%; width:70%; text-align:center; display: grid; grid-column-gap: 1em; grid-template-columns: 1fr 1fr 1fr; justify-items: center;">
</div> <div style="font-size:200%; width:70%; text-align:center; display: grid; grid-column-gap: 1em; grid-template-columns: 1fr 1fr 1fr; justify-items: center;">
</div> ] .fitgrid.top[ .normal[ .accent.negspace[**_E pluribus unum_**] - You want just one thing - Getting that thing requires repeating (effectively) the same additive operation - E.g.: `bind_rows()`, `bind_cols()`, `left_join()`, `merge`, etc ] .normal[ .accent.negspace[**Build something bit by bit**] - Each step that builds up to the final "thing" is of interest - Incrementally building a plot - Building up a model specification - Complex accumulative sequences ] ] --- class: title-subtitle # Using `accumulate()` .fancy.subtitle[Building up a model using `purrr::accumulate()`] .accent.question[] Starting with `cty ~ manufacturer` as a base, .accent[**(1)**] build up several linear model specifications for estimating the city miles per gallon (`cty`) in the `mpg` dataset by incrementally adding the `trans`, `drv`, and `class` terms to the model. .accent[**(2)**] Estimate each model and report the adjusted R-squared. -- .pull-left[ .accent.example.pullup[1] .code-compact[ ```r # create a vector of model specifications by taking the relevant column names %>% accumulating each into the base specification using paste and a ' + ' separator %>% number each model sequentially using set_names %>% print the results in a neatly formatted tibble models <- c('trans', 'drv', 'class') %>% accumulate(function(x, y) paste(x, y, sep = ' + '), .init = 'cty ~ manufacturer') %>% set_names(1:length(.)) enframe(models, name = 'model', value = 'spec') ``` ``` # A tibble: 4 x 2 model spec <chr> <chr> 1 1 cty ~ manufacturer 2 2 cty ~ manufacturer + trans 3 3 cty ~ manufacturer + trans + drv 4 4 cty ~ manufacturer + trans + drv + class ``` ] ] -- .pull-right.pullup[ .accent.example[2] .code-compact[ ```r # take models %>% estimate each using map to apply the lm function %>% get the summary for each set of results using map to apply the summary function %>% extract the adjusted r-squared for each set of summary results using map_dbl to extract it by name %>% print the results in a neatly 
formatted tibble models %>% map(lm, data = mpg) %>% map(summary) %>% map_dbl("adj.r.squared") %>% enframe(name = 'model', value = 'Adj-R2') ``` ``` # A tibble: 4 x 2 model `Adj-R2` <chr> <dbl> 1 1 0.528 2 2 0.551 3 3 0.687 4 4 0.713 ``` ] ] --- name: Adverbs class: inverse, center, middle, no-footer, fancy-title # Adverbs <div style="font-size: 160%;"> .center.fancy.accent[modify the action of a function; taking a function as input and returning a function with modified action as output] </div> <div class="image" style="margin-top: 2em"><img src="../img/adverbs.svg" height=400px /></div> --- class: title-subtitle # `compose()` and `partial()` .fancy.subtitle[Why work so hard?] .fitgrid[ .definition[ > .large[<code>compose(..., .dir = c('backward', 'forward'))</code>] <br> apply functions `...` in order in the direction `.dir` specified ] .definition[ > .large[<code>partial(.f, ...)</code>] <br> modify function `.f` by pre-filling and fixing some of its arguments ] ] -- .fitgrid.top[ .normal[ .accent.example[] .code-compact[ ```r round(mean(log(1:20), na.rm = T), digits = 2) ``` ``` [1] 2.12 ``` ```r round(mean(log(5:100), na.rm = T), digits = 2) ``` ``` [1] 3.76 ``` ```r # compose steps into a function mycomp <- compose(log, ~ mean(.x, na.rm = T), ~ round(.x, digits = 2), .dir = 'forward') mycomp(1:20) ``` ``` [1] 2.12 ``` ```r mycomp(5:100) ``` ``` [1] 3.76 ``` ] ] .normal[ .accent.example[] .code-nospace[ ```r round(0.532131245, digits = 2) ``` ``` [1] 0.53 ``` ```r round(12394.13498134, digits = 2) ``` ``` [1] 12394.13 ``` ```r # prefill and fix parameter (WARNING!) 
myround <- partial(round, digits = 2) myround(0.532131245) ``` ``` [1] 0.53 ``` ```r myround(12394.13498134) ``` ``` [1] 12394.13 ``` ```r myround(1/3, digits = 2) ``` ``` Error in (function (x, digits = 0) : formal argument "digits" matched by multiple actual arguments ``` ] ] ] --- class: title-subtitle # Using `compose()` .fancy.subtitle[There's more than one way to .strikeout[skin] pet a cat] .accent.question[] Compose a function that can be used to estimate each model in the previously defined `models` vector and report the adjusted R-squared. -- .pull-left[ .accent.example.pullup[previously] .code-compact[ ```r # take models %>% estimate each using map to apply the lm function %>% get the summary for each set of results using map to apply the summary function %>% extract the adjusted r-squared for each set of summary results using map_dbl to extract it by name %>% print the results in a neatly formatted tibble models %>% map(lm, data = mpg) %>% map(summary) %>% map_dbl("adj.r.squared") %>% enframe(name = 'model', value = 'Adj-R2') ``` ``` # A tibble: 4 x 2 model `Adj-R2` <chr> <dbl> 1 1 0.528 2 2 0.551 3 3 0.687 4 4 0.713 ``` ] ] -- .pull-right[ .accent.example[alternative] .code-compact[ ```r # compose a function that sends arguments to lm, then passes the results to summary, then plucks the r-squared from those results, then enframes mycomp <- compose(lm, summary, ~pluck(.x, 'adj.r.squared'), ~enframe(.x, name = 'model', value = 'adj.r.squared'), .dir = 'forward') # take models %>% estimate each by using map_dfr to apply the mycomp function and row bind models %>% map_dfr(~mycomp(.x, data = mpg)) ``` ``` # A tibble: 4 x 2 model adj.r.squared <int> <dbl> 1 1 0.528 2 1 0.551 3 1 0.687 4 1 0.713 ``` ] ] --- class: title-subtitle # `safely()`, `possibly()`, and `insistently()` .fancy.subtitle[Failure is .strikeout[not] an option!] 
.definition[ > .large[<code>safely(.f, otherwise = NULL, quiet = TRUE)</code>] <br> modifies function `.f` to return a list with components `result` (the result if no error, `otherwise` if error) and `error` (the error object if error, `NULL` otherwise). ] .fitgrid.top[ .definition[ > .large[<code>possibly(.f, otherwise, quiet = TRUE)</code>] <br> modifies function `.f` to return `otherwise` if error occurs. ] .definition[ > .large[<code>insistently(f, rate = rate_backoff())</code>] <br> modifies function `f` to retry on error, up to the number of times allowed by `rate`. ] ] -- .accent.example[] .code-compact[ ```r # define bad function that only works on odd numbers badfunc <- function(x) if (x %% 2 == 0) stop('Only odd numbers allowed') else (x) # define safe version of badfunc safely_badfunc <- safely(badfunc) # define possible version of badfunc possibly_badfunc <- possibly(badfunc, otherwise = NA_real_) # define insistent version of badfunc insistently_badfunc <- insistently(badfunc, rate = rate_backoff(pause_cap = 1, max_times = 4)) ``` ] --- class: title-subtitle count: false # `safely()`, `possibly()`, and `insistently()` .fancy.subtitle[Failure is .strikeout[not] an option!] .definition[ > .large[<code>safely(.f, otherwise = NULL, quiet = TRUE)</code>] <br> modifies function `.f` to return a list with components `result` (the result if no error, `otherwise` if error) and `error` (the error object if error, `NULL` otherwise). ] .fitgrid.top[ .definition[ > .large[<code>possibly(.f, otherwise, quiet = TRUE)</code>] <br> modifies function `.f` to return `otherwise` if error occurs. ] .definition[ > .large[<code>insistently(f, rate = rate_backoff())</code>] <br> modifies function `f` to retry on error, up to the number of times allowed by `rate`. 
] ] .fitgrid.top[ .normal[ .accent.example[_"Good"_ value] .code-nospace[ ```r # test functions on a "good" value badfunc(1) ``` ``` [1] 1 ``` ```r safely_badfunc(1) ``` ``` $result [1] 1 $error NULL ``` ```r possibly_badfunc(1) ``` ``` [1] 1 ``` ```r insistently_badfunc(1) ``` ``` [1] 1 ``` ] ] .normal[ .accent.example[_"Bad"_ value] .code-nospace[ ```r # test functions on a "bad" value badfunc(2) ``` ``` Error in badfunc(2): Only odd numbers allowed ``` ```r safely_badfunc(2) ``` ``` $result NULL $error <simpleError in .f(...): Only odd numbers allowed> ``` ```r possibly_badfunc(2) ``` ``` [1] NA ``` ```r insistently_badfunc(2) ``` ``` Error: Request failed after 4 attempts ``` ] ] ] --- class: title-subtitle # Why `safely()`? Why `possibly()`? Why `insistently()`? .fancy.subtitle[What's the point?] .fitgrid.top[ .definition[ > .large[<code>safely(.f, otherwise = NULL, quiet = TRUE)</code>] <br> modifies function `.f` to return a list with components `result` (the result if no error, `otherwise` if error) and `error` (the error object if error, `NULL` otherwise). ] .normal[ <div style="margin-top:-0.6em;"></div> .accent.negspace[**Give me the info and let me decide what to do**] - Ever called an API? - Allows for robust, flexible error handling ] ] .fitgrid.top[ .definition[ > .large[<code>possibly(.f, otherwise, quiet = TRUE)</code>] <br> modifies function `.f` to return `otherwise` if error occurs. ] .normal[ <div style="margin-top:-0.6em;"></div> .accent.negspace[**Let's pretend that didn't happen, okay?**] - Don't get bogged down with failures - You care (a bit) about the fact that there was an error, but not enough to want to stop - You don't care at all about the reason for the error or you're fairly confident about why there is an error ] ] .fitgrid.top[ .definition[ > .large[<code>insistently(f, rate = rate_backoff())</code>] <br> modifies function `f` to retry on error, up to the number of times allowed by `rate`. 
] .normal[ <div style="margin-top:-0.6em;"></div> .accent.negspace[**If at first you don't succeed**] - Get back on that horse! - Really only useful if you expect the chance of success to change with repeated attempts - i.e. the input to the function could change over successive calls ] ] --- class: title-subtitle # More `purrr` fun(ctions) .fancy.subtitle[But wait, there's more...] .accent.negspace[**Generalisations**] - `keep()` and `discard()` as generalisations of `dplyr::select_if()` - `pluck()` as generalisation of `[[` and `dplyr::pull()` - etc. .accent.negspace[**Companions**] - `prepend()` as companion to `append()` - `negate()` as companion to any predicate function .accent.negspace[**etc**] - more predicate functionals - more vector transformations - etc. --- class: title-subtitle # I'm intrigued... .fancy.subtitle[Where can I learn more?] .autogrid-left[ <div class="image" style="font-size:5em; margin-right: 0.2em; width: 1em; "><img src="../img/rstudio_logo.svg"/></div> .normal[ .accent.negspace[**Reference (R/RStudio)**] - `help(package = purrr)` - .keystroke[F1] to show function help - .keystroke[F2] to inspect function ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Reference (online)**] - [`purrr` cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf) - [`purrr` reference](https://purrr.tidyverse.org/reference/index.html) ] ] .autogrid-left[ <div style="font-size:5em; margin-right: 0.2em; width: 1em; text-align:center; justify-items: center;">
</div> .normal[ .accent.negspace[**Learning and understanding**] - Hadley Wickham's [Advanced R Chapter 9: Functionals](https://adv-r.hadley.nz/functionals.html) - Jenny Bryan's [purrr tutorial](https://jennybc.github.io/purrr-tutorial/) - Emil Hvitfeldt's [Purrr - tips and tricks](https://www.hvitfeldt.me/blog/purrr-tips-and-tricks) - Emily Robinson's [Going Off the Map: Exploring purrr's Other Functions](https://hookedondata.org/going-off-the-map/) - Colin Fay's 6-part [A Crazy Little Thing Called {purrr}](https://colinfay.me/purrr-web-mining/) ] ]
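
---
class: title-subtitle

# Appendix: `keep()`, `discard()`, `negate()`, and `pluck()`
.fancy.subtitle[A rough sketch, not part of the original talk]

The generalisations and companions listed under _More `purrr` fun(ctions)_ can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `cars` list and the `is_frugal()` predicate are made up for the example (hypothetical names and values), and only `keep()`, `discard()`, `negate()`, and `pluck()` come from `purrr`.

```r
library(purrr)

# hypothetical nested list, invented purely for illustration
cars <- list(
  audi  = list(cty = c(18, 21, 20), drv = 'f'),
  dodge = list(cty = c(11, 13, 9),  drv = '4'),
  honda = list(cty = c(28, 24, 25), drv = 'f')
)

# hypothetical predicate: is the mean cty of an element above 15?
is_frugal <- function(x) mean(x$cty) > 15

# keep() retains the elements for which the predicate returns TRUE
frugal <- keep(cars, is_frugal)
names(frugal)

# discard() drops those elements instead
thirsty <- discard(cars, is_frugal)
names(thirsty)

# negate() flips a predicate, so keep() + negate() mirrors discard()
identical(thirsty, keep(cars, negate(is_frugal)))

# pluck() reaches into a nested structure by name and/or position
first_honda_cty <- pluck(cars, 'honda', 'cty', 1)
first_honda_cty
```

The same predicate can be reused with `every()` and `some()` from the predicate functionals section, which is one reason to define it once as a named function rather than inline.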