If you’ve ever sourced data from the web, chances are that you will have encountered JSON - a lightweight and flexible data-interchange format. Apart from being easy to read and write, one of the foremost reasons for its ubiquity on the web is that it is particularly well-suited to storing data with complex hierarchical, nested structures. However, unless you’re using a language like JavaScript, JSON can be less than ideal for the purposes of actually analysing data. This is also true when trying to work with JSON data in R.

To analyse JSON data in R, ideally what we want is a way of first consistently converting it into tidy data (e.g. a tidy tibble). Fortunately, the fromJSON() function from the excellent jsonlite package makes converting JSON objects into R objects a pretty trivial task. Depending on the structure and complexity of the JSON data, however, converting the resultant R object into a tidy tibble can at times require a litte extra work.

In this post, I try to show how one can convert JSON data into tidy tibbles and, in particular, to illustrate what I’ve found to be a reasonably good, general approach to converting nested JSON into nested tibbles. I’ll be using three illustrative examples of increasing complexity before using the approach in the context of some real-world rock climbing competition data.

For the code that follows, we’ll need to load both the jsonlite and tidyverse packages.1:

# load required libraries
library(jsonlite)
library(tidyverse)
  1. load the jsonlite package since we’ll be using the fromJSON() function
  2. load tidyverse since we’ll be using a range of purrr, tidyr, and dplyr funtions

If you want to follow along with the examples, you’ll also need to load the example JSON strings into your R environment:

# load json string examples
load('../../static/data/json_examples.rda')
  1. load the objects contained in json_examples.rda into the R environment. This file contains three JSON strings, respectively called json1, json2, and json3.

Example 1: easy does it…

Our first simple JSON string is contained in the json1 object and looks like this:

{
    "year": [
        2018
    ],
    "customers": [
        {
            "name": "John"
        },
        {
            "name": "Jane",
            "surname": "Shaw"
        }
    ]
}
 

Here we can see that our data contains a single entry for the year field, followed by an array of customers data which, in turn, includes fields for name and surname. Next, we’ll convert this JSON string into an R object and then take a glimpse at the resultant data structure:

#{r eval=FALSE, message=FALSE, warning=FALSE, include=TRUE, paged.print=FALSE}
# use jsonlite to convert json string to R object
list_json1 <- fromJSON(json1) 

# get a glimpse of the list_json1 object
glimpse(list_json1)
> List of 2
>  $ year     : int 2018
>  $ customers:'data.frame':    2 obs. of  2 variables:
>   ..$ name   : chr [1:2] "John" "Jane"
>   ..$ surname: chr [1:2] NA "Shaw"
  1. use the fromJSON() function from the jsonlite package to convert the JSON string into a list object which we store as list_json1.
  2. use glimpse() from the tibble package to look at the structure of the list_json1 object.

So far, so easy. We can see that the JSON string has succesfully been converted into an R list object with year automatically being detected as an integer and customers automatically being detected as a data.frame. Next, we’ll convert this list into a single tibble:

# convert list_json1 to a tibble
df_json1 <- as_tibble(list_json1, validate = F) 

# display df_json1
print(df_json1)
> # A tibble: 2 x 2
>    year customers$name $surname
>   <int> <chr>          <chr>   
> 1  2018 John           <NA>    
> 2  2018 Jane           Shaw
  1. use as_tibble() from the tibble package to convert the list object into a tibble. as_tibble() is similar to as.data.frame(), but is a little more efficient and is stricter in the sense that, by default, (a) each element of the list passed to it must have the same length and (b) each element must be a one-dimensional (1d) vector or list. This is obviously not true of list_json1: the length of the year element is 1 whereas the length of the customers element is 2. This is why we have to pass the option validate = FALSE to the as_tibble() call.
  2. print() df_json1 to the console

We now have our data as a tibble, but something seems rather odd: according to the output, the dimensions of our tibble is 2 x 2. However, we can clearly see three columns of data in the output! We can get a better sense of what’s happening under the hood by glimpsing the data again:

# get a glimpse of the df_json1 tibble
glimpse(df_json1)
> Observations: 2
> Variables: 2
> $ year      <int> 2018, 2018
> $ customers <df[,2]> <data.frame[2 x 2]>

The output shows that our df_json1 tibble does indeed have only two variables, but that the second variable is itself a data.frame. Practically, what this means is that we would need to use two $ accessors in order to refer to the name or surname columns in df_json1, e.g. df_json1$customers$name and df_json1$customers$surname. This seems needlessly cumbersome.

At this point, you may be tempted to simply unnest df_json1. But, recall that unnest() only works on list-columns and that customers is a data.frame - not a list-column. df_json1 is not, in fact, nested at all. Yeesh!…

Put it in a list first

One “solution” to this “problem” comes courtesy of purrr’s map_if() command. Basically, all we need to do is ensure that any data.frame object in our list first gets placed into its own sub-list before we convert the list to a tibble:

# place any data.frame found in list_json1 into its own list
list_json1 <- map_if(list_json1, is.data.frame, list) 

# get a glimpse of the list_json1 object
glimpse(list_json1)
> List of 2
>  $ year     : int 2018
>  $ customers:List of 1
>   ..$ :'data.frame':  2 obs. of  2 variables:
>   .. ..$ name   : chr [1:2] "John" "Jane"
>   .. ..$ surname: chr [1:2] NA "Shaw"
  1. use map_if() from the purrr package to iterate over the elements in list_json1 and apply a specified function if a certain condition is met. The condition being tested is whether the element in question is a data.frame. If this condition evaluates to TRUE, the list() function should be applied to that element (effectively putting it into a list). map_if() will produce a list as output which we then re-assign to the list_json1 object
  2. use glimpse() from the tibble package to look at the new structure of the list_json1 object.

The structure of list_json1 is now somewhat different: where before customers was a data.frame, it is now a list containing a data.frame. The difference, though perhaps subtle at face value, is important since it means we now have nested list (i.e. a list of lists). My intuition tells me that we should therefore be able to convert this nested list into a nested tibble using the as_tibble() function:

# convert list_json1 to a tibble
df_json1 <- as_tibble(list_json1) 

# display df_json1
print(df_json1)
## # A tibble: 1 x 2
##    year customers       
##   <int> <list>          
## 1  2018 <df[,2] [2 × 2]>
  1. use as_tibble() from the tibble package to convert the list object into a tibble. Since each element in list_json1 has length 1 (you can test this if you like) and is either a one-dimensional vector or list, we no longer need to pass the option validate = FALSE to the as_tibble() call as we did before.
  2. print() df_json1 to the console

Hey presto! That certainly looks like a nested tibble to me. Let’s make sure by trying to unnest the data into a unnested tibble:

# convert list_json1 to a tibble
df_json1 <- unnest(df_json1) 

# display df_json1
print(df_json1)
## # A tibble: 2 x 3
##    year name  surname
##   <int> <chr> <chr>  
## 1  2018 John  <NA>   
## 2  2018 Jane  Shaw
  1. use unnest() from the tidyr package to convert the nested tibble into an unnested tibble. Given that we only have one nested column of data in the tibble, unnest() automatically detects that we want to unnest by the customers column
  2. print() df_json1 to the console

Success! We now have a tidy, flat tibble.

Example 2: taking it down a level

To test how well the approach outlined above generalises to other applications, let’s consider the somewhat more complex JSON string contained in json2:

{
    "year": [
        2018
    ],
    "customers": [
        {
            "name": "John",
            "contact": [
                {
                    "phone": "home",
                    "number": "333-3334"
                }
            ]
        },
        {
            "name": "Jane",
            "contact": [
                {
                    "phone": "home",
                    "number": "555-5556"
                }
            ],
            "surname": "Shaw"
        }
    ]
}
 

json2 builds on json1 by adding contact information for each customer. In this instance, the contact field for each customer is a one-dimensional array containing a phone type field and a corresponding phone number field. This time, we’ll try to convert the JSON string into a nested tibble using a single pipe which combines all of the key steps used above. As before, we’ll print the result to the console:

# convert json2 into a nested list object
df_json2 <- fromJSON(json2) %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble()

# get a glimpse of the list_json2 object
print(df_json2)
> # A tibble: 1 x 2
>    year customers       
>   <int> <list>          
> 1  2018 <df[,3] [2 × 3]>
  1. take json2 and convert it into an R list using fromJSON(), then (%>%)
    • use map_if() to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble()
    • store the result of the pipe as df_json2
  2. print() df_json2 to the console

Once more, we now have a nested tibble. However, this is the result when we unnest the data:

# try to unnest df_json1
unnest(df_json2)
> # A tibble: 2 x 4
>    year name  contact          surname
>   <int> <chr> <list>           <chr>  
> 1  2018 John  <df[,2] [1 × 2]> <NA>   
> 2  2018 Jane  <df[,2] [1 × 2]> Shaw
  1. use unnest() from the tidyr package to unnest the tibble.

Contrary to what you may have expected, the result is not simply a flat tibble, but a nested tibble, albeit one that is nested at a lower level of the nesting hierarchy than before. The key thing to notice here is that, since json2 has three hierarchically nested levels, this will also be reflected in our nested tibble. This means that we have to unnest twice in order to get df_json2 into a flat tibble:

# unnest df_json2 twice
df_json2 %>% 
  unnest() %>% 
  unnest() 
> # A tibble: 2 x 5
>    year name  phone number   surname
>   <int> <chr> <chr> <chr>    <chr>  
> 1  2018 John  home  333-3334 <NA>   
> 2  2018 Jane  home  555-5556 Shaw
  1. take df_json2, then (%>%)
    • unnest it, then (%>%)
    • unnest it again to get it into a flat tibble

Example 3: going sideways

Our approach now appears to be general enough to handle hierarchically nested objects of arbitrary depth. However, it is useful to consider the implications of having multiple lists of varying length at the same level of nesting. Consider json3:

{
    "year": [
        2018
    ],
    "customers": [
        {
            "name": "John",
            "contact": [
                {
                    "phone": "home",
                    "number": "333-3334"
                }
            ],
            "address": [
                {
                    "type": "home",
                    "details": "4 Orangegrove lane, Elan, NC"
                },
                {
                    "type": "work",
                    "details": "13 Kombucha Close, Elan, NC"
                }
            ]
        },
        {
            "name": "Jane",
            "contact": [
                {
                    "phone": "home",
                    "number": "555-5556"
                }
            ],
            "address": [
                {
                    "type": "home",
                    "details": "4 Orangegrove lane, Elan, NC"
                },
                {
                    "type": "work",
                    "details": "1st First street, Jinkies, NC"
                }
            ],
            "surname": "Shaw"
        }
    ]
}
 

json3 adds an address array for each customer. Note that this array is at the same level of nesting as the contact array. In other words, json3 has the same nesting depth as json2. However, let’s see what happens when we try to use the same procedure to first convert json3 into a nested tibble and then convert it to a flat tibble by unnesting twice:

# convert json3 into a nested tibble and then try to unnest twice
json3 %>% 
  fromJSON() %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble() %>% 
  unnest() %>% 
  unnest()
> # A tibble: 4 x 7
>    year name  phone number   type  details                       surname
>   <int> <chr> <chr> <chr>    <chr> <chr>                         <chr>  
> 1  2018 John  home  333-3334 home  4 Orangegrove lane, Elan, NC  <NA>   
> 2  2018 John  home  333-3334 work  13 Kombucha Close, Elan, NC   <NA>   
> 3  2018 Jane  home  555-5556 home  4 Orangegrove lane, Elan, NC  Shaw   
> 4  2018 Jane  home  555-5556 work  1st First street, Jinkies, NC Shaw
  1. take json3 and convert it into an R list using fromJSON(), then (%>%)
    • use map_if() to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble(), then (%>%)
    • try to unnest the tibble twice

That obviously didn’t work. We can see why by going one step back and unnesting the data once first:

# convert json3 into a nested tibble and then unnest once
json3 %>% 
  fromJSON() %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble() %>% 
  unnest() 
> # A tibble: 2 x 5
>    year name  contact          address          surname
>   <int> <chr> <list>           <list>           <chr>  
> 1  2018 John  <df[,2] [1 × 2]> <df[,2] [2 × 2]> <NA>   
> 2  2018 Jane  <df[,2] [1 × 2]> <df[,2] [2 × 2]> Shaw
  1. take json3 and convert it into an R list using fromJSON(), then (%>%)
    • use map_if() to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble(), then (%>%)
    • unnest the tibble

We now see that, while contact and address are both list-columns, they have different numbers of rows. What effectively happens when we add a second call to unnest() in our pipe without any further arguments is that R tries to unnest by contact and address simultaneously. But, given the different dimensions of the underlying columns it cannot do so. Hence the error we got before. To get around this we need to be a bit more explicit in terms of how we ask R to unnest the data. Essentially, we have to choose the sequence in which the tibble should be unnested. E.g.:

# convert json3 into a nested tibble and then unnest twice
json3 %>% 
  fromJSON() %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble() %>% 
  unnest(customers) %>% 
  unnest(contact) %>% 
  unnest(address)
> # A tibble: 4 x 7
>    year name  phone number   type  details                       surname
>   <int> <chr> <chr> <chr>    <chr> <chr>                         <chr>  
> 1  2018 John  home  333-3334 home  4 Orangegrove lane, Elan, NC  <NA>   
> 2  2018 John  home  333-3334 work  13 Kombucha Close, Elan, NC   <NA>   
> 3  2018 Jane  home  555-5556 home  4 Orangegrove lane, Elan, NC  Shaw   
> 4  2018 Jane  home  555-5556 work  1st First street, Jinkies, NC Shaw
  1. take json3 and convert it into an R list using fromJSON(), then (%>%)
    • use map_if() to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble(), then (%>%)
    • unnest the tibble by the customers list-column, then (%>%)
    • unnest the tibble by the contact list-column, then (%>%)
    • unnest the tibble by the address list-column, then (%>%)

It’s worth noting that the order of our unnesting sequence matters. If we choose to first unnest by customers and then by address, unnest()’s default behaviour would be to drop the contact list-column unless we explicitly specify the option .preserve = 'contact' when issuing the command. In other words, the following code would be required to produce a similar unnested tibble to the one shown above:

# convert json3 into a nested tibble and then unnest twice
json3 %>% 
  fromJSON() %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble() %>% 
  unnest(customers) %>% 
  unnest(address, .preserve = 'contact') %>% 
  unnest(contact)
> # A tibble: 4 x 7
>    year name  phone number   type  details                       surname
>   <int> <chr> <chr> <chr>    <chr> <chr>                         <chr>  
> 1  2018 John  home  333-3334 home  4 Orangegrove lane, Elan, NC  <NA>   
> 2  2018 John  home  333-3334 work  13 Kombucha Close, Elan, NC   <NA>   
> 3  2018 Jane  home  555-5556 home  4 Orangegrove lane, Elan, NC  Shaw   
> 4  2018 Jane  home  555-5556 work  1st First street, Jinkies, NC Shaw
  1. take json3 and convert it into an R list using fromJSON(), then (%>%)
    • use map_if() to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble(), then (%>%)
    • unnest the tibble by the customers list-column, then (%>%)
    • unnest the tibble by the address list-column and add the option .preserve = 'contact' to ensure that the contact column doesn’t get dropped in the unnesting process, then (%>%)
    • unnest the tibble by the contact list-column

Application: IFSC climbing results

Let’s see how our approach performs in the real world. The application I’m going to consider here is the overall results for the men’s and women’s lead climbing finalists from the most recent round of the IFSC Climbing World cup, held in Briançon, France on July 21st 2018.2 It’s pretty easy to get detailed results for this competition from the web in JSON format and my objective is to get these results into R and convert them into a tidy tibble.

We’ll start with the results for the women’s competition first. Our first step is simply to read the JSON string into R from the web and then convert it to an R list:

# create a variable to store the url for the climbing results 
results_url <- 'https://www.digitalrock.de/egroupware/ranking/json.php?comp=7146&cat=2&route=3'

# read the json string into memory from the web and convert to r list
list_json <- fromJSON(results_url)
  1. create a variale to store the url where the JSON results data is hosted
  2. read the data from the web into the R environment and convert it into an R list using fromJSON(). Store the result in list_json

Since the data we’re dealing with here is rather more complex than what was considered in the examples above, this time we’ll simply glimpse at the top-level structure of the resultant list_json object:

# glimpse at the top level structure of list_json
glimpse(list_json, max.level = 1) 
> List of 22
>  $ WetId              : chr "7146"
>  $ GrpId              : chr "2"
>  $ route_order        : chr "3"
>  $ route_name         : chr "Final W O M E N lead"
>  $ route_judge        : chr "Johannes Altner"
>  $ route_iso_open     : chr "19:00"
>  $ route_iso_close    : chr "20:00"
>  $ route_start        : chr "21:00"
>  $ route_result       : chr "21/07/2018 à 22:03:23"
>  $ route_climbing_time: chr "6'"
>  $ discipline         : chr "lead"
>  $ comp_name          : chr "IFSC Climbing Worldcup (L) - Briançon (FRA) 2018 "
>  $ comp_date          : chr "2018-07-20"
>  $ display_athlete    : chr "nation"
>  $ route_names        :List of 5
>  $ category_offical   : logi TRUE
>  $ current            : list()
>  $ participants       :'data.frame':  8 obs. of  18 variables:
>  $ see_also           :'data.frame':  2 obs. of  2 variables:
>  $ categorys          :'data.frame':  2 obs. of  3 variables:
>  $ etag               : chr "\"4010c71c513dbde660fb7c10cc7e0961\""
>  $ expires            : int 86400
  1. use glimpse() from the tibble package to get a summary of the structure of list_json. use the max.level = 1 option to specify that we only want to see the top-level structure

From this glimpse we see that we have one list element (route_names) and several data.frame elemens in list_json. Let’s see if we can convert this into a nested tibble using our approach. I mean, it shoud work, right?…

# convert json3 into a nested tibble and then unnest twice
list_json %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble()
> Error: Tibble columns must have consistent lengths, only values of length one are recycled:
> * Length 0: Column `current`
> * Length 5: Column `route_names`
  1. take list_json, then (%>%)
    • use map_if() to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble()

Alas… The issue here is that, unlike all of the other elements, the current element in list_json is effectively empt. This is problematic since as_tibble() requires all elements to have the same length. Fortunately, we need only a minor addition to the code to take care of this. We’ll use purrr’s discard() function to first get rid of any empty elements in list_json before performing any further operations on it:

# convert json3 into a nested tibble and then unnest twice
df_json <- list_json %>% 
  discard(is_empty) %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble()

# print the result to the console
print(df_json)
> # A tibble: 5 x 21
>   WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
>   <chr> <chr> <chr>       <chr>      <chr>       <chr>          <chr>          
> 1 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 2 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 3 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 4 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 5 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> # … with 14 more variables: route_start <chr>, route_result <chr>,
> #   route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> #   comp_date <chr>, display_athlete <chr>, route_names <named list>,
> #   category_offical <lgl>, participants <list>, see_also <list>,
> #   categorys <list>, etag <chr>, expires <int>
  1. take list_json, then (%>%)
    • use discard() from the purrr package to remove any empty elements from the list, then (%>%)
    • use map_if() to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%)
    • convert the resultant nested list into a nested tibble using as_tibble()
    • store the resultant tibble as df_json

We now have a nested tibble. Before unnesting it, notice that the current tibble effectively has duplicate rows with one row for each of the values in the original route_names list within list_json. To remedy this, we’re going to first filter the data to retain only the record for the Final round, before unnesting by the participants list-column which contains the results for the 8 finalists:

# filter df_json and then unnest by participants
df_json %>% 
  filter(route_names == 'Final') %>% 
  unnest(participants)
> # A tibble: 8 x 38
>   WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
>   <chr> <chr> <chr>       <chr>      <chr>       <chr>          <chr>          
> 1 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 2 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 3 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 4 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 5 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 6 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 7 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> 8 7146  2     3           Final W O… Johannes A… 19:00          20:00          
> # … with 31 more variables: route_start <chr>, route_result <chr>,
> #   route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> #   comp_date <chr>, display_athlete <chr>, route_names <list>,
> #   category_offical <lgl>, PerId <chr>, acl <chr>, birthyear <chr>,
> #   city <chr>, fed_id <chr>, fed_url <chr>, federation <chr>, firstname <chr>,
> #   lastname <chr>, nation <chr>, rank_prev_heat <chr>, result <chr>,
> #   result_modified <chr>, result_rank <chr>, rkey <chr>, start_number <chr>,
> #   start_order <chr>, url <chr>, see_also <list>, categorys <list>,
> #   etag <chr>, expires <int>
  1. take df_json, then (%>%)
    • filter the data using the filter() command from the dplyr package to keep only records where route_names is equal to “Final”, then (%>%)
    • unnest by the participants list-column using the unnest() command

Finally, we have a flat tibble containing the results for each of the 8 finalists. Given that this worked for the women’s results, we should be able to achieve a similar outcome for the men’s results. This time, we’ll do it all in a single pipe:

# create a tidy tibble of men's finalist results 
'https://www.digitalrock.de/egroupware/ranking/json.php?comp=7146&cat=1&route=3' %>% 
  fromJSON() %>% 
  discard(is_empty) %>% 
  map_if(is.data.frame, list) %>% 
  as_tibble() %>% 
  filter(route_names == 'Final') %>% 
  unnest(participants)
> # A tibble: 8 x 39
>   WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
>   <chr> <chr> <chr>       <chr>      <chr>       <chr>          <chr>          
> 1 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 2 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 3 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 4 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 5 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 6 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 7 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> 8 7146  1     3           Final M E… Johannes A… 17:30          18:30          
> # … with 32 more variables: route_start <chr>, route_result <chr>,
> #   route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> #   comp_date <chr>, display_athlete <chr>, route_names <list>,
> #   category_offical <lgl>, PerId <chr>, acl <chr>, birthyear <chr>,
> #   city <chr>, fed_id <chr>, fed_parent <chr>, fed_url <chr>,
> #   federation <chr>, firstname <chr>, lastname <chr>, nation <chr>,
> #   rank_prev_heat <chr>, result <chr>, result_modified <chr>,
> #   result_rank <chr>, rkey <chr>, start_number <chr>, start_order <chr>,
> #   url <chr>, see_also <list>, categorys <list>, etag <chr>, expires <int>
  1. take the url where the JSON results data is hosted, then (%>%)
  • read the data from the web into the R environment and convert it into an R list using fromJSON(), then (%>%)
  • use discard() from the purrr package to remove any empty elements from the list, then (%>%)
  • use map_if() to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%)
  • convert the resultant nested list into a nested tibble using as_tibble(), then (%>%)
  • filter the data using the filter() command from the dplyr package to keep only records where route_names is equal to “Final”, then (%>%)
  • unnest by the participants list-column using the unnest() command

Conclusion

And there you have it. To summarise: the basic steps required to convert nested JSON into a nested - and subsequently a flat - tibble are:

  1. convert the JSON string into an R list object using jsonlite::fromJSON() function
  2. discard any empty elements from the list using purrr::discard(is.empty)
  3. place all data.frame elements in the list inside their own lists using purrr::map_if(data.frame, list)
  4. convert the nested list into a nested tibble using tibble::as_tibble()
  5. unnest by the list-column(s) of your choice using tidyr::unnest()

  1. Technically, we won’t be needing all of the packages that ship as part of the tidyverse, but I’m choosing to load tidyverse since (a) I’m too lazy to remember which functions come from which packages in the tidyverse, (b) I generally find that loading tidyverse in full has me convered for any additional add-hoc analyses I may want to do, and (c) doing so best serves my nefarious plot to coerce others into the tidyverse↩︎

  2. Though it’s fairly easy to view a neatly formatted summary of these results on the IFSC website, where’s the fun in that?↩︎