Nested data: JSON to tibble
If you’ve ever sourced data from the web, chances are that you will have encountered JSON - a lightweight and flexible data-interchange format. Apart from being easy to read and write, one of the foremost reasons for its ubiquity on the web is that it is particularly well-suited to storing data with complex hierarchical, nested structures. However, unless you’re using a language like JavaScript, JSON can be less than ideal for the purposes of actually analysing data. This is also true when trying to work with JSON data in R.
To analyse JSON data in R, ideally what we want is a way of first consistently converting it into tidy data (e.g. a tidy tibble). Fortunately, the fromJSON()
function from the excellent jsonlite
package makes converting JSON objects into R objects a pretty trivial task. Depending on the structure and complexity of the JSON data, however, converting the resultant R object into a tidy tibble can at times require a litte extra work.
In this post, I try to show how one can convert JSON data into tidy tibbles and, in particular, to illustrate what I’ve found to be a reasonably good, general approach to converting nested JSON into nested tibbles. I’ll be using three illustrative examples of increasing complexity before using the approach in the context of some real-world rock climbing competition data.
For the code that follows, we’ll need to load both the jsonlite and tidyverse packages.1:
# load required libraries
library(jsonlite)
library(tidyverse)
- load the
jsonlite
package since we’ll be using thefromJSON()
function - load
tidyverse
since we’ll be using a range ofpurrr
,tidyr
, anddplyr
funtions
If you want to follow along with the examples, you’ll also need to load the example JSON strings into your R environment:
# load json string examples
load('../../static/data/json_examples.rda')
- load the objects contained in
json_examples.rda
into the R environment. This file contains three JSON strings, respectively calledjson1
,json2
, andjson3
.
Example 1: easy does it…
Our first simple JSON string is contained in the json1
object and looks like this:
{
"year": [
2018
],
"customers": [
{
"name": "John"
},
{
"name": "Jane",
"surname": "Shaw"
}
]
}
Here we can see that our data contains a single entry for the year
field, followed by an array of customers
data which, in turn, includes fields for name
and surname
. Next, we’ll convert this JSON string into an R object and then take a glimpse at the resultant data structure:
#{r eval=FALSE, message=FALSE, warning=FALSE, include=TRUE, paged.print=FALSE}
# use jsonlite to convert json string to R object
list_json1 <- fromJSON(json1)
# get a glimpse of the list_json1 object
glimpse(list_json1)
> List of 2
> $ year : int 2018
> $ customers:'data.frame': 2 obs. of 2 variables:
> ..$ name : chr [1:2] "John" "Jane"
> ..$ surname: chr [1:2] NA "Shaw"
- use the
fromJSON()
function from the jsonlite package to convert the JSON string into a list object which we store aslist_json1
. - use
glimpse()
from the tibble package to look at the structure of thelist_json1
object.
So far, so easy. We can see that the JSON string has succesfully been converted into an R list object with year
automatically being detected as an integer and customers
automatically being detected as a data.frame. Next, we’ll convert this list into a single tibble:
# convert list_json1 to a tibble
df_json1 <- as_tibble(list_json1, validate = F)
# display df_json1
print(df_json1)
> # A tibble: 2 x 2
> year customers$name $surname
> <int> <chr> <chr>
> 1 2018 John <NA>
> 2 2018 Jane Shaw
- use
as_tibble()
from the tibble package to convert the list object into a tibble.as_tibble()
is similar toas.data.frame()
, but is a little more efficient and is stricter in the sense that, by default, (a) each element of the list passed to it must have the same length and (b) each element must be a one-dimensional (1d) vector or list. This is obviously not true oflist_json1
: the length of theyear
element is 1 whereas the length of thecustomers
element is 2. This is why we have to pass the optionvalidate = FALSE
to theas_tibble()
call. print()
df_json1
to the console
We now have our data as a tibble, but something seems rather odd: according to the output, the dimensions of our tibble is 2 x 2. However, we can clearly see three columns of data in the output! We can get a better sense of what’s happening under the hood by glimpsing the data again:
# get a glimpse of the df_json1 tibble
glimpse(df_json1)
> Observations: 2
> Variables: 2
> $ year <int> 2018, 2018
> $ customers <df[,2]> <data.frame[2 x 2]>
The output shows that our df_json1
tibble does indeed have only two variables, but that the second variable is itself a data.frame. Practically, what this means is that we would need to use two $
accessors in order to refer to the name
or surname
columns in df_json1
, e.g. df_json1$customers$name
and df_json1$customers$surname
. This seems needlessly cumbersome.
At this point, you may be tempted to simply unnest df_json1
. But, recall that unnest()
only works on list-columns and that customers
is a data.frame - not a list-column. df_json1
is not, in fact, nested at all. Yeesh!…
Put it in a list first
One “solution” to this “problem” comes courtesy of purrr
’s map_if()
command. Basically, all we need to do is ensure that any data.frame object in our list first gets placed into its own sub-list before we convert the list to a tibble:
# place any data.frame found in list_json1 into its own list
list_json1 <- map_if(list_json1, is.data.frame, list)
# get a glimpse of the list_json1 object
glimpse(list_json1)
> List of 2
> $ year : int 2018
> $ customers:List of 1
> ..$ :'data.frame': 2 obs. of 2 variables:
> .. ..$ name : chr [1:2] "John" "Jane"
> .. ..$ surname: chr [1:2] NA "Shaw"
- use
map_if()
from the purrr package to iterate over the elements inlist_json1
and apply a specified function if a certain condition is met. The condition being tested is whether the element in question is a data.frame. If this condition evaluates to TRUE, thelist()
function should be applied to that element (effectively putting it into a list).map_if()
will produce a list as output which we then re-assign to thelist_json1
object - use
glimpse()
from the tibble package to look at the new structure of thelist_json1
object.
The structure of list_json1
is now somewhat different: where before customers
was a data.frame, it is now a list containing a data.frame. The difference, though perhaps subtle at face value, is important since it means we now have nested list (i.e. a list of lists). My intuition tells me that we should therefore be able to convert this nested list into a nested tibble using the as_tibble()
function:
# convert list_json1 to a tibble
df_json1 <- as_tibble(list_json1)
# display df_json1
print(df_json1)
## # A tibble: 1 x 2
## year customers
## <int> <list>
## 1 2018 <df[,2] [2 × 2]>
- use
as_tibble()
from the tibble package to convert the list object into a tibble. Since each element inlist_json1
has length 1 (you can test this if you like) and is either a one-dimensional vector or list, we no longer need to pass the optionvalidate = FALSE
to theas_tibble()
call as we did before. print()
df_json1
to the console
Hey presto! That certainly looks like a nested tibble to me. Let’s make sure by trying to unnest the data into a unnested tibble:
# convert list_json1 to a tibble
df_json1 <- unnest(df_json1)
# display df_json1
print(df_json1)
## # A tibble: 2 x 3
## year name surname
## <int> <chr> <chr>
## 1 2018 John <NA>
## 2 2018 Jane Shaw
- use
unnest()
from the tidyr package to convert the nested tibble into an unnested tibble. Given that we only have one nested column of data in the tibble,unnest()
automatically detects that we want to unnest by thecustomers
column print()
df_json1
to the console
Success! We now have a tidy, flat tibble.
Example 2: taking it down a level
To test how well the approach outlined above generalises to other applications, let’s consider the somewhat more complex JSON string contained in json2
:
{
"year": [
2018
],
"customers": [
{
"name": "John",
"contact": [
{
"phone": "home",
"number": "333-3334"
}
]
},
{
"name": "Jane",
"contact": [
{
"phone": "home",
"number": "555-5556"
}
],
"surname": "Shaw"
}
]
}
json2
builds on json1
by adding contact
information for each customer. In this instance, the contact
field for each customer is a one-dimensional array containing a phone
type field and a corresponding phone number
field. This time, we’ll try to convert the JSON string into a nested tibble using a single pipe which combines all of the key steps used above. As before, we’ll print the result to the console:
# convert json2 into a nested list object
df_json2 <- fromJSON(json2) %>%
map_if(is.data.frame, list) %>%
as_tibble()
# get a glimpse of the list_json2 object
print(df_json2)
> # A tibble: 1 x 2
> year customers
> <int> <list>
> 1 2018 <df[,3] [2 × 3]>
- take
json2
and convert it into an R list usingfromJSON()
, then (%>%
)- use
map_if()
to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
- store the result of the pipe as
df_json2
- use
print()
df_json2
to the console
Once more, we now have a nested tibble. However, this is the result when we unnest the data:
# try to unnest df_json1
unnest(df_json2)
> # A tibble: 2 x 4
> year name contact surname
> <int> <chr> <list> <chr>
> 1 2018 John <df[,2] [1 × 2]> <NA>
> 2 2018 Jane <df[,2] [1 × 2]> Shaw
- use
unnest()
from the tidyr package to unnest the tibble.
Contrary to what you may have expected, the result is not simply a flat tibble, but a nested tibble, albeit one that is nested at a lower level of the nesting hierarchy than before. The key thing to notice here is that, since json2
has three hierarchically nested levels, this will also be reflected in our nested tibble. This means that we have to unnest twice in order to get df_json2
into a flat tibble:
# unnest df_json2 twice
df_json2 %>%
unnest() %>%
unnest()
> # A tibble: 2 x 5
> year name phone number surname
> <int> <chr> <chr> <chr> <chr>
> 1 2018 John home 333-3334 <NA>
> 2 2018 Jane home 555-5556 Shaw
- take
df_json2
, then (%>%
)- unnest it, then (
%>%
) - unnest it again to get it into a flat tibble
- unnest it, then (
Example 3: going sideways
Our approach now appears to be general enough to handle hierarchically nested objects of arbitrary depth. However, it is useful to consider the implications of having multiple lists of varying length at the same level of nesting. Consider json3
:
{
"year": [
2018
],
"customers": [
{
"name": "John",
"contact": [
{
"phone": "home",
"number": "333-3334"
}
],
"address": [
{
"type": "home",
"details": "4 Orangegrove lane, Elan, NC"
},
{
"type": "work",
"details": "13 Kombucha Close, Elan, NC"
}
]
},
{
"name": "Jane",
"contact": [
{
"phone": "home",
"number": "555-5556"
}
],
"address": [
{
"type": "home",
"details": "4 Orangegrove lane, Elan, NC"
},
{
"type": "work",
"details": "1st First street, Jinkies, NC"
}
],
"surname": "Shaw"
}
]
}
json3
adds an address
array for each customer. Note that this array is at the same level of nesting as the contact
array. In other words, json3
has the same nesting depth as json2
. However, let’s see what happens when we try to use the same procedure to first convert json3
into a nested tibble and then convert it to a flat tibble by unnesting twice:
# convert json3 into a nested tibble and then try to unnest twice
json3 %>%
fromJSON() %>%
map_if(is.data.frame, list) %>%
as_tibble() %>%
unnest() %>%
unnest()
> # A tibble: 4 x 7
> year name phone number type details surname
> <int> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 2018 John home 333-3334 home 4 Orangegrove lane, Elan, NC <NA>
> 2 2018 John home 333-3334 work 13 Kombucha Close, Elan, NC <NA>
> 3 2018 Jane home 555-5556 home 4 Orangegrove lane, Elan, NC Shaw
> 4 2018 Jane home 555-5556 work 1st First street, Jinkies, NC Shaw
- take
json3
and convert it into an R list usingfromJSON()
, then (%>%
)- use
map_if()
to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
, then (%>%
) - try to unnest the tibble twice
- use
That obviously didn’t work. We can see why by going one step back and unnesting the data once first:
# convert json3 into a nested tibble and then unnest once
json3 %>%
fromJSON() %>%
map_if(is.data.frame, list) %>%
as_tibble() %>%
unnest()
> # A tibble: 2 x 5
> year name contact address surname
> <int> <chr> <list> <list> <chr>
> 1 2018 John <df[,2] [1 × 2]> <df[,2] [2 × 2]> <NA>
> 2 2018 Jane <df[,2] [1 × 2]> <df[,2] [2 × 2]> Shaw
- take
json3
and convert it into an R list usingfromJSON()
, then (%>%
)- use
map_if()
to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
, then (%>%
) - unnest the tibble
- use
We now see that, while contact
and address
are both list-columns, they have different numbers of rows. What effectively happens when we add a second call to unnest()
in our pipe without any further arguments is that R tries to unnest by contact
and address
simultaneously. But, given the different dimensions of the underlying columns it cannot do so. Hence the error we got before. To get around this we need to be a bit more explicit in terms of how we ask R to unnest the data. Essentially, we have to choose the sequence in which the tibble should be unnested. E.g.:
# convert json3 into a nested tibble and then unnest twice
json3 %>%
fromJSON() %>%
map_if(is.data.frame, list) %>%
as_tibble() %>%
unnest(customers) %>%
unnest(contact) %>%
unnest(address)
> # A tibble: 4 x 7
> year name phone number type details surname
> <int> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 2018 John home 333-3334 home 4 Orangegrove lane, Elan, NC <NA>
> 2 2018 John home 333-3334 work 13 Kombucha Close, Elan, NC <NA>
> 3 2018 Jane home 555-5556 home 4 Orangegrove lane, Elan, NC Shaw
> 4 2018 Jane home 555-5556 work 1st First street, Jinkies, NC Shaw
- take
json3
and convert it into an R list usingfromJSON()
, then (%>%
)- use
map_if()
to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
, then (%>%
) - unnest the tibble by the
customers
list-column, then (%>%
) - unnest the tibble by the
contact
list-column, then (%>%
) - unnest the tibble by the
address
list-column, then (%>%
)
- use
It’s worth noting that the order of our unnesting sequence matters. If we choose to first unnest by customers
and then by address
, unnest()
’s default behaviour would be to drop the contact
list-column unless we explicitly specify the option .preserve = 'contact'
when issuing the command. In other words, the following code would be required to produce a similar unnested tibble to the one shown above:
# convert json3 into a nested tibble and then unnest twice
json3 %>%
fromJSON() %>%
map_if(is.data.frame, list) %>%
as_tibble() %>%
unnest(customers) %>%
unnest(address, .preserve = 'contact') %>%
unnest(contact)
> # A tibble: 4 x 7
> year name phone number type details surname
> <int> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 2018 John home 333-3334 home 4 Orangegrove lane, Elan, NC <NA>
> 2 2018 John home 333-3334 work 13 Kombucha Close, Elan, NC <NA>
> 3 2018 Jane home 555-5556 home 4 Orangegrove lane, Elan, NC Shaw
> 4 2018 Jane home 555-5556 work 1st First street, Jinkies, NC Shaw
- take
json3
and convert it into an R list usingfromJSON()
, then (%>%
)- use
map_if()
to iterate over the resultant list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
, then (%>%
) - unnest the tibble by the
customers
list-column, then (%>%
) - unnest the tibble by the
address
list-column and add the option.preserve = 'contact'
to ensure that thecontact
column doesn’t get dropped in the unnesting process, then (%>%
) - unnest the tibble by the
contact
list-column
- use
Application: IFSC climbing results
Let’s see how our approach performs in the real world. The application I’m going to consider here is the overall results for the men’s and women’s lead climbing finalists from the most recent round of the IFSC Climbing World cup, held in Briançon, France on July 21st 2018.2 It’s pretty easy to get detailed results for this competition from the web in JSON format and my objective is to get these results into R and convert them into a tidy tibble.
We’ll start with the results for the women’s competition first. Our first step is simply to read the JSON string into R from the web and then convert it to an R list:
# create a variable to store the url for the climbing results
results_url <- 'https://www.digitalrock.de/egroupware/ranking/json.php?comp=7146&cat=2&route=3'
# read the json string into memory from the web and convert to r list
list_json <- fromJSON(results_url)
- create a variale to store the url where the JSON results data is hosted
- read the data from the web into the R environment and convert it into an R list using
fromJSON()
. Store the result inlist_json
Since the data we’re dealing with here is rather more complex than what was considered in the examples above, this time we’ll simply glimpse at the top-level structure of the resultant list_json
object:
# glimpse at the top level structure of list_json
glimpse(list_json, max.level = 1)
> List of 22
> $ WetId : chr "7146"
> $ GrpId : chr "2"
> $ route_order : chr "3"
> $ route_name : chr "Final W O M E N lead"
> $ route_judge : chr "Johannes Altner"
> $ route_iso_open : chr "19:00"
> $ route_iso_close : chr "20:00"
> $ route_start : chr "21:00"
> $ route_result : chr "21/07/2018 à 22:03:23"
> $ route_climbing_time: chr "6'"
> $ discipline : chr "lead"
> $ comp_name : chr "IFSC Climbing Worldcup (L) - Briançon (FRA) 2018 "
> $ comp_date : chr "2018-07-20"
> $ display_athlete : chr "nation"
> $ route_names :List of 5
> $ category_offical : logi TRUE
> $ current : list()
> $ participants :'data.frame': 8 obs. of 18 variables:
> $ see_also :'data.frame': 2 obs. of 2 variables:
> $ categorys :'data.frame': 2 obs. of 3 variables:
> $ etag : chr "\"4010c71c513dbde660fb7c10cc7e0961\""
> $ expires : int 86400
- use
glimpse()
from the tibble package to get a summary of the structure oflist_json
. use themax.level = 1
option to specify that we only want to see the top-level structure
From this glimpse we see that we have one list element (route_names
) and several data.frame elemens in list_json
. Let’s see if we can convert this into a nested tibble using our approach. I mean, it shoud work, right?…
# convert json3 into a nested tibble and then unnest twice
list_json %>%
map_if(is.data.frame, list) %>%
as_tibble()
> Error: Tibble columns must have consistent lengths, only values of length one are recycled:
> * Length 0: Column `current`
> * Length 5: Column `route_names`
- take
list_json
, then (%>%
)- use
map_if()
to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
- use
Alas… The issue here is that, unlike all of the other elements, the current
element in list_json
is effectively empt. This is problematic since as_tibble()
requires all elements to have the same length. Fortunately, we need only a minor addition to the code to take care of this. We’ll use purrr
’s discard()
function to first get rid of any empty elements in list_json
before performing any further operations on it:
# convert json3 into a nested tibble and then unnest twice
df_json <- list_json %>%
discard(is_empty) %>%
map_if(is.data.frame, list) %>%
as_tibble()
# print the result to the console
print(df_json)
> # A tibble: 5 x 21
> WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 2 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 3 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 4 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 5 7146 2 3 Final W O… Johannes A… 19:00 20:00
> # … with 14 more variables: route_start <chr>, route_result <chr>,
> # route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> # comp_date <chr>, display_athlete <chr>, route_names <named list>,
> # category_offical <lgl>, participants <list>, see_also <list>,
> # categorys <list>, etag <chr>, expires <int>
- take
list_json
, then (%>%
)- use
discard()
from the purrr package to remove any empty elements from the list, then (%>%
) - use
map_if()
to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
- store the resultant tibble as
df_json
- use
We now have a nested tibble. Before unnesting it, notice that the current tibble effectively has duplicate rows with one row for each of the values in the original route_names
list within list_json
. To remedy this, we’re going to first filter the data to retain only the record for the Final round, before unnesting by the participants
list-column which contains the results for the 8 finalists:
# filter df_json and then unnest by participants
df_json %>%
filter(route_names == 'Final') %>%
unnest(participants)
> # A tibble: 8 x 38
> WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 2 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 3 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 4 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 5 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 6 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 7 7146 2 3 Final W O… Johannes A… 19:00 20:00
> 8 7146 2 3 Final W O… Johannes A… 19:00 20:00
> # … with 31 more variables: route_start <chr>, route_result <chr>,
> # route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> # comp_date <chr>, display_athlete <chr>, route_names <list>,
> # category_offical <lgl>, PerId <chr>, acl <chr>, birthyear <chr>,
> # city <chr>, fed_id <chr>, fed_url <chr>, federation <chr>, firstname <chr>,
> # lastname <chr>, nation <chr>, rank_prev_heat <chr>, result <chr>,
> # result_modified <chr>, result_rank <chr>, rkey <chr>, start_number <chr>,
> # start_order <chr>, url <chr>, see_also <list>, categorys <list>,
> # etag <chr>, expires <int>
- take
df_json
, then (%>%
)- filter the data using the
filter()
command from the dplyr package to keep only records whereroute_names
is equal to “Final”, then (%>%
) - unnest by the
participants
list-column using theunnest()
command
- filter the data using the
Finally, we have a flat tibble containing the results for each of the 8 finalists. Given that this worked for the women’s results, we should be able to achieve a similar outcome for the men’s results. This time, we’ll do it all in a single pipe:
# create a tidy tibble of men's finalist results
'https://www.digitalrock.de/egroupware/ranking/json.php?comp=7146&cat=1&route=3' %>%
fromJSON() %>%
discard(is_empty) %>%
map_if(is.data.frame, list) %>%
as_tibble() %>%
filter(route_names == 'Final') %>%
unnest(participants)
> # A tibble: 8 x 39
> WetId GrpId route_order route_name route_judge route_iso_open route_iso_close
> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
> 1 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 2 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 3 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 4 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 5 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 6 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 7 7146 1 3 Final M E… Johannes A… 17:30 18:30
> 8 7146 1 3 Final M E… Johannes A… 17:30 18:30
> # … with 32 more variables: route_start <chr>, route_result <chr>,
> # route_climbing_time <chr>, discipline <chr>, comp_name <chr>,
> # comp_date <chr>, display_athlete <chr>, route_names <list>,
> # category_offical <lgl>, PerId <chr>, acl <chr>, birthyear <chr>,
> # city <chr>, fed_id <chr>, fed_parent <chr>, fed_url <chr>,
> # federation <chr>, firstname <chr>, lastname <chr>, nation <chr>,
> # rank_prev_heat <chr>, result <chr>, result_modified <chr>,
> # result_rank <chr>, rkey <chr>, start_number <chr>, start_order <chr>,
> # url <chr>, see_also <list>, categorys <list>, etag <chr>, expires <int>
- take the url where the JSON results data is hosted, then (
%>%
)
- read the data from the web into the R environment and convert it into an R list using
fromJSON()
, then (%>%
) - use
discard()
from the purrr package to remove any empty elements from the list, then (%>%
) - use
map_if()
to iterate over the list from the previous step and place any data.frame elements into lists, then (%>%
) - convert the resultant nested list into a nested tibble using
as_tibble()
, then (%>%
) - filter the data using the
filter()
command from the dplyr package to keep only records whereroute_names
is equal to “Final”, then (%>%
) - unnest by the
participants
list-column using theunnest()
command
Conclusion
And there you have it. To summarise: the basic steps required to convert nested JSON into a nested - and subsequently a flat - tibble are:
- convert the JSON string into an R list object using
jsonlite::fromJSON()
function - discard any empty elements from the list using
purrr::discard(is.empty)
- place all data.frame elements in the list inside their own lists using
purrr::map_if(data.frame, list)
- convert the nested list into a nested tibble using
tibble::as_tibble()
- unnest by the list-column(s) of your choice using
tidyr::unnest()
Technically, we won’t be needing all of the packages that ship as part of the tidyverse, but I’m choosing to load tidyverse since (a) I’m too lazy to remember which functions come from which packages in the tidyverse, (b) I generally find that loading tidyverse in full has me convered for any additional add-hoc analyses I may want to do, and (c) doing so best serves my nefarious plot to coerce others into the tidyverse…↩︎
Though it’s fairly easy to view a neatly formatted summary of these results on the IFSC website, where’s the fun in that?↩︎