When writing web applications for data analyses, I quite enjoy writing the REST API using R plumber (which handles all the data processing) and the front-end using some Java Script framework.
In order to build graphs I normally just pull the data through a GET call to the R Plumber API and then handle the plotting with Plotly.js. However, I recently ran into a case where I wanted to create a heatmap with a dendogram, and this type of plots is not currently available in the library! As such, I decided to create the necessary plots using R and serve these through the R plumber API directly.
In this post I share how to serve animated plots (a barplot and a geoplot that shows the map of the US) using R Plotly and R plumber. As an example, I will be using data from the CDC’s United States COVID-19 Cases and Deaths by State over Time. At the end of this post, I include the full code used.
Obtaining the data
The very first thing to do is obviously get the data (i.e. the CDC’s COVID data, as well as population and location data per state).
library(googlesheets4)
library(tidyverse)
library(plotly)
#Load data
gs4_deauth() #Indicates that no auth is needed to get the data
state_data<-read_sheet("https://docs.google.com/spreadsheets/d/1pzQcbyaECVWbHK257pceKuznwJahe2A7bUNWJaMpOe0/edit?usp=sharing")
US_covid<-read_csv("https://data.cdc.gov/api/views/9mfq-cb36/rows.csv?accessType=DOWNLOAD&api_foundry=true")str(US_covid)
# spec_tbl_df [47,580 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
# $ submission_date: Date[1:47580], format: "2021-11-03" "2021-12-02" ...
# $ state : chr [1:47580] "KS" "UT" "AR" "MP" ...
# $ tot_cases : num [1:47580] 297229 359641 0 1104 0 ...
# $ conf_cases : num [1:47580] 241035 359641 NA 1104 NA ...
# $ prob_cases : num [1:47580] 56194 0 NA 0 NA ...
# $ new_case : num [1:47580] 0 1060 0 0 0 ...
# $ pnew_case : num [1:47580] 0 0 NA 0 0 0 0 0 10 264 ...
# $ tot_death : num [1:47580] 4851 1785 0 5 0 ...
# $ conf_death : num [1:47580] NA 1729 NA 5 NA ...
# $ prob_death : num [1:47580] NA 56 NA 0 NA ...
# $ new_death : num [1:47580] 0 11 0 2 0 0 0 0 0 8 ...
# $ pnew_death : num [1:47580] 0 2 NA 0 0 0 0 0 0 2 ...
# $ created_at : chr [1:47580] "03/12/2021 03:20:13 PM" str(state_data)
# tibble [52 × 7] (S3: tbl_df/tbl/data.frame)
# $ name : chr [1:52] "Alabama" "Alaska" "Arizona" "Arkansas" ...
# $ estimates_base_2020: num [1:52] 5024279 733391 7151502 3011524 39538223 ...
# $ pop_estimate_2020 : num [1:52] 5024803 732441 7177986 3012232 39499738 ...
# $ pop_estimate_2021 : num [1:52] 5039877 732673 7276316 3025891 39237836 ...
# $ state : chr [1:52] "AL" "AK" "AZ" "AR" ...
# $ lat : num [1:52] 32.3 63.6 34 35.2 36.8 ...
# $ long : num [1:52] -86.9 -154.5 -111.1 -91.8 -119.4 ...
The COVID data ( US_covid
) includes the number of COVID cases and deaths per state over time. The state_data
include the location of the states (latitude and longitude) that will be used by the geoplot as well as the total population per state. *Note that I am using the googlesheets4 library to read these data directly from google docs.
Transforming the data
Next, I transformed the data as follows:
#Merge with pupulation and location data
US_covid <- merge(US_covid, state_data, by="state")#Add case and death rate
US_covid <- US_covid %>%
mutate(tot_death = ifelse(tot_death > tot_cases, NA, tot_death)) %>% #Make sure that tot_death is not greater that tot_cases
mutate(prop_death_among_cases = ifelse(!is.nan(tot_death/tot_cases), tot_death/tot_cases, 0)) %>%
mutate(prop_cases_in_pop = tot_cases/pop_estimate_2021) %>%
mutate(prop_death_in_pop = tot_death/pop_estimate_2021) %>%
mutate(submission_date = as.Date(submission_date, format="%m/%d/%Y")) %>%
arrange(submission_date,state)
In the code above, I first merge the COVID dataset with the population and location data. Then, I make sure that total_death
is not greater than total_cases
. Finally, I create the variables that I want to show in the plots: percentage of deaths among cases, percentage of cases in the population and percentage of deaths in the population.
Creating the animated plots
As an example, I decided to create barplots and geoplots. The code of the barplots looks like:
#Total cases and deaths
total_barplot <- plot_ly(US_covid, x = ~ state, y = ~ tot_cases, frame = ~ submission_date, type = 'bar', name = 'COVID cases') %>%
add_trace(y = ~ tot_death, name = 'COVID deaths') %>%
layout(title="Total COVID cases and deaths per state", yaxis = list(title = 'Count'), barmode = 'group')#Cases and deaths rate
proportion_barplot <- plot_ly(US_covid, x = ~ state, y = ~ prop_cases_in_pop*100, frame = ~ submission_date, type = 'bar', name = '% of COVID cases in population') %>%
add_trace(y = ~ prop_death_among_cases*100, name = '% of COVID deaths among cases') %>%
add_trace(y = ~ prop_death_in_pop*100, name = '% of COVID deaths in population') %>%
layout(title="Propotion of COVID cases and deaths", yaxis = list(title = 'Count'), barmode = 'group')
total_barplot
is a barplot where the x axis is the states and the y axis is the total number of cases (trace 1) and total number of deaths (trace 2). The frame = ~ submission_date
will allow us to see the data in an animated fashion by redrawing (or updating, depending the animations configuration (see gist at the end of the post)) the figure per each set of values corresponding to submission_date.
The proportion_barplot
has the same logic, but it will should three different traces (% of cases in population, % of death in cases and % of deaths in population).
The geoplots are created using the following code:
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
showland = TRUE,
landcolor = toRGB("gray85"),
subunitwidth = 1,
countrywidth = 1,
subunitcolor = toRGB("white"),
countrycolor = toRGB("white")
)US_covid_shaped <- US_covid %>% gather(total_type, total, tot_cases, tot_death)tota_geoplot <- plot_geo(US_covid_shaped, locationmode = 'USA-states', sizes = c(1, 10000)) %>%
add_markers(
x = ~ long, y = ~ lat, size = ~ total, color = ~total_type, frame = ~ submission_date, hoverinfo = "text",
text = ~paste(US_covid_shaped$name, US_covid_shaped$total, US_covid_shaped$total_type)
) %>% layout(title = 'US total COVID cases and deaths', geo = g)#Proportion geoplot
US_covid_shaped <- US_covid %>% gather(total_type, total, prop_death_among_cases, prop_cases_in_pop, prop_death_in_pop)proportion_geoplot <- plot_geo(US_covid_shaped, locationmode = 'USA-states', sizes = c(1, 10000)) %>%
add_markers(
x = ~ long, y = ~ lat, size = ~ total, color = ~total_type, frame = ~ submission_date, hoverinfo = "text",
text = ~paste(US_covid_shaped$name, US_covid_shaped$total, US_covid_shaped$total_type)
) %>% layout(title = 'US % of COVID cases and deaths', geo = g)
In the code above g
is a list that is used to configure the geoplot with the US map. Prior to plotting the data, I reshape the US_covid
data frame such that a column called total_type
stores whether the value in the column total indicates tot_cases or tot_death .
The plot_geo()
function takes the data US_covid_shaped
, the location and sizes (the mapping of size to pixels). Once the plot is defined, the longitude and latitude are assigned to the x and y arguments in add_markers()
, the total column in our data frame is assigned to the size of the marker, total_type will indicate the color of the marker, and just as in the barplot, the submission_date will be used as the frame to cycle through the data in the plot.
And that’s it for the plots. Now lets serve these as part of an R plumber application!
The R Plumber API
The full code of this mini application is shown in this gist.
As we can see, this application has two endpoints. The /total endpoint will render the plots based on the total cases and deaths, while the /rate one will render the plots for the percentages. Note that total_fig and prop_fig are htmlwidgets and therefore I configure the serializer with #* @serializer htmlwidget
such that the content renders correctly.
We can save this application as a file named plumber.R
and run it with the command: r<-plumb("plumber.R")$run(port=4300)
.
Finally, navigating to localhost:4300/total and localhost:4300/rate in the browser will show us these plots:
And that’s it. I learnt quite a bit while working on this post and I am very glad that the plotly authors have made it super easy to create animated plots. If you are reading this, I hope you found it useful!.