R is my go-to language for statistical analysis. However, I do not enjoy working with R Shiny to develop web applications. I find that the code becomes convoluted very quickly, and I have never been able to complete large projects using it.
Thankfully, we have R Plumber, a package that makes it ridiculously simple to create APIs with plain R code. In this short post I will go through some examples of how to build R Plumber APIs, covering:
- GET and POST requests
- Serialization
- Filters
- Endpoints with asynchronous functions
- Uploading files
- Serving static files
- Cross-Origin Resource Sharing (CORS)
- Splitting your API into several files
For the following examples, I use these packages and versions: future_1.21.0 and plumber_1.1.0. Note that I had a few problems when serving static files with plumber_1.0.0, so I recommend at least version 1.1.
GET and POST requests
First things first. GET and POST are types of HTTP requests. GET is generally used to request data and sends its parameters through the URL. It is not recommended to use GET requests to change or update data on the server. For those cases, a POST request is preferred. In contrast to GET, a POST request can send large amounts of data in the body of the request (the actual limit depends on the server).
Something important to take into account when working with POST and GET requests in R Plumber is that parameters sent through GET (i.e. through the URL) are received as strings. Therefore, if you want to send numbers, you need to convert these arguments inside the Plumber endpoint. In contrast, when a POST request sends a JSON body, numbers and arrays arrive with the types you would expect.
Without further ado, let’s look at GET and POST examples:
#plumber.R
library(plumber)

#* @get /random_numbers
#* @param maxn
function(maxn) {
  maxn <- as.numeric(maxn)
  runif(1, min = 0, max = maxn)
}

#* @post /operation
#* @param numbers vector of numbers
#* @param metric
function(numbers, metric) {
  if (metric == 'mean')
    mean(numbers)
  else if (metric == 'sd')
    sd(numbers)
  else if (metric == 'min')
    min(numbers)
  else if (metric == 'max')
    max(numbers)
  else
    "Wrong metric! use mean, sd, min or max"
}
Let’s assume that the script above lives in the file “plumber.R”. To run it, we just do:
library(plumber)
r <- plumb("plumber.R")$run(port=9999)
#The above will show:
#Running plumber API at http://127.0.0.1:9999
#Running swagger Docs at http://127.0.0.1:9999/__docs__/
Now that this mini API is running, let’s try the first endpoint. Note that I’ll be using curl on Linux for these examples, but you can also test the API by going to http://127.0.0.1:9999/__docs__/:
curl http://127.0.0.1:9999/random_numbers?maxn=5
#[3.2213]
Here I made a GET request hitting the /random_numbers endpoint and sent the maxn=5 parameter. As can be seen in the API code, I had to transform this parameter into a number prior to using it as it is sent as a string. The result is a random number between 0 and 5 as per the function.
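Because query parameters arrive as strings, it can help to check the conversion before using the value. The following is a minimal sketch in plain R; the helper name parse_positive_number is hypothetical and is not part of plumber.

```r
# Hypothetical helper (not part of plumber): convert a query-string
# value to a single positive number, or fail with a readable message.
parse_positive_number <- function(value, name = "parameter") {
  # as.numeric() returns NA (with a warning) for non-numeric strings
  number <- suppressWarnings(as.numeric(value))
  if (length(number) != 1 || is.na(number) || number <= 0) {
    stop(sprintf("'%s' must be a single positive number", name), call. = FALSE)
  }
  number
}

parse_positive_number("5", "maxn")       # numeric 5
# parse_positive_number("five", "maxn")  # would raise an error
```

Inside the /random_numbers endpoint, a call such as maxn <- parse_positive_number(maxn, "maxn") could stand in for the plain as.numeric() call, so that a request like ?maxn=five fails early instead of passing NA to runif().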
curl -X POST \
-H "Content-Type: application/json" \
-d '{"numbers":[1,3,5,12,30,3,100],"metric":"mean"}' \
http://localhost:9999/operation
#[22]
Here I am hitting the /operation endpoint with a POST request, sending a JSON object with an array of numbers and the parameter metric that indicates what I want to do with those numbers. In this case, I got the mean of the numbers I sent (i.e. 22). As can be seen in the code, for a POST request, it was not necessary to transform the data sent into numbers.
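As a side note, the if/else chain in the /operation endpoint can also be written with switch(). The sketch below keeps the same behaviour but lives in a plain function (compute_metric is a name I made up for illustration), which makes the logic easy to try without starting the API.

```r
# Same logic as the /operation endpoint, written as a plain function so
# it can be tried without running plumber.
compute_metric <- function(numbers, metric) {
  switch(metric,
    mean = mean(numbers),
    sd   = sd(numbers),
    min  = min(numbers),
    max  = max(numbers),
    # Unnamed last argument: the fallback when no name matches
    "Wrong metric! use mean, sd, min or max"
  )
}

compute_metric(c(1, 3, 5, 12, 30, 3, 100), "mean")  # 22
```

The endpoint body would then reduce to a single call to compute_metric(numbers, metric).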
Serialization — what is it?
By default, R Plumber returns data in JSON format. However, we may want to get the resulting data in a different format. That is where serialization (i.e. the process of translating data into a different format) comes into play.
Let’s look at a couple of examples:
library(plumber)

#* @get /data
function() {
  data.frame(x=seq(10), letters=letters[1:10])
}

#* @get /data_csv
#* @serializer csv
function() {
  data.frame(x=seq(10), letters=letters[1:10])
}
Here, a GET request to the endpoints /data and /data_csv will return the same data frame. However, /data_csv specifies #* @serializer csv, which tells Plumber to return the data in csv format, while /data returns the data in JSON:
curl http://localhost:9999/data
#Returns the data frame in JSON format:
[{"x":1,"letters":"a"},{"x":2,"letters":"b"},....
curl http://localhost:9999/data_csv
#Returns the data frame in csv format:
x,letters
1,a
2,b
3,c
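To see what the two formats look like without running the API, the same kind of data frame can be converted by hand. This is only a sketch: it calls jsonlite and base R directly rather than Plumber's own serializers, so small formatting details may differ from what the endpoints return.

```r
library(jsonlite)  # JSON library that plumber depends on

df <- data.frame(x = 1:3, letters = letters[1:3])

# JSON: an array with one object per row
json_text <- as.character(toJSON(df))
cat(json_text, "\n")

# CSV: a header line followed by one line per row
csv_lines <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
cat(csv_lines, sep = "\n")
```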
Serialization can also be used to get plots from the API in e.g. PNG format. For example, the following endpoint generates a histogram of 1000 values drawn from a normal distribution with the user-supplied parameters mean and sd. The endpoint uses #* @serializer png to return the histogram as PNG.
#* @post /norm_dist
#* @param mean
#* @param sd
#* @serializer png
function(mean, sd) {
hist(rnorm(1000,mean,sd))
}
Making the POST request:
curl -X POST \
-H "Content-Type: application/json" \
-d '{"mean":100,"sd":5}' \
--output test.png \
http://localhost:9999/norm_dist
This saves the histogram to the test.png file.
Filters
Plumber filters can be used to handle or modify incoming requests. When a request is sent to Plumber, it passes through each of the filters before reaching the requested endpoint. For example:
library(plumber)
library(stringr)

#* @filter logger
function(req, res){
  cat(as.character(Sys.time()), "-",
      req$REQUEST_METHOD, req$PATH_INFO, "-",
      req$HTTP_USER_AGENT, "@", req$REMOTE_ADDR, "\n", append=TRUE, file="api_logs.txt")
  plumber::forward()
}

#* @filter removeSemicolon
function(req, res) {
  req$args <- lapply(req$args, str_replace_all, pattern=";", replacement="")
  plumber::forward()
}

#* @get /show_query
#* @param my_query
function(my_query){
  return(my_query)
}
The above example shows two filters. The first one, “logger”, gets information from the request and writes it to the “api_logs.txt” file. The second filter, “removeSemicolon”, modifies the arguments of the request by removing semicolons. *Note that this is not a good thing to do, as it may lead to unexpected behavior. The example is just to illustrate that the request passes through the filter before hitting the endpoint. The function plumber::forward() tells R Plumber to continue to the next filter or to the requested endpoint.
Here it is in action:
curl http://localhost:9999/show_query?my_query="hello;world"
#This returns the query string without the semicolon
#["helloworld"]
#And the api_logs.txt file contains:
api_logs.txt:2021-05-06 14:46:19 - GET /show_query - curl/7.61.0 @ 127.0.0.1
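One reason the removeSemicolon filter can behave unexpectedly is that str_replace_all() is applied to every argument and turns non-character values into strings. A slightly more conservative variant, sketched here in base R with a hypothetical helper name, only touches character values:

```r
# Hypothetical helper: strip semicolons from character arguments only,
# leaving numbers and other types untouched.
strip_semicolons <- function(args) {
  lapply(args, function(value) {
    if (is.character(value)) gsub(";", "", value, fixed = TRUE) else value
  })
}

cleaned <- strip_semicolons(list(my_query = "hello;world", n = 42))
str(cleaned)
```

In the filter, the assignment would then read req$args <- strip_semicolons(req$args), assuming req$args holds the parsed arguments as in the example above.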
Endpoints with asynchronous functions
R is single-threaded, so by default code runs synchronously. That is, functions are executed sequentially: before R can move on to the next function, the previous one needs to finish executing. This can be problematic when building APIs, as there may be an endpoint that takes a long time to return a value. If, for example, a function takes 20 seconds, no other user will be able to use the API during that time.
Fortunately, there are R packages such as future that allow writing asynchronous code. You can tell R that a specific function (or endpoint in Plumber) needs to be executed asynchronously, so that its execution won’t block other functions from executing. For more information about working with asynchronous code in R, you can check out this tutorial.
The following is a simple example of how to implement an asynchronous endpoint.
library(future)
library(plumber)
plan(multisession)

#* @get /async_sqrt
#* @param n
function(n) {
  future::future({
    Sys.sleep(5)
    x <- sqrt(as.numeric(n))
    x
  })
}

#* @get /not_async_sqrt
#* @param n
function(n) {
  Sys.sleep(5)
  x <- sqrt(as.numeric(n))
  x
}

#* @get /sqrt
#* @param n
function(n) {
  x <- sqrt(as.numeric(n))
  x
}
Now let’s test these:
curl http://localhost:9999/not_async_sqrt?n=25 & curl http://localhost:9999/sqrt?n=64
The command above will take 5 seconds and will first return sqrt(25) = 5 and then sqrt(64)=8. That is, it first had to wait 5 seconds as R had to finish executing the “not_async_sqrt” endpoint first.
Testing the async endpoint:
curl http://localhost:9999/async_sqrt?n=121 & curl http://localhost:9999/sqrt?n=36
As expected, the above will first return sqrt(36) = 6 and then sqrt(121)=11. Although the “async_sqrt” endpoint takes 5 seconds, it does not block the call to “sqrt”, as we are telling R to execute it asynchronously using future with a multisession plan.
*Note that I used different numbers when testing the endpoints above, as GET calls can get cached!
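The same idea can be seen outside Plumber. The following sketch uses only the future package: a slow computation is started in a background session, the main session carries on with other work, and value() collects the result once it is ready. The 2-second sleep is just a stand-in for a slow task.

```r
library(future)
plan(multisession)  # run futures in background R sessions

# Start a slow computation without blocking the current session
slow <- future({
  Sys.sleep(2)
  sqrt(121)
})

# This line runs while the future is still being computed
fast <- sqrt(36)

# value() waits for the future to finish and returns its result
slow_result <- value(slow)

cat("fast:", fast, "slow:", slow_result, "\n")
plan(sequential)  # shut down the background workers
```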
Uploading files
In this section, I just want to go through a quick example of how a file can be uploaded to the server through a POST request to the Plumber API. The following endpoint shows how this could be done:
#' @post /summarize
#' @serializer print
function(req){
  multipart <- mime::parse_multipart(req)
  data <- read.table(multipart$upload$datapath, header=TRUE)
  summary(data)
}
In this example, I am defining an endpoint that receives a POST request. I use the function mime::parse_multipart(req) to obtain information about the file that was sent in the request. The path where the file was uploaded can then be accessed in multipart$upload$datapath, which points to a temporary location. In this function, I expect the data to be in a format that read.table() can read (of course, exceptions should be handled in a robust application). After reading the data, I use summary() to return a summary. Note that I use @serializer print to ensure that the summary is returned as you would normally see it inside an R session.
Testing the endpoint:
curl -F upload=@iris.tsv http://localhost:9999/summarize
#Returns the following:
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
Length:150
Class :character
Mode :character
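The post does not show where iris.tsv comes from. Assuming it is simply R's built-in iris data set exported as a tab-separated file (which would match the summary above), a test file could be produced along these lines. The temporary path is only for illustration.

```r
# Write the built-in iris data set as a tab-separated file
tsv_path <- file.path(tempdir(), "iris.tsv")
write.table(iris, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back the same way the /summarize endpoint does
data <- read.table(tsv_path, header = TRUE)
dim(data)      # 150 rows, 5 columns
summary(data)
```

From R 4.0 onwards read.table() keeps text columns as character by default, which would explain why Species is summarized as a character column rather than as a factor.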
Serving static files
Another very important part of building APIs is being able to serve static files. In my case, this is particularly important, as I wanted to replace apps developed in R Shiny with apps developed with a JavaScript framework and a backend built in R.
Serving static files is as simple as the following:
#* @assets ./images /images
list()

#* @assets ./dist /
list()
Here I am indicating that files inside the ./dist directory should be reachable when going to the root endpoint “/” and files inside the ./images directory should be reachable when going to the /images endpoint. For example, inside the ./dist directory, I could have an index.html along with some .css and .js files. I can then open this in the browser by going to http://localhost:9999/index.html or whatever domain/port you are using. Likewise, I could have some images (or other types of data) in the ./images directory and I should be able to reach them by going to http://localhost:9999/images/my_image.png.
Cross-Origin Resource Sharing (CORS)
By default, Plumber does not send the headers needed for “cross-domain” requests. This means that the browser won’t allow requests from pages served from a domain other than the one where the API is running (in our examples “localhost:9999”). Although this is a good default, it is useful to allow cross-domain requests during development (e.g. maybe you are testing your front end on localhost:4444). To allow requests from any domain, add the following to your API:
#' @filter cors
cors <- function(req, res) {
res$setHeader("Access-Control-Allow-Origin", "*")
if (req$REQUEST_METHOD == "OPTIONS") {
res$setHeader("Access-Control-Allow-Methods","*")
res$setHeader("Access-Control-Allow-Headers", req$HTTP_ACCESS_CONTROL_REQUEST_HEADERS)
res$status <- 200
return(list())
} else {
plumber::forward()
}
}
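Outside development you may prefer not to allow every origin. The following is only a sketch of one possible variant, not something taken from the Plumber documentation: it sends the header back only for origins on a list. The names allowed_origins and match_origin are hypothetical, and req$HTTP_ORIGIN is assumed to hold the Origin header in the same way req$HTTP_USER_AGENT holds the user agent.

```r
# Hypothetical list of front-end origins that may call the API
allowed_origins <- c("http://localhost:4444")

# Return the origin if it is on the list, otherwise NULL
match_origin <- function(origin, allowed = allowed_origins) {
  if (!is.null(origin) && length(origin) == 1 && origin %in% allowed) origin else NULL
}

#* @filter cors_restricted
function(req, res) {
  # Assumption: the Origin request header is exposed as req$HTTP_ORIGIN
  origin <- match_origin(req$HTTP_ORIGIN)
  if (!is.null(origin)) {
    res$setHeader("Access-Control-Allow-Origin", origin)
  }
  if (req$REQUEST_METHOD == "OPTIONS") {
    # Preflight request: answer directly, as in the filter above
    res$setHeader("Access-Control-Allow-Methods", "*")
    res$setHeader("Access-Control-Allow-Headers", req$HTTP_ACCESS_CONTROL_REQUEST_HEADERS)
    res$status <- 200
    return(list())
  }
  plumber::forward()
}
```

Requests from origins that are not on the list still reach the endpoint, but without the Access-Control-Allow-Origin header the browser will refuse to hand the response to the calling page.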
Splitting your API into several files
Something that I have found super useful is that Plumber (1.0 and above) allows mounting routes from different files. This helps keep the API well organized and easier to maintain.
For example, below I have two files with different endpoints:
#routes_1.R
#* @get /random_numbers
#* @param maxn
function(maxn) {
  maxn <- as.numeric(maxn)
  runif(1, min = 0, max = maxn)
}
and
#routes_2.R
#* @get /sqrt
#* @param n
function(n) {
  sqrt(as.numeric(n))
}
And another file to combine them:
#api.R
library(plumber)

#' @plumber
function(pr) {
  pr %>%
    pr_mount("/routes1", plumb("./routes_1.R")) %>%
    pr_mount("/routes2", plumb("./routes_2.R"))
}
In this last one, I mount the endpoints from routes_1.R on /routes1 and those from routes_2.R on /routes2. To launch the API, I then just do something like:
r <- plumb("api.R")$run(port=9999)
Finally, to test the endpoints:
curl http://localhost:9999/routes1/random_numbers?maxn=100
#Returns:
[46.752]
curl http://localhost:9999/routes2/sqrt?n=100
#Returns:
[10]
And that is it for this post!