In this post I share some notes on how to interact with a GraphQL API from R. As an example, I will be using the Open Targets Genetics platform API.
Understanding the query structure
I have been doing web development as a hobby for some time, so I feel comfortable enough interacting with REST APIs. However, when the time came to interact with a GraphQL API, I found it a bit confusing.
Anyway, as with everything, I feel that the best way to understand this kind of thing is to go through some examples and experiment.
Let’s now look at an example using the Open Targets Genetics GraphQL API.
In the browser, we go to the docs and then click on “Query” to see the types of queries we can perform.
Now, let’s take a look at the “genes” query:
genes(chromosome: String!,
      start: Long!,
      end: Long!): [Gene!]!
Here, the documentation is telling us that this query requires a chromosome plus start and end positions, and that it returns an array of “Gene” objects. Note that the “!” after a data type indicates that the value is required (non-null). If we click on “Gene” in the documentation, we can see which fields we can obtain from this query.
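Now let’s make a query:
query myGenesQuery {
  genes(chromosome: "6",
        start: 26000000,
        end: 27000000) {
    symbol,
    bioType,
    description
  }
}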
In the query above, I am requesting all genes on chromosome “6” between positions 26,000,000 and 27,000,000. For this particular query, which I called “myGenesQuery”, I decided to retrieve only each gene’s symbol, bioType and description.
Performing queries in R
Now that we have some understanding of how queries and their output are structured, let’s see how we would perform this and other queries from R.
library(ghql)
library(jsonlite)
library(dplyr)
get_opentargets_url <- function() {
  url <- 'https://api.genetics.opentargets.org/graphql'
  return(url)
}

get_genes <- function(url = get_opentargets_url()) {
  query <- 'query myGenesQuery {
    genes(chromosome: "6",
          start: 26000000,
          end: 27000000) {
      symbol,
      bioType,
      description
    }
  }'
  conn <- GraphqlClient$new(url = url)    # Create the connection
  new <- Query$new()$query('url', query)  # Pass the query
  result <- conn$exec(new$url) %>%
    fromJSON(flatten = TRUE)              # Parse the JSON output to a data.frame
  return(result)
}
I am using the ghql library to connect to and interact with the GraphQL API, jsonlite to parse the JSON output into something more R friendly, and dplyr for the %>% pipe.
The first function, get_opentargets_url, is a helper that only returns the URL of the API. The function get_genes defines our query, just as we wrote it in the previous example, connects to the API and executes the query.
Now, when we execute the function:
results <- get_genes()
head(results$data$genes)
# symbol bioType
#1 HFE protein_coding
#2 BTN3A1 protein_coding
#3 BTN3A3 protein_coding
#4 BTN2A1 protein_coding
#5 BTN2A2 protein_coding
#6 BTN1A1 protein_coding
# description
#1 homeostatic iron regulator [Source:HGNC Symbol;Acc:HGNC:4886]
#2 butyrophilin subfamily 3 member A1 [Source:HGNC Symbol;Acc:HGNC:1138]
#3 butyrophilin subfamily 3 member A3 [Source:HGNC Symbol;Acc:HGNC:1140]
#...
It returns what we expected.
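Because the parsed result is a nested list, it can be handy to pull out the data frame in one place and to check for an "errors" entry, which GraphQL responses may carry when a query fails. The following is only a minimal sketch: extract_genes is a hypothetical helper (not part of ghql), and mock_result is a hand-built stand-in for a parsed response that reuses a few rows from the output above, so the example runs without calling the API.

```r
library(dplyr)

# Hypothetical helper: return the genes data frame, or stop if the
# parsed response carries a GraphQL "errors" entry.
extract_genes <- function(result) {
  if (!is.null(result$errors)) {
    stop("GraphQL query failed: ",
         paste(result$errors$message, collapse = "; "))
  }
  result$data$genes
}

# Hand-built stand-in for a parsed response (assumption: same shape as
# the list returned by fromJSON in get_genes)
mock_result <- list(
  data = list(
    genes = data.frame(
      symbol = c("HFE", "BTN3A1", "BTN3A3"),
      bioType = rep("protein_coding", 3),
      stringsAsFactors = FALSE
    )
  )
)

# Once extracted, the data frame works with the usual dplyr verbs
genes <- extract_genes(mock_result) %>%
  filter(bioType == "protein_coding") %>%
  select(symbol)
```

With a real response, the same helper would be called as extract_genes(get_genes()).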
Now, let’s make the function more useful. Currently it has the chromosome and positions hardcoded, so let’s fix that.
get_genes <- function(chr, start, end, url = get_opentargets_url()) {
  query <- 'query myGenesQuery($chromosome: String!, $start: Long!, $end: Long!) {
    genes(chromosome: $chromosome,
          start: $start,
          end: $end) {
      symbol,
      bioType,
      description
    }
  }'
  variable <- list(
    chromosome = as.character(chr),
    start = start,
    end = end
  )
  conn <- GraphqlClient$new(url = url)    # Create the connection
  new <- Query$new()$query('url', query)  # Pass the query
  result <- conn$exec(new$url, variables = variable) %>%
    fromJSON(flatten = TRUE)              # Parse the JSON output to a data.frame
  return(result)
}
In order to pass variables to the query, we first need to declare the variables/parameters that “myGenesQuery” accepts. That’s why the query now starts with myGenesQuery($chromosome: String!, $start: Long!, $end: Long!). Then, we pass the values as a named list when executing the query: conn$exec(new$url, variables = variable).
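To make the role of the variables list more concrete: a GraphQL request sent over HTTP conventionally carries a JSON body with a "query" field and a "variables" field. The sketch below builds such a body with jsonlite only. The helper build_graphql_payload is hypothetical and is not how ghql is necessarily implemented internally. It is just meant to show how the named list maps onto the $chromosome, $start and $end placeholders.

```r
library(jsonlite)

# Hypothetical helper (not part of ghql): build the JSON body that a
# GraphQL-over-HTTP POST request conventionally carries.
build_graphql_payload <- function(query, variables = list()) {
  toJSON(list(query = query, variables = variables), auto_unbox = TRUE)
}

query <- 'query myGenesQuery($chromosome: String!, $start: Long!, $end: Long!) {
  genes(chromosome: $chromosome, start: $start, end: $end) {
    symbol
  }
}'

# Names in the list must match the variable names declared in the query
payload <- build_graphql_payload(
  query,
  list(chromosome = "6", start = 100000L, end = 2000000L)
)

# Round-trip the payload to inspect what a server would receive
parsed <- fromJSON(payload)
str(parsed$variables)
```

The query text itself never changes. Only the values in the "variables" object do, which is what lets get_genes be reused for any region.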
Let’s now try our new function with other positions:
get_genes("6", 100000, 2000000)
#$data
#$data$genes
# symbol bioType
#1 FOXC1 protein_coding
#2 DUSP22 protein_coding
#3 EXOC2 protein_coding
#4 IRF4 protein_coding
#5 FOXF2 protein_coding
#6 FOXQ1 protein_coding
#7 HUS1B protein_coding
# description
#1 forkhead box C1 [Source:HGNC Symbol;Acc:HGNC:3800]
#2 dual specificity phosphatase 22 [Source:HGNC Symbol;Acc:HGNC:16077]
#3 exocyst complex component 2 [Source:HGNC Symbol;Acc:HGNC:24968]
#....
And that’s everything for this quick weekend post. I hope someone finds it useful.