🦉 pallas: An R package to compose and query SPARQL.

Introduction

The pallas package allows users to query SPARQL endpoints from R. As composing SPARQL queries can be complicated, pallas also provides a set of utilities in the S7 framework to assist the user.

Installation instructions

Get the latest stable R release from CRAN. Then install pallas from GitHub with remotes:

install.packages("remotes")
remotes::install_github("minotau-R/pallas")

We can use map_endpoint() to find out which terms, namely classes and predicates, are defined at a given SPARQL endpoint. map_endpoint() takes the URL of an endpoint as input and returns an object of the OWL S7 class.

library(pallas)

endpoint_url <- "https://sparql.uniprot.org/"
x <- map_endpoint(endpoint_url)
#> adding rname '5e918bfe31a47685419ba31861ec8b48ec502dbd71a767e0b138d60c40148804'

OWL objects are named after the Web Ontology Language (OWL) and hold the terms retrieved from the endpoint. A typical SPARQL query consists of at least three parts:

  1. prefix declarations
  2. query type (SELECT, ASK, CONSTRUCT, DESCRIBE) and arguments
  3. WHERE clause

This is also reflected in the OWL class:

x
#> pallas::OWL tbl S7_object.
#> PREFIX core: <http://purl.uniprot.org/core/>
#> PREFIX x01: <http://www.w3.org/2000/01/>
#> SELECT * 
#> WHERE
#> {
#> 
#> }
#> No 'where' found. Add where-clauses with `where_clause()`.
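
To make the three parts listed above concrete, here is a minimal sketch that assembles the same query skeleton from plain strings in base R. It does not use pallas at all; the variable names (prefixes, query_type, where_body) are illustrative assumptions, and the single triple pattern is only a placeholder.

```r
# Illustrative only: build a SPARQL query from its three parts with base R.
# None of these variable names come from pallas.

# 1. Prefix declarations
prefixes <- c(
  "PREFIX core: <http://purl.uniprot.org/core/>"
)

# 2. Query type and its arguments
query_type <- "SELECT ?a"

# 3. WHERE clause, one triple pattern per element
where_body <- c("?a a core:Enzyme .")

# Join the parts in the order SPARQL expects, one per line
query <- paste(
  c(prefixes, query_type, "WHERE", "{", where_body, "}"),
  collapse = "\n"
)

cat(query, "\n")
```

Keeping the three parts in separate variables mirrors how the printed OWL object lays out the query: prefixes first, then the query type, then the WHERE block.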

Printing the object shows the SPARQL code built so far, along with hints about what the query still needs. We can add where-clauses with the function where_clause():

x |>
    where_clause(C.Enzyme(?a)) |>
    where_clause(P.alternativeName(g:h ~ ?j)) |>
    where_clause(P.activity(?d ~ e:f),
                 C.Cluster(b:c))
#> pallas::OWL tbl S7_object.
#> PREFIX core: <http://purl.uniprot.org/core/>
#> PREFIX x01: <http://www.w3.org/2000/01/>
#> SELECT * 
#> WHERE
#> {
#> ?a a core:Enzyme .
#> g:h core:alternativeName ?j .
#> ?d core:activity e:f .
#> b:c a core:Cluster .
#> }
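
Judging from the output above, a C.<Class>(subject) call becomes a type pattern (subject a core:Class) and a P.<predicate>(subject ~ object) call becomes a subject-predicate-object pattern. The sketch below mimics that mapping with two hypothetical base R helpers, class_pattern() and predicate_pattern(). They are not part of pallas and say nothing about how pallas implements where_clause(); the "core" prefix is assumed from the printed query.

```r
# Hypothetical helpers, not part of pallas: they only mimic the printed output.

# C.<Class>(subject) appears to become "<subject> a <prefix>:<Class> ."
class_pattern <- function(subject, class, prefix = "core") {
  sprintf("%s a %s:%s .", subject, prefix, class)
}

# P.<predicate>(subject ~ object) appears to become
# "<subject> <prefix>:<predicate> <object> ."
predicate_pattern <- function(subject, predicate, object, prefix = "core") {
  sprintf("%s %s:%s %s .", subject, prefix, predicate, object)
}

# Rebuild the four triple patterns from the example above
patterns <- c(
  class_pattern("?a", "Enzyme"),
  predicate_pattern("g:h", "alternativeName", "?j"),
  predicate_pattern("?d", "activity", "e:f"),
  class_pattern("b:c", "Cluster")
)

writeLines(patterns)
```

The four lines written by writeLines() correspond to the body of the WHERE block printed above.
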
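
Once the query is complete, we can set the selected variables with select_query(), convert the object with as.SPARQL(), and send it to the endpoint with send_query():

res <- x |>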
  select_query("SELECT ?protein") |>
  where_clause(
    C.Protein(?protein),
    triple("?protein", "core:mnemonic", "'A4_HUMAN'")
    ) |> 
  as.SPARQL() |>
  send_query(endpoint_url = "https://sparql.uniprot.org/")
#> adding rname 'ba54eca6524fe4552b036fff5457e43adc2d8a8a9b05382d41d09ebd4574f3f0'

res
#>                                  protein
#> 1 http://purl.uniprot.org/uniprot/P05067
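
For readers curious about what sending a query involves in general: under the SPARQL 1.1 Protocol, a query can be passed to an endpoint as a percent-encoded `query` parameter of a GET request. The sketch below only builds such a URL with base R and sends nothing. It is not a description of how send_query() works internally, which this document does not show. The query text is a hand-written approximation of the example above, and whether this endpoint accepts the parameter at exactly this path is an assumption.

```r
# General SPARQL 1.1 Protocol sketch; NOT a description of pallas::send_query().

endpoint <- "https://sparql.uniprot.org/"

# Hand-written approximation of the query from the example above (assumption)
query <- paste(
  "PREFIX core: <http://purl.uniprot.org/core/>",
  "SELECT ?protein",
  "WHERE { ?protein a core:Protein . ?protein core:mnemonic 'A4_HUMAN' . }",
  sep = "\n"
)

# Percent-encode the query text, including reserved characters such as ':' and '?'
encoded <- utils::URLencode(query, reserved = TRUE)

# A GET request carries the query in the `query` parameter
request_url <- paste0(endpoint, "?query=", encoded)

# No request is sent here; we only inspect the resulting URL
cat(substr(request_url, 1, 60), "...\n")
```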