MGnifyR 0.99.11
MGnifyR is a package designed to ease access to the EBI’s MGnify resource, allowing searching and retrieval of multiple datasets for downstream analysis.
The latest version of MGnifyR integrates with the miaverse framework, providing access to its tools for downstream microbiome analytics.
MGnifyR is currently hosted on GitHub and can be installed with BiocManager, as shown in the following snippet.
# The package name must be passed as a character string
BiocManager::install("MGnifyR")
Once installed, MGnifyR is made available in the usual way.
library(MGnifyR)
All functions in MGnifyR make use of a MgnifyClient object to keep track of the JSONAPI URL, disk cache location and user access tokens. Thus the first thing to do when starting any analysis is to instantiate this object. The following snippet creates it.
mg <- MgnifyClient()
mg
Below, we fetch information on samples of drinking water.
# Fetch samples
samples <- doQuery(
    mg,
    type = "samples",
    biome_name = "root:Environmental:Aquatic:Freshwater:Drinking water",
    max.hits = 10)
# For demonstration purposes, take only a few samples
set.seed(595)
samples <- samples[ sample(rownames(samples), 5), ]
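The random subsetting step above uses base R only, so it can be tried without a network connection. The following is a minimal sketch on a made-up data frame; `toy_samples` and its columns are hypothetical stand-ins, not real MGnify output. It illustrates that resetting the seed reproduces the same selection.

```r
# Hypothetical stand-in for the data frame returned by doQuery()
toy_samples <- data.frame(
    accession = paste0("SAMPLE", 1:10),
    biome = "Drinking water",
    row.names = paste0("SAMPLE", 1:10)
)

# Draw 5 row names at random; the same seed gives the same draw
set.seed(595)
picked_a <- sample(rownames(toy_samples), 5)
set.seed(595)
picked_b <- sample(rownames(toy_samples), 5)

# Subset by row name, keeping all columns
toy_subset <- toy_samples[picked_a, ]

identical(picked_a, picked_b)
dim(toy_subset)
```

Subsetting by row name rather than by position keeps the selection meaningful even if the rows are later reordered.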
Now we want to find analysis accessions. Each sample might have multiple analyses. Each analysis ID corresponds to a single run of a particular pipeline on a single sample in a single study.
analyses_accessions <- searchAnalysis(mg, "samples", samples$accession)
head(analyses_accessions)
We can now check the metadata to get an idea of what kind of data we have.
analyses_metadata <- getMetadata(mg, analyses_accessions)
head(analyses_metadata)
After we have selected the data to fetch, we can use getResult(). The output is a TreeSummarizedExperiment (TreeSE) or a MultiAssayExperiment (MAE), depending on the dataset. If the dataset includes only taxonomic profiling data, the output is a single TreeSE. If the dataset also includes functional data, the output is multiple TreeSE objects that are linked together in an MAE.
mae <- getResult(mg, accession = analyses_accessions)
mae
You can access an individual TreeSE object in the MAE by specifying its index or name.
mae[[1]]
The TreeSE object is uniquely positioned to support SummarizedExperiment-based microbiome data manipulation and visualization. Moreover, it enables access to miaverse tools. For example, we can estimate the diversity of samples.
# estimateDiversity() is provided by the mia package
library(mia)
mae[[1]] <- estimateDiversity(mae[[1]], index = "shannon")
library(scater)
plotColData(mae[[1]], "shannon", x = "sample_environment.feature")
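To make the Shannon index used above more concrete, here is a minimal base R sketch that computes it by hand for two made-up samples. The count matrix is hypothetical, and the sketch assumes the natural logarithm; it is an illustration of the formula, not the implementation used by the miaverse tools.

```r
# Hypothetical counts: rows are taxa, columns are samples
counts <- matrix(
    c(10, 10, 10, 10,
      37,  1,  1,  1),
    nrow = 4,
    dimnames = list(paste0("taxon", 1:4), c("even", "uneven"))
)

# Shannon index H = -sum(p * log(p)) over taxa with non-zero abundance
shannon <- function(x) {
    p <- x[x > 0] / sum(x)
    -sum(p * log(p))
}

# One value per sample (column)
h <- apply(counts, 2, shannon)
h
```

The perfectly even sample reaches the maximum for four taxa, log(4), while the sample dominated by a single taxon scores lower.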
# Agglomerate data
altExps(mae[[1]]) <- splitByRanks(mae[[1]])
library(miaViz)
# Plot top taxa
top_taxa <- getTopFeatures(altExp(mae[[1]], "Phylum"), 10)
plotAbundance(altExp(mae[[1]], "Phylum")[top_taxa, ], rank = "Phylum")
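The agglomeration and top-taxa selection above can be pictured with a small base R sketch. The counts and phylum labels are made up, and ranking by mean abundance is an assumption about how the top taxa are chosen; the sketch only approximates what the miaverse functions do on a TreeSE.

```r
# Hypothetical genus-level counts (rows: taxa, columns: samples)
counts <- matrix(
    c(5, 0, 3,
      2, 8, 1,
      0, 4, 6,
      1, 1, 1),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("genus", 1:4), paste0("sample", 1:3))
)
# Hypothetical phylum assignment of each genus
phylum <- c("Proteobacteria", "Proteobacteria", "Firmicutes", "Bacteroidota")

# Agglomerate: sum the counts of all genera that share a phylum
phylum_counts <- rowsum(counts, group = phylum)

# Rank phyla by mean abundance across samples and keep the top two
top_phyla <- names(sort(rowMeans(phylum_counts), decreasing = TRUE))[1:2]
top_phyla
phylum_counts[top_phyla, ]
```

Agglomeration preserves the total count per sample; only the taxonomic resolution changes.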
We can perform principal coordinate analysis (PCoA) on the microbial profiling data using miaverse tools.
# Apply relative transformation
mae[[1]] <- transformCounts(mae[[1]], method = "relabundance")
# Perform PCoA
mae[[1]] <- runMDS(
    mae[[1]], assay.type = "relabundance",
    FUN = vegan::vegdist, method = "bray")
# Plot
plotReducedDim(mae[[1]], "MDS", colour_by = "sample_environment.feature")
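The ordination above combines three steps: a relative-abundance transformation, Bray-Curtis dissimilarities, and multidimensional scaling. The following base R sketch walks through the same steps on a made-up count matrix. All data and the helper `bray()` are hypothetical, and the hand-written dissimilarity stands in for `vegan::vegdist`; it is meant as an illustration, not a replacement.

```r
# Hypothetical counts: rows are taxa, columns are samples
counts <- matrix(
    c(10, 0, 5, 1,
       0, 8, 5, 1,
       2, 2, 5, 9),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:4))
)

# Relative abundance: divide each sample (column) by its total
relab <- sweep(counts, 2, colSums(counts), "/")

# Bray-Curtis dissimilarity between two abundance vectors
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Pairwise dissimilarity matrix between samples
n <- ncol(relab)
d <- matrix(0, n, n, dimnames = list(colnames(relab), colnames(relab)))
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        d[i, j] <- bray(relab[, i], relab[, j])
    }
}

# Classical MDS (PCoA) on the dissimilarities, keeping two axes
pcoa <- cmdscale(as.dist(d), k = 2)
pcoa
```

Each row of `pcoa` holds the coordinates of one sample; samples with similar composition end up close together.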
Finally, we can use searchFile() and getFile() to retrieve other MGnify pipeline outputs such as merged sequence reads, assembled contigs, and details of the functional analyses.
With searchFile(), we can search for files in the database.
# Find the list of available downloads, and filter for
# predicted CDS files with annotation
dl_urls <- searchFile(mg, analyses_accessions, type = "analyses")
target_urls <- dl_urls[
    dl_urls$attributes.description.label == "Predicted CDS with annotation", ]
head(target_urls)
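The row filter above is plain data frame subsetting, so its behaviour can be checked offline. The sketch below uses a made-up data frame (`toy_urls` is hypothetical, and whether real search results contain missing labels is an assumption) to show one detail worth knowing: comparing with `==` keeps an all-NA row for every missing label, whereas `%in%` does not.

```r
# Hypothetical stand-in for the data frame returned by searchFile()
toy_urls <- data.frame(
    attributes.description.label = c(
        "Predicted CDS with annotation",
        "Processed nucleotide reads",
        NA,
        "Predicted CDS with annotation"
    ),
    download_url = paste0("https://example.org/file", 1:4),
    stringsAsFactors = FALSE
)

# `==` yields NA for missing labels, which adds an all-NA row to the result
with_na <- toy_urls[
    toy_urls$attributes.description.label == "Predicted CDS with annotation", ]

# `%in%` treats NA as "no match", so only the real hits are kept
hits <- toy_urls[
    toy_urls$attributes.description.label %in% "Predicted CDS with annotation", ]

nrow(with_na)
nrow(hits)
```

If the label column can contain missing values, the `%in%` form avoids passing an NA URL on to the download step.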
We can then download the files with getFile().
# Just select a single file from the target_urls list for demonstration.
cached_location <- getFile(mg, target_urls$download_url[[1]])
# Where are the files?
c(cached_location)
sessionInfo()