1 Introduction

MGnifyR is a package designed to ease access to the EBI’s MGnify resource, allowing searching and retrieval of multiple datasets for downstream analysis.

The latest version of MGnifyR integrates with the miaverse framework, providing access to current tools for downstream microbiome analysis.

2 Installation

MGnifyR is distributed through Bioconductor and can be installed with BiocManager, as shown in the following snippet.

BiocManager::install("MGnifyR")

3 Load MGnifyR package

Once installed, MGnifyR is made available in the usual way.

library(MGnifyR)

4 Create a client

All functions in MGnifyR make use of a MgnifyClient object to keep track of the JSON API URL, the disk cache location, and user access tokens. Thus, the first step in any analysis is to instantiate this object, as in the following snippet.

mg <- MgnifyClient()
mg

5 Functions for fetching the data

5.1 Search data

Below, we fetch information on samples of drinking water.

# Fetch samples
samples <- doQuery(
    mg,
    type = "samples",
    biome_name = "root:Environmental:Aquatic:Freshwater:Drinking water",
    max.hits = 10)

# For demonstration purposes, keep only a few samples
set.seed(595)
samples <- samples[ sample(rownames(samples), 5), ]

5.2 Find relevant analysis accessions

Now we want to find analysis accessions. Each sample might have multiple analyses. Each analysis ID corresponds to a single run of a particular pipeline on a single sample in a single study.

analyses_accessions <- searchAnalysis(mg, "samples", samples$accession)

head(analyses_accessions)
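Because one sample can map to several analyses, it can help to count analyses per sample before fetching anything. The sketch below uses plain base R on a small mock table. The column names `sample` and `analysis` and all accession values are made up for illustration, and the table is not meant to mirror the return format of searchAnalysis().

```r
# Mock mapping between samples and analyses; all accessions are hypothetical
mapping <- data.frame(
    sample = c("SAMPLE_A", "SAMPLE_A", "SAMPLE_B",
               "SAMPLE_C", "SAMPLE_C", "SAMPLE_C"),
    analysis = c("ANALYSIS_1", "ANALYSIS_2", "ANALYSIS_3",
                 "ANALYSIS_4", "ANALYSIS_5", "ANALYSIS_6"),
    stringsAsFactors = FALSE)

# Number of analyses available for each sample
analyses_per_sample <- table(mapping$sample)
analyses_per_sample

# Samples that were analysed more than once
multi <- names(analyses_per_sample)[as.vector(analyses_per_sample) > 1]
multi
```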

5.3 Fetch metadata

We can now check the metadata to get an idea of what kind of data we have.

analyses_metadata <- getMetadata(mg, analyses_accessions)

head(analyses_metadata)

5.4 Fetch microbiome data

After selecting the data to fetch, we can retrieve it with getResult().

The output is a TreeSummarizedExperiment (TreeSE) or a MultiAssayExperiment (MAE), depending on the dataset. If the dataset includes only taxonomic profiling data, the output is a single TreeSE. If the dataset also includes functional data, the output is multiple TreeSE objects linked together in an MAE.

mae <- getResult(mg, accession = analyses_accessions)
mae

You can access an individual TreeSE object in the MAE by specifying its index or name.

mae[[1]]

The TreeSE object is designed to support SummarizedExperiment-based microbiome data manipulation and visualization, and it gives access to miaverse tools. For example, we can estimate the diversity of the samples.

library(mia)

mae[[1]] <- estimateDiversity(mae[[1]], index = "shannon")

library(scater)

plotColData(mae[[1]], "shannon", x = "sample_environment.feature")

# Agglomerate data
altExps(mae[[1]]) <- splitByRanks(mae[[1]])

library(miaViz)

# Plot top taxa
top_taxa <- getTopFeatures(altExp(mae[[1]], "Phylum"), 10)
plotAbundance(altExp(mae[[1]], "Phylum")[top_taxa, ], rank = "Phylum")
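To make the Shannon index estimated above more concrete, here is a minimal base R sketch that computes it by hand as H = -sum(p * log(p)) over the non-zero taxon proportions p of each sample. The toy count matrix, the helper name `shannon`, and the use of the natural logarithm are assumptions for illustration; the miaverse implementation may differ in details.

```r
# Toy count matrix: rows are taxa, columns are samples (hypothetical values)
counts <- matrix(
    c(10, 10, 10, 10,   # "even": all taxa equally abundant
      37,  1,  1,  1,   # "skewed": one dominant taxon
      40,  0,  0,  0),  # "single": only one taxon present
    nrow = 4,
    dimnames = list(paste0("taxon", 1:4), c("even", "skewed", "single")))

# Shannon index of one sample: -sum(p * log(p)) over non-zero proportions
shannon <- function(x) {
    p <- x[x > 0] / sum(x)
    -sum(p * log(p))
}

# Apply the helper to every column (sample)
div <- apply(counts, 2, shannon)
div
```

In this sketch the evenly distributed sample reaches the maximum for four taxa, log(4), the sample with a single taxon has a diversity of zero, and the skewed sample falls in between.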

We can perform principal coordinate analysis (PCoA) on the microbial profiling data by using miaverse tools.

# Apply relative transformation
mae[[1]] <- transformCounts(mae[[1]], method = "relabundance")
# Perform PCoA
mae[[1]] <- runMDS(
    mae[[1]], assay.type = "relabundance",
    FUN = vegan::vegdist, method = "bray")
# Plot
plotReducedDim(mae[[1]], "MDS", colour_by = "sample_environment.feature")
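The ordination above combines three steps: a relative abundance transformation, a Bray-Curtis dissimilarity, and classical multidimensional scaling. The following base R sketch walks through those steps on a toy matrix. The data and the helper name `bray_curtis` are assumptions for illustration, and cmdscale() stands in for the scaling step rather than reproducing what runMDS() does internally.

```r
# Toy count matrix: rows are taxa, columns are samples (hypothetical values)
counts <- matrix(
    c(10,  0,  5,
      20,  5,  5,
       0, 30, 10,
       5,  5, 20),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3)))

# Relative abundance: divide each column by its total so that it sums to 1
relab <- sweep(counts, 2, colSums(counts), "/")

# Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise dissimilarity matrix between samples
n <- ncol(relab)
bc <- matrix(0, n, n, dimnames = list(colnames(relab), colnames(relab)))
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        bc[i, j] <- bray_curtis(relab[, i], relab[, j])
    }
}

# Classical multidimensional scaling (PCoA) into two dimensions
pcoa <- cmdscale(as.dist(bc), k = 2)
pcoa
```

Each row of `pcoa` holds the two coordinates of one sample; samples with similar composition end up close to each other.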

5.5 Fetch sequence files

Finally, we can use searchFile() and getFile() to retrieve other MGnify pipeline outputs such as merged sequence reads, assembled contigs, and details of the functional analyses.

With searchFile(), we can search files from the database.

# Find the list of available downloads, then filter by description label
dl_urls <- searchFile(mg, analyses_accessions, type = "analyses")
target_urls <- dl_urls[
    dl_urls$attributes.description.label == "Predicted CDS with annotation", ]
head(target_urls)
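Before filtering on a specific label, it can be useful to check which labels are present at all. The sketch below shows this on a mock data frame that reuses the column names from the snippet above. The rows, the second label, and the example.org URLs are hypothetical placeholders, not real MGnify output.

```r
# Mock download table; labels and URLs are hypothetical placeholders
downloads <- data.frame(
    attributes.description.label = c(
        "Predicted CDS with annotation",
        "Processed nucleotide reads",
        "Predicted CDS with annotation"),
    download_url = c(
        "https://example.org/file1.faa.gz",
        "https://example.org/file2.fasta.gz",
        "https://example.org/file3.faa.gz"),
    stringsAsFactors = FALSE)

# Inspect which labels exist and how often each occurs
label_counts <- table(downloads$attributes.description.label)
label_counts

# Keep only the rows with the label of interest
wanted <- "Predicted CDS with annotation"
selected <- downloads[downloads$attributes.description.label == wanted, ]
selected$download_url
```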

We can then download the files with getFile().

# Just select a single file from the target_urls list for demonstration.
cached_location <- getFile(mg, target_urls$download_url[[1]])

# Where are the files?
c(cached_location)

sessionInfo()