Introduction

What is it for?

BulkSignalR is a Bioconductor library that enables the inference of L-R interactions from bulk expression data sets, i.e., from transcriptomics (RNA-seq or microarrays) or expression proteomics.

BulkSignalR also applies to spatial transcriptomics such as 10x Genomics VISIUM:TM:, and a set of functions dedicated to spatial data has been added to better support this particular use of the library.

Starting point

There is a variety of potential data sources that can be used with BulkSignalR, e.g., expression proteomics or RNA sequencing. Such data are typically represented as a matrix of numbers representing the abundance of molecules (gene transcripts or proteins) across samples. This matrix may have been normalized already or not prior the use of BulkSignalR. In both cases, the latter matrix is the input data for BulkSignalR analysis.

It is mandatory that genes/proteins are represented as rows of the expression matrix and the samples as columns. HUGO gene symbols (even for proteins) must be used to ensure matching LRdb, Reactome, and GOBP contents.

You can also work with an object from the SummarizedExperiment or SpatialExperiment Bioconductor classes directly.

How does it work?

As represented in the figure below, only a few steps are required in order to perform a BulkSignalR analysis. Three S4 objects will be sequentially constructed:

BSRDataModel, denoted here as bsrdm
BSRInference, denoted here as bsrinf
BSRSignature, denoted here as bsrsig

BSRDataModel comprises the expression data matrix and the parameters of the statistical model learned from this matrix.

BSRInference provides various lists that contain the ligands, receptors, downstream pathways as well as the target genes, their correlations and statistical significance for all the L-R interactions.

BSRSignature contains gene signatures associated with the triples (ligand, receptor, downstream pathway) stored in a BSRInference object. Those signatures are comprised of the ligand and the receptor obviously, but also the pathway target genes that were used in the statistical model. Gene signatures are meant to report the L-R interaction as a global phenomenon integrating its downstream effect. Indeed, signatures can be subsequently scored with a dedicated function allowing the user to obtain a numerical value representing the activity of the L-R interactions (with downstream consequences) across the samples. Signature scores are returned as a matrix (one row per L-R interaction and one column per sample).

Because of the occurrence of certain receptors in multiple pathways, and also because some ligands may bind several receptors, or vice versa, BSRInference objects may contain redundant data depending on how the user wants to look at them. We therefore provide a range of reduction operators meant to obtain reduced BSRInference objects (see below).

Furthermore, we provide several handy functions to explore the data through different plots (heatmaps, alluvial plots, chord diagrams or networks).

BulkSignalR library functions have many parameters that can be changed by the user to fit specific needs (see Reference Manual for details).

Parallel mode settings

Users can reduce compute time by using several processors in parallel.

Code

library(doParallel)
n.proc <- 1
cl <- makeCluster(n.proc)
registerDoParallel(cl)

# To add at the end of your script:
# stopCluster(cl)

Notes: For operating systems that can fork such as the UNIX-like systems, it might be preferable to use the library doMC that is faster (less overhead). This is transparent to BulkSignalR.

In case you need to reproduce results exactly and since statistical model parameter learning involves the generation of randomized expression matrices, you might want to use set.seed(). In a parallel mode, iseed that is a parameter of clusterSetRNGStream must be used.

First Example

Loading the data

Here, we load salivary duct carcinoma (SDC) bulk transcriptomes integrated as sdc in BulkSignalR.

Code

data(sdc)
head(sdc)

##          SDC16 SDC15 SDC14 SDC13 SDC12 SDC11 SDC10 SDC9 SDC8 SDC7 SDC6 SDC5
## TSPAN6    1141  2491  1649  3963  2697  3848  3115  495 1777 1997 1682 1380
## TNMD         3     0     2    48     0     0    10    0  195    0    4    0
## DPM1      1149  2105  1579  2597  2569  2562  1690  964  980 3323 3409  743
## SCYL3     1814   768  3513  1402  1007  1243  2501 3080  939 1190 1489 1097
## C1orf112   438  1016   507  1149   598  1017   464  531  388 1088 1719  625
## FGR        381   922   307   256   249   902   304  355  410 1054  265  121
##          SDC4 SDC3 SDC2 SDC1 SDC17 SDC19 SDC20  N20 SDC21 SDC22  N22 SDC23
## TSPAN6   1392 2490 1841 1883  1915  4093  3127 8035  4738  2994 9782  3866
## TNMD        0    0    0    1     5     0    18   21     6     2   27     0
## DPM1     1147 1705 1972 3048  2405  2943  2710 1688  2654  2772 2018  1610
## SCYL3     507 1545 2450 2006   910  1842  2947 1058  2636  4199 1173  2980
## C1orf112  318 1338 3065  723   509  1320   767  179   718  1498  147   629
## FGR       408  813  787 1152  1049  4205   722  149   584   701  164   842
##          SDC24 SDC25
## TSPAN6    3014  1765
## TNMD        14     3
## DPM1      2875  2357
## SCYL3     2079   903
## C1orf112  1008   521
## FGR        676   654

Building a BSRDataModel object

The constructor BSRDataModel creates the first object from SDC data above.

Code

bsrdm <- BSRDataModel(counts = sdc)
bsrdm

## Expression values are log2-transformed: FALSE
## Normalization method: UQ
## Organism: hsapiens
## Statistical model parameters:
## List of 1
##  $ spatial.smooth: logi FALSE
## Expression data:
##                SDC16     SDC15       SDC14     SDC13     SDC12     SDC11
## TSPAN6   1285.996070 2245.8788 1651.184018 3319.4416 2573.9214 3565.2017
## TNMD        3.381234    0.0000    2.002649   40.2052    0.0000    0.0000
## DPM1     1295.012694 1897.8623 1581.091307 2175.2687 2451.7627 2373.7128
## SCYL3    2044.519606  692.4267 3517.652793 1174.3268  961.0452 1151.6491
## C1orf112  493.660192  916.0228  507.671496  962.4119  570.7100  942.2584
## FGR       429.416742  831.2727  307.406606  214.4277  237.6368  835.7100
##                SDC10      SDC9
## TSPAN6   2996.458012  562.3750
## TNMD        9.619448    0.0000
## DPM1     1625.686690 1095.2111
## SCYL3    2405.823913 3499.2222
## C1orf112  446.342381  603.2750
## FGR       292.431215  403.3194

learnParameters updates a BSRDataModel object with the parameters necessary for BulkSignalR statistical modeling.

Code

bsrdm <- learnParameters(bsrdm, quick=TRUE)
bsrdm

## Expression values are log2-transformed: FALSE
## Normalization method: UQ
## Organism: hsapiens
## Statistical model parameters:
## List of 11
##  $ spatial.smooth: logi FALSE
##  $ n.rand.LR     : int 5
##  $ n.rand.RT     : int 2
##  $ with.complex  : logi TRUE
##  $ max.pw.size   : num 400
##  $ min.pw.size   : num 5
##  $ min.positive  : num 4
##  $ quick         : logi TRUE
##  $ min.corr.LR   : num -1
##  $ LR.0          :List of 2
##   ..$ n    : int 14580
##   ..$ model:List of 7
##   .. ..$ mu     : num 0.0125
##   .. ..$ sigma  : num 0.263
##   .. ..$ factor : num 1
##   .. ..$ start  : num 5.98e-05
##   .. ..$ distrib: chr "censored_normal"
##   .. ..$ D      : num 0.486
##   .. ..$ Chi2   : num 0.000111
##  $ RT.0          :List of 2
##   ..$ n    : int 14580
##   ..$ model:List of 7
##   .. ..$ mu     : num 0.0125
##   .. ..$ sigma  : num 0.263
##   .. ..$ factor : num 1
##   .. ..$ start  : num 5.98e-05
##   .. ..$ distrib: chr "censored_normal"
##   .. ..$ D      : num 0.486
##   .. ..$ Chi2   : num 0.000111
## Expression data:
##                SDC16     SDC15       SDC14     SDC13     SDC12     SDC11
## TSPAN6   1285.996070 2245.8788 1651.184018 3319.4416 2573.9214 3565.2017
## TNMD        3.381234    0.0000    2.002649   40.2052    0.0000    0.0000
## DPM1     1295.012694 1897.8623 1581.091307 2175.2687 2451.7627 2373.7128
## SCYL3    2044.519606  692.4267 3517.652793 1174.3268  961.0452 1151.6491
## C1orf112  493.660192  916.0228  507.671496  962.4119  570.7100  942.2584
## FGR       429.416742  831.2727  307.406606  214.4277  237.6368  835.7100
##                SDC10      SDC9
## TSPAN6   2996.458012  562.3750
## TNMD        9.619448    0.0000
## DPM1     1625.686690 1095.2111
## SCYL3    2405.823913 3499.2222
## C1orf112  446.342381  603.2750
## FGR       292.431215  403.3194

Building a BSRInference object

From the previous object bsrdm, you can generate inferences by calling its method BSRInference. The returned BSRInference object, contains all the inferred L-R interactions with their associated pathways and corrected p-values.

From there, you can already access L-R interactions using LRinter(bsrinf), which returns a summary table.

Code

# We use a subset of the reference to speed up
# inference in the context of the vignette.
subset <- c("REACTOME_BASIGIN_INTERACTIONS",
"REACTOME_SYNDECAN_INTERACTIONS",
"REACTOME_ECM_PROTEOGLYCANS",
"REACTOME_CELL_JUNCTION_ORGANIZATION")

reactSubset <- BulkSignalR:::.SignalR$BulkSignalR_Reactome[
BulkSignalR:::.SignalR$BulkSignalR_Reactome$`Reactome name` %in% subset,]

resetPathways(dataframe = reactSubset,
              resourceName = "Reactome")

bsrinf <- BSRInference(bsrdm,
    min.cor = 0.3,
    reference="REACTOME")

LRinter.dataframe <- LRinter(bsrinf)

head(LRinter.dataframe[
    order(LRinter.dataframe$qval),
    c("L", "R", "LR.corr", "pw.name", "qval")])

##           L     R   LR.corr                    pw.name         qval
## 8131  EDIL3 ITGB5 0.7299145 REACTOME_ECM_PROTEOGLYCANS 2.896847e-07
## 21871 TGFB3 ITGB5 0.7360684 REACTOME_ECM_PROTEOGLYCANS 2.896847e-07
## 18101  PLAU ITGB5 0.6389744 REACTOME_ECM_PROTEOGLYCANS 5.301485e-07
## 15761 LTBP3 ITGB5 0.5712821 REACTOME_ECM_PROTEOGLYCANS 7.796029e-07
## 21701 TGFB1 ITGB5 0.3996581 REACTOME_ECM_PROTEOGLYCANS 2.622313e-06
## 15751 LTBP1 ITGB5 0.3736752 REACTOME_ECM_PROTEOGLYCANS 2.629570e-06

You can finally filter out non-significant L-R interactions and order them by Q-values before saving them into a file for instance.

Code

write.table(LRinter.dataframe[order(LRinter.dataframe$qval), ],
    "./sdc_LR.tsv",
    row.names = FALSE,
    sep = "\t",
    quote = FALSE
)

Reduction strategies

The output of BSRInference is exhaustive and can thus contain redundancy due to the redundancy present in the reference databases (Reactome, KEGG, GOBP) and multilateral interactions in LRdb. To alleviate this issue, we propose several strategies.

Reducing a BSRInference object to pathways

With reduceToPathway, all the L-R interactions with their receptors included in a certain pathway are aggregated to only report the downstream pathway once. For a given pathway, the reported P-values and target genes are those of best (smallest P-value) L-R interaction that was part of the aggregation. Nothing is recomputed, we simply merge data.

Code

bsrinf.redP <- reduceToPathway(bsrinf)

Reducing a BSRInference object to the best pathways

With reduceToBestPathway, a BSRInference object is reduced to only report one pathway per L-R interaction. The pathway with the smallest P-value is selected. A same pathways might occur multiple times with due different L-R interactions that all have their receptor in this pathway.

Code

bsrinf.redBP <- reduceToBestPathway(bsrinf)

Reducing to ligands or receptors

As already mentioned, several ligands might bind to single receptor (or several shared receptors) and the converse might happen also. Two reduction operators enable users to either aggregate all the ligands of a same receptor or all the receptors bound by a same ligand:

Code

bsrinf.L <- reduceToLigand(bsrinf)
bsrinf.R <- reduceToReceptor(bsrinf)

Combined reductions

Combinations are possible.

For instance, users can apply the reduceToPathway and reduceToBestPathway functions sequentially to maximize the reduction effect. In case the exact same sets of aggregated ligands and receptors obtained with reduceToPathway was associated with several pathways, the pathway with the best P-value would be kept by reduceToBestPathway.

Code

bsrinf.redP <- reduceToPathway(bsrinf)
bsrinf.redPBP <- reduceToBestPathway(bsrinf.redP)

Building a BSRSignature object

Gene signatures for a given, potentially reduced BSRInference object are generated by the BSRSignature constructor, which returns a BSRSignature object.

To follow the activity of L-R interactions across the samples of a data set, scoreLRGeneSignatures computes a score for each gene signature in each sample. Then, heatmaps can be generated to represent differences, e.g., using the built-in utility function simpleHeatmap.

Hereafter, we show different workflows of reductions combined with gene signature scoring and display.

Scoring by ligand-receptor

Code

bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres=0.001)

scoresLR <- scoreLRGeneSignatures(bsrdm, bsrsig.redBP,
    name.by.pathway=FALSE
)

simpleHeatmap(scoresLR[1:20, ],
    hcl.palette="Cividis",
    pointsize=8)

Scoring by pathway

Code

bsrsig.redPBP <- BSRSignature(bsrinf.redPBP, qval.thres=0.01)

scoresPathway <- scoreLRGeneSignatures(bsrdm, bsrsig.redPBP,
    name.by.pathway=TRUE
)

simpleHeatmap(scoresPathway,
    hcl.palette="Blue-Red 2",
    pointsize=8)

Other visualization utilities

Heatmap of ligand-receptor-target genes expression

After computing gene signatures score, we may wish to look at the expression of the genes involved in that signature. For instance, we can display three heatmaps corresponding to the scaled (z-scores) expression of ligands (pink), receptors (green), and target genes (blue).

On the top of each individual heatmap, the whole signature score from scoreLRGeneSignatures is reported for reference.

Code

pathway1 <- pathways(bsrsig.redPBP)[1]
signatureHeatmaps(
    pathway = pathway1,
    bsrdm = bsrdm,
    heights = c(3,2,15),
    bsrsig = bsrsig.redPBP
)

AlluvialPlot

alluvial.plot is a function that enable users to represent the different interactions between ligands, receptors, and pathways stored in a BSRInference object.

Obviously, it is possible to filter by ligand, receptor, or pathway to limit output complexity. This is achieved by specifying a key word in the chosen category. A threshold on L-R interaction Q-values can be applied in addition.

Code

alluvialPlot(bsrinf,
    keywords = c("LAMC1"),
    type = "L",
    qval.thres = 0.01
)

## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the BulkSignalR package.
##   Please report the issue at <https://github.com/ZheFrench/BulkSignalR/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

BubblePlot

bubblePlotPathwaysLR is a handy way to visualize the strengths of several L-R interactions in relation with their receptor downstream pathways.

A vector of pathways of interest can be provided to limit the complexity of the plot.

Code

pathways <- LRinter(bsrinf)[1,c("pw.name")]
bubblePlotPathwaysLR(bsrinf,
    pathways = pathways,
    qval.thres = 0.001,
    color = "red",
    pointsize = 8
)

## 20 LR interactions detected.

## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the BulkSignalR package.
##   Please report the issue at <https://github.com/ZheFrench/BulkSignalR/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Chordiagram

chord.diagram.LR is a function that enables users to feature the different L-R interactions involved in a specific pathway.

L-R correlations strengths are drawn using a yellow color-scale.
Ligands are in grey, whereas receptors are in green.
You can also highlight in red one specific interaction by passing values of a L-R pair as follows ligand="FYN", receptor="SPN".

Code

chordDiagramLR(bsrinf,
    pw.id.filter = "R-HSA-210991",
    limit = 20,
    ligand="FYN", 
    receptor="SPN"
)

Network Analysis

Since BulkSignalR relies on intracellular networks to estimate the statistical significance of (ligand, receptor, pathway) triples, links from receptors to target genes are naturally accessible. Different functions enable users to exploit this graphical data for plotting or further data analysis.

Furthermore, networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools.

Code

# Generate a ligand-receptor network and export it in .graphML
# for Cytoscape or similar tools
gLR <- getLRNetwork(bsrinf.redBP, qval.thres = 1e-3)

# save to file
# write.graph(gLR,file="SDC-LR-network.graphml",format="graphml")

# As an alternative to Cytoscape, you can play with igraph package functions.
plot(gLR,
    edge.arrow.size = 0.1,
    vertex.label.color = "black",
    vertex.label.family = "Helvetica",
    vertex.label.cex = 0.1
)

Code

# You can apply other functions.


# Community detection
u.gLR <- as_undirected(gLR) # most algorithms work for undirected graphs only
comm <- cluster_edge_betweenness(u.gLR)
# plot(comm,u.gLR,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.1)

# Cohesive blocks
cb <- cohesive_blocks(u.gLR)
plot(cb, u.gLR,
    vertex.label.color = "black",
    vertex.label.family = "Helvetica",
    vertex.label.cex = 0.1,
    edge.color = "black"
)

Code

# For the next steps, we just share the code below, but graph generation
# functions are commented to lighten the vignette.

# Generate a ligand-receptor network complemented with intra-cellular,
# receptor downstream pathways [computations are a bit longer here]
#
# You can save to a file for cystoscape or plot with igraph.

gLRintra <- getLRIntracellNetwork(bsrinf.redBP, qval.thres = 1e-3)

lay <- layout_with_kk(gLRintra)
# plot(gLRintra,
#     layout=lay,
#     edge.arrow.size=0.1,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.1)

# Reduce complexity by focusing on strongly targeted pathways
pairs <- LRinter(bsrinf.redBP)
top <- unique(pairs[pairs$pval <  1e-3, c("pw.id", "pw.name")])
top
gLRintra.res <- getLRIntracellNetwork(bsrinf.redBP,
    qval.thres = 0.01,
    restrict.pw = top$pw.id
)
lay <- layout_with_fr(gLRintra.res)

# plot(gLRintra.res,
#     layout=lay,
#     edge.arrow.size=0.1,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.4)

Non-human data

In order to process data from non-human organisms, users only need to specify a few additional parameters and all the other steps of the analysis remain unchanged.

By default, BulksignalR works with Homo sapiens. We implemented a strategy using ortholog genes (mapped by the orthogene BioConductor package) in BulkSignalR directly.

The function findOrthoGenes creates a correspondence table between human and another organism. convertToHuman then converts an initial expression matrix to a Homo sapiens equivalent.

When calling BSRDataModel, the user only needs to pass this transformed matrix, the actual non-human organism name, and the correspondence table. Then, L-R interaction inference is performed as for human data. Finally, users can switch back to gene names relative to the original organism via resetToInitialOrganism. The rest of the workflow is executed as usual for computing gene signatures and visualizing.

Code

data(bodyMap.mouse)

ortholog.dict <- findOrthoGenes(
    from_organism = "mmusculus",
    from_values = rownames(bodyMap.mouse)
)

## Dictionary Size: 14697 genes

## -> 639 Ligands

## -> 658 Receptors

Code

matrix.expression.human <- convertToHuman(
    counts = bodyMap.mouse,
    dictionary = ortholog.dict
)

bsrdm <- BSRDataModel(
    counts = matrix.expression.human,
    species = "mmusculus",
    conversion.dict = ortholog.dict
)

bsrdm <- learnParameters(bsrdm,quick=TRUE)

bsrinf <- BSRInference(bsrdm,reference="REACTOME")

bsrinf <- resetToInitialOrganism(bsrinf, conversion.dict = ortholog.dict)

# For example, if you want to explore L-R interactions
# you can proceed as shown above for a human dataset.

bsrinf.redBP <- reduceToBestPathway(bsrinf)
bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres=0.001)
scoresLR <- scoreLRGeneSignatures(bsrdm, bsrsig.redBP, name.by.pathway=FALSE)
head(LRinter(bsrinf.redBP))

##           L     R         pw.id                             pw.name       pval
## 31   Adam15 Itgav R-HSA-3000178          REACTOME_ECM_PROTEOGLYCANS 0.18091706
## 71    Adam9 Itgb1  R-HSA-446728 REACTOME_CELL_JUNCTION_ORGANIZATION 0.16796226
## 8     Adam9 Itgb5 R-HSA-3000170      REACTOME_SYNDECAN_INTERACTIONS 0.04079285
## 77     Cdh1  Cdh2  R-HSA-446728 REACTOME_CELL_JUNCTION_ORGANIZATION 0.22473890
## 881 Col18a1 Itgb1  R-HSA-446728 REACTOME_CELL_JUNCTION_ORGANIZATION 0.16796226
## 89  Col18a1 Itgb5 R-HSA-3000170      REACTOME_SYNDECAN_INTERACTIONS 0.04079285
##          qval   LR.corr rank len  rank.corr
## 31  0.3072011 0.7619048    2   5 -0.7619048
## 71  0.3072011 0.7857143    4   9 -0.6904762
## 8   0.2365985 0.6904762    2   4  0.6428571
## 77  0.3072011 0.2857143    3   7 -0.3333333
## 881 0.3072011 0.7857143    4   9 -0.6904762
## 89  0.2365985 0.6904762    2   4  0.6428571

Code

#simpleHeatmap(scoresLR, column.names=TRUE,
#              pointsize=8)

Spatial Transcriptomics

BulkSignalR analysis can be applied to spatial transcriptomics (ST) at medium resolution such as with 10x Genomics Visium:TM: or Nanostring GeoMx:TM:. L-R interactions that display significant occurrence in a tissue are retrieved. Additional functions have been introduced to facilitate the visualization and analysis of the results in a spatial context.

To account for the rather limited dynamics of ST data and dropouts, it is usually recommended to release specific parameters controlling the training of the statistical model. Namely, minimum correlation can be set at -1 during training and the minimum number of target genes in a pathway should be reduced to 2 instead of the default at 4. Also, the thresholds on L-R interaction Q-values should be raised at 1% or 5% instead of 0.1%.

A key spatial plot function is spatialPlot that enables visualizing L-R interaction gene signature scores at their spatial coordinates with a potential reference plot (raw tissue image or user-defined areas) on the side.

As we published BulkSignalR paper, we provided example scripts to apply the library to spatial data represented in tabular text files BulkSignalR github companion. It is also possible to work with an object of the SpatialExperiment Bioconductor class. In addition, the VisiumIO package allows users to readily import Visium data from the 10X Space Ranger pipeline and retrieve a SpatialExperiment object.

Code

# load data =================================================

# We re-initialize the environment variable of Reactome
# to have all the pathways (we reduced it to a subset above to fasten
# example computations)
reactSubset <- getResource(resourceName = "Reactome",
cache = TRUE)

resetPathways(dataframe = reactSubset,
resourceName = "Reactome")

# Few steps of pre-process to subset a spatialExperiment object
# from STexampleData package ==================================

spe <- Visium_humanDLPFC()
set.seed(123)

speSubset <- spe[, colData(spe)$ground_truth%in%c("Layer1","Layer2")]

idx <- sample(ncol(speSubset), 10)
speSubset <- speSubset[, idx]

my.image.as.raster <- SpatialExperiment::imgRaster(speSubset, 
    sample_id = imgData(spe)$sample_id[1], image_id = "lowres")

colData(speSubset)$idSpatial <- paste(colData(speSubset)[[4]],
                colData(speSubset)[[5]],sep = "x")


annotation <- colData(speSubset)

Code

# prepare data =================================================

bsrdm <- BSRDataModel(speSubset,
    min.count = 1,
    prop = 0.01,
    method = "TC",
    symbol.col = 2,
    x.col = 4,
    y.col = 5, 
    barcodeID.col = 1)

bsrdm <- learnParameters(bsrdm,
    quick = TRUE,
    min.positive = 2,
    verbose = TRUE)

## Learning ligand-receptor correlation null distribution...

## Automatic null model choice:

##   Censored normal D=0.586330935251799, Chi2=0.0200907446243012

##   Censored mixture D=0.592805755395683, Chi2=0.0125673529406878

##   Gaussian kernel empirical D=0.572661870503597, Chi2=0.00782320341433068

##   ==> select censored mixture of 2 normals

## Censored mixed normal parameters (alpha, mean1, sd1, mean2, sd2): 0.510127585193635, -0.146243914507209, 0.172221109406777, 0.429848043041605, 0.68888443762711

## Quick learning, receptor-target correlation null distribution assumed to be equal to ligand-receptor...

## Learning of statistical model parameters completed

Code

bsrinf <- BSRInference(bsrdm, min.cor = -1,reference="REACTOME")

# spatial analysis ============================================

bsrinf.red <- reduceToBestPathway(bsrinf)
pairs.red <- LRinter(bsrinf.red)

thres <- 0.01
min.corr <- 0.01
pairs.red <- pairs.red[pairs.red$qval < thres & pairs.red$LR.corr > min.corr,]

head(pairs.red[
    order(pairs.red$qval),
    c("L", "R", "LR.corr", "pw.name", "qval")])

##           L       R   LR.corr
## 8       ADM   VIPR1 1.0000000
## 711     CRH   VIPR1 1.0000000
## 1853  RIMS1 SLC17A7 0.2910126
## 575   CALM3   PDE1A 0.5741634
## 505   CALM2   PDE1A 0.4697701
## 25928 CALM1   ADCY1 0.3719495
##                                                     pw.name         qval
## 8              REACTOME_CLASS_B_2_SECRETIN_FAMILY_RECEPTORS 2.941078e-14
## 711            REACTOME_CLASS_B_2_SECRETIN_FAMILY_RECEPTORS 2.941078e-14
## 1853         REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES 1.428826e-05
## 575   REACTOME_INTRACELLULAR_SIGNALING_BY_SECOND_MESSENGERS 3.021048e-04
## 505   REACTOME_INTRACELLULAR_SIGNALING_BY_SECOND_MESSENGERS 3.549323e-04
## 25928        REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES 4.717016e-04

Code

s.red  <- BSRSignature(bsrinf.red, qval.thres=thres)
scores.red <- scoreLRGeneSignatures(bsrdm,s.red)

head(scores.red)

##                         10x32       3x47        4x50      17x111       5x59
## {ADAM17} / {ERBB4} -0.2910176 -0.3707323 -0.36858294 -0.19935564  0.6054123
## {ADM} / {VIPR1}    -0.3958059 -0.3958059  0.07490482  1.54297812  0.3645659
## {APP} / {GPC1}     -0.2595579 -0.2773347 -0.80159128 -0.07890639  0.6176765
## {CALM1} / {PDE1A}  -0.5359815  0.1885086 -0.15395244 -0.49769151 -0.2940636
## {CALM1} / {PDE1B}   0.9193313  0.1512018 -0.10990203 -0.41246202 -0.2808510
## {CALM2} / {PDE1A}  -0.5843362 -0.7244152 -0.54126807 -0.40010905 -0.1464737
##                           0x20       8x100       8x108      14x30       11x39
## {ADAM17} / {ERBB4} -0.34515568 -0.12101923 -0.22763140  0.4681621  0.84992034
## {ADM} / {VIPR1}    -0.39580588 -0.35986278  0.04021555 -0.3958059 -0.07957812
## {APP} / {GPC1}     -0.06718939  0.76786935 -0.29802760  0.9264921 -0.52943075
## {CALM1} / {PDE1A}  -0.28820297  0.23963850 -0.23401675  0.6169534  0.95880833
## {CALM1} / {PDE1B}  -0.33599303 -0.02749348 -0.29449527  0.4731956 -0.08253183
## {CALM2} / {PDE1A}  -0.14420714  0.48166531  0.15364503  0.9277328  0.97776626

From here, one can start exploring the ST data through different plots.

Note: As we work on a much reduced data set to fasten the vignette generation, only a few point are displayed in each plot. Actual plots are a lot more informative.

Code

# Visualization ============================================

# plot one specific interaction

# we have to follow the syntax with {} 
# to be compatible with the reduction functions
inter <- "{SLIT2} / {GPC1}"

# with raw tissue reference
spatialPlot(scores.red[inter, ], annotation, inter,
    ref.plot = TRUE, ref.plot.only = FALSE,
    image.raster = NULL, dot.size = 1,
    label.col = "ground_truth"
)

Code

# or with synthetic image reference
spatialPlot(scores.red[inter, ], annotation, inter,
    ref.plot = TRUE, ref.plot.only = FALSE,
    image.raster = my.image.as.raster, dot.size = 1,
    label.col = "ground_truth"

)

You can dissect one interaction to separately visualize the expression of the ligand and the receptor involved in a specific L-R interaction.

Code

separatedLRPlot(scores.red, "SLIT2", "GPC1", 
    ncounts(bsrdm), 
    annotation,
    label.col = "ground_truth")

Code

# generate a visual index in a pdf file directly
spatialIndexPlot(scores.red, annotation,  
    label.col = "ground_truth",
    out.file="spatialIndexPlot")

Finally, we provide a function to assess the statistical associations of L-R interaction signature scores with user-defined areas of the sample. Based on these associations, a further visualization function can represent the latter in the form of a heatmap.

Code

# statistical association with tissue areas based on correlations
# For display purpose, we only use a subset here
assoc.bsr.corr <- spatialAssociation(scores.red[c(1:17), ],
annotation, label.col = "ground_truth",test = "Spearman")

head(assoc.bsr.corr)

##                   interaction      Layer1      Layer2
## ADAM17 / ERBB4 ADAM17 / ERBB4 -0.87038828  0.87038828
## ADM / VIPR1       ADM / VIPR1 -0.35921060  0.35921060
## APP / GPC1         APP / GPC1 -0.52223297  0.52223297
## CALM1 / PDE1A   CALM1 / PDE1A -0.31333978  0.31333978
## CALM1 / PDE1B   CALM1 / PDE1B  0.03481553 -0.03481553
## CALM2 / PDE1A   CALM2 / PDE1A -0.59186403  0.59186403

Code

spatialAssociationPlot(assoc.bsr.corr)

We also provide 2D-projections (see the spatialDiversityPlot function) to assess the diversity among L-R interaction spatial distributions over an entire data set. Other functions such as generateSpatialPlots can output multiple individual spatial plots in a graphic file directly.

Note that we describe additional use cases in the BulkSignalR github companion.

See the reference manual for all the details.

Acknowledgements

We thank Guillaume Tosato for his help with the figures and Gauthier Gadouas for testing the software on different platforms.

Thank you for reading this guide and for using BulkSignalR.

Session Information

Code

sessionInfo()

## R Under development (unstable) (2025-10-20 r88955)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] doParallel_1.0.17           iterators_1.0.14           
##  [3] foreach_1.5.2               STexampleData_1.19.0       
##  [5] SpatialExperiment_1.21.0    SingleCellExperiment_1.33.0
##  [7] SummarizedExperiment_1.41.0 Biobase_2.71.0             
##  [9] GenomicRanges_1.63.0        Seqinfo_1.1.0              
## [11] IRanges_2.45.0              S4Vectors_0.49.0           
## [13] MatrixGenerics_1.23.0       matrixStats_1.5.0          
## [15] ExperimentHub_3.1.0         AnnotationHub_4.1.0        
## [17] BiocFileCache_3.1.0         dbplyr_2.5.1               
## [19] BiocGenerics_0.57.0         generics_0.1.4             
## [21] igraph_2.2.1                BulkSignalR_1.3.1          
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        jsonlite_2.0.0           
##   [3] shape_1.4.6.1             magrittr_2.0.4           
##   [5] magick_2.9.0              farver_2.1.2             
##   [7] rmarkdown_2.30            GlobalOptions_0.1.2      
##   [9] fs_1.6.6                  vctrs_0.6.5              
##  [11] multtest_2.67.0           Cairo_1.7-0              
##  [13] memoise_2.0.1             RCurl_1.98-1.17          
##  [15] ggtree_4.1.1              rstatix_0.7.3            
##  [17] htmltools_0.5.8.1         S4Arrays_1.11.0          
##  [19] curl_7.0.0                broom_1.0.10             
##  [21] SparseArray_1.11.1        Formula_1.2-5            
##  [23] gridGraphics_0.5-1        sass_0.4.10              
##  [25] bslib_0.9.0               htmlwidgets_1.6.4        
##  [27] httr2_1.2.1               plotly_4.11.0            
##  [29] cachem_1.1.0              lifecycle_1.0.4          
##  [31] pkgconfig_2.0.3           Matrix_1.7-4             
##  [33] R6_2.6.1                  fastmap_1.2.0            
##  [35] clue_0.3-66               digest_0.6.37            
##  [37] aplot_0.2.9               colorspace_2.1-2         
##  [39] AnnotationDbi_1.73.0      patchwork_1.3.2          
##  [41] grr_0.9.5                 RSQLite_2.4.3            
##  [43] ggpubr_0.6.2              labeling_0.4.3           
##  [45] filelock_1.0.3            httr_1.4.7               
##  [47] abind_1.4-8               compiler_4.6.0           
##  [49] withr_3.0.2               bit64_4.6.0-1            
##  [51] fontquiver_0.2.1          S7_0.2.0                 
##  [53] backports_1.5.0           orthogene_1.17.0         
##  [55] carData_3.0-5             DBI_1.2.3                
##  [57] homologene_1.4.68.19.3.27 ggsignif_0.6.4           
##  [59] MASS_7.3-65               rappdirs_0.3.3           
##  [61] DelayedArray_0.37.0       rjson_0.2.23             
##  [63] tools_4.6.0               ape_5.8-1                
##  [65] glue_1.8.0                stabledist_0.7-2         
##  [67] nlme_3.1-168              grid_4.6.0               
##  [69] Rtsne_0.17                cluster_2.1.8.1          
##  [71] gtable_0.3.6              tidyr_1.3.1              
##  [73] data.table_1.17.8         car_3.1-3                
##  [75] XVector_0.51.0            BiocVersion_3.23.1       
##  [77] ggrepel_0.9.6             RANN_2.6.2               
##  [79] pillar_1.11.1             yulab.utils_0.2.1        
##  [81] babelgene_22.9            circlize_0.4.16          
##  [83] splines_4.6.0             dplyr_1.1.4              
##  [85] treeio_1.35.0             lattice_0.22-7           
##  [87] survival_3.8-3            bit_4.6.0                
##  [89] tidyselect_1.2.1          fontLiberation_0.1.0     
##  [91] ComplexHeatmap_2.27.0     Biostrings_2.79.2        
##  [93] knitr_1.50                fontBitstreamVera_0.1.1  
##  [95] gridExtra_2.3             xfun_0.54                
##  [97] lazyeval_0.2.2            ggfun_0.2.0              
##  [99] yaml_2.3.10               evaluate_1.0.5           
## [101] codetools_0.2-20          gdtools_0.4.4            
## [103] tibble_3.3.0              BiocManager_1.30.26      
## [105] ggplotify_0.1.3           cli_3.6.5                
## [107] systemfonts_1.3.1         jquerylib_0.1.4          
## [109] dichromat_2.0-0.1         Rcpp_1.1.0               
## [111] gprofiler2_0.2.3          png_0.1-8                
## [113] ggplot2_4.0.0             blob_1.2.4               
## [115] ggalluvial_0.12.5         bitops_1.0-9             
## [117] glmnet_4.1-10             viridisLite_0.4.2        
## [119] tidytree_0.4.6            ggiraph_0.9.2            
## [121] scales_1.4.0              purrr_1.2.0              
## [123] crayon_1.5.3              GetoptLong_1.0.5         
## [125] rlang_1.1.6               KEGGREST_1.51.0