tpSVGThe objective of tpSVG is to detect spatially variable genes (SVG) when analyzing spatially-resolved transcriptomics data. This includes both unsupervised features where there’s not additional information is supplied besides the (normalized) gene counts and spatial coordinates, but also the spatial variation explained besides some covariates, such as tissue anatomy or possibly cell type composition. 
Compared to previous SVG detection tools, tpSVG provides a scalable solution to model the gene expression as counts instead of logarithm-transformed counts. While log transformation provides convenience to model the spatial gene expression by mapping count data to the continuous domain, hence enabling well-understood Gaussian models, log transformation distorts low expressed genes counts and create bias populating high-expressed genes. For example, the rank of genes based on their effect size are commonly used for dimensional reduction, or its input. Hence, estimating gene ranking correctly is very important. Gaussian models, exemplified with nnSVG, often dissociates the mean-variance relationship which is commonly assumed for counts data, and hence often prioritizes the highly expressed genes over the lowly expressed genes. In the figure below, we saw that nnSVG is susceptible to such mean-rank relationship, meaning highly expressed genes are often ranked highly. In contrast, the proposed tpSVG with Poisson distribution is not susceptible to this mean-rank relationship when examining the DLPFC dataset.
Model counts data with tpSVG avoids mean-rank relationship.
You can install the development version of tpSVG from GitHub with:
# install.packages("devtools")
devtools::install_github("boyiguo1/tpSVG")The package is currently submitted to Bioconductor for review. Once the package
is accepted by Bioconductor, you can install the latest release
version of tpSVG from Bioconductor via the following code. Additional details
are shown on the Bioconductor page.
if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("tpSVG")The latest development version can also be installed from the devel version
of Bioconductor or from GitHub following
BiocManager::install(version='devel')tpSVGIn the following section, we demonstrate the how to use tpSVG with a SpatialExperiment object.
To run the demonstration, there a couple of necessary packages to load. We will use the data set in STexampleData, which contains a 10x Visium dataset. Specifically, we will find one 10xVisium sample collected from the dorso lateral prefrontal region of a postmortem human brain from STexampleData package, indexed by brain number of “151673” from Maynard, Collado-Torres et al. (2021). For more information, please see the vignettes of STexampleData
library(tpSVG)
library(SpatialExperiment)
library(STexampleData)          # Example data
library(scuttle)                # Data preprocessBefore running the analysis using tpSVG, we will preprocess the data, which
includes 1) calculating normalization factor, or equivalently library size; 2)
down-sizing the number of genes to 6 to reduce running time. These preprocessing
step may not be necessary if real world data analysis.
spe <- Visium_humanDLPFC()
spe <- spe[, colData(spe)$in_tissue == 1]
spe <- logNormCounts(spe)
# Normalization factor
head(spe$sizeFactor)
#> AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 
#>          1.8428941          0.3632188          0.8212187          1.1837838 
#> AAACAGCTTTCAGAAG-1 AAACAGGGTCTATATT-1 
#>          0.9321235          0.8724223
# Equivalently, library size
spe$total <- counts(spe) |> colSums()
# Down-sizing genes for faster computation
idx <- which(
  rowData(spe)$gene_name %in% c("MOBP", "PCP4", "SNAP25",
                                "HBB", "IGKC", "NPY")
)
spe <- spe[idx, ]The following example demonstrates how to model raw gene counts data with tpSVG.
The model fitting is simple, following stats::glm syntax to follow any
distributional assumption via the argument family. The only key point to
mention here is the model needs to account for any technical variation due to
the gene profiling procedure. To account for such techincal variation, we use
offset term in the model. In the following example, we use the commonly used
library size as the normalizing factor, and hence set offset = log(spe$total)
to account for the techinical variation in the data. Equivalently, it is also
possible/encouraged to use offset = log(spe$sizeFactor), where spe$sizeFactor
is calculated during logNormCounts and is a linear function of the library
size, i.e. spe$total. Note: it is very important to use the log function in
the offset making sure the data scale is conformable.
set.seed(1)
spe_poisson  <- tpSVG(
  spe,
  family = poisson,
  assay_name = "counts",
  offset = log(spe$total)   # Natural log library size
)
rowData(spe_poisson)
#> DataFrame with 6 rows and 6 columns
#>                         gene_id   gene_name    feature_type test_stat     raw_p
#>                     <character> <character>     <character> <numeric> <numeric>
#> ENSG00000211592 ENSG00000211592        IGKC Gene Expression   3447.12         0
#> ENSG00000168314 ENSG00000168314        MOBP Gene Expression   7143.34         0
#> ENSG00000122585 ENSG00000122585         NPY Gene Expression   3596.18         0
#> ENSG00000244734 ENSG00000244734         HBB Gene Expression   1536.56         0
#> ENSG00000132639 ENSG00000132639      SNAP25 Gene Expression   3529.46         0
#> ENSG00000183036 ENSG00000183036        PCP4 Gene Expression   1381.63         0
#>                   runtime
#>                 <numeric>
#> ENSG00000211592     1.250
#> ENSG00000168314     1.244
#> ENSG00000122585     1.257
#> ENSG00000244734     1.242
#> ENSG00000132639     1.300
#> ENSG00000183036     1.163tpSVG provides flexibility regarding the distributional assumption. If interested,
it is possible to model log-transformed count data using Gaussian distribution.
spe_gauss <- tpSVG(
  spe, 
  family = gaussian(),
  assay_name = "logcounts",
  offset = NULL 
)It is scientifically interesting to understand if and how much additional spatial variation in gene expression when accounting some known biology. For example, if known anatomy is accounted for in the model, is there any additional spatial variation in the gene expression which can be informative to any unknown biology. Statistically, this question is known as covariate adjustment, where the known biology is quantified and accounted for in a model.
To address this question, tpSVG allows introducing covariates in the model via
the argument X, where X takes a vector of any kind, including categorical
variables.
The frist step is to remove any missing data in the dataset, specifically as
the covariate. This can be done via complete.cases.
# Check missing data
idx_complete_case <- complete.cases(spe$ground_truth)
# If multiple covariates
# idx_complete_case <- complete.cases(spe$ground_truth, spe$cell_count)
# Remove missing data
spe <- spe[, idx_complete_case]
# Create a design matrix
x <- spe$ground_truth 
spe_poisson_cov  <- tpSVG(
  spe,
  X = x, 
  family = poisson,
  assay_name = "counts",
  offset = log(spe$total)   # Natural log library size
)SpatialExperiment (e.g. SpatialFeatureExperiment)tpSVG can be also used to model image-based SRT data. We use
the seqFISH data from Lohoff and Ghazanfar et al. (2020) to demonstrate tpSVG. Specifically, we use the curated example data in STexampleData package. For more information, please see the vignettes of STexampleData
library(STexampleData)
spe <- seqFISH_mouseEmbryo()
#> see ?STexampleData and browseVignettes('STexampleData') for documentation
#> loading from cache
#> Loading required namespace: BumpyMatrix
spe
#> class: SpatialExperiment 
#> dim: 351 11026 
#> metadata(0):
#> assays(2): counts molecules
#> rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
#> rowData names(1): gene_name
#> colnames(11026): embryo1_Pos0_cell10_z2 embryo1_Pos0_cell100_z2 ...
#>   embryo1_Pos28_cell97_z2 embryo1_Pos28_cell98_z2
#> colData names(14): cell_id embryo ... segmentation_vertices sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):The example data set contains 351 genes for 11026 genes. To make the
demonstration computationally feasible, we down-size the number of genes to 1.
The average computation times for 11026 cells is roughly 2 minutes.
# Calculate "library size"
spe$total <- counts(spe) |> colSums()
# Down-size genes
idx_gene <- which(
  rowData(spe)$gene_name %in%
    c("Sox2")
  )
library(tpSVG)
# Poisson model
tp_spe <- tpSVG(
  input = spe[idx_gene,],
  family = poisson(),
  offset = log(spe$total),
  assay_name = "counts")
rowData(tp_spe)
#> DataFrame with 1 row and 4 columns
#>        gene_name test_stat     raw_p   runtime
#>      <character> <numeric> <numeric> <numeric>
#> Sox2        Sox2   9588.51         0      3.67sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.0 RC (2025-04-04 r88126)
#>  os       Ubuntu 24.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-04-15
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#>  quarto   1.5.57 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version date (UTC) lib source
#>  abind                  1.4-8   2024-09-12 [2] CRAN (R 4.5.0)
#>  AnnotationDbi          1.70.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  AnnotationHub        * 3.16.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  beachmat               2.24.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  Biobase              * 2.68.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocFileCache        * 2.16.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocGenerics         * 0.54.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocManager            1.30.25 2024-08-28 [2] CRAN (R 4.5.0)
#>  BiocParallel           1.42.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocStyle            * 2.36.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocVersion            3.21.1  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  Biostrings             2.76.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  bit                    4.6.0   2025-03-06 [2] CRAN (R 4.5.0)
#>  bit64                  4.6.0-1 2025-01-16 [2] CRAN (R 4.5.0)
#>  blob                   1.2.4   2023-03-17 [2] CRAN (R 4.5.0)
#>  bookdown               0.43    2025-04-15 [2] CRAN (R 4.5.0)
#>  bslib                  0.9.0   2025-01-30 [2] CRAN (R 4.5.0)
#>  BumpyMatrix            1.16.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  cachem                 1.1.0   2024-05-16 [2] CRAN (R 4.5.0)
#>  cli                    3.6.4   2025-02-13 [2] CRAN (R 4.5.0)
#>  codetools              0.2-20  2024-03-31 [3] CRAN (R 4.5.0)
#>  crayon                 1.5.3   2024-06-20 [2] CRAN (R 4.5.0)
#>  curl                   6.2.2   2025-03-24 [2] CRAN (R 4.5.0)
#>  DBI                    1.2.3   2024-06-02 [2] CRAN (R 4.5.0)
#>  dbplyr               * 2.5.0   2024-03-19 [2] CRAN (R 4.5.0)
#>  DelayedArray           0.34.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  digest                 0.6.37  2024-08-19 [2] CRAN (R 4.5.0)
#>  dplyr                  1.1.4   2023-11-17 [2] CRAN (R 4.5.0)
#>  evaluate               1.0.3   2025-01-10 [2] CRAN (R 4.5.0)
#>  ExperimentHub        * 2.16.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  fastmap                1.2.0   2024-05-15 [2] CRAN (R 4.5.0)
#>  filelock               1.0.3   2023-12-11 [2] CRAN (R 4.5.0)
#>  generics             * 0.1.3   2022-07-05 [2] CRAN (R 4.5.0)
#>  GenomeInfoDb         * 1.44.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  GenomeInfoDbData       1.2.14  2025-04-10 [2] Bioconductor
#>  GenomicRanges        * 1.60.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  glue                   1.8.0   2024-09-30 [2] CRAN (R 4.5.0)
#>  htmltools              0.5.8.1 2024-04-04 [2] CRAN (R 4.5.0)
#>  httr                   1.4.7   2023-08-15 [2] CRAN (R 4.5.0)
#>  IRanges              * 2.42.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  jquerylib              0.1.4   2021-04-26 [2] CRAN (R 4.5.0)
#>  jsonlite               2.0.0   2025-03-27 [2] CRAN (R 4.5.0)
#>  KEGGREST               1.48.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  knitr                  1.50    2025-03-16 [2] CRAN (R 4.5.0)
#>  lattice                0.22-7  2025-04-02 [3] CRAN (R 4.5.0)
#>  lifecycle              1.0.4   2023-11-07 [2] CRAN (R 4.5.0)
#>  magick                 2.8.6   2025-03-23 [2] CRAN (R 4.5.0)
#>  magrittr               2.0.3   2022-03-30 [2] CRAN (R 4.5.0)
#>  Matrix                 1.7-3   2025-03-11 [3] CRAN (R 4.5.0)
#>  MatrixGenerics       * 1.20.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  matrixStats          * 1.5.0   2025-01-07 [2] CRAN (R 4.5.0)
#>  memoise                2.0.1   2021-11-26 [2] CRAN (R 4.5.0)
#>  mgcv                 * 1.9-3   2025-04-04 [3] CRAN (R 4.5.0)
#>  mime                   0.13    2025-03-17 [2] CRAN (R 4.5.0)
#>  nlme                 * 3.1-168 2025-03-31 [3] CRAN (R 4.5.0)
#>  pillar                 1.10.2  2025-04-05 [2] CRAN (R 4.5.0)
#>  pkgconfig              2.0.3   2019-09-22 [2] CRAN (R 4.5.0)
#>  png                    0.1-8   2022-11-29 [2] CRAN (R 4.5.0)
#>  purrr                  1.0.4   2025-02-05 [2] CRAN (R 4.5.0)
#>  R6                     2.6.1   2025-02-15 [2] CRAN (R 4.5.0)
#>  rappdirs               0.3.3   2021-01-31 [2] CRAN (R 4.5.0)
#>  Rcpp                   1.0.14  2025-01-12 [2] CRAN (R 4.5.0)
#>  rjson                  0.2.23  2024-09-16 [2] CRAN (R 4.5.0)
#>  rlang                  1.1.6   2025-04-11 [2] CRAN (R 4.5.0)
#>  rmarkdown              2.29    2024-11-04 [2] CRAN (R 4.5.0)
#>  RSQLite                2.3.9   2024-12-03 [2] CRAN (R 4.5.0)
#>  S4Arrays               1.8.0   2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  S4Vectors            * 0.46.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  sass                   0.4.10  2025-04-11 [2] CRAN (R 4.5.0)
#>  scuttle              * 1.18.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  sessioninfo            1.2.3   2025-02-05 [2] CRAN (R 4.5.0)
#>  SingleCellExperiment * 1.30.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  SparseArray            1.8.0   2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  SpatialExperiment    * 1.18.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  STexampleData        * 1.15.3  2025-03-25 [2] Bioconductor 3.21 (R 4.5.0)
#>  SummarizedExperiment * 1.38.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  tibble                 3.2.1   2023-03-20 [2] CRAN (R 4.5.0)
#>  tidyselect             1.2.1   2024-03-11 [2] CRAN (R 4.5.0)
#>  tpSVG                * 1.4.0   2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  UCSC.utils             1.4.0   2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  vctrs                  0.6.5   2023-12-01 [2] CRAN (R 4.5.0)
#>  withr                  3.0.2   2024-10-28 [2] CRAN (R 4.5.0)
#>  xfun                   0.52    2025-04-02 [2] CRAN (R 4.5.0)
#>  XVector                0.48.0  2025-04-15 [2] Bioconductor 3.21 (R 4.5.0)
#>  yaml                   2.3.10  2024-07-26 [2] CRAN (R 4.5.0)
#> 
#>  [1] /tmp/RtmpfT2DTF/Rinst2a33c11e698a10
#>  [2] /home/biocbuild/bbs-3.21-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.21-bioc/R/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────