Here, we demonstrate a grid search of clustering parameters with a mouse
hippocampus VeraFISH dataset. BANKSY currently provides four algorithms for
clustering the BANKSY matrix with clusterBanksy: Leiden (default), Louvain,
k-means, and model-based clustering. In this vignette, we run only Leiden
clustering. See ?clusterBanksy for more details on the parameters for
different clustering methods.
The dataset comprises gene expression for 10,944 cells and 120 genes in 2
spatial dimensions. See ?Banksy::hippocampus for more details.
# Load libs
library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)
library(scater)
library(cowplot)
library(ggplot2)
# Load data
data(hippocampus)
gcm <- hippocampus$expression
locs <- as.matrix(hippocampus$locations)Here, gcm is a gene by cell matrix, and locs is a matrix specifying the
coordinates of the centroid for each cell.
head(gcm[,1:5])
#>         cell_1276 cell_8890 cell_691 cell_396 cell_9818
#> Sparcl1        45         0       11       22         0
#> Slc1a2         17         0        6        5         0
#> Map            10         0       12       16         0
#> Sqstm1         26         0        0        2         0
#> Atp1a2          0         0        4        3         0
#> Tnc             0         0        0        0         0
head(locs)
#>                 sdimx    sdimy
#> cell_1276  -13372.899 15776.37
#> cell_8890    8941.101 15866.37
#> cell_691   -14882.899 15896.37
#> cell_396   -15492.899 15835.37
#> cell_9818   11308.101 15846.37
#> cell_11310  14894.101 15810.37Initialize a SpatialExperiment object and perform basic quality control. We keep cells with total transcript count within the 5th and 98th percentile:
se <- SpatialExperiment(assay = list(counts = gcm), spatialCoords = locs)
colData(se) <- cbind(colData(se), spatialCoords(se))
# QC based on total counts
qcstats <- perCellQCMetrics(se)
thres <- quantile(qcstats$total, c(0.05, 0.98))
keep <- (qcstats$total > thres[1]) & (qcstats$total < thres[2])
se <- se[, keep]Next, perform normalization of the data.
# Normalization to mean library size
se <- computeLibraryFactors(se)
aname <- "normcounts"
assay(se, aname) <- normalizeCounts(se, log = FALSE)BANKSY has a few key parameters. We describe these below.
For characterising neighborhoods, BANKSY computes the weighted neighborhood
mean (H_0) and the azimuthal Gabor filter (H_1), which estimates gene
expression gradients. Setting compute_agf=TRUE computes both H_0 and H_1.
k_geom specifies the number of neighbors used to compute each H_m for
m=0,1. If a single value is specified, the same k_geom will be used
for each feature matrix. Alternatively, multiple values of k_geom can be
provided for each feature matrix. Here, we use k_geom[1]=15 and
k_geom[2]=30 for H_0 and H_1 respectively. More neighbors are used to
compute gradients.
We compute the neighborhood feature matrices using normalized expression
(normcounts in the se object).
k_geom <- c(15, 30)
se <- computeBanksy(se, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
#> Computing neighbors...
#> Spatial mode is kNN_median
#> Parameters: k_geom=15
#> Done
#> Computing neighbors...
#> Spatial mode is kNN_median
#> Parameters: k_geom=30
#> Done
#> Computing harmonic m = 0
#> Using 15 neighbors
#> Done
#> Computing harmonic m = 1
#> Using 30 neighbors
#> Centering
#> DonecomputeBanksy populates the assays slot with H_0 and H_1 in this
instance:
se
#> class: SpatialExperiment 
#> dim: 120 10205 
#> metadata(1): BANKSY_params
#> assays(4): counts normcounts H0 H1
#> rownames(120): Sparcl1 Slc1a2 ... Notch3 Egfr
#> rowData names(0):
#> colnames(10205): cell_1276 cell_691 ... cell_11635 cell_10849
#> colData names(4): sample_id sdimx sdimy sizeFactor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : sdimx sdimy
#> imgData names(1): sample_idThe lambda parameter is a mixing parameter in [0,1] which
determines how much spatial information is incorporated for downstream analysis.
With smaller values of lambda, BANKY operates in cell-typing mode, while at
higher levels of lambda, BANKSY operates in domain-finding mode. As a
starting point, we recommend lambda=0.2 for cell-typing and lambda=0.8 for
zone-finding. Here, we run lambda=0 which corresponds to non-spatial
clustering, and lambda=0.2 for spatially-informed cell-typing. We compute PCs
with and without the AGF (H_1).
lambda <- c(0, 0.2)
se <- runBanksyPCA(se, use_agf = c(FALSE, TRUE), lambda = lambda, seed = 1000)
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000runBanksyPCA populates the reducedDims slot, with each combination of
use_agf and lambda provided.
reducedDimNames(se)
#> [1] "PCA_M0_lam0"   "PCA_M0_lam0.2" "PCA_M1_lam0"   "PCA_M1_lam0.2"Next, we cluster the BANKSY embedding with Leiden graph-based clustering. This
admits two parameters: k_neighbors and resolution. k_neighbors determines
the number of k nearest neighbors used to construct the shared nearest
neighbors graph. Leiden clustering is then performed on the resultant graph
with resolution resolution. For reproducibiltiy we set a seed for each
parameter combination.
k <- 50
res <- 1
se <- clusterBanksy(se, use_agf = c(FALSE, TRUE), lambda = lambda, k_neighbors = k, resolution = res, seed = 1000)
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000clusterBanksy populates colData(se) with cluster labels:
colnames(colData(se))
#> [1] "sample_id"                "sdimx"                   
#> [3] "sdimy"                    "sizeFactor"              
#> [5] "clust_M0_lam0_k50_res1"   "clust_M0_lam0.2_k50_res1"
#> [7] "clust_M1_lam0_k50_res1"   "clust_M1_lam0.2_k50_res1"To compare clustering runs visually, different runs can be relabeled to
minimise their differences with connectClusters:
se <- connectClusters(se)
#> clust_M1_lam0_k50_res1 --> clust_M0_lam0_k50_res1
#> clust_M0_lam0.2_k50_res1 --> clust_M1_lam0_k50_res1
#> clust_M1_lam0.2_k50_res1 --> clust_M0_lam0.2_k50_res1Visualise spatial coordinates with cluster labels.
cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
cplots <- lapply(cnames, function(cnm) {
    plotColData(se, x = "sdimx", y = "sdimy", point_size = 0.1, colour_by = cnm) +
        coord_equal() +
        labs(title = cnm) +
        theme(legend.title = element_blank()) +
        guides(colour = guide_legend(override.aes = list(size = 2)))
})
plot_grid(plotlist = cplots, ncol = 2)Compare all cluster outputs with compareClusters. This function computes
pairwise cluster comparison metrics between the clusters in colData(se) based
on adjusted Rand index (ARI):
compareClusters(se, func = "ARI")
#>                          clust_M0_lam0_k50_res1 clust_M0_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                     0.67
#> clust_M0_lam0.2_k50_res1                  0.670                     1.00
#> clust_M1_lam0_k50_res1                    1.000                     0.67
#> clust_M1_lam0.2_k50_res1                  0.747                     0.87
#>                          clust_M1_lam0_k50_res1 clust_M1_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.747
#> clust_M0_lam0.2_k50_res1                  0.670                    0.870
#> clust_M1_lam0_k50_res1                    1.000                    0.747
#> clust_M1_lam0.2_k50_res1                  0.747                    1.000or normalized mutual information (NMI):
compareClusters(se, func = "NMI")
#>                          clust_M0_lam0_k50_res1 clust_M0_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.741
#> clust_M0_lam0.2_k50_res1                  0.741                    1.000
#> clust_M1_lam0_k50_res1                    1.000                    0.741
#> clust_M1_lam0.2_k50_res1                  0.782                    0.915
#>                          clust_M1_lam0_k50_res1 clust_M1_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.782
#> clust_M0_lam0.2_k50_res1                  0.741                    0.915
#> clust_M1_lam0_k50_res1                    1.000                    0.782
#> clust_M1_lam0.2_k50_res1                  0.782                    1.000See ?compareClusters for the full list of comparison measures.
Vignette runtime:
#> Time difference of 52.79655 secssessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ExperimentHub_2.14.0        AnnotationHub_3.14.0       
#>  [3] BiocFileCache_2.14.0        dbplyr_2.5.0               
#>  [5] spatialLIBD_1.17.10         cowplot_1.1.3              
#>  [7] scater_1.34.0               ggplot2_3.5.1              
#>  [9] harmony_1.2.1               Rcpp_1.0.13                
#> [11] data.table_1.16.2           scran_1.34.0               
#> [13] scuttle_1.16.0              Seurat_5.1.0               
#> [15] SeuratObject_5.0.2          sp_2.1-4                   
#> [17] SpatialExperiment_1.16.0    SingleCellExperiment_1.28.0
#> [19] SummarizedExperiment_1.36.0 Biobase_2.66.0             
#> [21] GenomicRanges_1.58.0        GenomeInfoDb_1.42.0        
#> [23] IRanges_2.40.0              S4Vectors_0.44.0           
#> [25] BiocGenerics_0.52.0         MatrixGenerics_1.18.0      
#> [27] matrixStats_1.4.1           Banksy_1.2.0               
#> [29] BiocStyle_2.34.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] bitops_1.0-9             spatstat.sparse_3.1-0    doParallel_1.0.17       
#>   [4] httr_1.4.7               RColorBrewer_1.1-3       tools_4.4.1             
#>   [7] sctransform_0.4.1        DT_0.33                  utf8_1.2.4              
#>  [10] R6_2.5.1                 lazyeval_0.2.2           uwot_0.2.2              
#>  [13] withr_3.0.2              gridExtra_2.3            progressr_0.15.0        
#>  [16] cli_3.6.3                spatstat.explore_3.3-3   fastDummies_1.7.4       
#>  [19] labeling_0.4.3           sass_0.4.9               spatstat.data_3.1-2     
#>  [22] ggridges_0.5.6           pbapply_1.7-2            Rsamtools_2.22.0        
#>  [25] dbscan_1.2-0             aricode_1.0.3            dichromat_2.0-0.1       
#>  [28] sessioninfo_1.2.2        parallelly_1.38.0        attempt_0.3.1           
#>  [31] maps_3.4.2               limma_3.62.0             pals_1.9                
#>  [34] RSQLite_2.3.7            BiocIO_1.16.0            generics_0.1.3          
#>  [37] ica_1.0-3                spatstat.random_3.3-2    dplyr_1.1.4             
#>  [40] Matrix_1.7-1             ggbeeswarm_0.7.2         fansi_1.0.6             
#>  [43] abind_1.4-8              lifecycle_1.0.4          yaml_2.3.10             
#>  [46] edgeR_4.4.0              SparseArray_1.6.0        Rtsne_0.17              
#>  [49] paletteer_1.6.0          grid_4.4.1               blob_1.2.4              
#>  [52] promises_1.3.0           dqrng_0.4.1              crayon_1.5.3            
#>  [55] miniUI_0.1.1.1           lattice_0.22-6           beachmat_2.22.0         
#>  [58] mapproj_1.2.11           KEGGREST_1.46.0          magick_2.8.5            
#>  [61] pillar_1.9.0             knitr_1.48               metapod_1.14.0          
#>  [64] rjson_0.2.23             future.apply_1.11.3      codetools_0.2-20        
#>  [67] leiden_0.4.3.1           glue_1.8.0               spatstat.univar_3.0-1   
#>  [70] vctrs_0.6.5              png_0.1-8                spam_2.11-0             
#>  [73] gtable_0.3.6             rematch2_2.1.2           cachem_1.1.0            
#>  [76] xfun_0.48                S4Arrays_1.6.0           mime_0.12               
#>  [79] survival_3.7-0           RcppHungarian_0.3        iterators_1.0.14        
#>  [82] tinytex_0.53             fields_16.3              statmod_1.5.0           
#>  [85] bluster_1.16.0           fitdistrplus_1.2-1       ROCR_1.0-11             
#>  [88] nlme_3.1-166             bit64_4.5.2              filelock_1.0.3          
#>  [91] RcppAnnoy_0.0.22         bslib_0.8.0              irlba_2.3.5.1           
#>  [94] vipor_0.4.7              KernSmooth_2.23-24       colorspace_2.1-1        
#>  [97] DBI_1.2.3                tidyselect_1.2.1         bit_4.5.0               
#> [100] compiler_4.4.1           curl_5.2.3               BiocNeighbors_2.0.0     
#> [103] DelayedArray_0.32.0      plotly_4.10.4            rtracklayer_1.66.0      
#> [106] bookdown_0.41            scales_1.3.0             lmtest_0.9-40           
#> [109] rappdirs_0.3.3           stringr_1.5.1            digest_0.6.37           
#> [112] goftest_1.2-3            spatstat.utils_3.1-0     rmarkdown_2.28          
#> [115] benchmarkmeData_1.0.4    RhpcBLASctl_0.23-42      XVector_0.46.0          
#> [118] htmltools_0.5.8.1        pkgconfig_2.0.3          highr_0.11              
#> [121] fastmap_1.2.0            rlang_1.1.4              htmlwidgets_1.6.4       
#> [124] UCSC.utils_1.2.0         shiny_1.9.1              farver_2.1.2            
#> [127] jquerylib_0.1.4          zoo_1.8-12               jsonlite_1.8.9          
#> [130] BiocParallel_1.40.0      mclust_6.1.1             config_0.3.2            
#> [133] RCurl_1.98-1.16          BiocSingular_1.22.0      magrittr_2.0.3          
#> [136] GenomeInfoDbData_1.2.13  dotCall64_1.2            patchwork_1.3.0         
#> [139] munsell_0.5.1            viridis_0.6.5            reticulate_1.39.0       
#> [142] leidenAlg_1.1.4          stringi_1.8.4            zlibbioc_1.52.0         
#> [145] MASS_7.3-61              plyr_1.8.9               parallel_4.4.1          
#> [148] listenv_0.9.1            ggrepel_0.9.6            deldir_2.0-4            
#> [151] Biostrings_2.74.0        sccore_1.0.5             splines_4.4.1           
#> [154] tensor_1.5               locfit_1.5-9.10          igraph_2.1.1            
#> [157] spatstat.geom_3.3-3      RcppHNSW_0.6.0           reshape2_1.4.4          
#> [160] ScaledMatrix_1.14.0      XML_3.99-0.17            BiocVersion_3.20.0      
#> [163] evaluate_1.0.1           golem_0.5.1              BiocManager_1.30.25     
#> [166] foreach_1.5.2            httpuv_1.6.15            RANN_2.6.2              
#> [169] tidyr_1.3.1              purrr_1.0.2              polyclip_1.10-7         
#> [172] benchmarkme_1.0.8        future_1.34.0            scattermore_1.2         
#> [175] rsvd_1.0.5               xtable_1.8-4             restfulr_0.0.15         
#> [178] RSpectra_0.16-2          later_1.3.2              viridisLite_0.4.2       
#> [181] tibble_3.2.1             GenomicAlignments_1.42.0 memoise_2.0.1           
#> [184] beeswarm_0.4.0           AnnotationDbi_1.68.0     cluster_2.1.6           
#> [187] shinyWidgets_0.8.7       globals_0.16.3