To enhance clustering and visualization of time-series gene expression data from RNA-Seq experiments, we present the ClusterGVis package. This tool enables concise and elegant analysis of time-series gene expression data in a simple, one-step operation. Additionally, you can perform enrichment analysis for each cluster using the enrichCluster function, which integrates seamlessly with clusterProfiler. ClusterGVis empowers you to create publication-quality visualizations with ease.
Comprehensive documentation can be found at https://junjunlab.github.io/ClusterGvis-manual.
Here we load the built-in RNA-seq expression matrix, in which each column holds the gene expression values for one differentiation stage: zygote, two-cell, four-cell, eight-cell, morula, and blastocyst:
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(library(S4Vectors))
library(ClusterGVis)
# a data.frame or SummarizedExperiment object
data("exps")
head(exps)
## Zygote 2-cell 4-cell 8-cell Morula Blastocyst
## Oog4 1.3132282 1.2370781 1.325978 1.262073 0.6549312 0.2067114
## Psmd9 1.0917337 1.3159888 1.174417 1.064756 0.8685598 0.4845448
## Sephs2 0.9859232 1.2010257 1.123076 1.084673 0.8878931 0.7174088
## Nhlrc2 0.9856354 1.0387869 1.061926 1.076825 0.9716945 0.8651322
## Trappc4 1.0775310 0.9757542 1.065544 1.080973 0.9732145 0.8269832
## Ywhah 1.0485306 1.0212216 1.117839 1.199569 1.0384096 0.5744298
The getClusters function employs the elbow method to help users pre-determine the appropriate number of clusters for their analysis:
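The idea behind the elbow method can be sketched in base R, independently of the package. This is only an illustration on simulated data, not the internal code of getClusters; the matrix `mat`, the helper `make_group`, and the range of k values are assumptions made for the example.

```r
# Simulated expression-like matrix: 60 "genes" x 6 "stages", built from
# three distinct trend shapes plus noise (illustrative data only).
set.seed(42)
stages <- 6
trend_up   <- seq(-1, 1, length.out = stages)
trend_down <- rev(trend_up)
trend_peak <- c(-1, 0, 1, 1, 0, -1)

# hypothetical helper: n noisy copies of one trend, one gene per row
make_group <- function(trend, n = 20) {
  t(replicate(n, trend + rnorm(stages, sd = 0.2)))
}
mat <- rbind(make_group(trend_up),
             make_group(trend_down),
             make_group(trend_peak))

# total within-cluster sum of squares for each candidate k
k_values <- 1:8
wss <- vapply(k_values, function(k) {
  kmeans(mat, centers = k, nstart = 10)$tot.withinss
}, numeric(1))

# the "elbow" is the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")

# With ClusterGVis loaded, the package's own helper would be called
# on the expression matrix instead, e.g. getClusters(exps).
```

In this simulated case the curve should drop sharply up to the number of simulated trends and flatten afterwards; on real data the bend is usually less clear-cut, so the plot is a guide rather than a rule.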
To investigate gene expression modules that exhibit distinct expression patterns across different differentiation stages, we employ k-means clustering to group genes, with the number of clusters set to 8:
# using k-means for clustering
ck <- clusterData(obj = exps,
                  clusterMethod = "kmeans",
                  clusterNum = 8)
## [1] "0 genes excluded.\n"
Besides standard gene expression matrices (in data.frame or matrix format), users can also directly pass SummarizedExperiment objects as input data:
# construct a SummarizedExperiment object
sce <- SummarizedExperiment(assays = list(counts = exps),
colData = S4Vectors::DataFrame(
sample = colnames(exps),
row.names = colnames(exps))
)
sce
## class: SummarizedExperiment
## dim: 3767 6
## metadata(0):
## assays(1): counts
## rownames(3767): Oog4 Psmd9 ... Eprs Cenpe
## rowData names(0):
## colnames(6): Zygote 2-cell ... Morula Blastocyst
## colData names(1): sample
# using k-means for clustering
ck2 <- clusterData(obj = sce,
                   clusterMethod = "kmeans",
                   clusterNum = 8)
## [1] "0 genes excluded.\n"
We can then visualize the clustering results. The visCluster function supports various visualization methods, including line plots, heatmaps, and complex composite graphics, to demonstrate the expression trend patterns of genes across different modules:
Line plot:
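A minimal sketch of the line-plot call, assuming that `plotType = "line"` is the value that selects this view (only `"heatmap"` appears elsewhere in this document) and that the result is a plot object that can be printed. The `has_pkg` guard is an addition for this example, so the chunk is skipped when the package is not installed.

```r
has_pkg <- requireNamespace("ClusterGVis", quietly = TRUE)

if (has_pkg) {
  library(ClusterGVis)
  data("exps")

  # same k-means clustering as above
  ck <- clusterData(obj = exps,
                    clusterMethod = "kmeans",
                    clusterNum = 8)

  # one panel per cluster, expression trend across stages
  # (plotType = "line" is an assumed value)
  p_line <- visCluster(object = ck,
                       plotType = "line")
  print(p_line)
}
```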
Heatmap plot:
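For the heatmap view, the call below reuses the `plotType = "heatmap"` and `column_names_rot` arguments that appear in the single-cell example of this document; applying them to the bulk k-means result is an assumption of this sketch, as is the `has_pkg` guard.

```r
has_pkg <- requireNamespace("ClusterGVis", quietly = TRUE)

if (has_pkg) {
  library(ClusterGVis)
  data("exps")

  ck <- clusterData(obj = exps,
                    clusterMethod = "kmeans",
                    clusterNum = 8)

  # genes as rows grouped by cluster, stages as columns
  visCluster(object = ck,
             plotType = "heatmap",
             column_names_rot = 45)
}
```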
Complex heatmap with line plot annotation:
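The composite figure can presumably be requested in the same way. The sketch assumes that `plotType = "both"` is the value that combines the heatmap with per-cluster line plots; the output file name `cluster_both.pdf` and the page size are placeholders chosen for this example, used because the composite figure tends to need a tall page.

```r
has_pkg <- requireNamespace("ClusterGVis", quietly = TRUE)

if (has_pkg) {
  library(ClusterGVis)
  data("exps")

  ck <- clusterData(obj = exps,
                    clusterMethod = "kmeans",
                    clusterNum = 8)

  # write to a PDF device (hypothetical file name and size)
  pdf("cluster_both.pdf", height = 10, width = 6)
  # plotType = "both" is an assumed value
  visCluster(object = ck,
             plotType = "both",
             column_names_rot = 45)
  dev.off()
}
```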
ClusterGVis is compatible with outputs from single-cell analysis pipelines, such as Seurat objects. Here we demonstrate the visualization of marker genes discovered for distinct cell subpopulations:
suppressPackageStartupMessages(library(Seurat))
data("pbmc_subset")
# find markers for every cluster compared to all remaining cells
# report only the positive ones
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25)
# get the top 20 genes per cluster, ranked by avg_log2FC
pbmc.markers <- pbmc.markers.all |>
dplyr::group_by(cluster) |>
dplyr::top_n(n = 20, wt = avg_log2FC)
# check
head(pbmc.markers)
## # A tibble: 6 × 7
## # Groups: cluster [1]
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
## 1 1.51e-18 0.585 1 0.973 7.57e-16 Naive CD4 T RPS23
## 2 8.37e-15 0.638 0.977 0.954 4.18e-12 Naive CD4 T RPSA
## 3 1.55e-14 0.560 0.992 0.973 7.74e-12 Naive CD4 T RPS16
## 4 1.45e- 8 0.436 0.955 0.965 7.24e- 6 Naive CD4 T RPL17
## 5 7.47e- 6 0.442 0.902 0.861 3.73e- 3 Naive CD4 T RPL23
## 6 1.24e- 3 0.918 0.406 0.278 6.19e- 1 Naive CD4 T FUS
# prepare data from seurat object
st.data <- prepareDataFromscRNA(object = pbmc_subset,
diffData = pbmc.markers,
showAverage = TRUE)
# check
str(st.data)
## List of 5
## $ wide.res:'data.frame': 77 obs. of 11 variables:
## ..$ Naive CD4 T : num [1:77] 1.31 1.28 1.59 1.27 1.08 ...
## ..$ Memory CD4 T: num [1:77] 0.646 1.005 1.006 0.976 -0.256 ...
## ..$ CD14+ Mono : num [1:77] -0.586 -0.825 0.134 -0.334 0.291 ...
## ..$ B : num [1:77] 1.37 1.304 0.798 0.717 0.641 ...
## ..$ CD8 T : num [1:77] 0.2046 0.2121 -0.0593 0.7409 -0.0982 ...
## ..$ FCGR3A+ Mono: num [1:77] -0.424 -0.818 -0.812 -0.565 -0.37 ...
## ..$ NK : num [1:77] -0.82 -0.402 -0.71 -0.428 -1.442 ...
## ..$ DC : num [1:77] -0.0409 -0.3588 -0.3731 -0.5095 1.4949 ...
## ..$ Platelet : num [1:77] -1.66 -1.4 -1.58 -1.87 -1.34 ...
## ..$ gene : chr [1:77] "RPS23" "RPSA" "RPS16" "RPL17" ...
## ..$ cluster : chr [1:77] "1" "1" "1" "1" ...
## $ long.res:'data.frame': 693 obs. of 5 variables:
## ..$ cluster : chr [1:693] "1" "1" "1" "1" ...
## ..$ gene : chr [1:693] "RPS23" "RPSA" "RPS16" "RPL17" ...
## ..$ cell_type : Factor w/ 9 levels "Naive CD4 T",..: 1 1 1 1 1 1 2 2 2 2 ...
## ..$ norm_value : num [1:693] 1.31 1.28 1.59 1.27 1.08 ...
## ..$ cluster_name: Factor w/ 9 levels "cluster 1 (6)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ type : chr "scRNAdata"
## $ geneMode: chr "average"
## $ geneType: chr "unique|_"
Heatmap plot:
# randomly pick 40 marker genes to label on the heatmap
markGenes <- sample(unique(pbmc.markers$gene), 40, replace = FALSE)
# heatmap plot
# pdf('sc1.pdf',height = 10,width = 6,onefile = FALSE)
p <- visCluster(object = st.data,
plotType = "heatmap",
column_names_rot = 45,
markGenes = markGenes,
clusterOrder = c(1:9))
If you are working with a SingleCellExperiment object, you can use ClusterGVis to easily extract data and generate plots:
library(Seurat)
data("pbmc_subset")
# transform into SingleCellExperiment
sce <- as.SingleCellExperiment(pbmc_subset)
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25)
# get the top 20 genes per cluster, ranked by avg_log2FC
pbmc.markers <- pbmc.markers.all |>
dplyr::group_by(cluster) |>
dplyr::top_n(n = 20, wt = avg_log2FC)
st.data <- prepareDataFromscRNA(object = sce,
diffData = pbmc.markers[,c("cluster","gene")],
showAverage = TRUE)
visCluster(object = st.data,
plotType = "heatmap",
column_names_rot = 45,
markGenes = markGenes,
clusterOrder = c(1:9))
Session information:
## R Under development (unstable) (2025-10-20 r88955)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Seurat_5.3.1 SeuratObject_5.2.0
## [3] sp_2.2-0 ClusterGVis_0.99.9
## [5] SummarizedExperiment_1.41.0 Biobase_2.71.0
## [7] GenomicRanges_1.63.0 Seqinfo_1.1.0
## [9] IRanges_2.45.0 S4Vectors_0.49.0
## [11] BiocGenerics_0.57.0 generics_0.1.4
## [13] MatrixGenerics_1.23.0 matrixStats_1.5.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_2.0.0
## [3] shape_1.4.6.1 magrittr_2.0.4
## [5] spatstat.utils_3.2-0 magick_2.9.0
## [7] farver_2.1.2 rmarkdown_2.30
## [9] GlobalOptions_0.1.2 vctrs_0.6.5
## [11] ROCR_1.0-11 spatstat.explore_3.5-3
## [13] Cairo_1.7-0 rstatix_0.7.3
## [15] htmltools_0.5.8.1 S4Arrays_1.11.0
## [17] broom_1.0.10 SparseArray_1.11.3
## [19] Formula_1.2-5 sctransform_0.4.2
## [21] sass_0.4.10 parallelly_1.45.1
## [23] KernSmooth_2.23-26 bslib_0.9.0
## [25] htmlwidgets_1.6.4 ica_1.0-3
## [27] plyr_1.8.9 plotly_4.11.0
## [29] zoo_1.8-14 cachem_1.1.0
## [31] igraph_2.2.1 mime_0.13
## [33] lifecycle_1.0.4 iterators_1.0.14
## [35] pkgconfig_2.0.3 Matrix_1.7-4
## [37] R6_2.6.1 fastmap_1.2.0
## [39] fitdistrplus_1.2-4 future_1.68.0
## [41] shiny_1.11.1 clue_0.3-66
## [43] digest_0.6.39 colorspace_2.1-2
## [45] patchwork_1.3.2 tensor_1.5.1
## [47] RSpectra_0.16-2 irlba_2.3.5.1
## [49] ggpubr_0.6.2 beachmat_2.27.0
## [51] labeling_0.4.3 progressr_0.18.0
## [53] spatstat.sparse_3.1-0 polyclip_1.10-7
## [55] httr_1.4.7 abind_1.4-8
## [57] compiler_4.6.0 withr_3.0.2
## [59] doParallel_1.0.17 S7_0.2.1
## [61] backports_1.5.0 BiocParallel_1.45.0
## [63] carData_3.0-5 fastDummies_1.7.5
## [65] ggsignif_0.6.4 MASS_7.3-65
## [67] DelayedArray_0.37.0 rjson_0.2.23
## [69] tools_4.6.0 lmtest_0.9-40
## [71] otel_0.2.0 httpuv_1.6.16
## [73] future.apply_1.20.0 goftest_1.2-3
## [75] glue_1.8.0 nlme_3.1-168
## [77] promises_1.5.0 grid_4.6.0
## [79] Rtsne_0.17 cluster_2.1.8.1
## [81] reshape2_1.4.5 spatstat.data_3.1-9
## [83] gtable_0.3.6 tidyr_1.3.1
## [85] data.table_1.17.8 utf8_1.2.6
## [87] car_3.1-3 XVector_0.51.0
## [89] spatstat.geom_3.6-1 RcppAnnoy_0.0.22
## [91] ggrepel_0.9.6 RANN_2.6.2
## [93] foreach_1.5.2 pillar_1.11.1
## [95] stringr_1.6.0 limma_3.67.0
## [97] spam_2.11-1 RcppHNSW_0.6.0
## [99] later_1.4.4 circlize_0.4.16
## [101] splines_4.6.0 dplyr_1.1.4
## [103] lattice_0.22-7 deldir_2.0-4
## [105] survival_3.8-3 tidyselect_1.2.1
## [107] ComplexHeatmap_2.27.0 SingleCellExperiment_1.33.0
## [109] miniUI_0.1.2 scuttle_1.21.0
## [111] pbapply_1.7-4 knitr_1.50
## [113] gridExtra_2.3 scattermore_1.2
## [115] xfun_0.54 statmod_1.5.1
## [117] factoextra_1.0.7 stringi_1.8.7
## [119] VGAM_1.1-13 lazyeval_0.2.2
## [121] yaml_2.3.10 evaluate_1.0.5
## [123] codetools_0.2-20 tibble_3.3.0
## [125] cli_3.6.5 uwot_0.2.4
## [127] reticulate_1.44.1 xtable_1.8-4
## [129] jquerylib_0.1.4 dichromat_2.0-0.1
## [131] Rcpp_1.1.0 spatstat.random_3.4-2
## [133] globals_0.18.0 png_0.1-8
## [135] spatstat.univar_3.1-5 parallel_4.6.0
## [137] ggplot2_4.0.1 dotCall64_1.2
## [139] listenv_0.10.0 viridisLite_0.4.2
## [141] scales_1.4.0 ggridges_0.5.7
## [143] purrr_1.2.0 crayon_1.5.3
## [145] GetoptLong_1.0.5 rlang_1.1.6
## [147] cowplot_1.2.0