---
title: "vignette"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{vignette}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
# ClusterGVis
To enhance clustering and visualization of time-series gene expression
data from RNA-Seq experiments, we present the ClusterGVis package. This
tool enables concise and elegant analysis of time-series gene expression
data in a simple, one-step operation. Additionally, you can perform
enrichment analysis for each cluster using the enrichCluster function,
which integrates seamlessly with clusterProfiler. ClusterGVis empowers
you to create publication-quality visualizations with ease.
Comprehensive documentation is available online.
## Usage
### Basic examples:
Here we load the built-in RNA-seq expression matrix, in which each
column holds gene expression values for one differentiation stage:
zygote, two-cell, four-cell, eight-cell, morula, and blastocyst:
```{r,eval=TRUE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(library(S4Vectors))
library(ClusterGVis)
# a data.frame or SummarizedExperiment object
data("exps")
head(exps)
```
The **getClusters** function employs the elbow method to help users
pre-determine the appropriate number of clusters for their analysis:
```{r,eval=TRUE,fig.width=5, message=FALSE, warning=FALSE}
# check suitable cluster numbers
getClusters(obj = exps)
```
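The idea behind the elbow method can be illustrated with base R alone. The sketch below is not the internal code of `getClusters`; it uses a simulated toy matrix (the names `toy`, `patterns`, and `wss` are made up for this illustration) and plots the total within-cluster sum of squares against the number of clusters. The "elbow" is the point after which adding clusters no longer reduces that value much.

```r
set.seed(42)

# Toy data: 60 "genes" x 6 "stages" built from three planted patterns
stages <- 6
patterns <- rbind(
  up   = seq(-1, 1, length.out = stages),
  down = seq(1, -1, length.out = stages),
  peak = c(-1, 0, 1, 1, 0, -1)
)
toy <- patterns[rep(1:3, each = 20), ] +
  matrix(rnorm(60 * stages, sd = 0.2), nrow = 60)
rownames(toy) <- paste0("gene", 1:60)

# Total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# The curve should flatten after k = 3, the number of planted patterns
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On real expression data the bend is usually less sharp than in this toy case, so the chosen cluster number remains a judgment call.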
To investigate gene expression modules that exhibit distinct expression
patterns across different differentiation stages, we employ k-means
clustering to group genes, with the number of clusters set to 8:
```{r,eval=TRUE, message=FALSE, warning=FALSE}
# use k-means for clustering
ck <- clusterData(obj = exps,
                  clusterMethod = "kmeans",
                  clusterNum = 8)
```
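To see why k-means can group genes by expression pattern rather than by abundance, here is a minimal base-R sketch on simulated data. It assumes each gene is z-score scaled across stages before clustering; whether and how `clusterData` scales its input is not shown in this vignette, so treat that step as an assumption. All object names (`expr`, `z`, `km`) are hypothetical.

```r
set.seed(7)

n_stage <- 6
rising  <- seq(1, 6, length.out = n_stage)
falling <- rev(rising)

# Alternate low- and high-abundance genes that share the same trend
magnitude <- rep(c(1, 100), times = 5)
make_genes <- function(shape) {
  t(sapply(magnitude, function(m) m * (shape + rnorm(n_stage, sd = 0.1))))
}

# Rows 1-10 rise over time, rows 11-20 fall
expr <- rbind(make_genes(rising), make_genes(falling))
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Z-score each gene (row) so clustering sees the trend, not the magnitude
z <- t(scale(t(expr)))

km <- kmeans(z, centers = 2, nstart = 10)

# Gene names per cluster
split(rownames(z), km$cluster)
```

Without the scaling step, the high-abundance genes would dominate the Euclidean distances and tend to cluster together regardless of trend.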
Besides standard gene expression matrices (in data.frame or matrix
format), users can also directly pass **SummarizedExperiment** objects
as input data:
```{r,eval=TRUE, message=FALSE, warning=FALSE}
# construct a SummarizedExperiment object
sce <- SummarizedExperiment(assays = list(counts = exps),
colData = S4Vectors::DataFrame(
sample = colnames(exps),
row.names = colnames(exps))
)
sce
# use k-means for clustering
ck2 <- clusterData(obj = sce,
                   clusterMethod = "kmeans",
                   clusterNum = 8)
```
We can then visualize the clustering results. The **visCluster**
function supports various visualization methods, including line plots,
heatmaps, and complex composite graphics, to demonstrate the expression
trend patterns of genes across different modules:

Line plot:
```{r,eval=TRUE,fig.width=10,fig.height=6, message=FALSE, warning=FALSE}
# plot line only
visCluster(object = ck,
plotType = "line")
```
Heatmap plot:
```{r,eval=TRUE,fig.width=5,fig.height=10, message=FALSE, warning=FALSE}
# plot heatmap only
visCluster(object = ck,
plotType = "heatmap")
```
Complex heatmap with line plot annotation:
```{r,eval=TRUE,fig.width=6,fig.height=10, message=FALSE, warning=FALSE}
# plot heatmap with line plot annotation
visCluster(object = ck,
plotType = "both")
```
### Integration with Seurat object:
ClusterGVis is compatible with outputs from single-cell analysis
pipelines, such as Seurat objects. Here we demonstrate the visualization
of marker genes discovered for distinct cell subpopulations:
```{r,eval=TRUE,fig.width=10,fig.height=9, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(Seurat))
data("pbmc_subset")
# find markers for every cluster compared to all remaining cells
# report only the positive ones
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25)
# get the top 20 genes per cluster
pbmc.markers <- pbmc.markers.all |>
  dplyr::group_by(cluster) |>
  dplyr::top_n(n = 20, wt = avg_log2FC)
# check
head(pbmc.markers)
# prepare data from seurat object
st.data <- prepareDataFromscRNA(object = pbmc_subset,
diffData = pbmc.markers,
showAverage = TRUE)
# check
str(st.data)
```
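The marker selection step above keeps, for each cluster, the genes with the largest `avg_log2FC`. The following base-R sketch shows the same idea on a small made-up table; `markers` and `top_per_cluster` are hypothetical names used only here. Unlike `dplyr::top_n`, this version does not keep extra rows when values are tied.

```r
# Toy marker table with the same column names used in the chunk above
markers <- data.frame(
  cluster    = rep(c("A", "B"), each = 4),
  gene       = paste0("g", 1:8),
  avg_log2FC = c(2.1, 0.4, 1.7, 0.9, 0.3, 3.2, 1.1, 2.5),
  stringsAsFactors = FALSE
)

# Keep the n rows with the largest avg_log2FC within each cluster
top_per_cluster <- function(df, n) {
  pieces <- lapply(split(df, df$cluster), function(d) {
    d <- d[order(d$avg_log2FC, decreasing = TRUE), , drop = FALSE]
    head(d, n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top2 <- top_per_cluster(markers, n = 2)
top2
```

In recent dplyr versions `top_n()` is superseded by `slice_max()`, which could be used in the pipeline instead if explicit control over ties is wanted.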
Heatmap plot:
```{r,eval=TRUE,fig.width=6,fig.height=10, message=FALSE, warning=FALSE}
# randomly pick 40 marker genes to label on the heatmap
markGenes <- sample(unique(pbmc.markers$gene), 40, replace = FALSE)
# heatmap plot
# pdf('sc1.pdf',height = 10,width = 6,onefile = FALSE)
p <- visCluster(object = st.data,
plotType = "heatmap",
column_names_rot = 45,
markGenes = markGenes,
clusterOrder = c(1:9))
# dev.off()
```
### Integration with SingleCellExperiment object:
If you are working with a `SingleCellExperiment` object, you can use
**ClusterGVis** to easily extract data and generate plots:
```{r,eval=TRUE,fig.width=6,fig.height=8, message=FALSE, warning=FALSE}
library(Seurat)
data("pbmc_subset")
# transform into SingleCellExperiment
sce <- as.SingleCellExperiment(pbmc_subset)
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25)
# get the top 20 genes per cluster
pbmc.markers <- pbmc.markers.all |>
  dplyr::group_by(cluster) |>
  dplyr::top_n(n = 20, wt = avg_log2FC)
st.data <- prepareDataFromscRNA(object = sce,
diffData = pbmc.markers[,c("cluster","gene")],
showAverage = TRUE)
visCluster(object = st.data,
plotType = "heatmap",
column_names_rot = 45,
markGenes = markGenes,
clusterOrder = c(1:9))
```
# Session Info
```{r}
sessionInfo()
```