---
title: "vignette"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{vignette}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---

# ClusterGVis

To enhance clustering and visualization of time-series gene expression
data from RNA-Seq experiments, we present the ClusterGVis package. This
tool enables concise and elegant analysis of time-series gene expression
data in a simple, one-step operation. Additionally, you can perform
enrichment analysis for each cluster using the enrichCluster function,
which integrates seamlessly with clusterProfiler. ClusterGVis empowers
you to create publication-quality visualizations with ease.

Comprehensive documentation can be found at
<https://junjunlab.github.io/ClusterGvis-manual>.

## Usage

### Basic examples:

Here we load the built-in RNA-seq expression matrix, in which each row
is a gene and each column holds the expression values for one
differentiation stage: zygote, two-cell, four-cell, eight-cell, morula,
and blastocyst:

```{r,eval=TRUE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(library(S4Vectors))
library(ClusterGVis)

# a data.frame or SummarizedExperiment object
data("exps")

head(exps)
```

The **getClusters** function employs the elbow method to help users
pre-determine the appropriate number of clusters for their analysis:

```{r,eval=TRUE,fig.width=5, message=FALSE, warning=FALSE}
# check for a suitable number of clusters
getClusters(obj = exps)
```
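
For readers unfamiliar with the elbow method, the idea can be sketched
with base R alone. This is an illustration on simulated data, not the
internal implementation of **getClusters**; the toy matrix `toy` and its
three trends are assumptions made for the example. The total
within-cluster sum of squares is computed for a range of `k`, and the
"elbow" is the point where adding clusters stops reducing it sharply:

```r
# Elbow-method sketch using only base R (stats::kmeans).
# Assumption: simulated data stands in for a real expression matrix.
set.seed(42)

# Three hypothetical expression trends across six stages.
trend <- rbind(up   = 1:6,
               down = 6:1,
               peak = c(1, 3, 6, 6, 3, 1))

# 60 toy genes: 20 noisy copies of each trend.
toy <- trend[rep(1:3, each = 20), ] +
  matrix(rnorm(60 * 6, sd = 0.3), nrow = 60)
rownames(toy) <- paste0("gene", 1:60)

# Total within-cluster sum of squares for k = 1..8.
wss <- sapply(1:8, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# The curve should flatten once k reaches the number of real trends.
plot(1:8, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

On real data the bend is usually less sharp than in this toy case, so
the plot is a guide for choosing `clusterNum` rather than a strict rule.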

To investigate gene expression modules that exhibit distinct expression
patterns across different differentiation stages, we employ k-means
clustering to group genes, with the number of clusters set to 8:

```{r,eval=TRUE, message=FALSE, warning=FALSE}
# using k-means for clustering
ck <- clusterData(obj = exps,
                  clusterMethod = "kmeans",
                  clusterNum = 8)
```
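
To make the clustering step more concrete, here is a minimal base R
sketch of k-means on row-standardised expression values. It assumes
genes are z-scored across stages before clustering, which is a common
convention for grouping genes by trend rather than by magnitude; whether
**clusterData** does exactly this internally is not stated here. The toy
matrix is simulated for illustration:

```r
# k-means on row-wise z-scores, base R only.
# Assumption: z-scoring per gene is an illustrative choice,
# not a description of clusterData()'s internals.
set.seed(1)

trend <- rbind(up   = 1:6,
               down = 6:1,
               peak = c(1, 3, 6, 6, 3, 1))

# 60 toy genes with noise, then a random per-gene scale factor so that
# genes sharing a trend differ in absolute expression level.
noisy <- trend[rep(1:3, each = 20), ] +
  matrix(rnorm(60 * 6, sd = 0.3), nrow = 60)
toy <- noisy * runif(60, min = 1, max = 10)
dimnames(toy) <- list(paste0("gene", 1:60), paste0("stage", 1:6))

# scale() works column-wise, so transpose to standardise each gene (row).
z <- t(scale(t(toy)))

# Cluster the standardised profiles into three groups.
km <- kmeans(z, centers = 3, nstart = 10)

# Number of genes assigned to each cluster.
table(km$cluster)
```

Because each row is centred and scaled, genes with the same shape but
different absolute levels end up with near-identical profiles, which is
what lets k-means group them by expression pattern.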

Besides standard gene expression matrices (in data.frame or matrix
format), users can also directly pass **SummarizedExperiment** objects
as input data:

```{r,eval=TRUE, message=FALSE, warning=FALSE}
# construct a SummarizedExperiment object
sce <- SummarizedExperiment(assays = list(counts = exps),
                            colData = S4Vectors::DataFrame(
                              sample = colnames(exps),
                              row.names = colnames(exps))
                            )

sce

# using k-means for clustering
ck2 <- clusterData(obj = sce,
                  clusterMethod = "kmeans",
                  clusterNum = 8)
```
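
When working with a **SummarizedExperiment** object, it can help to
confirm which matrix it actually carries before clustering. The sketch
below builds a small object from a made-up matrix (the gene and stage
names are placeholders) and pulls the assay back out with `assay()`.
How **clusterData** selects an assay is not covered here:

```r
# Round trip: matrix -> SummarizedExperiment -> matrix.
# Assumption: toy values and hypothetical gene/stage names.
library(SummarizedExperiment)

m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("gene", 1:4),
                            c("stageA", "stageB", "stageC")))

se <- SummarizedExperiment(
  assays  = list(counts = m),
  colData = S4Vectors::DataFrame(sample = colnames(m),
                                 row.names = colnames(m))
)

# List the available assays, then extract one by name.
assayNames(se)
expr <- assay(se, "counts")

dim(expr)
```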

We can then visualize the clustering results. The **visCluster**
function supports various visualization methods, including line plots,
heatmaps, and complex composite graphics, to demonstrate the expression
trend patterns of genes across different modules:

Line plot:

```{r,eval=TRUE,fig.width=10,fig.height=6, message=FALSE, warning=FALSE}
# plot line only
visCluster(object = ck,
           plotType = "line")
```

Heatmap plot:

```{r,eval=TRUE,fig.width=5,fig.height=10, message=FALSE, warning=FALSE}
# plot heatmap only
visCluster(object = ck,
           plotType = "heatmap")
```

Complex heatmap with line plot annotation:

```{r,eval=TRUE,fig.width=6,fig.height=10, message=FALSE, warning=FALSE}
# plot heatmap with line plot annotation
visCluster(object = ck,
           plotType = "both")
```

### Integration with Seurat object:

ClusterGVis is compatible with outputs from single-cell analysis
pipelines, such as Seurat objects. Here we demonstrate the visualization
of marker genes discovered for distinct cell subpopulations:

```{r,eval=TRUE,fig.width=10,fig.height=9, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(Seurat))

data("pbmc_subset")

# find markers for every cluster compared to all remaining cells
# report only the positive ones
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
                                           only.pos = TRUE,
                                           min.pct = 0.25,
                                           logfc.threshold = 0.25)

# get top 20 genes per cluster
pbmc.markers <- pbmc.markers.all |>
  dplyr::group_by(cluster) |>
  dplyr::top_n(n = 20, wt = avg_log2FC)

# check
head(pbmc.markers)

# prepare data from Seurat object
st.data <- prepareDataFromscRNA(object = pbmc_subset,
                                diffData = pbmc.markers,
                                showAverage = TRUE)

# check
str(st.data)
```
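
The per-cluster selection step above can be illustrated on a small
made-up table. This sketch uses `dplyr::slice_max()`, which dplyr
documents as the successor to the superseded `top_n()`. The data frame
`toy_markers` and its values are invented for the example, and
`with_ties = FALSE` is an illustrative choice that guarantees exactly
`n` rows per group:

```r
# Top-n rows per group with dplyr::slice_max().
# Assumption: toy_markers is made-up data mimicking the columns used above.
library(dplyr)

toy_markers <- data.frame(
  cluster    = rep(c("A", "B"), each = 4),
  gene       = paste0("g", 1:8),
  avg_log2FC = c(2.1, 0.4, 1.7, 0.9, 0.3, 3.2, 1.1, 2.5)
)

# Keep the two genes with the largest avg_log2FC in each cluster.
top2 <- toy_markers |>
  group_by(cluster) |>
  slice_max(order_by = avg_log2FC, n = 2, with_ties = FALSE) |>
  ungroup()

top2
```

With the default `with_ties = TRUE`, tied values can return more than
`n` rows per cluster, which matches the behaviour of `top_n()`.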

Heatmap plot:

```{r,eval=TRUE,fig.width=6,fig.height=10, message=FALSE, warning=FALSE}
# randomly pick 40 marker genes to label on the heatmap
set.seed(123)  # fixed seed so the labelled genes are reproducible
markGenes <- sample(unique(pbmc.markers$gene), 40, replace = FALSE)

# heatmap plot
# pdf('sc1.pdf',height = 10,width = 6,onefile = FALSE)
p <- visCluster(object = st.data,
           plotType = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           clusterOrder = c(1:9))
# dev.off()
```

### Integration with SingleCellExperiment object:

If you are working with a `SingleCellExperiment` object, you can use
**ClusterGVis** to easily extract data and generate plots:

```{r,eval=TRUE,fig.width=6,fig.height=8, message=FALSE, warning=FALSE}
library(Seurat)
data("pbmc_subset")

# transform into SingleCellExperiment 
sce <- as.SingleCellExperiment(pbmc_subset)

pbmc.markers.all <- Seurat::FindAllMarkers(pbmc_subset,
                                           only.pos = TRUE,
                                           min.pct = 0.25,
                                           logfc.threshold = 0.25)

# get top 20 genes per cluster
pbmc.markers <- pbmc.markers.all |>
  dplyr::group_by(cluster) |>
  dplyr::top_n(n = 20, wt = avg_log2FC)

st.data <- prepareDataFromscRNA(object = sce,
                                diffData = pbmc.markers[,c("cluster","gene")],
                                showAverage = TRUE)

visCluster(object = st.data,
           plotType = "heatmap",
           column_names_rot = 45,
           markGenes = markGenes,
           clusterOrder = c(1:9))
```

# Session Info

```{r}
sessionInfo()
```