--- title: "BLASE for annotating scRNA-seq" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{BLASE for annotating scRNA-seq} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, results='hide', message=FALSE, warning=FALSE} library(scater) library(ggplot2) library(BiocParallel) library(blase) library(patchwork) ``` ```{r, randomseed} RNGversion("3.5.0") SEED <- 7 set.seed(SEED) ``` ```{r, concurrency} N_CORES <- 2 bpparam <- MulticoreParam(N_CORES) ``` This article will show how BLASE can be used for annotating scRNA-seq data using existing bulk or microarray data. We make use of scRNA-seq [Dogga et al. 2024](https://www.science.org/doi/10.1126/science.adj4088) and microarray data from [Painter et al. 2018](https://www.nature.com/articles/s41467-018-04966-3). Code for generating the objects used here is available in the BLASE reproducibility repository. ## Load Data First we'll load in the data we're using, pre-prepared from the BLASE reproducibility repository. ```{r, load_data} data(painter_microarray, package = "blase") data(MCA_PF_SCE, package = "blase") ``` We can examine the true lifecycle stages, and also the calculated pseudotime trajectory (Slingshot). ```{r, sc_umaps} #| fig.alt: > #| UMAP of Dogga et al. single-cell RNA-seq reference coloured by lifecycle #| stage, showing the cycle from Rings, Trophozoites, Schizonts, to Rings #| again. plotUMAP(MCA_PF_SCE, colour_by = "STAGE_HR") #| fig.alt: > #| UMAP of Dogga et al. single-cell RNA-seq reference coloured by pseudotime, #| starting with Rings, and ending with Schizonts. plotUMAP(MCA_PF_SCE, colour_by = "slingPseudotime_1") ``` ## Prepare BLASE Now we'll prepare BLASE for use. ### Create BLASE data object First, we create the object, giving it the number of bins we want to use, and how to calculate them. ```{r, create_BLASE_object} blase_data <- as.BlaseData( MCA_PF_SCE, pseudotime_slot = "slingPseudotime_1", n_bins = 48, split_by = "cells" ) # Add these bins to the sc metadata MCA_PF_SCE <- assign_pseudotime_bins(MCA_PF_SCE, pseudotime_slot = "slingPseudotime_1", n_bins = 48, split_by = "cells" ) ``` ### Select Genes Now we will select the genes we want to use, using BLASE's gene peakedness metric. ```{r, calculate_gene_peakedness} gene_peakedness_info <- calculate_gene_peakedness( MCA_PF_SCE, window_pct = 5, knots = 18, BPPARAM = bpparam ) genes_to_use <- gene_peakedness_spread_selection( MCA_PF_SCE, gene_peakedness_info, genes_per_bin = 30, n_gene_bins = 30 ) head(gene_peakedness_info) ``` By using the `gene_peakedness_spread_selection` function, we can ensure that genes with high ratios are selected from throughout the trajectory. ```{r, plot_gene_peakedness} #| fig.alt: > #| Scatter plot showing gene peakedness ratios for all genes, ordered #| by pseudotime. ggplot(gene_peakedness_info, aes(x = peak_pseudotime, y = ratio)) + geom_point() + ggtitle("All genes") gene_peakedness_selected_genes <- gene_peakedness_info[ gene_peakedness_info$gene %in% genes_to_use, ] ``` ```{r plot_gene_peakedness_selection} #| fig.alt: > #| Scatter plot showing gene peakedness ratios for genes selected by BLASE #| spread selection, ordered by pseudotime. ggplot(gene_peakedness_selected_genes, aes(x = peak_pseudotime, y = ratio)) + geom_point() + ggtitle("Selected genes") ``` Here, we add them to the BLASE object for mapping with. ```{r, add_genes_to_blase_data} genes(blase_data) <- genes_to_use ``` ## Calculate Mappings Now we can perform the actual mapping step, and review the results. ```{r, mapping} mapping_results <- map_all_best_bins( blase_data, painter_microarray, BPPARAM = bpparam ) ``` ```{r mappingPlot} #| fig.alt: > #| Heatmap of mapping correlations of the Painter et al. data onto #| the Dogga et al. scRNA-seq. plot_mapping_result_heatmap(mapping_results) ``` ## Transfer Mappings In the Painter et al. paper, the expected lifecycle stages are as follows: | Lifecycle Stage | Painter et al. HPI | |-----------------|--------------------| | Ring | 0-21 | | Trophozoite | 16-32 | | Schizont | 33-48 | We can use this to transfer back to the scRNA-seq as follows: ```{r, plot_pseudotime_bins} #| fig.alt: > #| A UMAP of the reference scRNA-seq data coloured by pseudotime bin, as #| calculated by BLASE. plotUMAP(MCA_PF_SCE, colour_by = "pseudotime_bin") ``` ```{r, transfer_mappings_to_sc} annotated_sce <- annotate_sce(MCA_PF_SCE, mapping_results, include_stats = TRUE) annotated_sce$BLASE_Annotation <- gsub(" hpi", "", annotated_sce$BLASE_Annotation) annotated_sce$BLASE_Annotation <- as.numeric(annotated_sce$BLASE_Annotation) annotation_plot <- plotUMAP(annotated_sce, colour_by = "BLASE_Annotation") corr_plot <- plotUMAP(annotated_sce, colour_by = "BLASE_Annotation_Correlation") + labs(color = "Correlation") ``` ```{r, transfer_mappings_to_scPlot} #| fig.alt: > #| UMAP of Dogga et al. single-cell RNA-seq reference coloured #| by best matching bulk RNA-seq sample (hpi). Beside it is a UMAP of #| the correlations of these mappings. annotation_plot + corr_plot ``` ```{r, transfer_painter_stages_to_sc} annotated_sce$BLASE_Annotation_transfer <- "" for (best_bulk in unique(annotated_sce$BLASE_Annotation)) { mask <- colData(annotated_sce)[, "BLASE_Annotation"] == best_bulk if (best_bulk %in% 0:15) { annotated_sce[, mask]$BLASE_Annotation_transfer <- "Ring" } else if (best_bulk %in% 16:21) { annotated_sce[, mask]$BLASE_Annotation_transfer <- "Ring/Trophozoite" } else if (best_bulk %in% 22:32) { annotated_sce[, mask]$BLASE_Annotation_transfer <- "Trophozoite" } else if (best_bulk %in% 33:48) { annotated_sce[, mask]$BLASE_Annotation_transfer <- "Schizont" } } ``` ```{r transfer_painter_stages_to_scPlot} #| fig.alt: > #| UMAP of Dogga et al. single-cell RNA-seq reference coloured #| the original mappings from Dogga et al. and by best #| matching bulk RNA-seq sample, and then named by the expected cell type ( #| from annotations in Painter et al.). #| The mappings are broadly the same. plotUMAP(annotated_sce, colour_by = "BLASE_Annotation_transfer") + plotUMAP(annotated_sce, colour_by = "STAGE_LR") ``` ## Session Info ```{r, sessioninfo} sessionInfo() ```