---
title: "revert"
author: "Hui Xiao, Stephen Pettitt, Syed Haider"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: yes
    theme: united
    highlight: kate
    toc_float:
      collapsed: yes
      smooth_scroll: yes
  pdf_document:
    toc: yes
vignette: >
  %\VignetteIndexEntry{help}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  eval = TRUE
)
```

## Overview

Reversion mutations are secondary mutations that reverse the deleterious effects of an original pathogenic mutation, partially or fully restoring the gene's function. They are a key mechanism by which cancer cells develop resistance to targeted therapies such as PARP inhibitors, which target DNA damage repair in cancers with BRCA1/2 mutations. Detecting reversion mutations can help explain treatment failure and predict resistance, and monitoring reversions through blood tests (ctDNA) during treatment can offer early warning of acquired resistance.

The `revert` package detects reversions for a specific pathogenic mutation from BAM files of DNA-seq data. `revert` performs local realignment of reads in flanking windows surrounding the pathogenic mutation, with permissive gap opening for soft-clipped reads and adjustments for the pathogenic mutation. It then identifies reversion mutations that restore the open reading frame of the reference gene or the reference sequence, for example:

- secondary indels that convert the original frameshift insertion or deletion into an in-frame indel
- secondary SNVs that restore the mutant codon caused by the original nonsense or missense SNV
- indels or SNVs that replace the original pathogenic mutation
- secondary SNVs that create a cryptic splice donor/acceptor site or a cryptic start/stop codon

The `revert` package is designed to be applicable to most types of DNA-seq data, such as ctDNA, WES, WGS and targeted amplicon sequencing (TAS).
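
The frame-restoring indel case from the overview comes down to simple arithmetic: a frameshift is compensated when the net length change of all indels on a read is a multiple of three. The chunk below is a minimal sketch of that idea only. `restores_frame()` is a hypothetical helper, not a function of `revert`, and the sign convention (negative for deletions, positive for insertions) is an assumption. A real reversion call also has to account for stop codons, splice sites and the other cases listed above.

```{r frame_restoration_sketch, eval=FALSE}
# Hypothetical helper (not part of revert): TRUE when the combined length
# change of all indels on a read is a multiple of three.
# Assumption: deletions are negative, insertions are positive.
restores_frame <- function(length.changes) {
  sum(length.changes) %% 3 == 0
}

pathogenic <- -2                    # a 2-bp deletion, e.g. a "delTG" event

restores_frame(pathogenic)          # pathogenic deletion alone: out of frame
restores_frame(c(pathogenic, -1))   # plus a secondary 1-bp deletion: net -3
restores_frame(c(pathogenic, 2))    # plus a secondary 2-bp insertion: net 0
restores_frame(c(pathogenic, 1))    # plus a secondary 1-bp insertion: net -1
```

Only the second and third calls describe an in-frame combination, which is the pattern of "secondary indels converting the original frameshift into an in-frame indel".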

To start using `revert` quickly, see the [Examples](#examples) section.

## Prerequisite

- R \>= 4.4.0

## Inputs

### Required information for running revert

- A BAM file containing the aligned reads to be processed; see below for recommendations on [BAM file preparation](#bam)
- A path to which the output files are written
- The reference genome version (hg19/hg38/mm10) or a FASTA file containing the open reading frames of the reference sequences
- The genomic position of a pathogenic mutation in HGVS-like syntax for a substitution, deletion, insertion, deletion-insertion (delins), or duplication, e.g., "chr13:g.32913778T>G", "chr13:g.32913319_32913320delTG", "chr17:g.41244706_41244707insT", "chr17:g.41244936delinsAA", "Brca2_5805_wt:117del"
- The gene name and Ensembl transcript ID of the pathogenic mutation when the reference genome is hg19, hg38 or mm10
- Other parameters, which have the following default values:

```{r, echo=FALSE}
df.params <- data.frame(
  Parameter = c(
    "detection.window",
    "splice.region",
    "check.soft.clipping",
    "softClippedReads.realign.window",
    "softClippedReads.realign.match",
    "softClippedReads.realign.mismatch",
    "softClippedReads.realign.gapOpening",
    "softClippedReads.realign.gapExtension",
    "check.wildtype.reads",
    "is.paired.end",
    "keep.duplicate.reads",
    "keep.secondary.alignment",
    "keep.supplementary.alignment",
    "minimum.mapping.quality",
    "verbose",
    "out.failed.reads"
  ),
  Description = c(
    "the length of the flanking regions added to both ends of the pathogenic mutation locus for detecting reversion mutations",
    "the length of the splice junction region considered in introns",
    "whether soft-clipped reads are realigned",
    "the length of the flanking regions added to both ends of the pathogenic mutation locus for realigning soft-clipped reads",
    "the score for a nucleotide match when realigning soft-clipped reads",
    "the score for a nucleotide mismatch when realigning soft-clipped reads",
    "the cost of opening a gap when realigning soft-clipped reads",
    "the incremental cost incurred along the length of the gap when realigning soft-clipped reads",
    "whether wild-type reads are processed as revertant-to-wildtype reads",
    "whether reads in the BAM file are paired-end (TRUE) or single-end (FALSE)",
    "whether duplicate reads in the BAM file are processed (TRUE) or discarded (FALSE)",
    "whether secondary alignments in the BAM file are processed (TRUE) or discarded (FALSE)",
    "whether supplementary alignments in the BAM file are processed (TRUE) or discarded (FALSE)",
    "the minimum mapping quality of reads in the BAM file to be processed",
    "whether progress logs are printed to stdout",
    "whether the names of failed reads are written to the '.failed_reads.txt' file"
  ),
  Default = c(
    "100", "8", "TRUE", "1000", "1", "4", "6", "0",
    "FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "0", "TRUE", "FALSE"
  )
)
knitr::kable(df.params)
```

### BAM file preparation {#bam}

Many state-of-the-art NGS aligners offer clipping modes that improve alignment accuracy by focusing on the high-confidence, well-aligned part of a read and discarding (hard-clipping) or ignoring (soft-clipping) the unaligned parts. Unaligned parts can be caused by adapters, large indels or translocations, and large indels or translocations may indicate large genomic rearrangements (LGRs) that partially restore the gene's function. The `revert` package realigns soft-clipped reads in flanking windows surrounding the pathogenic mutation, with permissive gap opening, to identify such LGR reversions. To improve the sensitivity of reversion detection, it is recommended to generate BAM files with a standard aligner in soft-clipping mode, e.g., by enabling `-Y` for `bwa mem` or `--local` for `bowtie2`.

## Outputs

The function `getReversions()` writes the following result files to the output directory:

- '**.reversions.txt**' contains all reversions identified for the pathogenic mutation from the BAM file.

```{r, echo=FALSE}
df.rev.tbl <- data.frame(
  Column = c(
    "pathogenic_mutation",
    "pathogenic_mutation_left_aligned",
    "reversion_id",
    "reversion_frequency",
    "pathogenic_mutation_retained",
    "reversion",
    "reads_total",
    "reads_wildtype",
    "reads_withPathogenicMutation",
    "reads_withReplacementMutation",
    "mutations_in_reversion"
  ),
  Description = c(
    "the original pathogenic mutation",
    "left-aligned position of the pathogenic mutation if it is an insertion or deletion",
    "unique identifier of the reversion",
    "number of reads carrying the reversion",
    "whether the pathogenic locus retained the original mutation (Yes), acquired a different mutation (No), or reverted to wild type (WT)",
    "the reversion for the pathogenic mutation, consisting of one or more mutations",
    "total number of reads aligned to the pathogenic mutation locus",
    "number of reads showing wild type at the pathogenic mutation locus",
    "number of reads carrying the pathogenic mutation",
    "number of reads carrying a different mutation, but not the pathogenic mutation, at the pathogenic locus",
    "number of mutations included in the reversion"
  )
)
knitr::kable(df.rev.tbl)
```

- '**.split_mutations.txt**' contains information on each single mutation in a reversion.

```{r, echo=FALSE}
df.mut.tbl <- data.frame(
  Column = c(
    "reversion_id",
    "mutation_id",
    "mutation_type",
    "mutation",
    "mutation_length_change",
    "pathogenic_mutation",
    "distance_to_pathogenic_mutation"
  ),
  Description = c(
    "unique identifier of a reversion, corresponding to the 'reversion_id' in '.reversions.txt'",
    "unique identifier of each single mutation in a reversion",
    "SNV, INS, DEL, DELINS or WT (self-revertant mutation represented by MT>WT)",
    "genomic position of the mutation in HGVS-like syntax",
    "change in the length of the reference sequence caused by the mutation",
    "the original pathogenic mutation",
    "distance along the reference sequence between the mutation and the pathogenic mutation"
  )
)
knitr::kable(df.mut.tbl)
```

- '**.revert_assembly.bam**' contains all reads realigned to the pathogenic mutation. An RG tag is added to each realigned read, assigning it to one of two read groups, 'Revertant' or 'NonRevertant'. The revert-assembled BAM file can be loaded into IGV to visualize reversions.
- '**.revert_assembly.bam.bai**' is the index file for '**.revert_assembly.bam**'.
- '**.revert_settings.txt**' contains a summary of the run parameters and the processed reads.
- '**.failed_reads.txt**' (optional) contains the names of reads that failed reversion detection.
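
The result tables listed above can be loaded back into R for downstream summaries. The chunk below is a minimal sketch under two assumptions that this vignette does not confirm: that the '.reversions.txt' table is tab-delimited with a header row, and that exactly one such file sits in the output directory. `read_revert_table()` is a hypothetical helper, not part of `revert`, and the table it reads here is mock data written only to keep the sketch self-contained.

```{r read_outputs_sketch, eval=FALSE}
# Hypothetical helper (not part of revert): find the single file in out.dir
# that ends with the given suffix and read it.
# Assumption: the table is tab-delimited with a header row.
read_revert_table <- function(out.dir, suffix) {
  files <- list.files(out.dir, full.names = TRUE)
  hits <- files[endsWith(files, suffix)]
  if (length(hits) != 1) {
    stop("expected exactly one file ending in '", suffix, "', found ", length(hits))
  }
  read.delim(hits, stringsAsFactors = FALSE)
}

# Mock output directory with made-up values, for illustration only.
demo.dir <- file.path(tempdir(), "revert_demo")
dir.create(demo.dir, showWarnings = FALSE)
mock.rev <- data.frame(
  reversion_id = c("rev1", "rev2"),
  reversion_frequency = c(12, 3),
  reads_total = c(200, 200)
)
write.table(mock.rev, file.path(demo.dir, "mock.reversions.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

rev.tbl <- read_revert_table(demo.dir, ".reversions.txt")

# Fraction of reads at the pathogenic locus that carry each reversion
rev.tbl$read_fraction <- rev.tbl$reversion_frequency / rev.tbl$reads_total

# Most strongly supported reversions first
rev.tbl[order(-rev.tbl$read_fraction), ]
```

With real results, `demo.dir` would be replaced by the directory passed as `out.dir` to `getReversions()`, and the same helper could be pointed at the '.split_mutations.txt' suffix to join per-mutation details on `reversion_id`.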

## Examples {#examples}

Reversion detection for a frameshift deletion

```{r del_example, eval=FALSE}
library(revert)

getReversions(
  bam.file = system.file("extdata", "toy_data_1.bam", package="revert"),
  out.dir = tempdir(),
  reference = "hg19",
  pathog.mut = "chr13:g.32913319_32913320delTG",
  gene.name = "BRCA2",
  transcript.id = "ENST00000544455"
)
```

Reversion detection for a frameshift insertion

```{r ins_example, eval=FALSE}
getReversions(
  bam.file = system.file("extdata", "toy_data_2.bam", package="revert"),
  out.dir = tempdir(),
  reference = "hg19",
  pathog.mut = "chr17:g.41244706_41244707insT",
  gene.name = "BRCA1",
  transcript.id = "ENST00000357654"
)
```

Reversion detection for a frameshift deletion-insertion

```{r delins_example, eval=FALSE}
getReversions(
  bam.file = system.file("extdata", "toy_data_3.bam", package="revert"),
  out.dir = tempdir(),
  reference = "hg19",
  pathog.mut = "chr17:g.41244936delinsAA",
  gene.name = "BRCA1",
  transcript.id = "ENST00000357654"
)
```

Reversion detection for a nonsense SNV

```{r ns_snv_example, eval=FALSE}
getReversions(
  bam.file = system.file("extdata", "toy_data_4.bam", package="revert"),
  out.dir = tempdir(),
  reference = "hg19",
  pathog.mut = "chr13:g.32913778T>G",
  gene.name = "BRCA2",
  transcript.id = "ENST00000544455"
)
```

Reversion detection for a splice-acceptor SNV

```{r splice_snv_example, eval=FALSE}
getReversions(
  bam.file = system.file("extdata", "toy_data_5.bam", package="revert"),
  out.dir = tempdir(),
  reference = "hg19",
  pathog.mut = "chr13:g.32928997G>A",
  gene.name = "BRCA2",
  transcript.id = "ENST00000544455"
)
```

Reversion detection for a targeted deletion with a customised reference sequence

```{r targeted_del_example, eval=FALSE}
getReversions(
  bam.file = system.file("extdata", "toy_data_6.bam", package="revert"),
  out.dir = tempdir(),
  reference = system.file("extdata", "toy_data_6_reference.fa", package="revert"),
  pathog.mut = "Brca2_5805_wt:117del",
  softClippedReads.realign.gapOpening = 8,
  check.wildtype.reads = TRUE
)
```

## Acknowledgements

Development of revert was supported by [Breast Cancer Now](https://breastcancernow.org/).