| Version: | 1.0.0 | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Description: | A mutation analysis tool that discovers cancer driver genes with frequent mutations in protein signalling sites such as post-translational modifications (phosphorylation, ubiquitination, etc.). The Poisson generalised linear regression model identifies genes where cancer mutations in signalling sites are more frequent than expected from the sequence of the entire gene. Integration of mutations with signalling information helps find new driver genes and propose candidate mechanisms for known drivers. Reference: Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Juri Reimand and Gary D Bader. Molecular Systems Biology (2013) 9:637 <doi:10.1038/msb.2012.68>. | 
| Title: | Finding Cancer Driver Proteins with Enriched Mutations in Post-Translational Modification Sites | 
| Depends: | R (≥ 3.0) | 
| Imports: | stats, parallel, MASS | 
| Collate: | 'ActiveDriver.R' | 
| RoxygenNote: | 6.0.1.9000 | 
| NeedsCompilation: | no | 
| Packaged: | 2017-08-23 18:26:21 UTC; dthompson | 
| Author: | Juri Reimand [aut, cre] | 
| Maintainer: | Juri Reimand <juri.reimand@utoronto.ca> | 
| Repository: | CRAN | 
| Date/Publication: | 2017-08-23 20:55:51 UTC | 
Identification of active protein sites (post-translational modification sites, signalling domains, etc.) with specific and significant mutations.
Description
Identification of active protein sites (post-translational modification sites, signalling domains, etc.) with specific and significant mutations.
Usage
ActiveDriver(sequences, seq_disorder, mutations, active_sites, flank = 7,
  mid_flank = 2, mc.cores = 1, simplified = FALSE,
  return_records = FALSE, skip_mismatch = TRUE,
  regression_type = "poisson", enriched_only = TRUE)
Arguments
| sequences | character vector of protein sequences, names are protein IDs. | 
| seq_disorder | character vector of disorder in protein sequences, names are protein IDs and values are strings of 1/0 characters marking disordered/ordered protein residues. | 
| mutations | data frame of mutations, with [gene, sample_id, position, wt_residue, mut_residue] as columns. | 
| active_sites | data frame of active sites, with [gene, position, residue, kinase] as columns. Kinase field may be blank and is shown for informative purposes. | 
| flank | numeric for selecting region size around active sites considered important for site activity. Default value is 7. Ignored in case of simplified analysis. | 
| mid_flank | numeric for splitting flanking region size into proximal (<=X) and distal (>X). Default value is 2. Ignored in case of simplified analysis. | 
| mc.cores | numeric for indicating number of computing cores dedicated to computation. Default value is 1. | 
| simplified | true/false for selecting simplified analysis. Default value is FALSE. If TRUE, no flanking regions are considered and only indicated sites are tested for mutations. | 
| return_records | true/false for returning a collection of gene records with more data regarding sites and mutations. Default value is FALSE. | 
| skip_mismatch | true/false for skipping mutations whose reference protein residue does not match the expected residue from the FASTA sequence file. Default value is TRUE. | 
| regression_type | 'nb' for negative binomial GLM, 'poisson' for Poisson GLM. The latter is the default. | 
| enriched_only | true/false to indicate whether only sites with enriched active site mutations will be included in the final p-value estimation (TRUE is default). If FALSE, sites with fewer mutations than expected will also be included. | 
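To make the regression_type and enriched_only options more concrete, the following is a minimal sketch of a Poisson enrichment test on made-up counts. It is not the package's internal model: the data frame layout, the variable names and the likelihood-ratio comparison are assumptions chosen for illustration, and only base stats functions are used.

```r
# Hypothetical counts for one gene: mutations and residues inside
# active-site regions versus the rest of the protein (made-up numbers).
counts <- data.frame(
  is_site   = c(TRUE, FALSE),
  n_muts    = c(12, 8),
  n_residue = c(45, 455)
)

# Null model: a single mutation rate for the whole protein.
h0 <- glm(n_muts ~ offset(log(n_residue)), family = poisson, data = counts)
# Alternative model: active-site regions have their own rate.
h1 <- glm(n_muts ~ is_site + offset(log(n_residue)),
          family = poisson, data = counts)

# Likelihood-ratio (chi-squared) test comparing the two nested models.
lrt <- anova(h0, h1, test = "Chisq")
p_value <- lrt[["Pr(>Chi)"]][2]

# Expected site mutations under the null model, used to tell
# enrichment from depletion.
expected_site <- unname(fitted(h0)[counts$is_site])
enriched <- counts$n_muts[counts$is_site] > expected_site
```

In this sketch a site region would count as enriched only when the observed count exceeds the null expectation, which is one plausible reading of enriched_only = TRUE. A negative binomial fit, for example with MASS::glm.nb, could presumably take the place of glm when the counts are overdispersed.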
Value
list with the following components:
all_active_mutations - table with mutations that hit or flank an active site. Additional columns of interest include Status (DI - direct active mutation; N1 - proximal flanking mutation; N2 - distal flanking mutation) and Active_region (region ID of active sites in that protein).
all_active_sites -
all_region_based_pval - p-values for regions of sites, statistics on observed mutations (obs) and expected mutations (exp, low, high based on mean and s.d. from Poisson sampling). The field Region identifies the region in all_active_sites.
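The Status labels can be read as distance classes around an active site. The sketch below assumes that distance is counted in residues from the nearest site, with flank and mid_flank used as described in the arguments. classify_status is a hypothetical helper written for this illustration, not a function exported by the package, and the package may resolve overlapping regions differently.

```r
# Hypothetical helper (not part of the package): label a mutation by its
# distance in residues from the nearest active site.
classify_status <- function(mut_pos, site_pos, flank = 7, mid_flank = 2) {
  d <- min(abs(mut_pos - site_pos))   # distance to the nearest site
  if (d == 0) return("DI")            # direct hit on an active site
  if (d <= mid_flank) return("N1")    # proximal flank
  if (d <= flank) return("N2")        # distal flank
  NA_character_                       # outside every flanking region
}

sites <- c(15, 40)                    # made-up active site positions
status <- vapply(c(15, 17, 21, 30, 41), classify_status,
                 character(1), site_pos = sites)
```

With the default flank = 7 and mid_flank = 2, the five made-up positions are labelled DI, N1, N2, NA and N1 respectively.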
Author(s)
Juri Reimand <juri.reimand@utoronto.ca>
References
Juri Reimand and Gary D Bader. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Molecular Systems Biology (2013) 9:637 <doi:10.1038/msb.2012.68>.
Examples
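# Load the example data sets bundled with the package
data(ActiveDriver_data)
# Phosphosite analysis using mutations from all samples
phos_results = ActiveDriver(sequences, sequence_disorder, mutations, phosphosites)
# Phosphosite analysis restricted to ovarian cancer samples
ovarian_mutations = mutations[grep("ovarian", mutations$sample_id),]
phos_results_ovarian = ActiveDriver(sequences, sequence_disorder, ovarian_mutations, phosphosites)
# Simplified kinase domain analysis restricted to glioblastoma samples
GBM_muts = mutations[grep("glioblastoma", mutations$sample_id),]
kin_rslt_GBM = ActiveDriver(sequences, sequence_disorder, GBM_muts, kinase_domains, simplified=TRUE)
# Simplified kinase domain analysis using mutations from all samples
kin_results = ActiveDriver(sequences, sequence_disorder, mutations, kinase_domains, simplified=TRUE)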
Example kinase domains for ActiveDriver
Description
A dataset describing kinase domains. The variables are as follows:
Usage
data(ActiveDriver_data)
Format
A data frame with 1 observation of 4 variables
Details
- gene. the gene symbol of the gene where the kinase domain occurs 
- position. the position in the protein sequence where the kinase domain begins 
- phos. TRUE 
- residue. the kinase domain residues 
Example mutations for ActiveDriver
Description
A dataset describing missense mutations (i.e., substitutions in proteins). The variables are as follows:
Usage
data(ActiveDriver_data)
Format
A data frame with 408 observations of 5 variables
Details
- gene. the mutated gene 
- sample_id. the sample where the mutation originates 
- position. the position in the protein sequence where the mutation occurs 
- wt_residue. the wild-type residue 
- mut_residue. the mutant residue 
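Because wt_residue is expected to agree with the reference protein sequence (see the skip_mismatch argument of ActiveDriver), it can be useful to check a mutation table before running the analysis. The sketch below uses a made-up sequence and made-up mutations with the five columns listed above. It illustrates the kind of consistency check involved and is not the package's own code.

```r
# Made-up sequence and mutations, for illustration only.
sequences <- c(GENE1 = "MSTPKRLS")
mutations <- data.frame(
  gene        = c("GENE1", "GENE1"),
  sample_id   = c("ovarian_01", "glioblastoma_07"),
  position    = c(3, 5),
  wt_residue  = c("T", "A"),   # the second row disagrees with the sequence
  mut_residue = c("A", "V"),
  stringsAsFactors = FALSE
)

# Residue found in the reference sequence at each mutated position.
ref_residue <- substr(sequences[mutations$gene],
                      mutations$position, mutations$position)

# TRUE where the reported wild-type residue matches the sequence.
matches <- unname(ref_residue == mutations$wt_residue)
consistent_mutations <- mutations[matches, ]
```

Here only the first row survives, since position 5 of the toy sequence is K rather than the reported A.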
Example phosphosites for ActiveDriver
Description
A dataset describing protein phosphorylation sites. The variables are as follows:
Usage
data(ActiveDriver_data)
Format
A data frame with 131 observations of 4 variables
Details
- gene. the gene symbol the phosphosite occurs in 
- position. the position in the protein sequence where the phosphosite occurs 
- residue. the phosphosite residue 
- kinase. the kinase that phosphorylates this site 
Read FASTA file as character vector.
Description
Read FASTA file as character vector.
Usage
read_fasta(fname)
Arguments
| fname | name of file to be read. | 
Value
character vector with names corresponding to annotations from FASTA.
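As an illustration of the return shape, the following base R sketch parses a small FASTA file into a named character vector. read_fasta_sketch is a hypothetical stand-in written for this example; the package's read_fasta may handle headers, blank lines or empty records differently. The sketch assumes every header line is followed by at least one sequence line.

```r
# Hypothetical stand-in for illustration; the package's read_fasta may differ.
read_fasta_sketch <- function(fname) {
  lines <- readLines(fname)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index for every line
  # Concatenate the sequence lines that belong to each record.
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Write a two-record FASTA file to a temporary location and read it back.
tmp <- tempfile(fileext = ".fa")
writeLines(c(">GENE1", "MSTPK", "RLS", ">GENE2", "MAAQ"), tmp)
toy_sequences <- read_fasta_sketch(tmp)
```

The result is a character vector whose names are the header annotations (GENE1, GENE2) and whose values are the joined sequence lines, which is the form the sequences argument of ActiveDriver expects.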
Example protein disorder for ActiveDriver
Description
A dataset containing the disorder of four proteins.
Usage
data(ActiveDriver_data)
Format
A named character vector with 4 elements
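A disorder vector of this form can be sanity-checked against the sequences it annotates. The sketch below uses made-up data and assumes that each disorder string has one 1/0 character per residue, so that it matches the length of the corresponding sequence. That assumption follows from the argument description but is not verified against the package code.

```r
# Made-up sequence and disorder annotation, for illustration only.
sequences    <- c(GENE1 = "MSTPKRLS")
seq_disorder <- c(GENE1 = "11100011")

# Each protein should have a disorder string of the same length as its
# sequence, written only with the characters 0 and 1.
same_ids    <- setequal(names(sequences), names(seq_disorder))
same_length <- all(nchar(sequences) == nchar(seq_disorder[names(sequences)]))
valid_chars <- all(grepl("^[01]+$", seq_disorder))

# Fraction of residues annotated as disordered in each protein.
disordered_fraction <- vapply(strsplit(seq_disorder, ""),
                              function(x) mean(x == "1"), numeric(1))
```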
Example protein sequences for ActiveDriver
Description
A dataset containing the sequences of four proteins.
Usage
data(ActiveDriver_data)
Format
A named character vector with 4 elements