| Title: | A Pipeline for Meta-Genome Wide Association | 
| Version: | 2.0.4 | 
| Date: | 2018-06-15 | 
| Description: | Correlates variation within the meta-genome with variation in target species phenotypes using meta-genome wide association studies. Follows the pipeline described in Chaston, J.M. et al. (2014) <doi:10.1128/mBio.01631-14>. | 
| License: | MIT + file LICENSE | 
| LazyData: | true | 
| Imports: | ape, coxme, doParallel, dplyr, foreach, iterators, lme4, multcomp, parallel, plyr, qqman, survival, seqinr | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 3.0) | 
| RoxygenNote: | 6.0.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2018-06-16 03:47:53 UTC; coripenrod | 
| Author: | Corinne Sexton [aut], John Chaston [aut, cre], Hayden Smith [ctb] | 
| Maintainer: | John Chaston <john_chaston@byu.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2018-07-12 07:20:17 UTC | 
Main OrthoMCL Analysis
Description
Main function for analyzing the statistical association of OG (orthologous group) presence with phenotype data
Usage
AnalyzeOrthoMCL(mcl_data, pheno_data, model, species_name, resp = NULL,
  fix2 = NULL, rndm1 = NULL, rndm2 = NULL, multi = 1, time = NULL,
  event = NULL, time2 = NULL, startnum = 1, stopnum = "end",
  output_dir = NULL, sig_digits = NULL, princ_coord = 0)
Arguments
| mcl_data | output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG | 
| pheno_data | a data frame of phenotypic data with specific column names used to specify response variable as well as other fixed and random effects | 
| model | linear model with gene presence as fixed effect (lm); linear mixed effect models with gene presence as fixed effect and additional variables specified as: one random effect (lmeR1), two independent random effects (lmeR2ind), two random effects with rndm2 nested in rndm1 (lmeR2nest), or two independent random effects with one additional fixed effect (lmeF2); Wilcoxon test with gene presence as fixed effect (wx); survival tests with support for multi-core design: with two random effects (survmulti), and with two times as well as an additional fixed variable (survmulticensor) | 
| species_name | Column name in pheno_data containing 4-letter species designations | 
| resp | Column name in pheno_data containing response variable | 
| fix2 | Column name in pheno_data containing second fixed effect | 
| rndm1 | Column name in pheno_data containing first random variable | 
| rndm2 | Column name in pheno_data containing second random variable | 
| multi | (can only be used with survival tests) Number of cores | 
| time | (can only be used with survival tests) Column name in pheno_data containing first time | 
| event | (can only be used with survival tests) Column name in pheno_data containing event | 
| time2 | (can only be used with survival tests) Column name in pheno_data containing second time | 
| startnum | number of test to start on | 
| stopnum | number of test to stop on | 
| output_dir | (if using survival tests) directory where small output files will be placed before using SurvAppendMatrix. Must specify a directory if choosing to output small files, else only written as a matrix | 
| sig_digits | number of digits to display for p-values and means of data; defaults to NULL (no rounding) | 
| princ_coord | the number of principal coordinates to be included in the model as fixed effects (1, 2, or 3); if a decimal is specified, as many principal coordinates as are needed to account for that proportion of the variance will be included in the analysis | 
Value
A matrix with the following columns: OG, p-values, Bonferroni corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
Examples
#Linear Model
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lm',
 'Treatment', resp='RespVar')
## End(Not run)
# the rest of the examples are not run for time's sake
#Linear Mixed Effect with one random effect
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR1',
'Treatment', resp='RespVar', rndm1='Experiment')
## End(Not run)
#Linear Mixed Effect with two independent random effects
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2ind',
 'Treatment', resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run)
#Linear Mixed Effect with rndm2 nested in rndm1
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2nest',
 'Treatment',  resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run) 
#Linear Mixed Effect with two independent random effects and one additional fixed effect
## Not run: 
mcl_mtrx3 <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeF2',
 'Treatment', resp='RespVar', fix2='Treatment', rndm1='Experiment', rndm2='Vial', princ_coord = 4)
## End(Not run)
#Wilcoxon Test
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'wx',
 'Treatment', resp='RespVar')
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects, run on multiple cores
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulti',
 time='t2', event='event', rndm1='EXP', rndm2='VIAL', multi=1)
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects and one additional fixed effect,
#including drops on multi cores
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulticensor',
 time='t1', time2='t2', event='event', rndm1='EXP', rndm2='VIAL', fix2='BACLO', multi=1)
 
## End(Not run)
#to be appended with SurvAppendMatrix
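The 'lm' option and the returned columns can be illustrated with a minimal base R sketch. This is not the package's implementation: the objects `pa`, `pheno`, and `test_og` are hypothetical, the toy data are invented, and only the column names 'Treatment' and 'RespVar' are taken from the examples above. It fits one linear model per OG with presence as the fixed effect, then applies a Bonferroni correction across tests.

```r
# Toy presence/absence matrix: rows are taxa, columns are OGs (hypothetical data)
taxa <- c("aaaa", "bbbb", "cccc", "dddd")
pa <- matrix(c(1, 0, 1, 0,
               1, 1, 0, 0),
             nrow = 4, dimnames = list(taxa, c("OG_1", "OG_2")))

# Toy phenotype table: three replicate measurements per taxon
pheno <- data.frame(Treatment = rep(taxa, each = 3),
                    RespVar = c(10, 11, 12, 5, 6, 7, 10, 12, 11, 6, 5, 7))

# Fit RespVar ~ presence for one OG; return the p-value and the group means
test_og <- function(og) {
  present <- pa[pheno$Treatment, og]
  fit <- lm(pheno$RespVar ~ present)
  c(pval = summary(fit)$coefficients["present", "Pr(>|t|)"],
    mean_contain = mean(pheno$RespVar[present == 1]),
    mean_lack = mean(pheno$RespVar[present == 0]))
}

# One row per OG
res <- t(sapply(colnames(pa), test_og))

# Bonferroni correction: multiply each p-value by the number of tests, capped at 1
res <- cbind(res, corrected_pval = p.adjust(res[, "pval"], method = "bonferroni"))
print(res)
```

The mixed effect and survival options add random effects and censoring on top of this basic per-OG loop, which is why they take much longer to run.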
Show Principal Components Breakdown
Description
Function to show Principal Components statistics based on the OrthoMCL presence absence groupings.
Usage
CalculatePrincipalCoordinates(mcl_data)
Arguments
| mcl_data | output of FormatAfterOrtho; a list of 2 elements: (1) binary matrix indicating the presence/absence of genes in each OG and (2) vector of names of OGs | 
Value
returns a named list of principal components and accompanying proportion of variance for each
Examples
CalculatePrincipalCoordinates(after_ortho_format)
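As a rough illustration of what a principal coordinates breakdown looks like, the sketch below runs classical multidimensional scaling on a toy presence/absence matrix with base R. The choice of `dist(method = "binary")` and `cmdscale()` is an assumption made for the example and may differ from what the package does internally; the object names are hypothetical. The last step mirrors the decimal form of `princ_coord`: counting how many axes are needed to reach a target proportion of variance.

```r
# Toy binary presence/absence matrix: rows are taxa, columns are OGs (hypothetical)
pa <- matrix(c(1, 1, 0, 0, 1,
               1, 0, 1, 0, 1,
               0, 1, 1, 1, 0,
               0, 0, 1, 1, 1),
             nrow = 5,
             dimnames = list(paste0("tax", 1:5), paste0("OG_", 1:4)))

# Jaccard-type distance between taxa, then classical MDS (principal coordinates)
d <- dist(pa, method = "binary")
pcoa <- cmdscale(d, k = 2, eig = TRUE)

# Proportion of variance per axis, using positive eigenvalues only
eig <- pcoa$eig[pcoa$eig > 1e-8]
prop_var <- eig / sum(eig)

# Number of axes needed to account for at least 80% of the variance
n_axes <- which(cumsum(prop_var) >= 0.8)[1]
print(round(prop_var, 3))
print(n_axes)
```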
Format file from output of OrthoMCL algorithm before use in AnalyzeOrthoMCL
Description
After running OrthoMCL and/or submitting to www.orthomcl.org, formats the output file to be used in AnalyzeOrthoMCL
Usage
FormatAfterOrtho(file, format = "ortho")
Arguments
| file | Path to the OrthoMCL output file | 
| format | Specification of the method by which file was obtained: defaults to 'ortho' for output from orthomcl.org. Other option is 'groups' for output from local run of OrthoMCL software. | 
Value
a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG
Examples
file <- system.file('extdata', 'orthologGroups.txt', package='MAGNAMWAR')
after_ortho_format <- FormatAfterOrtho(file)
file_grps <- system.file('extdata', 'groups_example_r.txt', package='MAGNAMWAR')
after_ortho_format_grps <- FormatAfterOrtho(file_grps, format = 'groups')
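To make the returned structure more concrete, here is a small base R sketch that builds a presence/absence matrix from lines written in the style of a local OrthoMCL groups file. The input lines, the object names, and the orientation of the matrix (OGs as rows) are assumptions chosen for readability; the actual parsing done by FormatAfterOrtho is not shown in this manual.

```r
# Hypothetical lines in the style of a local OrthoMCL groups file:
# "<group id>: <taxon>|<protein id> <taxon>|<protein id> ..."
lines <- c("grp1: aaaa|p1 bbbb|p7 bbbb|p8",
           "grp2: bbbb|p2 cccc|p5")

# Split each line into its group id and its members
parts <- strsplit(lines, ": ", fixed = TRUE)
og_ids <- vapply(parts, `[`, character(1), 1)
members <- lapply(parts, function(p) strsplit(p[2], " ", fixed = TRUE)[[1]])
names(members) <- og_ids

# The taxon code is the text before the "|" in each member
taxa_per_og <- lapply(members, function(m) unique(sub("\\|.*$", "", m)))
all_taxa <- sort(unique(unlist(taxa_per_og)))

# Presence/absence matrix: rows are OGs, columns are taxa
pa_matrix <- t(vapply(taxa_per_og,
                      function(tx) as.integer(all_taxa %in% tx),
                      integer(length(all_taxa))))
colnames(pa_matrix) <- all_taxa
print(pa_matrix)
```

The `members` list plays the role of the second element described above: the protein ids that belong to each OG.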
Format all raw GenBank fastas to single OrthoMCL compatible fasta file
Description
Creates the composite fasta file for use in running OrthoMCL and/or submitting to www.orthomcl.org
Usage
FormatMCLFastas(fa_dir, genbnk_id = 4)
Arguments
| fa_dir | Path to the directory where all raw GenBank files are stored. Note, all file names must be changed to a 4-letter code representing each species and have the '.fasta' file extension | 
| genbnk_id | (Only necessary for the deprecated version of fasta headers) The index of the sequence ID in the GenBank pipe-separated annotation line (default: 4) | 
Value
Returns nothing, but prints the path to the final OrthoMCL compatible fasta file
Examples
## Not run: 
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
formatted_file <- FormatMCLFastas(dir)
## End(Not run)
Join Representative Sequences
Description
Joins the OrthoMCL output matrix to representative sequences
Usage
JoinRepSeq(mcl_data, fa_dir, mcl_mtrx, fastaformat = "new")
Arguments
| mcl_data | output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG | 
| fa_dir | Path to the directory where all raw GenBank files are stored. Note, all file names must be changed to a 4-letter code representing each species and have the '.fasta' file extension | 
| mcl_mtrx | OrthoMCL output matrix from AnalyzeOrthoMCL() | 
| fastaformat | options: new & old; new = no GI numbers included; defaults to new | 
Value
Returns the original OrthoMCL output matrix with additional columns: representative sequence taxon, representative sequence id, representative sequence annotation, representative sequence
Examples
## Not run: 
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
joined_mtrx_grps <- JoinRepSeq(after_ortho_format_grps, dir, mcl_mtrx_grps, fastaformat = 'old')
## End(Not run)
Manhattan Plot of All Taxa
Description
Manhattan plot that graphs all p-values for taxa.
Usage
ManhatGrp(mcl_data, mcl_mtrx, tree = NULL)
Arguments
| mcl_data | FormatAfterOrtho output | 
| mcl_mtrx | output of AnalyzeOrthoMCL() | 
| tree | tree file optional, used for ordering taxa along x axis | 
Value
a manhattan plot
Examples
ManhatGrp(after_ortho_format, mcl_mtrx)
#@param equation of line of significance, defaults to -log10((.05)/dim(pdgs)[1])
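The default significance line quoted in the comment above is a Bonferroni threshold on the -log10 scale: 0.05 divided by the number of tests. A minimal sketch with invented p-values (the vector `pvals` is hypothetical) shows how the threshold and the plotted heights relate.

```r
# Hypothetical p-values, one per test
pvals <- c(0.0004, 0.03, 0.2, 0.75, 0.009)
n_tests <- length(pvals)

# Bonferroni line of significance on the -log10 scale
threshold <- -log10(0.05 / n_tests)

# Heights plotted in a Manhattan plot; points above the line are significant
scores <- -log10(pvals)
significant <- scores > threshold
print(threshold)
print(significant)
```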
Plot of a PDG and Data with Standard Error Bars
Description
Bar plot of PDG vs phenotype data with presence of taxa in PDG indicated by color
Usage
PDGPlot(data, mcl_matrix, OG = "NONE", species_colname, data_colname,
  xlab = "Taxa", ylab = "Data", ylimit = NULL, tree = NULL,
  order = NULL, main_title = NULL)
Arguments
| data | R object of phenotype data | 
| mcl_matrix | AnalyzeOrthoMCL output | 
| OG | optional parameter, a string with the name of chosen group (OG) to be colored | 
| species_colname | name of column in phenotypic data file with taxa designations | 
| data_colname | name of column in phenotypic data file with data observations | 
| xlab | string to label barplot's x axis | 
| ylab | string to label barplot's y axis | 
| ylimit | optional parameter to limit y axis | 
| tree | optional parameter (defaults to NULL) Path to tree file, orders the taxa by phylogenetic distribution, else it defaults to alphabetical | 
| order | vector with order of taxa names for across the x axis (defaults to alpha ordering) | 
| main_title | string for title of the plot (defaults to OG) | 
Value
a barplot with taxa vs phenotypic data complete with standard error bars
Examples
PDGPlot(pheno_data, mcl_mtrx, 'OG5_126778', 'Treatment', 'RespVar', ylimit=12)
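For readers unfamiliar with how such bars are derived, the sketch below computes a per-taxon mean and standard error in base R. It assumes the error bars are the standard error of the mean (standard deviation divided by the square root of the number of observations); the toy data frame `pheno` is hypothetical and only reuses the column names from the example above.

```r
# Toy phenotype data: four replicate observations per taxon (hypothetical)
pheno <- data.frame(Treatment = rep(c("aaaa", "bbbb"), each = 4),
                    RespVar = c(10, 12, 11, 13, 5, 7, 6, 6))

# Bar heights: mean response per taxon
means <- tapply(pheno$RespVar, pheno$Treatment, mean)

# Error bar half-widths: standard error of the mean per taxon
ses <- tapply(pheno$RespVar, pheno$Treatment,
              function(x) sd(x) / sqrt(length(x)))

print(cbind(mean = means, se = ses))
```

These two vectors are what a call to base `barplot()` followed by `arrows()` would need in order to draw bars with error whiskers.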
Number of PDGs vs OGs/PDG
Description
Barplot that indicates the number of PDGs vs OGs (clustered orthologous groups) in a PDG
Usage
PDGvOG(mcl_data, num = 40, ...)
Arguments
| mcl_data | FormatAfterOrtho output | 
| num | an integer indicating where the x axis should end and be compiled | 
| ... | args to be passed to barplot | 
Value
a barplot with a height determined by the second column and the first column abbreviated to accommodate visual spacing
Examples
PDGvOG(after_ortho_format_grps,2)
Phylogenetic Tree with Attached Bar Plot and Standard Error Bars
Description
Presents data for each taxon, including standard error bars, next to a phylogenetic tree.
Usage
PhyDataError(phy, data, mcl_matrix, species_colname, data_colname,
  color = NULL, OG = NULL, xlabel = "xlabel", ...)
Arguments
| phy | Path to tree file | 
| data | R object of phenotype data | 
| mcl_matrix | AnalyzeOrthoMCL output | 
| species_colname | name of column in data file with taxa designations | 
| data_colname | name of column in data file with data observations | 
| color | optional parameter, (defaults to NULL) assign colors to individual taxa by providing file (format: Taxa | Color) | 
| OG | optional parameter, (defaults to NULL) a string with the name of the chosen group to be colored | 
| xlabel | string to label barplot's x axis | 
| ... | argument to be passed from other methods such as parameters from barplot() function | 
Value
A phylogenetic tree with a barplot of the data (with standard error bars) provided matched by taxa.
Examples
file <- system.file('extdata', 'muscle_tree2.dnd', package='MAGNAMWAR')
PhyDataError(file, pheno_data, mcl_mtrx, species_colname = 'Treatment', data_colname = 'RespVar',
 OG='OG5_126778', xlabel='TAG Content')
Print OG Sequences
Description
Print all protein sequences and annotations in a given OG
Usage
PrintOGSeqs(after_ortho, OG, fasta_dir, out_dir = NULL, outfile = "none")
Arguments
| after_ortho | output from FormatAfterOrtho | 
| OG | name of OG | 
| fasta_dir | directory to fastas | 
| out_dir | complete path to output directory | 
| outfile | name of file that will be written to | 
Value
A fasta file with all protein sequences and ids for a given OG
Examples
## Not run: 
OG <- 'OG5_126968'
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
PrintOGSeqs(after_ortho_format, OG, dir)
## End(Not run)
QQPlot
Description
Makes a qqplot of the p-values obtained through AnalyzeOrthoMCL
Usage
QQPlotter(mcl_mtrx)
Arguments
| mcl_mtrx | matrix generated by AnalyzeOrthoMCL | 
Value
a qqplot of the p-values obtained through AnalyzeOrthoMCL
Examples
QQPlotter(mcl_mtrx)
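A QQ plot of p-values compares the sorted observed values against the quantiles expected under a uniform distribution, both on the -log10 scale. The sketch below computes the two axes in base R with invented p-values; the use of `ppoints()` for the expected quantiles is an assumption for illustration, not a statement of how QQPlotter works internally.

```r
# Hypothetical p-values, one per test
pvals <- c(0.0004, 0.03, 0.2, 0.75, 0.009)
n <- length(pvals)

# Observed: smallest p-value first, so -log10 values run from largest to smallest
observed <- -log10(sort(pvals))

# Expected under the null: evenly spaced uniform quantiles
expected <- -log10(ppoints(n))

# Points far above the diagonal suggest more small p-values than chance predicts
print(cbind(expected, observed))
```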
Write RAST files to GenBank format for OrthoMCL Analysis
Description
Useful for reformatting RAST files to GBK format
Usage
RASTtoGBK(input_fasta, input_reference, out_name_path)
Arguments
| input_fasta | path to input fasta file | 
| input_reference | path to a .csv file; it should be downloaded from RAST in Excel format and saved as a .csv (the tab-delimited version has compatibility problems) | 
| out_name_path | name and path of the file to write to | 
Examples
## Not run: 
lfrc_fasta <- system.file('extdata', 'RASTtoGBK//lfrc.fasta', package='MAGNAMWAR')
lfrc_reference <- system.file('extdata', 'RASTtoGBK//lfrc_lookup.csv', package='MAGNAMWAR')
lfrc_path <- system.file('extdata', 'RASTtoGBK//lfrc_out.fasta', package='MAGNAMWAR')
RASTtoGBK(lfrc_fasta,lfrc_reference,lfrc_path)
## End(Not run)
Append Survival Test Outputs
Description
Function used to append all .csv files that are outputted from AnalyzeOrthoMCL into one matrix.
Usage
SurvAppendMatrix(work_dir, out_name = "surv_matrix.csv", out_dir = NULL)
Arguments
| work_dir | the directory where the output files of AnalyzeOrthoMCL are located | 
| out_name | file name of outputted matrix | 
| out_dir | the directory where the outputted matrix is placed | 
Value
A csv file containing a matrix with the following columns: OG, p-values, Bonferroni corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
Examples
## Not run: 
file <- system.file('extdata', 'outputs', package='MAGNAMWAR')
directory <- paste(file, '/', sep = '')
SurvAppendMatrix(directory)
## End(Not run)
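The general idea of appending many small result files into one table can be sketched in base R as follows. This is only an illustration under stated assumptions: the directory `surv_parts`, the file names, and the two-column layout are hypothetical, and the real output files of AnalyzeOrthoMCL may be laid out differently.

```r
# Create a temporary directory holding two small, hypothetical result files
work_dir <- file.path(tempdir(), "surv_parts")
dir.create(work_dir, showWarnings = FALSE)
write.csv(data.frame(OG = "OG_1", pval1 = 0.01),
          file.path(work_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(OG = "OG_2", pval1 = 0.20),
          file.path(work_dir, "part2.csv"), row.names = FALSE)

# Read every .csv in the directory and stack the rows into one table
files <- list.files(work_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Write the combined table next to the parts' parent directory
out_file <- file.path(tempdir(), "surv_matrix.csv")
write.csv(combined, out_file, row.names = FALSE)
print(combined)
```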
Print analyzed matrix
Description
Writes a tab separated version of the analyzed OrthoMCL data with or without the joined representative sequences
Usage
WriteMCL(mtrx, filename)
Arguments
| mtrx | Matrix derived from AnalyzeOrthoMCL | 
| filename | File name to save final output | 
Value
The path to the written file
Examples
## Not run: 
WriteMCL(mcl_mtrx, 'matrix.tsv')
#mcl_mtrx previously derived from AnalyzeOrthoMCL() or JoinRepSeq()
## End(Not run)
Formatted output of OrthoMCL.
Description
A list created by inputting the output of OrthoMCL clusters into the FormatAfterOrtho function.
Usage
after_ortho_format
Format
List of 2: (1) presence absence matrix, (2) protein ids:
- pa_matrix
- matrix showing taxa presence/absence in OG 
- proteins
- matrix listing protein_id contained in each OG 
Formatted output of OrthoMCL.
Description
A list created by inputting the output of OrthoMCL clusters into the FormatAfterOrtho function.
Usage
after_ortho_format_grps
Format
List of 2: (1) presence absence matrix, (2) protein ids:
- pa_matrix
- matrix showing taxa presence/absence in OG 
- proteins
- matrix listing protein_id contained in each OG 
Final output of JoinRepSeq.
Description
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
Usage
joined_mtrx
Format
A data frame with 17 rows and 11 variables:
- OG
- taxa cluster id, as defined by OrthoMCL 
- pval1
- p-value, based on presence absence 
- corrected_pval1
- Bonferroni p-value, corrected by number of tests 
- mean_OGContain
- mean of all taxa phenotypes in that OG 
- mean_OGLack
- mean of all taxa phenotypes not in that OG 
- taxa_contain
- taxa in that cluster 
- taxa_miss
- taxa not in that cluster 
- rep_taxon
- randomly selected representative taxa from the cluster 
- rep_id
- protein id, from randomly selected representative taxa 
- rep_annot
- fasta annotation, from randomly selected representative taxa 
- rep_seq
- AA sequence, from randomly selected representative taxa 
Final output of JoinRepSeq.
Description
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
Usage
joined_mtrx_grps
Format
A data frame with 10 rows and 11 variables:
- OG
- taxa cluster id, as defined by OrthoMCL 
- pval1
- p-value, based on presence absence 
- corrected_pval1
- Bonferroni p-value, corrected by number of tests 
- mean_OGContain
- mean of all taxa phenotypes in that OG 
- mean_OGLack
- mean of all taxa phenotypes not in that OG 
- taxa_contain
- taxa in that cluster 
- taxa_miss
- taxa not in that cluster 
- rep_taxon
- randomly selected representative taxa from the cluster 
- rep_id
- protein id, from randomly selected representative taxa 
- rep_annot
- fasta annotation, from randomly selected representative taxa 
- rep_seq
- AA sequence, from randomly selected representative taxa 
Final output of AnalyzeOrthoMCL
Description
A matrix containing the final results of statistical analysis.
Usage
mcl_mtrx
Format
A matrix with 17 rows and 7 variables:
- OG
- taxa cluster id, as defined by OrthoMCL 
- pval1
- p-value, based on presence absence 
- corrected_pval1
- Bonferroni p-value, corrected by number of tests 
- mean_OGContain
- mean of all taxa phenotypes in that OG 
- mean_OGLack
- mean of all taxa phenotypes not in that OG 
- taxa_contain
- taxa in that cluster 
- taxa_miss
- taxa not in that cluster 
Final output of AnalyzeOrthoMCL
Description
A matrix containing the final results of statistical analysis.
Usage
mcl_mtrx_grps
Format
A matrix with 10 rows and 7 variables:
- OG
- taxa cluster id, as defined by OrthoMCL 
- pval1
- p-value, based on presence absence 
- corrected_pval1
- Bonferroni p-value, corrected by number of tests 
- mean_OGContain
- mean of all taxa phenotypes in that OG 
- mean_OGLack
- mean of all taxa phenotypes not in that OG 
- taxa_contain
- taxa in that cluster 
- taxa_miss
- taxa not in that cluster 
Triglyceride (TAG) content of fruit flies dataset.
Description
A subset of the TAG content of fruit flies, collected in the Chaston Lab, to be used as a brief example for tests in AnalyzeOrthoMCL.
Usage
pheno_data
Format
A data frame with 586 rows and 4 variables:
- Treatment
- 4-letter taxa designation of associated bacteria 
- RespVar
- response variable, TAG content 
- Vial
- random effect variable, vial number of flies 
- Experiment
- random effect variable, experiment number of flies 
Starvation rate of fruit flies dataset.
Description
A subset of the Starvation rate of fruit flies, collected in the Chaston Lab, to be used as a brief example for survival tests in AnalyzeOrthoMCL.
Usage
starv_pheno_data
Format
A matrix with 543 rows and 7 variables:
- EXP
- random effect variable, experiment number of flies 
- VIAL
- random effect variable, vial number of flies 
- BACLO
- fixed effect variable, loss of bacteria in flies 
- TRT
- 4-letter taxa designation of associated bacteria 
- t1
- time 1 
- t2
- time 2 
- event
- event