--- title: "polySegratioMM: An R library for Bayesian mixture models for marker dosage in autopolyploids" author: "Peter Baker" date: "`r format(Sys.time(), '%B %d, %Y')`" bibliography: polySegratioMM-overview.bib output: bookdown::html_document2 vignette: > %\VignetteIndexEntry{ polySegratioMM: An R library for Bayesian mixture models for marker dosage in autopolyploids } %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#:", fig.path = "man/figures/" ) ## output: BiocStyle::html_document ##version <- as.vector(read.dcf('DESCRIPTION')[, 'Version']) ##version <- gsub('-', '.', version) version <- "0.6-5" ``` Version: `r version` It is well known that the dosage level of markers in autopolyploids and allopolyploids can be characterised by their observed segregation ratios. The related package \texttt{polySegratio} provides functions to allocate dosage by standard approaches and to simulate marker data sets for differing ploidies and levels of overdispersion. Note that these methods could equally well be applied to allopolyploids with specified expected segregation ratios. For details see \texttt{polySegratio}. A Bayesian approach was proposed by @baker2010 where marker dosage estimation was obtained by fitting a finite mixture distribution. This library calls the `JAGS` software for Bayesian calculation. `JAGS 1.0` or higher must be installed following instructions from https://mcmc-jags.sourceforge.io. Note that only the most recent version is used for testing with `R`. The `JAGS` executable must be in your path. Currently, no checking is carried out to ascertain whether or not `JAGS` is set up appropriately. To use the library, you need to attach it with ```{r} library(polySegratioMM) ``` ```{r, echo=FALSE} ##library(cacheSweave) # comment this out after development phase ## to use: Sweave("polySegratioMM-overview.Rnw", driver=cacheSweaveDriver) ## or without cache: Sweave("polySegratioMM-overview.Rnw") op <- options() options(width=70, digits=4) ``` # Simulated data Library functions are demonstrated on a simulated data set generated using the `sim.autoMarkers` function from the `polySegratio` package. ```{r, echo=FALSE} ##< ```{r, trace1, fig.cap='Trace and posterior density plots for the parameters parameters $p_1$, $\\mu_1$, $\\sigma_1$ and for the 140^th^ marker for the overdispersed data', fig.height=12, out.width='100%'} plot(mcmcHexRun$mcmc.mixture$mcmc.list[[1]][,c("P[1]","mu[1]","sigma","T[140]")]) ``` When the `plots` option of `runSegratioMM` is set to the default value of `TRUE`, numerous plots are produced including trace and density plots from the `CODA` package. These may also be extracted manually but the process is somewhat more complicated. For instance, to obtain trace and density plots for the parameters $p_1$, $\mu_1$, $\sigma_1$ and for the 140^th^ marker, as shown in Figure \@ref(fig:trace1), then `CODA` may be used directly by following command: ```{r, eval=FALSE} plot(mcmcHexRun$mcmc.mixture$mcmc.list[[1]][,c("P[1]","mu[1]","sigma","T[140]")]) ``` ```{r, fitted1, echo=FALSE, fig.cap='Fitted (blue) and theoretical (red) distributions for simulated segregation ratios with overdispersion for 500 markers from 200 individuals.', out.width='50%'} plot(mcmcHexRun, theoretical=TRUE, main="") ``` The histogram of segregation proportions with fitted and theoretical values shown in Figure \@ref(fig:fitted1) may be obtained by setting the `theoretical` option to `TRUE` as follows. ```{r, eval=FALSE} plot(mcmcHexRun, theoretical=TRUE) ``` # Assigning marker dosage Marker dosages allocations may be obtained directly from the object `mcmcHexRun`. The dosage with maximum posterior probability is simply `mcmcHexRun\$doses\$max.post.dosage`. A more conservative allocation is obtained by using `mcmcHexRun\$doses\$dosage[,"0.8"]` whereby the dosage with posterior probability over 0.8 is employed. For instance, to tabulate the number of markers (including those not allocated a dosage which are labelled NA) the `table` command can be employed. ```{r} cat("Employing maximum posterior probability\n") table(Dose=mcmcHexRun$doses$max.post.dosage, exclude=NULL) cat("Employing posterior probability > 0.8\n") table(Dose=mcmcHexRun$doses$dosage[,"0.8"], exclude=NULL) ``` And of course since the data were simulated we can compare the estimated and true dosages obtained as `hexmarkers.overdisp\$true.doses\$dosage` via cross tabulation. Doses can also be obtained for the standard$\chi^2$ test by using the `test.segRatio` command from the `polySegratio` library. ```{r} cat("Employing theChi squared test\n") dose.chi <- test.segRatio(sr.overdisp, ploidy.level = 6) table(Chi2Dose=dose.chi$dosage, True=hexmarkers.overdisp$true.doses$dosage, exclude=NULL) cat("Employing maximum posterior probability\n") table(MixtureDose=mcmcHexRun$doses$max.post.dosage, True=hexmarkers.overdisp$true.doses$dosage, exclude=NULL) cat("Employing posterior probability > 0.8\n") table(MixtureDose=mcmcHexRun$doses$dosage[, "0.8"], True=hexmarkers.overdisp$true.doses$dosage, exclude=NULL) ``` These tables show that far fewer markers are allocated a dosage using the standard $\chi^2$ test than by the mixture model. Fewer markers were misclassified using a posterior probability threshold of 0.8 rather than the maximum posterior probability as a basis for allocating dosage. \bibliographystyle{apalike} ## Acknowledgments Karen Aitken, given her experience in tetraploids and sugarcane marker maps, has provided many valuable insights into marker dosage in autopolyploids. Additionally, Ross Darnell, Andrew George and Kerrie Mengersen provided useful comments and discussions. ## References