diemr: Genome polarisation in R

diemr implements the Diagnostic Index Expectation Maximization (diem) algorithm for genome polarisation in R. It estimates which allele at each single nucleotide variant (SNV) site belongs to which side of a barrier to gene flow, co-estimates individual assignment, and infers barrier strength and divergence. These tools are designed for studies of hybridization, speciation, and population divergence, and they extend the methods described in Baird et al. (2023) Genome polarisation for detecting barriers to geneflow. Methods in Ecology and Evolution 14: 512-528. doi:10.1111/2041-210X.14010. For the original algorithm description and implementations in Python and Mathematica, see the diem repository at https://github.com/StuartJEBaird/diem. For a step-by-step explanation of the functions and their outputs, see the diemr package documentation.
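To make the idea of polarisation concrete, the following base R sketch shows what flipping a marker's polarity does to genotype codes, and how a simple hybrid index can be computed once markers are polarised. It assumes genotypes coded as in diem input files (0 and 2 for the two homozygotes, 1 for the heterozygote), with missing data stored here as NA. The helpers flip_polarity() and hybrid_index() are hypothetical and written only for this illustration; they are not diemr functions, and the polarities are set by hand rather than estimated from the data as diem() does.

```r
# Illustrative sketch only: hypothetical helpers, not functions exported by diemr.
# Assumed coding: 0 and 2 = the two homozygotes, 1 = heterozygote, NA = missing.
genotypes <- matrix(c(0, 0, 1, 2, 2, 2,
                      2, 2, 1, 0, 0, NA,
                      0, 1, 1, 1, 2, 2),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("marker", 1:3), paste0("ind", 1:6)))

# Polarising a marker means deciding which allele counts as "side 2";
# flipping the polarity swaps the two homozygous states (0 <-> 2).
flip_polarity <- function(g, flip) {
  g[flip, ] <- 2 - g[flip, , drop = FALSE]
  g
}

# Marker 2 is coded opposite to markers 1 and 3, so flip it by hand.
polarised <- flip_polarity(genotypes, flip = c(FALSE, TRUE, FALSE))

# A simple hybrid index for diploids: the proportion of side-2 alleles
# per individual across its non-missing markers.
hybrid_index <- function(g) colSums(g, na.rm = TRUE) / (2 * colSums(!is.na(g)))

round(hybrid_index(polarised), 2)
# ind1 ind2 ind3 ind4 ind5 ind6
# 0.00 0.17 0.50 0.83 1.00 1.00
```

Before flipping, marker 2 would pull every individual towards an index of 0.5; after polarisation the individuals line up along a gradient from one side of the barrier to the other.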

Installation

To start using diemr, load the package or install it from CRAN if it is not yet available:

if(!require("diemr", character.only = TRUE)){
    install.packages("diemr", dependencies = TRUE)
    library("diemr", character.only = TRUE)
}
# Loading required package: diemr

The development version can be installed directly from the GitHub repository using the devtools package.

# install.packages("devtools")  # run first if devtools is not installed yet
devtools::install_github("nmartinkova/diemr")

Check data format and polarise genotypes

Next, assemble the paths to all files containing the data to be analysed with diemr. Here, we use a tiny example dataset that is included in the package. It is good practice to check that all files contain data in the correct format for all individuals and markers.

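# path to the example data shipped with diemr
filepaths <- system.file("extdata", "data7x3.txt",
                         package = "diemr")
# ploidy: one vector per file, one value per individual (six diploids here)
CheckDiemFormat(filepaths, ploidy = list(rep(2, 6)), ChosenInds = 1:6)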
# File check passed: TRUE
# Ploidy check passed: TRUE

If CheckDiemFormat() fails, work through the error messages and fix the input files accordingly. The algorithm repeatedly reads the data from the hard disk, so confirming that the file check passes before the analysis is critical.
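When CheckDiemFormat() reports a problem, a quick base R scan can help locate the offending lines. The helper below, quick_format_check(), is hypothetical and not part of diemr. It assumes the diem text format in which each line holds one marker: the letter S followed by one genotype character (0, 1, 2, or _ for missing) per individual. Verify that assumption against the package documentation; CheckDiemFormat() remains the authoritative check.

```r
# Hypothetical pre-check, not part of diemr; CheckDiemFormat() is authoritative.
# Assumed format: one marker per line, "S" followed by one genotype
# character (0, 1, 2 or _ for missing) per individual.
quick_format_check <- function(path, n_inds) {
  markers <- readLines(path)
  pattern <- sprintf("^S[012_]{%d}$", n_inds)
  bad <- which(!grepl(pattern, markers))
  if (length(bad) > 0) {
    message("Lines not matching the assumed format: ",
            paste(bad, collapse = ", "))
  }
  length(bad) == 0
}

# Toy file with three markers for six individuals; line 3 contains a stray "3".
toy <- tempfile(fileext = ".txt")
writeLines(c("S001122", "S2210_0", "S012312"), toy)
quick_format_check(toy, n_inds = 6)
# Lines not matching the assumed format: 3
# [1] FALSE
```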

diem.res <- diem(files = filepaths,
                 ploidy = list(rep(2, 6)), 
                 ChosenInds = 1:6,
                 nCores = 1)

The results, including marker polarities, marker diagnostic indices and their support, are stored in the list element diem.res$DI. The remaining elements of the results list track the progress of the expectation maximisation iterations. The key results are also saved in the file MarkerDiagnosticsWithOptimalPolarities.txt in the working directory. See the diemr documentation for further details.
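Real datasets are often split across several files, for example one per chromosome. The sketch below assumes, consistent with the example above, that ploidy is a list with one numeric vector per file and one value per individual. The file names are hypothetical, and whether haploid entries suit your data (here, individuals 4 to 6 treated as hemizygous for a hypothetical X chromosome file) is an assumption to confirm in ?diem.

```r
# Hypothetical file names: replace them with the paths to your own data.
filepaths <- file.path("data", c("chr1.txt", "chr2.txt", "chrX.txt"))
n_inds <- 6

# Assumed structure: one ploidy vector per file, one entry per individual.
ploidy <- rep(list(rep(2, n_inds)), length(filepaths))

# Assumption for illustration: individuals 4 to 6 are haploid for chrX.
ploidy[[3]][4:6] <- 1

str(ploidy)

# The check and the analysis would then take the same arguments:
# CheckDiemFormat(filepaths, ploidy = ploidy, ChosenInds = 1:n_inds)
# diem.res <- diem(files = filepaths, ploidy = ploidy,
#                  ChosenInds = 1:n_inds, nCores = 1)
```

Building ploidy from filepaths keeps the two arguments the same length, which avoids a mismatch when files are added or removed.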
