Uniparental disomy (UPD) is a rare genetic condition where an individual inherits both copies of a chromosome from one parent, as opposed to the typical inheritance of one copy from each parent. The extent of its contribution as a causative factor in rare genetic diseases remains largely unexplored. UPDs can lead to disease either by inheriting a carrier pathogenic mutation as homozygous from a carrier parent or by causing errors in genetic imprinting. Nevertheless, there are currently no standardized methods available for the detection and characterization of these events.
We have developed UPDhmm
R/Bioconductor package. The package provides
a tool method to detect, classify and stablish the location
of uniparental disomy events.
The UPDhmm package uses a Hidden Markov Model (HMM) to identify regions to be
considered as Uniparental disomy events taking genetic data from a trio
(mother,father and proband) as input. The HMM was constructed based on the
difference on the genotype allelic frequencies obtained with the different
inheritance models (isodisomy, heterodisomy, or normal). With that,
is possible to calculate the likelihood of each inheritance model giving a
genotype sequence,using the Viterbi alogrithm. All in all, UPDhmm
infers
the most likely inheritance pattern of each genomic regions, thus identifying
UPD events in the proband.
library(UPDhmm)
library(dplyr)
The input of the package is a multisample vcf file with the following requirements:
The UPDhmm package includes one example dataset, adapted from GIB This dataset serves as a practical illustration and can be utilized for testing and familiarizing users with the functionality of the package.
After reading the VCF file, the vcf_check()
function is employed for
preprocessing the input data. This function facilitates reading the VCF in the
suitable format for the UPDhmm package.
library(UPDhmm)
library(VariantAnnotation)
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz")
vcf <- readVcf(file)
processedVcf <- vcfCheck(
vcf,
proband = "NA19675",
mother = "NA19678",
father = "NA19679"
)
The principal function of UPDhmm
package, calculate_events()
, is the central
function for identifying Uniparental Disomy (UPD) events. It takes the output
from the previous vcf_check()
function and systematically analyzes genomic
data, splitting the VCF into chromosomes, applying the Viterbi algorithm,
calculateEvents()
function encompasses a serie of subprocesses for
identifying Uniparental Disomies (UPDs):
(1) Split vcf into chromosomes
(2) Apply Viterbi algorithm to every chromosome
(3) Transform the inferred hidden states to coordinates data.frame
(4) Create blocks of contiguous variants with the same state
(5) Calculate the statistics (OR and p-value).
This function requires the object generated by the vcf_check()
function.
updEvents <- calculateEvents(largecollapsed=processedVcf)
head(updEvents)
seqnames start end group n_snps log_OR p_value
1 3 100374740 197574936 het_mat 47 23.025596 1.598589e-06
2 6 32489853 32490000 iso_mat 6 36.463160 1.555791e-09
3 6 32609207 32609341 iso_mat 11 22.214754 2.437932e-06
4 11 55322539 55587117 iso_mat 7 20.828244 5.023662e-06
5 14 105408955 105945891 het_fat 30 4.221494 3.991501e-02
6 15 22382655 23000272 iso_mat 12 22.904370 1.702643e-06
The calculateEvents
function returns a data.frame
containing all
detected events in the provided trio. If no events are found, the function
will return an empty data.frame.
Column name | Description |
---|---|
sample_id |
Sample proband ID |
seqnames |
Chromosome |
start |
Start position of the block |
end |
End position of the block |
n_snps |
Number of variants within the event |
group |
Predicted state |
log_OR |
log likelihood ratio |
p_value |
p-value |
To visualize the results, the karyoploteR
package can be employed.
Here, a custom function is provided for easy implementation with the output
results.
library(karyoploteR)
library(regioneR)
plotUPDKp <- function(updEvents) {
updEvents$seqnames <- paste0("chr", updEvents$seqnames)
het_fat <- toGRanges(subset(updEvents,
group == "het_fat")[, c("seqnames", "start", "end")])
het_mat <- toGRanges(subset(updEvents,
group == "het_mat")[, c("seqnames", "start", "end")])
iso_fat <- toGRanges(subset(updEvents,
group == "iso_fat")[, c("seqnames", "start", "end")])
iso_mat <- toGRanges(subset(updEvents,
group == "iso_mat")[, c("seqnames", "start", "end")])
kp <- plotKaryotype(genome = "hg19")
kpPlotRegions(kp, het_fat, col = "#AAF593")
kpPlotRegions(kp, het_mat, col = "#FFB6C1")
kpPlotRegions(kp, iso_fat, col = "#A6E5FC")
kpPlotRegions(kp, iso_mat, col = "#E9B864")
colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864")
legend("topright", legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"),
fill = colors)
}
plotUPDKp(updEvents)
sessionInfo()
R Under development (unstable) (2024-03-18 r86148)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] karyoploteR_1.29.1 regioneR_1.35.0
[3] VariantAnnotation_1.49.6 Rsamtools_2.19.4
[5] Biostrings_2.71.4 XVector_0.43.1
[7] SummarizedExperiment_1.33.3 Biobase_2.63.0
[9] GenomicRanges_1.55.4 GenomeInfoDb_1.39.9
[11] IRanges_2.37.1 S4Vectors_0.41.5
[13] MatrixGenerics_1.15.0 matrixStats_1.2.0
[15] BiocGenerics_0.49.1 dplyr_1.1.4
[17] UPDhmm_0.99-1
loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7 gridExtra_2.3
[4] rlang_1.1.3 magrittr_2.0.3 biovizBase_1.51.0
[7] compiler_4.4.0 RSQLite_2.3.5 GenomicFeatures_1.55.4
[10] png_0.1-8 vctrs_0.6.5 ProtGenerics_1.35.4
[13] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.2
[16] fastmap_1.1.1 backports_1.4.1 utf8_1.2.4
[19] rmarkdown_2.26 bit_4.0.5 xfun_0.42
[22] zlibbioc_1.49.3 cachem_1.0.8 blob_1.2.4
[25] highr_0.10 DelayedArray_0.29.9 BiocParallel_1.37.1
[28] parallel_4.4.0 cluster_2.1.6 R6_2.5.1
[31] stringi_1.8.3 RColorBrewer_1.1-3 bezier_1.1.2
[34] rtracklayer_1.63.1 rpart_4.1.23 Rcpp_1.0.12
[37] knitr_1.45 base64enc_0.1-3 Matrix_1.6-5
[40] nnet_7.3-19 tidyselect_1.2.1 rstudioapi_0.15.0
[43] dichromat_2.0-0.1 abind_1.4-5 yaml_2.3.8
[46] codetools_0.2-19 curl_5.2.1 lattice_0.22-6
[49] tibble_3.2.1 KEGGREST_1.43.0 evaluate_0.23
[52] foreign_0.8-86 pillar_1.9.0 BiocManager_1.30.22
[55] checkmate_2.3.1 generics_0.1.3 RCurl_1.98-1.14
[58] ensembldb_2.27.1 ggplot2_3.5.0 munsell_0.5.0
[61] scales_1.3.0 BiocStyle_2.31.0 glue_1.7.0
[64] lazyeval_0.2.2 Hmisc_5.1-2 tools_4.4.0
[67] BiocIO_1.13.0 data.table_1.15.2 BSgenome_1.71.2
[70] GenomicAlignments_1.39.4 XML_3.99-0.16.1 grid_4.4.0
[73] AnnotationDbi_1.65.2 colorspace_2.1-0 GenomeInfoDbData_1.2.11
[76] htmlTable_2.4.2 restfulr_0.0.15 Formula_1.2-5
[79] cli_3.6.2 HMM_1.0.1 fansi_1.0.6
[82] S4Arrays_1.3.6 AnnotationFilter_1.27.0 gtable_0.3.4
[85] digest_0.6.35 SparseArray_1.3.4 rjson_0.2.21
[88] htmlwidgets_1.6.4 memoise_2.0.1 htmltools_0.5.7
[91] lifecycle_1.0.4 httr_1.4.7 bit64_4.0.5
[94] bamsignals_1.35.0