| Type: | Package | 
| Title: | Multisource Graph Synthesis with EHR Data | 
| Version: | 0.1.0 | 
| Description: | We develop Multi-source Graph Synthesis (MUGS), an algorithm designed to create embeddings for pediatric Electronic Health Record (EHR) codes by leveraging graphical information from three distinct sources: (1) pediatric EHR data, (2) EHR data from the general patient population, and (3) existing hierarchical medical ontology knowledge shared across different patient populations. See Li et al. (2024) <doi:10.1038/s41746-024-01320-4> for details. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| LazyDataCompression: | xz | 
| RoxygenNote: | 7.3.2 | 
| URL: | https://github.com/celehs/MUGS, https://celehs.github.io/MUGS/, https://doi.org/10.1038/s41746-024-01320-4 | 
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) | 
| VignetteBuilder: | knitr | 
| Imports: | MASS, Matrix, fastDummies, doSNOW, dplyr, grplasso, foreach, glmnet, grpreg, inline, mvtnorm, pROC, parallel, RcppArmadillo, rsvd, methods | 
| Depends: | R (≥ 3.5.0) | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-05-15 03:56:15 UTC; User1 | 
| Author: | Mengyan Li [cre, aut], Thomas Charlon [ctb] (ORCID: 0000-0001-7497-0470), Xiaoou Li [aut], Tianxi Cai [aut], PARSE LTD [aut], CELEHS Team [aut] | 
| Maintainer: | Mengyan Li <mengyanli@bentley.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-05-19 13:40:09 UTC | 
Function Used To Estimate Code Effects
Description
This function estimates code effects using left and right embeddings from source and target sites.
Usage
CodeEff_Matrix(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  common_codes,
  zeta.int,
  lambda,
  p
)
Arguments
S.1 | 
 SPPMI from the source site.  | 
S.2 | 
 SPPMI from the target site.  | 
n1 | 
 The number of codes from the source site.  | 
n2 | 
 The number of codes from the target site.  | 
U.1 | 
 The left embeddings left singular vectors times the square root of the singular values from the source site.  | 
U.2 | 
 The left embeddings left singular vectors times the square root of the singular values from the target site.  | 
V.1 | 
 The right embeddings right singular vectors times the square root of the singular values from the source site.  | 
V.2 | 
 The right embeddings right singular vectors times the square root of the singular values from the target site.  | 
common_codes | 
 The list of overlapping codes.  | 
zeta.int | 
 The initial estimator for the code effects.  | 
lambda | 
 The tuning parameter controls the intensity of penalization on the code effect.  | 
p | 
 The length of an embedding.  | 
Value
A list with the following elements:
zeta | 
 The estimated code effects.  | 
dif_F | 
 The Frobenius norm difference between the updated and initial estimators.  | 
V.1.new | 
 Updated right embeddings for the source site.  | 
V.2.new | 
 Updated right embeddings for the target site.  | 
Function Used To Estimate Code-Site Effects Parallelly
Description
Function Used To Estimate Code-Site Effects Parallelly
Usage
CodeSiteEff_l2_par(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  delta.int,
  lambda.delta,
  p,
  common_codes,
  n.common,
  n.core
)
Arguments
S.1 | 
 SPPMI from the source site  | 
S.2 | 
 SPPMI from the target site  | 
n1 | 
 the number of codes from the source site  | 
n2 | 
 the number of codes from the target site  | 
U.1 | 
 the left embeddings (left singular vectors times the square root of the singular values) from the source site  | 
U.2 | 
 the left embeddings (left singular vectors times the square root of the singular values) from the target site  | 
V.1 | 
 the right embeddings (right singular vectors times the square root of the singular values) from the source site  | 
V.2 | 
 the right embeddings (right singular vectors times the square root of the singular values) from the target site  | 
delta.int | 
 the initial estimator for the code-site effect  | 
lambda.delta | 
 the tuning parameter controls the intensity of penalization on the code-site effects  | 
p | 
 the length of an embedding  | 
common_codes | 
 the list of overlapping codes  | 
n.common | 
 the number of overlapping codes  | 
n.core | 
 the number of cored used for parallel computation  | 
Value
The output for the estimation of code-site effects
Function used to generate input data (used only for Simulations) Generate SPPMIs, dummy matrices based on prior group structures, and code-code pairs for tuning and evaluation
Description
Function used to generate input data (used only for Simulations) Generate SPPMIs, dummy matrices based on prior group structures, and code-code pairs for tuning and evaluation
Usage
DataGen_rare_group(
  seed = NULL,
  p,
  n1,
  n2,
  n.common,
  n.group,
  sigma.eps.1,
  sigma.eps.2,
  ratio.delta,
  network.k,
  rho.beta,
  rho.U0,
  rho.delta,
  sigma.rare,
  n.rare,
  group.size
)
Arguments
seed | 
 for reproducibility  | 
p | 
 the length of an embedding  | 
n1 | 
 the number of codes in site 1  | 
n2 | 
 the number of codes in site 2  | 
n.common | 
 common: the number of overlapping codes  | 
n.group | 
 the number of groups  | 
sigma.eps.1 | 
 the sd of error in site 1  | 
sigma.eps.2 | 
 the sd of error in site 2  | 
ratio.delta | 
 the proportion of codes in each site that have site-specific effects applied to them  | 
network.k | 
 the number of distinct blocks within each site for which unique inter-code correlations are modeled  | 
rho.beta | 
 AR parameter for the group effects covariance matrix  | 
rho.U0 | 
 AR parameter for the code effects covariance matrix  | 
rho.delta | 
 AR parameter for the code-site effects covariance matrix  | 
sigma.rare | 
 the sd of error for rare codes (usually larger than sigma.eps.1 and sigma.eps.2)  | 
n.rare | 
 The number of rare codes  | 
group.size | 
 the size of each group  | 
Value
Returns input data, SPPMIs, dummy matrices based on prior group structures and code-code pairs for tuning and evaluation
Function Used To Estimate Group Effects Parallelly
Description
Function Used To Estimate Group Effects Parallelly
Usage
GroupEff_par(
  S.MGB,
  S.BCH,
  n.MGB,
  n.BCH,
  U.MGB,
  U.BCH,
  V.MGB,
  V.BCH,
  X.MGB.group,
  X.BCH.group,
  n.group,
  name.list,
  beta.int,
  lambda = 0,
  p,
  n.core
)
Arguments
S.MGB | 
 SPPMI from the source site  | 
S.BCH | 
 SPPMI from the target site  | 
n.MGB | 
 the number of codes from the source site  | 
n.BCH | 
 the number of codes from the target site  | 
U.MGB | 
 the left embeddings (left singular vectors times the square root of the singular values) from the source site  | 
U.BCH | 
 the left embeddings (left singular vectors times the square root of the singular values) from the target site  | 
V.MGB | 
 the right embeddings (right singular vectors times the square root of the singular values) from the source site  | 
V.BCH | 
 the right embeddings (right singular vectors times the square root of the singular values) from the target site  | 
X.MGB.group | 
 the dummy matrix based on prior group structures at the source site  | 
X.BCH.group | 
 the dummy matrix based on prior group structures at the target site  | 
n.group | 
 the number of groups  | 
name.list | 
 the full list of code names from the source site and the target site with repeated names of overlapping codes  | 
beta.int | 
 the initial estimator for the group effects  | 
lambda | 
 the tuning parameter controls the intensity of penalization on the group effect; by default we set it to 0  | 
p | 
 the length of an embedding  | 
n.core | 
 the number of cored used for parallel computation  | 
Value
The output of estimating group effects parallelly
Main function for MUGS algorithm
Description
Main function for MUGS algorithm
Usage
MUGS(
  TUNE = FALSE,
  Eva = TRUE,
  Lambda = c(10),
  Lambda.delta = c(1000),
  n.core = 4,
  tol = 1,
  seed = NULL,
  S.1 = NULL,
  S.2 = NULL,
  X.group.source = NULL,
  X.group.target = NULL,
  pairs.rel.CV = NULL,
  pairs.rel.EV = NULL,
  p = 100,
  n.group = 400,
  outdir = NULL
)
Arguments
TUNE | 
 Logical value indicating whether the function should tune parameters TRUE or use predefined parameters FALSE.  | 
Eva | 
 Logical value indicating whether to perform evaluation (TRUE) or skip it (FALSE).  | 
Lambda | 
 The candidate values for the tuning parameter controlling the intensity of penalization on the code effects.  | 
Lambda.delta | 
 The candidate values for the tuning parameter controlling the intensity of penalization on the code-site effects.  | 
n.core | 
 Integer specifying the number of cores to use for parallel processing.  | 
tol | 
 Numeric value representing the tolerance level for convergence in the algorithm.  | 
seed | 
 Integer used to set the seed for random number generation, ensuring reproducibility. Set to NULL to disable.  | 
S.1 | 
 The SPPMI matrix from site 1.  | 
S.2 | 
 The SPPMI matrix from site 2.  | 
X.group.source | 
 The dummy matrix representing the group structure of codes at site 1.  | 
X.group.target | 
 The dummy matrix representing the group structure of codes at site 2.  | 
pairs.rel.CV | 
 Code-code pairs used for tuning via cross-validation.  | 
pairs.rel.EV | 
 Code-code pairs used for evaluation.  | 
p | 
 Integer indicating the length of embeddings.  | 
n.group | 
 The number of groups.  | 
outdir | 
 Optional directory to write output files. Defaults to a temporary directory.  | 
Value
A list or saved files containing the embedding matrices, similarity matrices, and site-heterogeneous code analysis.
S.1 Dataset
Description
A matrix containing SPPMI data from the source site. This dataset is used as input for analysis in the package.
Usage
S.1
Format
A matrix with 2000 rows and 10 columns:
- Row Names
 Unique identifiers for each row.
- Columns
 Numeric values representing SPPMI data.
S.2 Dataset
Description
A matrix containing SPPMI data from the target site. This dataset is used as input for analysis in the package.
Usage
S.2
Format
A matrix with 2000 rows and 10 columns:
- Row Names
 Unique identifiers for each row.
- Columns
 Numeric values representing SPPMI data.
U.1 Dataset
Description
A matrix containing left embeddings from the source site. These embeddings are used for embedding-based computations.
Usage
U.1
Format
A matrix with 2000 rows and 10 columns:
- Row Names
 Unique identifiers for each row.
- Columns
 Numeric values representing embeddings.
U.2 Dataset
Description
A matrix containing left embeddings from the target site. These embeddings are used for embedding-based computations.
Usage
U.2
Format
A matrix with 2000 rows and 10 columns:
- Row Names
 Unique identifiers for each row.
- Columns
 Numeric values representing embeddings.
X.group.source Dataset
Description
A matrix containing group structures at the source site. It represents binary group membership of entities at the source.
Usage
X.group.source
Format
A matrix with 2000 rows and 50 columns:
- Rows
 Entities at the source site.
- Columns
 Binary values (0 or 1) indicating group membership.
X.group.target Dataset
Description
A matrix containing group structures at the target site. It represents binary group membership of entities at the target.
Usage
X.group.target
Format
A matrix with 2000 rows and 50 columns:
- Rows
 Entities at the target site.
- Columns
 Binary values (0 or 1) indicating group membership.
Download and Load Example Data from Zenodo
Description
Download and Load Example Data from Zenodo
Usage
download_example_data(file, destdir = tempdir())
Arguments
file | 
 Name of the .Rdata file to download (e.g., "S.1.Rdata").  | 
destdir | 
 Directory to store the downloaded data. Defaults to a temporary directory.  | 
Value
A list containing the loaded dataset.
Function Used For Tuning And Evaluation
Description
Function Used For Tuning And Evaluation
Usage
evaluation.sim(pairs.rel, U, seed = NULL)
Arguments
pairs.rel | 
 the known code-code pairs  | 
U | 
 the code embedding matrix  | 
seed | 
 Optional integer for reproducibility of sampling.  | 
Value
The output of tuning and evaluation
Function For Getting Embedding From SVD
Description
Function For Getting Embedding From SVD
Usage
get_embed(mysvd, d = 2000, normalize = TRUE)
Arguments
mysvd | 
 the (managed) svd result (adding an element with 'names')  | 
d | 
 dim of the final embedding  | 
normalize | 
 if the output embeddings have l2 norm equal to 1  | 
Value
The embedding from SVD
pairs.rel.CV Dataset
Description
A data frame containing cross-validation pairs for relative comparisons.
Usage
pairs.rel.CV
Format
A data frame with multiple columns:
- col
 Integer representing the column index of a pair.
- row
 Integer representing the row index of a pair.
- type
 Character string indicating the type of data (e.g., "train", "test").
pairs.rel.EV Dataset
Description
A data frame containing evaluation pairs for relative comparisons.
Usage
pairs.rel.EV
Format
A data frame with multiple columns:
- col
 Integer representing the column index of a pair.
- row
 Integer representing the row index of a pair.
- type
 Character string indicating the type of data (e.g., "validation").