| Type: | Package | 
| Title: | A Survival Tree Based on Stabilized Score Tests for High-dimensional Covariates | 
| Version: | 1.5 | 
| Author: | Takeshi Emura and Wei-Chern Hsu | 
| Maintainer: | Takeshi Emura <takeshiemura@gmail.com> | 
| Description: | A classification (decision) tree is constructed from survival data with high-dimensional covariates. The method is a robust version of the logrank tree, where the variance is stabilized. The main function "uni.tree" returns a classification tree for a given survival dataset. The inner nodes (splitting criterion) are selected by minimizing the P-value of the two-sample score tests. Terminal nodes are declared (stopping criterion) using a P-value threshold given by a user-specified argument. This tree construction algorithm was proposed by Emura et al. (2021, in review). | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.0 | 
| Depends: | survival,compound.Cox | 
| NeedsCompilation: | no | 
| Packaged: | 2021-03-22 05:50:46 UTC; biouser | 
| Repository: | CRAN | 
| Date/Publication: | 2021-03-22 06:40:02 UTC | 
Kaplan-Meier estimator of binary splitting
Description
Given a cut-off point and a selected covariate, the function returns the Kaplan-Meier survival curves for the two groups defined by the binary split, together with the P-value of the two-sample logrank test.
Usage
KM.split(t.vec, d.vec, X.mat, x.name, cutoff)
Arguments
| t.vec | :Vector of survival times (time to either death or censoring) | 
| d.vec | :Vector of censoring indicators (1=death, 0=censoring) | 
| X.mat | :n by p matrix of covariates, where n is the sample size and p is the number of covariates | 
| x.name | :the name of the covariate (a column of X.mat) used for the split | 
| cutoff | :the cut-off point that divides the samples into two groups | 
Value
The P-value of the two-sample logrank test and a plot of the two KM estimates
Examples
data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
KM.split(t.vec,d.vec,x.mat,x.name="ANXA5",cutoff=1)
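The computation described for KM.split can be sketched with the "survival" package alone. This is a minimal illustration, not the package's implementation: the helper name km.split.sketch, the simulated data, and the convention that samples with a covariate value above the cut-off form the second group are all assumptions made for the example.

```r
library(survival)

# Hypothetical helper (not part of the package): split the samples at `cutoff`,
# draw the two Kaplan-Meier curves, and return the two-sample logrank P-value.
km.split.sketch <- function(t.vec, d.vec, X.mat, x.name, cutoff) {
  # Assumption: values above the cut-off go to group 1, the rest to group 0
  group <- as.numeric(X.mat[, x.name] > cutoff)
  fit <- survfit(Surv(t.vec, d.vec) ~ group)
  plot(fit, lty = 1:2, xlab = "Time", ylab = "Survival probability")
  test <- survdiff(Surv(t.vec, d.vec) ~ group)
  pchisq(test$chisq, df = 1, lower.tail = FALSE)
}

# Simulated data (an assumption; the Lung data are not needed for this sketch)
set.seed(1)
n <- 80
X.sim <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("g1", "g2")))
t.true <- rexp(n, rate = exp(0.7 * (X.sim[, "g1"] > 0)))
cens <- rexp(n, rate = 0.3)
t.sim <- pmin(t.true, cens)
d.sim <- as.numeric(t.true <= cens)

p.val <- km.split.sketch(t.sim, d.sim, X.sim, x.name = "g1", cutoff = 0)
p.val
```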
Generate a matrix of gene expressions in the presence of gene pathways (discrete version of X.pathway() from Emura et al. (2012))
Description
Generate a matrix of gene expressions in the presence of gene pathways. The matrix is first produced by X.pathway() (Emura et al. 2012), and each value is then converted to one of the levels 1 to 4 according to its quantile.
Usage
X.pathway_discrete.balanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)
Arguments
| n | :the number of individuals (sample size) | 
| p | :the number of genes | 
| q1 | :the number of genes in the first pathway | 
| q2 | :the number of genes in the second pathway | 
| rho1 | :the correlation coefficient for the first pathway | 
| rho2 | :the correlation coefficient for the second pathway | 
Value
X: n by p matrix of gene expressions
References
Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627
Examples
## generate 6 gene expressions from 10 individuals
X.pathway_discrete.balanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)
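The discretization step described above can be sketched in base R. The sketch assumes that the levels 1 to 4 are assigned column by column using sample quartiles; the documentation does not say which quantiles are used, so this is one plausible reading. The helper name discretize.quartile is hypothetical, and independent normal values stand in for the output of X.pathway().

```r
# Hypothetical helper (not part of the package): map each column of a
# continuous matrix to the levels 1..4 using its own sample quartiles.
discretize.quartile <- function(X) {
  apply(X, 2, function(x) {
    breaks <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
    cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  })
}

# Stand-in for a continuous expression matrix (assumption: i.i.d. normal values)
set.seed(1)
X.cont <- matrix(rnorm(40 * 6), nrow = 40, ncol = 6)
X.disc <- discretize.quartile(X.cont)

# Each level appears equally often in a column, hence "balanced"
table(X.disc[, 1])
```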
Generate a matrix of imbalanced gene expressions in the presence of gene pathways (discrete version of X.pathway() from Emura et al. (2012))
Description
Generate a matrix of gene expressions in the presence of gene pathways. The matrix is first produced by X.pathway() (Emura et al. 2012), and each value is then converted to one of the levels 1 to 3 according to its quantile. Finally, for each row, one randomly chosen element among the last p-(q1+q2) columns is replaced by 4.
Usage
X.pathway_discrete.imbalanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)
Arguments
| n | :the number of individuals (sample size) | 
| p | :the number of genes | 
| q1 | :the number of genes in the first pathway | 
| q2 | :the number of genes in the second pathway | 
| rho1 | :the correlation coefficient for the first pathway | 
| rho2 | :the correlation coefficient for the second pathway | 
Value
X: n by p matrix of gene expressions
References
Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627
Examples
## generate 6 gene expressions from 10 individuals
X.pathway_discrete.imbalanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)
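The imbalanced variant can be sketched in the same way. The sketch assumes tertile-based levels 1 to 3 per column and reads the description as "for each row, one randomly chosen entry among the last p-(q1+q2) columns is set to 4"; both points are interpretations, not confirmed package behavior. The helper name discretize.imbalanced is hypothetical, and normal values stand in for the output of X.pathway().

```r
# Hypothetical helper (not part of the package): levels 1..3 by tertiles,
# then one rare level 4 per row among the last p-(q1+q2) columns.
discretize.imbalanced <- function(X, q1, q2) {
  p <- ncol(X)
  stopifnot(p > q1 + q2)
  X.disc <- apply(X, 2, function(x) {
    breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
    cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  })
  last.cols <- (q1 + q2 + 1):p
  for (i in seq_len(nrow(X.disc))) {
    # sample.int() on the length avoids the sample() pitfall for one column
    j <- last.cols[sample.int(length(last.cols), 1)]
    X.disc[i, j] <- 4L
  }
  X.disc
}

# Stand-in for a continuous expression matrix (assumption: i.i.d. normal values)
set.seed(1)
X.cont <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)
X.imb <- discretize.imbalanced(X.cont, q1 = 2, q2 = 2)

# Level 4 is rare and occurs only in the last two columns
table(X.imb[, 5])
```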
The names of features that are selected in a tree
Description
The function returns the names of the features (covariates) that are selected as the internal nodes of a tree. Only the names of the covariates are shown; the cut-off values are excluded.
Usage
feature.selected(tree)
Arguments
| tree | :an object made from the "uni.tree" function | 
Details
The outputs show important features for predicting survival outcomes.
Value
A character vector containing the names of the covariates selected in the tree
Examples
data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)
feature.selected(res)
The risk ranks of the samples predicted by a tree
Description
The function returns the ranks (1=the lowest risk, 2=the 2nd lowest risk, ..., k=the highest risk) predicted for the samples.
Usage
risk.classification(tree, X.mat)
Arguments
| tree | :an object made from the "uni.tree" function | 
| X.mat | :n by p matrix of covariates from the samples, where n is the sample size and p is the number of covariates | 
Details
If the tree has k terminal nodes, then the response 1 represents the lowest risk and k represents the highest risk.
Value
A vector of integers, 1, 2, ..., k, that represent the ranks predicted for the samples.
Examples
data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)
risk.classification(res,x.mat)
Univariate binary splits by the logrank test
Description
The output is the summary of significance tests for binary splits, where the cut-off values are optimized for each covariate.
Usage
uni.logrank(t.vec, d.vec, X.mat)
Arguments
| t.vec | :Vector of survival times (time to either death or censoring) | 
| d.vec | :Vector of censoring indicators (1=death, 0=censoring) | 
| X.mat | :n by p matrix of covariates, where n is the sample size and p is the number of covariates | 
Details
The output can be used to construct a logrank tree.
Value
A dataframe containing:
Pvalue: the P-value of the two-sample logrank test, where the cut-off value is optimized
cut_off_point: the optimal cut-off value of the binary split for a given feature
left.sample.size: the sample size of a left child node
right.sample.size: the sample size of a right child node
Examples
data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
uni.logrank(t.vec,d.vec,x.mat)
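The idea of optimizing the cut-off value for each covariate can be sketched with the "survival" package. This is an illustration under stated assumptions, not the package's code: the helper name best.cutoff is hypothetical, the candidate cut-offs are taken to be the observed covariate values (excluding the largest, so both child nodes are non-empty), and samples with values at or below the cut-off are assigned to the left child.

```r
library(survival)

# Hypothetical helper (not part of the package): for one covariate, try every
# candidate cut-off and keep the one with the smallest logrank P-value.
best.cutoff <- function(t.vec, d.vec, x) {
  candidates <- head(sort(unique(x)), -1)  # drop the maximum: right child non-empty
  p.values <- sapply(candidates, function(cut.val) {
    group <- as.numeric(x > cut.val)       # assumption: right child is x > cut-off
    test <- survdiff(Surv(t.vec, d.vec) ~ group)
    pchisq(test$chisq, df = 1, lower.tail = FALSE)
  })
  k <- which.min(p.values)
  data.frame(Pvalue = p.values[k],
             cut_off_point = candidates[k],
             left.sample.size = sum(x <= candidates[k]),
             right.sample.size = sum(x > candidates[k]))
}

# Simulated data (an assumption): three covariates with levels 1..4
set.seed(2)
n <- 80
X.sim <- matrix(sample(1:4, n * 3, replace = TRUE), n, 3,
                dimnames = list(NULL, c("g1", "g2", "g3")))
t.true <- rexp(n, rate = exp(0.5 * (X.sim[, "g1"] > 2)))
cens <- rexp(n, rate = 0.3)
t.sim <- pmin(t.true, cens)
d.sim <- as.numeric(t.true <= cens)

# One row per covariate, mirroring the columns listed under Value
res.sim <- do.call(rbind, lapply(colnames(X.sim), function(nm)
  best.cutoff(t.sim, d.sim, X.sim[, nm])))
rownames(res.sim) <- colnames(X.sim)
res.sim
```

Because the cut-off is chosen to minimize the P-value, the reported P-values are optimistic; they serve to rank candidate splits rather than as calibrated significance levels.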
A survival tree based on stabilized score tests
Description
This function returns a classification (decision) tree for a given survival dataset. Inner nodes are chosen (splitting criterion) by univariate score tests. Terminal nodes are declared (stopping criterion) using the P-value threshold given by the argument P.value. This tree construction algorithm was proposed by Emura et al. (2021).
Usage
uni.tree(
  t.vec,
  d.vec,
  X.mat,
  P.value = 0.01,
  d0 = 0.01,
  S.plot = FALSE,
  score = TRUE
)
Arguments
| t.vec | :Vector of survival times (time to either death or censoring) | 
| d.vec | :Vector of censoring indicators (1=death, 0=censoring) | 
| X.mat | :n by p matrix of covariates (features), where n is the sample size and p is the number of covariates | 
| P.value | :the P-value threshold at which splitting stops (stopping criterion) | 
| d0 | :A positive constant to stabilize the variance of score statistics (Witten & Tibshirani 2010) | 
| S.plot | :if TRUE, plot the KM estimators for each split | 
| score | :TRUE = score test (Emura et al. 2019); FALSE = log-rank test | 
Details
To stabilize the univariate score tests, a small value "d0" is added to the variance of the score statistics (Witten and Tibshirani 2010). Setting d0=0 corresponds to the logrank test. To perform a large number of score tests, the "compound.Cox" package (Emura et al. 2019) is applied with d0 as an option.
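The effect of d0 can be illustrated with a small sketch. Reading the sentence above literally, the statistic is taken here to be Z = S / sqrt(V + d0), where S is the Cox score statistic at zero effect and V is its variance; the exact form used inside "compound.Cox" is not stated in this manual, so this form is an assumption. The helper name stabilized.score, the two-sided normal P-value, and the simulated data are likewise assumptions, and the code assumes no tied event times.

```r
library(survival)

# Hypothetical helper (not part of the package): score test for one covariate,
# with d0 added to the variance. Assumes no tied event times.
stabilized.score <- function(t.vec, d.vec, x, d0 = 0) {
  S <- 0
  V <- 0
  for (i in which(d.vec == 1)) {
    at.risk <- t.vec >= t.vec[i]
    x.bar <- mean(x[at.risk])                    # risk-set mean of the covariate
    S <- S + (x[i] - x.bar)                      # score contribution
    V <- V + mean((x[at.risk] - x.bar)^2)        # variance contribution
  }
  Z <- S / sqrt(V + d0)
  c(Z = Z, P = 2 * pnorm(-abs(Z)))
}

# Simulated data (an assumption): one binary covariate, continuous times
set.seed(1)
n <- 60
x <- rbinom(n, 1, 0.5)
t.true <- rexp(n, rate = exp(0.8 * x))
cens <- rexp(n, rate = 0.3)
t.sim <- pmin(t.true, cens)
d.sim <- as.numeric(t.true <= cens)

z0 <- stabilized.score(t.sim, d.sim, x, d0 = 0)     # plain score test
z1 <- stabilized.score(t.sim, d.sim, x, d0 = 0.01)  # stabilized version
logrank <- survdiff(Surv(t.sim, d.sim) ~ x)

# With d0 = 0 the squared statistic agrees with the logrank chi-square
c(score.sq = z0[["Z"]]^2, logrank.chisq = logrank$chisq)
```

A positive d0 shrinks the statistic toward zero, and the shrinkage is strongest when V is small, which is the situation in which the unstabilized test is least reliable.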
Value
A nested list describing a classification tree, consisting of inner nodes and terminal nodes.
References
Emura T, Hsu WC, Chou WC (2021). A survival tree based on stabilized score tests for high-dimensional covariates. In review.
Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival. Computer Methods and Programs in Biomedicine 168: 21-37.
Witten DM, Tibshirani R (2010). Survival analysis with high-dimensional covariates. Stat Method Med Res 19: 29-51.
Examples
data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)