Conditioned data frames, or cnd_df, are a powerful tool
in the {sdtm.oak} package designed to facilitate
conditional transformations on data frames. This article explains how to
create and use conditioned data frames, particularly in the context of
SDTM domain derivations.
A conditioned data frame is a regular data frame extended with a
logical vector cnd that marks rows for subsequent
conditional transformations. The condition_add() function
is used to create these conditioned data frames.
Consider a simple data frame df:
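One way to build such a data frame is sketched here. It assumes the tibble package is installed, and the construction is inferred from the printed output rather than taken from the package documentation:

```r
# Three rows: an integer column x and a character column y.
df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L])
df
```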
## # A tibble: 3 × 2
##       x y    
##   <int> <chr>
## 1     1 a    
## 2     2 b    
## 3     3 c
We can create a conditioned data frame in which only the rows where
x > 1 are marked:
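A call along these lines should produce the conditioned data frame printed next. This is a minimal sketch: it assumes {sdtm.oak} and tibble are installed, rebuilds df so the snippet runs on its own, and uses cnd_df only as an illustrative variable name:

```r
library(sdtm.oak)

df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L])

# Mark the rows where x > 1; the condition is evaluated against df's columns.
cnd_df <- condition_add(dat = df, x > 1L)
cnd_df
```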
## # A tibble:  3 × 2
## # Cond. tbl: 2/1/0
##         x y    
##     <int> <chr>
## 1 F     1 a    
## 2 T     2 b    
## 3 T     3 c
Here, only the second and third rows are marked as
TRUE. The Cond. tbl header summarizes the condition as counts of TRUE,
FALSE, and NA values (2/1/0).
The real power of conditioned data frames manifests when they are
used with functions such as assign_no_ct(),
assign_ct(), hardcode_no_ct(), and
hardcode_ct(). These functions perform derivations only for
the records marked TRUE in the
conditioned data frame.
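The idea can be illustrated in base R without {sdtm.oak}. The sketch below is a conceptual analogy only, not the package's implementation; the names tgt, raw_val, cnd, and grp are hypothetical:

```r
# Conceptual illustration: derive a value only where a logical mask is TRUE.
tgt <- data.frame(id = 1:3, trt = c("A", "BENADRYL", "C"))
raw_val <- c(10L, 20L, 30L)

# Logical vector marking the rows that qualify for the derivation.
cnd <- tgt$trt == "BENADRYL"

# Marked rows receive the raw value; all other rows stay NA.
tgt$grp <- ifelse(cnd, raw_val, NA_integer_)
tgt
```

A conditioned data frame carries the equivalent of cnd along with the data, so the derivation functions can apply the mask without it being passed separately.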
Consider a simplified data set of concomitant medications, where we
want to derive a new variable CMGRPID (Concomitant Medication Group ID)
based on the condition that the medication treatment (CMTRT) is
"BENADRYL".
Here is a simplified raw Concomitant Medications data set
(cm_raw):
cm_raw <- tibble::tibble(
  oak_id = seq_len(14L),
  raw_source = "ConMed",
  patient_number = c(375L, 375L, 376L, 377L, 377L, 377L, 377L, 378L, 378L, 378L, 378L, 379L, 379L, 379L),
  MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L),
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
    "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
    "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
    "BENADRYL", "SOMINEX", "ZQUILL"
  )
)
cm_raw
## # A tibble: 14 × 5
##    oak_id raw_source patient_number MDNUM MDRAW                        
##     <int> <chr>               <int> <int> <chr>                        
##  1      1 ConMed                375     1 BABY ASPIRIN                 
##  2      2 ConMed                375     2 CORTISPORIN                  
##  3      3 ConMed                376     1 ASPIRIN                      
##  4      4 ConMed                377     1 DIPHENHYDRAMINE HCL          
##  5      5 ConMed                377     2 PARCETEMOL                   
##  6      6 ConMed                377     3 VOMIKIND                     
##  7      7 ConMed                377     5 ZENFLOX OZ                   
##  8      8 ConMed                378     4 AMITRYPTYLINE                
##  9      9 ConMed                378     1 BENADRYL                     
## 10     10 ConMed                378     2 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378     3 TETRACYCLINE                 
## 12     12 ConMed                379     1 BENADRYL                     
## 13     13 ConMed                379     2 SOMINEX                      
## 14     14 ConMed                379     3 ZQUILL
To derive the CMTRT variable, we use the
assign_no_ct() function to map the raw MDRAW
variable to CMTRT:
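A call of the following shape should give the target data set printed next. It is a sketch that assumes {sdtm.oak} is attached and that cm_raw is the raw data set defined above; the name tgt_dat matches the one used later in this article:

```r
library(sdtm.oak)

# Copy the raw MDRAW values into a new CMTRT variable
# (no controlled terminology is applied).
tgt_dat <- assign_no_ct(
  tgt_var = "CMTRT",
  raw_dat = cm_raw,
  raw_var = "MDRAW"
)
tgt_dat
```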
## # A tibble: 14 × 4
##    oak_id raw_source patient_number CMTRT                        
##     <int> <chr>               <int> <chr>                        
##  1      1 ConMed                375 BABY ASPIRIN                 
##  2      2 ConMed                375 CORTISPORIN                  
##  3      3 ConMed                376 ASPIRIN                      
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL          
##  5      5 ConMed                377 PARCETEMOL                   
##  6      6 ConMed                377 VOMIKIND                     
##  7      7 ConMed                377 ZENFLOX OZ                   
##  8      8 ConMed                378 AMITRYPTYLINE                
##  9      9 ConMed                378 BENADRYL                     
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378 TETRACYCLINE                 
## 12     12 ConMed                379 BENADRYL                     
## 13     13 ConMed                379 SOMINEX                      
## 14     14 ConMed                379 ZQUILL
Then we create a conditioned data frame from the target data set
(tgt_dat), in which only the rows with CMTRT equal to "BENADRYL" are
marked:
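The conditioning step might look like the sketch below. It assumes {sdtm.oak} is attached and that tgt_dat holds the CMTRT derivation from the previous step; cnd_tgt_dat is the name the final derivation expects:

```r
library(sdtm.oak)

# Flag only the records whose CMTRT is exactly "BENADRYL".
cnd_tgt_dat <- condition_add(tgt_dat, CMTRT == "BENADRYL")
cnd_tgt_dat
```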
## # A tibble:  14 × 4
## # Cond. tbl: 2/12/0
##      oak_id raw_source patient_number CMTRT                        
##       <int> <chr>               <int> <chr>                        
## 1  F      1 ConMed                375 BABY ASPIRIN                 
## 2  F      2 ConMed                375 CORTISPORIN                  
## 3  F      3 ConMed                376 ASPIRIN                      
## 4  F      4 ConMed                377 DIPHENHYDRAMINE HCL          
## 5  F      5 ConMed                377 PARCETEMOL                   
## 6  F      6 ConMed                377 VOMIKIND                     
## 7  F      7 ConMed                377 ZENFLOX OZ                   
## 8  F      8 ConMed                378 AMITRYPTYLINE                
## 9  T      9 ConMed                378 BENADRYL                     
## 10 F     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11 F     11 ConMed                378 TETRACYCLINE                 
## 12 T     12 ConMed                379 BENADRYL                     
## 13 F     13 ConMed                379 SOMINEX                      
## 14 F     14 ConMed                379 ZQUILL
Finally, we derive the CMGRPID variable conditionally.
Using assign_no_ct() with the conditioned target data set, CMGRPID
(the group ID for the medication) is populated from MDNUM only for the
marked records; all other records are left as NA:
derived_tgt_dat <- assign_no_ct(
  tgt_dat = cnd_tgt_dat,
  tgt_var = "CMGRPID",
  raw_dat = cm_raw,
  raw_var = "MDNUM"
)
derived_tgt_dat
## # A tibble: 14 × 5
##    oak_id raw_source patient_number CMTRT                         CMGRPID
##     <int> <chr>               <int> <chr>                           <int>
##  1      1 ConMed                375 BABY ASPIRIN                       NA
##  2      2 ConMed                375 CORTISPORIN                        NA
##  3      3 ConMed                376 ASPIRIN                            NA
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
##  5      5 ConMed                377 PARCETEMOL                         NA
##  6      6 ConMed                377 VOMIKIND                           NA
##  7      7 ConMed                377 ZENFLOX OZ                         NA
##  8      8 ConMed                378 AMITRYPTYLINE                      NA
##  9      9 ConMed                378 BENADRYL                            1
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
## 11     11 ConMed                378 TETRACYCLINE                       NA
## 12     12 ConMed                379 BENADRYL                            1
## 13     13 ConMed                379 SOMINEX                            NA
## 14     14 ConMed                379 ZQUILL                             NA
Conditioned data frames in the {sdtm.oak} package
provide a flexible way to perform conditional transformations on data
sets. By marking specific rows for transformation, users can efficiently
derive SDTM variables, ensuring that only relevant records are
processed.