Clinical ADaM Derivations with sasif
Source: vignettes/sasif_introduction.Rmd
Introduction
Clinical programmers working in R often face a common challenge when
migrating from SAS: in SAS, a single IF ... THEN DO block
can assign multiple variables at once under one
condition. In R, traditional approaches like case_when() or
fifelse() force you to repeat the same condition
for every variable — increasing QC risk and reducing
readability.
sasif solves this by bringing SAS-style
IF / ELSE IF / ELSE control flow into R’s
data.table ecosystem. One condition governs all assignments
in a block — just like SAS.
This vignette walks through seven ADaM derivation scenarios, covering row deletion, independent flags, and three datasets:
- ADSL — Population flags and treatment variables
- ADLB — Laboratory value categorisation
- ADAE — Treatment-emergent adverse event flags
Setup
library(sasif)
library(data.table)
#> Warning: package 'data.table' was built under R version 4.5.2
Scenario 1 — ADSL: Population Flags
The Problem
In a typical ADSL derivation, when a subject is in the treatment arm, multiple variables need to be assigned simultaneously — population flags, treatment labels, numeric codes, and treatment dates.
In traditional R, every variable requires its own repeated condition:
# ❌ Traditional R: condition repeated for every variable
library(dplyr)  # provides %>%, mutate(), and case_when()
adsl <- adsl %>% mutate(
  SAFFL = case_when(ACTARMCD == "TRTA" ~ "Y"),
  SAFFLN = case_when(ACTARMCD == "TRTA" ~ 1),
  TRT01A = case_when(ACTARMCD == "TRTA" ~ ACTARMCD),
  TRT01AN = case_when(ACTARMCD == "TRTA" ~ 1),
  ITTFL = case_when(ACTARMCD == "TRTA" ~ "Y"),
  FASFL = case_when(ACTARMCD == "TRTA" ~ "Y"),
  RANDFL = case_when(ACTARMCD == "TRTA" ~ "Y"),
  PPFL = case_when(ACTARMCD == "TRTA" ~ "Y")
  # Same condition written 8 times: high QC risk
)
If the condition ever changes, you must update it in 8 places. Miss one and your derivation silently diverges, which is a real risk in regulated environments.
The sasif Solution
# Create sample ADSL data
adsl <- data.table(
USUBJID = c("S01", "S02", "S03", "S04"),
ACTARMCD = c("TRTA", "TRTA", "SCRNFAIL", "TRTA"),
RFSTDTC = c("2024-01-10", "2024-01-15", NA, "2024-01-20"),
RFENDTC = c("2024-06-10", "2024-06-15", NA, "2024-06-20")
)
# ✅ sasif — condition written ONCE, governs all assignments
ADSL <- data_step(adsl,
if_do(ACTARMCD == "TRTA",
SAFFL = "Y",
SAFFLN = 1,
TRT01A = "Treatment A",
TRT01AN = 1,
TRTSDT = as.Date(RFSTDTC, "%Y-%m-%d"),
TRTEDT = as.Date(RFENDTC, "%Y-%m-%d"),
ITTFL = "Y",
FASFL = "Y",
RANDFL = "Y",
PPFL = "Y"
)
)
print(ADSL[, .(USUBJID, ACTARMCD, SAFFL, TRT01A, TRT01AN, ITTFL, FASFL)])
#> USUBJID ACTARMCD SAFFL TRT01A TRT01AN ITTFL FASFL
#> <char> <char> <char> <char> <num> <char> <char>
#> 1: S01 TRTA Y Treatment A 1 Y Y
#> 2: S02 TRTA Y Treatment A 1 Y Y
#> 3: S03 SCRNFAIL <NA> <NA> NA <NA> <NA>
#> 4: S04 TRTA Y Treatment A 1 Y Y
All 10 variables are derived from a single condition block. Clean, readable, and audit-friendly, exactly like SAS IF ... THEN DO.
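For independent QC, the same block can be reproduced in base R by evaluating the condition once and reusing the resulting logical mask. The following is a minimal cross-check sketch, not a description of how sasif works internally; the names `qc` and `is_trta` are illustrative and only a subset of the ten variables is shown.

```r
# Base R cross-check of the Scenario 1 block (illustrative; not sasif internals)
qc <- data.frame(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  ACTARMCD = c("TRTA", "TRTA", "SCRNFAIL", "TRTA"),
  RFSTDTC  = c("2024-01-10", "2024-01-15", NA, "2024-01-20"),
  stringsAsFactors = FALSE
)

# Evaluate the condition once; %in% yields FALSE (never NA) for missing codes
is_trta <- qc$ACTARMCD %in% "TRTA"

# Initialise every derived variable as missing, then fill matching rows only
qc$SAFFL   <- NA_character_
qc$TRT01A  <- NA_character_
qc$TRT01AN <- NA_real_
qc$TRTSDT  <- as.Date(NA)

qc$SAFFL[is_trta]   <- "Y"
qc$TRT01A[is_trta]  <- "Treatment A"
qc$TRT01AN[is_trta] <- 1
qc$TRTSDT[is_trta]  <- as.Date(qc$RFSTDTC[is_trta], format = "%Y-%m-%d")

print(qc[, c("USUBJID", "ACTARMCD", "SAFFL", "TRT01A", "TRT01AN", "TRTSDT")])
```

Comparing these columns against the sasif output gives a second, independently programmed derivation of the same rule.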
Scenario 2 — ADSL: Multi-Arm Treatment Assignment (IF / ELSE IF / ELSE)
When a study has multiple treatment arms, use the full IF / ELSE IF / ELSE chain. The first matching condition wins — all others are skipped:
adsl2 <- data.table(
USUBJID = c("S01", "S02", "S03", "S04", "S05"),
ACTARMCD = c("TRTA", "TRTB", "TRTC", "TRTA", "TRTB"),
AGE = c(35, 52, 67, 44, 58)
)
ADSL2 <- data_step(adsl2,
if_do(ACTARMCD == "TRTA",
TRT01A = "Treatment A",
TRT01AN = 1
),
else_if_do(ACTARMCD == "TRTB",
TRT01A = "Treatment B",
TRT01AN = 2
),
else_do(
TRT01A = "Placebo",
TRT01AN = 99
)
)
print(ADSL2[, .(USUBJID, ACTARMCD, TRT01A, TRT01AN)])
#> USUBJID ACTARMCD TRT01A TRT01AN
#> <char> <char> <char> <num>
#> 1: S01 TRTA Treatment A 1
#> 2: S02 TRTB Treatment B 2
#> 3: S03 TRTC Placebo 99
#> 4: S04 TRTA Treatment A 1
#> 5: S05 TRTB Treatment B 2
Notice that both TRT01A (character label) and TRT01AN (numeric code) are derived together under each condition, so no repetition is needed.
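The first-match-wins rule can be pictured as a mask of rows that no earlier branch has claimed. The base R sketch below is an assumed model of those semantics, not sasif's implementation; the `unclaimed` and `hit` names are hypothetical.

```r
# Sketch of first-match-wins semantics in base R (assumed model, not sasif internals)
actarmcd <- c("TRTA", "TRTB", "TRTC", "TRTA", "TRTB")

trt01a  <- rep(NA_character_, length(actarmcd))
trt01an <- rep(NA_real_, length(actarmcd))

# Rows that no branch has claimed yet; shrinks after each condition
unclaimed <- rep(TRUE, length(actarmcd))

# IF: only unclaimed rows can match
hit <- unclaimed & actarmcd %in% "TRTA"
trt01a[hit]  <- "Treatment A"
trt01an[hit] <- 1
unclaimed <- unclaimed & !hit

# ELSE IF: rows claimed above are skipped even if they would match
hit <- unclaimed & actarmcd %in% "TRTB"
trt01a[hit]  <- "Treatment B"
trt01an[hit] <- 2
unclaimed <- unclaimed & !hit

# ELSE: everything still unclaimed
trt01a[unclaimed]  <- "Placebo"
trt01an[unclaimed] <- 99

print(data.frame(actarmcd, trt01a, trt01an))
```

Because each branch is restricted to `unclaimed` rows, the label and the numeric code for a row always come from the same branch.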
Scenario 3 — ADSL: Age Categorisation
Derive both the age category label and its numeric code in one chain:
adsl3 <- data.table(
USUBJID = c("S01", "S02", "S03", "S04", "S05"),
AGE = c(32, 45, 58, 71, 80)
)
ADSL3 <- data_step(adsl3,
if_do(AGE <= 45,
AGECAT = "YOUNG",
AGECATN = 1
),
else_if_do(AGE <= 70,
AGECAT = "MIDDLE",
AGECATN = 2
),
else_do(
AGECAT = "OLD",
AGECATN = 3
)
)
print(ADSL3[, .(USUBJID, AGE, AGECAT, AGECATN)])
#> USUBJID AGE AGECAT AGECATN
#> <char> <num> <char> <num>
#> 1: S01 32 YOUNG 1
#> 2: S02 45 YOUNG 1
#> 3: S03 58 MIDDLE 2
#> 4: S04 71 OLD 3
#> 5: S05 80 OLD 3
Scenario 4 — ADLB: Laboratory Value Categorisation
A common ADaM derivation — categorise lab values as LOW, NORMAL, or HIGH based on reference ranges, and derive both the character and numeric category together:
adlb <- data.table(
USUBJID = c("S01", "S01", "S02", "S02", "S03"),
LBTESTCD = c("ALB", "ALB", "ALB", "ALB", "ALB"),
AVAL = c(2.8, 4.2, 5.6, 3.5, 1.9),
ANRLO = c(3.5, 3.5, 3.5, 3.5, 3.5),
ANRHI = c(5.0, 5.0, 5.0, 5.0, 5.0)
)
ADLB <- data_step(adlb,
if_do(LBTESTCD == "ALB" & AVAL < ANRLO,
ALBCAT = "LOW",
ALBCATN = 1
),
else_if_do(LBTESTCD == "ALB" & AVAL > ANRHI,
ALBCAT = "HIGH",
ALBCATN = 2
),
else_do(
ALBCAT = "NORMAL",
ALBCATN = 3
)
)
print(ADLB[, .(USUBJID, LBTESTCD, AVAL, ANRLO, ANRHI, ALBCAT, ALBCATN)])
#> USUBJID LBTESTCD AVAL ANRLO ANRHI ALBCAT ALBCATN
#> <char> <char> <num> <num> <num> <char> <num>
#> 1: S01 ALB 2.8 3.5 5 LOW 1
#> 2: S01 ALB 4.2 3.5 5 NORMAL 3
#> 3: S02 ALB 5.6 3.5 5 HIGH 2
#> 4: S02 ALB 3.5 3.5 5 NORMAL 3
#> 5: S03 ALB 1.9 3.5 5 LOW 1
Both ALBCAT and ALBCATN are always consistent: they are derived from the same condition, so they can never diverge.
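With a chain like this, it is worth checking which rows reach the final else branch. In base R, comparing against a missing value returns NA rather than FALSE, so a record with a missing AVAL satisfies neither the LOW nor the HIGH condition. How sasif treats such rows is not covered in this vignette; the sketch below only shows the base R comparison behaviour and one way to guard the NORMAL category explicitly, using made-up values.

```r
# Base R behaviour of comparisons with missing lab results (illustrative data)
aval  <- c(2.8, 4.2, NA, 3.5)
anrlo <- 3.5
anrhi <- 5.0

low  <- aval < anrlo   # TRUE  FALSE NA FALSE
high <- aval > anrhi   # FALSE FALSE NA FALSE

# Explicit guard: only a non-missing result inside the range counts as NORMAL
in_range <- !is.na(aval) & aval >= anrlo & aval <= anrhi

albcat <- rep(NA_character_, length(aval))
albcat[which(low)]  <- "LOW"     # which() drops NA positions
albcat[which(high)] <- "HIGH"
albcat[in_range]    <- "NORMAL"

print(albcat)
```

Stating the NORMAL condition explicitly, rather than relying on a catch-all branch, keeps records with missing results from being categorised by accident.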
Scenario 5 — ADAE: Treatment-Emergent Flag (TRTEMFL)
Flag adverse events that started on or after the treatment start date and on or before the treatment end date:
adae <- data.table(
USUBJID = c("S01", "S01", "S02", "S02", "S03"),
AEDECOD = c("Headache", "Nausea", "Fatigue", "Dizziness", "Rash"),
ASTDT = as.Date(c("2024-01-15", "2023-12-01",
"2024-01-20", "2024-02-10", "2024-01-25")),
TRTSDT = as.Date(c("2024-01-10", "2024-01-10",
"2024-01-15", "2024-01-15", "2024-01-20")),
TRTEDT = as.Date(c("2024-06-10", "2024-06-10",
"2024-06-15", "2024-06-15", "2024-06-20"))
)
ADAE <- data_step(adae,
if_do(ASTDT >= TRTSDT & ASTDT <= TRTEDT,
TRTEMFL = "Y",
TRTEMA = AEDECOD
)
)
print(ADAE[, .(USUBJID, AEDECOD, ASTDT, TRTSDT, TRTEMFL)])
#> USUBJID AEDECOD ASTDT TRTSDT TRTEMFL
#> <char> <char> <Date> <Date> <char>
#> 1: S01 Headache 2024-01-15 2024-01-10 Y
#> 2: S01 Nausea 2023-12-01 2024-01-10 <NA>
#> 3: S02 Fatigue 2024-01-20 2024-01-15 Y
#> 4: S02 Dizziness 2024-02-10 2024-01-15 Y
#> 5: S03 Rash 2024-01-25 2024-01-20 Y
Scenario 6 — DELETE: Remove Unwanted Records
Use delete_if() to remove rows explicitly — mirrors the
SAS DELETE statement and makes the intent clear in the
code:
adlb2 <- data.table(
USUBJID = c("S01", "S02", "S03", "S04", "S05"),
LBTESTCD = c("ALB", NA, "ALB", "ALB", NA),
VISIT = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
AVAL = c(4.2, 3.8, 5.1, 4.0, 3.5)
)
ADLB2 <- data_step(adlb2,
delete_if(is.na(LBTESTCD)),
delete_if(VISIT == "UNSCHEDULED")
)
print(ADLB2)
#> USUBJID LBTESTCD VISIT AVAL
#> <char> <char> <char> <num>
#> 1: S01 ALB WEEK 1 4.2
#> 2: S04 ALB WEEK 2 4.0
Only records with valid test codes and scheduled visits are retained.
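The retained rows can be cross-checked with a plain base R subset. This is a QC sketch under the assumption that the two deletion rules are simply combined with OR; the names `lb`, `drop_row`, and `kept` are illustrative.

```r
# Base R cross-check of the two deletion rules (illustrative, not sasif internals)
lb <- data.frame(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  LBTESTCD = c("ALB", NA, "ALB", "ALB", NA),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
  AVAL     = c(4.2, 3.8, 5.1, 4.0, 3.5),
  stringsAsFactors = FALSE
)

# A row is dropped if either rule applies; %in% avoids NA in the mask
drop_row <- is.na(lb$LBTESTCD) | lb$VISIT %in% "UNSCHEDULED"
kept <- lb[!drop_row, ]

print(kept$USUBJID)
```

The subjects left in `kept` should match the USUBJID column of the printed ADLB2 output.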
Scenario 7 — Independent Flags (if_independent)
Use if_independent() when conditions are
not mutually exclusive — each condition is evaluated on
its own, so multiple flags can apply to the same row simultaneously:
adsl4 <- data.table(
USUBJID = c("S01", "S02", "S03", "S04"),
AGE = c(30, 68, 45, 72),
WEIGHTKG = c(48, 72, 55, 43),
DIABFL = c("N", "Y", "N", "Y")
)
ADSL4 <- data_step(adsl4,
if_independent(AGE > 65, SENIORFL = "Y"),
if_independent(WEIGHTKG < 50, LOWWTFL = "Y"),
if_independent(DIABFL == "Y", COMORBFL = "Y")
)
print(ADSL4)
#> USUBJID AGE WEIGHTKG DIABFL SENIORFL LOWWTFL COMORBFL
#> <char> <num> <num> <char> <char> <char> <char>
#> 1: S01 30 48 N <NA> Y <NA>
#> 2: S02 68 72 Y Y <NA> Y
#> 3: S03 45 55 N <NA> <NA> <NA>
#> 4: S04 72 43 Y Y Y Y
Subject S04 (age 72, weight 43, diabetic) receives all three flags because all three conditions are TRUE for that row.
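The same independence can be sketched in base R, where each flag is computed from its own condition with no reference to the others. This is an illustrative model only; the helper `flag()` is hypothetical and is not part of sasif.

```r
# Independent flags in base R: each condition is evaluated on its own
age      <- c(30, 68, 45, 72)
weightkg <- c(48, 72, 55, 43)
diabfl   <- c("N", "Y", "N", "Y")

# Hypothetical helper: "Y" where the condition is TRUE, missing otherwise
flag <- function(cond) ifelse(cond %in% TRUE, "Y", NA_character_)

seniorfl <- flag(age > 65)
lowwtfl  <- flag(weightkg < 50)
comorbfl <- flag(diabfl == "Y")

# Number of flags set per subject
n_flags <- rowSums(!is.na(cbind(seniorfl, lowwtfl, comorbfl)))
print(data.frame(age, weightkg, diabfl, seniorfl, lowwtfl, comorbfl, n_flags))
```

A per-row flag count is a quick way to confirm that rows can carry zero, one, or several flags, which a first-match-wins chain would not allow.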
Key Principle: When to Use Which Function
| Situation | Use |
|---|---|
| First matching condition should win | if_do() + else_if_do() + else_do() |
| Multiple conditions can apply to same row | if_independent() |
| Remove rows from dataset | delete_if() |
Important: Do not mix if_do() chains with if_independent() on the same variable. if_independent() runs after the chain and will overwrite earlier assignments. Use one approach consistently per variable.
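The overwrite described in this note can be emulated in base R by applying a chain first and then a separate assignment to the same variable. This sketch only mirrors the stated order of evaluation; the "SENIOR" label and the variable names are hypothetical.

```r
# Emulating "chain first, independent assignment second" on one variable
age <- c(32, 58, 71)

# Step 1: first-match-wins chain assigns the category
agecat <- ifelse(age <= 45, "YOUNG", ifelse(age <= 70, "MIDDLE", "OLD"))

# Step 2: a later, independent assignment targets the SAME variable
agecat_after <- agecat
agecat_after[age > 65] <- "SENIOR"

print(data.frame(age, agecat, agecat_after))
```

The third subject's category from the chain is replaced by the later assignment, which is why each variable should be derived by one approach only.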
Summary
sasif brings three key benefits to clinical R programming:
- One condition, multiple assignments: no repeated logic, no QC risk of conditions diverging
- Familiar SAS syntax: IF / ELSE IF / ELSE control flow that clinical programmers already know
- data.table performance: fully vectorized, no row loops, scales to millions of rows
For more information, see the package documentation.