Overview

This tutorial demonstrates how to build a machine learning pipeline to predict vaccine response outcomes using transcriptomic data from the Immune Signatures Data Resource (ISDR). The ISDR is a standardized compendium of systems vaccinology datasets that integrates pre- and post-vaccination gene expression profiles alongside matched immune response measurements from multiple vaccine studies.

Citation: Diray-Arce, J., Miller, H.E.R., Henrich, E. et al. The Immune Signatures data resource, a compendium of systems vaccinology datasets. Sci Data 9, 635 (2022). https://doi.org/10.1038/s41597-022-01714-7 [doi.org]

We walk through each step from data preparation to model evaluation, focusing on:

Preparing phenotype and expression data from ISDR
Filtering, normalizing, and structuring data for analysis
Training and validating a Random Forest classifier to distinguish between high and low responders
Evaluating model performance using ROC curves and feature importance
Visualizing results by vaccine type and running subgroup analyses (e.g., influenza)

This tutorial supports reproducible research and serves as a reference pipeline for systems vaccinology studies using public immunology data. It can be adapted to other datasets and extended to test additional classifiers.

Setup

Load essential R packages for data processing, modeling, and visualization
Configure knitting options to suppress messages and warnings
Set working directory to your data folder (next chunk)

Step 1: Data Preparation

Combine phenotype and immune response data into a single dataset
Filter participants by age and availability of response measurements
Create a unified identifier for pathogen and vaccine type

1.1 Load and Filter Data

Update the file paths to point to your local directory where all_immdata_with_response.rds and all_norm_eset.rds are stored
Read immune response data (all_immdata_with_response.rds)
Read normalized expression set (all_norm_eset.rds)
Filter participants under 50 years old
Retain participants with both phenotype and response data
Merge immune response into phenotype data
Create a combined pathogen_vaccine_type identifier

immdataR<-readRDS(file.path("/Users/joannarce/BCH Dropbox/Joann Arce/NAIRR/ISDR",  "all_immdata_with_response.rds"))
eset<-readRDS(file.path("/Users/joannarce/BCH Dropbox/Joann Arce/NAIRR/ISDR",  "all_norm_eset.rds"))
youngIDs<-eset@phenoData@data[which(eset@phenoData@data$age_imputed<50),"participant_id"]
youngIDswithResponse<-intersect(youngIDs,immdataR$participant_id)
eset@phenoData@data<-eset@phenoData@data[which(eset@phenoData@data$participant_id %in% youngIDswithResponse),]
immdataR_ordered<-eset@phenoData@data %>% left_join(immdataR, by = "participant_id")
MissingCols<-which(colnames(immdataR_ordered)%in%colnames(eset@phenoData@data)==FALSE)
cols_to_add<- immdataR_ordered %>% select(all_of(MissingCols))
eset@phenoData@data<- bind_cols(eset@phenoData@data,cols_to_add)
exprs_matrix <-Biobase::exprs(eset)[,which(colnames(Biobase::exprs(eset))%in%eset@phenoData@data$uid)]
rownames(eset@phenoData@data) <- eset@phenoData@data$uid
eset@phenoData@data <- eset@phenoData@data[colnames(exprs_matrix), ]
eset$pathogen_vaccine_type=paste0(eset$pathogen,"_",eset$vaccine_type)

1.2 Data Cleaning

Retain only pre-vaccination samples (timepoint = 0)
Exclude studies with fewer than 10 samples to maintain statistical power
Remove genes with any missing expression values
Drop known outlier samples by UID

#Timepoints of interest
#tp_int=c(-7,0)
tp_int=c(0)
# Get sample indices for timepoints of interest
ind=lapply(tp_int, function(x){ which(eset$study_time_collected==x)})

# Retain only samples from timepoints of interest
eset=eset[,Reduce(union,ind)]

#Recompute timepoint indices after removing extraneous timepoints
ind=lapply(tp_int, function(x){which(eset$study_time_collected==x)})

# Remove samples from a single study with fewer than sample_cutoff samples at any timepoint
sample_cutoff = 10
matrix_uni_tp=lapply(ind,function(x){unique(eset[,x]$matrix)})
matrix_ind=lapply(1:length(ind),function(x){
  lapply(1:length(matrix_uni_tp[[x]]), function(y){which(matrix_uni_tp[[x]][[y]]==eset[,ind[[x]]]$matrix)})})

# Create empty vector
ind_cut_all=vector()

for (i in 1:length(matrix_ind)) {
  ind_cut=which(sapply(matrix_ind[[i]],length)<sample_cutoff)
  if (is_empty(ind_cut)==FALSE) {
    for (j in 1:length(ind_cut)) {
      ind_cut_all=c(ind_cut_all,ind[[i]][matrix_ind[[i]][[ind_cut[j]]]])
    }
  }
}

if (is_empty(ind_cut_all)==FALSE) {
  eset=eset[,-ind_cut_all]
}

# Recompute timepoint indices after removing samples
tp_int=unique(eset$study_time_collected[which(eset$study_time_collected>=0)])
ind=lapply(tp_int, function(x) {which(eset$study_time_collected==x)})

# Get Pheno Data
PD<-pData(eset)

eset=eset[complete.cases(Biobase::exprs(eset)),]
eset=eset[,eset$uid!="SUB187457.1289_0_Days_BS974314"]

1.3 Wrangling for Machine Learning

Reshape expression data from wide to long format for merging
Merge with phenotype information via sample UID
Cast data back to wide format (data.ML) for ML: rows = gene-time points, cols = participants
Extract PatientInfo table with demographics and outcomes

PD.use <- droplevels(pData(eset))
EXP <- Biobase::exprs(eset)
EXP <- EXP[complete.cases(EXP), ]

EXP.melt <- reshape2::melt(EXP)
colnames(EXP.melt)[2] <- "uid"
EXP.melt$uid <- as.character(EXP.melt$uid)

EXP.PD <- inner_join(EXP.melt, PD.use, by = "uid")
EXP.PD$Gene_Time <- paste0(EXP.PD$Var1, "_", EXP.PD$study_time_collected)

data.ML <- dcast(EXP.PD, Var1 + Gene_Time + study_time_collected ~ participant_id, value.var = "value")
rownames(data.ML) <- data.ML$Gene_Time
data.ML <- data.ML[, -c(1:3)]

PatientInfo <- EXP.PD[!duplicated(EXP.PD$participant_id),
  c("age_imputed", "gender", "pathogen_vaccine_type", "maxRBA_p30", "MFC_p30", "participant_id", "maxRBA", "MFC")]
rownames(PatientInfo) <- PatientInfo$participant_id

Step 2: Train Random Forest Classifier

Define responder status levels for classification
Select top variable genes as model features
Configure cross-validation and tuning parameters
Train Random Forest to optimize ROC AUC

2.1 Define High vs Low Responders

We filter PatientInfo to include only participants labeled as highResponder or lowResponder, and set highResponder as the reference level for classification.

PatientInfo %>% 
  filter(MFC_p30%in%c("lowResponder","highResponder")) %>%
  droplevels() %>%
  mutate(MFC_p30=fct_relevel(MFC_p30, c("highResponder", "lowResponder"))) -> PatientInfo

2.2 Assemble Training Matrices

Identify participants common to phenotype and gene-expression data
Select top 500 most variable genes for modeling
Form feature matrix X.Train and response vector Y.Train

DevID<-rownames(PatientInfo)
DevID<-intersect(DevID, colnames(data.ML))
X.Dev<-data.ML[,DevID]
rownames(X.Dev)<-strsplit2(rownames(X.Dev), "_")[,1]#didn't work, have genes at two timepoints

Y.Dev<-PatientInfo[DevID, 'MFC_p30']
names(Y.Dev)<-DevID

## Keep top 500 most varying genes ##
VarVec<-sort(apply(X.Dev, 1, var), decreasing = T)
keep.genes<-names(VarVec[1:500])

X.Train.t1<-t(X.Dev[keep.genes,])
Y.Train1<-PatientInfo[rownames(X.Train.t1), 'MFC_p30']
names(Y.Train1)<-rownames(X.Train.t1)

2.3 Model Training with 10-Fold CV

Set random seed (1987) for reproducibility
Define 10-fold CV with probability estimation and AUC summary
Specify mtry grid for Random Forest tuning
Center and scale features before fitting

SEED<-1987 # a great scientist was born this year

set.seed(SEED)
ctrl=trainControl(method = "adaptive_cv", 
                  number=10,
                  adaptive = list(min = 5, alpha = 0.05,
                                  method = "gls", complete = TRUE),
                  selectionFunction = 'best',
                  search="grid",
                  returnResamp = "final",
                  classProbs = TRUE,
                  returnData = F,
                  verboseIter = F,
                  summaryFunction = twoClassSummary,
                  savePredictions = T)

2.4 Run model training for random forest classifier

Training data is centered and scaled.
A tune length of 10 is used.
Model is trained to maximize the AUROC.
Parallelization code is provided for mac or windows.

## Parallel on mac
# doMC::registerDoMC(detectCores()-2)

## Parallel on windows
#library(doParallel)
#cl<-makePSOCKcluster(detectCores()-2)
#registerDoParallel(cl)

set.seed(SEED)
model_list <- train(x=X.Train.t1,
                    y=Y.Train1[rownames(X.Train.t1)],
                    trControl=ctrl,
                    method="rf",
                    metric="ROC", 
                    tuneLength=10,
                    maximize=TRUE,
                    preProc = c("center", "scale"))

#stopCluster(cl)

#save(model_list, file="Top500_Var_Model.rda")

2.5 Get Model Predictions and Variable Importance

Extracts 10-CV predictions, Training predictions and feature importance

## Feature importance ##
db.imp<-arrange(varImp(model_list)$importance, desc(Overall))
#write.csv(db.imp, file="Top500_Var_VarImp.csv")

## Model Predictions 10-CV ##
db.preds<-model_list$pred %>%
  mutate(participant_id=rownames(X.Train.t1)[rowIndex],
         pathogen_vaccine_type=PatientInfo[rownames(X.Train.t1)[rowIndex],"pathogen_vaccine_type"])

preds.train <- predict(model_list, newdata=X.Train.t1, type="prob")
preds.train <- data.frame(preds.train) %>% mutate(obs=Y.Train1[rownames(preds.train)])

Step 3: Model Evaluation

Extract feature importance to rank predictive genes
Generate cross-validated probability predictions
Plot overall ROC curve and compute AUC with confidence intervals

3.1 Overall Predictions & ROC Curve

We retrieve the top predictive genes, generate cross-validated probability predictions, and plot the ROC curve with its confidence interval.

#dev.new()
#par(pty="s")

### 10-CV 
ROC.OBJ1<-pROC::roc(response = db.preds$obs, 
                    predictor = db.preds$highResponder, 
                    levels=c("lowResponder", "highResponder"),
                    direction="<",
                    plot=TRUE, legacy.axes=TRUE, main= "Overall ROC", percent=TRUE,
                    xlab="False Positive %", ylab="True Positive %", identity=FALSE,
                    #main=paste(time, type, meth, bag),
                    col="black",lwd=4, lty=1, print.auc.x=77, print.auc.y=28,
                    print.auc.cex=0.9,
                    ci=TRUE)

#plot(ROC.OBJ1, legacy.axes = TRUE, percent = TRUE, col = "black", lwd = 4,lty = 1, print.auc = TRUE)

### Training-this is exactly the same data as test data
#ROC.OBJ2<-pROC::roc(response = preds.train$obs, 
#                    levels=c("lowResponder", "highResponder"),
#                    predictor = preds.train$highResponder, 
#                    direction="<",
#                    legacy.axes=TRUE, 
#                    percent=TRUE,
#                    ci=TRUE)

#plot(ROC.OBJ2, legacy.axes = TRUE,
#         percent = TRUE,
#         col = "gray60",
#         lwd = 4,
#         lty = 3,
#         print.auc = TRUE)

ci.auc (ROC.OBJ1)

## 95% CI: 65.55%-75.89% (DeLong)

This section of code uses the 10-CV predictions in db.preds to calculate performance metrics for each vaccine.

db.preds.filtered <- db.preds %>%  group_by(pathogen_vaccine_type) %>%  filter(n_distinct(obs) == 2) %>% ungroup()
perf.stats.by.vaccine<-plyr::ddply(.data=db.preds.filtered,
                             .variable=c("pathogen_vaccine_type"),
                             .fun=function(df){
                               
                               CONF.MAT<-caret::confusionMatrix(data=relevel(df$pred, ref="highResponder"),
                                                                reference=relevel(df$obs, ref="highResponder"))
                               
                               ROC.OBJ<-pROC::roc(response = df$obs,levels=c("lowResponder", "highResponder"),
                                                  direction="<", 
                                                  predictor = df$highResponder)
                               
                               AUC.CI<-as.vector(pROC::ci.auc(ROC.OBJ))
                               names(AUC.CI)<-c("AUC.CI.LOW", "AUC", "AUC.CI.HIGH")
                               
                               BRIER = ModelMetrics::brier(actual = (as.numeric(df$obs)-1), predicted = df$highResponder)
                               
                               N = length(df$pathogen_vaccine_type)
                               
                               PROP_TEST<-prop.test((CONF.MAT$overall["Accuracy"]*N)[1], N)
                               
                               FISHER_TEST<-fisher.test(df$pred, df$obs)
                               
                               VEC=c(CONF.MAT$overall, CONF.MAT$byClass, AUC.CI, 
                                     Brier=BRIER, n_vac=N, n_correct=(CONF.MAT$overall["Accuracy"]*N)[[1]],
                                     n_fail=((1-CONF.MAT$overall["Accuracy"])*N)[[1]],
                                     PropTestPvalue=PROP_TEST$p.value,
                                     FisherTestPvalue=FISHER_TEST$p.value)
                               
                               return(VEC)})

perf.stats.by.vaccine<-arrange(perf.stats.by.vaccine, desc(AUC))

perf.stats.by.vaccine$n_correct<-perf.stats.by.vaccine$Accuracy*perf.stats.by.vaccine$n_vac
perf.stats.by.vaccine$n_fail<-perf.stats.by.vaccine$n_vac-perf.stats.by.vaccine$n_correct

perf.stats.by.vaccine<-arrange(perf.stats.by.vaccine, desc(n_vac))

perf.stats.by.vaccine

This section of code uses ggplot() to show the AUC and confidence intervals for vaccine.

ggplot(perf.stats.by.vaccine, aes(x=reorder(pathogen_vaccine_type, -AUC),
                       y=AUC, fill=pathogen_vaccine_type))+
  geom_bar(stat='identity', show.legend = F)+
  scale_fill_brewer(palette = "Set3")+
  geom_errorbar(aes(ymin = AUC.CI.LOW, ymax = AUC.CI.HIGH),width = 0.5)+
  labs(y="AUC (CI)", x = "Vaccine")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size=8))+
  geom_text(aes(y=0.05, label=n_vac))+
  ylim(0,1)

Optional: 3.2 Additional Visualizations

3.2.1 Principal Component Analysis of Top Genes

X.Train <- t(X.Dev[keep.genes, ])
Y.Train <- PatientInfo[rownames(X.Train), "MFC_p30"]
# Perform PCA on top variable genes from training data
pca_res <- prcomp(X.Train, scale. = TRUE)
# Build data frame for plotting
pca_df <- data.frame(
  PC1 = pca_res$x[,1],
  PC2 = pca_res$x[,2],
  Vaccine = PatientInfo[rownames(X.Train), "pathogen_vaccine_type"],
  Responder = PatientInfo[rownames(X.Train), "MFC_p30"]
)
# Plot PCA
ggplot(pca_df, aes(x = PC1, y = PC2, color = Vaccine, shape = Responder)) +
  geom_point(size = 3) +
  theme_bw() +
  labs(
    title = "PCA of Top 500 Most Variable Genes",
    x = paste0("PC1 (", round(summary(pca_res)$importance[2,1]*100,1), "% variance)"),
    y = paste0("PC2 (", round(summary(pca_res)$importance[2,2]*100,1), "% variance)")
  )

3.2.2 Predicted Probability Distribution and Confusion Matrix

# Predicted probability distribution
ggplot(db.preds, aes(x = highResponder, fill = obs)) +
  geom_density(alpha = 0.5) +
  theme_bw() +
  labs(title = "Predicted Probability Distribution by Class",
       x = "Predicted Probability for High Responder", y = "Density")

# Confusion matrix heatmap
library(reshape2)
pred_labels <- factor(ifelse(db.preds$highResponder > 0.5, "highResponder", "lowResponder"),
                       levels = c("lowResponder", "highResponder"))
true_labels <- factor(db.preds$obs, levels = c("lowResponder", "highResponder"))
cm_table <- table(Predicted = pred_labels, Actual = true_labels)
cm_melt <- melt(cm_table)
ggplot(cm_melt, aes(x = Actual, y = Predicted, fill = value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = value), size = 5) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal() +
  labs(title = "Confusion Matrix Heatmap")

3.2.3 Top 20 Most Variable Genes Across Entire Dataset

var_df_all <- data.frame(Gene = names(VarVec), Variance = VarVec)

# Stick-dot plot for variance
ggplot(var_df_all[1:20, ], aes(x = reorder(Gene, Variance), y = Variance)) +
  geom_segment(aes(xend = Gene, y = 0, yend = Variance), color = "grey") +
  geom_point(size = 3, color = "coral") +
  coord_flip() +
  theme_bw() +
  labs(title = "Top 20 Most Variable Genes - Entire Dataset", x = "Gene", y = "Variance")

3.2.4 Top 20 Feature Importance from Overall Model

# Re-extract feature importance for clarity
db.imp <- arrange(varImp(model_list)$importance, desc(Overall))
db.imp$Feature <- rownames(db.imp)

# Stick-dot plot for feature importance
ggplot(db.imp[1:20, ], aes(x = reorder(Feature, Overall), y = Overall)) +
  geom_segment(aes(xend = Feature, y = 0, yend = Overall), color = "grey") +
  geom_point(size = 3, color = "steelblue") +
  coord_flip() +
  theme_bw() +
  labs(title = "Top 20 Feature Importance - Entire Dataset", x = "Gene", y = "Importance")

Optional: 3.3 Influenza (Inactivated) Subgroup Tutorial

3.3.1 Subset and Feature Selection

customGrid <- expand.grid(mtry = c(2, 5, 10))
influenza_ids <- PatientInfo %>% filter(pathogen_vaccine_type == "Influenza_Inactivated") %>% rownames()
influenza_ids <- intersect(influenza_ids, colnames(data.ML))

X.Influenza <- data.ML[, influenza_ids]
rownames(X.Influenza) <- strsplit2(rownames(X.Influenza), "_")[, 1]

VarVec_influenza <- sort(apply(X.Influenza, 1, var), decreasing = TRUE)
keep.genes.influenza <- names(VarVec_influenza[1:500])

X.Influenza.Train <- t(X.Influenza[keep.genes.influenza, ])
Y.Influenza.Train <- PatientInfo[rownames(X.Influenza.Train), "MFC_p30"]

3.3.2 Visualize Top Varying Genes

var_df <- data.frame(Gene = names(VarVec_influenza), Variance = VarVec_influenza)

ggplot(var_df[1:20, ], aes(x = reorder(Gene, Variance), y = Variance)) +
  geom_bar(stat = "identity", fill = "coral") +
  coord_flip() +
  theme_bw() +
  labs(title = "Top 20 Most Variable Genes - Influenza (Inactivated)", x = "Gene", y = "Variance")

3.3.3 Train Random Forest Model

model_influenza <- train(
  x = X.Influenza.Train,
  y = Y.Influenza.Train,
  trControl = ctrl,
  method = "rf",
  metric = "ROC",
  tuneGrid = customGrid,
  preProc = c("center", "scale")
)

3.3.4 Visualize Feature Importance

db.imp.influenza <- arrange(varImp(model_influenza)$importance, desc(Overall))
db.imp.influenza$Feature <- rownames(db.imp.influenza)

ggplot(db.imp.influenza[1:20, ], aes(x = reorder(Feature, Overall), y = Overall)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  theme_bw() +
  labs(title = "Top 20 Feature Importance - Influenza (Inactivated)", x = "Gene", y = "Importance")

3.3.5 Subgroup Predictions and ROC Curve

db.preds.influenza <- model_influenza$pred %>% mutate(participant_id = rownames(X.Influenza.Train)[rowIndex])

if (length(unique(db.preds.influenza$obs)) == 2) {
  pROC::roc(
    response = db.preds.influenza$obs,
    predictor = db.preds.influenza$highResponder,
    levels = c("lowResponder", "highResponder"),
    direction = "<",
    plot = TRUE,
    legacy.axes = TRUE,
    percent = TRUE,
    col = "purple",
    lwd = 3,
    main = "ROC Curve - Influenza (Inactivated) Subgroup"
  )
} else {
  print("Cannot compute ROC: Only one response class present.")
}

## 
## Call:
## roc.default(response = db.preds.influenza$obs, predictor = db.preds.influenza$highResponder,     levels = c("lowResponder", "highResponder"), percent = TRUE,     direction = "<", plot = TRUE, legacy.axes = TRUE, col = "purple",     lwd = 3, main = "ROC Curve - Influenza (Inactivated) Subgroup")
## 
## Data: db.preds.influenza$highResponder in 57 controls (db.preds.influenza$obs lowResponder) < 69 cases (db.preds.influenza$obs highResponder).
## Area under the curve: 70.09%

Session Info

sessionInfo()

## R version 4.5.0 (2025-04-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rlang_1.1.6          ModelMetrics_1.2.2.2 pROC_1.19.0.1       
##  [4] reshape2_1.4.4       randomForest_4.7-1.2 limma_3.64.3        
##  [7] caret_7.0-1          lattice_0.22-7       ggplot2_4.0.0       
## [10] forcats_1.0.1        dplyr_1.1.4          data.table_1.17.8   
## [13] Biobase_2.68.0       BiocGenerics_0.54.0  generics_0.1.4      
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        xfun_0.53           bslib_0.9.0        
##  [4] recipes_1.3.1       vctrs_0.6.5         tools_4.5.0        
##  [7] stats4_4.5.0        parallel_4.5.0      proxy_0.4-27       
## [10] tibble_3.3.0        pkgconfig_2.0.3     Matrix_1.7-4       
## [13] RColorBrewer_1.1-3  S7_0.2.0            lifecycle_1.0.4    
## [16] compiler_4.5.0      farver_2.1.2        stringr_1.5.2      
## [19] statmod_1.5.0       codetools_0.2-20    htmltools_0.5.8.1  
## [22] class_7.3-23        sass_0.4.10         yaml_2.3.10        
## [25] prodlim_2025.04.28  pillar_1.11.1       jquerylib_0.1.4    
## [28] MASS_7.3-65         cachem_1.1.0        gower_1.0.2        
## [31] iterators_1.0.14    rpart_4.1.24        foreach_1.5.2      
## [34] nlme_3.1-168        parallelly_1.45.1   lava_1.8.1         
## [37] tidyselect_1.2.1    digest_0.6.37       stringi_1.8.7      
## [40] future_1.67.0       purrr_1.1.0         listenv_0.9.1      
## [43] labeling_0.4.3      splines_4.5.0       fastmap_1.2.0      
## [46] grid_4.5.0          cli_3.6.5           magrittr_2.0.4     
## [49] survival_3.8-3      e1071_1.7-16        future.apply_1.20.0
## [52] withr_3.0.2         scales_1.4.0        lubridate_1.9.4    
## [55] timechange_0.3.0    rmarkdown_2.29      globals_0.18.0     
## [58] nnet_7.3-20         timeDate_4041.110   evaluate_1.0.5     
## [61] knitr_1.50          hardhat_1.4.2       Rcpp_1.1.0         
## [64] glue_1.8.0          ipred_0.9-15        rstudioapi_0.17.1  
## [67] jsonlite_2.0.0      R6_2.6.1            plyr_1.8.9

Tutorial: Random Forest Classifier for Vaccine Response Prediction

Authors: Juilee Thakar, Joann Diray Arce and Steven Kleinstein

Last updated: November 17, 2025