Reviewer Response Analyses

This directory contains all interactive analyses addressing reviewer questions and concerns.

📁 Structure

reviewer_responses/
├── README.md                      # This file - navigation hub
├── notebooks/
│   ├── R1/                        # Referee #1 analyses
│   ├── R2/                        # Referee #2 analyses
│   ├── R3/                        # Referee #3 analyses
│   ├── framework/                 # Framework overview
│   ├── archive/                   # Archived/removed notebooks
│   └── results/                   # Reviewer-specific results
├── preprocessing/                 # Data preprocessing utilities
│   ├── preprocessing_utils.py     # Standalone preprocessing functions
│   ├── create_preprocessing_files.html  # Interactive preprocessing notebook
│   └── WORKFLOW.md                # Complete workflow documentation
└── SIMPLE_EXAMPLE.py              # Simple example of preprocessing functions

🎯 How to Use

  1. Click on any question below to navigate to its dedicated analysis notebook
  2. Each notebook is self-contained and can be run independently
  3. All notebooks use the same data paths and setup
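
To see which notebooks are present locally before opening them, a short script along these lines may help. It is only a sketch: `list_notebooks` is a hypothetical helper, not part of this repository, and it assumes the `notebooks/R1`, `notebooks/R2`, `notebooks/R3`, and `notebooks/framework` layout shown in the Structure section.

```python
from pathlib import Path


def list_notebooks(root="reviewer_responses"):
    """Return {subfolder: sorted .html file names} under <root>/notebooks/.

    Hypothetical helper; the subfolder names are taken from the directory
    tree in this README and are assumed to match the local checkout.
    """
    notebooks_dir = Path(root) / "notebooks"
    index = {}
    for sub in ("R1", "R2", "R3", "framework"):
        folder = notebooks_dir / sub
        # Missing folders are reported as empty rather than raising.
        if folder.is_dir():
            index[sub] = sorted(p.name for p in folder.glob("*.html"))
        else:
            index[sub] = []
    return index


if __name__ == "__main__":
    for sub, names in list_notebooks().items():
        print(f"{sub}: {len(names)} notebook(s)")
        for name in names:
            print(f"  {name}")
```

Run it from the directory that contains `reviewer_responses/`, or pass a different `root`.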

🔧 Technical Notes


Referee #1: Human Genetics, Disease Risk

| Question | Notebook | Status |
| --- | --- | --- |
| Q1: Selection bias / socioeconomic bias | notebooks/R1/R1_Q1_Selection_Bias.html | ✅ Complete |
| Q3: Clinical/biological meaningfulness | notebooks/R1/R1_Q3_Clinical_Meaning.html | ✅ Complete |
| Q3: ICD vs PheCode aggregation | notebooks/R1/R1_Q3_ICD_vs_PheCode_Comparison.html | ✅ Complete |
| Q7: Heritability estimates | notebooks/R1/R1_Q7_Heritability.html | ✅ Complete |
| Q9: AUC vs clinical risk scores | notebooks/R1/R1_Q9_AUC_Comparisons.html | ✅ Complete |
| Q10: Age-specific discrimination | notebooks/R1/R1_Q10_Age_Specific.html | ✅ Complete |
| Additional: Biological plausibility (CHIP) | notebooks/R1/R1_Biological_Plausibility_CHIP.html | ✅ Complete |
| Additional: Clinical utility (dynamic risk) | notebooks/R1/R1_Clinical_Utility_Dynamic_Risk_Updating.html | ✅ Complete |
| Additional: Genetic validation (GWAS) | notebooks/R1/R1_Genetic_Validation_GWAS.html | ✅ Complete |
| Additional: Genetic validation (Gene-based RVAS) | notebooks/R1/R1_Genetic_Validation_Gene_Based_RVAS.html | ✅ Complete |
| Additional: Multi-disease patterns | notebooks/R1/R1_Multi_Disease_Patterns_Competing_Risks.html | ✅ Complete |
| Additional: Robustness (LOO validation) | notebooks/R1/R1_Robustness_LOO_Validation.html | ✅ Complete |

Referee #2: EHRs

| Concern | Notebook | Status |
| --- | --- | --- |
| Temporal accuracy / leakage | notebooks/R2/R2_Temporal_Leakage.html | ✅ Complete |
| Model validity / learning | notebooks/R2/R2_R3_Model_Validity_Learning.html | ✅ Complete |
| Washout approaches comparison | notebooks/R2/R2_Washout_Comparisons.html | ✅ Complete |
| Delphi Phecode mapping comparison | notebooks/R2/R2_Delphi_Phecode_Mapping.html | ✅ Complete |

Referee #3: Statistical Genetics, PRS

| Question | Notebook | Status |
| --- | --- | --- |
| Q3: Avoiding reverse causation (washout analysis) | notebooks/R3/R3_AvoidingReverseCausation.html | ✅ Complete |
| Q4: Competing risks | notebooks/R3/R3_Competing_Risks.html | ✅ Complete |
| Q4: Decreasing hazards (censoring bias) | notebooks/R3/R3_Q4_Decreasing_Hazards_Censoring_Bias.html | ✅ Complete |
| Q8: Heterogeneity analysis (main paper method) | notebooks/R3/R3_Q8_Heterogeneity_MainPaper_Method.html | ✅ Complete |
| Q8: Heterogeneity analysis (continued) | notebooks/R3/R3_Q8_Heterogeneity_Continued.html | ✅ Complete |
| Population stratification: Continuous ancestry effects | notebooks/R3/R3_Population_Stratification_Ancestry.html | ✅ Complete |
| Additional: Linear vs Nonlinear mixing | notebooks/R3/R3_Linear_vs_NonLinear_Mixing.html | ✅ Complete |
| Additional: Cross-cohort similarity | notebooks/R3/R3_Cross_Cohort_Similarity.html | ✅ Complete |
| Additional: Verify corrected data | notebooks/R3/R3_Verify_Corrected_Data.html | ✅ Complete |

Framework Overview

| Notebook | Description |
| --- | --- |
| notebooks/framework/Discovery_Prediction_Framework_Overview.html | Overview of the discovery and prediction framework |

Preprocessing & Workflow

Addresses reviewer questions about data preprocessing and the complete analysis workflow.

| Resource | Description |
| --- | --- |
| preprocessing/WORKFLOW.md | Complete end-to-end workflow documentation: step-by-step guide from preprocessing → batch training → master checkpoint → prediction |
| preprocessing/create_preprocessing_files.html | Interactive notebook for data preprocessing with visualizations (smoothed prevalence, clustering, signature references) |
| preprocessing/enhanced_simulation_showcase_v2.html | Enhanced simulation framework with comprehensive parameter recovery analysis, training progress tracking, and calibration validation |
| preprocessing/preprocessing_utils.py | Standalone preprocessing functions (compute_smoothed_prevalence_at_risk, create_initial_clusters_and_psi, create_reference_trajectories) |
| preprocessing/SIMPLE_EXAMPLE.py | Minimal copy-paste example demonstrating how to use the preprocessing functions |

Workflow Overview:

  1. Preprocessing: Create smoothed prevalence, initial clusters, and reference trajectories
  2. Batch Training: Run run_aladyn_batch_vector_e_censor with the E matrix built from complete patient history
  3. Master Checkpoint: Generate the master checkpoint (phi and psi)
  4. Prediction: Run run_aladyn_predict_with_master_vector_cenosrE, which automatically loads E_enrollment_full.pt, meaning this step is trained using enrollment data only

See preprocessing/WORKFLOW.md for detailed instructions.
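
As a rough illustration of what "smoothed prevalence" among at-risk patients can mean in the preprocessing step, the toy sketch below computes a per-timepoint event rate and a moving average over it. It does not reproduce `compute_smoothed_prevalence_at_risk` from preprocessing_utils.py: the function name `toy_smoothed_prevalence`, its inputs, the at-risk definition (a patient is at risk up to and including the first diagnosis), and the moving-average smoother are all assumptions made for illustration.

```python
def toy_smoothed_prevalence(first_event, n_timepoints, window=3):
    """Toy per-timepoint prevalence among at-risk patients, plus a moving average.

    first_event: list with one entry per patient, the time index of the first
                 diagnosis (0 <= index < n_timepoints) or None if never diagnosed.
    Assumption: a patient counts as at risk at every timepoint up to and
    including the first diagnosis, and is removed afterwards.
    """
    events = [0] * n_timepoints
    at_risk = [0] * n_timepoints
    for t_event in first_event:
        last = n_timepoints - 1 if t_event is None else t_event
        for t in range(last + 1):
            at_risk[t] += 1
        if t_event is not None:
            events[t_event] += 1

    # Raw rate: new events divided by the number still at risk.
    raw = [e / r if r else 0.0 for e, r in zip(events, at_risk)]

    # Centered moving average; the window is truncated at the edges.
    half = window // 2
    smoothed = []
    for t in range(n_timepoints):
        lo, hi = max(0, t - half), min(n_timepoints, t + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return raw, smoothed


if __name__ == "__main__":
    raw, smoothed = toy_smoothed_prevalence([0, 1, None, None], n_timepoints=3)
    print("raw:", raw)
    print("smoothed:", smoothed)
```

For the actual function signatures and smoothing choices, refer to preprocessing/preprocessing_utils.py and preprocessing/WORKFLOW.md.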


Quick Navigation

✅ All Completed Analyses

Referee #1:
- Selection bias (IPW): notebooks/R1/R1_Q1_Selection_Bias.html
- Clinical meaning (FH): notebooks/R1/R1_Q3_Clinical_Meaning.html
- ICD vs PheCode aggregation: notebooks/R1/R1_Q3_ICD_vs_PheCode_Comparison.html
- Heritability: notebooks/R1/R1_Q7_Heritability.html
- AUC comparisons: notebooks/R1/R1_Q9_AUC_Comparisons.html
- Age-specific discrimination: notebooks/R1/R1_Q10_Age_Specific.html
- Biological plausibility (CHIP): notebooks/R1/R1_Biological_Plausibility_CHIP.html
- Clinical utility (dynamic risk): notebooks/R1/R1_Clinical_Utility_Dynamic_Risk_Updating.html
- Genetic validation (GWAS): notebooks/R1/R1_Genetic_Validation_GWAS.html
  - Identifies 10 novel loci for Signature 5 not found in individual trait GWAS
- Genetic validation (Gene-based RVAS): notebooks/R1/R1_Genetic_Validation_Gene_Based_RVAS.html
- Multi-disease patterns: notebooks/R1/R1_Multi_Disease_Patterns_Competing_Risks.html
- Robustness (LOO validation): notebooks/R1/R1_Robustness_LOO_Validation.html

Referee #2:
- Temporal leakage: notebooks/R2/R2_Temporal_Leakage.html
- Model validity / learning: notebooks/R2/R2_R3_Model_Validity_Learning.html
- Washout approaches comparison: notebooks/R2/R2_Washout_Comparisons.html
- Delphi Phecode mapping comparison: notebooks/R2/R2_Delphi_Phecode_Mapping.html

Referee #3:
- Avoiding reverse causation (washout analysis): notebooks/R3/R3_AvoidingReverseCausation.html
- Competing risks: notebooks/R3/R3_Competing_Risks.html
- Heterogeneity analysis (main paper method): notebooks/R3/R3_Q8_Heterogeneity_MainPaper_Method.html
- Heterogeneity analysis (continued): notebooks/R3/R3_Q8_Heterogeneity_Continued.html
- Population stratification: notebooks/R3/R3_Population_Stratification_Ancestry.html
- Linear vs Nonlinear mixing: notebooks/R3/R3_Linear_vs_NonLinear_Mixing.html
- Cross-cohort similarity: notebooks/R3/R3_Cross_Cohort_Similarity.html
- Verify corrected data: notebooks/R3/R3_Verify_Corrected_Data.html

Framework:
- Framework overview: notebooks/framework/Discovery_Prediction_Framework_Overview.html

✅ All complete