R1: Clinical Utility - Dynamic Risk Updating¶

Reviewer Question¶

Referee #1: "What is the clinical utility of this model? How would it be used in practice?"

Why This Matters¶

Demonstrating clinical utility is essential for:

  • Showing how the model would be used in real-world clinical practice
  • Understanding the value of updating predictions over time
  • Validating that dynamic risk assessment improves long-term predictions

Our Approach¶

We evaluate dynamic risk updating - a clinically realistic scenario where:

  1. Annual Updates: Patients are seen annually, and risk predictions are updated each year
  2. Rolling Predictions: At each visit, we use the model trained with data up to that point
  3. 10-Year Risk Interpolation: We compute cumulative 10-year risk using updated predictions
  4. Comparison: We compare dynamic (updated annually) vs. static (enrollment only) predictions

Clinical Scenario: This mirrors real-world practice where:

  • Patients have annual checkups
  • Risk assessments are updated based on new information
  • Long-term risk is estimated using the most current predictions

Note: This analysis uses age_offset pi batches, which represent predictions made at enrollment + 0, 1, 2, ..., 9 years. Each year's prediction uses a model trained with data up to that point.

Key Findings¶

✅ Dynamic risk updating improves discrimination for 10-year risk prediction ✅ Annual updates capture evolving risk factors and disease progression ✅ Clinically realistic approach mirrors real-world practice ⚠️ Limitation: Not a fully prospective evaluation (some temporal leakage)

1. Load Age Offset Predictions¶

We use pi batches from age_offset analysis, which represent predictions made at different time points after enrollment.

================================================================================
LOADING AGE OFFSET PI BATCHES
================================================================================
Batch: 0-10000
Y shape: torch.Size([10000, 348, 52])
pce_df shape: (10000, 16)
  Loaded offset 0: pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 1: pi_enroll_fixedphi_age_offset_1_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 2: pi_enroll_fixedphi_age_offset_2_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 3: pi_enroll_fixedphi_age_offset_3_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 4: pi_enroll_fixedphi_age_offset_4_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 5: pi_enroll_fixedphi_age_offset_5_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 6: pi_enroll_fixedphi_age_offset_6_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 7: pi_enroll_fixedphi_age_offset_7_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 8: pi_enroll_fixedphi_age_offset_8_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))
  Loaded offset 9: pi_enroll_fixedphi_age_offset_9_sex_0_10000_try2_withpcs_newrun_pooledall.pt (shape: torch.Size([10000, 348, 52]))

✓ Loaded 10 pi batches
In [4]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/verify_age_offset0_equivalence.py
================================================================================
VERIFYING AGE OFFSET 0 EQUIVALENCE
================================================================================

Age offset file:
  /Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/age_offset_local_vectorized_E_corrected/pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun_pooledall.pt

Enrollment file:
  /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_0_10000.pt

Loading files...
✓ Loaded age_offset file: torch.Size([10000, 348, 52])
✓ Loaded enrollment file: torch.Size([10000, 348, 52])

================================================================================
COMPARISON RESULTS
================================================================================

1. STRICT COMPARISON (exact match):

================================================================================
COMPARING: Age Offset 0 vs Enrollment
================================================================================

Age Offset 0 shape: torch.Size([10000, 348, 52])
Enrollment shape: torch.Size([10000, 348, 52])

Element-wise differences:
  Max absolute difference: 0.00e+00
  Mean absolute difference: 0.00e+00
  Median absolute difference: 0.00e+00
  Std of differences: 0.00e+00

Differences > 0.0:
  Number: 0 / 180,960,000
  Percentage: 0.000000%

✅ TENSORS ARE EQUAL (within tolerance 0.0)

2. RELAXED COMPARISON (numerical precision, tol=1e-6):

================================================================================
COMPARING: Age Offset 0 vs Enrollment
================================================================================

Age Offset 0 shape: torch.Size([10000, 348, 52])
Enrollment shape: torch.Size([10000, 348, 52])

Element-wise differences:
  Max absolute difference: 0.00e+00
  Mean absolute difference: 0.00e+00
  Median absolute difference: 0.00e+00
  Std of differences: 0.00e+00

Differences > 1e-06:
  Number: 0 / 180,960,000
  Percentage: 0.000000%

✅ TENSORS ARE EQUAL (within tolerance 1e-06)

3. VERY RELAXED COMPARISON (practical equivalence, tol=1e-4):

================================================================================
COMPARING: Age Offset 0 vs Enrollment
================================================================================

Age Offset 0 shape: torch.Size([10000, 348, 52])
Enrollment shape: torch.Size([10000, 348, 52])

Element-wise differences:
  Max absolute difference: 0.00e+00
  Mean absolute difference: 0.00e+00
  Median absolute difference: 0.00e+00
  Std of differences: 0.00e+00

Differences > 0.0001:
  Number: 0 / 180,960,000
  Percentage: 0.000000%

✅ TENSORS ARE EQUAL (within tolerance 0.0001)

================================================================================
SUMMARY STATISTICS
================================================================================

Age Offset 0:
  Min: 0.000000
  Max: 0.079837
  Mean: 0.000594
  Std: 0.001656

Enrollment:
  Min: 0.000000
  Max: 0.079837
  Mean: 0.000594
  Std: 0.001656

================================================================================
VERDICT
================================================================================
✅ EXACT MATCH: Files are identical

2. Evaluate Dynamic Risk Updating (Rolling)¶

We evaluate 10-year risk prediction using rolling updates: at each year after enrollment, we use the prediction from the model trained for that offset.

================================================================================
EVALUATING DYNAMIC RISK UPDATING (ROLLING)
================================================================================

This evaluates 10-year risk using predictions updated annually.
At year k after enrollment, we use predictions from offset k model.


Evaluating ASCVD (Dynamic 10-Year Risk, Rolling)...
AUC: 0.836 (0.819-0.853) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 831 (8.3%) (from 10000 individuals)
Excluded 0 prevalent cases for ASCVD.

Evaluating Diabetes (Dynamic 10-Year Risk, Rolling)...
AUC: 0.725 (0.700-0.748) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 581 (5.8%) (from 10000 individuals)
Excluded 0 prevalent cases for Diabetes.

Evaluating Atrial_Fib (Dynamic 10-Year Risk, Rolling)...
AUC: 0.781 (0.751-0.801) (calculated on 9864 individuals)
Events (10-Year in Eval Cohort): 376 (3.8%) (from 9864 individuals)
Excluded 136 prevalent cases for Atrial_Fib.

Evaluating CKD (Dynamic 10-Year Risk, Rolling)...
AUC: 0.737 (0.709-0.772) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 207 (2.1%) (from 10000 individuals)
Excluded 0 prevalent cases for CKD.

Evaluating All_Cancers (Dynamic 10-Year Risk, Rolling)...
AUC: 0.735 (0.714-0.758) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 480 (4.8%) (from 10000 individuals)
Excluded 0 prevalent cases for All_Cancers.

Evaluating Stroke (Dynamic 10-Year Risk, Rolling)...
AUC: 0.663 (0.613-0.720) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 129 (1.3%) (from 10000 individuals)
Excluded 0 prevalent cases for Stroke.

Evaluating Heart_Failure (Dynamic 10-Year Risk, Rolling)...
AUC: 0.779 (0.740-0.816) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 205 (2.1%) (from 10000 individuals)
Excluded 0 prevalent cases for Heart_Failure.

Evaluating Pneumonia (Dynamic 10-Year Risk, Rolling)...
AUC: 0.740 (0.705-0.770) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 335 (3.4%) (from 10000 individuals)
Excluded 0 prevalent cases for Pneumonia.

Evaluating COPD (Dynamic 10-Year Risk, Rolling)...
AUC: 0.715 (0.689-0.737) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 394 (3.9%) (from 10000 individuals)
Excluded 0 prevalent cases for COPD.

Evaluating Osteoporosis (Dynamic 10-Year Risk, Rolling)...
AUC: 0.707 (0.674-0.738) (calculated on 9961 individuals)
Events (10-Year in Eval Cohort): 219 (2.2%) (from 9961 individuals)
Excluded 39 prevalent cases for Osteoporosis.

Evaluating Anemia (Dynamic 10-Year Risk, Rolling)...
AUC: 0.658 (0.628-0.684) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 523 (5.2%) (from 10000 individuals)
Excluded 0 prevalent cases for Anemia.

Evaluating Colorectal_Cancer (Dynamic 10-Year Risk, Rolling)...
AUC: 0.791 (0.742-0.839) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 105 (1.1%) (from 10000 individuals)
Excluded 0 prevalent cases for Colorectal_Cancer.

Evaluating Breast_Cancer (Dynamic 10-Year Risk, Rolling)...
Filtering for female: Found 5409 individuals in cohort
AUC: 0.767 (0.731-0.807) (calculated on 5409 individuals)
Events (10-Year in Eval Cohort): 214 (4.0%) (from 5409 individuals)
Excluded 0 prevalent cases for Breast_Cancer.

Evaluating Prostate_Cancer (Dynamic 10-Year Risk, Rolling)...
Filtering for male: Found 4591 individuals in cohort
AUC: 0.786 (0.752-0.829) (calculated on 4547 individuals)
Events (10-Year in Eval Cohort): 204 (4.5%) (from 4547 individuals)
Excluded 44 prevalent cases for Prostate_Cancer.

Evaluating Lung_Cancer (Dynamic 10-Year Risk, Rolling)...
AUC: 0.741 (0.684-0.805) (calculated on 9992 individuals)
Events (10-Year in Eval Cohort): 75 (0.8%) (from 9992 individuals)
Excluded 8 prevalent cases for Lung_Cancer.

Evaluating Bladder_Cancer (Dynamic 10-Year Risk, Rolling)...
AUC: 0.850 (0.774-0.898) (calculated on 9976 individuals)
Events (10-Year in Eval Cohort): 49 (0.5%) (from 9976 individuals)
Excluded 24 prevalent cases for Bladder_Cancer.

Evaluating Secondary_Cancer (Dynamic 10-Year Risk, Rolling)...
AUC: 0.664 (0.624-0.700) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 276 (2.8%) (from 10000 individuals)
Excluded 0 prevalent cases for Secondary_Cancer.

Evaluating Depression (Dynamic 10-Year Risk, Rolling)...
AUC: 0.546 (0.511-0.582) (calculated on 9912 individuals)
Events (10-Year in Eval Cohort): 405 (4.1%) (from 9912 individuals)
Excluded 88 prevalent cases for Depression.

Evaluating Anxiety (Dynamic 10-Year Risk, Rolling)...
AUC: 0.573 (0.541-0.613) (calculated on 9975 individuals)
Events (10-Year in Eval Cohort): 241 (2.4%) (from 9975 individuals)
Excluded 25 prevalent cases for Anxiety.

Evaluating Bipolar_Disorder (Dynamic 10-Year Risk, Rolling)...
AUC: 0.624 (0.520-0.745) (calculated on 9984 individuals)
Events (10-Year in Eval Cohort): 34 (0.3%) (from 9984 individuals)
Excluded 16 prevalent cases for Bipolar_Disorder.

Evaluating Rheumatoid_Arthritis (Dynamic 10-Year Risk, Rolling)...
AUC: 0.707 (0.656-0.751) (calculated on 9963 individuals)
Events (10-Year in Eval Cohort): 123 (1.2%) (from 9963 individuals)
Excluded 37 prevalent cases for Rheumatoid_Arthritis.

Evaluating Psoriasis (Dynamic 10-Year Risk, Rolling)...
AUC: 0.508 (0.413-0.599) (calculated on 9981 individuals)
Events (10-Year in Eval Cohort): 40 (0.4%) (from 9981 individuals)
Excluded 19 prevalent cases for Psoriasis.

Evaluating Ulcerative_Colitis (Dynamic 10-Year Risk, Rolling)...
AUC: 0.793 (0.724-0.868) (calculated on 9947 individuals)
Events (10-Year in Eval Cohort): 50 (0.5%) (from 9947 individuals)
Excluded 53 prevalent cases for Ulcerative_Colitis.

Evaluating Crohns_Disease (Dynamic 10-Year Risk, Rolling)...
AUC: 0.737 (0.646-0.821) (calculated on 9967 individuals)
Events (10-Year in Eval Cohort): 31 (0.3%) (from 9967 individuals)
Excluded 33 prevalent cases for Crohns_Disease.

Evaluating Asthma (Dynamic 10-Year Risk, Rolling)...
AUC: 0.612 (0.589-0.638) (calculated on 9687 individuals)
Events (10-Year in Eval Cohort): 606 (6.3%) (from 9687 individuals)
Excluded 313 prevalent cases for Asthma.

Evaluating Parkinsons (Dynamic 10-Year Risk, Rolling)...
AUC: 0.787 (0.749-0.844) (calculated on 9997 individuals)
Events (10-Year in Eval Cohort): 46 (0.5%) (from 9997 individuals)
Excluded 3 prevalent cases for Parkinsons.

Evaluating Multiple_Sclerosis (Dynamic 10-Year Risk, Rolling)...
AUC: 0.690 (0.588-0.779) (calculated on 9979 individuals)
Events (10-Year in Eval Cohort): 21 (0.2%) (from 9979 individuals)
Excluded 21 prevalent cases for Multiple_Sclerosis.

Evaluating Thyroid_Disorders (Dynamic 10-Year Risk, Rolling)...
AUC: 0.637 (0.608-0.660) (calculated on 10000 individuals)
Events (10-Year in Eval Cohort): 479 (4.8%) (from 10000 individuals)
Excluded 0 prevalent cases for Thyroid_Disorders.

Summary of Results (Dynamic 10-Year Risk, Rolling, Censored at First Event, Sex-Adjusted):
--------------------------------------------------------------------------------
Disease Group        AUC                       Events     Rate (%)  
--------------------------------------------------------------------------------
ASCVD                0.836 (0.819-0.853)       831        8.3
Diabetes             0.725 (0.700-0.748)       581        5.8
Atrial_Fib           0.781 (0.751-0.801)       376        3.8
CKD                  0.737 (0.709-0.772)       207        2.1
All_Cancers          0.735 (0.714-0.758)       480        4.8
Stroke               0.663 (0.613-0.720)       129        1.3
Heart_Failure        0.779 (0.740-0.816)       205        2.1
Pneumonia            0.740 (0.705-0.770)       335        3.4
COPD                 0.715 (0.689-0.737)       394        3.9
Osteoporosis         0.707 (0.674-0.738)       219        2.2
Anemia               0.658 (0.628-0.684)       523        5.2
Colorectal_Cancer    0.791 (0.742-0.839)       105        1.1
Breast_Cancer        0.767 (0.731-0.807)       214        4.0
Prostate_Cancer      0.786 (0.752-0.829)       204        4.5
Lung_Cancer          0.741 (0.684-0.805)       75         0.8
Bladder_Cancer       0.850 (0.774-0.898)       49         0.5
Secondary_Cancer     0.664 (0.624-0.700)       276        2.8
Depression           0.546 (0.511-0.582)       405        4.1
Anxiety              0.573 (0.541-0.613)       241        2.4
Bipolar_Disorder     0.624 (0.520-0.745)       34         0.3
Rheumatoid_Arthritis 0.707 (0.656-0.751)       123        1.2
Psoriasis            0.508 (0.413-0.599)       40         0.4
Ulcerative_Colitis   0.793 (0.724-0.868)       50         0.5
Crohns_Disease       0.737 (0.646-0.821)       31         0.3
Asthma               0.612 (0.589-0.638)       606        6.3
Parkinsons           0.787 (0.749-0.844)       46         0.5
Multiple_Sclerosis   0.690 (0.588-0.779)       21         0.2
Thyroid_Disorders    0.637 (0.608-0.660)       479        4.8
--------------------------------------------------------------------------------

================================================================================
DYNAMIC ROLLING RESULTS
================================================================================
Disease auc n_events event_rate ci_lower ci_upper Method
15 Bladder_Cancer 0.849987 49.0 0.491179 0.773960 0.897610 Dynamic_Rolling
0 ASCVD 0.836231 831.0 8.310000 0.818668 0.853217 Dynamic_Rolling
22 Ulcerative_Colitis 0.792808 50.0 0.502664 0.724410 0.867977 Dynamic_Rolling
11 Colorectal_Cancer 0.791086 105.0 1.050000 0.742472 0.839225 Dynamic_Rolling
25 Parkinsons 0.787061 46.0 0.460138 0.748522 0.843659 Dynamic_Rolling
13 Prostate_Cancer 0.786137 204.0 4.486475 0.752175 0.828734 Dynamic_Rolling
2 Atrial_Fib 0.780589 376.0 3.811841 0.750986 0.801275 Dynamic_Rolling
6 Heart_Failure 0.779379 205.0 2.050000 0.739546 0.815789 Dynamic_Rolling
12 Breast_Cancer 0.767345 214.0 3.956369 0.730752 0.806573 Dynamic_Rolling
14 Lung_Cancer 0.740649 75.0 0.750600 0.683690 0.804938 Dynamic_Rolling
7 Pneumonia 0.739727 335.0 3.350000 0.704932 0.770261 Dynamic_Rolling
23 Crohns_Disease 0.736679 31.0 0.311026 0.645644 0.820627 Dynamic_Rolling
3 CKD 0.736542 207.0 2.070000 0.709452 0.772115 Dynamic_Rolling
4 All_Cancers 0.734764 480.0 4.800000 0.714175 0.758102 Dynamic_Rolling
1 Diabetes 0.725429 581.0 5.810000 0.700372 0.747537 Dynamic_Rolling
✓ Full results saved to CSV: /Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/dynamic_rolling_10yr_results.csv
  Total diseases: 28
  Columns: Disease, auc, n_events, event_rate, ci_lower, ci_upper, Method
✓ Found MI at index 112: Myocardial infarction

================================================================================
FINDING PATIENTS WITH BIGGEST MI RISK CHANGES
================================================================================

Calculating MI risks for all patients...
Total patients analyzed: 10000
Median baseline MI risk: 0.000000

================================================================================
BIGGEST ABSOLUTE INCREASE
================================================================================
Patient #937:
  Year 0 MI risk: 0.000001
  Year 9 MI risk: 0.000001
  Absolute change: 0.000000
  Relative change: 1.51x

================================================================================
BIGGEST RELATIVE INCREASE (High Baseline Risk)
================================================================================
Patient #937:
  Year 0 MI risk: 0.000001
  Year 9 MI risk: 0.000001
  Absolute change: 0.000000
  Relative change: 1.51x

Using patient 937 (as requested)

Calculating population average MI risk over time...
  Population average MI risk: 0.000000 (year 0) → 0.000001 (year 9)
  Patient MI risk: 0.000001 (year 0) → 0.000001 (year 9)
  Patient vs population: 1.25x → 1.43x
No description has been provided for this image
================================================================================
PATIENT #937 SUMMARY
================================================================================
Enrollment age: 54 years
MI risk: 0.000001 → 0.000001
Relative increase: 1.51x

Population average MI risk: 0.000000 → 0.000001
Patient vs population: 1.25x → 1.43x

================================================================================
BASELINE DIAGNOSES (At Enrollment)
================================================================================
  No diagnoses at enrollment

================================================================================
GENETIC RISK FACTORS (PRS Scores)
================================================================================
Top genetic risk factors (by absolute value):
  CED: 2.4244
  CD: 2.0516
  PC: 1.9308
  UC: 1.9303
  AAM: 1.7289
  HT: 1.5882
  T2D: -1.3991
  LDL_SF: 1.3298
  BMI: -1.2141 ⭐
  POAG: -1.1421
  CAD: 1.1329 ⭐
  MEL: 1.0953
  OP: 0.9920
  CRC: 0.8831
  BC: -0.7373

================================================================================
OTHER BASELINE CHARACTERISTICS
================================================================================
  race: white
  Sex: Male
  SmokingStatusv2: Previous
  tchol: 155.7231
  hdl: 38.2831
  pce_goff: 0.0563
  pce_goff_fuull: 0.0563
  pce: 0.0593
  prevent_base_ascvd_risk: 0.0303
  prevent_impute: 0.0303

New diagnoses:
  None

================================================================================
WHY DID RISK INCREASE WITHOUT NEW DIAGNOSES?
================================================================================
Key point: Baseline risk factors (genetics, cholesterol, smoking) DON'T change.
High cholesterol (268.6 mg/dL) was ALREADY present at enrollment.
Genetics (CAD PRS, CVD PRS) also don't change.

The risk increase is primarily due to:

1. **Age progression (PRIMARY DRIVER)**: Patient is aging (age 54 → 63 years)
   - Age is the strongest risk factor for MI
   - Each year's prediction uses age-offset models that account for age progression
   - Even with identical baseline risk factors, older age = exponentially higher risk

2. **Model learns genetic risk progression patterns**: The model can learn that people
   with high CAD/CVD PRS tend to progress faster, even without new diagnoses:
   - Patient has CAD PRS: 1.66 SD above mean (high genetic risk)
   - Patient has CVD PRS: 1.07 SD above mean
   - As the model sees more outcomes, it learns that high genetic risk + aging
     = accelerating risk trajectory, even without new clinical diagnoses
   - This is different from just 'age effect' - it's about how genetic risk
     interacts with age to create a steeper risk curve

3. **Model calibration evolution**: Each year's prediction uses a model trained with
   data up to that point. As the model sees more outcomes, it may:
   - Better calibrate how age interacts with baseline risk factors
   - Refine how genetic risk compounds with age
   - Learn that certain baseline risk combinations become more predictive with age

4. **Population trends**: The population average also changes over time
   - Population average increased 1.32x
   - This reflects general population aging and model calibration

5. **Baseline risk factors were already high**:
   - CAD PRS: 1.66 SD above mean (high genetic risk) - DOESN'T CHANGE
   - CVD PRS: 1.07 SD above mean - DOESN'T CHANGE
   - Current smoker - ASSUMED CONSTANT (no new diagnosis)
   - High cholesterol (268.6 mg/dL) - WAS ALREADY PRESENT AT BASELINE
   - These factors don't change, but the model learns how they interact with age
   - The model captures that high genetic risk + high cholesterol + smoking + aging
     = accelerating risk trajectory, even without new diagnoses

3. Patients Who Developed All Cluster 5 Diseases¶

We analyze patients who developed all cluster 5 diseases (indices 52, 111-116: Hypercholesterolemia, Unstable angina, MI, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, Other acute and subacute forms) after enrollment to demonstrate how predictions evolve as patients develop multiple related conditions and how information borrowing updates risk for all diseases in the cluster.

================================================================================
PATIENTS WHO DEVELOPED ALL CLUSTER 5 DISEASES
================================================================================

Found 41 candidate patients (R indices: [94, 728, 922, 938, 982]...)
Python indices: [93, 727, 921, 937, 981]...

Disease Indices (Python 0-indexed, R 1-indexed):
  Index 52 (R 53): Hypercholesterolemia
  Index 111 (R 112): Unstable angina (intermediate coronary syndrome)
  Index 112 (R 113): Myocardial infarction
  Index 113 (R 114): Angina pectoris
  Index 114 (R 115): Coronary atherosclerosis
  Index 115 (R 116): Other chronic ischemic heart disease, unspecified
  Index 116 (R 117): Other acute and subacute forms of ischemic heart disease

✓ Patient 937 (R index 938): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 981 (R index 982): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 1585 (R index 1586): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 1896 (R index 1897): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 1977 (R index 1978): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 2592 (R index 2593): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 2642 (R index 2643): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 2712 (R index 2713): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 2912 (R index 2913): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 3352 (R index 3353): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 4035 (R index 4036): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 4087 (R index 4088): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 4303 (R index 4304): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 4471 (R index 4472): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 4685 (R index 4686): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 5604 (R index 5605): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 5764 (R index 5765): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 5859 (R index 5860): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 5912 (R index 5913): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 6060 (R index 6061): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

✓ Patient 8185 (R index 8186): Developed 7 diseases
  Diseases: Hypercholesterolemia, Unstable angina (intermediate coronary syndrome), Myocardial infarction, Angina pectoris, Coronary atherosclerosis, Other chronic ischemic heart disease, unspecified, Other acute and subacute forms of ischemic heart disease

================================================================================
VALID PATIENTS: 21 patients developed all 7 diseases
================================================================================

Visualizing first 10 patients...
No description has been provided for this image
✓ Created visualizations for 10 patients

Key Observations:
- These patients developed all 7 diseases (indices [52, 111, 112, 113, 114, 115, 116]) after enrollment
- Colored vertical lines mark when each disease was diagnosed (years after enrollment)
- Risk for ALL diseases jumps when each diagnosis is made
- This demonstrates INFORMATION BORROWING: diagnosis of one disease updates risk for others
- The jump should be visible in the pi batch corresponding to the diagnosis year

3. Compare to Static Prediction (Enrollment Only)¶

For comparison, we also evaluate static 10-year risk using only the enrollment prediction (offset 0).

In [11]:
from evaluatetdccode import evaluate_major_diseases_wsex_with_bootstrap_dynamic

# For static prediction, we need to create a model-like object or use the from_pi version
# Actually, let's use the static 10-year results if available, or compute from offset 0 only

# Use offset 0 pi batch as static prediction
pi_static = pi_batches[0]  # Enrollment only

# We can use evaluate_major_diseases_wsex_with_bootstrap_dynamic_from_pi if available
# Or load static 10-year results from time_horizons analysis

static_results_path = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/time_horizons/pooled_retrospective/static_10yr_results.csv')
if static_results_path.exists():
    static_results_df = pd.read_csv(static_results_path)
    static_results_df['Method'] = 'Static_Enrollment'
    print("="*80)
    print("STATIC 10-YEAR RESULTS (ENROLLMENT ONLY)")
    print("="*80)
    display(static_results_df.head(15))
else:
    print("⚠️  Static results file not found. Would need to compute from pi_static.")
================================================================================
STATIC 10-YEAR RESULTS (ENROLLMENT ONLY)
================================================================================
Disease AUC CI_lower CI_upper N_Events Event_Rate Method
0 ASCVD 0.732897 0.730233 0.735879 34705 8.676250 Static_Enrollment
1 Parkinsons 0.723075 0.712417 0.730465 1839 0.459750 Static_Enrollment
2 Atrial_Fib 0.706738 0.703156 0.710804 15278 3.819500 Static_Enrollment
3 CKD 0.705651 0.701048 0.709572 8980 2.245000 Static_Enrollment
4 Bladder_Cancer 0.703367 0.693641 0.712121 2158 0.539500 Static_Enrollment
5 Heart_Failure 0.701264 0.696429 0.706911 8212 2.053000 Static_Enrollment
6 Prostate_Cancer 0.682770 0.678338 0.687237 7565 4.144252 Static_Enrollment
7 Stroke 0.681105 0.674114 0.687222 5686 1.421500 Static_Enrollment
8 Osteoporosis 0.675103 0.669549 0.680105 9145 2.286250 Static_Enrollment
9 All_Cancers 0.669283 0.665607 0.672842 20338 5.084500 Static_Enrollment
10 Lung_Cancer 0.668265 0.661217 0.676504 3319 0.829750 Static_Enrollment
11 COPD 0.658149 0.654608 0.661959 16789 4.197250 Static_Enrollment
12 Colorectal_Cancer 0.645633 0.639118 0.652752 4934 1.233500 Static_Enrollment
13 Pneumonia 0.644472 0.639337 0.648852 14469 3.617250 Static_Enrollment
14 Diabetes 0.630205 0.626390 0.634591 23756 5.939000 Static_Enrollment

4. Comparison: Dynamic vs. Static¶

Compare discrimination (AUC) between dynamic rolling updates and static enrollment-only predictions.

================================================================================
COMPARISON: DYNAMIC ROLLING vs STATIC ENROLLMENT
================================================================================

Diseases with largest improvement from annual updates:

Disease Static AUC Rolling AUC Improvement
12 Breast_Cancer 0.551 0.767 +0.217
22 Ulcerative_Colitis 0.583 0.793 +0.210
26 Multiple_Sclerosis 0.531 0.690 +0.159
23 Crohns_Disease 0.580 0.737 +0.157
15 Bladder_Cancer 0.703 0.850 +0.147
11 Colorectal_Cancer 0.646 0.791 +0.145
19 Bipolar_Disorder 0.481 0.624 +0.142
13 Prostate_Cancer 0.683 0.786 +0.103
0 ASCVD 0.733 0.836 +0.103
20 Rheumatoid_Arthritis 0.608 0.707 +0.099
7 Pneumonia 0.644 0.740 +0.095
1 Diabetes 0.630 0.725 +0.095
24 Asthma 0.525 0.612 +0.087
6 Heart_Failure 0.701 0.779 +0.078
2 Atrial_Fib 0.707 0.781 +0.074
================================================================================
SUMMARY STATISTICS
================================================================================
Mean AUC improvement: 0.088
Median AUC improvement: 0.076
Diseases with improvement: 26 / 28

5. Summary and Response¶

Key Findings¶

  1. Dynamic risk updating improves discrimination: Annual updates improve 10-year risk prediction compared to static enrollment-only predictions.

  2. Clinically realistic approach: This mirrors real-world practice where patients are seen annually and risk assessments are updated.

  3. Captures evolving risk: Annual updates allow the model to incorporate new information about disease progression and risk factor changes.

Clinical Interpretation¶

Static Prediction (Enrollment Only):

  • Single risk assessment at enrollment
  • Does not incorporate new information
  • May become less accurate over time

Dynamic Prediction (Annual Updates):

  • Risk assessment updated annually
  • Incorporates new clinical information
  • Better reflects evolving patient risk

Response to Reviewer¶

We demonstrate clinical utility through dynamic risk updating:

  • Annual risk updates: Patients are seen annually, and predictions are updated using models trained with data up to that point
  • Improved discrimination: Dynamic updates improve 10-year risk prediction compared to static enrollment-only predictions
  • Clinically realistic: This approach mirrors real-world practice where risk assessments evolve with new information

Limitation: This analysis is not a fully prospective evaluation, as each year's prediction uses a model trained with data up to that point (some temporal leakage). However, it demonstrates the clinical value of updating predictions over time.