Washout Analysis Summary (1, 3, 6 Months)

This notebook presents washout analyses that address reverse-causation concerns. It focuses on short-term washout periods (1, 3, and 6 months), as requested by reviewers.

Analysis: Short-Term Washout

Fixed prediction point: Enrollment (t0)

Method:

  • Uses pre-trained phi/psi (learned from the full data)
  • Refits only lambda, using the censored E matrix
  • Varies the washout period (the amount of data blinded before enrollment)
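
The censoring step in the list above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the function name `apply_washout`, the arrays `event_age` and `enroll_age`, and the representation of events as ages with NaN for "no event" are all assumptions, since the real layout of the E matrix is not shown here.

```python
import numpy as np

def apply_washout(event_age, enroll_age, washout_months):
    """Blind events that fall in the washout window before enrollment.

    event_age      : (N, D) array, age at first event per person and disease,
                     NaN where no event was recorded (hypothetical layout)
    enroll_age     : (N,) array, age at enrollment (t0)
    washout_months : length of the washout window in months

    Returns a copy of event_age in which events inside the window
    (enroll_age - washout, enroll_age] are replaced by NaN.
    """
    event_age = np.asarray(event_age, dtype=float)
    enroll_age = np.asarray(enroll_age, dtype=float)

    # Start of the washout window for each person, in years
    cutoff = enroll_age - washout_months / 12.0

    # Comparisons against NaN are False, so missing events are left alone
    in_window = (event_age > cutoff[:, None]) & (event_age <= enroll_age[:, None])

    censored = event_age.copy()
    censored[in_window] = np.nan
    return censored

# Two people, two diseases; enrollment at ages 50 and 60
example_events = np.array([[49.9, 45.0],
                           [np.nan, 59.8]])
example_enroll = np.array([50.0, 60.0])
blinded_3mo = apply_washout(example_events, example_enroll, washout_months=3)
```

With a 0-month washout nothing is blinded; lengthening the window removes progressively more pre-enrollment history while the prediction point stays at enrollment.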

Washout periods tested:

  • 1, 3, 6 months (HESIN-based, using hospital episode data)

Question: "If we exclude events in the washout window before enrollment, how well can we predict future events from enrollment?"

Why this approach:

  • Same prediction timepoint (enrollment) across all washout periods
  • Only the amount of blinded data varies
  • Cleaner interpretation: isolates the effect of excluding events that may reflect reverse causation
  • Directly addresses reviewer concerns about diagnostic cascade and reverse causation
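
Because the prediction timepoint is the same for every washout setting, the resulting AUCs are directly comparable. The evaluation code itself is not shown in this notebook, so the snippet below is only a sketch of the standard pairwise definition of AUC (the probability that a randomly chosen case is scored above a randomly chosen non-case, with ties counted as one half). The function name `auc_pairwise` and the toy inputs are hypothetical.

```python
import numpy as np

def auc_pairwise(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    scores : predicted risks at the fixed prediction point
    labels : 1/True if the event occurred in the prediction window, else 0/False
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one case and one non-case")

    # All pairwise score differences between cases and non-cases
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Toy example: three of the four case/non-case pairs are ordered correctly
toy_auc = auc_pairwise([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form uses memory proportional to cases × non-cases, so it is meant for illustration on small inputs rather than cohort-scale evaluation.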
In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set style
plt.style.use('default')
sns.set_palette("husl")

# Paths
results_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout_evaluation')
offset_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout_fixed_timepoint/pooled_retrospective')

# Key diseases for presentation
KEY_DISEASES = [
    'ASCVD', 'All_Cancers', 'Atrial_Fib', 'Diabetes', 'CKD', 
    'Heart_Failure', 'Breast_Cancer', 'Colorectal_Cancer', 
    'Lung_Cancer', 'Parkinsons'
]

Washout Results (1, 3, 6 months)

HESIN-based washout analysis: hospital episode data are used to exclude events in the washout window before enrollment.

In [17]:
# Load 1, 3, 6 month washout results
df_short = pd.read_csv(results_dir / 'washout_comparison_1yr_10yr.csv', index_col=0)

# Extract key diseases
df_short_key = df_short.loc[df_short.index.intersection(KEY_DISEASES)]

# Create summary table for 1-year predictions
summary_short = pd.DataFrame({
    'Disease': df_short_key.index,
    'No_Washout': df_short_key['no_washout_1yr_AUC'].values,
    '1_Month': df_short_key['1month_1yr_AUC'].values,
    '3_Month': df_short_key['3month_1yr_AUC'].values,
    '6_Month': df_short_key['6month_1yr_AUC'].values,
})

# Calculate drops
summary_short['Drop_1mo'] = summary_short['No_Washout'] - summary_short['1_Month']
summary_short['Drop_3mo'] = summary_short['No_Washout'] - summary_short['3_Month']
summary_short['Drop_6mo'] = summary_short['No_Washout'] - summary_short['6_Month']

summary_short = summary_short.sort_values('No_Washout', ascending=False)
summary_short
Out[17]:
   Disease            No_Washout   1_Month   3_Month   6_Month   Drop_1mo   Drop_3mo   Drop_6mo
9  Parkinsons           0.996999  0.996999  0.996999  0.996999   0.000000   0.000000   0.000000
4  CKD                  0.874050  0.874000  0.874125  0.874175   0.000050  -0.000075  -0.000125
0  ASCVD                0.870014  0.869998  0.868618  0.861954   0.000017   0.001397   0.008060
5  Colorectal_Cancer    0.868501  0.868529  0.868608  0.868515  -0.000029  -0.000107  -0.000014
7  Heart_Failure        0.835543  0.835500  0.835226  0.835199   0.000043   0.000317   0.000344
2  Atrial_Fib           0.773741  0.770820  0.768815  0.768735   0.002921   0.004926   0.005006
6  Diabetes             0.754432  0.739635  0.732297  0.732268   0.014797   0.022135   0.022164
1  All_Cancers          0.753935  0.752962  0.753392  0.751774   0.000974   0.000544   0.002161
8  Lung_Cancer          0.752895  0.752912  0.752061  0.751894  -0.000017   0.000834   0.001001
3  Breast_Cancer        0.744770  0.741300  0.719557  0.686415   0.003469   0.025213   0.058354
In [ ]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/evaluate_washout_results.py

Summary Statistics

Key statistics for short-term washout analysis.

In [18]:
# Calculate summary statistics for short-term washout
summary_stats = pd.DataFrame({
    'Disease': summary_short['Disease'],
    'No_Washout_AUC': summary_short['No_Washout'],
    '1_Month_AUC': summary_short['1_Month'],
    '3_Month_AUC': summary_short['3_Month'],
    '6_Month_AUC': summary_short['6_Month'],
    'Drop_1mo': summary_short['Drop_1mo'],
    'Drop_3mo': summary_short['Drop_3mo'],
    'Drop_6mo': summary_short['Drop_6mo'],
})

# Calculate the mean and maximum drop across the three washout periods
summary_stats['Mean_Drop'] = summary_stats[['Drop_1mo', 'Drop_3mo', 'Drop_6mo']].mean(axis=1)
summary_stats['Max_Drop'] = summary_stats[['Drop_1mo', 'Drop_3mo', 'Drop_6mo']].max(axis=1)

summary_stats = summary_stats.sort_values('No_Washout_AUC', ascending=False)

# Create summary table
summary_table = summary_stats[['Disease', 'No_Washout_AUC', 
                                'Drop_1mo', 'Drop_3mo', 'Drop_6mo', 
                                'Mean_Drop', 'Max_Drop']].round(4)
summary_table.columns = ['Disease', 'No Washout AUC', 
                         'Drop (1mo)', 'Drop (3mo)', 'Drop (6mo)',
                         'Mean Drop', 'Max Drop']
summary_table
Out[18]:
   Disease            No Washout AUC  Drop (1mo)  Drop (3mo)  Drop (6mo)  Mean Drop  Max Drop
9  Parkinsons                 0.9970      0.0000      0.0000      0.0000     0.0000    0.0000
4  CKD                        0.8740      0.0001     -0.0001     -0.0001    -0.0001    0.0001
0  ASCVD                      0.8700      0.0000      0.0014      0.0081     0.0032    0.0081
5  Colorectal_Cancer          0.8685     -0.0000     -0.0001     -0.0000    -0.0001   -0.0000
7  Heart_Failure              0.8355      0.0000      0.0003      0.0003     0.0002    0.0003
2  Atrial_Fib                 0.7737      0.0029      0.0049      0.0050     0.0043    0.0050
6  Diabetes                   0.7544      0.0148      0.0221      0.0222     0.0197    0.0222
1  All_Cancers                0.7539      0.0010      0.0005      0.0022     0.0012    0.0022
8  Lung_Cancer                0.7529     -0.0000      0.0008      0.0010     0.0006    0.0010
3  Breast_Cancer              0.7448      0.0035      0.0252      0.0584     0.0290    0.0584
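
The per-disease drops can also be collapsed into cross-disease statistics (mean, median, largest, and smallest drop). The sketch below does this for the 6-month washout, with the `Drop_6mo` values copied from the 1-year summary table in this notebook; the helper name `summarize_drops` is hypothetical. Values computed this way cover only the ten key diseases, so they need not match a summary computed over a different disease set.

```python
import pandas as pd

# 6-month washout drops in 1-year AUC, copied from the summary table
drop_6mo = pd.Series({
    'Parkinsons': 0.000000,
    'CKD': -0.000125,
    'ASCVD': 0.008060,
    'Colorectal_Cancer': -0.000014,
    'Heart_Failure': 0.000344,
    'Atrial_Fib': 0.005006,
    'Diabetes': 0.022164,
    'All_Cancers': 0.002161,
    'Lung_Cancer': 0.001001,
    'Breast_Cancer': 0.058354,
})

def summarize_drops(drops):
    """Return cross-disease summary statistics for a Series of AUC drops."""
    return {
        'mean': drops.mean(),
        'median': drops.median(),
        'max': drops.max(),
        'max_disease': drops.idxmax(),  # disease with the largest drop
        'min': drops.min(),
        'min_disease': drops.idxmin(),  # disease with the smallest (or negative) drop
    }

stats_6mo = summarize_drops(drop_6mo)
print(f"Mean AUC drop:   {stats_6mo['mean']:.4f}")
print(f"Median AUC drop: {stats_6mo['median']:.4f}")
print(f"Max AUC drop:    {stats_6mo['max']:.4f} ({stats_6mo['max_disease']})")
print(f"Min AUC drop:    {stats_6mo['min']:.4f} ({stats_6mo['min_disease']})")
```

Reporting the median alongside the mean is useful here because a single disease with a large drop (Breast_Cancer in this table) pulls the mean well above the typical value.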
In [19]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/plot_washout_results.py
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout_evaluation/washout_performance_plot.png
✓ Saved summary plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout_evaluation/washout_drop_summary.png

================================================================================
WASHOUT IMPACT SUMMARY
================================================================================

1-Year Predictions (No Washout → 6-Month Washout):
  Mean AUC drop: 0.0112
  Median AUC drop: 0.0026
  Max AUC drop: 0.0584 (Breast_Cancer)
  Min AUC drop: -0.0000 (Colorectal_Cancer)

10-Year Predictions (No Washout → 6-Month Washout):
  Mean AUC drop: 0.0010
  Median AUC drop: 0.0001
  Max AUC drop: 0.0066 (Breast_Cancer)
  Min AUC drop: 0.0000 (Stroke)

✓ All plots saved successfully!
[Figure: washout_performance_plot.png]
[Figure: washout_drop_summary.png]