Healthcare Readmission Risk Prediction

Predictive triage workflow for chronic care follow-up

Project A · Predictive ML System

Readmission risk scoring for chronic care.

A portfolio-grade demonstration of how I would frame, train, monitor, and operationalize a readmission risk model in a health system where staffing, continuity of care, and follow-up capacity are all constrained.

0.81 AUC on validation split
74% Recall at high-risk threshold
68% Precision for flagged discharges
48h Follow-up window for care teams

Problem

Facilities cannot give the same post-discharge attention to every patient. When nurses are stretched and follow-up staff are limited, the most practical question is: which patients are most likely to bounce back within 30 days if nothing changes?

This project packages that question into a reproducible workflow:

  • prepare discharge-level features from a protected or synthetic EHR-style extract
  • train a classification model with transparent validation steps
  • expose patient-level risk scores and feature drivers for review
  • define operational triggers for follow-up calls, medicine checks, and case management

Data used

This demonstration uses a synthetic sample dataset shaped around chronic care workflows so that no patient-identifiable data is exposed in the repository.

Field                    Description
age, sex, county         Basic demographic and service context
prior_admissions_6m      Utilization history
length_of_stay_days      Admission severity proxy
medication_gap_days      Continuity of treatment indicator
comorbidity_score        Disease burden summary
follow_up_scheduled      Whether a follow-up visit was scheduled before discharge
transport_barrier_flag   Practical access constraint
readmitted_30d           Target outcome
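
To make the shape of that extract concrete, here is a minimal sketch of a synthetic record generator. It is written in Python with only the standard library for illustration (the repository's own workflow is in R). The function name, the value ranges, and every coefficient in the risk formula are assumptions chosen for the example, not the ones behind the repository's dataset.

```python
import random

def make_synthetic_discharges(n, seed=0):
    """Generate n synthetic discharge records with the fields listed above.

    All distributions and coefficients are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rows = []
    for _ in range(n):
        row = {
            "age": rng.randint(18, 90),
            "sex": rng.choice(["F", "M"]),
            "county": rng.choice(["County A", "County B", "County C"]),
            "prior_admissions_6m": rng.choice([0, 0, 0, 1, 1, 2, 3]),
            "length_of_stay_days": rng.randint(1, 14),
            "medication_gap_days": rng.choice([0, 0, 3, 7, 14, 30]),
            "comorbidity_score": rng.randint(0, 6),
            "follow_up_scheduled": rng.random() < 0.6,
            "transport_barrier_flag": rng.random() < 0.25,
        }
        # Assumed risk model: readmission probability rises with prior
        # utilization, medication gaps, and comorbidity burden.
        p = (0.05
             + 0.08 * row["prior_admissions_6m"]
             + 0.004 * row["medication_gap_days"]
             + 0.03 * row["comorbidity_score"])
        if not row["follow_up_scheduled"]:
            p += 0.08
        if row["transport_barrier_flag"]:
            p += 0.05
        row["readmitted_30d"] = rng.random() < min(p, 0.95)
        rows.append(row)
    return rows

sample = make_synthetic_discharges(5, seed=42)
print(sample[0])
```

Seeding the generator keeps the sample reproducible, which matters when the same synthetic extract feeds both training and documentation.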

Pipeline design

1. Feature preparation

Transform encounter-level records into discharge-ready features with clear definitions for medication gaps, prior utilization, and follow-up readiness.
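
As an illustration of what "clear definitions" could look like, the sketch below derives two of the features from encounter-level rows. It uses Python for illustration (the repository's workflow is in R). The record layout (`patient_id`, `admit_date`), the 182-day reading of "6 months", and the gap definition are assumptions, not the repository's definitions.

```python
from datetime import date, timedelta

def prior_admissions_6m(encounters, patient_id, index_admit_date, window_days=182):
    """Count a patient's admissions in the window before the index admission.

    Assumes "6 months" means 182 days and excludes the index admission itself.
    """
    window_start = index_admit_date - timedelta(days=window_days)
    return sum(
        1
        for e in encounters
        if e["patient_id"] == patient_id
        and window_start <= e["admit_date"] < index_admit_date
    )

def medication_gap_days(last_supply_end, index_admit_date):
    """Days between the end of the last medication supply and admission.

    Returns 0 when supply ran up to or past the admission date.
    """
    return max(0, (index_admit_date - last_supply_end).days)

# Hypothetical encounter rows for two patients.
encounters = [
    {"patient_id": "P1", "admit_date": date(2024, 1, 10)},
    {"patient_id": "P1", "admit_date": date(2024, 5, 2)},
    {"patient_id": "P1", "admit_date": date(2023, 6, 1)},   # outside the window
    {"patient_id": "P2", "admit_date": date(2024, 4, 1)},
]

print(prior_admissions_6m(encounters, "P1", date(2024, 6, 15)))  # 2
print(medication_gap_days(date(2024, 6, 1), date(2024, 6, 15)))  # 14
```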

2. Model training

Train an XGBoost classifier and compare it to a regularized logistic baseline so the performance gain is justified.
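
The baseline half of that comparison is simple enough to sketch directly. The following is a minimal, dependency-free Python illustration of an L2-regularized logistic regression fitted by batch gradient descent. It is not the repository's R code, and it does not reproduce the XGBoost model. The learning rate, penalty strength, epoch count, and toy data are all assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_l2(X, y, l2=0.1, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on L2-penalized log loss.

    The penalty applies to the weights only, not to the bias.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Return the predicted probability of readmission for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy data: one feature (say, prior admissions) and a 0/1 outcome.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic_l2(X, y)
print(predict_proba(X, w, b))
```

A baseline this transparent is useful mainly as a yardstick: if a boosted model cannot beat it on held-out data, the extra complexity is hard to defend.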

3. Operational scoring

Push risk scores into a triage view where care teams can sort discharges by urgency, explainers, and outstanding actions.
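
A minimal sketch of the ordering logic behind such a view, in Python for illustration (the repository's workflow is in R). The record fields (`discharge_id`, `risk_score`, `top_drivers`, `outstanding_actions`) and the tie-breaking rule are hypothetical.

```python
def build_triage_queue(scored_discharges):
    """Order discharges for review: highest risk first.

    Ties on risk are broken by the number of outstanding actions
    (more open actions first). This rule is an assumption.
    """
    return sorted(
        scored_discharges,
        key=lambda d: (-d["risk_score"], -len(d["outstanding_actions"])),
    )

# Hypothetical scored discharges with their main drivers and open actions.
scored = [
    {"discharge_id": "D1", "risk_score": 0.82,
     "top_drivers": ["prior_admissions_6m"],
     "outstanding_actions": ["follow-up call"]},
    {"discharge_id": "D2", "risk_score": 0.82,
     "top_drivers": ["medication_gap_days"],
     "outstanding_actions": ["follow-up call", "medication reconciliation"]},
    {"discharge_id": "D3", "risk_score": 0.40,
     "top_drivers": ["comorbidity_score"],
     "outstanding_actions": []},
]

for d in build_triage_queue(scored):
    print(d["discharge_id"], d["risk_score"], d["top_drivers"])
```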

4. Monitoring

Track drift, calibration, and alert volume over time so the model stays useful instead of quietly degrading.
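
Two of those checks can be sketched in a few lines. The example below uses Python for illustration (the repository's workflow is in R) and computes a population stability index (PSI) between a baseline and a current sample of risk scores, plus the share of discharges that would be flagged. PSI is one common drift measure and is my choice for the example, not something the repository is confirmed to use. The bin edges and the 0.70 threshold are assumptions.

```python
import math

def population_stability_index(expected, actual, edges):
    """PSI between baseline and current score samples over shared bins.

    Values outside the outer edges are ignored when counting.
    """
    n_bins = len(edges) - 1

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            for i in range(n_bins):
                is_last = i == n_bins - 1
                below_upper = v < edges[i + 1] or (is_last and v <= edges[i + 1])
                if edges[i] <= v and below_upper:
                    counts[i] += 1
                    break
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def alert_rate(scores, threshold=0.70):
    """Fraction of discharges at or above the high-risk threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline = [0.1, 0.2, 0.3, 0.6, 0.8]
current = [0.6, 0.7, 0.8, 0.9, 0.95]

print(population_stability_index(baseline, baseline, edges))  # 0.0
print(population_stability_index(baseline, current, edges) > 0)  # True
print(alert_rate(current))
```

Tracking alert rate alongside drift keeps the monitoring tied to staffing: a stable model can still overwhelm a team if the share of flagged discharges creeps up.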

Validation snapshot

Metric      Logistic baseline   XGBoost model
ROC AUC     0.74                0.81
Precision   0.59                0.68
Recall      0.65                0.74
F1 score    0.62                0.71

Interpretation: the tree-based model lifts recall from 0.65 to 0.74 while precision also rises from 0.59 to 0.68, so it catches more true readmissions without flooding the team with false positives. That matters because every flagged discharge carries a real staffing cost.
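
For readers who want to see how these metrics relate, here is a short Python sketch (for illustration, since the repository's workflow is in R). It computes precision, recall, and F1 from binary labels and predictions, then checks that the reported F1 values follow from the reported precision and recall. The function names are my own.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return precision, recall, f1

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Consistency check against the validation snapshot.
print(round(f1_from(0.59, 0.65), 2))  # 0.62 (logistic baseline)
print(round(f1_from(0.68, 0.74), 2))  # 0.71 (XGBoost model)
```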

What drives risk

The strongest signals in this demo are:

  • previous admissions in the last 6 months
  • long medication gaps before the current admission
  • high comorbidity burden
  • no follow-up scheduled before discharge
  • transport barriers that make return visits difficult

These are useful because they are operationally legible. A nurse or programme manager can look at them and decide what support is actually feasible.

Deployment approach

The repository includes a simple R workflow.

If this moved beyond portfolio scope, I would add:

  • model versioning and run metadata
  • scheduled retraining checks
  • calibration reporting by facility or county
  • role-based access controls for patient-facing environments
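
Of these, calibration reporting by county is the easiest to sketch. The Python example below (for illustration, since the repository's workflow is in R) compares mean predicted risk with the observed readmission rate per group. The field names `risk_score` and `county` on scored records, and the sample values, are assumptions.

```python
from collections import defaultdict

def calibration_by_group(records, group_key="county"):
    """Compare mean predicted risk with observed readmission rate per group."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [count, sum predicted, sum observed]
    for r in records:
        t = totals[r[group_key]]
        t[0] += 1
        t[1] += r["risk_score"]
        t[2] += int(r["readmitted_30d"])
    return {
        group: {
            "n": n,
            "mean_predicted": predicted / n,
            "observed_rate": observed / n,
        }
        for group, (n, predicted, observed) in totals.items()
    }

# Hypothetical scored discharges with known outcomes.
records = [
    {"county": "County A", "risk_score": 0.5, "readmitted_30d": True},
    {"county": "County A", "risk_score": 0.25, "readmitted_30d": False},
    {"county": "County B", "risk_score": 0.75, "readmitted_30d": True},
]

for county, stats in calibration_by_group(records).items():
    print(county, stats)
```

A large gap between `mean_predicted` and `observed_rate` in one group suggests the model is miscalibrated there even when the overall numbers look fine.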

Example intervention playbook

Risk band   Threshold        Suggested action
High        >= 0.70          Follow-up call within 48 hours, medication reconciliation, clinician review
Medium      0.45 to < 0.70   SMS reminder plus nurse check-in at next clinic day
Low         < 0.45           Standard discharge counseling and routine follow-up
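
The playbook translates directly into a lookup. This is a minimal Python sketch for illustration (the repository's workflow is in R); the function and dictionary names are hypothetical. It treats the bands as contiguous, so a score between 0.69 and 0.70 counts as Medium, which is an assumption about how the thresholds are meant to be read.

```python
# Suggested actions per band, copied from the playbook table.
ACTIONS = {
    "High": "Follow-up call within 48 hours, medication reconciliation, clinician review",
    "Medium": "SMS reminder plus nurse check-in at next clinic day",
    "Low": "Standard discharge counseling and routine follow-up",
}

def risk_band(score):
    """Map a risk score in [0, 1] to a playbook band.

    Bands are contiguous: scores from 0.45 up to (not including) 0.70
    are Medium. That reading of the table is an assumption.
    """
    if score >= 0.70:
        return "High"
    if score >= 0.45:
        return "Medium"
    return "Low"

for score in (0.82, 0.50, 0.10):
    band = risk_band(score)
    print(score, band, "->", ACTIONS[band])
```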

Why this project matters in the portfolio

It demonstrates the bridge I want the portfolio to make clear: rigorous health data work can evolve naturally into production-minded ML systems, especially when the model is tied to a real workflow and monitored like an operational tool instead of a Kaggle artifact.
