# Healthcare Readmission Risk Prediction

Predictive triage workflow and readmission risk scoring for chronic care follow-up.

A portfolio-grade demonstration of how I would frame, train, monitor, and operationalize a readmission risk model in a health system where staffing, continuity of care, and follow-up capacity are all constrained.
## Problem
Facilities cannot give the same post-discharge attention to every patient. When nurses are stretched and follow-up staff are limited, the most practical question is: which patients are most likely to bounce back within 30 days if nothing changes?
This project packages that question into a reproducible workflow:
- prepare discharge-level features from a protected or synthetic EHR-style extract
- train a classification model with transparent validation steps
- expose patient-level risk scores and feature drivers for review
- define operational triggers for follow-up calls, medicine checks, and case management
## Data used
This demonstration uses a synthetic sample dataset shaped around chronic care workflows so that no patient-identifiable data is exposed in the repository.
| Field | Description |
|---|---|
| age, sex, county | Basic demographic and service context |
| prior_admissions_6m | Utilization history |
| length_of_stay_days | Admission severity proxy |
| medication_gap_days | Continuity of treatment indicator |
| comorbidity_score | Disease burden summary |
| follow_up_scheduled | Whether a follow-up visit was scheduled before discharge |
| transport_barrier_flag | Practical access constraint |
| readmitted_30d | Target outcome |
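A quick way to sanity-check the sample before modeling is to load it and inspect the fields above. This sketch assumes the CSV columns match the data dictionary:

```r
# Load the synthetic sample shipped with the repository and inspect it.
readmissions <- read.csv("data/sample_readmissions.csv", stringsAsFactors = FALSE)

str(readmissions)                               # field names and types at a glance
prop.table(table(readmissions$readmitted_30d))  # outcome prevalence (class balance)
```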
## Pipeline design

### Feature preparation
Transform encounter-level records into discharge-ready features with clear definitions for medication gaps, prior utilization, and follow-up readiness.
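As a sketch of that step, the encounter-to-discharge rollup might look like the following. The encounter table and its column names (patient_id, admit_date, discharge_date, last_refill_date) are illustrative, not the repository's actual schema:

```r
library(dplyr)

# Sketch: roll encounter-level records up to discharge-level features.
# Column names are hypothetical; dates are assumed to be Date objects.
discharge_features <- encounters %>%
  group_by(patient_id) %>%
  arrange(admit_date, .by_group = TRUE) %>%
  mutate(
    # admissions in the 180 days before each admission, per patient
    prior_admissions_6m = sapply(admit_date, function(d)
      sum(admit_date < d & admit_date >= d - 180)),
    length_of_stay_days = as.numeric(discharge_date - admit_date),
    # gap between the last medication refill and this admission
    medication_gap_days = as.numeric(admit_date - last_refill_date)
  ) %>%
  ungroup()
```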
### Model training
Train an XGBoost classifier and compare it to a regularized logistic baseline so the performance gain is justified.
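A minimal sketch of that comparison, assuming a training frame `train` with the data-dictionary features and a 0/1 readmitted_30d target (hyperparameters here are placeholders, not tuned values):

```r
library(xgboost)
library(glmnet)

# Shared design matrix for both models (no intercept column).
x <- model.matrix(readmitted_30d ~ . - 1, data = train)
y <- train$readmitted_30d

# Regularized logistic baseline (ridge, via cross-validated glmnet).
baseline <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# XGBoost classifier to compare against the baseline.
booster <- xgboost(
  data = xgb.DMatrix(x, label = y),
  objective = "binary:logistic",
  eval_metric = "auc",
  nrounds = 200, max_depth = 4, eta = 0.1,
  verbose = 0
)
```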
### Operational scoring
Push risk scores into a triage view where care teams can sort discharges by urgency, explainers, and outstanding actions.
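Batch scoring for that view could be as small as the sketch below. It assumes the `booster` from training, a `features` frame holding only the model's feature columns, and a parallel `patient_ids` vector; all names are illustrative:

```r
library(xgboost)

# Score a batch of new discharges with the trained booster.
scores <- predict(booster, xgb.DMatrix(model.matrix(~ . - 1, data = features)))

# Build the triage view: one row per discharge, most urgent first.
triage <- data.frame(patient_id = patient_ids,
                     risk_score = round(scores, 3))
triage <- triage[order(-triage$risk_score), ]
```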
### Monitoring
Track drift, calibration, and alert volume over time so the model stays useful instead of quietly degrading.
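One concrete drift check is the population stability index (PSI) per feature, comparing the training distribution to a recent scoring window; a common rule of thumb treats PSI above 0.2 as meaningful drift. A minimal sketch:

```r
# PSI between a reference distribution (training) and a recent window.
psi <- function(expected, actual, breaks = 10) {
  # Decile cut points from the reference distribution.
  cuts <- unique(quantile(expected, probs = seq(0, 1, length.out = breaks + 1),
                          na.rm = TRUE))
  cuts[1] <- -Inf
  cuts[length(cuts)] <- Inf

  e <- prop.table(table(cut(expected, cuts)))
  a <- prop.table(table(cut(actual, cuts)))

  e <- pmax(e, 1e-6)  # avoid log(0) on empty bins
  a <- pmax(a, 1e-6)
  sum((a - e) * log(a / e))
}
```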
## Validation snapshot
| Metric | Logistic baseline | XGBoost model |
|---|---|---|
| ROC AUC | 0.74 | 0.81 |
| Precision | 0.59 | 0.68 |
| Recall | 0.65 | 0.74 |
| F1 score | 0.62 | 0.71 |
Interpretation: the tree-based model improves recall without flooding the team with false positives. That matters because every additional flagged patient carries a real staffing cost in the follow-up workflow.
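For reference, the precision/recall/F1 figures above come from a straightforward confusion-matrix calculation at a 0.5 threshold. This sketch assumes held-out probabilities `scores` and a 0/1 outcome vector `y_test`:

```r
# Threshold the held-out scores and compute confusion-matrix metrics.
pred <- as.integer(scores >= 0.5)

tp <- sum(pred == 1 & y_test == 1)
fp <- sum(pred == 1 & y_test == 0)
fn <- sum(pred == 0 & y_test == 1)

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
```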
## What drives risk
The strongest signals in this demo are:
- previous admissions in the last 6 months
- long medication gaps before the current admission
- high comorbidity burden
- no follow-up scheduled before discharge
- transport barriers that make return visits difficult
These are useful because they are operationally legible. A nurse or programme manager can look at them and decide what support is actually feasible.
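In this workflow the ranked drivers would come from the booster's gain-based importance, which is a one-liner to extract (assuming the trained `booster` object from the training step):

```r
library(xgboost)

# Gain-based feature importance from the trained model.
importance <- xgb.importance(model = booster)
head(importance, 5)  # top drivers by gain
```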
## Deployment approach
The repository includes a simple R workflow:
- train_model.R for feature prep and model training
- predict.R for batch or single-patient inference
- app.R for a Shiny triage view
- data/sample_readmissions.csv as a safe demonstration dataset
If this moved beyond portfolio scope, I would add:
- model versioning and run metadata
- scheduled retraining checks
- calibration reporting by facility or county
- role-based access controls for patient-facing environments
## Example intervention playbook
| Risk band | Threshold | Suggested action |
|---|---|---|
| High | >= 0.70 | Follow-up call within 48 hours, medication reconciliation, clinician review |
| Medium | 0.45 - 0.69 | SMS reminder plus nurse check-in at next clinic day |
| Low | < 0.45 | Standard discharge counseling and routine follow-up |
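Mapping scores to these bands is a single cut over the triage scores; this sketch mirrors the thresholds in the table (the `triage` frame name is illustrative):

```r
# Assign each scored discharge to a risk band.
# right = FALSE makes intervals left-closed, so High starts at >= 0.70
# and Medium at >= 0.45, matching the playbook thresholds.
risk_band <- cut(
  triage$risk_score,
  breaks = c(-Inf, 0.45, 0.70, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)
table(risk_band)  # expected alert volume per band
```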
## Why this project matters in the portfolio
It demonstrates the bridge I want the portfolio to make clear: rigorous health data work can evolve naturally into production-minded ML systems, especially when the model is tied to a real workflow and monitored like an operational tool instead of a Kaggle artifact.