# Healthcare Readmission Risk Prediction

Predictive triage workflow and readmission risk scoring for chronic care follow-up.

A portfolio-grade demonstration of how I would frame, train, monitor, and operationalize a readmission risk model in a health system where staffing, continuity of care, and follow-up capacity are all constrained.
## Problem
Facilities cannot give the same post-discharge attention to every patient. When nurses are stretched and follow-up staff are limited, the most practical question is: which patients are most likely to bounce back within 30 days if nothing changes?
This project packages that question into a reproducible workflow:
- prepare discharge-level features from a protected or synthetic EHR-style extract
- train a classification model with transparent validation steps
- expose patient-level risk scores and feature drivers for review
- define operational triggers for follow-up calls, medicine checks, and case management
## Data used
This demonstration uses a synthetic sample dataset shaped around chronic care workflows so that no patient-identifiable data is exposed in the repository.
| Field | Description |
|---|---|
| age, sex, county | Basic demographic and service context |
| prior_admissions_6m | Utilization history |
| length_of_stay_days | Admission severity proxy |
| medication_gap_days | Continuity of treatment indicator |
| comorbidity_score | Disease burden summary |
| follow_up_scheduled | Whether a follow-up visit was scheduled before discharge |
| transport_barrier_flag | Practical access constraint |
| readmitted_30d | Target outcome |
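A quick way to sanity-check the sample before modeling is to load it and inspect the fields above. This sketch assumes the CSV columns match the data dictionary:

```r
# Load the synthetic sample shipped with the repository and inspect it.
readmissions <- read.csv("data/sample_readmissions.csv", stringsAsFactors = FALSE)

str(readmissions)                               # field names and types at a glance
prop.table(table(readmissions$readmitted_30d))  # outcome prevalence (class balance)
```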
## Pipeline design

### Feature preparation
Transform encounter-level records into discharge-ready features with clear definitions for medication gaps, prior utilization, and follow-up readiness.
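As a sketch of that step, the encounter-to-discharge rollup might look like the following. The encounter table and its column names (patient_id, admit_date, discharge_date, last_refill_date) are illustrative, not the repository's actual schema:

```r
library(dplyr)

# Sketch: roll encounter-level records up to discharge-level features.
# Column names are hypothetical; dates are assumed to be Date objects.
discharge_features <- encounters %>%
  group_by(patient_id) %>%
  arrange(admit_date, .by_group = TRUE) %>%
  mutate(
    # admissions in the 180 days before each admission, per patient
    prior_admissions_6m = sapply(admit_date, function(d)
      sum(admit_date < d & admit_date >= d - 180)),
    length_of_stay_days = as.numeric(discharge_date - admit_date),
    # gap between the last medication refill and this admission
    medication_gap_days = as.numeric(admit_date - last_refill_date)
  ) %>%
  ungroup()
```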
### Model training
Train an XGBoost classifier and compare it to a regularized logistic baseline so the performance gain is justified.
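A minimal sketch of that comparison, assuming a training frame `train` with the data-dictionary features and a 0/1 readmitted_30d target (hyperparameters here are placeholders, not tuned values):

```r
library(xgboost)
library(glmnet)

# Shared design matrix for both models (no intercept column).
x <- model.matrix(readmitted_30d ~ . - 1, data = train)
y <- train$readmitted_30d

# Regularized logistic baseline (ridge, via cross-validated glmnet).
baseline <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# XGBoost classifier to compare against the baseline.
booster <- xgboost(
  data = xgb.DMatrix(x, label = y),
  objective = "binary:logistic",
  eval_metric = "auc",
  nrounds = 200, max_depth = 4, eta = 0.1,
  verbose = 0
)
```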
### Operational scoring
Push risk scores into a triage view where care teams can sort discharges by urgency, explainers, and outstanding actions.
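Batch scoring for that view could be as small as the sketch below. It assumes the `booster` from training, a `features` frame holding only the model's feature columns, and a parallel `patient_ids` vector; all names are illustrative:

```r
library(xgboost)

# Score a batch of new discharges with the trained booster.
scores <- predict(booster, xgb.DMatrix(model.matrix(~ . - 1, data = features)))

# Build the triage view: one row per discharge, most urgent first.
triage <- data.frame(patient_id = patient_ids,
                     risk_score = round(scores, 3))
triage <- triage[order(-triage$risk_score), ]
```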
### Monitoring
Track drift, calibration, and alert volume over time so the model stays useful instead of quietly degrading.
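One concrete drift check is the population stability index (PSI) per feature, comparing the training distribution to a recent scoring window; a common rule of thumb treats PSI above 0.2 as meaningful drift. A minimal sketch:

```r
# PSI between a reference distribution (training) and a recent window.
psi <- function(expected, actual, breaks = 10) {
  # Decile cut points from the reference distribution.
  cuts <- unique(quantile(expected, probs = seq(0, 1, length.out = breaks + 1),
                          na.rm = TRUE))
  cuts[1] <- -Inf
  cuts[length(cuts)] <- Inf

  e <- prop.table(table(cut(expected, cuts)))
  a <- prop.table(table(cut(actual, cuts)))

  e <- pmax(e, 1e-6)  # avoid log(0) on empty bins
  a <- pmax(a, 1e-6)
  sum((a - e) * log(a / e))
}
```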
## Validation snapshot
| Metric | Logistic baseline | XGBoost model |
|---|---|---|
| ROC AUC | 0.74 | 0.81 |
| Precision | 0.59 | 0.68 |
| Recall | 0.65 | 0.74 |
| F1 score | 0.62 | 0.71 |
Interpretation: the tree-based model improves recall without flooding the team with false positives. That matters because every additional flagged patient carries a real staffing cost in the follow-up workflow.
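For reference, the precision/recall/F1 figures above come from a straightforward confusion-matrix calculation at a 0.5 threshold. This sketch assumes held-out probabilities `scores` and a 0/1 outcome vector `y_test`:

```r
# Threshold the held-out scores and compute confusion-matrix metrics.
pred <- as.integer(scores >= 0.5)

tp <- sum(pred == 1 & y_test == 1)
fp <- sum(pred == 1 & y_test == 0)
fn <- sum(pred == 0 & y_test == 1)

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
```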
## What drives risk
The strongest signals in this demo are:
- previous admissions in the last 6 months
- long medication gaps before the current admission
- high comorbidity burden
- no follow-up scheduled before discharge
- transport barriers that make return visits difficult
These are useful because they are operationally legible. A nurse or programme manager can look at them and decide what support is actually feasible.
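In this workflow the ranked drivers would come from the booster's gain-based importance, which is a one-liner to extract (assuming the trained `booster` object from the training step):

```r
library(xgboost)

# Gain-based feature importance from the trained model.
importance <- xgb.importance(model = booster)
head(importance, 5)  # top drivers by gain
```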
## Deployment approach
The repository includes a simple R workflow:
- train_model.R for feature prep and model training
- predict.R for batch or single-patient inference
- app.R for a Shiny triage view
- data/sample_readmissions.csv as a safe demonstration dataset
If this moved beyond portfolio scope, I would add:
- model versioning and run metadata
- scheduled retraining checks
- calibration reporting by facility or county
- role-based access controls for patient-facing environments
## Example intervention playbook
| Risk band | Threshold | Suggested action |
|---|---|---|
| High | >= 0.70 | Follow-up call within 48 hours, medication reconciliation, clinician review |
| Medium | 0.45 - 0.69 | SMS reminder plus nurse check-in at next clinic day |
| Low | < 0.45 | Standard discharge counseling and routine follow-up |
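Mapping scores to these bands is a single cut over the triage scores; this sketch mirrors the thresholds in the table (the `triage` frame name is illustrative):

```r
# Assign each scored discharge to a risk band.
# right = FALSE makes intervals left-closed, so High starts at >= 0.70
# and Medium at >= 0.45, matching the playbook thresholds.
risk_band <- cut(
  triage$risk_score,
  breaks = c(-Inf, 0.45, 0.70, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)
table(risk_band)  # expected alert volume per band
```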
## Why this project matters in the portfolio
It demonstrates the bridge I want the portfolio to make clear: rigorous health data work can evolve naturally into production-minded ML systems, especially when the model is tied to a real workflow and monitored like an operational tool instead of a Kaggle artifact.