TB Treatment Adherence
Predicting Treatment Drop-off of Tuberculosis Patients at Scale
Read the paper
The Problem
Why TB treatment drop-off matters
Tuberculosis remains one of India's deadliest infectious diseases. Effective treatment requires 6–9 months of continuous medication, but many patients drop off treatment early because of side effects, logistical barriers, or loss to follow-up. Each drop-off risks drug resistance, continued transmission, and preventable death.
India's Nikshay system tracks millions of TB patients, but existing rule-based flagging methods catch only a fraction of those at risk. The challenge: build a predictive model that identifies high-risk patients before they drop off, enabling timely intervention by community health workers.
Approach
Model design
Similarity Encoding
Categorical features like district, facility type, and drug regimen are encoded using similarity-based methods that capture relationships between categories, outperforming one-hot encoding on high-cardinality features.
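To make the idea concrete, here is a minimal sketch of one common form of similarity encoding, based on character n-gram overlap between category strings. The page does not say which similarity measure the model uses, so the Jaccard n-gram measure, the function names, and the facility-type strings below are all illustrative assumptions.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lower-cased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings, in [0, 1]."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as a vector of similarities to every prototype."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical facility-type categories, for illustration only.
prototypes = ["public health centre", "private clinic", "district hospital"]
encoded = similarity_encode(["public health center", "private clinic"], prototypes)
```

Unlike one-hot encoding, a spelling variant such as "public health center" lands close to "public health centre" instead of becoming an unrelated column, which is what helps on high-cardinality features.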
Ensemble Model
An ensemble of XGBoost, LightGBM, and Explainable Boosting Machines (EBM) combines gradient boosting performance with interpretability. The model achieves an ROC AUC of 0.80 and a Recall@20 of 0.62, 3× the rule-based baseline.
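The sketch below shows how an ensemble score and a Recall@20 figure could be computed. It rests on two assumptions that the page does not confirm: that the three models' scores are combined by an unweighted average, and that Recall@20 means the share of all true drop-offs captured when the top 20% of patients by risk score are flagged. The scores and labels are made-up toy values.

```python
def average_ensemble(*model_scores):
    """Average per-patient scores across models (unweighted; an assumption)."""
    return [sum(scores) / len(scores) for scores in zip(*model_scores)]


def recall_at_k(y_true, scores, k_percent):
    """Share of all true drop-offs found in the top k% of patients by score."""
    n_flagged = max(1, round(len(scores) * k_percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(y_true[i] for i in ranked[:n_flagged])
    total_positives = sum(y_true)
    return hits / total_positives if total_positives else 0.0


# Toy data: 1 marks a patient who dropped off treatment.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
xgb_scores = [0.9, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1]
lgbm_scores = [0.8, 0.3, 0.2, 0.7, 0.2, 0.1, 0.2, 0.3, 0.1, 0.2]
ebm_scores = [0.7, 0.1, 0.3, 0.8, 0.4, 0.3, 0.3, 0.2, 0.3, 0.3]

ensemble_scores = average_ensemble(xgb_scores, lgbm_scores, ebm_scores)
recall_20 = recall_at_k(y_true, ensemble_scores, k_percent=20)
```

In this toy example the top 20% is two patients, both of whom are true drop-offs, so two of the three drop-offs are captured.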
Fairness Reweighting
Training data is reweighted to equalise performance across underserved and served districts. This raises recall in underserved areas by 1.3× without sacrificing overall accuracy, ensuring equitable resource allocation.
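The page does not describe the exact reweighting scheme, so the following is only a sketch of one well-known option, the reweighing method of Kamiran and Calders, swapped in for illustration. Each record gets the weight P(group) · P(label) / P(group, label), which makes group membership and outcome statistically independent in the weighted data. The group names and toy labels are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-record weights w = P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data: 4 records from underserved districts, 6 from served districts.
groups = ["underserved"] * 4 + ["served"] * 6
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # 1 = dropped off treatment

weights = reweighing_weights(groups, labels)
```

Weights of this kind can typically be passed to a gradient boosting library through the `sample_weight` argument of its scikit-learn style `fit` method; whether the deployed model does so is not stated here.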
0.8M Training Records
The model is trained on 800,000 historical TB case records from India's Nikshay database, covering demographics, treatment history, facility data, and outcome labels.
Interactive
Threshold explorer
Lower thresholds flag more patients (higher recall) but increase workload. The deployed model uses a threshold balancing coverage with capacity.
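The trade-off described above can be sketched in a few lines. Here coverage is taken to mean the fraction of all patients flagged, and recall the fraction of true drop-offs among those flagged; both definitions, the function name, and the toy scores are assumptions made for illustration.

```python
def coverage_and_recall(y_true, scores, threshold):
    """Return (fraction of patients flagged, fraction of drop-offs caught)."""
    flagged = [s >= threshold for s in scores]
    coverage = sum(flagged) / len(scores)
    positives = sum(y_true)
    hits = sum(1 for y, f in zip(y_true, flagged) if y and f)
    recall = hits / positives if positives else 0.0
    return coverage, recall


# Toy data: 1 marks a patient who dropped off treatment.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

high = coverage_and_recall(y_true, scores, threshold=0.5)
low = coverage_and_recall(y_true, scores, threshold=0.15)
```

With these toy values, lowering the threshold from 0.5 to 0.15 doubles the number of flagged patients (3 to 6 of 8) and catches the one drop-off that the higher threshold missed.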
Deployment
Deployed across 15 states
The model is live in India's national TB programme, flagging high-risk patients for intervention.
Fairness
Equitable recall across districts
Fairness reweighting raised recall in underserved districts by 1.3× without reducing overall accuracy.
Impact
Real-world deployment
The model is deployed across 15 Indian states through the national Nikshay platform, where it has flagged over 100,000 high-risk patients for proactive follow-up by community health workers. By identifying patients likely to drop off treatment before they do, the system enables early intervention (a phone call, a home visit, or a counselling session) that can keep patients on track and save lives.
The fairness-aware reweighting ensures that patients in underserved districts, who are most at risk and hardest to reach, receive equitable attention from the model. The work was awarded Best Paper at the Machine Learning for Health (ML4H) workshop at NeurIPS 2022.
Read the full paper
For technical details on model architecture, similarity encoding, fairness analysis, and deployment, see the published paper.