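Construction of an explainable machine learning-based model for predicting rehabilitation-related discharge in stroke patients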
Authors: WANG Ping, LIU Aixian, WANG Dan
Affiliations:

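1 Department of Neurological Rehabilitation, Beijing Rehabilitation Hospital, Capital Medical University, Beijing 100144, China; 2 Department of Traditional Chinese Medicine Rehabilitation, Beijing Rehabilitation Hospital, Capital Medical University, Beijing 100144, China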

About the author:

WANG Ping (born 1988), female, M.S., attending physician; research focus: neurological rehabilitation.

Corresponding author:

LIU Aixian (born 1969), male, M.S., chief physician; research focus: neurological rehabilitation. Email: lax@163.com.

Funding:

Capital's Funds for Health Improvement and Research (2022-3-2254).



    Abstract:

    Objective To construct an explainable machine learning (XML)-based model for predicting rehabilitation-related discharge in stroke patients, to identify the key factors influencing rehabilitation-related discharge, and to provide data support for rehabilitation assessment and healthcare resource allocation.

    Methods Data were extracted from the MIMIC-IV v3.1 database, and a total of 14 824 stroke patients were identified using ICD-9/ICD-10 codes. Structured variables, including demographic characteristics, admission features, and in-hospital course data, were extracted to construct models predicting rehabilitation-related discharge. Logistic regression and extreme gradient boosting (XGBoost) models were built and compared in terms of discrimination, calibration, and net clinical benefit. The SHapley Additive exPlanations (SHAP) algorithm was used to evaluate the contribution of each feature to the predictions, and partial dependence and individual conditional expectation (ICE) plots were used for interpretability analysis.

    Results In the overall sample, the rate of rehabilitation-related discharge was 36.9%. The logistic and XGBoost models achieved areas under the receiver operating characteristic curve (AUCs) of 0.637 (95%CI: 0.620-0.653) and 0.630 (95%CI: 0.613-0.647), respectively, indicating moderate discrimination. Both models had an average precision of 0.473, and their Brier scores were 0.220 and 0.223, respectively, suggesting good calibration. Decision curve analysis showed that the logistic model provided the greatest net benefit at threshold probabilities of 30%-40% (P<0.05). SHAP analysis identified age (mean SHAP=0.280), insurance type (mean SHAP=0.266), and admission route (mean SHAP=0.237) as the main influencing factors. Partial dependence analysis showed that the probability of rehabilitation-related discharge peaked at around 40 years of age and then decreased with increasing age (P<0.001); patients admitted via the emergency department or by transfer had a significantly higher probability of rehabilitation-related discharge than those admitted through outpatient services or self-admission (P<0.001); there was no significant difference between stroke subtypes (P=0.236). The AUC of the model remained stable at 0.60-0.70 from 2008 to 2019, suggesting good robustness. The high predicted-risk group had a significantly higher rate of rehabilitation-related discharge than the low predicted-risk group (P<0.001), and the predicted probability was weakly negatively correlated with length of hospital stay.

    Conclusions A prediction model for rehabilitation-related discharge in stroke patients was constructed using XML combined with SHAP, and age, insurance type, and admission route were identified as the main features affecting rehabilitation-related discharge. The model showed favorable calibration and stability in real-world data, providing a feasible quantitative tool for the early identification of rehabilitation needs in stroke patients and for optimizing resource allocation.
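The Results paragraph reports AUC, Brier score and decision-curve net benefit. The sketch below shows the standard definitions of these three metrics in plain Python; it is not the authors' code, the toy labels and probabilities are invented, and the 95% CIs (whose method the abstract does not state) are not reproduced. It also computes one reference point: a model that always predicts the observed base rate of 36.9% would have a Brier score of 0.369 × 0.631 ≈ 0.233.

```python
# Minimal sketch of the evaluation metrics named in the abstract (not the
# authors' code). y: 1 = rehabilitation-related discharge, 0 = otherwise;
# p: predicted probability of y = 1. All data below are invented.
from typing import Sequence


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of acting on everyone with p >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Reference 'treat all' strategy; 'treat none' always has net benefit 0."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # A model that always predicts the base rate q has Brier score q * (1 - q).
    q = 0.369  # overall rehabilitation-related discharge rate from the abstract
    print(round(q * (1 - q), 4))  # 0.2328

    # Invented toy labels and probabilities, for illustration only
    y = [1, 0, 1, 0, 0, 1, 0, 0]
    p = [0.8, 0.3, 0.6, 0.4, 0.2, 0.35, 0.5, 0.1]
    print("AUC:", roc_auc(y, p))
    print("Brier:", brier_score(y, p))
    for t in (0.30, 0.35, 0.40):
        print(t, net_benefit(y, p, t), net_benefit_treat_all(sum(y) / len(y), t))
```

Reading a decision curve means comparing the model's net benefit with "treat all" and "treat none" (zero) at each threshold; the loop at the end does this, on toy data only, for the 0.30 to 0.40 range highlighted in the abstract.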

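    Fig. 1 Flowchart of study population selection
    Fig. 2 Distribution of the proportion of rehabilitation-related discharge across population characteristics
    Fig. 6 Comparison of variable distributions between samples with and without missing data
    Fig. 7 ROC curves and precision-recall curves of the logistic and XGBoost models
    Fig. 8 Calibration curves of the logistic and XGBoost models
    Fig. 9 Decision curve analysis of the logistic and XGBoost models
    Fig. 10 SHAP value analysis of variable importance ranking
    Fig. 11 Partial dependence of the main features
    Fig. 12 Individual conditional expectation analysis of the effect of age on the probability of rehabilitation-related discharge
    Fig. 13 Multidimensional flow distribution of admission route, stroke type, and discharge outcome
    Fig. 14 Calibration validation of the model by risk decile
    Fig. 15 Time series of model AUC by admission year
    Fig. 16 Relationship between delay in rehabilitation initiation and predicted probability across risk strata
    Fig. 17 Two-dimensional density distribution of the predicted probability of rehabilitation-related discharge and length of hospital stay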
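Figs. 10 to 12 rely on SHAP values, partial dependence and ICE curves. As a rough illustration of what these quantities are, the sketch below computes exact Shapley values (enumerating every feature coalition, with absent features set to a single baseline row), ICE curves, and their pointwise average, the partial dependence curve, for a hypothetical three-feature logistic model. The feature names, coefficients and patient rows are invented; the SHAP library's tree explainer used with XGBoost computes these values with a different algorithm and may explain a different output scale (this toy explains probabilities). Ranking features by mean absolute Shapley value is a common way to build importance plots like Fig. 10, but the abstract does not state its exact aggregation.

```python
# Sketch of exact Shapley values, ICE curves and partial dependence for a
# HYPOTHETICAL three-feature model. Feature names, coefficients and patient
# rows are invented for illustration; nothing here is estimated from MIMIC-IV.
import math
from itertools import combinations
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]
FEATURES = ["age", "public_insurance", "emergency_admission"]  # hypothetical


def toy_model(x: Features) -> float:
    """Hypothetical logistic model: P(rehabilitation-related discharge)."""
    logit = (0.5 - 0.03 * (x["age"] - 60)
             + 0.4 * x["public_insurance"]
             + 0.6 * x["emergency_admission"])
    return 1.0 / (1.0 + math.exp(-logit))


def shapley_values(f: Model, x: Features, baseline: Features) -> Features:
    """Exact Shapley values by enumerating coalitions; features outside a
    coalition are set to their baseline value."""
    n = len(FEATURES)
    phi: Features = {}
    for i in FEATURES:
        others = [j for j in FEATURES if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in combinations(others, k):
                with_i = {j: x[j] if (j in coalition or j == i) else baseline[j]
                          for j in FEATURES}
                without_i = {j: x[j] if j in coalition else baseline[j]
                             for j in FEATURES}
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi


def ice_curves(f: Model, data: List[Features], feature: str,
               grid: List[float]) -> List[List[float]]:
    """One curve per patient: vary `feature` over `grid`, keep the rest fixed."""
    return [[f({**x, feature: g}) for g in grid] for x in data]


def partial_dependence(f: Model, data: List[Features], feature: str,
                       grid: List[float]) -> List[float]:
    """Partial dependence = pointwise average of the ICE curves."""
    curves = ice_curves(f, data, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


if __name__ == "__main__":
    patients: List[Features] = [  # invented example rows
        {"age": 45, "public_insurance": 1, "emergency_admission": 1},
        {"age": 70, "public_insurance": 0, "emergency_admission": 1},
        {"age": 82, "public_insurance": 1, "emergency_admission": 0},
    ]
    # Baseline row: feature means of the example patients
    baseline = {j: sum(pt[j] for pt in patients) / len(patients) for j in FEATURES}
    phis = [shapley_values(toy_model, pt, baseline) for pt in patients]
    # Global importance as mean absolute Shapley value per feature
    importance = {j: sum(abs(ph[j]) for ph in phis) / len(phis) for j in FEATURES}
    print(sorted(importance.items(), key=lambda kv: -kv[1]))
    print(partial_dependence(toy_model, patients, "age", [40.0, 60.0, 80.0]))
```

By construction, each patient's Shapley values sum to f(x) minus f(baseline) (the efficiency property). Because a partial dependence curve is only an average, it can hide subgroups whose ICE curves move in different directions, which is why both views are usually plotted.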
Cite this article:

WANG Ping, LIU Aixian, WANG Dan. Construction of an explainable machine learning-based model for predicting rehabilitation-related discharge in stroke patients[J]. Journal of International Neurology and Neurosurgery, 2026, (1): 20-27.

History
  • Received: 2025-09-13
  • Last revised: 2025-12-04
  • Accepted:
  • Published online: 2026-03-31