Worked Example (Simulated Data)
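Most major statistical packages, including SAS, Stata, and R, can fit restricted cubic splines. Because plotting is convenient in R, we use R for this example. Following the rms package documentation, we first generate a simulated dataset containing sex (sex), age (age), survival time (time), and the outcome indicator (death). To study how age relates to survival, the conventional options are to enter age into a Cox regression as a continuous variable or to split age into groups. Neither option shows the shape of the association between age and mortality risk in an intuitive way. Below, we use a restricted cubic spline to analyze the relationship between age and the risk of death:

    ##### Load the required packages
    library(ggplot2)
    library(survival)  # provides Surv()
    library(rms)       # package needed for restricted cubic splines

    ### Following the rms package, generate a simulated dataset with sex, age,
    ### survival time (time), and the outcome indicator (death).
    ### NOTE: the right-hand sides of the assignments were missing from the source
    ### text. They are restored here from the cph() example in the rms
    ### documentation and should be treated as assumptions.
    n <- 1000
    set.seed(731)
    age <- 50 + 12 * rnorm(n)
    label(age) <- "Age"
    sex <- factor(sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.6, 0.4)))
    cens <- 15 * runif(n)                                        # censoring times
    h <- 0.02 * exp(0.04 * (age - 50) + 0.8 * (sex == "Female")) # individual hazard
    time <- -log(runif(n)) / h                                   # event times
    label(time) <- "Follow-up Time"
    death <- ifelse(time <= cens, 1, 0)                          # 1 = died, 0 = censored
    time <- pmin(time, cens)
    units(time) <- "Year"
    data <- data.frame(age, sex, time, death)

    ####### Start plotting
    dd <- datadist(data)
    options(datadist = "dd")  # set the data environment for the steps that follow

    #### Fit the Cox model. Note that the function here is cph() from rms,
    #### not the more familiar coxph() from the survival package.
    #### Assumption: 4 knots; the source text does not preserve the number.
    fit <- cph(Surv(time, death) ~ rcs(age, 4) + sex, data = data)
    dd$limits$age[2] <- 50    # reference age where HR = 1 (assumed value; lost in the source)
    fit <- update(fit)
    HR <- Predict(fit, age, fun = exp, ref.zero = TRUE)  # predicted hazard ratios
    P1 <- ggplot(HR)
    P1

    ###### Draw the plot by hand
    ###### (the geom_line() layer is an assumed reconstruction; only geom_ribbon() survives in the source)
    P2 <- ggplot() +
      geom_line(data = HR, aes(age, yhat), linewidth = 1, colour = "red") +
      geom_ribbon(data = HR, aes(age, ymin = lower, ymax = upper), alpha = 0.1, fill = "red")

    ###### Further adjust the figure
    ###### (the theme and reference line are assumed; only labs() survives in the source)
    P2 <- P2 + theme_classic() +
      geom_hline(yintercept = 1, linetype = 2) +
      labs(title = "RCS", x = "age", y = "HR (95%CI)")
    P2

    #### To test whether the relationship is nonlinear, use anova()
    anova(fit)

In the anova() output, age is significant overall (the overall test covers both the linear and the nonlinear components). The P-Nonlinear value is 0.0168 < 0.05, so we can say that there is a nonlinear association between age and the risk of death.

If an independent variable has a nonlinear relationship with the outcome of interest, how should the result be described in more detail in a paper? We suggest following the Lancet article mentioned earlier.

We also found that a BMJ article, "Predicted lean body mass, fat mass, and all cause and cause specific mortality in men: prospective US cohort study", describes nonlinear relationships very well. An excerpt is reproduced here for reference:

In figure 1, we used restricted cubic splines to flexibly model and visualize the relation of predicted fat mass and lean body mass with all cause mortality in men. The risk of all cause mortality was relatively flat until around 21 kg of predicted fat mass and then started to increase rapidly afterwards (P for non-linearity <0.001). The average BMI for men with 21 kg of predicted fat mass was 25. Above 21 kg, the hazard ratio per standard deviation higher predicted fat mass was 1.22 (1.18 to 1.26). Regarding the strong U shaped relation between predicted lean body mass and all cause mortality, the plot showed a substantial reduction of the risk within the lower range of predicted lean body mass, which reached the lowest risk around 56 kg and then increased thereafter (P for non-linearity <0.001).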
Below 56 kg, the hazard ratio per standard deviation higher predicted lean body mass was 0.87 (0.82 to 0.92).

As for how to describe the use of restricted cubic splines in the methods section, the same article is again a good model:

We also used restricted cubic splines with five knots at the 5th, 35th, 50th, 65th, and 95th centiles to flexibly model the association of lean body mass, fat mass, and BMI with mortality.

That concludes this brief introduction to restricted cubic splines. Both the principle and the procedure are fairly simple, yet adding a spline to an analysis describes the relationship between the exposure of interest and the outcome more intuitively, can reveal more interesting features, and can add a good deal to a paper. If you are not familiar with R, the same analysis can be done in SAS and Stata; interested readers are encouraged to try it.

[1] Bhaskaran K, Dos-Santos-Silva I, Leon DA, Douglas IJ, Smeeth L. Association of BMI with overall and cause-specific mortality: a population-based cohort study of 3·6 million adults in the UK. Lancet Diabetes Endocrinol. 2018;6(12):944-953. doi:10.1016/S2213-8587(18)30288-2
[2] Lee DH, Keum N, Hu FB, et al. Predicted lean body mass, fat mass, and all cause and cause specific mortality in men: prospective US cohort study. BMJ. 2018;362:k2575. doi:10.1136/bmj.k2575
[3] Luo Jianfeng (罗剑锋), et al. The Application of Restricted Cubic Spline in Nonlinear Regression. 中国卫生统计 (Chinese Journal of Health Statistics). 2010;27(3):229-232.

Further reading
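1. R Course, Session 18: polynomial regression, piecewise regression, restricted cubic splines...
2. Building a nonlinear regression prediction model: an R tutorial
3. How to understand the homogeneity-of-variance test in linear regression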
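
The methods sentence quoted above from the BMJ paper places five knots at the 5th, 35th, 50th, 65th, and 95th centiles. As a rough sketch of how that specification might be written with rms, the code below passes explicit knot locations to rcs() instead of a knot count. The simulated data follow the cph() example in the rms documentation, and the names sim, age_knots, and fit5 are illustrative assumptions rather than part of the original analysis.

```r
# Minimal sketch, not the BMJ authors' code: a five-knot restricted cubic
# spline with knots placed at user-chosen centiles of age.
library(survival)  # Surv()
library(rms)       # cph(), rcs()

# Simulated data patterned on the cph() example in the rms documentation
# (assumed values, used only so the sketch is self-contained).
set.seed(731)
n     <- 1000
age   <- 50 + 12 * rnorm(n)
sex   <- factor(sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.6, 0.4)))
cens  <- 15 * runif(n)
h     <- 0.02 * exp(0.04 * (age - 50) + 0.8 * (sex == "Female"))
time  <- -log(runif(n)) / h
death <- ifelse(time <= cens, 1, 0)
time  <- pmin(time, cens)
sim   <- data.frame(age, sex, time, death)

# Knot locations at the 5th, 35th, 50th, 65th, and 95th centiles of age.
age_knots <- quantile(sim$age, probs = c(0.05, 0.35, 0.50, 0.65, 0.95),
                      names = FALSE)

# Passing a vector of locations to rcs() fixes the knots at those values.
fit5 <- cph(Surv(time, death) ~ rcs(age, age_knots) + sex, data = sim)

# Overall and nonlinear tests for age, as in the four-knot example.
anova(fit5)
```

With five knots, the spline spends four degrees of freedom on age (one linear term plus three nonlinear terms), so the nonlinearity test in anova() has three degrees of freedom.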