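Calculating Relative Risks for a Cohort Study
A very common type of data set in biomedical statistics comes from a cohort study, in which you know which people were exposed to some treatment or environmental factor (for example, people who took a certain drug, or people who smoke) and whether those same people have a particular disease. Your data set would look like this: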
          | Disease | Control |
Exposed   | 156     | 9421    |
Unexposed | 1531    | 14797   |
To enter these data in R, type:
mymatrix <- matrix(c(156,9421,1531,14797),nrow=2,byrow=TRUE)
colnames(mymatrix) <- c("Disease","Control")
rownames(mymatrix) <- c("Exposed","Unexposed")
print(mymatrix)
#           Disease Control
# Exposed       156    9421
# Unexposed    1531   14797
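Before computing any ratios, it can help to look at the row totals and the proportion of each row that has the disease. The following is a minimal sketch using base R only; the names `counts` and `risks` are chosen here for illustration and do not appear in the original functions.

```r
# Rebuild the 2x2 table so this snippet runs on its own
counts <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Exposed", "Unexposed"),
                                 c("Disease", "Control")))

# Row totals: how many people are in each exposure group
cbind(counts, Total = rowSums(counts))

# Row-wise proportions: the "Disease" column is the risk of disease
# within each exposure group (each row sums to 1)
risks <- prop.table(counts, margin = 1)
round(risks, 4)
```

The `Disease` column of `risks` holds the two probabilities whose ratio is the relative risk.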
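The relative risk of disease given exposure is the probability of having the disease among people who were exposed to the treatment or environmental factor, divided by the probability of having the disease among people who were not exposed. You can calculate it in R with the function calcRelativeRisk() below. To use the function, copy the following code and paste it into R: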
calcRelativeRisk <- function(mymatrix,alpha=0.05,referencerow=2)
{
   numrow <- nrow(mymatrix)
   myrownames <- rownames(mymatrix)
   for (i in 1:numrow)
   {
      rowname <- myrownames[i]
      DiseaseUnexposed <- mymatrix[referencerow,1]
      ControlUnexposed <- mymatrix[referencerow,2]
      if (i != referencerow)
      {
         DiseaseExposed <- mymatrix[i,1]
         ControlExposed <- mymatrix[i,2]
         totExposed <- DiseaseExposed + ControlExposed
         totUnexposed <- DiseaseUnexposed + ControlUnexposed
         probDiseaseGivenExposed <- DiseaseExposed/totExposed
         probDiseaseGivenUnexposed <- DiseaseUnexposed/totUnexposed
         # calculate the relative risk
         relativeRisk <- probDiseaseGivenExposed/probDiseaseGivenUnexposed
         print(paste("category =", rowname, ", relative risk = ",relativeRisk))
         # calculate a confidence interval
         confidenceLevel <- (1 - alpha)*100
         sigma <- sqrt((1/DiseaseExposed) - (1/totExposed) +
                       (1/DiseaseUnexposed) - (1/totUnexposed))
         # sigma is the standard error of estimate of log of relative risk
         z <- qnorm(1-(alpha/2))
         lowervalue <- relativeRisk * exp(-z * sigma)
         uppervalue <- relativeRisk * exp( z * sigma)
         print(paste("category =", rowname, ", ", confidenceLevel,
            "% confidence interval = [",lowervalue,",",uppervalue,"]"))
      }
   }
}
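You can now use calcRelativeRisk() to calculate the relative risk of disease given exposure, together with a confidence interval for that relative risk. For example, to get a 99% confidence interval, type: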
calcRelativeRisk(mymatrix,alpha=0.01)
#[1] "category = Exposed , relative risk = 0.173721236521721"
#[1] "category = Exposed , 99 % confidence interval = [ 0.140263410926649 , 0.215159946697844 ]"
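This tells you that the estimated relative risk is about 0.174, with a 99% confidence interval of [0.140, 0.215]. A relative risk of 0.174 means that the risk of disease in the exposed group (exposed to whatever treatment or environmental factor is being studied) is 0.174 times the risk in the unexposed group. If the confidence interval includes 1, there is no evidence of an association between exposure and disease. Otherwise, a relative risk greater than 1 is evidence of a positive association, and a relative risk less than 1 is evidence of a negative association. Here the whole interval lies below 1, so the data suggest a negative association. A relative risk can be estimated for a cohort study, but not for a case-control study. Note that calcRelativeRisk() can also be used when there is more than one exposure category (for example, smoking cigarettes, compared with smoking cigars, compared with not smoking). In that case it is used in the same way as the calcOddsRatio() function (see below).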
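As a check on the numbers above, the same relative risk and interval can be worked out step by step without the function. This is only a sketch: the variable names are illustrative, and the interval is built on the log scale in the same way the function does.

```r
# Counts from the 2x2 table
disease_exposed   <- 156
control_exposed   <- 9421
disease_unexposed <- 1531
control_unexposed <- 14797

n_exposed   <- disease_exposed + control_exposed
n_unexposed <- disease_unexposed + control_unexposed

# Risk of disease in each group, and their ratio
risk_exposed   <- disease_exposed / n_exposed
risk_unexposed <- disease_unexposed / n_unexposed
rr <- risk_exposed / risk_unexposed

# Standard error of log(relative risk)
se_log_rr <- sqrt(1 / disease_exposed - 1 / n_exposed +
                  1 / disease_unexposed - 1 / n_unexposed)

# 99% confidence interval: symmetric on the log scale, then exponentiated
alpha <- 0.01
z <- qnorm(1 - alpha / 2)
ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

round(c(rr = rr, lower = ci[1], upper = ci[2]), 3)
```

Rounded to three decimals, this should reproduce the estimate of 0.174 and the interval [0.140, 0.215] reported above.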
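Calculating Odds Ratios for a Cohort or Case-Control Study
As well as the relative risk of disease given exposure (to some treatment or environmental factor, such as smoking or a drug), you can calculate the odds ratio for the association between exposure and disease in a cohort study. The odds ratio is also commonly calculated in case-control studies. The odds ratio for the association between exposure and disease is the ratio of: (i) the probability of having the disease divided by the probability of not having it, among people who were exposed to the treatment or environmental factor, to (ii) the probability of having the disease divided by the probability of not having it, among people who were not exposed. Again, for either a cohort study or a case-control study, suppose your data set is the same as above: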
          | Disease | Control |
Exposed   | 156     | 9421    |
Unexposed | 1531    | 14797   |
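You can use the following R function, calcOddsRatio(), to calculate the odds ratio for the association between exposure and disease. You need to copy and paste the function into R before you can use it: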
calcOddsRatio <- function(mymatrix,alpha=0.05,referencerow=2,quiet=FALSE)
{
   numrow <- nrow(mymatrix)
   myrownames <- rownames(mymatrix)
   for (i in 1:numrow)
   {
      rowname <- myrownames[i]
      DiseaseUnexposed <- mymatrix[referencerow,1]
      ControlUnexposed <- mymatrix[referencerow,2]
      if (i != referencerow)
      {
         DiseaseExposed <- mymatrix[i,1]
         ControlExposed <- mymatrix[i,2]
         totExposed <- DiseaseExposed + ControlExposed
         totUnexposed <- DiseaseUnexposed + ControlUnexposed
         probDiseaseGivenExposed <- DiseaseExposed/totExposed
         probDiseaseGivenUnexposed <- DiseaseUnexposed/totUnexposed
         probControlGivenExposed <- ControlExposed/totExposed
         probControlGivenUnexposed <- ControlUnexposed/totUnexposed
         # calculate the odds ratio
         oddsRatio <- (probDiseaseGivenExposed*probControlGivenUnexposed)/
                      (probControlGivenExposed*probDiseaseGivenUnexposed)
         if (quiet == FALSE)
         {
            print(paste("category =", rowname, ", odds ratio = ",oddsRatio))
         }
         # calculate a confidence interval
         confidenceLevel <- (1 - alpha)*100
         sigma <- sqrt((1/DiseaseExposed)+(1/ControlExposed)+
                       (1/DiseaseUnexposed)+(1/ControlUnexposed))
         # sigma is the standard error of our estimate of the log of the odds ratio
         z <- qnorm(1-(alpha/2))
         lowervalue <- oddsRatio * exp(-z * sigma)
         uppervalue <- oddsRatio * exp( z * sigma)
         if (quiet == FALSE)
         {
            print(paste("category =", rowname, ", ", confidenceLevel,
               "% confidence interval = [",lowervalue,",",uppervalue,"]"))
         }
      }
   }
   if (quiet == TRUE && numrow == 2) # If there are just two treatments (exposed/nonexposed)
   {
      return(oddsRatio)
   }
}
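You can then use the function to calculate the odds ratio for the association between exposure and disease, together with a confidence interval for the odds ratio. For example, to calculate the odds ratio and its 95% confidence interval: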
calcOddsRatio(mymatrix,alpha=0.05)
#[1] "category = Exposed , odds ratio = 0.160039091621751"
#[1] "category = Exposed , 95 % confidence interval = [ 0.135460641900536 , 0.189077140693912 ]"
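This tells us that our estimate of the odds ratio is about 0.160, with a 95% confidence interval of [0.135, 0.189]. If the confidence interval includes 1, there is no evidence of an association between exposure and disease. Otherwise, an odds ratio greater than 1 is evidence of a positive association, and an odds ratio less than 1 is evidence of a negative association. An odds ratio can be estimated for either a cohort study or a case-control study.
We may also have several different exposures (for example, smoking cigarettes, compared with smoking cigars, compared with not smoking). In this case, our data would look like this: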
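One way to see why the odds ratio, unlike the relative risk, can still be estimated from a case-control study is to mimic case-control sampling on the table used above. The sketch below keeps every diseased person but only 10% of the controls in each row; the helper functions `odds_ratio()` and `relative_risk()` are hypothetical, written only for this illustration, and the fractional counts are an idealisation rather than real data.

```r
tab <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
              dimnames = list(c("Exposed", "Unexposed"),
                              c("Disease", "Control")))

# Hypothetical helpers for a 2x2 table with the exposed group in row 1
odds_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])
relative_risk <- function(m) {
  (m[1, 1] / sum(m[1, ])) / (m[2, 1] / sum(m[2, ]))
}

# Idealised case-control sampling: all cases, 10% of controls
sampled <- tab
sampled[, "Control"] <- tab[, "Control"] * 0.1

# The cross-product odds ratio is the same in both tables ...
odds_ratio(tab)
odds_ratio(sampled)

# ... but the apparent relative risk changes with the sampling fraction
relative_risk(tab)
relative_risk(sampled)
```

Because the sampling fraction for controls cancels in the cross-product, the odds ratio stays at about 0.160, whereas the ratio of row proportions depends on how many controls were sampled.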
mymatrix <- matrix(c(30,24,76,241,82,509),nrow=3,byrow=TRUE)
colnames(mymatrix) <- c("Disease","Control")
rownames(mymatrix) <- c("Exposure1","Exposure2","Unexposed")
print(mymatrix)
#           Disease Control
# Exposure1      30      24
# Exposure2      76     241
# Unexposed      82     509
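We can again use calcOddsRatio() to calculate the odds ratio for each exposure category relative to no exposure. We need to tell calcOddsRatio() which row of the data matrix holds the unexposed group (here, row 3), using the "referencerow=" argument: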
calcOddsRatio(mymatrix, referencerow=3)
#[1] "category = Exposure1 , odds ratio = 7.75914634146342"
#[1] "category = Exposure1 , 95 % confidence interval = [ 4.32163714854064 , 13.9309131884372 ]"
#[1] "category = Exposure2 , odds ratio = 1.95749418075094"
#[1] "category = Exposure2 , 95 % confidence interval = [ 1.38263094540732 , 2.77137111707344 ]"
If your data come from a cohort study (but not from a case-control study), you can also calculate the relative risk for each exposure category:
calcRelativeRisk(mymatrix, referencerow=3)
#[1] "category = Exposure1 , relative risk = 4.00406504065041"
#[1] "category = Exposure1 , 95 % confidence interval = [ 2.93130744422409 , 5.46941498113737 ]"
#[1] "category = Exposure2 , relative risk = 1.72793721628068"
#[1] "category = Exposure2 , 95 % confidence interval = [ 1.30507489771431 , 2.2878127750653 ]"
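When there are several exposure categories, it can be convenient to get the point estimates back as a table rather than as printed strings. The following is one possible sketch, not part of the original functions: `compare_to_reference()` is a hypothetical helper that returns the relative risk and odds ratio of every row against the reference row, without confidence intervals.

```r
exposures <- matrix(c(30, 24, 76, 241, 82, 509), nrow = 3, byrow = TRUE,
                    dimnames = list(c("Exposure1", "Exposure2", "Unexposed"),
                                    c("Disease", "Control")))

# Hypothetical helper: column 1 must hold disease counts, column 2 controls
compare_to_reference <- function(m, referencerow = nrow(m)) {
  risk <- m[, 1] / rowSums(m)   # probability of disease in each row
  odds <- m[, 1] / m[, 2]       # odds of disease in each row
  out <- data.frame(relative_risk = risk / risk[referencerow],
                    odds_ratio    = odds / odds[referencerow],
                    row.names     = rownames(m))
  # Drop the reference row, whose ratios are 1 by construction
  out[-referencerow, , drop = FALSE]
}

estimates <- compare_to_reference(exposures, referencerow = 3)
round(estimates, 3)
```

The values should agree with the point estimates printed by calcRelativeRisk() and calcOddsRatio() above; as before, the relative risk column is only meaningful for cohort data.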
References
https://a-little-book-of-r-for-biomedical-statistics.readthedocs.io/en/latest/src/biomedicalstats.html#biomedical-statistics