R语言 1:2 1:n 倾向性评分匹配法PSM 一个病例多个对照 MatchIt包

MatchIt包进行倾向性评分匹配法(Propensity Score Matching, PSM)主要是1:N匹配。一个病例对应2个对照等。

MatchIt包帮助文档MatchIt.pdf

1:1匹配也可以参考逆概率加权笔记IPTW

结果分析采用条件logistic回归

# --------PSM 1:N----------
#加载包
library(MatchIt)
#加载数据
data <- read.csv("MatchIt_data.csv")
names(data)
#多变量因子化
str(data)
data[,4:8]<-lapply(data[,4:8],as.factor)
#

matchlist <- matchit(Group ~ sex+age+drink+hypertension+
                       diabetes+smoke, data=data,
                     method   = "nearest",
                     distance = "glm",      #
                     caliper  = 0.05,       # 卡钳值
                     ratio    = 2,          # 1:N 匹配
                     replace  = F)          # 不替换
summary(matchlist)
# 提取匹配后的病例对照
matchdata<- match.data(matchlist,
                       group = "all",
                       distance = "distance",
                       weights = "weights",
                       subclass = "subclass",
                       data = NULL,
                       include.s.weights = TRUE,
                       drop.unmatched = TRUE)
table(duplicated(matchdata$id))
table(matchdata$Group) # 匹配后
table(data$Group)      # 匹配前

#排序配对子
matchdata$subclass <- sort(matchdata$subclass)
#查看配对子情况,有的病例只有一个对照
a=table(matchdata$subclass);table(a)

match2data <- data.frame()
for (i in seq_along(table(matchdata$subclass))) {
  if(table(matchdata$subclass)[i]==2){
    match2data[i,1]=i}
}
match2data <- tidyr::drop_na(match2data,V1)
names(match2data)[1] <- "subclass"
match2data$subclass <- as.factor(match2data$subclass)
match2data2 <- dplyr::left_join(match2data,matchdata)


#-------制作表格-----------
library(tableone)

vars <- c("sex","age","smoke","drink","hypertension",
            "diabetes" )
catVars <- c("sex","drink","hypertension",
            "diabetes","smoke" )
nonVars <- c("age")
#建立表格
tabMatched <- CreateTableOne(vars = vars, 
                             strata = "Group", 
                             factorVars = catVars,
                             data = matchdata, 
                      addOverall = TRUE,#添加Overall列的分析结果 
                             test = TRUE)
#最终表格
tab5 <- print(tabMatched, 
              showAllLevels = TRUE, #显示所有水平,不折叠
              cramVars = catVars, 
              nonnormal = nonVars, 
              #exact = "M", 
              catDigits=2,        #分类变量保留小数位数
              contDigits=2,       #连续变量保留小数位数
              quote = FALSE,      #不显示引号 
              noSpaces = TRUE,    #是否删除为对齐而添加的空间。
              #如果您希望自己在其他软件中对齐数字,请使用此选项。
              printToggle = FALSE #输出matrix
) 
tab5
write.csv(tab5, "Table_One2.csv", quote=TRUE, row.names=TRUE) 

#------PSM的条件logistic回归----------
library(reportReg);library(survival)
model <- clogit(Group~ hypertension+age+ strata(subclass), 
                data=matchdata)
reportReg(model)
summary(model)

问题:1:2或1:N有的病例只能匹配上一个对照,并不是全部都是一个病例有两个对照与其匹配,测试数据也展现这样的结果。

是否需要根据配对子把1:1的去掉?然后保留1:2的数据???

R语言 1:2 1:n 倾向性评分匹配法PSM 一个病例多个对照 MatchIt包_第1张图片

你可能感兴趣的:(个人笔记,r语言)