I have a table called match:
v1  v2
1   001;02
2   03,004;001
3   003;002,001
I want to split the values in the v2 column of match into separate columns, producing a table match_new:
V1  V2   V3   V4
1   001  02
2   03   004  001
3   003  002  001
Note that the values in the v2 column of the original match table are separated by ";" in some places and by "," in others.
Code 1
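# Build the sample table; V3 and V4 are placeholders for the split-out values.
a <- data.frame(V1 = 1:3, V2 = c('001;02', '03,004;001', '003;002,001'), V3 = NA, V4 = NA, stringsAsFactors = FALSE)
for (i in seq_len(nrow(a))) {
  # Split on "," first, then split each resulting piece on ";".
  l <- strsplit(a[i, 2], ',')
  l <- strsplit(l[[1]], ';')
  # Flatten the pieces into one character vector, keeping their original order.
  s <- unlist(l)
  # Write the pieces into columns 2, 3, ... of row i; unused columns stay NA.
  a[i, 2:(1 + length(s))] <- s
}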
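The loop above can also be written without an explicit loop. The following is only a sketch, assuming R 3.2 or later (for lengths()) and that a single regular-expression character class "[;,]" is an acceptable way to treat both separators alike. The names match_tbl, pieces, width and padded are illustrative and do not come from the original post.

```r
# Sample data from the question (match_tbl is a hypothetical name,
# chosen to avoid masking base::match).
match_tbl <- data.frame(V1 = 1:3,
                        v2 = c("001;02", "03,004;001", "003;002,001"),
                        stringsAsFactors = FALSE)

# Split every value on either ";" or "," in one pass.
pieces <- strsplit(match_tbl$v2, "[;,]")

# Pad each row to the width of the longest one; `length<-` fills with NA.
width <- max(lengths(pieces))
padded <- do.call(rbind, lapply(pieces, function(p) {
  length(p) <- width
  p
}))
colnames(padded) <- paste0("V", seq_len(width) + 1)

# Bind the pieces back onto the key column.
match_new <- data.frame(V1 = match_tbl$V1, padded, stringsAsFactors = FALSE)
print(match_new)
```

Rows with fewer pieces end up with NA in the trailing columns, and the number of output columns follows the data instead of being fixed in advance.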
Code 2
> match<-data.frame(V1=c(1,2,3),v2=c('001;002','003,004;001','003;002,001'))
> match
  V1          v2
1  1     001;002
2  2 003,004;001
3  3 003;002,001
>
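> parts<-function(x) {
+ # Groups 1, 3 and 5 capture up to three digit runs; groups 2 and 4 capture the separators.
+ m <- regexec("([0-9]+)(;|,)*([0-9]*)(;|,)*([0-9]*)", x)
+ # regmatches() puts the full match first, so the digit groups are elements 2, 4 and 6.
+ parts <- unlist(lapply(regmatches(x, m), `[`, c(2L, 4L, 6L)))
+ parts
+ }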
>
> rr<-do.call(rbind,lapply(as.character(match$v2),parts))
> colnames(rr)<-c("V2","V3","V4")
> cbind(V1=match$V1,rr)
     V1  V2    V3    V4
[1,] "1" "001" "002" ""
[2,] "2" "003" "004" "001"
[3,] "3" "003" "002" "001"
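The pattern in Code 2 is written for at most three fields. If the number of fields can vary, one possible alternative is to extract every run of digits instead of matching a fixed number of groups. The sketch below assumes the fields consist only of digits, as in the sample data, and pads short rows with "" to mirror the output above. The names v2, tokens, width and rr2 are illustrative.

```r
# Same sample values as in Code 2.
v2 <- c("001;002", "003,004;001", "003;002,001")

# gregexpr() finds every run of digits; regmatches() extracts them,
# giving one character vector per input string.
tokens <- regmatches(v2, gregexpr("[0-9]+", v2))

# Pad each row with "" up to the longest row, then stack into a matrix.
width <- max(lengths(tokens))
rr2 <- do.call(rbind, lapply(tokens, function(p) {
  c(p, rep("", width - length(p)))
}))
colnames(rr2) <- paste0("V", seq_len(width) + 1)
print(rr2)
```

Because the separators are never matched explicitly, this works for any separator that is not a digit, which may or may not be desirable depending on how strict the input check needs to be.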