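Suitable transaction data (transactions) is a prerequisite for association rule analysis. The five approaches below show how to create it with the arules package in R.
1. From a list:
(1) Build the list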
> a_list <- list(
c("a","b","c"),
c("a","b"),
c("a","b","d"),
c("c","e"),
c("a","b","d","e")
)
> a_list
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "a" "b"
[[3]]
[1] "a" "b" "d"
[[4]]
[1] "c" "e"
[[5]]
[1] "a" "b" "d" "e"
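(2) Name the transactions
> names(a_list) <- paste("Tr",c(1:5), sep = "")
(3) Convert
> library(arules) # provides the "transactions" class
> trans <- as(a_list, "transactions")
(4) Check that the conversion succeeded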
> trans
> summary(trans)
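Beyond printing the object, the conversion can be checked by comparing it with the original list. The following is a minimal sketch, assuming the arules package is installed; the names sizes, labels and back are introduced here only for illustration.

```r
library(arules)  # assumed to be installed; provides the "transactions" class

a_list <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "b", "d"),
  c("c", "e"),
  c("a", "b", "d", "e")
)
names(a_list) <- paste("Tr", 1:5, sep = "")

trans <- as(a_list, "transactions")

# Number of items in each transaction; should equal the list element lengths
sizes <- size(trans)

# Distinct item labels found across all transactions
labels <- itemLabels(trans)

# Round trip back to a plain list
back <- as(trans, "list")
```

If the sizes match lengths(a_list) and the labels are the five letters a to e, the list was read as intended.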
2. From a matrix:
(1) Build the matrix
> a_matrix <- matrix(
c(1,1,1,0,0,
1,1,0,0,0,
1,1,0,1,0,
0,0,1,0,1,
1,1,0,1,1), ncol = 5)
(2) Name the rows (items) and columns (transactions)
> dimnames(a_matrix) <- list(
c("a","b","c","d","e"),
paste("Tr",c(1:5), sep = ""))
(3) Inspect the matrix and convert it
> a_matrix
  Tr1 Tr2 Tr3 Tr4 Tr5
a   1   1   1   0   1
b   1   1   1   0   1
c   1   0   0   1   0
d   0   0   1   0   1
e   0   0   0   1   1
> trans2 <- as(t(a_matrix), "transactions") # arules reads rows as transactions, so transpose first
(4) Check that the conversion succeeded
> inspect(trans2)
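When a matrix is coerced, arules reads one row per transaction and one column per item. As a sketch under that assumption, the same five transactions can be laid out directly in that orientation with byrow = TRUE; the names m and trans_m are hypothetical.

```r
library(arules)  # assumed to be installed

# Each line of values below is one transaction; columns are the items a to e
m <- matrix(c(
  1, 1, 1, 0, 0,
  1, 1, 0, 0, 0,
  1, 1, 0, 1, 0,
  0, 0, 1, 0, 1,
  1, 1, 0, 1, 1
), ncol = 5, byrow = TRUE,
dimnames = list(paste("Tr", 1:5, sep = ""), c("a", "b", "c", "d", "e")))

trans_m <- as(m, "transactions")

# Item labels come from the column names
itemLabels(trans_m)

# Items per transaction come from the row sums
size(trans_m)
```

Checking itemLabels() after the conversion is a quick way to notice a matrix that was passed in the wrong orientation, because the labels would then be transaction names.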
3. From a data frame:
(1) Build a data frame whose columns are factors
> a_df <- data.frame(
age = as.factor(c(6,8,7,6,9,5)),
grade = as.factor(c(1,3,1,1,4,1)))
(2) Convert
> trans3 <- as(a_df, "transactions")
(3) Check that the conversion succeeded
> inspect(trans3)
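With a data frame, every combination of a column and one of its factor levels becomes an item. The sketch below, assuming arules is installed, makes this visible by listing the generated item labels.

```r
library(arules)  # assumed to be installed

a_df <- data.frame(
  age   = as.factor(c(6, 8, 7, 6, 9, 5)),
  grade = as.factor(c(1, 3, 1, 1, 4, 1))
)
trans3 <- as(a_df, "transactions")

# One item per (column, level) pair, labelled "column=level"
itemLabels(trans3)

# Every row has exactly one value per column, so each transaction holds two items
size(trans3)
```

This is why columns should be factors (or logical): a numeric column has no levels to turn into items.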
4. From a data frame with missing values:
(1) Draw a sample
> a_df2 <- sample(c(LETTERS[1:5], NA),10,TRUE) # may contain missing values (NA)
> a_df2
[1] "C" "C" "D" "A" "A" "E" "A" "D" "E" NA
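(2) Convert to a data frame
> a_df2 <- data.frame(X = a_df2, Y = sample(a_df2), stringsAsFactors = TRUE) # factor columns; R >= 4.0 no longer converts strings by default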
> a_df2
      X    Y
1     C    E
2     C    A
3     D    A
4     A    C
5     A    C
6     E    E
7     A    D
8     D    A
9     E <NA>
10 <NA>    D
(3) Convert
> trans4 <- as(a_df2, "transactions")
(4) Convert back to a data frame
> as(trans4, "data.frame")
(5) Check that the conversion succeeded
> inspect(trans4)
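A missing value simply produces no item for that column. The following sketch, assuming arules is installed, uses a small hand-written data frame (df_na, a hypothetical name) rather than a random sample so that the effect is reproducible.

```r
library(arules)  # assumed to be installed

df_na <- data.frame(
  X = factor(c("A", "B", NA)),
  Y = factor(c("B", NA, "A"))
)
trans_na <- as(df_na, "transactions")

# Rows containing an NA contribute one item instead of two
size(trans_na)

# No item is created for the missing values themselves
itemLabels(trans_na)
```

Transactions built this way can therefore have different lengths even though every row of the data frame has the same number of columns.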
5. From a data frame with a transaction ID column and an item column:
(1) Source data
> a_df3 <- data.frame(TID = c(1,1,2,2,2,3), item=c("a","b","a","b","c", "b"))
> a_df3
  TID item
1   1    a
2   1    b
3   2    a
4   2    b
5   2    c
6   3    b
(2) Split by transaction ID and convert
> trans5 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")
> inspect(trans5)
> LIST(trans5)
$`1`
[1] "a" "b"
$`2`
[1] "a" "b" "c"
$`3`
[1] "b"
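The step that does the real work here is split(), a base R function that groups the item column by transaction ID and returns a named list, which is then converted exactly as in approach 1. A small sketch of that intermediate result, using only base R (the name baskets is introduced for illustration):

```r
a_df3 <- data.frame(
  TID  = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b")
)

# Group the items by transaction ID; the IDs become the list names
baskets <- split(a_df3[, "item"], a_df3[, "TID"])

# Names of the list elements, one per transaction ID
names(baskets)

# Items recorded for transaction 2
as.character(baskets[["2"]])
```

Because the list names become the transaction IDs, the ID column does not need to be numeric or sorted.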