





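# Package installation only

# Set the CRAN mirror with the following command
options(repos = 'http://mirrors.ustc.edu.cn/CRAN/')
# Check whether the mirror setting took effect
getOption("repos")
# Try installing an R package
install.packages("forecast")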


Installing package into 'C:/Users/天涯过客/Documents/R/win-library/4.0'
(as 'lib' is unspecified)

also installing the dependencies 'fracdiff', 'urca'

package 'fracdiff' successfully unpacked and MD5 sums checked
package 'urca' successfully unpacked and MD5 sums checked
package 'forecast' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
A tibble: 3 × 98
id date 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 ... 21:30-21:45 21:45-22:00 22:00-22:15 22:15-22:30 22:30-22:45 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00
客户4 2020-10-01 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619 ... 0.581 0.581 0.581 0.581 0.581 0.581 0.619 0.619 0.619 0.619
客户5 2020-10-01 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 ... 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210
客户8 2020-10-01 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000


### The tables have matching dimensions, so they are merged
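The merge call itself does not appear in this export. A minimal sketch in base R, assuming two data frames with identical columns; `part_a` and `part_b` are hypothetical toy stand-ins for the customer tables, not the original objects:

```r
# Toy stand-ins for two tables that share the same columns (hypothetical data)
part_a <- data.frame(id = c("A1", "A2"), date = "2020-10-01",
                     `00:00-00:15` = c(0.6, 3.2), check.names = FALSE)
part_b <- data.frame(id = c("B1", "B2"), date = "2020-10-01",
                     `00:00-00:15` = c(0.6, 22.3), check.names = FALSE)

# Confirm the column names line up before stacking the rows
stopifnot(identical(names(part_a), names(part_b)))

merged <- rbind(part_a, part_b)
dim(merged)
```

Checking the names first matters because `rbind()` on data frames matches columns by name and stops with an error when they differ.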

A tibble: 6 × 98
id date 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 ... 21:30-21:45 21:45-22:00 22:00-22:15 22:15-22:30 22:30-22:45 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00
客户4 2020-10-01 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619 ... 0.581 0.581 0.581 0.581 0.581 0.581 0.619 0.619 0.619 0.619
客户5 2020-10-01 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 ... 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210
客户8 2020-10-01 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
客户7 2020-10-01 0.625 0.625 0.625 0.625 0.625 0.625 0.625 0.625 ... 0.635 0.635 0.635 0.635 0.635 0.635 0.625 0.625 0.625 0.625
客户89 2020-10-01 22.278 22.278 22.278 22.278 22.278 22.278 22.278 22.278 ... 14.470 14.470 14.470 14.470 14.470 14.470 22.278 22.278 22.278 22.278
客户160 2020-10-01 65.200 48.480 65.840 62.560 49.120 64.560 62.560 65.840 ... 84.560 85.920 85.920 83.920 84.560 67.200 98.000 83.920 84.560 98.640
tibble [3,900 x 98] (S3: tbl_df/tbl/data.frame)
 $ id         : chr [1:3900] "客户4" "客户5" "客户8" "客户7" ...
 $ date       : chr [1:3900] "2020-10-01" "2020-10-01" "2020-10-01" "2020-10-01" ...
 $ 00:00-00:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:15-00:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:30-00:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:45-01:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:00-01:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:15-01:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:30-01:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:45-02:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:00-02:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:15-02:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:30-02:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:45-03:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:00-03:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:15-03:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:30-03:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:45-04:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:00-04:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:15-04:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:30-04:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:45-05:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:00-05:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:15-05:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:30-05:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:45-06:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:00-06:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:15-06:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:30-06:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:45-07:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 07:00-07:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:15-07:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:30-07:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:45-08:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:00-08:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:15-08:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:30-08:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:45-09:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:00-09:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:15-09:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:30-09:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:45-10:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:00-10:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:15-10:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:30-10:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:45-11:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 11:00-11:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:15-11:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:30-11:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:45-12:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:00-12:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:15-12:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:30-12:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:45-13:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:00-13:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:15-13:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:30-13:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:45-14:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:00-14:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:15-14:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:30-14:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:45-15:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:00-15:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:15-15:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:30-15:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:45-16:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:00-16:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:15-16:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:30-16:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:45-17:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:00-17:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:15-17:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:30-17:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:45-18:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:00-18:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:15-18:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:30-18:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:45-19:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 19:00-19:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:15-19:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:30-19:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:45-20:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:00-20:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:15-20:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:30-20:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:45-21:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:00-21:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:15-21:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:30-21:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:45-22:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:00-22:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:15-22:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:30-22:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:45-23:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 23:00-23:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:15-23:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:30-23:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:45-00:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
## Visually check whether the data contains missing values
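The plotting code for this check is not part of the export. As a non-graphical alternative, a minimal sketch that counts missing values per column; the `toy` data frame and its column names are assumptions for illustration:

```r
# Toy data with one missing reading (hypothetical values)
toy <- data.frame(id = c("A1", "A2", "A3"),
                  slot1 = c(0.6, NA, 0.0),
                  slot2 = c(0.6, 3.2, 0.0))

# Number of missing values per column
na_per_col <- colSums(is.na(toy))
na_per_col

# Rows that contain at least one missing value
toy[!complete.cases(toy), ]
```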



# Compute the row totals
data <- data %>% mutate(rowsum = rowSums(.[3:98]))  # add a column holding the sum of the 96 time-slot columns


date_rowsum<-aggregate(x =data$rowsum, by= list(data$date), FUN =sum)
A data.frame: 20 × 2
date sum
2020-10-01 93125.13
2020-10-02 89564.60
2020-10-03 89715.30
2020-10-04 91119.79
2020-10-05 95884.30
2020-10-06 95558.53
2020-10-07 97199.34
2020-10-08 97358.87
2020-10-09 91890.29
2020-10-10 94850.10
2020-10-11 95424.02
2020-10-12 100501.56
2020-10-13 100307.05
2020-10-14 101899.14
2020-10-15 105264.88
2020-10-16 105435.29
2020-10-17 100331.14
2020-10-18 103870.77
2020-10-19 99513.48
2020-10-20 95456.16
plot(date_rowsum,type = "o", col = "red", xlab = "date", ylab = "sum",
   main = "date_sum")
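`aggregate()` called with `x =` and `by =` names its result columns `Group.1` and `x`, while the printed table shows `date` and `sum`, so a renaming step presumably ran in between. A minimal sketch on toy data (hypothetical values; `daily_total` is an illustrative name) that produces those column names directly:

```r
# Toy rows: two customers over two days (hypothetical values)
toy <- data.frame(id     = c("A1", "A2", "A1", "A2"),
                  date   = c("2020-10-01", "2020-10-01", "2020-10-02", "2020-10-02"),
                  rowsum = c(10, 20, 30, 40))

# The formula interface keeps the grouping column name; rename the value column
daily_total <- setNames(aggregate(rowsum ~ date, data = toy, FUN = sum),
                        c("date", "sum"))
daily_total
```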



id_rowsum<-aggregate(x =data$rowsum, by= list(data$id), FUN =sum)
A data.frame: 6 × 2
id sum
1 客户10 880.640
2 客户100 1343.552
3 客户101 6545.280
4 客户102 5985.280
5 客户103 1501.560
6 客户104 94735.159
      id                 sum          
 Length:176         Min.   :     0.0  
 Class :character   1st Qu.:   325.4  
 Mode  :character   Median :   892.7  
                    Mean   : 11047.0  
                    3rd Qu.:  3533.8  
                    Max.   :164139.1  


hist(id_rowsum$sum, col = rgb(1,0,0,0.2))



date_id_rowsum<-aggregate(x =data$rowsum, by= list(data$date,data$id), FUN =sum)
A data.frame: 6 × 3
date id sum
1 2020-10-01 客户10 44.032
2 2020-10-02 客户10 44.032
3 2020-10-03 客户10 44.032
4 2020-10-04 客户10 44.032
5 2020-10-05 客户10 44.032
6 2020-10-06 客户10 44.032
hourdata<-data%>%select(-id)%>% summarise(across(contains(":"),sum,na.rm=TRUE))

A data.frame: 6 × 2
sum time
00:00-00:15 22090.48 00:00-00:15
00:15-00:30 22195.23 00:15-00:30
00:30-00:45 22047.53 00:30-00:45
00:45-01:00 22128.37 00:45-01:00
01:00-01:15 22152.18 01:00-01:15
01:15-01:30 22407.36 01:15-01:30




hour_meanid<-cdata%>%select(-date)%>% group_by(id)%>%summarise(across(everything(),mean,na.rm=TRUE))
A tibble: 6 × 98
id 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 02:00-02:15 ... 21:45-22:00 22:00-22:15 22:15-22:30 22:30-22:45 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00 rowsum
客户10 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 ... 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 44.0320
客户100 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 ... 0.63815 0.63815 0.63815 0.63815 0.63815 0.28375 0.28375 0.28375 0.28375 67.1776
客户101 10.69600 10.36000 9.40800 10.41600 10.30400 10.64000 10.36000 10.36000 9.96800 ... 0.00000 0.00000 0.00000 0.00000 0.00000 10.30400 10.92000 10.02400 10.80800 327.2640
客户102 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 ... 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 299.2640
客户103 0.83400 0.76200 0.70200 0.80400 0.75600 0.80700 0.73500 0.82500 0.72300 ... 0.84600 0.78000 0.82200 0.81000 0.82800 0.76200 0.85200 0.77400 0.81000 75.0780
客户104 23.31025 24.53075 21.38475 25.41075 22.83675 24.02475 22.11075 23.66960 23.85660 ... 25.58030 24.26030 24.70030 24.66730 24.66730 24.73730 20.77730 24.73730 22.09730 2368.3790


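## Hierarchical clustering and visualization
## Drop the id column (1) and rowsum (98) so dist() only sees the numeric time-slot columns;
## the original call dist(hour_meanid[, -98]) kept the character id column and raised a warning
hc1 <- hclust(dist(hour_meanid[, c(-1, -98)]), method = "ward.D2")
## Visualize the result
par(family = "STKaiti", cex = 0.45)
plot(hc1, hang = -1)
rect.hclust(hc1, k = 3, border = "red")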
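`rect.hclust()` only draws the groups on the dendrogram. To obtain each customer's group as data, `cutree()` can be applied to the same tree. A minimal sketch on a toy matrix; the `profiles` data and `k = 2` are assumptions for illustration:

```r
set.seed(1)
# Toy profile matrix: 3 low-usage and 3 high-usage customers, 4 time slots (hypothetical)
profiles <- rbind(matrix(rnorm(12, mean = 1,  sd = 0.1), nrow = 3),
                  matrix(rnorm(12, mean = 50, sd = 1),   nrow = 3))
rownames(profiles) <- paste0("cust", 1:6)

hc <- hclust(dist(profiles), method = "ward.D2")

# cutree() returns the group of each row for a chosen number of clusters
groups <- cutree(hc, k = 2)
table(groups)
```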


ggdendrogram(hc1, segments = TRUE, rotate = FALSE, theme_dendro = FALSE, size = 4) +
  theme_bw() + theme(axis.text.x = element_text(size = 5, angle = 90))




## Compute the within-cluster and between-cluster sums of squares
tot_withinss <- vector()
betweenss <- vector()
for(ii in 1:15){
  k1 <- kmeans(hour_meanid[,c(-1,-98)],ii)
  tot_withinss[ii] <- k1$tot.withinss
  betweenss[ii] <- k1$betweenss
}

kmeanvalue <- data.frame(kk = 1:15,
                         tot_withinss = tot_withinss,
                         betweenss = betweenss)

p1 <- ggplot(kmeanvalue,aes(x = kk,y = tot_withinss))+
  geom_point() + geom_line() +labs(y = "value") +
  ggtitle("Total within-cluster sum of squares")+
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("Number of k-means clusters",kmeanvalue$kk)

p2 <- ggplot(kmeanvalue,aes(x = kk,y = betweenss))+
  geom_point() +geom_line() +labs(y = "value") +
  ggtitle("The between-cluster sum of squares") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("Number of k-means clusters",kmeanvalue$kk)
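Because `kmeans()` starts from random centers, the two curves can change from run to run. A minimal sketch, assuming a fixed seed and several restarts per `k` are acceptable; the `toy` matrix and the range `1:6` are illustrative, not the notebook's data:

```r
set.seed(42)
# Toy data: three well-separated groups in two dimensions (hypothetical)
toy <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2),
             matrix(rnorm(40, mean = 20), ncol = 2))

ks <- 1:6
# nstart > 1 reruns k-means from several random starts and keeps the best fit
fits <- lapply(ks, function(k) kmeans(toy, centers = k, nstart = 10))
elbow <- data.frame(k            = ks,
                    tot_withinss = sapply(fits, function(f) f$tot.withinss),
                    betweenss    = sapply(fits, function(f) f$betweenss))
elbow
```

For every `k` the two columns add up to the total sum of squares, so the between-cluster curve is the mirror image of the within-cluster one.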




k3 <- kmeans(hour_meanid[,c(-1,-98)],4)
             Length Class  Mode   
cluster      176    -none- numeric
centers      384    -none- numeric
totss          1    -none- numeric
withinss       4    -none- numeric
tot.withinss   1    -none- numeric
betweenss      1    -none- numeric
size           4    -none- numeric
iter           1    -none- numeric
ifault         1    -none- numeric
K-means clustering with 4 clusters of sizes 6, 14, 12, 144

Cluster means:
  00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30
1  62.3225417  61.5598583  63.6116083  60.9971833  63.6551667  63.0593333
2  11.0446536  11.1182536  10.5954179  10.9049750  10.8082393  10.9635143
3  27.1211361  27.7766042  27.0752431  27.9496597  27.0779917  28.0800056
4   0.6796588   0.6750685   0.6726699   0.6789852   0.6641741   0.6706109
  01:30-01:45 01:45-02:00 02:00-02:15 02:15-02:30 02:30-02:45 02:45-03:00
1  62.3378667  61.9536917  62.5135250  63.8845000  61.6807417  62.3915167
2  10.8759214  10.8956643  10.7552357  10.9384250  10.8762357  10.7623750
3  27.1717583  27.4775153  27.5150958  27.8758236  27.9836847  27.5832681
4   0.6772897   0.6643105   0.6722411   0.6603515   0.6549296   0.6562005
  03:00-03:15 03:15-03:30 03:30-03:45 03:45-04:00 04:00-04:15 04:15-04:30
1  60.9340500  62.9090583  61.6450583  60.8104000    61.76591  62.3655333
2  10.8683857  11.0325750  10.7480857  10.8762000    10.91756  10.7136857
3  27.7674792  28.0245264  27.7221507  27.6713938    27.59359  27.7191694
4   0.6507282   0.6617699   0.6492977   0.6531032     0.65239   0.6486241

Clustering vector:
  [1] 4 4 2 4 4 3 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4
 [38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 1 4 2 4 4 4 4 4
 [75] 4 4 4 4 4 4 4 4 4 4 2 4 2 4 3 4 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4
[112] 4 4 4 4 4 4 2 2 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 2 4 4 4 4 4 4 4 3 4 4
[149] 1 3 2 4 3 3 4 4 4 2 2 4 1 4 4 1 3 4 4 4 3 2 3 4 1 4 4 2

Within cluster sum of squares by cluster:
[1] 91916.57 33987.51 60979.15 15200.70
 (between_SS / total_SS =  92.7 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
A tibble: 6 × 99
id 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 02:00-02:15 ... 22:00-22:15 22:15-22:30 22:30-22:45 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00 rowsum cluster
客户10 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 ... 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 44.0320 4
客户100 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 ... 0.63815 0.63815 0.63815 0.63815 0.28375 0.28375 0.28375 0.28375 67.1776 4
客户101 10.69600 10.36000 9.40800 10.41600 10.30400 10.64000 10.36000 10.36000 9.96800 ... 0.00000 0.00000 0.00000 0.00000 10.30400 10.92000 10.02400 10.80800 327.2640 2
客户102 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 ... 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 299.2640 4
客户103 0.83400 0.76200 0.70200 0.80400 0.75600 0.80700 0.73500 0.82500 0.72300 ... 0.78000 0.82200 0.81000 0.82800 0.76200 0.85200 0.77400 0.81000 75.0780 4
客户104 23.31025 24.53075 21.38475 25.41075 22.83675 24.02475 22.11075 23.66960 23.85660 ... 24.26030 24.70030 24.66730 24.66730 24.73730 20.77730 24.73730 22.09730 2368.3790 3


  1   2   3   4 
  6  14  12 144 
## Visualize the clustering result
clusplot(hour_meanid[,c(-1,-98)],k3$cluster,main = "kmean cluster number=4")


## Plot the silhouette to show how well the clusters are separated
sis1 <- silhouette(k3$cluster,dist(hour_meanid[,c(-1,-98)],method = "euclidean"))

plot(sis1,main = " kmean silhouette",
     col = c("red", "green", "blue","orange"))
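The plot above relies on `silhouette()` from the `cluster` package. To show what a silhouette width measures, here is a minimal base-R sketch that computes it by hand; `silhouette_widths` is a hypothetical helper written for this illustration, it assumes at least two clusters, and the `toy` data is simulated:

```r
# Silhouette width per point, computed by hand (base R only)
silhouette_widths <- function(x, cluster) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  labels <- sort(unique(cluster))
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cluster == cluster[i]
    own[i] <- FALSE                       # exclude the point itself
    if (!any(own)) { s[i] <- 0; next }    # singleton cluster: width defined as 0
    a <- mean(d[i, own])                  # mean distance within own cluster
    others <- setdiff(labels, cluster[i])
    # mean distance to the nearest other cluster
    b <- min(sapply(others, function(l) mean(d[i, cluster == l])))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

set.seed(7)
toy <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
km <- kmeans(toy, centers = 2, nstart = 10)
mean(silhouette_widths(toy, km$cluster))
```

Values near 1 mean a point sits well inside its cluster, values near 0 mean it lies between clusters, and negative values suggest it may be assigned to the wrong one.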




A tibble: 6 × 3
id rowsum cluster
客户10 44.0320 4
客户100 67.1776 4
客户101 327.2640 2
客户102 299.2640 4
客户103 75.0780 4
客户104 2368.3790 3


hour_perdata<-data%>%select(-id)%>% group_by(date)%>%summarise(across(contains(":"),sum,na.rm=TRUE))
A tibble: 6 × 97
date 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 02:00-02:15 ... 21:30-21:45 21:45-22:00 22:00-22:15 22:15-22:30 22:30-22:45 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00
2020-10-01 1072.639 1029.341 1126.394 1006.468 1149.825 1049.246 1081.704 1043.298 1035.710 ... 903.034 925.398 892.032 895.551 916.666 848.736 1061.042 1039.832 1070.716 1108.838
2020-10-02 1053.230 1061.432 1071.509 1066.805 1064.543 1082.807 1077.749 1059.869 1061.881 ... 854.297 890.661 889.655 926.648 934.576 840.278 1080.117 1047.243 1068.415 1046.172
2020-10-03 1066.082 1041.977 1063.421 1035.097 1065.615 1051.490 1083.908 1086.658 1094.954 ... 887.702 882.407 906.146 880.330 927.592 933.959 1020.249 1063.968 1034.539 1070.460
2020-10-04 1020.191 1019.630 1024.692 998.385 1051.136 1027.053 1039.158 1028.785 1043.948 ... 888.178 872.275 900.867 888.164 846.700 944.616 1009.187 990.422 1063.672 1011.367
2020-10-05 1045.160 1042.054 1027.871 1040.656 1029.198 1024.341 1063.339 1048.033 1053.538 ... 949.303 986.068 1005.798 914.553 975.165 955.135 998.024 1213.050 1093.703 1096.247
2020-10-06 1135.002 1150.890 1084.839 1116.269 1113.480 1106.495 1085.247 1134.322 1125.822 ... 889.987 898.923 866.585 919.430 972.184 913.558 1072.352 1091.411 1104.961 1133.166


Warning message:
"package 'forecast' was built under R version 4.0.4"
Registered S3 methods overwritten by 'forecast':
  method                 from     
  autoplot.Arima         ggfortify
  autoplot.acf           ggfortify
  autoplot.ar            ggfortify
  autoplot.bats          ggfortify
  autoplot.decomposed.ts ggfortify
  autoplot.ets           ggfortify
  autoplot.forecast      ggfortify
  autoplot.stl           ggfortify
  autoplot.ts            ggfortify
  fitted.ar              ggfortify
  fortify.ts             ggfortify
  residuals.ar           ggfortify


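# Indicator columns: week01 flags Sundays ("星期日"), week02 flags Saturdays ("星期六"),
# fes flags the first eight days of the month
hour_perdata$week01 <- ifelse(hour_perdata$week == "星期日", 1, 0)
hour_perdata$week02 <- ifelse(hour_perdata$week == "星期六", 1, 0)
hour_perdata$fes <- ifelse(hour_perdata$day <= 8, 1, 0)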
A tibble: 6 × 102
date 00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30 01:30-01:45 01:45-02:00 02:00-02:15 ... 22:45-23:00 23:00-23:15 23:15-23:30 23:30-23:45 23:45-00:00 day week week01 week02 fes
2020-10-01 1072.639 1029.341 1126.394 1006.468 1149.825 1049.246 1081.704 1043.298 1035.710 ... 848.736 1061.042 1039.832 1070.716 1108.838 1 星期四 0 0 1
2020-10-02 1053.230 1061.432 1071.509 1066.805 1064.543 1082.807 1077.749 1059.869 1061.881 ... 840.278 1080.117 1047.243 1068.415 1046.172 2 星期五 0 0 1
2020-10-03 1066.082 1041.977 1063.421 1035.097 1065.615 1051.490 1083.908 1086.658 1094.954 ... 933.959 1020.249 1063.968 1034.539 1070.460 3 星期六 0 1 1
2020-10-04 1020.191 1019.630 1024.692 998.385 1051.136 1027.053 1039.158 1028.785 1043.948 ... 944.616 1009.187 990.422 1063.672 1011.367 4 星期日 1 0 1
2020-10-05 1045.160 1042.054 1027.871 1040.656 1029.198 1024.341 1063.339 1048.033 1053.538 ... 955.135 998.024 1213.050 1093.703 1096.247 5 星期一 0 0 1
2020-10-06 1135.002 1150.890 1084.839 1116.269 1113.480 1106.495 1085.247 1134.322 1125.822 ... 913.558 1072.352 1091.411 1104.961 1133.166 6 星期二 0 0 1
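Weekday names such as "星期日" depend on the session locale, and the code that created the `day` and `week` columns is not shown in this export. A minimal sketch that derives the same kind of flags from the weekday number instead; the `cal` data frame is a hypothetical stand-in for `hour_perdata`:

```r
# Toy date column in the same "YYYY-MM-DD" text format (hypothetical rows)
cal <- data.frame(date = c("2020-10-01", "2020-10-03", "2020-10-04", "2020-10-09"),
                  stringsAsFactors = FALSE)

d <- as.Date(cal$date)
cal$day <- as.integer(format(d, "%d"))
wday    <- as.POSIXlt(d)$wday           # 0 = Sunday, ..., 6 = Saturday (locale-independent)

cal$week01 <- ifelse(wday == 0, 1, 0)   # Sunday flag
cal$week02 <- ifelse(wday == 6, 1, 0)   # Saturday flag
cal$fes    <- ifelse(cal$day <= 8, 1, 0)
cal
```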


A data.frame: 6 × 21
2020-10-01 2020-10-02 2020-10-03 2020-10-04 2020-10-05 2020-10-06 2020-10-07 2020-10-08 2020-10-09 2020-10-10 ... 2020-10-12 2020-10-13 2020-10-14 2020-10-15 2020-10-16 2020-10-17 2020-10-18 2020-10-19 2020-10-20 hour
00:00-00:15 1072.639 1053.230 1066.082 1020.191 1045.160 1135.002 1137.515 1152.525 1067.732 994.568 ... 1151.424 1110.697 1115.400 1212.665 1165.643 1182.825 1135.229 1150.423 1071.966 00:00-00:15
00:15-00:30 1029.341 1061.432 1041.977 1019.630 1042.054 1150.890 1184.908 1133.851 1065.854 956.716 ... 1199.721 1162.543 1110.875 1141.027 1207.853 1162.515 1105.980 1180.014 1121.971 00:15-00:30
00:30-00:45 1126.394 1071.509 1063.421 1024.692 1027.871 1084.839 1106.957 1172.472 995.550 950.803 ... 1120.132 1151.471 1130.677 1203.013 1201.945 1197.636 1152.169 1163.545 1093.053 00:30-00:45
00:45-01:00 1006.468 1066.805 1035.097 998.385 1040.656 1116.269 1157.146 1102.331 1095.603 1008.611 ... 1159.193 1169.906 1062.571 1165.559 1151.602 1166.322 1193.887 1174.572 1171.423 00:45-01:00
01:00-01:15 1149.825 1064.543 1065.615 1051.136 1029.198 1113.480 1092.436 1127.408 1033.743 961.246 ... 1204.588 1095.289 1152.565 1192.592 1277.848 1084.228 1141.221 1189.006 1062.351 01:00-01:15
01:15-01:30 1049.246 1082.807 1051.490 1027.053 1024.341 1106.495 1175.856 1156.887 1083.512 1037.840 ... 1198.193 1125.723 1120.467 1232.879 1196.588 1169.740 1212.220 1192.188 1125.762 01:15-01:30
timedata <- read.csv("hour_perdatat.csv", encoding = 'UTF-8')
A data.frame: 6 × 2
hour sum
1 00:00-00:15 1072.639
2 00:15-00:30 1029.341
3 00:30-00:45 1126.394
4 00:45-01:00 1006.468
5 01:00-01:15 1149.825
6 01:15-01:30 1049.246
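# Note: start is set to the first observation (about 1072.6), so the time index begins there
# rather than at day 1; this affects only the index labels, not the fitted model
timedata$sum<- ts(timedata$sum,start = timedata$sum[1],frequency = 96)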
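The `start` argument of `ts()` is a time index, not a data value. A minimal sketch, on toy readings, of indexing the series by (day, slot) instead; `readings` and `load_ts` are hypothetical names:

```r
# Two days of 15-minute readings: 2 * 96 toy values (hypothetical)
readings <- seq_len(2 * 96)

# start = c(1, 1) means day 1, slot 1; frequency = 96 slots per day
load_ts <- ts(readings, start = c(1, 1), frequency = 96)

start(load_ts)      # first (day, slot) pair
end(load_ts)        # last (day, slot) pair
cycle(load_ts)[97]  # slot index of the 97th reading, i.e. the first slot of day 2
```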
## Visualize the series


Series: timedata$sum 

          ar1      ar2      ar3      ar4      ma1      ma2
      -0.9814  -0.3626  -0.1371  -0.0666  -0.0126  -0.3784
s.e.   0.2001   0.1060   0.0771   0.0390   0.1994   0.1503

sigma^2 estimated as 3525:  log likelihood=-10028.98
AIC=20071.97   AICc=20072.03   BIC=20110.52
## White-noise test
Box.test(timedata$sum,type ="Ljung-Box")
	Box-Ljung test

data:  timedata$sum
X-squared = 1198.6, df = 1, p-value < 2.2e-16

p-value < 2.2e-16, so the null hypothesis of no autocorrelation is rejected: the series is not white noise.

## Stationarity test (unit-root test)
# adf.test() is provided by the tseries package
adf.test(timedata$sum)
Warning message in adf.test(timedata$sum):
"p-value smaller than printed p-value"

	Augmented Dickey-Fuller Test

data:  timedata$sum
Dickey-Fuller = -8.312, Lag order = 12, p-value = 0.01
alternative hypothesis: stationary

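p-value = 0.01 (per the warning, the true p-value is smaller), so the unit-root null hypothesis is rejected and the series is treated as stationary.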



## Fit an ARIMA(4,1,2)(0,1,0)[96] model to the data and forecast the subsequent values
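In this notation the middle entries mean one ordinary difference (d = 1) and one seasonal difference (D = 1) at a period of 96 slots, i.e. one day. A minimal sketch of what those two differences do, using a simulated series (all values hypothetical):

```r
set.seed(3)
# Toy series: 5 days of 96 slots with a repeating daily shape plus a trend (hypothetical)
n <- 5 * 96
daily_shape <- sin(2 * pi * seq_len(96) / 96)
x <- ts(100 + 0.05 * seq_len(n) + 10 * rep(daily_shape, 5) + rnorm(n),
        frequency = 96)

# D = 1 with period 96: subtract the value from the same slot on the previous day
seasonal_diff <- diff(x, lag = 96)
# d = 1: then take the ordinary first difference
both_diff <- diff(seasonal_diff, lag = 1)

length(x)
length(both_diff)   # the two differences consume 96 + 1 observations
```

The ARMA(4, 2) part of the model is then fitted to the doubly differenced series.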

ARIMA <- arima(timedata$sum, c(4, 1, 2),
              seasonal = list(order = c(0, 1, 0),period = 96))
arima(x = timedata$sum, order = c(4, 1, 2), seasonal = list(order = c(0, 1, 
    0), period = 96))

          ar1      ar2      ar3      ar4      ma1      ma2
      -0.9814  -0.3626  -0.1371  -0.0666  -0.0126  -0.3784
s.e.   0.2001   0.1060   0.0771   0.0390   0.1994   0.1503

sigma^2 estimated as 3513:  log likelihood = -10028.98,  aic = 20071.97

Training set error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE
Training set 0.09120194 57.75377 39.33659 -0.1172248 3.877619 0.9060558
                     ACF1
Training set 0.0008442811
## Ljung-Box test on the model residuals
Box.test(ARIMA$residuals, type = "Ljung-Box")
	Box-Ljung test

data:  ARIMA$residuals
X-squared = 0.0013707, df = 1, p-value = 0.9705

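p-value = 0.9705: the model residuals show no significant lag-1 autocorrelation and can be treated as white noise, which suggests the model has extracted the information in the data.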
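The test above uses the default `lag = 1` (hence `df = 1`), which only checks the first autocorrelation. A minimal sketch of a broader residual check, assuming a simulated ARMA(1,1) series in place of the real data; `lag = 20` is an illustrative choice, and `fitdf` is set to the number of estimated ARMA coefficients:

```r
set.seed(11)
# Simulated ARMA(1,1) series standing in for real data (hypothetical)
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)
fit <- arima(y, order = c(1, 0, 1))

# Test several lags at once; fitdf removes one degree of freedom per ARMA coefficient
n_coef <- 2            # one AR + one MA term in this toy model
lb <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = n_coef)
lb$parameter           # degrees of freedom = lag - fitdf
lb$p.value
```

For the ARIMA(4,1,2) model in this notebook the corresponding value would be `fitdf = 6`, with `lag` chosen larger than that.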

# Visualize the gap between the model's predicted values and the actual values
par(family = "STKaiti")
points(timedata$sum,col = "red")
lines(timedata$sum,col = "red")


A data.frame: 6 × 5
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1092.639 1098.481 1022.523 1174.438 982.3139 1214.648
1092.649 1155.922 1079.964 1231.881 1039.7534 1272.091
1092.660 1133.867 1055.742 1211.992 1014.3850 1253.349
1092.670 1206.424 1126.309 1286.539 1083.8980 1328.950
1092.681 1102.090 1020.172 1184.008 976.8067 1227.373
1092.691 1161.522 1077.076 1245.969 1032.3722 1290.673
A data.frame: 96 × 6
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 label
00:00-00:15 1098.4808 1022.5234 1174.438 982.3139 1214.648 00:00-00:15
00:15-00:30 1155.9223 1079.9635 1231.881 1039.7534 1272.091 00:15-00:30
00:30-00:45 1133.8668 1055.7419 1211.992 1014.3850 1253.349 00:30-00:45
00:45-01:00 1206.4239 1126.3085 1286.539 1083.8980 1328.950 00:45-01:00
01:00-01:15 1102.0896 1020.1716 1184.008 976.8067 1227.373 01:00-01:15
01:15-01:30 1161.5225 1077.0756 1245.969 1032.3722 1290.673 01:15-01:30
01:30-01:45 1191.8077 1105.8064 1277.809 1060.2800 1323.335 01:30-01:45
01:45-02:00 1127.9435 1039.7995 1216.088 993.1389 1262.748 01:45-02:00
02:00-02:15 1120.8414 1030.9879 1210.695 983.4223 1258.260 02:00-02:15
02:15-02:30 1163.5182 1071.7734 1255.263 1023.2066 1303.830 02:15-02:30
02:30-02:45 1118.0552 1024.5597 1211.551 975.0661 1261.044 02:30-02:45
02:45-03:00 1135.7189 1040.4717 1230.966 990.0510 1281.387 02:45-03:00
03:00-03:15 1141.9914 1045.0253 1238.957 993.6945 1290.288 03:00-03:15
03:15-03:30 1124.3686 1025.7242 1223.013 973.5051 1275.232 03:15-03:30
03:30-03:45 1165.7177 1065.4104 1266.025 1012.3110 1319.124 03:30-03:45
03:45-04:00 1168.6668 1066.7342 1270.599 1012.7743 1324.559 03:45-04:00
04:00-04:15 1117.2710 1013.7314 1220.811 958.9209 1275.621 04:00-04:15
04:15-04:30 1159.8146 1054.6969 1264.932 999.0509 1320.578 04:15-04:30
04:30-04:45 1108.2668 1001.5921 1214.941 945.1220 1271.412 04:30-04:45
04:45-05:00 1143.0876 1034.8792 1251.296 977.5971 1308.578 04:45-05:00
05:00-05:15 1116.9023 1007.1815 1226.623 949.0988 1284.706 05:00-05:15
05:15-05:30 1167.5618 1056.3490 1278.775 997.4765 1337.647 05:15-05:30
05:30-05:45 1194.9829 1082.2980 1307.668 1022.6462 1367.320 05:30-05:45
05:45-06:00 1125.4082 1011.2701 1239.546 950.8490 1299.967 05:45-06:00
06:00-06:15 1178.4606 1062.8876 1294.034 1001.7070 1355.214 06:00-06:15
06:15-06:30 1131.3154 1014.3251 1248.306 952.3942 1310.237 06:15-06:30
06:30-06:45 1181.8554 1063.4649 1300.246 1000.7927 1362.918 06:30-06:45
06:45-07:00 1130.1775 1010.4029 1249.952 946.9981 1313.357 06:45-07:00
07:00-07:15 1026.6404 905.4977 1147.783 841.3687 1211.912 07:00-07:15
07:15-07:30 962.4825 839.9869 1084.978 775.1417 1149.823 07:15-07:30
... ... ... ... ... ... ...
16:30-16:45 1011.7984 846.8720 1176.725 759.5652 1264.032 16:30-16:45
16:45-17:00 951.7214 785.7988 1117.644 697.9646 1205.478 16:45-17:00
17:00-17:15 947.1024 780.1895 1114.015 691.8311 1202.374 17:00-17:15
17:15-17:30 980.5094 812.6120 1148.407 723.7324 1237.286 17:15-17:30
17:30-17:45 983.0574 814.1813 1151.934 724.7836 1241.331 17:30-17:45
17:45-18:00 979.1334 809.2842 1148.983 719.3714 1238.895 17:45-18:00
18:00-18:15 995.6674 824.8506 1166.484 734.4257 1256.909 18:00-18:15
18:15-18:30 1012.9484 841.1695 1184.727 750.2353 1275.662 18:15-18:30
18:30-18:45 1025.9084 853.1728 1198.644 761.7321 1290.085 18:30-18:45
18:45-19:00 1042.0334 868.3463 1215.721 776.4019 1307.665 18:45-19:00
19:00-19:15 1079.0614 904.4280 1253.695 811.9827 1346.140 19:00-19:15
19:15-19:30 1010.4204 834.8459 1185.995 741.9023 1278.939 19:15-19:30
19:30-19:45 1034.4144 857.9037 1210.925 764.4646 1304.364 19:30-19:45
19:45-20:00 997.5734 820.1315 1175.015 726.1994 1268.948 19:45-20:00
20:00-20:15 1036.5914 858.2231 1214.960 763.8006 1309.382 20:00-20:15
20:15-20:30 1030.1404 850.8505 1209.430 755.9402 1304.341 20:15-20:30
20:30-20:45 1031.2764 851.0697 1211.483 755.6740 1306.879 20:30-20:45
20:45-21:00 1014.4834 833.3644 1195.602 737.4858 1291.481 20:45-21:00
21:00-21:15 1060.6284 878.6018 1242.655 782.2427 1339.014 21:00-21:15
21:15-21:30 986.6754 803.7456 1169.605 706.9085 1266.442 21:15-21:30
21:30-21:45 1031.7954 847.9669 1215.624 750.6540 1312.937 21:30-21:45
21:45-22:00 1006.3614 821.6386 1191.084 723.8522 1288.871 21:45-22:00
22:00-22:15 1007.0594 821.4466 1192.672 723.1890 1290.930 22:00-22:15
22:15-22:30 992.1524 805.6538 1178.651 706.9273 1277.378 22:15-22:30
22:30-22:45 960.7194 773.3392 1148.100 674.1461 1247.293 22:30-22:45
22:45-23:00 1027.5134 839.2557 1215.771 739.5981 1315.429 22:45-23:00
23:00-23:15 1176.6744 987.5433 1365.806 887.4234 1465.926 23:00-23:15
23:15-23:30 1156.0004 965.9999 1346.001 865.4197 1446.581 23:15-23:30
23:30-23:45 1201.3474 1010.4815 1392.213 909.4432 1493.252 23:30-23:45
23:45-00:00 1162.6794 970.9520 1354.407 869.4576 1455.901 23:45-00:00
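The call that produced this table is not part of the export. The first row is consistent with normal-theory intervals around the point forecast: 1098.481 - 1.2816 × sqrt(3513) ≈ 1022.52, which matches the printed Lo 80. A minimal base-R sketch of that construction, assuming a simulated AR(1) series and `predict()` rather than the notebook's model; the `intervals` column names are illustrative:

```r
set.seed(5)
# Simulated AR(1) series standing in for the load data (hypothetical)
y <- arima.sim(model = list(ar = 0.7), n = 300) + 100
fit <- arima(y, order = c(1, 0, 0))

h <- 96
fc <- predict(fit, n.ahead = h)   # base R: returns $pred and $se

# Normal-theory prediction intervals around the point forecast
intervals <- data.frame(
  point = as.numeric(fc$pred),
  lo80  = as.numeric(fc$pred - qnorm(0.90)  * fc$se),
  hi80  = as.numeric(fc$pred + qnorm(0.90)  * fc$se),
  lo95  = as.numeric(fc$pred - qnorm(0.975) * fc$se),
  hi95  = as.numeric(fc$pred + qnorm(0.975) * fc$se)
)
head(intervals)
```

The intervals widen with the horizon because the standard error `fc$se` grows as the forecast moves further from the last observation, the same pattern visible in the table above.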








