Cluster Analysis in R

Principles

1. Hierarchical clustering

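This section mainly uses R's built-in hierarchical clustering function, hclust(), which does not require the number of clusters to be specified in advance.
** (1) pvclust produces a dendrogram, p-values, and highlighted rectangles**
Online example 1
Online example 2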

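Data preprocessing
Iris data: group the flowers by their sepal and petal measurements
The Iris dataset is a classic dataset that is often used as an example in statistical learning and machine learning.

The dataset contains 150 records in 3 classes (setosa, versicolor, virginica), so three clusters is a natural target. Each record has 4 features: sepal length, sepal width, petal length, and petal width.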

head(iris) 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
mydata <- iris[,1:4]
table(is.na(iris))
## 
## FALSE 
##   750
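# mydata <- na.omit(mydata) # remove missing values (not needed here: the table above shows none)
mydata <- scale(mydata) # standardize each column to mean 0 and standard deviation 1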
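As a quick check of what scale() does, the sketch below standardizes the four measurements and confirms that each column ends up with mean 0 and standard deviation 1. The names scaled, col_means, col_sds, and manual are illustrative only and do not come from the original post.

```r
# Standardize the four numeric columns of the built-in iris data
scaled <- scale(iris[, 1:4])

# After scaling, every column should have mean ~0 and standard deviation 1
col_means <- colMeans(scaled)
col_sds   <- apply(scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)

# The same result by hand: subtract the column mean, divide by the column sd
raw    <- as.matrix(iris[, 1:4])
manual <- sweep(sweep(raw, 2, colMeans(raw), "-"), 2, apply(raw, 2, sd), "/")
max_diff <- max(abs(manual - scaled))
print(max_diff)  # should be zero up to floating-point error
```

Standardizing matters here because Euclidean distance would otherwise be dominated by whichever measurement has the largest numeric range.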

Compute the distance matrix
dist() in the “stats” package computes a distance matrix and returns an object of class “dist”.
Note: another way to compute distances is the distance() function in the “philentropy” package, which can compute 46 different distances/similarities; getDistMethods() lists the available methods.
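To make the Euclidean distance concrete, the following sketch recomputes the distance between the first two flowers by hand and compares it with the value stored by dist(). The names x, by_hand, and from_dist are illustrative assumptions.

```r
# Standardized measurements, as in the preprocessing step
x <- scale(iris[, 1:4])
d <- dist(x, method = "euclidean")

# Euclidean distance between flower 1 and flower 2, computed directly
by_hand <- sqrt(sum((x[1, ] - x[2, ])^2))

# The same pair looked up in the full matrix form of the dist object
from_dist <- as.matrix(d)[1, 2]
print(c(by_hand = by_hand, from_dist = from_dist))

# A dist object stores only the lower triangle: n * (n - 1) / 2 values
n_pairs <- length(d)
print(n_pairs)
```

Because a dist object keeps only one value per pair, as.matrix() is needed whenever individual entries are looked up by row and column.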

d <- dist(mydata, method = "euclidean") 
head(as.matrix(d), n=10) # convert the dist object to a full matrix and show its first 10 rows (150 columns each)
## [output omitted: the first 10 rows of the 150 x 150 distance matrix]
  • hclust() supports the following agglomeration (linkage) methods: “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC)
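One way to compare these linkage methods is the cophenetic correlation, that is, the correlation between the original distances and the merge heights implied by the tree. This is a sketch of that idea rather than a step from the original post; the names methods and coph are illustrative.

```r
x <- scale(iris[, 1:4])
d <- dist(x)

# Agglomeration methods accepted by hclust()
methods <- c("ward.D", "ward.D2", "single", "complete",
             "average", "mcquitty", "median", "centroid")

# For each method, correlate the input distances with the cophenetic distances
# (note: "median" and "centroid" are intended for squared Euclidean distances,
# so their values here are only a rough indication)
coph <- sapply(methods, function(m) {
  fit <- hclust(d, method = m)
  cor(d, cophenetic(fit))
})
print(round(sort(coph, decreasing = TRUE), 3))
```

A higher value means the dendrogram preserves the pairwise distances more faithfully, although it does not by itself say which method recovers the species best.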

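Fit and plot the dendrogram
The vertical axis shows the height (distance level) at which clusters merge, so lower merges join more similar observations; the observations are listed along the horizontal axis.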

fit <- hclust(d, method="ward.D")
plot(fit)
groups <- cutree(fit, k=3) # cut the tree into 3 clusters and return each observation's cluster label
# Draw Rectangles Around Hierarchical Clusters
rect.hclust(fit, k=3, border="red")
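Since the true species are known for this dataset, the three clusters can be cross-tabulated against them. The sketch below repeats the fit so that it runs on its own; tab and purity are illustrative names, and purity is just one simple agreement measure chosen for the example.

```r
x <- scale(iris[, 1:4])
fit <- hclust(dist(x, method = "euclidean"), method = "ward.D")
groups <- cutree(fit, k = 3)

# Rows are cluster labels, columns are the known species
tab <- table(cluster = groups, species = iris$Species)
print(tab)

# Share of flowers that fall in the majority species of their cluster
purity <- sum(apply(tab, 1, max)) / sum(tab)
print(purity)
```

Cluster labels are arbitrary, so the table should be read row by row rather than expecting cluster 1 to line up with any particular species.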

[Figure 1: dendrogram from hclust() with the three clusters outlined in red]

Ward Hierarchical Clustering with Bootstrapped p values

library(pvclust) # a package to assess the uncertainty in hierarchical cluster analysis; if it is not installed, run install.packages("pvclust")
Package documentation
Introduction: Suzuki & Shimodaira’s (2006) paper in Bioinformatics
GitHub documentation
pvclust provides two types of p-values: AU (Approximately Unbiased) p-value and BP (Bootstrap Probability) value

library(pvclust)
## Warning: package 'pvclust' was built under R version 3.6.3
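fit <- pvclust(mydata, method.hclust="ward.D", method.dist="euclidean") # bootstrap resampling, nboot=1000 by default; pvclust clusters the columns of its input (here the 4 variables), so pass t(mydata) to cluster the 150 flowers instead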
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.6)... Done.
## Bootstrap (r = 0.7)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.4)... Done.
# plot dendrogram with p-values
plot(fit)
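The highlighted rectangles mentioned at the start can be added with pvrect(), and pvpick() returns the same clusters as a list. The sketch below is a hedged illustration: it assumes the pvclust package is installed (and skips the step otherwise), lowers nboot to 100 only to keep the run short, and uses the illustrative names pv_fit and picked.

```r
pv_fit <- NULL
if (requireNamespace("pvclust", quietly = TRUE)) {
  set.seed(1)  # fix the RNG seed so the bootstrap resampling can be repeated
  x <- scale(iris[, 1:4])

  # Same call as above, but with fewer bootstrap replicates (assumption: 100)
  pv_fit <- pvclust::pvclust(x, method.hclust = "ward.D",
                             method.dist = "euclidean", nboot = 100)

  # Dendrogram with AU/BP values, then rectangles around clusters with AU >= 0.95
  plot(pv_fit)
  pvclust::pvrect(pv_fit, alpha = 0.95)

  # The same clusters as a list of member names
  picked <- pvclust::pvpick(pv_fit, alpha = 0.95)
  print(picked$clusters)
}
```

With so few bootstrap replicates the p-values are rough; for reported results the default nboot = 1000 or more is the safer choice.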

[Figure 2: pvclust dendrogram annotated with AU and BP p-values]
