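Background and theory
This post mainly uses R's built-in hierarchical clustering function, hclust(), which does not require the number of clusters to be specified in advance.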
**(1) pvclust: dendrogram, p-values, and highlighted boxes**
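Online example 1
Online example 2
Data preprocessing
Iris data: grouping flowers by their sepal and petal measurements
The Iris data set is a classic data set that is frequently used as an example in statistical learning and machine learning.
It contains 150 records in 3 classes (setosa, versicolor, virginica), so a three-cluster solution is the natural target. Each record has 4 features: sepal length, sepal width, petal length, and petal width.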
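As a quick sanity check of the description above, the following sketch inspects the built-in `iris` data frame: its dimensions, the type of each column, and the number of records per species. It uses only base R.

```r
# Dimensions: rows are flowers, columns are 4 measurements plus the species label.
print(dim(iris))

# Column types: four numeric features and one factor.
print(sapply(iris, class))

# Number of records in each of the three classes.
species_counts <- table(iris$Species)
print(species_counts)
```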
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
mydata <- iris[,1:4]
table(is.na(iris))
##
## FALSE
## 750
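# mydata <- na.omit(mydata) # remove missing values (not needed here: the check above found none)
mydata <- scale(mydata) # standardize each column to mean 0 and standard deviation 1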
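To make the effect of `scale()` concrete, here is a minimal sketch that checks the column means and standard deviations after standardization. The variable names (`scaled`, `col_means`, `col_sds`) are illustrative and not part of the original code.

```r
# Standardize the four numeric iris columns (same step as above).
scaled <- scale(iris[, 1:4])

# After scale(), each column should have mean ~0 and standard deviation 1.
col_means <- colMeans(scaled)
col_sds   <- apply(scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)

# The original centers and scales are kept as attributes,
# so the transformation can be reversed if needed.
print(attr(scaled, "scaled:center"))
print(attr(scaled, "scaled:scale"))
```

Standardizing matters here because Euclidean distance would otherwise be dominated by whichever feature has the largest numeric range.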
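Computing the distance matrix
dist() in the "stats" package computes distances (dissimilarities) between the rows of a data matrix and returns an object of class "dist".
Note: another way to compute distances is the distance() function in the "philentropy" package, which can compute 46 different distances/similarities. The available methods can be listed with getDistMethods().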
d <- dist(mydata, method = "euclidean")
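# Note: dist(d) applies dist() a second time, so the output below shows distances between rows of the distance matrix, not between observations.
# To view the pairwise distances between observations themselves, use head(as.matrix(d), n = 10).
head(as.matrix(dist(d)), n=10) # matrix format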
## 1 2 3 4 5 6 7 8
## 1 0.000000 7.295880 5.192492 6.598361 1.954679 6.783623 3.379304 1.820842
## 2 7.295880 0.000000 3.416889 2.321796 8.446804 11.785739 5.642378 5.879343
## 3 5.192492 3.416889 0.000000 1.734143 6.056920 10.293041 2.712936 3.875123
## 4 6.598361 2.321796 1.734143 0.000000 7.439024 11.192198 4.130604 5.280178
## 5 1.954679 8.446804 6.056920 7.439024 0.000000 5.618212 3.716349 3.504365
## 6 6.783623 11.785739 10.293041 11.192198 5.618212 0.000000 8.291670 8.214123
## 7 3.379304 5.642378 2.712936 4.130604 3.716349 8.291670 0.000000 2.809432
## 8 1.820842 5.879343 3.875123 5.280178 3.504365 8.214123 2.809432 0.000000
## 9 9.314921 4.427074 5.101978 3.722033 9.715219 12.436313 6.702509 8.356082
## 10 6.010701 1.620559 2.051240 1.730363 7.177666 10.961288 4.312405 4.540213
## 9 10 11 12 13 14 15
## 1 9.314921 6.010701 3.864157 2.389147 7.281920 9.223825 10.236856
## 2 4.427074 1.620559 9.751389 5.586632 1.429376 6.240605 14.351198
## 3 5.101978 2.051240 8.049997 3.065442 2.839340 5.431223 12.859004
## 4 3.722033 1.730363 9.169086 4.568206 1.419609 4.703047 13.565897
## 5 9.715219 7.177666 2.990061 3.487927 8.192419 9.188064 8.825268
## 6 12.436313 10.961288 3.177915 8.325627 11.620614 11.971069 4.418194
## 7 6.702509 4.312405 6.041847 1.630917 5.142708 6.342383 10.980140
## 8 8.356082 4.540213 5.420421 1.289592 5.976847 8.566393 11.578478
## 9 0.000000 4.940175 11.067067 7.580285 3.339484 2.697582 13.900578
## 10 4.940175 0.000000 8.735801 4.179528 1.766737 6.130328 13.641362
## 16 17 18 19 20 21 22 23
## 1 16.94692 7.047915 0.7570195 6.424342 4.957036 3.498509 3.111193 5.467911
## 2 20.05873 12.094847 7.1877663 11.042565 10.683192 6.420744 9.256322 9.383449
## 3 18.67534 10.405927 5.3232847 9.952335 8.630497 5.937920 7.377084 6.627404
## 4 19.11654 11.334149 6.6648939 10.770250 9.754068 6.790624 8.564473 7.731081
## 5 15.43449 5.657125 2.5024346 5.767063 3.346832 5.093696 2.163940 4.017731
## 6 11.45204 1.216180 6.9128106 2.005056 2.783390 7.872232 3.945982 6.893126
## 7 16.98354 8.322284 3.7028110 8.234051 6.343979 5.423024 5.224754 4.428659
## 8 18.14448 8.518876 1.7414666 7.702602 6.512018 3.181869 4.663897 6.265616
## 9 18.44499 12.438736 9.4595068 12.154871 11.316601 9.502020 10.643415 8.884762
## 10 19.53272 11.221722 5.9657699 10.322641 9.630004 5.626862 8.181542 8.222669
## 24 25 26 27 28 29 30
## 1 5.439129 2.524856 7.507021 2.925076 0.9189443 1.808426 5.2128968
## 2 5.360199 5.232498 1.313642 5.751085 7.2971171 6.119204 2.8836431
## 3 5.812928 3.221583 4.332498 4.651113 5.5519162 4.617894 0.9807404
## 4 6.250977 4.529627 3.473851 5.717156 6.8528933 5.858909 1.7649740
## 5 7.127401 3.863930 8.815399 4.674147 2.5488238 3.661356 6.2962501
## 6 10.122240 8.324137 11.939018 8.584013 6.6746055 7.835288 10.4329939
## 7 6.207738 2.278787 6.325227 4.116578 3.9975900 3.702355 3.1557031
## 8 4.316962 1.169726 6.077215 1.826861 2.0249571 1.176645 3.7258075
## 9 9.095838 7.592462 5.451659 8.810413 9.5807559 8.807604 5.3480303
## 10 4.955724 3.930421 2.416430 4.707632 6.1055978 4.928118 1.4865057
## 31 32 33 34 35 36 37
## 1 6.3562052 4.242250 11.367374 13.199210 6.2010015 4.437023 2.509639
## 2 1.4094607 6.658872 15.500632 16.996793 1.5039877 3.174422 7.546764
## 3 2.4050442 6.473227 13.588196 15.335262 2.6400490 1.930503 6.340528
## 4 1.8620851 7.215500 14.339789 15.971009 2.3056572 2.847505 7.392169
## 5 7.5758289 5.806947 9.674789 11.604111 7.5006364 5.768724 3.360536
## 6 11.3170018 8.259324 6.531003 7.791466 11.2211888 9.905828 5.719679
## 7 4.6796714 6.080770 11.587978 13.454485 4.7746499 3.267011 4.993707
## 8 4.8284673 3.873938 12.694935 14.498237 4.6582723 2.893679 3.400363
## 9 5.0533098 9.895158 14.445306 15.820189 5.4225371 6.173005 9.741690
## 10 0.7257916 6.012485 14.686603 16.300385 0.8210706 1.762776 6.585546
## 38 39 40 41 42 43 44
## 1 2.995566 8.337181 1.8354871 0.5818482 16.783362 6.439085 3.117455
## 2 8.657652 4.191492 5.9978407 7.1297484 12.446742 4.845336 6.679719
## 3 6.031403 3.972488 4.3243194 4.9123486 13.809433 2.587840 5.773090
## 4 7.388931 2.803654 5.6285379 6.3395022 12.579716 2.720093 6.706086
## 5 1.341608 8.700035 3.6836571 1.9463973 16.798815 6.619969 4.450306
## 6 5.866256 11.808940 8.1098692 6.9173173 17.294270 10.323016 7.330773
## 7 3.584121 5.541416 3.4058001 2.9887134 14.750884 3.335384 4.792574
## 8 4.228614 7.379389 0.7034692 1.6915537 16.264349 5.643452 2.971584
## 9 9.380419 1.460381 8.6737904 9.0720612 9.176074 3.971671 9.405859
## 10 7.374657 4.258303 4.7442070 5.8305235 13.364727 4.031877 5.828228
## 45 46 47 48 49 50 51
## 1 4.632499 7.5067195 5.102381 5.4967811 3.592349 3.204323 29.06269
## 2 10.125144 0.5215353 10.774948 3.5477489 9.705608 4.544312 27.56301
## 3 8.459187 3.5561430 8.668921 0.6049613 7.835728 2.858426 29.58025
## 4 9.489993 2.3467331 9.793902 1.5589553 9.012687 4.121857 29.04888
## 5 3.576773 8.6455085 3.425485 6.2188830 2.502381 4.760372 29.72148
## 6 2.448642 11.9536508 2.952663 10.3483555 3.464197 9.197079 27.34624
## 7 6.365172 5.7801567 6.373247 2.7614630 5.723096 2.950794 29.84818
## 8 6.073275 6.0821782 6.648201 4.2767880 5.210608 1.499886 28.92469
## 9 11.199903 4.3222019 11.296690 4.7096195 10.957669 7.402122 28.92848
## 10 9.182960 1.8963366 9.709629 2.3477588 8.632179 3.186356 28.34368
## 52 53 54 55 56 57 58 59
## 1 28.18813 30.66766 25.93969 30.87851 27.04747 27.83713 20.45110 29.57652
## 2 26.47393 28.94916 21.80230 28.37249 23.96379 26.30670 15.66868 27.28085
## 3 28.63463 31.06688 24.58365 30.81641 26.59078 28.37169 18.49461 29.65980
## 4 28.10206 30.49310 23.48222 30.10074 25.79445 27.86504 17.26974 28.99714
## 5 29.06303 31.38573 26.80921 31.79857 28.14832 28.64408 21.32242 30.49029
## 6 27.26745 29.16190 26.16271 30.26269 27.43286 26.66730 21.40809 28.89172
## 7 29.04356 31.41941 25.63952 31.46518 27.46165 28.69242 19.73119 30.25352
## 8 27.89387 30.46778 25.26498 30.45592 26.41146 27.59884 19.63612 29.19051
## 9 28.32794 30.36352 22.34427 29.94456 25.78413 28.06510 15.90587 28.95078
## 10 27.28608 29.77725 23.08208 29.35322 25.02716 27.08517 17.02520 28.21385
## 60 61 62 63 64 65 66 67
## 1 23.63974 23.75174 27.28267 26.14404 29.33349 23.65588 28.79171 26.27558
## 2 19.92753 19.30513 24.83881 22.17547 26.78216 20.65727 26.97607 23.66991
## 3 22.66798 21.68308 27.26625 24.86841 29.25638 23.24559 29.17242 26.12097
## 4 21.72914 20.42557 26.61076 23.78439 28.55913 22.51096 28.60863 25.42465
## 5 24.71901 24.16979 28.34618 26.90448 30.35756 24.83919 29.63383 27.33081
## 6 24.39059 23.49724 27.30030 25.95494 29.15630 24.45514 27.74813 26.38150
## 7 23.68185 22.66540 27.94691 25.84404 29.95693 24.13002 29.60420 26.82793
## 8 22.89831 23.19530 26.76389 25.56336 28.83381 22.97073 28.51166 25.72526
## 9 21.40408 18.03325 26.86054 22.46231 28.63276 22.78939 28.68954 25.59494
## 10 21.10390 20.56286 25.77843 23.42115 27.75580 21.67417 27.81320 24.63901
## 68 69 70 71 72 73 74 75
## 1 25.05141 29.66271 24.98245 28.05774 27.50216 30.91693 28.39789 28.62252
## 2 21.64178 26.11453 21.02009 26.14832 24.63753 27.80495 25.57172 26.19781
## 3 24.37412 28.68647 23.85956 28.33345 27.22264 30.38334 28.13705 28.63822
## 4 23.52369 27.67085 22.85687 27.74423 26.47726 29.50476 27.38312 27.96967
## 5 26.18759 30.35102 26.04133 28.91476 28.59199 31.77170 29.44152 29.62246
## 6 25.73573 28.92847 25.68830 27.18938 27.70651 30.35330 28.39718 28.30726
## 7 25.36071 29.50917 24.95787 28.76160 28.04198 31.15947 28.92357 29.30320
## 8 24.35155 29.18878 24.23705 27.72581 26.92068 30.43430 27.84973 28.16528
## 9 23.43300 26.40036 22.28963 27.85193 26.53321 28.85678 27.34878 28.04937
## 10 22.75682 27.29845 22.25241 27.00371 25.66215 28.91132 26.59616 27.14650
## 76 77 78 79 80 81 82 83
## 1 29.30218 30.92786 32.21671 29.08310 23.45512 24.53678 23.77258 25.93819
## 2 27.22716 28.58036 30.26117 26.48406 19.63708 20.33692 19.49977 22.57684
## 3 29.53099 30.93782 32.47669 28.96904 22.46482 23.18964 22.36704 25.29700
## 4 28.91966 30.23458 31.85507 28.26472 21.52907 22.11260 21.27926 24.45364
## 5 30.20739 31.74068 32.99273 30.12264 24.59231 25.52188 24.76297 27.07392
## 6 28.53143 29.90279 30.94014 28.99643 24.39369 25.17419 24.49752 26.57638
## 7 30.06058 31.50056 32.91306 29.68510 23.56369 24.31907 23.52139 26.26848
## 8 28.95074 30.59097 31.94889 28.56304 22.68775 23.79062 23.01095 25.24569
## 9 28.98462 29.93001 31.70889 28.35391 21.25442 21.24405 20.39727 24.37651
## 10 28.11489 29.53461 31.14069 27.46544 20.82536 21.61660 20.78794 23.68754
## 84 85 86 87 88 89 90 91
## 1 30.90592 24.91499 24.99032 30.09148 28.93352 24.35035 25.75494 25.83772
## 2 27.99520 22.18592 23.60145 28.29077 25.43232 21.67099 21.84371 22.13153
## 3 30.54280 24.65141 25.59005 30.47083 28.05947 24.15522 24.65484 24.90654
## 4 29.72954 23.92684 25.12116 29.89932 27.07930 23.47748 23.64915 23.95776
## 5 31.84259 25.96272 25.79506 30.90090 29.71552 25.49180 26.78379 26.90257
## 6 30.53275 25.10424 23.89237 28.91186 28.41631 24.86813 26.32063 26.41346
## 7 31.29187 25.37844 25.85701 30.88311 28.91317 24.92969 25.71746 25.92836
## 8 30.40031 24.34164 24.76030 29.82973 28.42652 23.73587 25.03079 25.12453
## 9 29.39725 24.03930 25.42432 29.91817 26.04986 23.82129 23.03120 23.54102
## 10 29.05720 23.17521 24.34059 29.13050 26.60687 22.63127 23.07408 23.31812
## 92 93 94 95 96 97 98 99
## 1 28.55055 26.39514 21.41740 26.28650 24.53064 25.92643 28.19190 20.43779
## 2 26.21896 22.83648 16.66728 22.86483 21.89804 23.05393 25.64812 15.86127
## 3 28.61899 25.60198 19.44718 25.58248 24.37932 25.61646 28.13129 18.75366
## 4 27.97555 24.69684 18.20673 24.71291 23.71350 24.88570 27.44937 17.62089
## 5 29.57154 27.48638 22.21516 27.39720 25.67468 27.06214 29.24500 21.45797
## 6 28.31852 26.94312 22.12772 26.87458 25.01900 26.41015 28.13273 21.62365
## 7 29.25788 26.60615 20.64962 26.54947 25.14801 26.44669 28.84894 19.99454
## 8 28.08200 25.70109 20.64977 25.59159 23.92701 25.29649 27.68061 19.58648
## 9 28.17291 24.39879 16.63254 24.55274 24.07752 25.08589 27.59204 16.75923
## 10 27.14477 23.99491 18.02054 23.99200 22.84909 24.06330 26.61300 17.18413
## 100 101 102 103 104 105 106 107
## 1 26.22225 32.52883 30.84637 35.13938 32.78561 34.15789 36.49525 24.87342
## 2 23.08920 31.06294 27.93411 33.44529 30.50905 32.24401 35.00971 20.99937
## 3 25.73464 32.86239 30.43251 35.39892 32.81856 34.33723 36.70078 23.61512
## 4 24.94241 32.26458 29.59722 34.76619 32.11885 33.67803 36.06885 22.54260
## 5 27.35998 32.89127 31.70901 35.58533 33.58187 34.75216 36.65359 25.64000
## 6 26.78997 30.01221 30.26944 32.82950 31.68897 32.33981 33.46466 24.75779
## 7 26.63858 32.95861 31.13488 35.62360 33.33580 34.66636 36.76157 24.51852
## 8 25.55578 32.50597 30.37063 35.06723 32.45080 33.97423 36.57821 24.26753
## 9 25.00006 31.59799 29.11324 34.09026 31.83894 33.19016 34.93812 21.36295
## 10 24.15365 31.84771 29.00297 34.27808 31.45609 33.12556 35.78137 22.22958
## 108 109 110 111 112 113 114 115
## 1 34.92550 33.70989 34.00407 31.78269 33.26664 34.36985 30.38550 31.08313
## 2 33.14248 31.06470 33.20356 30.15662 30.70549 32.54912 27.15799 28.52916
## 3 35.08904 33.37606 34.53982 32.20239 33.09076 34.62213 29.68354 30.81249
## 4 34.42592 32.54392 34.04243 31.63486 32.31755 33.98748 28.74967 30.01241
## 5 35.31557 34.30378 33.97259 32.45730 34.04007 34.96020 31.14820 31.74239
## 6 32.51450 32.16624 30.37794 30.13997 32.19159 32.49294 29.67301 29.77369
## 7 35.32139 33.91317 34.32458 32.49133 33.67673 34.93555 30.42431 31.31632
## 8 34.86928 33.42810 34.25265 31.60125 32.90542 34.20875 29.91154 30.75863
## 9 33.57397 31.59097 32.99912 31.49265 31.81092 33.53708 27.85967 29.30354
## 10 33.98700 32.08771 33.82511 30.97284 31.71280 33.41050 28.29111 29.53622
## 116 117 118 119 120 121 122 123
## 1 32.36933 32.94071 35.36614 38.37630 29.90877 34.30920 29.39312 36.84837
## 2 30.74422 30.93502 34.91383 36.59006 26.32590 32.86184 26.56250 35.17402
## 3 32.72159 33.14854 35.90287 38.23593 28.89169 34.70957 29.01756 36.89289
## 4 32.12769 32.50620 35.45592 37.51011 27.86198 34.12901 28.20152 36.21212
## 5 32.93365 33.69628 35.04360 38.39401 30.58290 34.72030 30.24929 36.97964
## 6 30.41822 31.62063 31.14865 35.22864 29.16122 31.85334 28.81740 33.84933
## 7 32.95303 33.58101 35.50162 38.32517 29.71353 34.84373 29.68405 36.99721
## 8 32.23578 32.66945 35.77447 38.46495 29.43161 34.28141 28.92422 36.91440
## 9 31.76701 32.29545 34.07842 35.90376 26.54336 33.54831 27.77307 34.90311
## 10 31.56458 31.82786 35.42492 37.40344 27.51630 33.64486 27.61231 35.97766
## 124 125 126 127 128 129 130 131
## 1 32.24394 32.72553 33.42940 31.74685 30.83093 33.93940 33.14921 35.34525
## 2 29.55473 31.34978 32.02473 29.17063 28.60090 31.62382 31.41367 33.48177
## 3 32.01758 33.21572 33.86616 31.61300 30.92478 33.88818 33.42800 35.42721
## 4 31.23745 32.66678 33.29694 30.87086 30.26460 33.15202 32.80218 34.73894
## 5 33.10668 33.20740 33.82982 32.65135 31.72789 34.62611 33.66189 35.70232
## 6 31.50724 30.45143 30.92348 31.11877 30.09844 32.51957 31.03564 32.88846
## 7 32.67825 33.35639 33.99721 32.25947 31.46942 34.36532 33.69886 35.66877
## 8 31.81719 32.67328 33.42187 31.31088 30.44250 33.65393 33.04691 35.29183
## 9 30.85390 32.28003 32.70840 30.65265 30.24548 32.61114 32.23142 33.76697
## 10 30.58172 32.11729 32.79365 30.17080 29.52749 32.58576 32.25049 34.34076
## 132 133 134 135 136 137 138 139
## 1 34.83310 34.04444 31.33651 30.69245 36.61409 31.18872 32.09613 30.19711
## 2 34.43766 31.75636 28.75642 27.67560 35.16893 29.87425 30.22601 27.90923
## 3 35.39496 33.99018 31.21752 30.22290 36.82124 31.64354 32.38880 30.25223
## 4 34.96174 33.25193 30.48159 29.36415 36.19388 31.08987 31.77833 29.58434
## 5 34.49195 34.69388 32.26020 31.54727 36.73824 31.57964 32.84647 31.11530
## 6 30.57413 32.51163 30.76371 30.09754 33.49979 28.71616 30.74161 29.57693
## 7 34.97667 34.44169 31.88191 30.96635 36.85702 31.71040 32.77811 30.81851
## 8 35.25869 33.78007 30.89534 30.21882 36.71663 31.17434 31.84301 29.78539
## 9 33.58295 32.65002 30.29722 28.78272 35.02038 30.57846 31.64563 29.58539
## 10 34.93234 32.71388 29.75269 28.76240 35.93078 30.62659 31.08964 28.84434
## 140 141 142 143 144 145 146 147
## 1 33.97859 34.35829 33.79773 30.84637 34.33724 33.56702 34.01047 32.31430
## 2 32.34968 32.70435 32.18323 27.93411 32.86147 32.22284 32.16953 29.37076
## 3 34.33867 34.64654 34.13578 30.43251 34.71890 33.98001 34.23135 31.84597
## 4 33.74203 34.02619 33.53263 29.59722 34.13155 33.40872 33.58729 30.97818
## 5 34.52285 34.82974 34.29475 31.70901 34.75365 33.90191 34.58281 33.06744
## 6 31.91977 32.12827 31.61580 30.26944 31.90660 30.91413 32.10459 31.36279
## 7 34.58258 34.85522 34.35108 31.13488 34.86017 34.04465 34.53538 32.52375
## 8 33.86758 34.27013 33.70883 30.37063 34.30106 33.58146 33.85119 31.90479
## 9 33.31418 33.44427 33.01994 29.11324 33.54692 32.74964 33.09344 30.20302
## 10 33.17086 33.53232 33.00234 29.00297 33.65071 32.98270 33.03640 30.45267
## 148 149 150
## 1 33.16085 30.21451 30.13053
## 2 31.18195 28.86343 27.80633
## 3 33.37143 30.68448 30.14651
## 4 32.72984 30.13700 29.46223
## 5 33.89305 30.67789 31.02442
## 6 31.76600 27.96020 29.45017
## 7 33.78317 30.78902 30.70667
## 8 32.90315 30.16038 29.72365
## 9 32.48825 29.75333 29.39517
## 10 32.07186 29.62282 28.75084
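To check what a "dist" object actually stores, the sketch below recomputes one Euclidean distance by hand and compares it with the corresponding entry of `as.matrix()` applied directly to the distance object. The names `x`, `d_check`, and `dm` are illustrative.

```r
# Standardized features and their pairwise Euclidean distances.
x <- scale(iris[, 1:4])
d_check <- dist(x, method = "euclidean")

# Euclidean distance between observations 1 and 2, computed by hand.
manual_12 <- sqrt(sum((x[1, ] - x[2, ])^2))

# The same entry taken from the distance object, viewed as a full matrix.
dm <- as.matrix(d_check)
from_dist_12 <- dm[1, 2]

print(all.equal(manual_12, from_dist_12))
print(dim(dm))  # one row and one column per observation
```

Viewed this way, the matrix is symmetric with zeros on the diagonal, and each off-diagonal entry is the distance between two flowers in the standardized feature space.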
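Fitting the model and plotting the dendrogram
The vertical axis shows the height at which clusters are merged, i.e. a distance (dissimilarity) level; the individual observations are listed along the horizontal axis.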
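fit <- hclust(d, method="ward.D") # note: with unsquared Euclidean distances, "ward.D2" is the variant that implements Ward's criterion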
plot(fit)
groups <- cutree(fit, k=3) # cut the tree into 3 clusters, i.e. a specified number of groups
# Draw Rectangles Around Hierarchical Clusters
rect.hclust(fit, k=3, border="red")
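Because the tree is built once and cut afterwards, the number of clusters can be changed without refitting. The sketch below cuts one tree at several values of k and cross-tabulates the three-cluster solution against the known species labels. The names `fit_check`, `memberships`, and `groups_check` are illustrative; how well the clusters match the species is left for the reader to inspect in the printed table.

```r
# Rebuild the same kind of tree as above so the example is self-contained.
x <- scale(iris[, 1:4])
fit_check <- hclust(dist(x, method = "euclidean"), method = "ward.D")

# Cut the same tree at k = 2, 3, and 4; the result has one column per k.
memberships <- cutree(fit_check, k = 2:4)
print(head(memberships))

# Compare the 3-cluster solution with the true species labels.
groups_check <- cutree(fit_check, k = 3)
print(table(cluster = groups_check, species = iris$Species))
```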
Ward Hierarchical Clustering with Bootstrapped p values
library(pvclust) # a package to assess the uncertainty in hierarchical cluster analysis; if it is not installed, run install.packages("pvclust")
Package documentation
Introduction: Suzuki & Shimodaira’s (2006) paper in Bioinformatics
Notes on GitHub
pvclust provides two types of p-values: AU (Approximately Unbiased) p-value and BP (Bootstrap Probability) value
library(pvclust)
## Warning: package 'pvclust' was built under R version 3.6.3
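# Note: pvclust() clusters the columns of its input, so this call clusters the 4 measurements; pass t(mydata) to cluster the 150 flowers instead.
fit <- pvclust(mydata, method.hclust="ward.D", method.dist="euclidean") # bootstrap resampling; nboot=1000 by default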
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.6)... Done.
## Bootstrap (r = 0.7)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.4)... Done.
# plot dendrogram with p-values
plot(fit)
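The highlighted boxes mentioned at the top of this post can be drawn with `pvrect()`, and the members of those clusters can be listed with `pvpick()`. The following is a minimal sketch rather than a definitive recipe. It assumes the pvclust package is installed and skips quietly otherwise, and it uses a reduced `nboot` only to keep the run short. Because pvclust() clusters the columns of its input, the boxes here surround groups of the four measurements.

```r
# Assumes the pvclust package is installed; the example is skipped otherwise.
if (requireNamespace("pvclust", quietly = TRUE)) {
  x <- scale(iris[, 1:4])

  # nboot is reduced here only to keep the sketch fast; the default is 1000.
  fit_pv <- pvclust::pvclust(x, method.hclust = "ward.D",
                             method.dist = "euclidean", nboot = 100)

  plot(fit_pv)                           # dendrogram with AU/BP values
  pvclust::pvrect(fit_pv, alpha = 0.95)  # boxes around clusters with AU >= 0.95

  # List the members of the highlighted clusters, if any reach the threshold.
  picked <- pvclust::pvpick(fit_pv, alpha = 0.95)
  print(picked$clusters)

  # AU and BP values for each edge of the dendrogram.
  print(fit_pv$edges[, c("au", "bp")])
}
```

Clusters with a high AU value are the ones most strongly supported by the bootstrap resampling; with so few bootstrap replicates the values are rough, so the default `nboot` is preferable for real analyses.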