【R图千言】3D Plots for Principal Component Analysis

Principal component analysis (PCA) is a mathematical method for dimensionality reduction.

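The PCA dimensionality-reduction procedure:
1) Standardize the data
2) Compute the covariance matrix
3) Sort the eigenvectors
4) Build the projection matrix
5) Transform the data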

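Compute the covariance matrix of the sample data across its dimensions (variables), then solve for the eigenvalues of this covariance matrix and their corresponding eigenvectors. Arrange the eigenvectors in order of decreasing eigenvalue to form a new matrix, called the eigenvector matrix or the projection matrix, and use this projection matrix to transform the sample data. Keeping only the first K dimensions of the transformed data achieves the dimensionality reduction.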
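The five steps above can be sketched in base R as follows. This is a minimal illustration on a small simulated matrix, not the code used for the cases in this post; the variable names (`x`, `k`, `proj`, `reduced`) and the choice `k = 2` are assumptions made for the example. The result is cross-checked against `prcomp`, comparing absolute values because the sign of each eigenvector is arbitrary.

```r
# Manual PCA on a small simulated matrix (rows = observations, columns = variables).
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
x[, 2] <- x[, 1] + 0.1 * x[, 2]   # make two variables correlated

k <- 2                            # number of dimensions to keep (hypothetical choice)

x.std   <- scale(x, center = TRUE, scale = TRUE)  # 1) standardize the data
cov.mat <- cov(x.std)                             # 2) covariance matrix
eig     <- eigen(cov.mat, symmetric = TRUE)       # 3) eigenvalues are returned largest first
proj    <- eig$vectors[, 1:k, drop = FALSE]       # 4) projection matrix (first k eigenvectors)
reduced <- x.std %*% proj                         # 5) transform the data

# Cross-check against prcomp(); signs may differ, so compare absolute values.
ref      <- prcomp(x, center = TRUE, scale. = TRUE)
max.diff <- max(abs(abs(reduced) - abs(ref$x[, 1:k])))
```

If the steps are implemented correctly, `max.diff` should be at the level of floating-point rounding error.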

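Case 1

Create the dataset

  1. Simulate a microarray data matrix in R with 10000 rows (10000 genes) and 100 columns (100 samples), filled with normally distributed random values with mean 0.
    chip.data<-matrix(rnorm(10000*100,mean=0),nrow=10000,ncol=100)
    Result: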
1.jpg

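  2. Among the 10000 genes, assume that 100 genes differ between the two groups: 50 are up-regulated and the other 50 are down-regulated.

1) Create a random permutation of the integers 1 to 1000 to use as a row index.
2) Generate 50*10 normally distributed values with mean 2 and assign them to chip.data, using the first 50 entries of the index as row numbers and the first 10 columns. These are the up-regulated genes.
3) In the same way, use entries 51 to 100 of the index with mean -2 to obtain the 50 down-regulated genes.

# random permutation of 1..1000, used as row indices
diff.index<-sample(1:1000,1000)

# up-regulated genes: first 50 indices, first 10 samples
chip.data[diff.index[1:50],1:10]<-rnorm(50*10,mean=2)
# down-regulated genes: next 50 indices, so the up-regulated rows are not overwritten
chip.data[diff.index[51:100],1:10]<-rnorm(50*10,mean=-2)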
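As a quick sanity check on the simulated data, the sketch below rebuilds the matrix with a fixed seed and compares the mean of the modified cells with the rest. It assumes the down-regulated genes take entries 51 to 100 of the index; the seed and the helper names `up.rows`, `down.rows`, `up.mean`, `down.mean` and `rest.mean` are introduced here for illustration only.

```r
set.seed(42)  # arbitrary seed so the check is repeatable (an assumption, not in the original)

chip.data  <- matrix(rnorm(10000 * 100, mean = 0), nrow = 10000, ncol = 100)
diff.index <- sample(1:1000, 1000)

up.rows   <- diff.index[1:50]     # rows treated as up-regulated
down.rows <- diff.index[51:100]   # rows treated as down-regulated (assumed second block)

chip.data[up.rows, 1:10]   <- rnorm(50 * 10, mean = 2)
chip.data[down.rows, 1:10] <- rnorm(50 * 10, mean = -2)

# Average expression in the first 10 samples for each set of genes
up.mean   <- mean(chip.data[up.rows, 1:10])
down.mean <- mean(chip.data[down.rows, 1:10])
rest.mean <- mean(chip.data[-c(up.rows, down.rows), 1:10])
```

With this setup, `up.mean` should sit near 2, `down.mean` near -2 and `rest.mean` near 0. Because `diff.index` is a permutation, the two row sets cannot overlap.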
  3. PCA plot

Using the princomp function

Description
princomp performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.
## Default S3 method:
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
         subset = rep_len(TRUE, nrow(as.matrix(x))), ...)

PCA statistics
chip.data.pca<-princomp(chip.data)
Show the data in chip.data

> chip.data
                  [,1]          [,2]          [,3]          [,4]          [,5]          [,6]
    [1,] -8.764830e-01 -2.585436e+00  1.7486665932  0.6825088090  0.8905718598  2.2543743674
    [2,]  2.756559e+00  9.191507e-01  1.7224333465  2.5164729313  0.3655551313  0.3940460436
    [3,]  9.754316e-01 -9.121371e-01 -0.0534088859  0.4711108467 -0.6567994543 -0.9404594391
    [4,] -1.443449e+00  6.328793e-01  0.7067575122 -2.0083705142 -0.0641474431  0.5404051953
    [5,] -1.678596e+00 -4.086325e-01 -0.6946972480  0.9941794052  1.9677986393  0.4281278343
    [6,]  2.318705e+00  2.574536e+00  2.4483722951  3.7352614791  0.6849518201  2.5269332706
    [7,]  1.368299e+00 -6.396757e-01 -0.3016863422 -0.9881343210  0.7250075490 -1.1474935276
    [8,]  4.547110e-01 -1.388434e+00  0.5724884590  1.3446862438  0.2708813623  0.0768302649
    [9,] -3.320154e-01  1.015236e+00  0.0524039788  0.8327729956  1.5803932962 -1.1469311968
   [10,]  1.442150e+00 -1.005228e+00  0.9377764607  1.5061633084 -0.7742683227 -1.9687078752

Show the summary statistics

> summary(chip.data.pca)
Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4    Comp.5     Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
Standard deviation     3.240085 3.2099856 3.1956557 3.1691590 3.1505363 3.13960683 3.11757677 3.10222437 3.07273039 3.05572866
Proportion of Variance 0.105799 0.1038424 0.1029174 0.1012178 0.1000317 0.09933886 0.09794967 0.09698734 0.09515192 0.09410186
Cumulative Proportion  0.105799 0.2096414 0.3125588 0.4137765 0.5138082 0.61314710 0.71109677 0.80808411 0.90323603 0.99733790

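Standard deviation # standard deviation of each component
Proportion of Variance # share of the total variance explained by each component
Cumulative Proportion # running total of the proportion of variance

In the output above, the first 10 principal components together account for 0.99733790 of the variance in the data.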
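The three rows of the summary table can be recomputed from the `sdev` element of a `princomp` object, which may make their meaning clearer. The sketch below uses a small simulated matrix as a stand-in; the names `x`, `variance`, `proportion` and `cumulative` are illustrative assumptions.

```r
set.seed(7)
x   <- matrix(rnorm(500 * 6), nrow = 500, ncol = 6)  # stand-in data: 500 rows, 6 variables
pca <- princomp(x)

variance   <- pca$sdev^2                # variance of each component
proportion <- variance / sum(variance)  # "Proportion of Variance"
cumulative <- cumsum(proportion)        # "Cumulative Proportion"

# Same layout as the table printed by summary(pca)
importance <- round(rbind(sdev = pca$sdev, proportion, cumulative), 4)
print(importance)
```

The cumulative proportion always reaches 1 at the last component, so the useful question is how early it gets close to 1.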

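  4. Plotting
    1) Set the colors of the two groups: the colour vector holds 100 values, 50 per group, one for each row of the loadings matrix. Changing "2" and "7" to other integer color indices changes the colors of the two groups (R's default palette has 8 colors, and larger indices are recycled).
    2) plot3d(three-column data for the x, y and z axes, group colors, plot type, radius)
    Below, type="s" draws each point as a sphere.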
colour<-c(rep(2,50),rep(7,50))
library(rgl)
plot3d(chip.data.pca$loadings[,1:3],col=colour,type="s",radius = 0.025)

The result is a 3D plot. You can rotate it with the mouse and zoom in and out until you find the clearest viewing angle.

2.jpg
plot3d(chip.data.pca$loadings[,1:3],col=colour,type="l",radius = 0.025)

Result drawn as lines:

3.jpg
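It can help to check what is actually being plotted. For a `princomp` fit on a matrix with n rows and p columns, `loadings` has one row per column of the input (p rows) and `scores` has one row per row of the input (n rows), so the colour vector has to match whichever of the two is passed to `plot3d`. A small sketch on stand-in data; the names `toy` and `toy.pca` are hypothetical, and the `rgl` call only runs in an interactive session where the package is installed.

```r
set.seed(3)
toy     <- matrix(rnorm(300 * 8), nrow = 300, ncol = 8)  # stand-in: 300 rows, 8 columns
toy.pca <- princomp(toy)

loadings.dim <- dim(toy.pca$loadings)  # one row per input column
scores.dim   <- dim(toy.pca$scores)    # one row per input row

# One colour per plotted point: here 8 loadings rows split into two groups of 4
colour <- c(rep(2, 4), rep(7, 4))
stopifnot(length(colour) == nrow(toy.pca$loadings))

# Draw the 3D plot only when a display and the rgl package are available
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(toy.pca$loadings[, 1:3], col = colour, type = "s", radius = 0.025)
}
```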


Case 2
Load the packages and the dataset

rm(list=ls())
library(pca3d)
library(rgl)

data(metabo)
head(metabo)

About the dataset


4.jpg
Metabolic profiles in tuberculosis.

Description

Relative abundances of metabolites from serum samples of three groups of individuals
Details

A data frame with 136 observations on 425 metabolic variables.


Serum samples from three groups of individuals were compared: tuberculin skin test negative (NEG), positive (POS) and clinical tuberculosis (TB).

PCA computation

Using the prcomp function

Principal Components Analysis

Description

Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.

## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

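1) Drop the first column of the dataset (the group labels), use the remaining columns as the data, and standardize them
2) Use the first column of the dataset as the grouping factor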

metabo.pca <- prcomp(metabo[,-1], scale.=TRUE)
groups  <- factor(metabo[,1])
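The same pattern can be tried without installing pca3d. The sketch below uses the built-in `iris` data as a stand-in for `metabo`, with the label column moved to the front so the layout matches; the names `demo`, `demo.pca` and `demo.groups` are assumptions for this example.

```r
data(iris)

# Put the label column first so the layout mirrors metabo (labels in column 1)
demo <- iris[, c(5, 1:4)]

demo.pca    <- prcomp(demo[, -1], scale. = TRUE)  # drop labels, scale each variable to unit variance
demo.groups <- factor(demo[, 1])                  # labels become the grouping factor

# With scale. = TRUE the total variance equals the number of variables
total.variance <- sum(demo.pca$sdev^2)
```

Scaling matters when variables are measured on different scales, because otherwise the variables with the largest variance dominate the first components.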

Summary of the results

> summary(metabo.pca)
Importance of components:
                           PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8    PC9   PC10    PC11    PC12    PC13    PC14
Standard deviation     5.86992 5.38923 4.74978 4.11434 3.88969 3.81589 3.30208 3.09675 2.9872 2.9157 2.80259 2.71364 2.60341 2.56392
Proportion of Variance 0.08146 0.06866 0.05333 0.04002 0.03577 0.03442 0.02578 0.02267 0.0211 0.0201 0.01857 0.01741 0.01602 0.01554
Cumulative Proportion  0.08146 0.15012 0.20345 0.24347 0.27924 0.31366 0.33944 0.36211 0.3832 0.4033 0.42187 0.43928 0.45530 0.47084
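In the table above the first 14 components reach a cumulative proportion of only about 0.47, so a natural follow-up is how many components a given target would need. A sketch using the built-in `USArrests` data as a stand-in and a hypothetical target of 0.8; it reads the table from the `importance` element returned by `summary()` on a `prcomp` object.

```r
data(USArrests)
arrests.pca <- prcomp(USArrests, scale. = TRUE)

# summary() returns the printed table as a matrix in $importance
imp <- summary(arrests.pca)$importance
cum <- imp["Cumulative Proportion", ]

target <- 0.8                              # hypothetical threshold
n.keep <- unname(which(cum >= target)[1])  # first component at which the target is reached

cat("Components needed for", target, "cumulative proportion:", n.keep, "\n")
```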

Plotting

Using pca3d

pca2d {pca3d}   R Documentation
Show a three- or two-dimensional plot of a prcomp object

Description

Show a three- two-dimensional plot of a prcomp object or a matrix, using different symbols and colors for groups of data

Usage
pca3d(pca, components = 1:3, col = NULL, title = NULL, new = FALSE,
  axes.color = "grey", bg = "white", radius = 1, group = NULL,
  shape = NULL, palette = NULL, fancy = FALSE, biplot = FALSE,
  biplot.vars = 5, legend = NULL, show.scale = FALSE,
  show.labels = FALSE, labels.col = "black", show.axes = TRUE,
  show.axe.titles = TRUE, axe.titles = NULL, show.plane = TRUE,
  show.shadows = FALSE, show.centroids = FALSE, show.group.labels = FALSE,
  show.shapes = TRUE, show.ellipses = FALSE, ellipse.ci = 0.95)

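pca3d(PCA result, grouping factor, whether to draw confidence ellipses, confidence level of the ellipses (default 0.95, i.e. 95% ellipses), whether to show the plane)
pca3d(metabo.pca, group=groups, show.ellipses=TRUE, ellipse.ci=0.75, show.plane=TRUE)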

The result is a 3D plot. You can rotate it with the mouse and zoom in and out until you find the clearest viewing angle.


5.jpg

Remove the surrounding plane

pca3d(metabo.pca, group=groups, show.ellipses=TRUE, ellipse.ci=0.75, show.plane=FALSE)

Result:

6.jpg
