Setting the use parameter for matrix correlation analysis

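Using the WGCNA example data, I computed the pairwise correlations between the columns of a matrix and noticed that the result changes with the value passed to the use argument of cor(). These are my notes.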

Data used

The data are the example data provided with the official WGCNA documentation.


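Computing the correlation matrix with different use settings

With use = 'p' (partial matching resolves 'p' to "pairwise.complete.obs", so the output matches that section further down)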


> cor_data <- cor(datExpr,use = 'p', method = 'pearson')
> cor_data[1:5,1:5]
            MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080
MMT00000044  1.00000000 -0.03797242  0.09326526  0.24716099  0.13636580
MMT00000046 -0.03797242  1.00000000 -0.56393957 -0.02934881 -0.06291723
MMT00000051  0.09326526 -0.56393957  1.00000000  0.06120014 -0.05422173
MMT00000076  0.24716099 -0.02934881  0.06120014  1.00000000 -0.02327867
MMT00000080  0.13636580 -0.06291723 -0.05422173 -0.02327867  1.00000000
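
The single letter works because cor() matches use partially against its five option names, and only "pairwise.complete.obs" starts with "p". A minimal sketch on a hypothetical toy matrix (toy and its values are made up for illustration and are not the WGCNA data):

```r
# Hypothetical toy matrix with one missing value (not the WGCNA data)
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, 4, 1, 2))

short <- cor(toy, use = "p", method = "pearson")
full  <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Both calls select the same option, so the results should not differ
print(identical(short, full))
```

Spelling the option out in full is probably the safer habit, since it makes the missing-value handling visible to the reader.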

With use = "everything"


> cor_data <- cor(datExpr, method = "pearson",use = "everything" )
> cor_data[1:5,1:5]
            MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080
MMT00000044  1.00000000 -0.03797242  0.09326526          NA  0.13636580
MMT00000046 -0.03797242  1.00000000 -0.56393957          NA -0.06291723
MMT00000051  0.09326526 -0.56393957  1.00000000          NA -0.05422173
MMT00000076          NA          NA          NA           1          NA
MMT00000080  0.13636580 -0.06291723 -0.05422173          NA  1.00000000
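
The NA entries above suggest that MMT00000076 contains at least one missing value, since "everything" lets a single NA spoil every correlation that column takes part in. A small sketch of the effect, assuming a hypothetical toy matrix in place of datExpr:

```r
# Hypothetical toy matrix; column b has a single missing observation
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, 4, 1, 2))

res <- cor(toy, use = "everything")
print(res)

# Correlations that involve b are NA, the a-c correlation is a number
print(is.na(res["a", "b"]))
print(is.na(res["a", "c"]))

# Counting NAs per column shows which variables are responsible
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

On the real data, the same colSums(is.na(datExpr)) call would be one way to list the genes that trigger the NA entries.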

With use = "all.obs": an error is raised because the data contain missing values

> cor_data <- cor(datExpr, method = "pearson",use = "all.obs" )
Error in cor(datExpr, method = "pearson", use = "all.obs") : 
  missing observations in cov/cor
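
If the error should not stop a script, it can be checked for up front or caught. The sketch below assumes a hypothetical toy matrix and does not rely on the wording of the message, which depends on the R locale:

```r
# Hypothetical toy matrix with one missing value
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, 4, 1, 2))

# Check for missing values before calling cor()
has_missing <- anyNA(toy)
print(has_missing)

# Or catch the error that "all.obs" raises when NAs are present
outcome <- tryCatch(
  {
    cor(toy, use = "all.obs")
    "no error"
  },
  error = function(e) "error raised"
)
print(outcome)
```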

With use = "complete.obs"

> cor_data <- cor(datExpr, method = "pearson",use = "complete.obs" )
> cor_data[1:5,1:5]
            MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080
MMT00000044  1.00000000  0.02925872  -0.1375516  0.62442994  0.14064180
MMT00000046  0.02925872  1.00000000  -0.5110969  0.03828571  0.19707483
MMT00000051 -0.13755164 -0.51109691   1.0000000  0.11570214 -0.18874582
MMT00000076  0.62442994  0.03828571   0.1157021  1.00000000 -0.04126554
MMT00000080  0.14064180  0.19707483  -0.1887458 -0.04126554  1.00000000

With use = "na.or.complete"


> cor_data <- cor(datExpr, method = "pearson",use = "na.or.complete" )
> cor_data[1:5,1:5]
            MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080
MMT00000044  1.00000000  0.02925872  -0.1375516  0.62442994  0.14064180
MMT00000046  0.02925872  1.00000000  -0.5110969  0.03828571  0.19707483
MMT00000051 -0.13755164 -0.51109691   1.0000000  0.11570214 -0.18874582
MMT00000076  0.62442994  0.03828571   0.1157021  1.00000000 -0.04126554
MMT00000080  0.14064180  0.19707483  -0.1887458 -0.04126554  1.00000000
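
The two outputs agree because both options drop every row that has a missing value in any column and then correlate what is left. With thousands of columns this can leave very few rows, which is one plausible reason these numbers differ so much from the pairwise ones. A sketch on hypothetical toy data, including the one case where the two options part ways:

```r
# Hypothetical toy matrix with one missing value
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, 4, 1, 2))

# Casewise deletion by hand: keep only rows without any NA
by_hand <- cor(toy[complete.cases(toy), ])

same_complete <- all.equal(cor(toy, use = "complete.obs"), by_hand)
same_na_or    <- all.equal(cor(toy, use = "na.or.complete"), by_hand)
print(same_complete)
print(same_na_or)

# Hypothetical matrix in which no row is complete
none <- cbind(a = c(1, NA, 3), b = c(NA, 2, NA))

# "na.or.complete" returns NA here, "complete.obs" raises an error
lenient <- cor(none, use = "na.or.complete")
strict  <- tryCatch(
  {
    cor(none, use = "complete.obs")
    "no error"
  },
  error = function(e) "error raised"
)
print(lenient)
print(strict)
```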

With use = "pairwise.complete.obs"


> cor_data <- cor(datExpr, method = "pearson",use = "pairwise.complete.obs" )
> cor_data[1:5,1:5]
            MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080
MMT00000044  1.00000000 -0.03797242  0.09326526  0.24716099  0.13636580
MMT00000046 -0.03797242  1.00000000 -0.56393957 -0.02934881 -0.06291723
MMT00000051  0.09326526 -0.56393957  1.00000000  0.06120014 -0.05422173
MMT00000076  0.24716099 -0.02934881  0.06120014  1.00000000 -0.02327867
MMT00000080  0.13636580 -0.06291723 -0.05422173 -0.02327867  1.00000000
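
Under pairwise deletion each entry of the matrix may be based on a different set of rows. The sketch below, on hypothetical toy data, recomputes one entry by hand and counts how many rows stand behind each entry:

```r
# Hypothetical toy matrix; b and c are missing in different rows
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, NA, 1, 2))

pw <- cor(toy, use = "pairwise.complete.obs")

# Recompute the a-b entry using only rows where both a and b are observed
keep    <- complete.cases(toy[, c("a", "b")])
by_hand <- cor(toy[keep, "a"], toy[keep, "b"])
print(all.equal(pw["a", "b"], by_hand))

# Number of complete pairs behind each entry of the matrix
n_pairs <- crossprod(!is.na(toy))
print(n_pairs)
```

Looking at n_pairs next to the correlations is a cheap way to spot entries that rest on very few observations.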

Learning what each option means

The official documentation gives the following explanation

If use is "everything", NAs will propagate conceptually, i.e., a resulting value will be NA whenever one of its contributing observations is NA.
If use is "all.obs", then the presence of missing observations will produce an error. If use is "complete.obs" then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
"na.or.complete" is the same unless there are no complete cases, that gives NA. Finally, if use has the value "pairwise.complete.obs" then the correlation or covariance between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables. For cov and var, "pairwise.complete.obs" only works with the "pearson" method. Note that (the equivalent of) var(double(0), use = *) gives NA for use = "everything" and "na.or.complete", and gives an error in the other cases.

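In plain terms

With "everything", NAs propagate: a result is NA whenever any observation that contributes to it is NA.

With "all.obs", the presence of missing observations produces an error. With "complete.obs", missing values are handled by casewise deletion (and an error is raised if there are no complete cases). "na.or.complete" does the same, except that it returns NA when there are no complete cases.

With "pairwise.complete.obs", the correlation or covariance between each pair of variables is computed from all complete pairs of observations on those two variables. The resulting covariance or correlation matrix may fail to be positive semi-definite, and it may contain NA entries where a pair of variables has no complete pairs.

For cov and var, "pairwise.complete.obs" only works with the "pearson" method. Note that (the equivalent of) var(double(0), use = *) gives NA for use = "everything" and "na.or.complete", and an error in the other cases.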
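
The remark about positive semi-definiteness can be made concrete. The data below are hypothetical and deliberately constructed so that each pair of variables is observed on a different block of rows; real data will rarely be this extreme:

```r
# Hypothetical data: each pair of variables shares a different block of rows
x <- c(1, 2, 3, 4,      1, 2, 3, 4,      NA, NA, NA, NA)
y <- c(1, 2, 3, 4,      NA, NA, NA, NA,  1, 2, 3, 4)
z <- c(NA, NA, NA, NA,  1, 2, 3, 4,      4, 3, 2, 1)
toy <- cbind(x, y, z)

pw <- cor(toy, use = "pairwise.complete.obs")
print(pw)   # x-y and x-z are perfectly positive, y-z is perfectly negative

# A valid correlation matrix has no negative eigenvalues; this one has one
ev <- eigen(pw, symmetric = TRUE)$values
print(ev)
print(min(ev) < 0)
```

No single complete data set could produce these three correlations at once, which is why downstream methods that assume a proper correlation matrix may misbehave on pairwise results.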

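A rough way to think about it

all.obs: assumes there are no missing data; raises an error if any are found
everything: any correlation that involves missing data is returned as NA
complete.obs: listwise deletion (rows with any missing value are dropped)
na.or.complete: like complete.obs, but returns NA instead of an error when no complete rows exist
pairwise.complete.obs: pairwise deletion (for each pair of variables, only rows missing in one of the two are dropped)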
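
The options can also be compared side by side on one matrix. This is only a sketch: toy is hypothetical data, and summarise_use is a helper written for this note, not part of any package:

```r
# Hypothetical toy matrix with one missing value
toy <- cbind(a = c(1, 2, 3, 4, 5),
             b = c(2, 1, 4, 3, NA),
             c = c(5, 3, 4, 1, 2))

opts <- c("everything", "all.obs", "complete.obs",
          "na.or.complete", "pairwise.complete.obs")

# Hypothetical helper: run cor() with one option and record
# whether it failed and how many NA entries the result contains
summarise_use <- function(u, x) {
  tryCatch(
    {
      r <- cor(x, use = u)
      data.frame(use = u, error = FALSE, n_na = sum(is.na(r)))
    },
    error = function(e) data.frame(use = u, error = TRUE, n_na = NA_integer_)
  )
}

overview <- do.call(rbind, lapply(opts, summarise_use, x = toy))
print(overview)
```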

That is my working understanding for now.
