table() sorts and groups the values, much like the SQL query select score,count(*) from X$X3 group by score order by score;
Plotting the resulting counts is then more intuitive.
> table(X$X3)
 39  41  47  50  51  52  54  55  59  61  62  63  64  65  66  68  70  71  72
  1   1   2   1   1   2   2   1   5   2   1   1   3   1   2   2   4   2   1
 73  75  76  77  78  79  81  82  83  84  86  87  88  89  90  92  93  94  95
  2   2   3   1   2   6   1   1   4   2   1   1   1   2   3   3   2   3   1
 96  97  99 100
  1   3   2  18
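The grouping behaviour of table() can be sketched on a small vector. The scores below are hypothetical (they are not the article's X$X3), and the conversion to a data frame is only one possible way to get a shape close to a SQL result set.

```r
# Hypothetical scores; the article's X$X3 is not reproduced here.
scores <- c(90, 72, 100, 72, 90, 100, 100, 55)

# table() counts each distinct value and orders the result by value,
# which is what "group by score order by score" does in SQL.
counts <- table(scores)
print(counts)

# The same counts as a two-column data frame (one row per distinct score).
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
print(counts_df)
```

Passing `counts` to barplot() is one way to draw these frequencies.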
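The machine crashed during testing, so the plots below use a newly generated batch of test data:
We draw the scores of the three subjects as two kinds of box plot. A box plot shows more clearly how the data are distributed and where they are concentrated.
The commands are: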
> boxplot(X$X1, X$X2, X$X3)
> boxplot(X[2:4],col=c("red","green","blue"),notch=T)
Columns 2 to 4 of the data frame X hold the scores for Advanced Mathematics, Linear Algebra, and Operations Research.
notch: if ‘notch’ is ‘TRUE’, a notch is drawn in each side of the
boxes. If the notches of two plots do not overlap this is
‘strong evidence’ that the two medians differ (Chambers _et
al_, 1983, p. 62). See ‘boxplot.stats’ for the calculations
used.
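The notch limits mentioned in the help text can be inspected with boxplot.stats(). The sketch below uses hypothetical random data rather than the article's scores, and recomputes the notch as median ± 1.58 × (hinge spread) / sqrt(n), which is the formula given in the boxplot.stats documentation.

```r
set.seed(1)  # hypothetical data, not the article's scores
x <- round(rnorm(100, mean = 75, sd = 10))

bs <- boxplot.stats(x)

# bs$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# bs$conf : lower and upper ends of the notch
hinge_spread <- bs$stats[4] - bs$stats[2]
manual_notch <- bs$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(bs$n)

print(bs$conf)
print(manual_notch)
```

If the notch intervals of two boxes do not overlap, that matches the situation the help text describes as strong evidence that the medians differ.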
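To make it easier to observe the characteristics of each individual, R provides star plots and Chernoff faces (which encode the data in the shape of a face and the size of its eyes) to reveal how individuals differ across attributes. The commands are: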
> stars(X$X1)
Error in stars(X$X1) : 'x' must be a matrix or a data frame
> class(X[2:4])
[1] "data.frame"
> class(X$X1)
[1] "numeric"
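The error comes from how the column is selected: `$` returns a bare vector, while single-bracket indexing keeps the data frame. A minimal sketch, assuming a hypothetical data frame with the same column names as the article's X:

```r
# Hypothetical stand-in for the article's X (Num plus three score columns).
X <- data.frame(Num = 1:5,
                X1 = c(88, 92, 81, 97, 85),
                X2 = c(70, 83, 78, 91, 66),
                X3 = c(100, 59, 79, 93, 72))

class(X$X1)                      # a plain numeric vector
class(X["X1"])                   # still a data frame with one column
class(X[, "X1", drop = FALSE])   # drop = FALSE also keeps the data frame

# Selecting by name or by position gives the same three-column data frame.
same <- identical(X[c("X1", "X2", "X3")], X[2:4])
print(same)
```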
Correct usage; the following two calls give the same result:
> stars(X[c("X1","X2","X3")])
> stars(X[2:4])
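Here we use the three columns of subject scores from the data frame, so each star shows differences along 3 directions.
With 4 columns, each star would show differences along 4 directions. Note that these are differences between individuals (rows), not differences among the several values within one row.
With only one column, the plot shows how the individuals differ on that single variable.
Chernoff faces likewise show differences between individuals, and they also work with a single column, because they do not depict differences between the variables.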
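Why the rays compare individuals rather than subjects: by default (`scale = TRUE`), stars() rescales each column independently so that its minimum maps to 0 and its maximum to 1. The sketch below reproduces that rescaling on hypothetical data; it illustrates the documented behaviour and is not the internal code of stars().

```r
# Hypothetical scores for three students in two subjects.
X <- data.frame(X1 = c(80, 90, 100),
                X2 = c(60, 90, 75))

# Rescale one column to the range [0, 1].
rescale01 <- function(v) (v - min(v)) / diff(range(v))

# Apply the rescaling column by column; the result is a matrix.
scaled <- sapply(X, rescale01)
print(scaled)
```

After rescaling, a ray length only says where a student ranks within that subject, so a raw 90 in X1 and a raw 90 in X2 end up with different lengths.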
> install.packages("TeachingDemos")
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://mirrors.xmu.edu.cn/CRAN/bin/windows/contrib/3.1/TeachingDemos_2.9.zip'
Content type 'application/zip' length 1608012 bytes (1.5 Mb)
opened URL
downloaded 1.5 Mb
package ‘TeachingDemos’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
D:\Temp\RtmpsrZfOH\downloaded_packages
> library("TeachingDemos")
Warning message:
package ‘TeachingDemos’ was built under R version 3.1.3
> faces2(X[2:4])
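If we plot Num instead, then because Num is an increasing sequence, the result parallels stars(X[1]): there the lines get progressively longer, and here the faces get progressively fatter.
For visualizing data, R also provides stem-and-leaf plots for examining how the data are distributed. The commands are: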
> stem(X$X1)
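The decimal point is at the |
# Note: because the decimal point is at the |, each leaf is the first digit after the decimal point. The scores are integers, so every leaf is 0 and each 0 stands for one value. Each row covers two adjacent scores (the 80 row holds the 80s and 81s), and the four 0s on the 100 row mean there are four 100s.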
80 | 0000
82 | 000000000000
84 | 0000000000000
86 | 00000
88 | 00000000
90 | 0000000000000
92 | 0000000000000
94 | 0000
96 | 00000000000
98 | 0000000000000
100 | 0000
The same data can also be summarized with table, but stem is more visual.
> table(X$X1)
 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98
  2   2   7   5  10   3   3   2   4   4   4   9   8   5   3   1   3   8   5
 99 100
  8   4
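The link between the two outputs can be checked directly: each row of the stem(X$X1) display is the combined count of two adjacent scores from table(X$X1). The sketch below rebuilds a vector from the counts printed above (the original order of the scores is unknown and does not matter here) and bins it in pairs.

```r
# Counts for scores 80..100, copied from the table(X$X1) output above.
counts <- c(2, 2, 7, 5, 10, 3, 3, 2, 4, 4, 4, 9, 8, 5, 3, 1, 3, 8, 5, 8, 4)
x <- rep(80:100, times = counts)

# Group each score with its neighbour: 80 and 81 -> 80, 82 and 83 -> 82, ...
bins <- table(2 * (x %/% 2))
print(bins)

# The stem-and-leaf display of the same vector, for comparison.
stem(x)
```

The bin sizes (4, 12, 13, 5, ...) equal the number of 0s on the corresponding stem rows.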
> stem(X$X2)
The decimal point is 1 digit(s) to the right of the |
# Note: this differs from the previous case. Here each leaf is the final digit of the value, e.g. 9 | 58 stands for 95 and 98, and 9 | 0002 stands for 90, 90, 90, 92.
6 | 57789
7 | 002233334444444
7 | 5555666666777788888888888999999999
8 | 00001111112223333344444444
8 | 55666777888999
9 | 0002
9 | 58
> stem(X$X3)
The decimal point is 1 digit(s) to the right of the |
2 | 9
3 | 7
4 |
5 | 0245888
6 | 245566779
7 | 0012222334455567788889
8 | 0001112233344445677899
9 | 0001233334566667899
10 | 0000000000000000000
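When the decimal point is one digit to the right of the |, a value is recovered as 10 × stem + leaf. The sketch below takes the six values from the two "9" rows of the stem(X$X2) output and splits them the way that display does, with leaves 0 to 4 on the first row and 5 to 9 on the second. The split rule is inferred from the output shown, not taken from the source of stem().

```r
# The six values read off the rows "9 | 0002" and "9 | 58".
x <- c(90, 90, 90, 92, 95, 98)

stem_part <- x %/% 10   # tens digit: the number left of the |
leaf_part <- x %% 10    # units digit: one character right of the |

# One row for leaves 0-4 and one for leaves 5-9.
rows <- sapply(split(leaf_part, leaf_part >= 5), paste, collapse = "")
print(rows)

# Rebuilding the values from stem and leaf gives back the input.
rebuilt <- 10 * stem_part + leaf_part
```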
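R also provides a plot for judging by eye whether a sample follows a normal distribution: the closer the points lie to a straight line, the closer the sample's distribution is to normal.
The commands are:
X1 was generated with runif and is uniformly distributed; the plot shows clearly that it is not normal.
> qqnorm(X$X1)
> qqline(X$X1)
X2 was generated with rnorm and follows a normal distribution.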
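The visual judgement can be backed by a number. The sketch below generates one uniform and one normal sample (the sizes and parameters are assumptions, not the article's) and computes the correlation between theoretical and sample quantiles, i.e. how straight the Q-Q points are. This correlation is an informal aid chosen for illustration, not a formal normality test.

```r
set.seed(42)  # hypothetical samples; parameters are illustrative only
u_sample <- runif(1000, min = 80, max = 100)
n_sample <- rnorm(1000, mean = 78, sd = 7)

# Correlation between theoretical and sample quantiles of a Q-Q plot.
# plot.it = FALSE returns the coordinates without drawing.
qq_cor <- function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
}

cor_unif <- qq_cor(u_sample)
cor_norm <- qq_cor(n_sample)
print(c(uniform = cor_unif, normal = cor_norm))
```

Dropping `plot.it = FALSE` and adding qqline() draws the plots themselves; the normal sample should hug the line while the uniform sample bends away at both ends.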
[References]
1. http://en.wikipedia.org/wiki/Standard_deviation
2. http://blog.csdn.net/howardge/article/details/41681137
3. > help(runif)
Uniform package:stats R Documentation
The Uniform Distribution
Description:
These functions provide information about the uniform distribution
on the interval from ‘min’ to ‘max’. ‘dunif’ gives the density,
‘punif’ gives the distribution function ‘qunif’ gives the quantile
function and ‘runif’ generates random deviates.
Usage:
dunif(x, min = 0, max = 1, log = FALSE)
punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
runif(n, min = 0, max = 1)
Arguments:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If ‘length(n) > 1’, the length is
taken to be the number required.
min, max: lower and upper limits of the distribution. Must be finite.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X <= x],
otherwise, P[X > x].
Details:
If ‘min’ or ‘max’ are not specified they assume the default values
of ‘0’ and ‘1’ respectively.
The uniform distribution has density
f(x) = 1/(max-min)
for min <= x <= max.
For the case of u := min == max, the limit case of X == u is
assumed, although there is no density in that case and ‘dunif’
will return ‘NaN’ (the error condition).
‘runif’ will not generate either of the extreme values unless ‘max
= min’ or ‘max-min’ is small compared to ‘min’, and in particular
not for the default arguments.
Value:
‘dunif’ gives the density, ‘punif’ gives the distribution
function, ‘qunif’ gives the quantile function, and ‘runif’
generates random deviates.
The length of the result is determined by ‘n’ for ‘runif’, and is
the maximum of the lengths of the numerical arguments for the
other functions.
The numerical arguments other than ‘n’ are recycled to the length
of the result. Only the first elements of the logical arguments
are used.
4. help() pages for hist, plot, barplot, pie, boxplot, stars, faces2, stem, qqnorm, qqline
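The four functions from the runif help page quoted in reference 3 can be tried on a score-like interval. The limits 80 and 100 are assumptions chosen to resemble the X1 scores; the article does not state the arguments it used.

```r
lo <- 80   # assumed lower limit
hi <- 100  # assumed upper limit

d <- dunif(90, min = lo, max = hi)    # density 1 / (max - min)
p <- punif(90, min = lo, max = hi)    # P[X <= 90]
q <- qunif(0.5, min = lo, max = hi)   # median of the distribution

set.seed(1)
r <- runif(1000, min = lo, max = hi)  # random deviates inside (lo, hi)

print(c(density = d, probability = p, quantile = q))
print(range(r))
```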