直方hist散点plot柱状barplot饼pie箱线boxplot星相stars脸谱faces茎叶stemQQ图qqnorm

例子取自: http://blog.csdn.net/howardge/article/details/41681137

讲解R绘图, 包含 :

直方图hist()

散点图plot()

柱状图barplot()

饼图pie()

箱线图boxplot()

星相图stars()

脸谱图faces()

茎叶图stem()

QQ图qqnorm()

地图包maps,mapdata,geosphere

样本数据 :

为了方便地进行图形展示,我们创建一个数据场景,假设我们需要统计学生的3门课的考试情况.

我们分别生成100组数据 :

Num 学号;

X1 高等数学(80到100的均匀分布);

X2 线性代数(均值80,标准差为7的正态分布);

X3 运筹学(均值83,标准差18的正态分布)三科成绩;

相应的语句为:

Num=seq(102001,102100)

X1=round(runif(100,min=80,max=100))

X2=round(rnorm(100,mean=80,sd=7))

X3=round(rnorm(100,mean=83,sd=18))

数据预处理 :

考虑到所有的成绩不能超过100分,因此需要将随机生成的成绩中高于100分的改为100分并将最终修正后的所有成绩连同学号存入数据框.

X2[which(X2>100)]=100

X3[which(X3>100)]=100

X=data.frame(Num,X1,X2,X3)

[解释]

1. runif用于生成均匀分布数据(The Uniform Distribution ), 另外还有dunif, punif, qunif几个形态的函数.

2. rnorm生成正态分布数据. sd指定标准差.

标准差定义参考 http://en.wikipedia.org/wiki/Standard_deviation

3. wihch用于返回符合条件的向量索引. 如 :

> x

[1] 8 44 43 6 36

> which(x>10)

[1] 2 3 5

生成的数据如下 :

> Num
  [1] 102001 102002 102003 102004 102005 102006 102007 102008 102009 102010
 [11] 102011 102012 102013 102014 102015 102016 102017 102018 102019 102020
 [21] 102021 102022 102023 102024 102025 102026 102027 102028 102029 102030
 [31] 102031 102032 102033 102034 102035 102036 102037 102038 102039 102040
 [41] 102041 102042 102043 102044 102045 102046 102047 102048 102049 102050
 [51] 102051 102052 102053 102054 102055 102056 102057 102058 102059 102060
 [61] 102061 102062 102063 102064 102065 102066 102067 102068 102069 102070
 [71] 102071 102072 102073 102074 102075 102076 102077 102078 102079 102080
 [81] 102081 102082 102083 102084 102085 102086 102087 102088 102089 102090
 [91] 102091 102092 102093 102094 102095 102096 102097 102098 102099 102100

> X1
  [1] 100  97  83  80  86  87  99  91  95  99  81  80  85  82  81  92  85  98
 [19]  82  90  94  90  90  93  89  97  85  87  99  90  80  92  93  81  84  96
 [37]  82  85  81  84  93  82  91  84  83  85  96  99  84  89  94  92  98  88
 [55]  94  97  84 100  90  84  87  86  97  84  81  84  81 100  89  87  93  85
 [73]  92  88  81  81  87  82  88  96  98  83  88  85  83  94  93  81  92  90
 [91]  98  85  91  80  82  84  94  90  89  91

> X2[which(X2>100)]=100  
> X3[which(X3>100)]=100
> X2
  [1] 85 94 69 62 79 76 80 73 78 73 79 77 93 76 72 81 88 84 81 71 76 69 87 78
 [25] 83 78 70 94 77 92 88 95 85 82 83 75 73 85 80 82 80 72 71 72 71 81 81 66
 [49] 78 79 87 77 82 76 79 84 70 69 67 85 75 86 85 79 78 78 87 79 79 86 73 76
 [73] 85 88 77 75 67 85 79 67 77 72 69 80 78 77 83 80 82 77 82 80 77 88 84 76
 [97] 74 84 68 87

> X3
  [1]  75  79  59  97  50  39  52 100 100  61  90  70  93  82  76  51  77 100
 [19]  89  70  99  72  65  94  59  89 100  70  84 100 100  64  54  79  92  97
 [37]  76 100  96  88  47  83  87 100 100  41  92 100  73  59  63  54  71  70
 [55]  64  90  75  94  93 100  81  47  92 100 100  97  52  78  99  71  84  90
 [73]  61  79  83  78  79 100  86 100 100  68  73  62  55  59  79  59  76  66
 [91] 100  79  94  83  66  64 100  95  68  83

> X=data.frame(Num,X1,X2,X3)
> X
       Num  X1 X2  X3
1   102001 100 85  75
2   102002  97 94  79
3   102003  83 69  59
4   102004  80 62  97
5   102005  86 79  50
6   102006  87 76  39
....
96  102096  84 76  64
97  102097  94 74 100
98  102098  90 84  95
99  102099  89 68  68
100 102100  91 87  83

绘图 :

高等数学成绩的直方图,

hist(X$X1)

高等数学和线性代数的相关关系的散点图,
plot(X1, X2)

从图上看, 高等数学的成绩和线性代数的成绩看不出有什么线性相关性.

运筹学成绩的柱状图,
> barplot(table(X$X3))

运筹学成绩的饼图,

> pie(table(X$X3))

table起的作用是排序和分组, 类似select score,count(*) from X$X3 group by score order by score;

这样画图就比较直观.

> table(X$X3)
 39  41  47  50  51  52  54  55  59  61  62  63  64  65  66  68  70  71  72 
  1   1   2   1   1   2   2   1   5   2   1   1   3   1   2   2   4   2   1 

 73  75  76  77  78  79  81  82  83  84  86  87  88  89  90  92  93  94  95 
  2   2   3   1   2   6   1   1   4   2   1   1   1   2   3   3   2   3   1 

 96  97  99 100 
  1   3   2  18

测试过程死机了, 下面的图形重新生成了一批测试数据 :

我们将三科成绩用两种箱线图画出来,箱线图可以更加清楚的解释数据的分布情况,和数据的集中区域.

命令如下 :

> boxplot(X$X1, X$X2, X$X3)

> boxplot(X[2:4],col=c("red","green","blue"),notch=T)

X数据框第二到第四个元素对应高等数学, 线性代数, 运筹学的成绩.

notch: if ‘notch’ is ‘TRUE’, a notch is drawn in each side of the

boxes. If the notches of two plots do not overlap this is

‘strong evidence’ that the two medians differ (Chambers _et

al_, 1983, p. 62). See ‘boxplot.stats’ for the calculations

used.

为了更方便的观测单位个体的特性,R提供了星相图,脸谱图(根据脸的形状和眼睛的大小来反映数据)揭示每个个体属性上的差异,具体命令如下 :

> stars(X$X1)

错误于stars(X$X1) : 'x'要么是矩阵，要么是数据框

> class(X[2:4])

[1] "data.frame"

> class(X$X1)

[1] "numeric"

正确用法, 以下用法结果一样 :

> stars(X[c("X1","X2","X3")])

> stars(X[2:4])

我们这里用到了数据框中的3组学科分数数据, 星图展示的是3个方向的差异.

如果使用4组数据, 那么将展示4个方向的个体差异. 注意是个体差异, 而不是同一行的几组数据之间的差异.

如果只有一组数据的话, 表示一组数据的个体差异.

脸谱图也可以表示个体的差异, 也可以只有一组数据, 因为它反映的不是数据之间的差异.

> install.packages("TeachingDemos")

--- 在此連線階段时请选用CRAN的鏡子 ---

试开URL’http://mirrors.xmu.edu.cn/CRAN/bin/windows/contrib/3.1/TeachingDemos_2.9.zip'

Content type 'application/zip' length 1608012 bytes (1.5 Mb)

打开了URL

downloaded 1.5 Mb

程序包‘TeachingDemos’打开成功，MD5和检查也通过

下载的二进制程序包在

D:\Temp\RtmpsrZfOH\downloaded_packages里

> library("TeachingDemos")

警告信息：

程辑包‘TeachingDemos’是用R版本3.1.3 来建造的

> faces2(X[2:4])

如果我们拿Num来绘图的话 , 因为Num是从小到大的序列值, 你会发现和stars(X[1])一样, (线越来越长), 脸越来越胖.

在形象化展示数据方面,R还提供了茎叶图控我们观看数据分布情况,命令如下 :

> stem(X$X1)

The decimal point is at the | # 注意这句话的意思是, | 右边每个0代表一个点/值, 例如100这行|右边有4个0, 表示有4个100.

80 | 0000

82 | 000000000000

84 | 0000000000000

86 | 00000

88 | 00000000

90 | 0000000000000

92 | 0000000000000

94 | 0000

96 | 00000000000

98 | 0000000000000

100 | 0000

上面这组数据也可以用table来反映, 不过stem更形象.

> table(X$X1)

80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98

2 2 7 5 10 3 3 2 4 4 4 9 8 5 3 1 3 8 5

99 100

8 4

> stem(X$X2)

The decimal point is 1 digit(s) to the right of the |

# 注意, 这句话和前面又不一样了, 右边表示的是剩余数值, 例如9 | 58 代表95,98. 9|0002代表90,90,90,92.

6 | 57789

7 | 002233334444444

7 | 5555666666777788888888888999999999

8 | 00001111112223333344444444

8 | 55666777888999

9 | 0002

9 | 58

> stem(X$X3)

The decimal point is 1 digit(s) to the right of the |

2 | 9

3 | 7

4 |

5 | 0245888

6 | 245566779

7 | 0012222334455567788889

8 | 0001112233344445677899

9 | 0001233334566667899

10 | 0000000000000000000

R语言还提供了判断数列是否服从正态分布的形象展示图形,可以简单的借助肉眼判断,当散落的点的分布越接近直线,则数列的分布越接近正态分布.

命令如下 :

X2是使用runif生成的均匀分布数据, 显然从图上看非正态分布.

qqnorm(X1)

qqline(X1)

X2是使用rnorm生成的遵循正态分布.

qqnorm(X2)

qqline(X2)

[参考]

1. http://en.wikipedia.org/wiki/Standard_deviation

2. http://blog.csdn.net/howardge/article/details/41681137

3. > help(runif)

Uniform package:stats R Documentation

The Uniform Distribution

Description:

These functions provide information about the uniform distribution

on the interval from ‘min’ to ‘max’. ‘dunif’ gives the density,

‘punif’ gives the distribution function ‘qunif’ gives the quantile

function and ‘runif’ generates random deviates.

Usage:

dunif(x, min = 0, max = 1, log = FALSE)

punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)

qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)

runif(n, min = 0, max = 1)

Arguments:

x, q: vector of quantiles.

p: vector of probabilities.

n: number of observations. If ‘length(n) > 1’, the length is

taken to be the number required.

min, max: lower and upper limits of the distribution. Must be finite.

log, log.p: logical; if TRUE, probabilities p are given as log(p).

lower.tail: logical; if TRUE (default), probabilities are P[X <= x],

otherwise, P[X > x].

Details:

If ‘min’ or ‘max’ are not specified they assume the default values

of ‘0’ and ‘1’ respectively.

The uniform distribution has density

f(x) = 1/(max-min)

for min <= x <= max.

For the case of u := min == max, the limit case of X == u is

assumed, although there is no density in that case and ‘dunif’

will return ‘NaN’ (the error condition).

‘runif’ will not generate either of the extreme values unless ‘max

= min’ or ‘max-min’ is small compared to ‘min’, and in particular

not for the default arguments.

Value:

‘dunif’ gives the density, ‘punif’ gives the distribution

function, ‘qunif’ gives the quantile function, and ‘runif’

generates random deviates.

The length of the result is determined by ‘n’ for ‘runif’, and is

the maximum of the lengths of the numerical arguments for the

other functions.

The numerical arguments other than ‘n’ are recycled to the length

of the result. Only the first elements of the logical arguments

are used.

4. help( hist , plot , barplot , pie , boxplot , stars, faces2 , stem , qqnorm , qqline )

直方hist散点plot柱状barplot饼pie箱线boxplot星相stars脸谱faces茎叶stemQQ图qqnorm

你可能感兴趣的:(直方hist散点plot柱状barplot饼pie箱线boxplot星相stars脸谱faces茎叶stemQQ图qqnorm)