R in Action, first three chapters: statistics, data frames, classic plotting

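Contents

  • Introduction
    • Example 1: stat
    • Example 2: packages
  • Chapter 1: Introduction to R
    • Basic commands
    • Saving figures
  • Chapter 2: Creating datasets
    • 2.1. Combining
    • 2.2. Vectors
      • 2.2.1. Assignment
      • 2.2.2. Deletion
      • 2.2.3. Indexing
      • 2.2.4. seq
      • 2.2.5. rep
      • Example 1
      • Example 2
    • 2.2. Matrices
      • Indexing
      • Conversion
      • Arithmetic
    • 2.3. Arrays
    • 2.4. Data frames
      • Indexing
        • Example 1
      • Attaching
    • 2.5. Factors
    • 2.6. Lists
    • 2.6. Importing data from external files (CSV example)
  • Chapter 3: Plotting
    • 3.1. First try - example1
    • 3.1. First try - example2
    • 3.2. example - exercise - A
    • 3.2. example - exercise - B
    • 3.2. example - exercise - C
    • 3.3. example - rainbow chart - A
    • 3.3. example - rainbow chart - B
    • 3.3. example - grayscale chart
    • 3.4. example - RColorBrewer - A
    • 3.4. example - RColorBrewer - B
    • 3.5. example - text attributes (adding text)
      • 3.5.1. Font (font)
      • 3.5.1. Font families
        • Creating font families with windowsFonts
        • Using them via family
      • 3.5. Example 1
      • 3.5. Example 2
      • 3.5. Example 3
  • 4. Basic data management
  • 5. Statistics, part one
    • 5.1. Mathematical and statistical functions
    • 5.2. Descriptive statistics functions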

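Introduction

Example 1: stat

> age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3) # enter the data
> weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)
> mean(weight)                             # mean
> sd(weight)                               # standard deviation (var() gives the variance)
> cor(age, weight)                         # correlation coefficient
> plot(age, weight)                        # scatter plot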
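Since sd() and var() are easy to mix up, here is a minimal sketch of how they relate, using the same weight vector. The names s, v and v_manual are only illustrative.

```r
weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)

s <- sd(weight)    # sample standard deviation
v <- var(weight)   # sample variance (divides by n - 1)

# The standard deviation is the square root of the variance
isTRUE(all.equal(s, sqrt(v)))

# The same variance computed by hand
n <- length(weight)
v_manual <- sum((weight - mean(weight))^2) / (n - 1)
isTRUE(all.equal(v, v_manual))
```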

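Example 2: packages

install.packages("ggplot2")            # install a package
update.packages(oldPkgs = "ggplot2")   # update a package (the first positional argument is lib.loc, not a package name)
remove.packages("ggplot2")             # remove a package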

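Chapter 1: Introduction to R

Basic commands

> setwd("C:/myprojects/project1")     # set the working directory
> options()    # show the current option settings
> options(digits=3)    # change an option: print numbers with 3 significant digits
> x <- runif(20)    # generate 20 uniformly distributed random numbers
> summary(x)    # print summary statistics
> hist(x)    # draw a histogram
> q()    # quit R
> source("F:/R语言/c2_r1.R")    # run a script file
# source() executes a script in the current session; if the file name contains no path, R assumes the script is in the current working directory
> sink("filename", append = FALSE, split = FALSE)
# redirects text output to the file filename; append=TRUE appends instead of overwriting, split=TRUE also keeps printing to the console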

Saving figures

# saving several plots
x<-runif(10)
y<-runif(10)
pdf("1.pdf")     # pdf() writes a PDF file, so use a .pdf extension
hist(x)
plot(x,y)
dev.off()        # both plots are saved in the PDF, one per page
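The same pattern can be tried without cluttering the working directory. This is a sketch that writes to a temporary folder; the file name two_plots.pdf is made up for the example.

```r
x <- runif(10)
y <- runif(10)

out <- file.path(tempdir(), "two_plots.pdf")  # hypothetical file name
pdf(out)          # open the PDF device
hist(x)           # page 1
plot(x, y)        # page 2
dev.off()         # close the device so the file is written completely

file.exists(out)  # TRUE once the device has been closed
```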

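Chapter 2: Creating datasets

View(a)   # show a in a read-only viewer (note the capital V)
fix(a)    # open a for editing; if a is a vector, it opens in a text editor (Notepad on Windows)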

2.1. Combining

> a<-c(1,2)
> b<-c("a","b")
> c<-c(T,F)
# c is now the name of a vector, yet c() still works as the combine function

> c(a,b)
[1] "1" "2" "a" "b"
> c(a,c)
[1] 1 2 1 0
> c(b,c)
[1] "a"     "b"     "TRUE"  "FALSE"
# alternatively, use cbind(a,b) and rbind(a,b)

> cbind(a,b)  # bind the vectors as columns (side by side)
     a   b  
[1,] "1" "a"
[2,] "2" "b"
> rbind(a,b)  # bind the vectors as rows (stacked)
  [,1] [,2]
a "1"  "2" 
b "a"  "b" 

# data frame
> data.frame(a=c("Zhao","Qian","Sun","Li","Ma"),b=c(1,2,19,4,5),c=6:10)
     a  b  c
1 Zhao  1  6
2 Qian  2  7
3  Sun 19  8
4   Li  4  9
5   Ma  5 10
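The outputs above follow from R's coercion rule: c() converts everything to the most general type present (logical, then numeric, then character), while a data frame keeps one type per column. A small sketch, with the vector named l instead of c to avoid masking the function:

```r
a <- c(1, 2)
b <- c("a", "b")
l <- c(TRUE, FALSE)

class(c(a, l))   # "numeric": logical values become 1 and 0
class(c(a, b))   # "character": numbers become strings
class(c(b, l))   # "character"

# A data frame keeps one type per column
df <- data.frame(num = a, chr = b, lgl = l, stringsAsFactors = FALSE)
sapply(df, class)
```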

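2.2. Vectors

a <- c(1,2,5,6)
# create the numeric vector a with elements 1, 2, 5, 6

> mode(a) # check the type
[1] "numeric"

b <- c("one", "two", "three")
# create the character vector b
> b['one' %in% b]   # 'one' %in% b is a single TRUE, which is recycled, so every element is returned
[1] "one"   "two"   "three"

c <- c(TRUE, TRUE, FALSE)
# create the logical vector c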

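2.2.1. Assignment

> c <- c(1:3)
> c[4:6] <- c(4:6)
> c
[1] 1 2 3 4 5 6
> c[10]<-1          # assigning past the end pads the gap with NA
> c
 [1]  1  2  3  4  5  6 NA NA NA  1
# the next call inserts 0 after the 7th element; append() returns a new vector and leaves c unchanged
> append(x=c,values=0,after=7)
 [1]  1  2  3  4  5  6 NA  0 NA NA  1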
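Because append() returns a new vector, the result has to be assigned to keep it. A short sketch with illustrative names v and v2:

```r
v <- 1:6
v2 <- append(x = v, values = 0, after = 3)  # 0 goes in after the 3rd element

v    # unchanged: 1 2 3 4 5 6
v2   # 1 2 3 0 4 5 6

v <- append(v, 0, after = 3)  # reassign to keep the change
```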

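2.2.2. Deletion

# remove the whole vector
> rm(c)
> c
function (...)  .Primitive("c")
# only the built-in function c() is left

# drop selected elements (here the 1st and 3rd); a itself is not modified
> a[-c(1,3)]
[1] 2 6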

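2.2.3. Indexing

> a <- c(1,2,5,6)
> a[2] # the second element
[1] 2
> a[-2] # everything except the second element
[1] 1 5 6
> a[c(1,3)] # several elements at once
[1] 1 5

> a[c(TRUE)] # logical index; a single TRUE is recycled, so all elements are kept
[1] 1 2 5 6

> a[a>2] # elements greater than 2
[1] 5 6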

2.2.4. seq

a <- seq(1, 10, 2) # vector a from 1 up to at most 10 in steps of 2: 1, 3, 5, 7, 9

b <- seq(10, 1, -1) # vector b from 10 down to 1 in steps of -1

c <- seq(1, by=2, length.out=10) # vector c starting at 1 with step 2 and 10 elements

2.2.5. rep

d <- rep(c(1, 3), 3) # repeat the whole vector 3 times: 1,3,1,3,1,3

e <- rep(c(1,3), each=3) # repeat each element 3 times: 1,1,1,3,3,3
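A few more variations on seq() and rep(), as a quick reference sketch; the expected values are noted in the comments.

```r
seq(1, 10, by = 2)             # 1 3 5 7 9
seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                     # 1 2 3 4
rep(c(1, 3), times = 3)        # 1 3 1 3 1 3
rep(c(1, 3), each = 3)         # 1 1 1 3 3 3
rep(c(1, 3), times = c(2, 4))  # 1 1 3 3 3 3
```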

Example 1

> c<-1:100
> c
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
 [17]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
 [49]  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
 [81]  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> sum(c)
[1] 5050
> max(c)
[1] 100
> min(c)
[1] 1
> sd(c)
[1] 29.01149
> range(c)
[1]   1 100
> var(c)
[1] 841.6667
> round(var(c))
[1] 842
> round(sd(c),2)
[1] 29.01
> median(c)
[1] 50.5
> quantile(c)
    0%    25%    50%    75%   100% 
  1.00  25.75  50.50  75.25 100.00 
> prod(c) # product of all elements
[1] 9.332622e+157
> which(c==25) # returns the index of the matching element
[1] 25
> which.max(c) # index of the largest element
[1] 100

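Example 2

abs()   # absolute value
sqrt()  # square root
exp()   # exponential function, e^x
> log(25)   # natural logarithm
[1] 3.218876
> log(25,base=5)
[1] 2
> log10(10)
[1] 1
> ceiling(c(2.354,-6.245)) # round up to the next integer
[1]  3 -6
> floor(c(2.354,-6.245)) # largest integer not greater than x
[1]  2 -7
> trunc(c(2.354,-6.245)) # integer part only, i.e. rounding toward zero
[1]  2 -6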

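2.2. Matrices

A matrix is a two-dimensional array, while an array can have any number of dimensions; apart from that the two behave almost identically.
mymatrix <- matrix(1:20, nrow=5, ncol=4,  # a 5-by-4 matrix filled with the integers 1 to 20
                   byrow=TRUE,            # matrices are filled by column by default; byrow=TRUE fills by row
                   dimnames=list(c("R1","R2","R3","R4","R5"), c("C1","C2","C3","C4")))
# dimnames takes list(rnames, cnames): row names first, then column names
> mymatrix
   C1 C2 C3 C4
R1  1  2  3  4
R2  5  6  7  8
R3  9 10 11 12
R4 13 14 15 16
R5 17 18 19 20
> x <- 1:9
> dim(x)            # NULL for a plain vector; for a matrix, dim() returns c(rows, columns)
# assigning to dim() converts between a vector and a matrix
> dim(x)<-c(3,3)    # vector -> 3-by-3 matrix
> View(x)
> x<-as.vector(x)   # matrix -> vector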

Indexing

Create a matrix, then extract or drop some of its rows and columns.

> a<-matrix(1:15,3,5)
> a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
> b<-matrix(,nrow=3,ncol=4)
> b
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
> a[c(1:3),c(2:4)]            # wrapping the ranges in c() works
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12
> a[1:3,2:4]                  # plain ranges work too
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12
> a[2:3,]                     # an empty column index selects all columns
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    5    8   11   14
[2,]    3    6    9   12   15
> a[1:2,c(1,3,4,5)]           # a range and a c() vector can be mixed
     [,1] [,2] [,3] [,4]
[1,]    1    7   10   13
[2,]    2    8   11   14
> a[-3,-2]                    # negative indices drop row 3 and column 2
     [,1] [,2] [,3] [,4]
[1,]    1    7   10   13
[2,]    2    8   11   14

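> m <- matrix(1:20, 5, 4, dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))[1:4, ]  # a named matrix for the examples below
> m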
   C1 C2 C3 C4
R1  1  6 11 16
R2  2  7 12 17
R3  3  8 13 18
R4  4  9 14 19
> m[3,4]
[1] 18
> m[c(2:4),3]
R2 R3 R4 
12 13 14 
> m['R1','C3']
[1] 11
> m[c('R1','R3'),]
   C1 C2 C3 C4
R1  1  6 11 16
R3  3  8 13 18
> m[1,]
C1 C2 C3 C4 
 1  6 11 16 
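One detail worth knowing here: selecting a single row or column returns a plain vector unless drop = FALSE is given. A minimal sketch, rebuilding a matrix with the same values as m above (the construction is an assumption about how m was created):

```r
m <- matrix(1:20, 5, 4,
            dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))[1:4, ]

r1 <- m[1, ]                    # a named vector; the matrix structure is dropped
r1_mat <- m[1, , drop = FALSE]  # still a 1-by-4 matrix

is.matrix(r1)   # FALSE
dim(r1_mat)     # 1 4
```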

Conversion

Turn a matrix into a vector, and a vector back into a matrix:

> m<-matrix(1:9,3,3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> c(m)
[1] 1 2 3 4 5 6 7 8 9
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> m1<-c(m)
> m2<-matrix(m1)
> m2
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
> m2<-matrix(m1,3,3)
> m2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

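Arithmetic

> m <- matrix(1:20, 5, 4, dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))[1:4, ]  # the named 4-by-4 matrix from the indexing examples
> sum(m)
[1] 160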
> rowMeans(m)
  R1   R2   R3   R4 
 8.5  9.5 10.5 11.5 
> rowSums(m)
R1 R2 R3 R4 
34 38 42 46 
> m1<-matrix(1:9,3,3)
> m2<-matrix(2:10,3,3)
> m1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> m2
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10
> m1*m2 # element-wise (Hadamard) product: entries in matching positions are multiplied
     [,1] [,2] [,3]
[1,]    2   20   56
[2,]    6   30   72
[3,]   12   42   90
> m1 %*% m2 # matrix product (rows times columns)
     [,1] [,2] [,3]
[1,]   42   78  114
[2,]   51   96  141
[3,]   60  114  168
> diag(m1)
[1] 1 5 9
> diag(m2)
[1]  2  6 10
> t(m1) # transpose
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
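The different kinds of product are easy to confuse, so here is a short sketch that puts them side by side. The names hadamard, matprod, outerp and inner are only illustrative.

```r
m1 <- matrix(1:9, 3, 3)
m2 <- matrix(2:10, 3, 3)

hadamard <- m1 * m2         # element-wise product
matprod  <- m1 %*% m2       # matrix product
outerp   <- 1:3 %o% 1:3     # outer product of two vectors, same as outer(1:3, 1:3)
inner    <- sum(1:3 * 4:6)  # inner (dot) product: 1*4 + 2*5 + 3*6 = 32

# Each entry of the matrix product is a row-by-column dot product
matprod[1, 1] == sum(m1[1, ] * m2[, 1])
```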

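2.3. Arrays

# array(data, dim, dimnames): the values, the size of each dimension, and optional names for each dimension
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))

A<-matrix(1:20,4,5)
D<-array(1:20,c(4,5))
# A and D hold the same values: a two-dimensional array is a matrix
D1<-array(1:60,c(3,4,5))
D2<-c(1:60)
dim(D2)<-c(3,4,5)          # gives the same array as D1
c(D1[1,1,],D1[1,,1])
c(D1[2,2,],D1[2,,2])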
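To make array indexing more concrete, the following sketch pulls single elements and slices out of z and summarizes along one dimension with apply().

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

z["A1", "B2", "C3"]   # one element by name; same as z[1, 2, 3]
z[, , "C1"]           # the first 2-by-3 slice
apply(z, 3, sum)      # sum of each slice along the third dimension
```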

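2.4. Data frames

A data frame is a dataset (a data pattern): each column is a field (var), each row is an observation (obs), and every column must have a name.

Indexing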

> patientdata<-data.frame(li=c(1,2,3),ke=c(3,2,1),yo=c(2,2,2))
> patientdata
  li ke yo
1  1  3  2
2  2  2  2
3  3  1  2
> patientdata[1:2]  # columns 1 to 2
  li ke
1  1  3
2  2  2
3  3  1
> patientdata[c("li","yo")] # the columns named li and yo
  li yo
1  1  2
2  2  2
3  3  2
> new<-patientdata[which(patientdata$li>1),]  # the observations with li > 1
> new
  li ke yo
2  2  2  2
3  3  1  2

mydata<-data.frame(m1,m2,m3)   # general form: each argument supplies columns (variables), and different columns may hold different types

> class(women)
[1] "data.frame"
> plot(women$height,women$weight)
> lm(weight~height, data = women)

Call:
lm(formula = weight ~ height, data = women)

Coefficients:
(Intercept)       height  
     -87.52         3.45 

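Example 1

Set the status of every patient under 35 to "Excellent"

patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status,
                          stringsAsFactors = FALSE)   # keep status as character (before R 4.0 strings became factors by default)
# do it
x<-which(patientdata$age<35)           # row numbers of the patients under 35
patientdata$status[x]<-"Excellent"     # same spelling as the existing value
patientdata[c("age","status")]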
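The which() call is optional: a logical vector can index the rows directly. A minimal sketch of the same update, with young as an illustrative name:

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  status    = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE
)

young <- patientdata$age < 35           # logical vector, one entry per row
patientdata$status[young] <- "Excellent"

patientdata[young, c("age", "status")]  # only the rows under 35
```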

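Attaching

# to avoid typing "dataset$" before every variable, use attach(), detach() or with()

	attach(mtcars)   # attach the dataset
	summary(mpg)     # summarize the variable mpg
	plot(mpg,disp)   # scatter plot of mpg against disp
	detach(mtcars)   # detach it again

	with(mtcars,{                          # evaluate the block with the columns of mtcars visible
	nokeep <- summary(mpg)                 # summary of mpg stored in nokeep, which exists only inside with()
	keep <<- summary(mpg)                  # <<- stores keep in the global environment, so it is still available after with() returns
	})                                     # end of the with() block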
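An alternative to <<- is to use the return value of with(), which is the value of its last expression. A small sketch; stats and its element names are made up for the example.

```r
# Assign the result of with() directly
keep <- with(mtcars, summary(mpg))
keep

# Several results can be returned together in a list
stats <- with(mtcars, list(mean_mpg = mean(mpg),
                           cor_mpg_disp = cor(mpg, disp)))
stats$mean_mpg
```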

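2.5. Factors

A factor is a way of storing data that assigns each value to a category, optionally with a ranking.
In R, nominal and ordinal variables are called factors (factor), and each possible value of such a categorical variable is a level. For example, a field named grade could have the levels good, better and best.

> status<-c("Poor","Improved","Excellent","Poor") 
> status<-factor(status,ordered=TRUE,levels=c("Poor","Improved","Excellent"))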
> status
[1] Poor      Improved  Excellent Poor     
Levels: Poor < Improved < Excellent

> fcy1<-factor(mtcars$cyl)
> fcy1
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8
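> plot(mtcars$cyl)           # numeric vector: scatter plot of the values against their index
> plot(factor(mtcars$cyl))   # factor: bar chart of the counts per level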
> num<-1:100
> num
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
 [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
 [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
 [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
 [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
> cut(num,c(seq(0,100,10))) # bin the values and label each bin; seq(0,100,10) supplies the break points for 10 bins
  [1] (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]  
  [8] (0,10]   (0,10]   (0,10]   (10,20]  (10,20]  (10,20]  (10,20] 
 [15] (10,20]  (10,20]  (10,20]  (10,20]  (10,20]  (10,20]  (20,30] 
 [22] (20,30]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30] 
 [29] (20,30]  (20,30]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40] 
 [36] (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (40,50]  (40,50] 
 [43] (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (40,50] 
 [50] (40,50]  (50,60]  (50,60]  (50,60]  (50,60]  (50,60]  (50,60] 
 [57] (50,60]  (50,60]  (50,60]  (50,60]  (60,70]  (60,70]  (60,70] 
 [64] (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70] 
 [71] (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80] 
 [78] (70,80]  (70,80]  (70,80]  (80,90]  (80,90]  (80,90]  (80,90] 
 [85] (80,90]  (80,90]  (80,90]  (80,90]  (80,90]  (80,90]  (90,100]
 [92] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100]
 [99] (90,100] (90,100]
10 Levels: (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] ... (90,100]
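The long printout above can be condensed with table(), and cut() can also attach custom labels. A sketch under the assumption of made-up scores and label names:

```r
num <- 1:100
groups <- cut(num, breaks = seq(0, 100, 10))   # intervals (0,10], (10,20], ...

table(groups)      # 10 values fall into each of the 10 bins
nlevels(groups)    # 10

# Custom labels and an ordered factor; scores and labels are illustrative
score <- c(35, 62, 88, 71)
grade <- cut(score, breaks = c(0, 60, 80, 100),
             labels = c("low", "mid", "high"), ordered_result = TRUE)
grade              # low mid high mid
grade > "low"      # comparisons work on ordered factors
```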

2.6. Lists

> g <- "My First List"
> h <- c(25, 26, 18, 39)
> j <- matrix(1:10, nrow = 5)
> k <- c("one", "two", "three") 
> mylist <- list(title=g,ages=h,shuzu=j,str=k)
> mylist
$`title`
[1] "My First List"

$ages
[1] 25 26 18 39

$shuzu
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

$str
[1] "one"   "two"   "three"

To access the third component:

> mylist[[3]] # double brackets return the component itself, one at a time; single brackets would return a list
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> mylist[["shuzu"]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> mylist$shuzu
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

To access components 3 through 4:

> mylist[3:4]
$`shuzu`
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

$str
[1] "one"   "two"   "three"

To access the third row of the third component:

> mylist$shuzu[3,]
[1] 3 8
> mylist$shuzu[3]
[1] 3
> mylist$shuzu[3,1]
[1] 3
> mylist$shuzu[3,2]
[1] 8
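The difference between single and double brackets can be checked directly. This sketch rebuilds the same list; the component extra is hypothetical and only shows how components are added and removed.

```r
mylist <- list(title = "My First List",
               ages  = c(25, 26, 18, 39),
               shuzu = matrix(1:10, nrow = 5),
               str   = c("one", "two", "three"))

class(mylist[3])        # "list": single brackets give a sub-list
is.matrix(mylist[[3]])  # TRUE: double brackets give the component itself

# Add and remove components
mylist$extra <- TRUE    # hypothetical new component
mylist$extra <- NULL    # assigning NULL removes it again
length(mylist)          # back to 4
```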

Other commands:

length(mylist) returns the number of components in the list mylist

names(mylist) returns the names of the components of mylist

> length(mylist)
[1] 4
> names(mylist)
[1] "title" "ages"  "shuzu" "str"  
> x<-unlist(mylist)          # flatten the list into a single named character vector
> x
          title           ages1           ages2           ages3           ages4          shuzu1          shuzu2 
"My First List"            "25"            "26"            "18"            "39"             "1"             "2" 
         shuzu3          shuzu4          shuzu5          shuzu6          shuzu7          shuzu8          shuzu9 
            "3"             "4"             "5"             "6"             "7"             "8"             "9" 
        shuzu10            str1            str2            str3 
           "10"           "one"           "two"         "three" 

2.6. Importing data from external files (CSV example)

> mdata<-read.table("I:/compitition/round3/1.csv",header=TRUE,sep=",")
> View(mdata)
> m<-read.csv("I:/compitition/round3/1.csv")   # read.csv() already defaults to header=TRUE and sep=","
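Without access to that file, the import can still be tried on a small round trip. In this sketch the data values and the file name demo.csv are made up.

```r
# A small data frame standing in for the real file (made-up values)
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

path <- file.path(tempdir(), "demo.csv")   # hypothetical file name
write.csv(df, path, row.names = FALSE)     # leave out the row-name column

back <- read.csv(path)
str(back)
all.equal(df, back)    # TRUE: the round trip preserves the data
```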

Chapter 3: Plotting

dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
opar <- par(no.readonly = TRUE)       # save a copy of the current graphical parameters
par(lty = 2, pch = 17)                # switch the line type to dashed (2) and the point symbol to a filled triangle (17)
plot(dose, drugA, type = "b")         # plot the drug A response (drugA) against dose; type "b" draws both points and lines
par(opar)                             # restore the saved parameters

> plot(dose, drugA, type = "b", lty = 2, pch = 17, main = "good boy") # lty = line type, pch = point character
# functions for managing graphics devices: dev.new, dev.next, dev.set, dev.off, dev.prev

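3.1. First try - example1

It is best to run this example one line at a time.

x11()                # open plotting window A
plot(1:10)           # scatter plot of 1 to 10 (A)
x11()                # open plotting window B
plot(rnorm(10))      # scatter plot of 10 normal random draws (B)
dev.set(dev.prev())  # make the previous window the current device (A)
abline(0, 1)         # add the line with intercept 0 and slope 1 (A)
dev.set(dev.next())  # make the next window the current device (B)
abline(h = 0, v = 5,col = "red" ) # add a red horizontal line at y=0 and a red vertical line at x=5; h = horizontal, v = vertical (B)
dev.set(dev.prev())  # back to A
dev.off()            # close that device (A)
# only B is left now
dev.off()            # closes B as well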
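Devices can also be addressed by number instead of stepping with dev.prev() and dev.next(). The sketch below uses pdf(NULL), which opens a device without writing a file, so it also runs where x11() is unavailable; dev_a and dev_b are illustrative names.

```r
pdf(NULL); dev_a <- dev.cur()   # device A
pdf(NULL); dev_b <- dev.cur()   # device B, now the current one

dev.list()                      # all open devices
dev.set(dev_a)                  # switch to A by number
on_a <- unname(dev.cur() == dev_a)
dev.set(dev_b)                  # and back to B
on_b <- unname(dev.cur() == dev_b)

dev.off(dev_a)                  # close specific devices
dev.off(dev_b)
```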

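3.1. First try - example2

drug <- data.frame (dose, drugA, drugB)
pdf("mygraph.pdf")       # open a PDF graphics device
attach(drug)             # attach the data frame
plot(dose, drugA, main="Response of drugA on dose")
abline(lm(drugA~dose))   # add the least-squares line of best fit
# title("Response of drugA on dose")   # alternative way to add the title
detach(drug)             # detach the data frame
dev.off()                # close the graphics device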
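The line drawn by abline(lm(...)) is an ordinary least-squares fit. As a sketch, its coefficients can be read with coef() and reproduced from the textbook formulas; slope and intercept are illustrative names.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

fit <- lm(drugA ~ dose)
coef(fit)                     # intercept and slope of the fitted line

# The same values from the least-squares formulas
slope     <- cov(dose, drugA) / var(dose)
intercept <- mean(drugA) - slope * mean(dose)

predict(fit, newdata = data.frame(dose = 50))  # fitted response at dose = 50
```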


3.2. example - exercise - A

mdata1<-read.table("online shopping.txt",header=T)

attach(mdata1)
opar<-par(no.readonly = TRUE)
par(pch=22,bg="grey",col="red")
plot(period, amount,main = "Figure.1",bg="blue")
# bg inside plot() sets the fill of the point symbols, so the open squares (pch=22) are filled blue
# bg inside par() sets the background of the whole figure, and col sets the color of the symbol outline
# with a pch that has no separate fill, the result looks different
par(opar) 
detach(mdata1)

Now change it slightly:

par(pch=15,bg="grey",col="red")     # settings in par() apply to every later plot; arguments to plot() apply to that plot only
# pch=15 is a solid square without a separate fill, so bg="blue" in plot() has no effect
plot(period,amount,main = "Figure.1",bg="blue")

3.2. example - exercise - B

opar <- par(no.readonly = TRUE)
t<-seq(0,2*pi,by=0.1)
x<-16*sin(t)^3
y<-13*cos(t)-5*cos(2*t)-2*cos(3*t)-cos(4*t)
a<-(x-min(x))/(max(x)-min(x))   # rescale x to [0, 1]
b<-(y-min(y))/(max(y)-min(y))   # rescale y to [0, 1]
par(bg="pink")
plot(a,b,ann=FALSE,pch=16,axes=FALSE,col="red")
# ann=FALSE removes the default titles and axis labels
# axes=FALSE suppresses all axes, including the box around the plot
par(opar) # not exactly pretty...

3.2. example - exercise - C

x<-colors()  
# colors() returns the names of all available colors (x is overwritten on the next line)
x<-seq(-0.5*pi,0.5*pi,by=0.01)
y<-sin(4*x)^2
# t is reused from exercise B; its length (63) divides the length of x (315), so it is recycled without a warning
plot(t*y*cos(x),t*y*sin(x),pch=21,col="green",bg="gray",axes=FALSE,ann=F)
points((t*8/9)*y*cos(x),(t*8/9)*y*sin(x),pch=13,col=rgb(0.1,0.6,0.4))
points((t*2/3)*y*cos(x),(t*2/3)*y*sin(x),pch=13,col=rgb(0.9,0.3,0.4))
lines((t/2)*y*cos(x),(t/2)*y*sin(x),pch=12,col=rgb(0.2,0.4,0.7))
lines((t/3)*y*cos(x),(t/3)*y*sin(x), pch=11,col="yellowgreen")

3.3. example - rainbow chart - A

par(mfrow=c(1,2)) # arrange the plots in a grid with 1 row and 2 columns
pie(rep(1,12), col = rainbow(12), main = "rainbow12")
pie(rep(1,1000), labels = "", col=rainbow(1000), 
    border = rainbow(1000), main = "rainbow1000")

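3.3. example - rainbow chart - B

The panel sizes came out wrong for me, so the code below sets the layout in two alternative ways; use either one on its own.
pdf("mygraph.pdf")
par(mfrow = c(2,2),mar = c(1,1,1,1))               # way 1: a 2-by-2 grid with narrow margins; call it after pdf() so that it applies to the PDF device
layout(matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE),  # way 2: layout() with explicit relative widths and heights
       widths = c(1, 1), heights = c(1, 1)) 
pie(rep(1,12), col = heat.colors(12), main = "heat")
pie(rep(1,12), col = terrain.colors(12), main = "terrain")
pie(rep(1,12), col = topo.colors(12), main = "topo")
pie(rep(1,12), col = cm.colors(12), main = "cm")
dev.off()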

3.3. example - grayscale chart

n <- 10 
mygrays <- gray(0:n/n)    # gray levels from black (0) to white (1)
pie(rep(1, n), labels = mygrays, col = mygrays)

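3.4. example - RColorBrewer - A

brewer.pal(n, name)
returns n colors, in order, from the palette called name; n is the number of colors. Palettes come in three types: sequential (seq), diverging (div) and qualitative (qual).
display.brewer.all(type="qual")   # show all qualitative palettes
display.brewer.all(type="seq")    # show all sequential palettes
display.brewer.all(type="div")    # show all diverging palettes

> rep(1,4)
[1] 1 1 1 1
library(RColorBrewer)
barplot(rep(1,7), col=brewer.pal(7,"Set2"))   # 7 bars, one per color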

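3.4. example - RColorBrewer - B

# horiz=T  draws the bars horizontally
# axes=F   removes the axis
# border=F removes the bar borders
par(mfrow=c(1,5),mar=c(1,1,1,1))          # mar sets the margins with a numeric vector; pin would set the plot size in inches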
n<-9
barplot(rep(1,n), col=brewer.pal(n,"Greys"), horiz=T, main="my greys",axes=F)
barplot(rep(1,n), col=brewer.pal(n,"PuRd"), horiz=T, main="my purd",axes=F)
barplot(rep(1,n), col=brewer.pal(n,"BuGn"), horiz=T, main="my bugn",axes=F)
barplot(rep(1,n), col=brewer.pal(n,"Oranges"), horiz=T, main="my oranges",axes=F)
barplot(rep(1,n), col=brewer.pal(n,"Blues"), horiz=T, main="my blues",axes=F)


3.5. example - text attributes (adding text)

3.5.1. Font (font)

#1
plot(1:7,c(0:6),col="blue",pch=19,xlab="index")
text(x=2, y=4, "font=1: plain",font=1)
text(x=3, y=3, "font=2: bold",font=2)
text(x=4, y=2, "font=3: italic",font=3)
text(x=5, y=1, "font=4: bold italic",font=4)

3.5.1. Font families

Creating font families with windowsFonts

Using them via family

#2
plot(c(0, 1), c(-2.5, 2.5), type="n", ann=F, axes=F)   # empty canvas covering the text positions; ann=F drops titles and axis labels, axes=F drops the axes
box(which="plot", lty = '1371', col="gray")        # box() draws the frame; '1371' is a custom dash pattern (lty = 2 would give a plain dashed line)
windowsFonts(A=windowsFont("楷体"),                # KaiTi
             B=windowsFont("隶书"),                # LiSu
             C=windowsFont("华文新魏"),             # STXinwei
             D=windowsFont("微软雅黑"),             # Microsoft YaHei
             E=windowsFont("幼圆")                 # YouYuan
             )
text(x=0.5, y=2, family="A", "画")
text(x=0.5, y=1, "远看山有色",family="B")
text(x=0.5, y=0, "近听水无声",family="C")
text(x=0.5, y=-1, "春去花还在",family="D")
text(x=0.5, y=-2, "人来鸟不惊",family="E")

3.5. Example 1

plot(dose, drugA, type = "b",  
     col = "red",  
     lty = 2, pch = 2, lwd = 2,  
     main = "Clinical Trials for Drug A",  
     sub = "This is hypothetical data",   # subtitle
     xlab = "Dosage",  
     ylab = "Drug Response",  
     xlim = c(0, 60),  
     ylim = c(0, 70)) 

3.5. Example 2

x <- c(1:10)
y <- x
z <- 10/x
x2<-c(2,4,6,8,10)
opar <- par(no.readonly=TRUE)
par(mar=c(5, 4, 4, 8) + 0.1)                   # widen the right margin to make room for the extra axis
plot(x, y, type="b",pch=21, col="red",yaxt="n", lty=3, ann=FALSE)
lines(x, z, type="b", pch=22, col="blue", lty=2)
axis(2, at=x, labels=x, col.axis="red", las=2)
axis(3, at=x2, labels=x2, col.axis="green")
axis(4, at=z, labels=round(z, digits=2),col.axis="blue", las=2, cex.axis=0.7, tck=-.01)
mtext("y=10/x", side=4, line=3, cex.lab=1, las=2, col="blue")
mtext("Top X Axis", side=3, line=3, cex.lab=1, las=0, col="green")
title("An Example of Creative Axes",sub="Three Axes", xlab="X values",ylab="Y=X")
par(opar)

3.5. Example 3

mpar<-par(no.readonly = T)
t = seq(0, 2*pi, by = 0.01)
x = 0.9*sin(t)
y = 0.9*cos(t)
par(mar = c(3,3,3,3))   # alternatively, pin = c(3,3) sets the plot size in inches
plot(x, y, col = "red", type = "l", lwd = 2)
lines(x/2, y/2, col = "red", type = "l", lwd = 8)
lines(x*4/5, y*4/5, col = "red", type = "l", lwd = 4)
abline(h = 0, v = 0)
text(-0.9, 0, "1")
text(0.9, 0, "1")
text(-0.7, 0, "2")
text(0.7, 0, "2")
text(-0.4, 0, "3")
text(0.4, 0, "3")
axis(1, c(-0.5, 0.0, 0.5))
axis(2, c(-0.5, 0.0, 0.5))
legend("topright", c("outer", "middle", "inner"),lty = c(1,1,1), 
       col = c("red", "red", "red"), lwd = c(2, 4, 8))   # line widths match the three circles drawn above
par(mpar)

4. Basic data management

This chapter will be tidied up and updated later.

> nchar("hello world")
[1] 11
> month.name 
 [1] "January"   "February"  "March"     "April"     "May"      
 [6] "June"      "July"      "August"    "September" "October"  
[11] "November"  "December" 
> nchar(month.name)
 [1] 7 8 5 5 3 4 4 6 9 7 8 8
> length(month.name)
[1] 12
> nchar(c(12,2,345))
[1] 2 1 3
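> paste(c('everybody','loves','stat')) # a single vector comes back element by element; add collapse=" " to join it into one string
[1] "everybody" "loves"     "stat"     
> paste('everybody','loves','stat') # joins the arguments into one string, separated by a space by default
[1] "everybody loves stat"
> paste('everybody','loves','stat',sep='_') # joins them with _ as the separator
[1] "everybody_loves_stat"
> names<-c('moe','larry','curly')
> paste(names,'loves stats') # the suffix is pasted onto every element of names
[1] "moe loves stats"   "larry loves stats" "curly loves stats"
> substr(x = month.name,start = 1, stop = 3) # first three characters of each name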
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
[12] "Dec"
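> temp<-substr(x = month.name,start = 1, stop = 3)
> toupper(temp) # to upper case
 [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV"
[12] "DEC"
> tolower(temp) # to lower case
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov"
[12] "dec"
> # change the case of the first letter with a regular expression: \\U upper-cases and \\L lower-cases the captured letter (requires perl = T)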
> gsub("^(\\w)","\\U\\1",tolower(temp),perl = T)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
[12] "Dec"
> gsub("^(\\w)","\\L\\1",tolower(temp),perl = T)
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov"
[12] "dec"
> gsub("^(\\w)","\\L\\1",toupper(temp),perl = T)
 [1] "jAN" "fEB" "mAR" "aPR" "mAY" "jUN" "jUL" "aUG" "sEP" "oCT" "nOV"
[12] "dEC"
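A few related string helpers, as a sketch: collapse joins the elements of one vector, paste0() pastes without a separator, and sprintf() formats with a template. The variable names are illustrative.

```r
words <- c("everybody", "loves", "stat")

joined <- paste(words, collapse = " ")   # "everybody loves stat"
ids    <- paste0("x", 1:3)               # "x1" "x2" "x3"
both   <- paste(words, 1:3, sep = "-", collapse = "; ")
# "everybody-1; loves-2; stat-3"

abbr   <- substr(month.name, 1, 3)
labels <- sprintf("%02d-%s", 1:3, abbr[1:3])   # "01-Jan" "02-Feb" "03-Mar"
```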

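5. Statistics, part one

5.1. Mathematical and statistical functions

Random number functions for the normal distribution

?rnorm # open the help page for the normal distribution functions (?norm documents the matrix norm instead)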
> rnorm(n=100,mean=15,sd=2)
  [1] 14.39507 19.20102 14.49449 15.06635 15.25867 13.81112 11.70314
  [8] 12.71880 12.85904 13.01281 13.51563 16.27566 17.57319 16.83718
 [15] 11.94163 15.45456 15.93432 14.15210 15.44173 14.07534 14.53258
 [22] 15.06427 13.24477 13.07941 16.07982 14.04915 17.42273 16.92472
 [29] 18.19617 15.34587 13.68050 17.17103 16.65054 13.52107 15.96886
 [36] 15.82427 13.68439 14.83785 10.45197 13.48326 12.35020 13.79613
 [43] 14.82436 15.64499 15.60782 12.05625 15.64949 14.13571 16.61578
 [50] 14.43983 13.23951 12.39544 13.91216 16.49914 14.55893 11.68325
 [57] 14.23137 21.13787 18.11357 14.52999 14.44863 14.66166 11.48979
 [64] 12.84015 16.66261 18.70409 15.48486 12.86410 12.04665 16.54661
 [71] 15.42460 16.32328 16.97274 18.65668 15.03878 15.54645 14.71909
 [78] 16.81800 18.92858 13.52018 12.83845 17.52546 17.30444 13.89279
 [85] 16.16979 12.97065 17.71496 15.64037 16.68800 17.38156 16.37317
 [92] 16.23341 12.17538 18.60695 13.07655 13.70086 16.33865 14.72251
 [99] 16.12488 14.96604
> round(rnorm(n=100,mean=15,sd=2))
  [1] 16 15 14 17 15 15 15 16 14 15 19 13 15 12 16 17 13 16 18 16 14 12 16
 [24] 17 16 14 17 14 13 12 17 16 17 11 20 14 12 17 14 15 13 15 17 18 20 15
 [47] 17 16 12 18 15 15 12 15 15 15 11 16 12 16 18 13 13 15 19 13 12 12 13
 [70] 17 11 16 13 18 13 16 12 16 16 13 16 19 16 16 16 16 15 11 18 14 12 17
 [93]  8 12 13 14 18 16 15 14
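# for any function, ?function_name opens its help page
> x<-rnorm(n=100,mean=15,sd=2)
> qqnorm(x)   # normal Q-Q plot: points close to a straight line suggest approximately normal data
> runif(3)    # three random numbers between 0 and 1
[1] 0.3934077 0.5689905 0.2948520
> runif(3,min=1,max=10)  # set the range of the random numbers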
[1] 7.419201 3.729455 4.189400

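d: probability density function
p: cumulative distribution function
q: quantile function (the inverse of the distribution function)
r: random numbers drawn from the distribution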

> dgamma(c(1:9),shape=2,rate=1) # gamma density evaluated at 1 to 9 (deterministic, no random numbers involved)
[1] 0.367879441 0.270670566 0.149361205 0.073262556
[5] 0.033689735 0.014872513 0.006383174 0.002683701
[9] 0.001110688
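The four prefixes listed above can be seen together on the normal distribution. This is a minimal sketch; the seed value 1 is arbitrary and only makes the random draw repeatable.

```r
dnorm(0)                 # density of N(0, 1) at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)              # P(X <= 1.96), about 0.975
qnorm(0.975)             # the 97.5% quantile, about 1.96
qnorm(pnorm(1.5))        # q undoes p, giving back 1.5

set.seed(1)              # arbitrary seed
x <- rnorm(1000, mean = 15, sd = 2)
mean(x)                  # close to 15
```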

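Because these are random numbers, every run gives different values. Setting a random seed makes the results reproducible:

> set.seed(2020)
> runif(3)  # first draw after seeding
[1] 0.6469028 0.3942258 0.6185018
> runif(3)  # different: the generator has moved on
[1] 0.47689114 0.13609719 0.06738439
> set.seed(2020)
> runif(3)  # same as the first draw again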
[1] 0.6469028 0.3942258 0.6185018
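The same behaviour can be checked programmatically rather than by eye. A short sketch with illustrative names first, second and third:

```r
set.seed(2020)
first <- runif(3)

set.seed(2020)
second <- runif(3)

identical(first, second)   # TRUE: same seed, same sequence

third <- runif(3)          # no re-seeding: the sequence simply continues
identical(first, third)    # FALSE
```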

5.2. Descriptive statistics functions

> myvars<-mtcars[c('mpg','hp','wt','am')]
> summary(myvars)
      mpg              hp              wt       
 Min.   :10.40   Min.   : 52.0   Min.   :1.513  
 1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581  
 Median :19.20   Median :123.0   Median :3.325  
 Mean   :20.09   Mean   :146.7   Mean   :3.217  
 3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610  
 Max.   :33.90   Max.   :335.0   Max.   :5.424  
       am        
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.4062  
 3rd Qu.:1.0000  
 Max.   :1.0000  

> fivenum(myvars$mpg)    # returns the minimum, lower quartile (hinge), median, upper quartile (hinge) and maximum
[1] 10.40 15.35 19.20 22.80 33.90  
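Note that fivenum() reports 15.35 for the lower quartile while summary() above shows 15.43. The sketch below shows where the difference comes from: fivenum() uses hinges, whereas summary() relies on the default quantile() definition.

```r
mpg <- mtcars$mpg

fivenum(mpg)[2]        # lower hinge: 15.35
quantile(mpg, 0.25)    # default (type 7) quartile: 15.425

# With 32 sorted values, the lower hinge is the mean of the 8th and 9th values
s <- sort(mpg)
mean(s[8:9])
```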

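> library(Hmisc)  
> describe(myvars)   # descriptive statistics
# the trim=0.1 argument (a mean computed after dropping the lowest and highest 10% of values) belongs to describe() in the psych package, not to Hmisc::describe()

If Hmisc is hard to install, this blog post may help: link

A function similar to describe(), and the one I prefer:

library(pastecs)
stat.desc(myvars)
# optional arguments: basic=T (basic counts and sums), desc=T (descriptive statistics), norm=T (normality statistics, very detailed)

# I prefer it because the output is easy to read: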
                     mpg           hp          wt
nbr.val       32.0000000   32.0000000  32.0000000
nbr.null       0.0000000    0.0000000   0.0000000
nbr.na         0.0000000    0.0000000   0.0000000
min           10.4000000   52.0000000   1.5130000
max           33.9000000  335.0000000   5.4240000
range         23.5000000  283.0000000   3.9110000
sum          642.9000000 4694.0000000 102.9520000
median        19.2000000  123.0000000   3.3250000
mean          20.0906250  146.6875000   3.2172500
SE.mean        1.0654240   12.1203173   0.1729685
CI.mean.0.95   2.1729465   24.7195501   0.3527715
var           36.3241028 4700.8669355   0.9573790
std.dev        6.0269481   68.5628685   0.9784574
coef.var       0.2999881    0.4674077   0.3041285
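Some rows of this table can be reproduced in base R, which helps in reading them. The formulas below are an assumption about how these rows are defined, checked only against the mpg column shown above.

```r
x <- mtcars$mpg
n <- length(x)

se       <- sd(x) / sqrt(n)               # SE.mean
ci_half  <- qt(0.975, df = n - 1) * se    # CI.mean.0.95 (half-width of the 95% interval)
coef_var <- sd(x) / mean(x)               # coef.var

round(c(SE.mean = se, CI.mean.0.95 = ci_half, coef.var = coef_var), 7)
```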

To be continued…
