Exercises
Data frames
> # Exercise 3-1
> # 1. Read the file exercise.csv and assign it to test.
> test = read.csv("exercise.csv");test
Petal.Length Petal.Width Species
1 4.6 1.5 a
2 5.9 2.1 b
3 4.5 1.5 a
4 6.0 2.5 b
5 4.0 1.3 a
6 4.7 1.4 a
7 1.3 0.2 c
8 1.4 0.2 c
9 5.1 1.9 b
10 5.8 2.2 b
11 4.9 1.5 a
12 1.4 0.2 c
13 1.5 0.2 c
14 5.6 1.8 b
15 1.4 0.2 c
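> # 2. Describe the attributes of test (row names, column names, number of rows and columns).
> # 3. Find the median of the values in the first column.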
> dim(test)
[1] 15 3
> colnames(test)
[1] "Petal.Length" "Petal.Width" "Species"
> rownames(test)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
[13] "13" "14" "15"
> median(test[,1])
[1] 4.6
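Because exercise.csv is not included with these notes, the sketch below rebuilds test from the printed output using read.csv(text = ...), assuming the file holds exactly these columns and values. It then shows a few more base R ways to inspect the same attributes; csv_text and meds are names chosen only for this example.

```r
# Rebuild the exercise data from the output printed above
# (assumption: exercise.csv contains exactly these columns and values)
csv_text <- "Petal.Length,Petal.Width,Species
4.6,1.5,a
5.9,2.1,b
4.5,1.5,a
6.0,2.5,b
4.0,1.3,a
4.7,1.4,a
1.3,0.2,c
1.4,0.2,c
5.1,1.9,b
5.8,2.2,b
4.9,1.5,a
1.4,0.2,c
1.5,0.2,c
5.6,1.8,b
1.4,0.2,c"
test <- read.csv(text = csv_text)

nrow(test)                    # number of rows (first element of dim())
ncol(test)                    # number of columns (second element of dim())
dimnames(test)                # row names and column names in one list
median(test$Petal.Length)     # same value as median(test[, 1])
meds <- sapply(test[, 1:2], median)  # median of each numeric column at once
meds
```

sapply() is handy when the same summary is needed for several columns instead of one.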
> # 4. Rename the first two columns of test to Length and Width
> # (colnames(test[,1:2]) = ... renames only the extracted subset and leaves test unchanged,
> #  so assign into the column-name vector instead)
> colnames(test)[1:2] = c("Length","Width");test
Length Width Species
1 4.6 1.5 a
2 5.9 2.1 b
3 4.5 1.5 a
4 6.0 2.5 b
5 4.0 1.3 a
6 4.7 1.4 a
7 1.3 0.2 c
8 1.4 0.2 c
9 5.1 1.9 b
10 5.8 2.2 b
11 4.9 1.5 a
12 1.4 0.2 c
13 1.5 0.2 c
14 5.6 1.8 b
15 1.4 0.2 c
> # 5. Extract the rows of test whose last column is a or c into a new data frame and assign it to test2.
> test2 = test[test$Species %in% c("a","c"),];test2
Length Width Species
1 4.6 1.5 a
3 4.5 1.5 a
5 4.0 1.3 a
6 4.7 1.4 a
7 1.3 0.2 c
8 1.4 0.2 c
11 4.9 1.5 a
12 1.4 0.2 c
13 1.5 0.2 c
15 1.4 0.2 c
> # Equivalent here, because Species only takes the values a, b and c:
> test2 = test[test$Species != "b",];test2
Length Width Species
1 4.6 1.5 a
3 4.5 1.5 a
5 4.0 1.3 a
6 4.7 1.4 a
7 1.3 0.2 c
8 1.4 0.2 c
11 4.9 1.5 a
12 1.4 0.2 c
13 1.5 0.2 c
15 1.4 0.2 c
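A compact sketch of steps 4 and 5 on a few rows of the same data (rows 1, 2, 7 and 8). It renames columns by indexing the name vector and shows subset() as an alternative to logical indexing; both are base R, and keep and test2_alt are illustrative names.

```r
# A few rows of the exercise data (rows 1, 2, 7 and 8 of test)
test <- data.frame(
  Petal.Length = c(4.6, 5.9, 1.3, 1.4),
  Petal.Width  = c(1.5, 2.1, 0.2, 0.2),
  Species      = c("a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Rename by position: replace elements 1 and 2 of the column-name vector
colnames(test)[1:2] <- c("Length", "Width")

# Logical index: TRUE where Species is "a" or "c"
keep <- test$Species %in% c("a", "c")
test2 <- test[keep, ]   # the kept rows retain their original row names

# subset() evaluates the condition inside the data frame, so no test$ prefix
test2_alt <- subset(test, Species %in% c("a", "c"))
test2
```

%in% scales better than chained == comparisons when the list of wanted values grows.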
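Data frames: advanced
(1) For a data frame with many rows, preview only the first or last few rows with head() (or tail()). This stays readable only when the number of columns is limited.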
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> head(iris,3)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
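head() has a counterpart, tail(), for the last rows, and a negative n drops rows from the other end. A short sketch on the built-in iris data:

```r
# Last 3 rows; the original row names (148 to 150) are kept
last3 <- tail(iris, 3)
last3

# Negative n: everything except the last 147 rows, i.e. the first 3 rows
first3 <- head(iris, -147)

# With many columns, select columns first so the preview stays readable
preview <- head(iris[, c("Sepal.Length", "Species")])
preview
```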
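(2) When there are many rows and many columns, take a block with square brackets [rows, columns]; the result is still a data frame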
> iris[1:3,1:3]
Sepal.Length Sepal.Width Petal.Length
1 5.1 3.5 1.4
2 4.9 3.0 1.4
3 4.7 3.2 1.3
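One base R detail worth knowing about [rows, columns]: selecting several columns returns a data frame, but selecting a single column simplifies the result to a plain vector unless drop = FALSE is given. A minimal sketch on iris:

```r
block      <- iris[1:3, 1:3]              # several columns: result is a data frame
one_col    <- iris[1:3, 1]                # a single column is simplified to a vector
one_col_df <- iris[1:3, 1, drop = FALSE]  # drop = FALSE keeps the data frame
by_name    <- iris[1:3, c("Sepal.Length", "Species")]  # columns selected by name

class(block)
class(one_col)
class(one_col_df)
```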
(3) Check the data type and contents of each column with str()
> str(df)
'data.frame': 4 obs. of 4 variables:
$ r1: chr "gene1" "gene2" "gene3" "gene4"
$ r2: chr "up" "up" "down" "down"
$ r3: num 5 3 5 -4
$ r4: num 0.001 0.002 0.003 0.004
> df
r1 r2 r3 r4
1 gene1 up 5 0.001
2 gene2 up 3 0.002
3 gene3 down 5 0.003
4 gene4 down -4 0.004
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
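df is not created in this section. The sketch below is a hypothetical reconstruction from the str() output above, followed by sapply(df, class) as a more compact per-column type summary and levels() for the factor column of iris.

```r
# Hypothetical reconstruction of df from the str() output above
df <- data.frame(
  r1 = c("gene1", "gene2", "gene3", "gene4"),
  r2 = c("up", "up", "down", "down"),
  r3 = c(5, 3, 5, -4),
  r4 = c(0.001, 0.002, 0.003, 0.004),
  stringsAsFactors = FALSE  # default since R 4.0; set explicitly for older versions
)
str(df)

col_types <- sapply(df, class)  # one class per column, as a named character vector
col_types
levels(iris$Species)            # the categories behind the Factor column of iris
```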
(4) Remove rows that contain missing values with na.omit() (df has no NA, so all four rows are kept)
> na.omit(df)
r1 r2 r3 r4
1 gene1 up 5 0.001
2 gene2 up 3 0.002
3 gene3 down 5 0.003
4 gene4 down -4 0.004
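To see na.omit() actually drop something, here is a small hypothetical data frame with NA in two different rows. complete.cases() marks as TRUE exactly the rows without any NA, which for ordinary columns are the rows na.omit() keeps; filtering on one column with is.na() is shown for comparison.

```r
# Hypothetical data frame with missing values in two different rows
df_na <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA),
                    stringsAsFactors = FALSE)

ok <- complete.cases(df_na)        # TRUE only for rows without any NA
ok
no_na <- na.omit(df_na)            # keeps row 1 only
no_na
x_ok <- df_na[!is.na(df_na$x), ]   # drops rows with NA in column x only
x_ok
```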
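(5) Combining data frames
Same number of rows: combine column-wise with cbind()
Same columns (same number and names): stack row-wise with rbind()
Intersection (keep only rows whose key appears in both tables) with merge():
Key column has the same name in both: > merge(test1,test2,by="name")
Key column is named differently: > merge(test1,test2,by.x = "name",by.y = "NAME")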
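A sketch of the three operations on made-up tables; test1, test2, test3 and all their columns are hypothetical, and R >= 4.0 (character columns stay character) is assumed. merge() performs an inner join by default, sorts the result by the key, and with by.x/by.y keeps the key column name from the first table.

```r
# Hypothetical tables sharing a key column
test1 <- data.frame(name = c("g1", "g2", "g3"), len = c(4.6, 5.9, 4.5))
test2 <- data.frame(name = c("g2", "g3", "g4"), wid = c(2.1, 1.5, 2.5))
test3 <- data.frame(NAME = c("g2", "g3", "g4"), wid = c(2.1, 1.5, 2.5))

# cbind(): same number of rows, columns placed side by side
wide <- cbind(test1, flag = c(TRUE, FALSE, TRUE))

# rbind(): column names must match; rows are stacked
long <- rbind(test2, data.frame(name = "g5", wid = 1.9))

# merge(): only g2 and g3 appear in both tables, so only they remain
inner  <- merge(test1, test2, by = "name")
inner2 <- merge(test1, test3, by.x = "name", by.y = "NAME")
inner
```

Passing all = TRUE to merge() would keep unmatched keys as well (an outer join), filling the missing side with NA.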
#############################################################################
Matrix transposition and conversion
> m = matrix(1:16,nrow=4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> colnames(m) = c("g1","g2","g3","g4");m
g1 g2 g3 g4
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> t(m)
[,1] [,2] [,3] [,4]
g1 1 2 3 4
g2 5 6 7 8
g3 9 10 11 12
g4 13 14 15 16
> as.data.frame(m)
g1 g2 g3 g4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
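A short base R sketch of how names move through these conversions: t() turns column names into row names, as.data.frame() turns matrix column names into data frame column names, and as.matrix() converts back. Note that t() applied to a data frame returns a matrix, not a data frame.

```r
m <- matrix(1:16, nrow = 4)
colnames(m) <- c("g1", "g2", "g3", "g4")

tm <- t(m)               # column names of m become row names of tm
tm["g2", 1]              # equals m[1, "g2"], i.e. 5

df <- as.data.frame(m)   # one data frame column per matrix column
back <- as.matrix(df)    # back to a matrix with the same values
t_df <- t(df)            # transposing a data frame yields a matrix
class(t_df)
```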