Bioinformatics Study Notes: R Data Structures

Exercises

Data frames

> # Exercise 3-1
> # 1. Read the file exercise.csv and assign it to test.
> test = read.csv("exercise.csv");test
   Petal.Length Petal.Width Species
1           4.6         1.5       a
2           5.9         2.1       b
3           4.5         1.5       a
4           6.0         2.5       b
5           4.0         1.3       a
6           4.7         1.4       a
7           1.3         0.2       c
8           1.4         0.2       c
9           5.1         1.9       b
10          5.8         2.2       b
11          4.9         1.5       a
12          1.4         0.2       c
13          1.5         0.2       c
14          5.6         1.8       b
15          1.4         0.2       c
> # 2. Describe the attributes of test (row names, column names, number of rows and columns).
> dim(test)
[1] 15  3
> colnames(test)
[1] "Petal.Length" "Petal.Width"  "Species"     
> rownames(test)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"
[13] "13" "14" "15"
> # 3. Compute the median of the first column.
> median(test[,1])
[1] 4.6

> # 4. Rename the first two columns of test to length and Width.
> # Index the names vector: colnames(test[,1:2]) = ... would leave test unchanged.
> colnames(test)[1:2] = c("length","Width");test
   length Width Species
1     4.6   1.5       a
2     5.9   2.1       b
3     4.5   1.5       a
4     6.0   2.5       b
5     4.0   1.3       a
6     4.7   1.4       a
7     1.3   0.2       c
8     1.4   0.2       c
9     5.1   1.9       b
10    5.8   2.2       b
11    4.9   1.5       a
12    1.4   0.2       c
13    1.5   0.2       c
14    5.6   1.8       b
15    1.4   0.2       c
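
Step 4 hides a common pitfall: the assignment has to target the names of the data frame itself, not the names of a subset. The sketch below illustrates this on a small made-up data frame (`d` and its values are hypothetical stand-ins, not the exercise file).

```r
# Hypothetical three-row data frame standing in for test
d <- data.frame(Petal.Length = c(4.6, 5.9, 4.5),
                Petal.Width  = c(1.5, 2.1, 1.5),
                Species      = c("a", "b", "a"),
                stringsAsFactors = FALSE)

# Renaming a subset: R renames a temporary copy of d[, 1:2] and writes it
# back by position, so the column names of the data frame do not change
d_subset <- d
colnames(d_subset[, 1:2]) <- c("Length", "Width")
names_after_subset <- colnames(d_subset)

# Indexing the names vector instead renames the columns in place
d_fixed <- d
colnames(d_fixed)[1:2] <- c("Length", "Width")
names_after_fix <- colnames(d_fixed)

print(names_after_subset)
print(names_after_fix)
```

The first form runs without an error or warning, which is why the mistake is easy to miss.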

> # 5. Extract the rows of test whose last column is a or c into a new data frame and assign it to test2.
> test2 = test[test$Species%in%c("a","c"),];test2
   length Width Species
1     4.6   1.5       a
3     4.5   1.5       a
5     4.0   1.3       a
6     4.7   1.4       a
7     1.3   0.2       c
8     1.4   0.2       c
11    4.9   1.5       a
12    1.4   0.2       c
13    1.5   0.2       c
15    1.4   0.2       c
> # Equivalent here, because Species only takes the values a, b and c
> test2 = test[test$Species != "b",];test2
   length Width Species
1     4.6   1.5       a
3     4.5   1.5       a
5     4.0   1.3       a
6     4.7   1.4       a
7     1.3   0.2       c
8     1.4   0.2       c
11    4.9   1.5       a
12    1.4   0.2       c
13    1.5   0.2       c
15    1.4   0.2       c
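
The two filters in step 5 agree only because Species has no missing values. As a rough illustration (the data frame `d` below is made up for this note), they diverge as soon as an NA appears:

```r
# Hypothetical data with one missing Species value
d <- data.frame(Length  = c(4.6, 5.9, 1.3, 5.0),
                Species = c("a", "b", "c", NA),
                stringsAsFactors = FALSE)

# NA %in% c("a", "c") is FALSE, so the NA row is simply dropped
keep_in <- d[d$Species %in% c("a", "c"), ]

# NA != "b" is NA, and an NA row index produces a row filled with NA
keep_neq <- d[d$Species != "b", ]

print(nrow(keep_in))
print(nrow(keep_neq))
```

For that reason `%in%` is usually the safer choice when the column may contain missing values.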

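Data frames: further topics

(1) For a data frame with many rows, view only the first or last few rows with head() or tail(). This is readable only when the number of columns is small.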

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(iris,3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
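
A short sketch of the same idea for both ends of the table, using the built-in iris data (the variable names are chosen here for illustration):

```r
first3 <- head(iris, 3)        # first three rows
last3  <- tail(iris, 3)        # last three rows, original row names kept
first5 <- head(iris, -145)     # negative n: everything except the last 145 rows

print(rownames(last3))
print(nrow(first5))
```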

(2) When there are many rows and many columns, take a block with square brackets. The result is still a data frame.

> iris[1:3,1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
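
One caveat worth a small example: selecting a single column with brackets returns a plain vector unless `drop = FALSE` is given. The variable names below are illustrative only.

```r
sub_df  <- iris[1:3, 1:3]               # several columns: data frame
one_col <- iris[1:3, 1]                 # single column: numeric vector
kept_df <- iris[1:3, 1, drop = FALSE]   # single column kept as a data frame

print(class(sub_df))
print(class(one_col))
print(class(kept_df))
```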

(3) Inspect the data type and contents of every column with str().

> str(df)
'data.frame':	4 obs. of  4 variables:
 $ r1: chr  "gene1" "gene2" "gene3" "gene4"
 $ r2: chr  "up" "up" "down" "down"
 $ r3: num  5 3 5 -4
 $ r4: num  0.001 0.002 0.003 0.004
> df
     r1   r2 r3    r4
1 gene1   up  5 0.001
2 gene2   up  3 0.002
3 gene3 down  5 0.003
4 gene4 down -4 0.004
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
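
The notes never show how df was created. A possible reconstruction from the printed values, offered as an assumption rather than the original code, together with a compact per-column type summary:

```r
# Assumed construction of df, inferred from its printed contents
df <- data.frame(r1 = paste0("gene", 1:4),
                 r2 = rep(c("up", "down"), each = 2),
                 r3 = c(5, 3, 5, -4),
                 r4 = c(0.001, 0.002, 0.003, 0.004),
                 stringsAsFactors = FALSE)

# One class per column, as a named character vector
col_classes <- sapply(df, class)
print(col_classes)

# A factor column stores integer codes plus a set of levels
print(levels(iris$Species))
```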

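(4) Remove rows that contain missing values with na.omit(). The df used here has no missing values, so nothing is dropped.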

> na.omit(df)
     r1   r2 r3    r4
1 gene1   up  5 0.001
2 gene2   up  3 0.002
3 gene3 down  5 0.003
4 gene4 down -4 0.004
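
To see na.omit() actually drop something, here is a minimal sketch on a made-up data frame (`df_na` is hypothetical) with one missing value:

```r
# Hypothetical data frame with an NA in the second row
df_na <- data.frame(r1 = c("gene1", "gene2", "gene3", "gene4"),
                    r3 = c(5, NA, 5, -4),
                    stringsAsFactors = FALSE)

clean <- na.omit(df_na)              # rows with any NA are removed
is_complete <- complete.cases(df_na) # logical flag per row

print(clean)
print(is_complete)
```

The surviving rows keep their original row names, so the result is numbered 1, 3, 4.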

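(5) Joining data frames

Same number of rows: join column-wise with cbind()

Same number of columns (with matching column names): join row-wise with rbind()

Intersection (inner join) with merge(), keeping only rows whose key appears in both data frames:

Key column has the same name in both:   > merge(test1,test2,by="name")

Key column names differ:                > merge(test1,test2,by.x ="name",by.y = "NAME")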
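
The notes do not define test1 and test2 for this part, so the sketch below uses small made-up data frames (all names and values are assumptions) to show the three joins side by side:

```r
# Hypothetical inputs
test1 <- data.frame(name = c("g1", "g2", "g3"), score = c(1, 2, 3),
                    stringsAsFactors = FALSE)
test2 <- data.frame(name = c("g2", "g3", "g4"), group = c("x", "y", "z"),
                    stringsAsFactors = FALSE)
test3 <- data.frame(NAME = c("g2", "g3", "g4"), group = c("x", "y", "z"),
                    stringsAsFactors = FALSE)

# Column-wise: both pieces must have the same number of rows
wide <- cbind(test1, flag = c(TRUE, FALSE, TRUE))

# Row-wise: both pieces must have the same column names
long <- rbind(test1, data.frame(name = "g9", score = 9))

# Inner join on a shared key name, then on differently named keys
inner    <- merge(test1, test2, by = "name")
diff_key <- merge(test1, test3, by.x = "name", by.y = "NAME")

# all = TRUE keeps unmatched rows from both sides (union instead of intersection)
outer <- merge(test1, test2, by = "name", all = TRUE)

print(inner)
print(outer)
```

When the key names differ, the merged key column takes its name from the first data frame.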

#############################################################################

Transposing and converting a matrix

> m = matrix(1:16,nrow=4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
> colnames(m) = c("g1","g2","g3","g4");m
     g1 g2 g3 g4
[1,]  1  5  9 13
[2,]  2  6 10 14
[3,]  3  7 11 15
[4,]  4  8 12 16
> t(m)
   [,1] [,2] [,3] [,4]
g1    1    2    3    4
g2    5    6    7    8
g3    9   10   11   12
g4   13   14   15   16

> as.data.frame(m)
  g1 g2 g3 g4
1  1  5  9 13
2  2  6 10 14
3  3  7 11 15
4  4  8 12 16
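
The two operations can also be chained. A brief sketch, reusing the matrix m from above (the other variable names are illustrative):

```r
m <- matrix(1:16, nrow = 4)
colnames(m) <- c("g1", "g2", "g3", "g4")

tm <- t(m)                    # still a matrix; g1..g4 are now row names
tdf <- as.data.frame(tm)      # unnamed columns get default names V1..V4
m_back <- as.matrix(as.data.frame(m))   # data frame back to matrix

print(class(tm))
print(colnames(tdf))
print(m_back[2, "g3"])
```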
