R homework
- 1. Open RStudio and report its working directory.
[1] "F:/生物信息学/技能树/生信技能树/3天课程资料/1.R/02-plots"
- 2. Create six vectors, each based on a different atomic type.
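a <- c(1, 2, 3)                          # double
b <- c("z", "c", "b")                    # character (quotes needed: bare z is an undefined object)
c <- c("susahn", "nimakl", "xinjone")    # character
d <- c(TRUE, TRUE, FALSE, FALSE)         # logical (R is case-sensitive: true/false do not exist)
e <- c(1, a, "ba", "bc")                 # mixed input, coerced to character
f <- c(TRUE, "c", 2, a)                  # mixed input, also coerced to character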
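The vectors above only cover three atomic types (double, character and logical), because e and f are coerced to character. A minimal sketch with one vector for each of R's six atomic types, checked with typeof(); the variable names are illustrative only:

```r
# One vector per atomic type; typeof() reports the internal storage type
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)          # the L suffix makes an integer, not a double
v_double    <- c(1.5, 2)
v_character <- c("a", "b")
v_complex   <- c(1+2i, 3-1i)
v_raw       <- as.raw(c(1, 255))

types <- vapply(list(v_logical, v_integer, v_double, v_character, v_complex, v_raw),
                typeof, character(1))
print(types)

# Mixing types coerces everything to the most flexible type present,
# in the order logical < integer < double < complex < character
typeof(c(TRUE, "c", 2))
```

The last line is why e and f above end up as character vectors.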
- 3. In the RStudio session you opened, what does getwd() return when you run it?
getwd()
[1] "F:/生物信息学/技能树/生信技能树/3天课程资料/1.R/02-plots"
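The working directory matters because every relative file name (such as the sample.csv read later) is resolved against it. A minimal sketch that switches to R's session temporary directory and back, so it changes nothing permanently:

```r
# How the working directory affects relative paths
old <- getwd()                      # remember the current working directory
setwd(tempdir())                    # switch to R's per-session temporary directory
getwd()                             # relative file names are now resolved here
file.path(getwd(), "sample.csv")    # the full path that read.csv("sample.csv") would try
setwd(old)                          # restore the original working directory
```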
- 4. Create some data structures, such as matrices, arrays, data frames and lists (focus on data frames and matrices).
matrix(1:100, nrow = 5)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
[1,] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91
[2,] 2 7 12 17 22 27 32 37 42 47 52 57 62 67 72 77 82 87 92
[3,] 3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93
[4,] 4 9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89 94
[5,] 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95
[,20]
[1,] 96
[2,] 97
[3,] 98
[4,] 99
[5,] 100
as.data.frame(matrix(1:100, nrow = 4))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25
1 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97
2 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94 98
3 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 67 71 75 79 83 87 91 95 99
4 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100
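The task also asks for arrays and lists, which the answer above does not show. A short sketch; the object names and values are made up for illustration:

```r
# A 3-dimensional array: 2 rows x 3 columns x 2 layers, filled column by column
arr <- array(1:12, dim = c(2, 3, 2))
arr[2, 3, 1]    # row 2, column 3, layer 1

# A list can hold elements of different types and lengths
lst <- list(id = "sample1", counts = c(10L, 0L, 5L), meta = data.frame(x = 1:2))
lst$counts
lst[["meta"]]

# A data frame built column by column; each column has its own type
df <- data.frame(gene = c("A", "B", "C"), expr = c(2.5, 0, 7.1),
                 stringsAsFactors = FALSE)
str(df)
```

Because the array is filled column-major, layer 1 holds 1:6 and arr[2, 3, 1] is 6.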
- 5. Slice the data frame you created: first take rows 1 and 3, then take columns 4 and 6.
zym <- as.data.frame(matrix(1:100, nrow = 10))   # 10 x 10 data frame, columns V1..V10
a <- zym[c(1, 3), ]                              # rows 1 and 3, all columns
b <- zym[, c(4, 6)]                              # all rows, columns 4 and 6
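Rows and columns can also be selected in a single step, by position or by name. A small sketch on the same zym data frame:

```r
zym <- as.data.frame(matrix(1:100, nrow = 10))

# Rows 1 and 3 combined with columns 4 and 6 in one step
sub <- zym[c(1, 3), c(4, 6)]

# The same selection by name (as.data.frame() on a matrix names columns V1, V2, ...)
sub2 <- zym[c(1, 3), c("V4", "V6")]

# Selecting a single column drops to a vector unless drop = FALSE is given
class(zym[, 4])
class(zym[, 4, drop = FALSE])
```

Column V4 holds 31:40 and V6 holds 51:60, so sub contains 31, 33 and 51, 53.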
- 6. Use the data() function to load R's built-in dataset rivers and describe it. Also look at more of R's built-in datasets.
data(rivers)   # no error but nothing is printed: data() only loads the dataset; type its name to display it
rivers
[1] 735 320 325 392 524 450 1459 135 465 600 330 336 280 315 870 906 202 329 290 1000 600
[22] 505 1450 840 1243 890 350 407 286 280 525 720 390 250 327 230 265 850 210 630 260 230
[43] 360 730 600 306 390 420 291 710 340 217 281 352 259 250 470 680 570 350 300 560 900
[64] 625 332 2348 1171 3710 2315 2533 780 280 410 460 260 255 431 350 760 618 338 981 1306 500
[85] 696 605 250 411 1054 735 233 435 490 310 460 383 375 1270 545 445 1885 380 300 380 377
[106] 425 276 210 800 420 350 360 538 1100 1205 314 237 610 360 540 1038 424 310 300 444 301
[127] 268 620 215 652 900 525 246 360 529 500 720 270 430 671 1770
iris[1:5,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
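The task also asks for a description of rivers, which the printout alone does not give. According to its help page (?rivers), it holds the lengths in miles of major North American rivers. A sketch of the usual ways to describe a dataset and to browse the others:

```r
data(rivers)          # load the dataset into the workspace
str(rivers)           # structure: a plain numeric vector
length(rivers)        # how many rivers
summary(rivers)       # min, quartiles, mean and max of the lengths
range(rivers)
# ?rivers opens the help page; data() with no arguments lists every available dataset

# Names and titles of the datasets shipped in the 'datasets' package
head(data(package = "datasets")$results[, c("Item", "Title")])
```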
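- 7. Download the RunInfo Table file from https://www.ncbi.nlm.nih.gov/sra?term=SRP133642, read it into R and explore the data frame: how many columns does it have, and what type of element does each column hold? (See the 生信小技巧 (bioinformatics tips) videos on Bilibili for how to get the RunInfo Table.) The data come from a single-cell transcriptome project with 768 cells in total. If you cannot find the RunInfo Table file, you can click the download link instead and read that copy into R.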
dim(a)
[1] 768 31
class(a)
[1] "data.frame"
summary(a)
BioSample Experiment MBases MBytes Run SRA_Sample
Length:768 Length:768 Min. : 0.00 Min. : 0.000 Length:768 Length:768
Class :character Class :character 1st Qu.: 8.00 1st Qu.: 4.000 Class :character Class :character
Mode :character Mode :character Median :12.00 Median : 6.000 Mode :character Mode :character
Mean :12.55 Mean : 6.414
3rd Qu.:16.00 3rd Qu.: 8.000
Max. :74.00 Max. :37.000
Sample_Name Assay_Type AssemblyName AvgSpotLen BioProject Center_Name
Length:768 Length:768 Length:768 Min. :43 Length:768 Length:768
Class :character Class :character Class :character 1st Qu.:43 Class :character Class :character
Mode :character Mode :character Mode :character Median :43 Mode :character Mode :character
Mean :43
3rd Qu.:43
Max. :43
Consent DATASTORE_filetype DATASTORE_provider InsertSize Instrument LibraryLayout
Length:768 Length:768 Length:768 Min. :0 Length:768 Length:768
Class :character Class :character Class :character 1st Qu.:0 Class :character Class :character
Mode :character Mode :character Mode :character Median :0 Mode :character Mode :character
Mean :0
3rd Qu.:0
Max. :0
LibrarySelection LibrarySource LoadDate Organism Platform
Length:768 Length:768 Length:768 Length:768 Length:768
Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character
ReleaseDate SRA_Study age cell_type marker_genes
Length:768 Length:768 Length:768 Length:768 Length:768
Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character
source_name strain tissue
Length:768 Length:768 Length:768
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
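summary() mixes column types and statistics; a compact table of each column's class answers "what type is each column" more directly. A sketch using a hypothetical helper describe_columns() (not a base R function). The file name and read.csv() call in the comments are assumptions: check the name and separator of the file you actually downloaded.

```r
# Hypothetical helper: one row per column, giving its name and class
describe_columns <- function(df) {
  data.frame(
    column = names(df),
    class  = vapply(df, function(col) class(col)[1], character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# With the real data (assumed file name and format):
# a <- read.csv("SraRunTable.txt", stringsAsFactors = FALSE)
# describe_columns(a)
# table(describe_columns(a)$class)

# Runnable demo on a built-in data frame
cols <- describe_columns(iris)
cols
table(cols$class)
```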
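- 8. Download the sample information file sample.csv from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229, read it into R and explore the data frame: how many columns does it have, and what type of element does each column hold? (See https://mp.weixin.qq.com/s/fbHMNXOdwiQX5BAlci8brA for how to get sample.csv.) If you really cannot find sample.csv, you can also click the download link instead.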
dim(b)
[1] 768 12
summary(b)
Accession Title Sample.Type Taxonomy Channels Platform
Length:768 Length:768 Length:768 Length:768 Min. :1 Length:768
Class :character Class :character Class :character Class :character 1st Qu.:1 Class :character
Mode :character Mode :character Mode :character Mode :character Median :1 Mode :character
Mean :1
3rd Qu.:1
Max. :1
Series Supplementary.Types Supplementary.Links SRA.Accession Contact
Length:768 Length:768 Length:768 Length:768 Length:768
Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character
Release.Date
Length:768
Class :character
Mode :character
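The column names above (Sample.Type, Supplementary.Types, ...) contain dots because read.csv() runs check.names = TRUE, which replaces spaces in the header with dots. A sketch using a small stand-in file so it runs without the real download; the header imitates GEO's column names and the values are placeholders, not real data:

```r
# Stand-in CSV written to a temporary file (placeholder values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Accession,Sample Type,Channels",
             "GSM_demo_1,SRA,1",
             "GSM_demo_2,SRA,1"), tmp)

b <- read.csv(tmp, stringsAsFactors = FALSE)   # for the real file: read.csv("sample.csv")
dim(b)              # rows, columns
names(b)            # "Sample Type" has become "Sample.Type"
sapply(b, class)    # class of every column

# check.names = FALSE keeps the header text exactly, spaces included
names(read.csv(tmp, check.names = FALSE))
```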
- 9. Link the two tables by merging them on their shared sample ID.
c <- merge(a, b, by.x = "Sample_Name", by.y = "Accession")   # run table (a) joined to sample table (b)
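merge() performs an inner join by default, so rows whose key has no partner in the other table are silently dropped. A sketch on toy stand-ins for the two tables (all values hypothetical) showing how to check unmatched keys and how all.x = TRUE keeps every run:

```r
# Toy stand-ins for the run table and the sample table (hypothetical values)
runs <- data.frame(Run = c("R1", "R2", "R3"),
                   Sample_Name = c("S1", "S2", "S9"),
                   MBases = c(12, 8, 15), stringsAsFactors = FALSE)
samples <- data.frame(Accession = c("S1", "S2", "S3"),
                      Title = c("t_a_P1_A1", "t_a_P1_A2", "t_a_P2_A1"),
                      stringsAsFactors = FALSE)

# Inner join: only keys found in both tables survive (S9 and S3 drop out)
m <- merge(runs, samples, by.x = "Sample_Name", by.y = "Accession")

# Keys in the run table with no match in the sample table
setdiff(runs$Sample_Name, samples$Accession)

# Left join: keep every run, with NA in Title where there is no match
m_left <- merge(runs, samples, by.x = "Sample_Name", by.y = "Accession", all.x = TRUE)
```

With the real data, comparing nrow(c) with nrow(a) gives a quick check that all 768 runs found a partner.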
- 10. For the RunInfo Table read in earlier, explore its MBases column in R with a box plot (boxplot), the five-number summary (fivenum), a histogram (hist) and a density plot (density).
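d <- c[, c("MBases", "Title")]   # keep the read volume and the GEO sample title
# Take the 3rd underscore-separated field of Title as the plate ID
plate <- unlist(lapply(d[, 2], function(x) {
  strsplit(as.character(x), "_")[[1]][3]   # as.character() guards against factor columns
}))
d$plate <- plate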
I don't know how to do the rest.
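One way to finish task 10 is sketched below with a hypothetical helper explore_column() (not part of base R) that draws the three plots and returns fivenum(). Note that fivenum() gives Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum), not quintiles. The demo runs on rivers so the code is self-contained.

```r
# Hypothetical helper: box plot, histogram and density plot for one numeric column
explore_column <- function(x, label = "MBases") {
  x <- x[!is.na(x)]                  # drop missing values before summarising
  op <- par(mfrow = c(2, 2))         # 2 x 2 panel layout
  on.exit(par(op))                   # restore the previous layout afterwards
  boxplot(x, main = paste("Boxplot of", label))
  hist(x, main = paste("Histogram of", label), xlab = label)
  plot(density(x), main = paste("Density of", label))
  # Histogram on the density scale with the density curve drawn on top
  hist(x, freq = FALSE, main = paste(label, "histogram + density"), xlab = label)
  lines(density(x), col = "red")
  fivenum(x)                         # min, lower hinge, median, upper hinge, max
}

# Demo on a built-in vector; with the merged table use explore_column(d$MBases)
explore_column(rivers, "rivers")
```

With the plate column from the code above, boxplot(MBases ~ plate, data = d, las = 2) draws one box per plate, which makes differences in sequencing depth between plates easy to spot.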