2019-12-24 TCGA 2.0

12/2 study notes (TCGA 2.0)

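Morning

Continued reading the papers my teacher sent. This batch was harder to follow and assumes some background knowledge, so I Googled whatever I did not understand as I went, which slowed the reading down a bit.

Afternoon

I wanted to go back over the earlier TCGA workflow and make sure I really understand it. While reading WeChat public-account articles and blogs, I came across a powerful package that looked good, so I tried it out. Below is the tutorial for downloading and processing data with the GDCRNATools package.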

3 GDCRNATools package installation

The stable release can be installed by running:

## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools")

To install the development version, update R and Bioconductor to the latest version and run:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools", version = "devel")
library(GDCRNATools)

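4 Quick start

GDCRNATools has built-in functions that let users download and process GDC data efficiently. Users can also bring their own data processed by other tools, such as the UCSC Xena GDC hub, TCGAbiolinks (Colaprico et al. 2016), or TCGA-Assembler (Zhu, Qiu, and Ji 2014).

Here we use a small dataset to show the most basic steps of a ceRNAs network analysis. Each step is explained in more detail in the "Case Study" section.

4.1 Data preparation

Normalization of HTSeq-Counts data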

library(DT)

### load RNA counts data
data(rnaCounts)

### load miRNAs counts data
data(mirCounts)
####### Normalization of RNAseq data #######
rnaExpr <- gdcVoomNormalization(counts = rnaCounts, filter = FALSE)

####### Normalization of miRNAs data #######
mirExpr <- gdcVoomNormalization(counts = mirCounts, filter = FALSE)
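To check my own understanding of what this normalization step produces, here is a minimal base-R sketch of voom-style log2 counts per million on a made-up count matrix. It is not the package's code: as far as I understand, gdcVoomNormalization also applies TMM scaling through edgeR and limma, which this sketch leaves out, and the low-expression rule shown for filter = TRUE (CPM above 1 in more than half of the samples) is my reading of the package documentation, so treat it as an assumption.

```r
# Toy count matrix: 3 genes x 2 samples (made-up numbers, not taken from rnaCounts)
counts <- matrix(c(10, 0, 90,
                   200, 50, 750),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

# Library size = total counts per sample
lib.size <- colSums(counts)

# voom-style log2 counts per million (no TMM scaling in this sketch):
# log2((count + 0.5) / (lib.size + 1) * 1e6)
logCPM <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))

# Assumed meaning of filter = TRUE: keep genes with CPM > 1 in more than
# half of the samples
cpm  <- t(t(counts) / lib.size * 1e6)
keep <- rowSums(cpm > 1) > ncol(counts) / 2

print(round(logCPM, 2))
print(keep)
```

With filter = FALSE, as in the calls above, every gene is kept and only the log2 transformation applies.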

Parse metadata

####### Parse and filter RNAseq metadata #######
metaMatrix.RNA <- gdcParseMetadata(project.id = 'TCGA-CHOL',
                                   data.type  = 'RNAseq', 
                                   write.meta = FALSE)

metaMatrix.RNA <- gdcFilterDuplicate(metaMatrix.RNA)
metaMatrix.RNA <- gdcFilterSampleType(metaMatrix.RNA)
datatable(as.data.frame(metaMatrix.RNA[1:5,]), extensions = 'Scroller',
        options = list(scrollX = TRUE, deferRender = TRUE, scroller = TRUE))

file_name file_id patient sample submitter_id entity_submitter_id sample_type gender age_at_diagnosis tumor_stage tumor_grade days_to_death days_to_last_follow_up vital_status project_id
TCGA-3X-AAV9-01A 725eaa94-5221-4c22-bced-0c36c10c2c3b.htseq.counts.gz 85bc7f81-51fb-4446-b12d-8741eef4acee TCGA-3X-AAV9 TCGA-3X-AAV9-01 TCGA-3X-AAV9-01A TCGA-3X-AAV9-01A-72R-A41I-07 PrimaryTumor 26349 stage i 339 TCGA-CHOL
TCGA-3X-AAVA-01A b6a2c03a-c8ad-41e9-8a19-8f5ac53cae9f.htseq.counts.gz 42b8d463-6209-4ea0-bb01-8023a1302fa0 TCGA-3X-AAVA TCGA-3X-AAVA-01 TCGA-3X-AAVA-01A TCGA-3X-AAVA-01A-11R-A41I-07 PrimaryTumor 18303 [stage garbled] 445 TCGA-CHOL
TCGA-3X-AAVB-01A c2765336-c804-4fd2-b45a-e75af2a91954.htseq.counts.gz 6e2031e9-df75-48df-b094-8dc6be89bf8b TCGA-3X-AAVB TCGA-3X-AAVB-01 TCGA-3X-AAVB-01A TCGA-3X-AAVB-01A-31R-A41I-07 PrimaryTumor 25819 [stage garbled] 402 TCGA-CHOL
TCGA-3X-AAVC-01A 8b20cba8-9fd5-4d56-bd02-c6f4a62767e8.htseq.counts.gz 19e8fd21-f6c8-49b0-aa76-109eef46c2e9 TCGA-3X-AAVC TCGA-3X-AAVC-01 TCGA-3X-AAVC-01A TCGA-3X-AAVC-01A-21R-A41I-07 PrimaryTumor 26493 stage i 709 TCGA-CHOL
TCGA-3X-AAVE-01A 4082f7d5-5656-476a-9aaf-36f7cea0ac55.htseq.counts.gz 1ace0df3-9837-467e-85de-c938efda8fc8 TCGA-3X-AAVE TCGA-3X-AAVE-01 TCGA-3X-AAVE-01A TCGA-3X-AAVE-01A-11R-A41I-07 PrimaryTumor 21943 [stage garbled] 650 TCGA-CHOL

(The first value in each row is the row name. The gender, tumor_grade and vital_status cells did not survive in this copy of the table, and the day count before project_id could belong to either days_to_death or days_to_last_follow_up.)
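To see how the patient, sample and sample_type columns relate to the barcode, here is a small base-R sketch that splits TCGA barcodes by hand. It is only an illustration, not what gdcParseMetadata does internally. The code map uses the standard TCGA sample-type codes (01 for primary tumor, 11 for solid tissue normal); the third barcode is invented so that a normal sample appears, and the last two lines are only my rough analogy for gdcFilterSampleType and gdcFilterDuplicate.

```r
# The first two barcodes come from the metadata table above; the third is
# invented here only to show what a normal-tissue barcode looks like.
barcodes <- c("TCGA-3X-AAV9-01A-72R-A41I-07",
              "TCGA-3X-AAVA-01A-11R-A41I-07",
              "TCGA-00-0000-11A-00R-0000-07")

parts <- strsplit(barcodes, "-", fixed = TRUE)

# Patient ID = first three fields of the barcode
patient <- vapply(parts, function(p) paste(p[1:3], collapse = "-"), character(1))

# Sample-type code = first two characters of the fourth field
type_code <- vapply(parts, function(p) substr(p[4], 1, 2), character(1))
sample_id <- paste(patient, type_code, sep = "-")

# Standard TCGA codes: 01 = primary tumor, 11 = solid tissue normal
code_map    <- c("01" = "PrimaryTumor", "11" = "SolidTissueNormal")
sample_type <- unname(code_map[type_code])

meta <- data.frame(patient, sample_id, sample_type,
                   row.names = barcodes, stringsAsFactors = FALSE)

# Rough analogy for the two filters: keep only tumor/normal samples and
# drop repeated sample IDs
keep <- !is.na(meta$sample_type) & !duplicated(meta$sample_id)
meta <- meta[keep, ]
print(meta)
```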

4.2 ceRNAs network analysis

Identification of differentially expressed genes (DEGs)

DEGAll <- gdcDEAnalysis(counts     = rnaCounts, 
                        group      = metaMatrix.RNA$sample_type, 
                        comparison = 'PrimaryTumor-SolidTissueNormal', 
                        method     = 'limma')
datatable(as.data.frame(DEGAll), 
        options = list(scrollX = TRUE, pageLength = 5))
symbol group logFC AveExpr t PValue FDR B
ENSG00000143257 NR1I3 protein_coding -6.9168253303911 7.02312879999841 -17.290860517483 4.24435471535975e-22 2.41928218775506e-19 40.0428794972668
ENSG00000205707 ETFRF1 protein_coding -2.49218157877227 9.51599650308333 -16.0675281445046 8.35325586851415e-21 2.38067792252653e-18 37.1975058932658
ENSG00000134532 SOX5 protein_coding -4.87111820944092 6.22822704823733 -15.0358907798233 1.16874593482166e-19 2.22061727616116e-17 34.4982800309858
ENSG00000141338 ABCA8 protein_coding -5.65379410618959 7.52058085084197 -14.8606853150024 1.85151852120375e-19 2.63841389271535e-17 34.1158106917837
ENSG00000066583 ISOC1 protein_coding -2.37013127019847 10.4661940943542 -14.5653242812861 4.05395903332572e-19 4.62151329799132e-17 33.3563998870241

(The first value in each row is the row name, an Ensembl gene ID. Rows 1 to 5 of 565 are shown.)

### All DEGs
deALL <- gdcDEReport(deg = DEGAll, gene.type = 'all')

### DE long-noncoding
deLNC <- gdcDEReport(deg = DEGAll, gene.type = 'long_non_coding')

### DE protein coding genes
dePC <- gdcDEReport(deg = DEGAll, gene.type = 'protein_coding')
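To make the reporting step concrete for myself, here is a base-R sketch of the kind of threshold filter I understand gdcDEReport to apply. The cutoffs (fold change 2, FDR 0.01) are what I believe the function's defaults to be, so treat them as assumptions, and the UP/DOWN column is my own addition for illustration. The numbers are rounded from the five DEG rows shown above.

```r
# Five rows rounded from the DEG table above
deg <- data.frame(
  symbol = c("NR1I3", "ETFRF1", "SOX5", "ABCA8", "ISOC1"),
  group  = rep("protein_coding", 5),
  logFC  = c(-6.9168, -2.4922, -4.8711, -5.6538, -2.3701),
  FDR    = c(2.42e-19, 2.38e-18, 2.22e-17, 2.64e-17, 4.62e-17),
  row.names = c("ENSG00000143257", "ENSG00000205707", "ENSG00000134532",
                "ENSG00000141338", "ENSG00000066583"),
  stringsAsFactors = FALSE
)

# Assumed cutoffs: fold change of 2 (|logFC| > 1) and FDR < 0.01
fc   <- 2
pval <- 0.01
sig  <- deg[abs(deg$logFC) > log2(fc) & deg$FDR < pval, ]

# Illustrative direction label: negative logFC = lower in tumor than in normal
sig$direction <- ifelse(sig$logFC > 0, "UP", "DOWN")

# Equivalent of asking for one gene type only
sigPC <- sig[sig$group == "protein_coding", ]
print(sigPC)
```

All five genes pass both cutoffs and are lower in tumor than in normal tissue, which fits their position at the top of the table.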

ceRNAs network analysis of DEGs

ceOutput <- gdcCEAnalysis(lnc         = rownames(deLNC), 
                          pc          = rownames(dePC), 
                          lnc.targets = 'starBase', 
                          pc.targets  = 'starBase', 
                          rna.expr    = rnaExpr, 
                          mir.expr    = mirExpr)
## Step 1/3: Hypergenometric test done !
## Step 2/3: Correlation analysis done !
## Step 3/3: Regulation pattern analysis done !
datatable(as.data.frame(ceOutput), 
          options = list(scrollX = TRUE, pageLength = 5))

lncRNAs Genes Counts listTotal popHits popTotal foldEnrichment hyperPValue miRNAs cor corPValue regSim sppc
1 ENSG00000234456 ENSG00000107864 2 2 95 277 2.91578947368421 0.116805315753675 hsa-miR-374b-5p,hsa-miR-374a-5p 0.673743160640659 1.96357934602162e-7 0.348159146921007 -0.00796286536619112
2 ENSG00000234456 ENSG00000135111 2 2 24 277 11.5416666666667 0.0072202166064982 hsa-miR-374b-5p,hsa-miR-374a-5p 0.646730687388315 7.94394469982837e-7 0.887824907942123 0.000618576702863805
3 ENSG00000234456 ENSG00000165672 2 2 8 277 34.625 0.000732485742688222 hsa-miR-374b-5p,hsa-miR-374a-5p 0.462611638256242 0.00068804277886866 0.42891988840537 0.0000710020933214484
4 ENSG00000234456 ENSG00000100934 2 2 20 277 13.85 0.00497043896824151 hsa-miR-374b-5p,hsa-miR-374a-5p 0.708034965259449 2.66531676689658e-8 0.373352080377257 -0.0084304674105859
5 ENSG00000234456 ENSG00000117500 2 2 28 277 9.89285714285714 0.00988855752629099 hsa-miR-374b-5p,hsa-miR-374a-5p 0.619591906145269 0.00000283650932376371 0.405166803044555 -0.00123287038918679

(The first value in each row is the row number. Rows 1 to 5 of 453 are shown.)
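To convince myself where foldEnrichment and hyperPValue come from, I recomputed them in base R from the Counts, listTotal, popHits and popTotal values of the first three rows of the ceRNA table. My reading of the columns (shared miRNAs, miRNAs targeting the lncRNA, miRNAs targeting the gene, all miRNAs) is an assumption; the arithmetic itself is a standard hypergeometric upper tail computed with phyper.

```r
# Values copied from the first three rows of the ceRNA table above
counts    <- 2             # assumed: miRNAs shared by the lncRNA and the gene
listTotal <- 2             # assumed: miRNAs targeting the lncRNA
popHits   <- c(95, 24, 8)  # assumed: miRNAs targeting each gene
popTotal  <- 277           # assumed: all miRNAs considered

# Fold enrichment: observed share of hits over the background share
foldEnrichment <- (counts / listTotal) / (popHits / popTotal)

# Hypergeometric upper tail: P(X >= counts) when drawing listTotal miRNAs
# from a population of popTotal that contains popHits "successes"
hyperPValue <- phyper(counts - 1, popHits, popTotal - popHits, listTotal,
                      lower.tail = FALSE)

print(data.frame(popHits, foldEnrichment, hyperPValue))
```

The results agree with the foldEnrichment and hyperPValue columns of the table (for example 34.625 and about 0.000732 for the third row), so this seems to be what Step 1/3 computes for each lncRNA and gene pair.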

Export the ceRNAs network to Cytoscape

ceOutput2 <- ceOutput[ceOutput$hyperPValue<0.01 
    & ceOutput$corPValue<0.01 & ceOutput$regSim != 0,]
### Export edges
edges <- gdcExportNetwork(ceNetwork = ceOutput2, net = 'edges')
datatable(as.data.frame(edges), 
        options = list(scrollX = TRUE, pageLength = 5))
### Export nodes
nodes <- gdcExportNetwork(ceNetwork = ceOutput2, net = 'nodes')
datatable(as.data.frame(nodes), 
        options = list(scrollX = TRUE, pageLength = 5))
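To actually get the network into Cytoscape, the edges and nodes tables need to be saved as plain text files first. Below is a minimal sketch with toy data frames standing in for the real edges and nodes objects; the column names, the edge direction and the file names are placeholders I made up, and the files go to a temporary directory.

```r
# Toy stand-ins for the real `edges` and `nodes` objects (placeholder columns)
edges <- data.frame(fromNode = c("ENSG00000234456", "hsa-miR-374b-5p"),
                    toNode   = c("hsa-miR-374b-5p", "ENSG00000135111"),
                    stringsAsFactors = FALSE)
nodes <- data.frame(gene = unique(c(edges$fromNode, edges$toNode)),
                    stringsAsFactors = FALSE)

# Write tab-delimited files without quotes or row names so that a
# spreadsheet-style importer can read them directly
out_dir    <- tempdir()
edges_file <- file.path(out_dir, "edges.txt")
nodes_file <- file.path(out_dir, "nodes.txt")
write.table(edges, file = edges_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(nodes, file = nodes_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the edge file back as a quick sanity check
edges_back <- read.delim(edges_file, stringsAsFactors = FALSE)
print(edges_back)
```

In Cytoscape the edge file can then be loaded through File > Import > Network from File, and the node file imported as a table to annotate the nodes.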

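Next step: export the ceRNAs network to Cytoscape and compare it with the network in the paper to see whether my result differs.
(Hoping to finally get the hang of this.)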
