2019-04-08 四种针对转录组deconvolution工具介绍

  • Tool 1: MMAD (基于matlab)
  • Tool 2: Timer (web tool)
  • Tool 3: MPC-counter (R package version
    1.1.0)
  • Tool 4: NMF (R package version
    0.21.0)

Tool 1: MMAD (基于matlab)

  1. 安装matlab R2008b

  2. 官网http://sourceforge.net/projects/mmad/ 下载工具包(MMAD version
    1.0),解压,将文件夹MMAD v1.0拷到matlab中,并添加到搜索路径中
    (设置路径→添加文件夹→保存)

1553735804903.png
  1. MMAD数据格式要求:Enter raw expression data as log2-normalized
    data. The selected file should be in tab-delimited format with rows
    representing unique genes/probes and columns representing unique
    samples. 所以提前将我们的转录组数据预处理(python):
###python

import math

#输出文件名为log2TPM.txt
f1=open(r'C:\Users\xiergo\Desktop\Transcriptome\dataset\TPM.2.txt','r')
f2=open(r'C:\Users\xiergo\Desktop\Transcriptome\dataset\log2TPM.txt','w')

next(f1)#去掉colname
for line in f1:
    line_list=line.strip('\n').split('\t')
    remove=line_list.pop(0)#去掉rowname
    ls=[]
    #如果表达值为零,则取对数无意义,将其填为NaN(matlab中的缺失值)
    for a in line_list:
        if a=='0':
            a='NaN'
        else:
            a=math.log(float(a),2)
        ls.append(str(a))
    line='\t'.join(ls)+'\n'
    f2.write(line)
f2.close()
f1.close()
  1. matlab 命令行输入MMAD,回车,弹出参数设置窗口
1553736070982.png

按如下设置:

1553736137888.png
  1. 输出txt文件时,老是失败,最后输出文件格式选择的mat格式,可以直接读进matlab,最后在从matlab中将文件保存到txt中:
%% matlab
>> load('1.mat')
>> % f_est为输出矩阵
[m, n] = size(f_est);
fid=fopen('output.txt', 'wt');
for i = 1 : m
    fprintf(fid, '%g\t', f_est(i, :));
    fprintf(fid, '\n');
end
fclose(fid);

Tool 2: Timer (web tool)

  1. 文件预处理,由于是网上工具,为了确保信息安全,将列名替换为"col_1","col_2"......"col_37",生成文件TPM_without_sampleid.txt
  2. 进入网站https://cistrome.shinyapps.io/timer/,进入estimation模块,将文件输入,选择癌种"stomach
    adenocarcinoma",run!
1553738961552.png
  1. 将生成文件中的列名替换回来

Tool 3: MPC-counter (R package version 1.1.0)

直接跑就行了,没有涉及参数选择问题

result <- MCPcounter.estimate(expr,featuresType = "HUGO_symbols")

注意数据框expr的行名要改为相应的gene symbol

# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 21 Mar 2019.
# Last update date: 21 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution tool "MCPcounter" to predict 
# the proportion of 8 immune components through RNA_Seq data 


# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F

# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]


file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/") 
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/MCPcounter/")
setwd(output.dir)

# 2) Library
library(MCPcounter)

# 3) define functions or source functions 
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")

# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]


# MCPcounter select para ------------------------------------------------
result <- MCPcounter.estimate(expr,featuresType = "HUGO_symbols")


# write out tables or figures------------------------------------------------
png(file="MCPcounter_plot.png", bg="transparent")
heatmap(as.matrix(result),col=colorRampPalette(c("blue","white","red"))(100)) 
dev.off()

result_1 <- data.frame(CellType=row.names(result),result)
write.table(result_1,"MCPcounter_score_matrix.txt", sep="\t",row.names = F,quote = F)

Tool 4: NMF (R package version 0.21.0)

  1. Estimating the factorization rank: 先取rank=2:6,做出
# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 27 Mar 2019.
# Last update date: 27 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution method NMF to predict 
# the cancer heterogeneity through RNA_Seq data 


# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F

# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]


file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/") 
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/NMF/")
setwd(output.dir)

# 2) Library
library(NMF)

# 3) define functions or source functions 
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")

# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]
expr <- as.matrix(expr)

# NMF select para ------------------------------------------------

res <- nmf(expr, 2:6, nrun=20, seed=123456)



# write out tables or figures------------------------------------------------
png(file="NMF_plot.png", bg="transparent")
plot(res)
dev.off()

png(file="consensusmap.png", bg="transparent")
consensusmap(res)
dev.off()

[图片上传失败...(image-c8c1b9-1554726570593)]

Figure 1:Estimation of the rank: Quality measures computed from 20 runs
for each value of r

[图片上传失败...(image-4957fb-1554726570593)]

Figure 2:Estimation of the rank: Consensus matrices computed from 20
runs for each value of r

  1. (Brunet2004) proposed to take the first value of r for which the
    cophenetic coefficient starts decreasing.
    所以选取rank=3,跑完之后导出w(basis)和h(coef)矩阵
# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 27 Mar 2019.
# Last update date: 27 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution method NMF to predict 
# the cancer heterogeneity through RNA_Seq data 


# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F

# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]


file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/") 
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/NMF/")
setwd(output.dir)

# 2) Library
library(NMF)

# 3) define functions or source functions 
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")

# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]
expr <- as.matrix(expr)


# nmf process ------------------------------------------------------------------
res <- nmf(expr,3,nrun=20)
w <- basis(res)
h <- coef(res)
basis <- as.data.frame(w)
coef <- as.data.frame(h)
if (DEBUG_MODE) head(profile)

# write out tables or figures------------------------------------------------

write.table(basis,"NMF_rank=3_nrun=20_basis.txt", sep="\t",row.names = T,col.names = F,quote = F)
write.table(coef,"NMF_rank=3_nrun=20_coef.txt", sep="\t",row.names = F,col.names = T,quote = F)

你可能感兴趣的:(2019-04-08 四种针对转录组deconvolution工具介绍)