- Tool 1: MMAD (基于matlab)
- Tool 2: Timer (web tool)
- Tool 3: MPC-counter (R package version
1.1.0) - Tool 4: NMF (R package version
0.21.0)
Tool 1: MMAD (基于matlab)
安装matlab R2008b
官网http://sourceforge.net/projects/mmad/ 下载工具包(MMAD version
1.0),解压,将文件夹MMAD v1.0拷到matlab中,并添加到搜索路径中
(设置路径→添加文件夹→保存)
- MMAD数据格式要求:Enter raw expression data as log2-normalized
data. The selected file should be in tab-delimited format with rows
representing unique genes/probes and columns representing unique
samples. 所以提前将我们的转录组数据预处理(python):
###python
import math
#输出文件名为log2TPM.txt
f1=open(r'C:\Users\xiergo\Desktop\Transcriptome\dataset\TPM.2.txt','r')
f2=open(r'C:\Users\xiergo\Desktop\Transcriptome\dataset\log2TPM.txt','w')
next(f1)#去掉colname
for line in f1:
line_list=line.strip('\n').split('\t')
remove=line_list.pop(0)#去掉rowname
ls=[]
#如果表达值为零,则取对数无意义,将其填为NaN(matlab中的缺失值)
for a in line_list:
if a=='0':
a='NaN'
else:
a=math.log(float(a),2)
ls.append(str(a))
line='\t'.join(ls)+'\n'
f2.write(line)
f2.close()
f1.close()
- matlab 命令行输入MMAD,回车,弹出参数设置窗口
按如下设置:
- 输出txt文件时,老是失败,最后输出文件格式选择的mat格式,可以直接读进matlab,最后在从matlab中将文件保存到txt中:
%% matlab
>> load('1.mat')
>> % f_est为输出矩阵
[m, n] = size(f_est);
fid=fopen('output.txt', 'wt');
for i = 1 : m
fprintf(fid, '%g\t', f_est(i, :));
fprintf(fid, '\n');
end
fclose(fid);
Tool 2: Timer (web tool)
- 文件预处理,由于是网上工具,为了确保信息安全,将列名替换为"col_1","col_2"......"col_37",生成文件TPM_without_sampleid.txt
- 进入网站https://cistrome.shinyapps.io/timer/,进入estimation模块,将文件输入,选择癌种"stomach
adenocarcinoma",run!
- 将生成文件中的列名替换回来
Tool 3: MPC-counter (R package version 1.1.0)
直接跑就行了,没有涉及参数选择问题
result <- MCPcounter.estimate(expr,featuresType = "HUGO_symbols")
注意数据框expr的行名要改为相应的gene symbol
# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 21 Mar 2019.
# Last update date: 21 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution tool "MCPcounter" to predict
# the proportion of 8 immune components through RNA_Seq data
# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F
# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]
file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/")
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/MCPcounter/")
setwd(output.dir)
# 2) Library
library(MCPcounter)
# 3) define functions or source functions
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")
# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]
# MCPcounter select para ------------------------------------------------
result <- MCPcounter.estimate(expr,featuresType = "HUGO_symbols")
# write out tables or figures------------------------------------------------
png(file="MCPcounter_plot.png", bg="transparent")
heatmap(as.matrix(result),col=colorRampPalette(c("blue","white","red"))(100))
dev.off()
result_1 <- data.frame(CellType=row.names(result),result)
write.table(result_1,"MCPcounter_score_matrix.txt", sep="\t",row.names = F,quote = F)
Tool 4: NMF (R package version 0.21.0)
- Estimating the factorization rank: 先取rank=2:6,做出
# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 27 Mar 2019.
# Last update date: 27 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution method NMF to predict
# the cancer heterogeneity through RNA_Seq data
# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F
# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]
file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/")
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/NMF/")
setwd(output.dir)
# 2) Library
library(NMF)
# 3) define functions or source functions
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")
# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]
expr <- as.matrix(expr)
# NMF select para ------------------------------------------------
res <- nmf(expr, 2:6, nrun=20, seed=123456)
# write out tables or figures------------------------------------------------
png(file="NMF_plot.png", bg="transparent")
plot(res)
dev.off()
png(file="consensusmap.png", bg="transparent")
consensusmap(res)
dev.off()
[图片上传失败...(image-c8c1b9-1554726570593)]
Figure 1:Estimation of the rank: Quality measures computed from 20 runs
for each value of r
[图片上传失败...(image-4957fb-1554726570593)]
Figure 2:Estimation of the rank: Consensus matrices computed from 20
runs for each value of r
- (Brunet2004) proposed to take the first value of r for which the
cophenetic coefficient starts decreasing.
所以选取rank=3,跑完之后导出w(basis)和h(coef)矩阵
# Credits -----------------------------------------------------------------
# Copyright (c) Peking University Cancer Hosptial, All Rights Reserved.
# Author: Yuhao Xie
# Create date: 27 Mar 2019.
# Last update date: 27 Mar 2019.
# Comments ----------------------------------------------------------------
# This R script is about using deconvolution method NMF to predict
# the cancer heterogeneity through RNA_Seq data
# HEADER ------------------------------------------------------------------
rm(list=ls())
options(stringsAsFactors = F)
options(java.parameters ="-Xmx8384m")
# Input/Output/Lib/Config/Params ------------------------------------------
# 1) Parameters
DEBUG_MODE <- F
# args<-commandArgs(TRUE)
# if(length(args)!=1) {
# stop("--args \n\t RF.model.RData.file \n\t \n\t top.feature.num \n\t test.data.file \n\t outprex")
# }
# DEBUG_MODE<-args[1]
file.dir <-ifelse(DEBUG_MODE,"F:/Transcriptome/","/home/pub/project/deconvolution/Transcriptome/")
expr.file<-paste0(file.dir,"dataset/TPM.2.txt")
output.dir <- paste0(file.dir,"results/NMF/")
setwd(output.dir)
# 2) Library
library(NMF)
# 3) define functions or source functions
head.left <- function(x, nrow=6, ncol=6) x[1:nrow, 1:ncol]
#source("E:/R_script/Myfunc.R")
# input data preprocess ------------------------------------------------
expr <- read.table(expr.file,header = T)
if(DEBUG_MODE) head.left(expr)
rownames(expr) <- expr[,1]
expr <- expr[,-1]
expr <- as.matrix(expr)
# nmf process ------------------------------------------------------------------
res <- nmf(expr,3,nrun=20)
w <- basis(res)
h <- coef(res)
basis <- as.data.frame(w)
coef <- as.data.frame(h)
if (DEBUG_MODE) head(profile)
# write out tables or figures------------------------------------------------
write.table(basis,"NMF_rank=3_nrun=20_basis.txt", sep="\t",row.names = T,col.names = F,quote = F)
write.table(coef,"NMF_rank=3_nrun=20_coef.txt", sep="\t",row.names = F,col.names = T,quote = F)