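Article results and resource links
Tool list
Inferring TCRs from RNA-seq files
MiTCR v1.0.3, developed by Bolotin et al., enables highly customizable analysis of TCR and immunoglobulin sequences.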
OptiType
Software dedicated to HLA class I typing; it provides accurate 4-digit typing results.
POLYSOLVER
Achieves high-accuracy HLA typing even with relatively low-coverage WES data.
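NetMHCpan: predicting tumor neoantigens
Bioinformatic neoantigen prediction mainly focuses on predicting proteasomal cleavage of mutant proteins, peptide transport, and the binding affinity of mutant peptides to MHC class I (NetMHCpan addresses the last of these, peptide-MHC binding).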
The Cell-to-Cell Communication Network
Cell-cell interaction database: FANTOM5 (http://fantom.gsc.riken.jp/5/suppl/Ramilowski_et_al_2015/)
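A minimal sketch of how a ligand-receptor table such as the FANTOM5 one can be turned into a directed cell-to-cell communication network: a sender cell type must express the ligand and a receiver cell type must express the matching receptor. The expression threshold, the input data structures, and the function name `communication_edges` are hypothetical illustrations, not a procedure documented in the source.

```python
def communication_edges(pairs, expression, threshold):
    """Infer directed cell-to-cell edges from ligand-receptor pairs.

    pairs:      iterable of (ligand, receptor) gene pairs (e.g. from FANTOM5)
    expression: dict cell_type -> dict gene -> expression value
    threshold:  minimum value to call a gene "expressed" (hypothetical cut-off)
    Returns a set of (sender, receiver, ligand, receptor) tuples;
    autocrine edges (sender == receiver) are included.
    """
    edges = set()
    for ligand, receptor in pairs:
        for sender, s_expr in expression.items():
            # the sender must express the ligand
            if s_expr.get(ligand, 0.0) < threshold:
                continue
            for receiver, r_expr in expression.items():
                # the receiver must express the receptor
                if r_expr.get(receptor, 0.0) >= threshold:
                    edges.add((sender, receiver, ligand, receptor))
    return edges
```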
Neoantigen Prediction from Indels (inferring tumor neoantigens from somatic insertions and deletions)
Somatic indel variants were extracted from the MC3 variant file (mc3.v0.2.8.CONTROLLED.maf) with the following filters:
FILTER in "PASS", "wga", "native_wga_mix" (alone, not in combination with other tags);
NCALLERS > 1;
barcode in whitelist where do_not_use = False;
Variant_Classification in "Frame_Shift_Ins", "Frame_Shift_Del", "In_Frame_Ins", "In_Frame_Del", "Missense_Mutation", "Nonsense_Mutation"; and Variant_Type in "INS", "DEL".
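The four filters above can be expressed as a row-level predicate. This is a hypothetical sketch: it assumes the MAF has already been parsed into dicts keyed by column name, that the sample barcode sits in the standard `Tumor_Sample_Barcode` column, and that `whitelist` holds only barcodes whose do_not_use flag is False.

```python
ALLOWED_FILTERS = {"PASS", "wga", "native_wga_mix"}
ALLOWED_CLASSES = {
    "Frame_Shift_Ins", "Frame_Shift_Del", "In_Frame_Ins",
    "In_Frame_Del", "Missense_Mutation", "Nonsense_Mutation",
}
ALLOWED_TYPES = {"INS", "DEL"}


def passes_indel_filters(row, whitelist):
    """Return True if a parsed MAF row satisfies all four filters (sketch)."""
    # FILTER must be exactly one allowed tag; combined values such as
    # "PASS,wga" fail this exact-membership test
    if row["FILTER"] not in ALLOWED_FILTERS:
        return False
    # called by more than one variant caller
    if int(row["NCALLERS"]) <= 1:
        return False
    # sample barcode must be on the do_not_use = False whitelist
    if row["Tumor_Sample_Barcode"] not in whitelist:
        return False
    return (row["Variant_Classification"] in ALLOWED_CLASSES
            and row["Variant_Type"] in ALLOWED_TYPES)
```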
For each indel, the downstream protein sequence was obtained using VEP v87 (Ensembl Variant Effect Predictor) with default settings. Using 9-mer peptides extracted from the VEP downstream protein sequences and each sample's HLA calls from OptiType, the binding of each mutant peptide-MHC pair was predicted with the pVAC-Seq v4.0.8 pipeline (Hundal et al., 2016) running NetMHCpan v3.0 with default settings, and an IC50 binding-score threshold of 500 nM was used to report the predicted binding epitopes as neoantigens.
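The peptide step and the final cut-off can be sketched as below. The helper names are hypothetical, and in the described pipeline the IC50 values come from NetMHCpan via pVAC-Seq; the sketch only shows the 9-mer sliding window and the 500 nM filter, treating an IC50 below 500 nM as a predicted binder (a lower IC50 means stronger predicted binding).

```python
def kmers(protein_seq, k=9):
    """All overlapping k-mers (default 9-mers) of a protein sequence."""
    return [protein_seq[i:i + k] for i in range(len(protein_seq) - k + 1)]


def call_neoantigens(predictions, threshold_nm=500.0):
    """Keep (peptide, hla_allele, ic50_nm) tuples with IC50 below the threshold."""
    return [(pep, allele, ic50) for pep, allele, ic50 in predictions
            if ic50 < threshold_nm]
```

For example, `kmers("ABCDEFGHIJK")` yields three 9-mers, and a sequence shorter than nine residues yields none.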
Master Regulators of Immune Genes
The master regulators (MRs) are identified by first inferring the protein activity of candidate MRs, measured as their transcriptional influence on groups of co-expressed genes, using the VIPER algorithm (Alvarez et al., 2016); then using the DIGGIT algorithm (Chen et al., 2014) to find somatically altered proteins significantly associated with the MRs; and finally linking the two with a method called TieDIE (Drake et al., 2016; Paull et al., 2013), which finds connecting "paths" through a network of known and predicted interactions. VIPER used tissue-matched ARACNE (Margolin et al., 2006) interactomes to infer the protein activity of 2,506 potential transcription factor and co-factor candidate "master regulators" (cMRs) from the expression of their downstream targets.
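VIPER's actual statistic (aREA, an analytic rank-based enrichment analysis) is not reproduced here. As a deliberately simplified stand-in, assuming per-sample z-scored target expression and a regulon that records each target's mode of regulation (+1 activated, -1 repressed), a regulator's activity can be approximated by the mean signed z-score of its targets. The function name and inputs are hypothetical.

```python
from statistics import mean


def regulator_activity(target_z, regulon):
    """Simplified activity score for one candidate regulator (not aREA).

    target_z: dict gene -> expression z-score in one sample
    regulon:  dict target gene -> mode (+1 activated, -1 repressed)
    Returns the mean signed z-score over targets present in target_z,
    or None if none of the targets were measured.
    """
    signed = [mode * target_z[g] for g, mode in regulon.items() if g in target_z]
    return mean(signed) if signed else None
```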
Concordance index (CI): used to evaluate the predictive ability of a model
To further dissect the prognostic impact of individual gene expression signatures or immune cell types within immune subtypes and tumor types, we used the concordance index (CI) (Pencina and D'Agostino, 2004) to correlate the immune signatures and the cellular fractions with the outcomes (OS and PFI). The concordance index is defined by the relative frequency of accurate pairwise predictions of survival over all pairs of patients for which such a meaningful determination can be achieved. Samples with missing values in the features of interest or the outcomes were excluded from the analysis. Heatmaps were generated in R using the heatmap.2 function from the gplots package.
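For a risk score (higher means an expected worse outcome), the pairwise definition above can be sketched as a Harrell-style C-index. This minimal version assumes complete data (samples with missing values already removed, as described), counts ties in the risk score as half-concordant, and skips pairs with tied times; it is an illustration, not the implementation used in the study.

```python
def concordance_index(times, events, risks):
    """Harrell-style concordance index (sketch).

    times:  observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher = expected shorter survival)
    A pair (i, j) is usable only if i has the shorter time and an observed event.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ordered pair
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied prediction counts as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering; a protective signature (higher score, longer survival) gives values below 0.5 when passed directly as `risks`.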
Intratumoral heterogeneity (ITH)
ABSOLUTE was run, using default parameters, on segmentation data generated from Affymetrix genome-wide human SNP6.0 arrays by hapseg and on SNV and indel calls from the MC3 variant file. All clonality calls for quantifying intratumoral heterogeneity (ITH) were also determined by ABSOLUTE, which models tumor copy number alterations and mutations as mixtures of subclonal and clonal components of varying ploidy. Specifically, for these analyses, ITH score was defined as the subclonal genome fraction (which measures the fraction of tumor genome that is not part of the "plurality" clone), as determined from ABSOLUTE.
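Given per-segment clonality calls, the ITH score reduces to a length-weighted fraction. The input format below (start, end, and a subclonal flag per segment) is an assumed simplification of ABSOLUTE output, not its actual file format, and the function name is hypothetical.

```python
def subclonal_genome_fraction(segments):
    """Fraction of the segmented genome called subclonal (sketch).

    segments: iterable of (start, end, is_subclonal) tuples, end exclusive.
    Returns 0.0 for an empty input.
    """
    total = 0
    subclonal = 0
    for start, end, is_subclonal in segments:
        length = end - start
        total += length
        if is_subclonal:
            subclonal += length
    return subclonal / total if total else 0.0
```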
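Clustering method: the mclust package
Performs clustering, classification, and density estimation based on finite Gaussian mixture models. For Gaussian mixture models with various covariance structures, it provides functions for parameter estimation via the EM algorithm.
The EM (expectation-maximization) algorithm: when used for clustering, it treats the dataset as a probabilistic model with latent variables (the unobserved cluster memberships) and optimizes that model, i.e., it aims to find the clustering that best fits the data. It does so by iteratively re-estimating the model parameters until they converge to a (local) optimum. mclust repeats this fit over a range of cluster numbers and covariance structures and selects the best number of clusters k by BIC.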
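library(gclus)       # provides the wine dataset
data(wine)
head(wine)
wine <- wine[, -1]   # drop the class label (first column, Class)
wine <- scale(wine)  # standardize each variable
set.seed(1234)
library(mclust)
# Fit Gaussian mixture models with 1 to 20 components; the best model and G are chosen by BIC
m_clust <- Mclust(as.matrix(wine), G = 1:20)
summary(m_clust)
plot(m_clust, what = "BIC")  # BIC for each covariance model and number of components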
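To illustrate the EM-plus-BIC logic without mclust, the sketch below fits a one-dimensional Gaussian mixture by EM and scores it with BIC using mclust's sign convention (larger is better). It is a simplification under stated assumptions: one dimension, a separate variance per component, means initialized at evenly spaced quantiles (mclust itself initializes EM from model-based hierarchical clustering), and a fixed number of iterations instead of a convergence test. Function names are hypothetical.

```python
import math


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def em_gmm_1d(data, k, n_iter=100):
    """Fit a 1-D, k-component Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    xs = sorted(data)
    # Initialise means at evenly spaced quantiles (an assumption, see text)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu0 = sum(data) / n
    variances = [max(sum((x - mu0) ** 2 for x in data) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            lp = [math.log(w) + log_normal_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
            lse = logsumexp(lp)
            resp.append([math.exp(lp_j - lse) for lp_j in lp])
        # M-step: re-estimate parameters from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids degenerate components
    loglik = sum(logsumexp([math.log(w) + log_normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
                 for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC with mclust's sign convention (larger is better)."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return 2 * loglik - n_params * math.log(n)
```

Fitting several values of k to the same data and keeping the one with the largest `bic(...)` mirrors, in miniature, what `Mclust(..., G = 1:20)` does across G.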
ARACNE: building co-expression networks
http://califano.c2b2.columbia.edu/software/
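ARACNE keeps gene pairs with significant mutual information (MI) and then prunes likely indirect edges with the data processing inequality (DPI): in a fully connected triplet, the weakest edge is treated as indirect. A minimal sketch of the DPI step only, assuming MI values for the significant edges are already computed; the tolerance rule and the function name are simplified assumptions, not ARACNE's code.

```python
from itertools import combinations


def apply_dpi(mi, tolerance=0.0):
    """Prune a mutual-information network with the data processing inequality.

    mi: dict mapping frozenset({gene_a, gene_b}) -> MI for retained edges.
    In every fully connected triplet, the weakest edge is marked for removal
    unless it is within `tolerance` (a fraction) of the second-weakest edge.
    All triplets are examined first, then marked edges are removed together.
    Returns the set of surviving edges.
    """
    genes = sorted({g for edge in mi for g in edge})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in mi for e in edges):
            ranked = sorted(edges, key=lambda e: mi[e])
            weakest, second = ranked[0], ranked[1]
            if mi[weakest] < mi[second] * (1 - tolerance):
                to_remove.add(weakest)
    return set(mi) - to_remove
```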
iBBiG (to be studied further)
As another measure of the robustness of the above model-based sample clustering, we applied an entirely different clustering method, iterative binary biclustering using iBBiG (Gusenleitner et al., 2012). The iterative biclustering identifies similarity blocks within the matrix of signature scores but, unlike the model-based clustering, allows tumor sample groups (clusters) to overlap. We analyzed all 160 gene signature score sets using iBBiG, which yielded 15 biclusters. Model-based clustering and biclustering have commonalities both in shared tumor sample groupings and in the association of clusters with phenotypes, as evidenced by 13 significant overlaps between the biclusters and the six immune subtypes according to a hypergeometric test.
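The overlap between one bicluster and one immune subtype can be tested with a one-sided hypergeometric tail probability. A minimal sketch using only the standard library; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_overlap_p(n_total, n_bicluster, n_subtype, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_total:     number of tumor samples
    n_bicluster: samples in the bicluster
    n_subtype:   samples in the immune subtype
    n_overlap:   samples in both
    """
    denom = comb(n_total, n_bicluster)
    upper = min(n_bicluster, n_subtype)
    # sum the probability of every overlap at least as large as observed
    tail = sum(comb(n_subtype, x) * comb(n_total - n_subtype, n_bicluster - x)
               for x in range(n_overlap, upper + 1))
    return tail / denom
```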
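PrePPI database
A method for genome-wide computational prediction of protein-protein interactions based on three-dimensional structural information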
https://bhapp.c2b2.columbia.edu/PrePPI/
PrePPI is a database of predicted and experimentally determined protein-protein interactions (PPI) for the human proteome. Predicted interactions in the database are determined using a Bayesian framework that combines structural, functional, evolutionary and expression information.
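One common way to realize such a Bayesian framework is a naive Bayes update: each evidence source (structural, functional, evolutionary, expression) contributes a likelihood ratio, and the ratios are multiplied into the prior odds. The sketch below assumes that form and conditional independence of the evidence types; it is not taken from PrePPI's code, and the prior and likelihood ratios are placeholders.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence sources with a naive Bayes odds update.

    prior:             prior probability that a protein pair interacts
    likelihood_ratios: one likelihood ratio per evidence type
    """
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # assumes the evidence sources are conditionally independent
    return odds / (1 + odds)
```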