【Grunt work】Computing HRD (first try)

HRD score = LOH + TAI + LST (HRD-LOH + telomeric allelic imbalance + large-scale state transitions)
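As a toy illustration, the composite score is the unweighted sum of the three scar scores. The sample names and numbers below are made-up placeholders, not real results. `hrd_sum` is a hypothetical helper and is not a scarHRD function.

```r
# Made-up per-sample scar scores (placeholders, not real data)
scores <- data.frame(
  sample = c("S1", "S2"),
  LOH    = c(12, 3),   # HRD-LOH count
  TAI    = c(20, 5),   # telomeric allelic imbalance count
  LST    = c(18, 4),   # large-scale state transition count
  stringsAsFactors = FALSE
)

# HRD score = LOH + TAI + LST (unweighted sum)
hrd_sum <- function(loh, tai, lst) loh + tai + lst

scores$HRD <- hrd_sum(scores$LOH, scores$TAI, scores$LST)
print(scores)
```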

Reference: Sztupinszki et al., Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer, npj Breast Cancer (2018), https://www.nature.com/articles/s41523-018-0066-6.

R package: scarHRD
https://github.com/sztup/scarHRD#introduction

[Figure: workflow]

Step 1 is the most critical: obtaining the input file.

1. Trying Sequenza

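According to the Sequenza user guide, it needs BAM files, which are fairly hard to obtain. It also requires Python (for the sequenza-utils preprocessing step), and I don't know Python.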


[Figure: TCGA data levels]

Useful web pages:

  1. Sequenza User Guide
    https://rdrr.io/cran/sequenza/f/vignettes/sequenza.Rmd
  2. TCGA RNAseq BAM File
    http://seqanswers.com/forums/showthread.php?t=65176
  3. TCGA_bam_splicer
    https://freesoft.dev/program/131953985
  4. BAM file format
    https://blog.csdn.net/qq_36608036/article/details/104630366

2. Trying ASCAT

Reference: ASCAT (Van Loo et al. 2010)
https://github.com/VanLoo-lab/ascat
First, run the ExampleData bundled with the package:

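library(ASCAT)
# Load the bundled ExampleData (run from the folder that contains these four files)
ascat.bc = ascat.loadData("Tumor_LogR.txt", "Tumor_BAF.txt", "Germline_LogR.txt", "Germline_BAF.txt")
# Plot the raw LogR and BAF tracks
ascat.plotRawData(ascat.bc)
# Allele-specific piecewise constant fitting (ASPCF segmentation)
ascat.bc = ascat.aspcf(ascat.bc)
ascat.plotSegmentedData(ascat.bc)
# Estimate ploidy / aberrant cell fraction and allele-specific copy numbers
ascat.output = ascat.runAscat(ascat.bc)

# Allele-specific copy numbers (nA, nB), tumor ploidy and aberrant cell fraction
ascat.output$nA
ascat.output$nB
ascat.output$ploidy
ascat.output$aberrantcellfraction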
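To feed ASCAT results into scarHRD, the segments have to be reshaped into the eight-column layout that, as I understand the scarHRD README, it expects: sample, chromosome, start, end, total copy number, A-allele copy number, B-allele copy number, ploidy. The sketch below is minimal and rests on two assumptions you should check against your installed versions. First, that `ascat.output$segments` has the columns `sample`, `chr`, `startpos`, `endpos`, `nMajor` and `nMinor`. Second, that scarHRD accepts the header names used here. The mock data frame stands in for real ASCAT output, and `to_scarhrd_input` is a hypothetical helper.

```r
# Mock stand-in for ascat.output$segments (column names assumed; verify on your ASCAT version)
segs <- data.frame(
  sample   = "S1",
  chr      = c("1", "1", "2"),
  startpos = c(1e6, 60e6, 5e6),
  endpos   = c(59e6, 120e6, 90e6),
  nMajor   = c(1, 2, 2),
  nMinor   = c(1, 0, 1),
  stringsAsFactors = FALSE
)
ploidy <- c(S1 = 2.4)  # stand-in for ascat.output$ploidy, named by sample

# Hypothetical helper: reshape ASCAT-style segments into an 8-column table
to_scarhrd_input <- function(segs, ploidy) {
  data.frame(
    SampleID       = segs$sample,
    Chromosome     = segs$chr,
    Start_position = segs$startpos,
    End_position   = segs$endpos,
    total_cn       = segs$nMajor + segs$nMinor,
    A_cn           = segs$nMajor,   # major allele
    B_cn           = segs$nMinor,   # minor allele
    ploidy         = unname(ploidy[as.character(segs$sample)]),
    stringsAsFactors = FALSE
  )
}

inp <- to_scarhrd_input(segs, ploidy)
out_file <- file.path(tempdir(), "scarHRD_input.txt")
write.table(inp, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```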

Goal: produce the data shown in the figure below.


[Figure: ASCAT output]

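Unfortunately, the GitHub README is not very detailed and manual.pdf has gone missing. The only way to work out what the parameters mean was to read the original paper, ASCAT (Van Loo et al. 2010).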

[Figure: ASCAT profiles: genome-wide allele-specific copy number profiles]
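Left: ASCAT estimates the tumor ploidy and the fraction of aberrant cells. It evaluates the goodness of fit over a grid of possible values for both parameters (blue = good solution) and selects the best solution, which is marked by the green cross. For example, in panel A the green cross in the left plot corresponds to ploidy = 1.77 and fraction of aberrant cells = 80%.
Top right: the x axis is genomic location and the y axis is copy number (green: the allele with the lower copy number; red: the allele with the higher copy number).
Bottom right: an aberration reliability score.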
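The grid search in the left panel can be illustrated with a toy R sketch. It uses the LogR/BAF model from the ASCAT paper as I read it, where gamma is a platform constant: roughly 0.55 for Illumina arrays and 1 for sequencing data. For each candidate (rho, psi), the sketch inverts the model to get the copy numbers implied by the data. It then scores the candidate by how close those copy numbers are to non-negative integers. This is a simplified stand-in: the real `ascat.runAscat` weights segments and applies further constraints. All function names here are hypothetical. The segments are noise-free and synthetic, so the true parameters give an exact fit.

```r
# Toy illustration of ASCAT's grid search (simplified; NOT the ASCAT package code).
# rho = aberrant cell fraction, psi = tumor ploidy, gamma = platform constant.

# Forward model: expected segment LogR (r) and BAF (b) for copy numbers nA, nB
expected_r <- function(nA, nB, rho, psi, gamma = 1) {
  gamma * log2((2 * (1 - rho) + rho * (nA + nB)) / (2 * (1 - rho) + rho * psi))
}
expected_b <- function(nA, nB, rho) {
  (1 - rho + rho * nB) / (2 - 2 * rho + rho * (nA + nB))
}

# Inverse: (generally non-integer) copy numbers implied by observed r, b under (rho, psi)
infer_cn <- function(r, b, rho, psi, gamma = 1) {
  scale <- 2^(r / gamma) * (2 * (1 - rho) + rho * psi)
  list(nA = (rho - 1 - (b - 1) * scale) / rho,
       nB = (rho - 1 + b * scale) / rho)
}

# Total distance of inferred nA, nB to the nearest non-negative integers
fit_distance <- function(r, b, rho, psi, gamma = 1) {
  cn <- infer_cn(r, b, rho, psi, gamma)
  d <- function(n) abs(n - pmax(round(n), 0))
  sum(d(cn$nA) + d(cn$nB))
}

# Evaluate every (rho, psi) pair on a grid and keep the smallest distance
grid_search <- function(r, b, rhos = seq(0.1, 1, by = 0.01),
                        psis = seq(1, 5.4, by = 0.05), gamma = 1) {
  grid <- expand.grid(rho = rhos, psi = psis)
  grid$dist <- mapply(function(rho, psi) fit_distance(r, b, rho, psi, gamma),
                      grid$rho, grid$psi)
  grid[which.min(grid$dist), ]
}

# Synthetic, noise-free segments simulated with rho = 0.8, psi = mean total copy number
nA <- c(1, 2, 2, 3, 2)
nB <- c(1, 0, 1, 1, 2)
psi_true <- mean(nA + nB)  # 3
r <- expected_r(nA, nB, rho = 0.8, psi = psi_true)
b <- expected_b(nA, nB, rho = 0.8)

best <- grid_search(r, b)
print(best)
```

The psi grid in this toy is capped at 5.4 on purpose. The same data also fit exactly at psi = 6, with every copy number doubled and rho = 2/3. This is one example of why several regions of the goodness-of-fit map can look good.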

  • What does "fit" mean?
[Figure: Frequency of LOH and copy number-neutral events]

(A) Frequency of LOH across the genome. Probes are shown in genomic order along the x axis, from chromosome 1 to chromosome X, where different chromosomes are delimited by gray lines.
(B) Frequency of copy number-neutral events across the genome. For diploid tumors, copy number-neutral events correspond to a subset of LOH (copy number-neutral LOH). In tetraploid tumors, for example, a copy number-neutral event can also be three copies of A and one copy of B.
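The per-probe frequencies in panels A and B are the fraction of tumors that show the event at each probe. The R sketch below computes them on made-up allele-specific copy numbers. The rule it uses for a copy number-neutral event is my own operationalization of the description above, not necessarily the paper's exact definition: total copy number equal to the tumor's rounded ploidy, with unequal alleles.

```r
# Made-up allele-specific copy numbers: rows = probes in genomic order, columns = tumors
tumors <- c("T1", "T2", "T3")
probes <- paste0("probe", 1:3)
nA <- matrix(c(1, 2, 2,
               2, 3, 2,
               1, 4, 3), nrow = 3, byrow = TRUE, dimnames = list(probes, tumors))
nB <- matrix(c(1, 2, 0,
               0, 1, 1,
               0, 0, 0), nrow = 3, byrow = TRUE, dimnames = list(probes, tumors))
ploidy <- c(T1 = 1.9, T2 = 3.8, T3 = 2.1)

total <- nA + nB
# LOH: one allele lost while the other is still present
is_loh <- (nA == 0 | nB == 0) & total > 0
# Copy number-neutral event (assumed rule): total equals the tumor's baseline
# (rounded ploidy, repeated for every probe) and the two alleles differ
baseline <- matrix(round(ploidy[tumors]), nrow(total), ncol(total), byrow = TRUE)
is_cnn <- total == baseline & nA != nB

loh_freq <- rowMeans(is_loh)  # panel A: fraction of tumors with LOH per probe
cnn_freq <- rowMeans(is_cnn)  # panel B: fraction with a copy number-neutral event
print(rbind(LOH = loh_freq, CN_neutral = cnn_freq))
```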

  • What is LOH?
  • What is a copy number-neutral event?

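LOH: loss of heterozygosity, i.e., one parental allele is lost, so the region is no longer heterozygous. For the HRD score, HRD-LOH is defined as the number of LOH regions longer than 15 Mb but shorter than the whole chromosome.
Copy number-neutral event: the total copy number is unchanged (normal), but the two alleles are imbalanced (allelic bias).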
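The HRD-LOH definition above can be turned into a minimal counting sketch in R. It rests on several assumptions of mine; it is not scarHRD's actual code:
- Segments are already allele-specific, with `A_cn` as the major and `B_cn` as the minor allele copy number.
- LOH means `B_cn == 0` with `A_cn > 0`.
- "Shorter than the whole chromosome" is approximated by skipping chromosomes whose segments are all LOH.
- Adjacent LOH segments are not merged.

The segment table is made up.

```r
# Made-up segment table for one tumor (A_cn = major, B_cn = minor allele copy number)
segs <- data.frame(
  chr   = c("1", "1", "1", "1", "2", "3", "3"),
  start = c(1, 20e6, 30e6, 70e6, 1, 1, 40e6),
  end   = c(20e6, 30e6, 70e6, 248e6, 242e6, 40e6, 198e6),
  A_cn  = c(1, 2, 1, 2, 2, 2, 1),
  B_cn  = c(1, 0, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Count LOH segments longer than min_len on chromosomes that are not entirely LOH
hrd_loh <- function(segs, min_len = 15e6) {
  loh <- segs$B_cn == 0 & segs$A_cn > 0
  len <- segs$end - segs$start
  n <- 0
  for (ch in unique(segs$chr)) {
    on_chr <- segs$chr == ch
    if (all(loh[on_chr])) next  # whole-chromosome LOH is not counted
    n <- n + sum(loh[on_chr] & len[on_chr] > min_len)
  }
  n
}

hrd_loh(segs)
```

Here chromosome 2 is LOH end to end, so it is skipped. The 10 Mb LOH segment on chromosome 1 falls below the 15 Mb threshold, which leaves two qualifying regions.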

Illumina SNP arrays deliver two output tracks: Log R, a measure of total signal intensity, and B allele frequency (BAF), a measure of allelic contrast.
The Log R track is similar to the output given by common array-CGH platforms and quantifies the (total) copy number of each genomic locus.
The BAF track shows the relative presence of each of the two alternative nucleotides (called “A” and “B”) at each SNP locus profiled.
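These two tracks are what ASCAT models. As I read Van Loo et al. 2010, take a segment with nA and nB copies, in a sample with aberrant cell fraction rho and tumor ploidy psi. Its expected LogR is r = gamma * log2((2(1 - rho) + rho(nA + nB)) / (2(1 - rho) + rho * psi)). Its expected BAF is b = (1 - rho + rho * nB) / (2 - 2 * rho + rho(nA + nB)). Here gamma is a platform constant. The small R sketch below shows why BAF is needed: copy number-neutral LOH leaves LogR at 0 but pulls BAF away from 0.5. The function names are mine, and gamma = 1, rho = 0.8, psi = 2 are arbitrary illustration values.

```r
# Expected LogR and BAF of a segment under the ASCAT model (Van Loo et al. 2010)
expected_logr <- function(nA, nB, rho, psi, gamma = 1) {
  gamma * log2((2 * (1 - rho) + rho * (nA + nB)) / (2 * (1 - rho) + rho * psi))
}
expected_baf <- function(nA, nB, rho) {
  (1 - rho + rho * nB) / (2 - 2 * rho + rho * (nA + nB))
}

# A few copy number states at rho = 0.8, psi = 2 (arbitrary illustration values)
states <- data.frame(
  state = c("normal 1+1", "CN-neutral LOH 2+0", "deletion LOH 1+0", "gain 2+1"),
  nA = c(1, 2, 1, 2),
  nB = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)
states$logr <- expected_logr(states$nA, states$nB, rho = 0.8, psi = 2)
states$baf  <- expected_baf(states$nA, states$nB, rho = 0.8)
print(states, digits = 3)
```

With rho < 1, the BAF of an LOH segment never reaches 0, because the contaminating normal cells still carry one copy of each allele. In a plotted BAF track, heterozygous SNPs show either b or 1 - b, depending on which allele is labelled B.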

PennCNV
  • To get LRR and BAF, is there really no way around processing the CEL files?
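Once intensities are normalized, the LRR and BAF definitions themselves are simple. This follows the Illumina-style definitions that PennCNV uses. Affymetrix CEL files need extra normalization and genotype clustering first, which PennCNV's Affymetrix protocol handles.

The minimal R sketch below assumes normalized allele intensities X (allele A) and Y (allele B), per-SNP canonical cluster positions theta_AA, theta_AB and theta_BB, and an expected total intensity R_expected. All of these inputs and the function name are hypothetical. Real pipelines estimate the clusters from reference samples and interpolate R_expected from them.

```r
# Simplified LRR/BAF from normalized two-channel intensities
# X = allele A intensity, Y = allele B intensity (hypothetical inputs)
lrr_baf <- function(X, Y, theta_AA, theta_AB, theta_BB, R_expected) {
  R <- X + Y                         # total intensity
  theta <- (2 / pi) * atan2(Y, X)    # allelic angle: 0 = pure A, 1 = pure B
  # BAF: linear interpolation of theta between the canonical genotype clusters
  baf <- ifelse(theta <= theta_AA, 0,
         ifelse(theta <= theta_AB, 0.5 * (theta - theta_AA) / (theta_AB - theta_AA),
         ifelse(theta <= theta_BB, 0.5 + 0.5 * (theta - theta_AB) / (theta_BB - theta_AB),
                1)))
  lrr <- log2(R / R_expected)        # LRR: observed vs expected total intensity
  data.frame(theta = theta, baf = baf, lrr = lrr)
}

out <- lrr_baf(X = c(1, 0.5, 0, 0.3), Y = c(0, 0.5, 1, 0.3),
               theta_AA = 0.05, theta_AB = 0.5, theta_BB = 0.95, R_expected = 1)
print(out)
```

The last SNP has the same allelic ratio as the second (BAF 0.5) but only 60% of the expected total signal, so only its LRR drops.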

-end-
