The heated debate over calling ATAC-seq peaks with MACS (MACS3 is under development)

A hot topic on Twitter: the problem of ATAC peak calling with MACS2

The MACS3 GitHub repository has opened a discussion thread dedicated to ATAC-seq peak calling: https://github.com/macs3-project/MACS/discussions/435

Tao Liu's student, the developer of HMMRATAC, disagrees with Xi Chen's view: https://twitter.com/epigeneticsnerd/status/1337081681141002240

To start, a disclosure: I was @fooliu's grad student and **together we developed HMMRATAC**. As part of that development, I spent countless hours running MACS with various settings.

One unique feature of ATAC-seq, as opposed to ChIP or DNase-seq, is that the transposase can insert into the linker regions between adjacent nucleosomes.

This creates a library containing fragments of various sizes, corresponding to transposase insertion into nucleosome-free regions, and across nucleosome arrays.

We also found that true nucleosome-free regions are marked by an enrichment of short fragments flanked by nucleosome-sized fragments. False positive sites are often called because they contain an enrichment of fragments of just one size (say, those <100 bp).

What happens then is that single-end data, or data forced to be single-end, yields more false positives than properly paired data. Using MACS without -f BAMPE has lower precision than when using -f BAMPE. The HMMRATAC paper has a figure showing this.

However, I understand the reasoning behind this. With ATAC-seq, a researcher is often interested in the TF binding site, and with ATAC-seq this means finding the "footprint". A footprint should be enriched for cutting sites, hence the focus on cut sites.

But again, many of those cutting sites are in the linker regions. So if you call peaks using only cut sites, you will end up looking at a lot of linkers.

Instead, you should seek to call the most accurate peaks with the BAMPE settings, THEN look for the footprint. So how do you do that? With MACS, use --call-summits.

Because, as it turns out, the summit of an ATAC peak is the footprint. We also show this in the **HMMRATAC** paper.

So if you are going to use MACS1/2 for ATAC-seq (though I would instead recommend HMMRATAC or MACS3, which will use the HMMRATAC algorithm for ATAC analysis), then the best settings are -f BAMPE --call-summits.
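As a minimal sketch of what that recommendation looks like as a full MACS2 invocation: in MACS2 the paired-end format is selected with `-f BAMPE`, and summit calling with `--call-summits`. The file name, sample name, and genome size below are placeholders chosen for illustration and are not taken from the thread.

```python
import shlex


def macs2_bampe_command(bam_path, name, genome_size="hs"):
    """Build the argument list for a paired-end MACS2 run with summit calling.

    bam_path, name, and genome_size are illustrative placeholders.
    """
    return [
        "macs2", "callpeak",
        "-t", bam_path,      # treatment file (paired-end BAM)
        "-f", "BAMPE",       # use the real fragments defined by read pairs
        "-g", genome_size,   # effective genome size ("hs" = human preset)
        "-n", name,          # prefix for output files
        "--call-summits",    # report sub-peak summits within each peak
    ]


cmd = macs2_bampe_command("sample.bam", "sample_atac")
# Print a copy-pasteable shell command; run it yourself or via subprocess.run(cmd).
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each flag next to its value and avoids shell-quoting problems if a path contains spaces.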

ATAC peak calling with MACS2
Xi Chen's thread argues that the parameters **-f BED --shift -100 --extsize 200** work better.

ATAC peak calling with MACS2: I know this is a recurring problem but I got asked a few times recently, so I just put all info here for my own ref. We now routinely convert paired BAM to simple BED file and use "-f BED --shift -100 --extsize 200" for the peak calling. Why? 1/7

Like many other peak callers, MACS2 is for ChIP. In ChIP, the reads are flanking the actual binding of the protein. Therefore, many peak callers shift or extend reads towards the mid. of the fragments to reflect the actual binding. MACS2 uses extension, the "--extsize" flag. 2/7

In ATAC/DNase, the mid. of the fragments is not really what we are interested in. Instead, we are interested in the blue and red dots in the fig., which are the cutting sites of the enzyme. Those dots are the start (5' end) of your reads, so the default of MACS2 doesn't fit here. 3/7

Instead, we want to call peaks with fragments centred on the 5' end of your reads. We had this discussion about 6 years ago in the MACS2 Google user group. Both @fooliu and @anshulkundaje provided excellent info. Check this link if you haven't seen it before: https://groups.google.com/g/macs-announcement/c/4OCE59gkpKY/m/v9Tnh9jWriUJ… 4/7
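To see why "--shift -100 --extsize 200" centres the fragment on the 5' end, the arithmetic can be sketched as below. This is a simplified model of the documented behaviour (shift the 5' end along the read's own 5'->3' direction, then extend in that direction), not MACS2's actual code. The function name is hypothetical, and off-by-one details of coordinate conventions are ignored.

```python
def shifted_interval(five_prime, strand, shift, extsize):
    """Return the (start, end) interval covered after shifting and extending.

    five_prime: genomic coordinate of the read's 5' end (the cut site).
    strand:     "+" or "-".
    shift:      offset along the read's 5'->3' direction; negative moves backwards.
    extsize:    length extended from the shifted 5' end in the 5'->3' direction.
    """
    if strand == "+":
        # 5'->3' runs towards larger coordinates on the plus strand.
        start = five_prime + shift
        end = start + extsize
    elif strand == "-":
        # 5'->3' runs towards smaller coordinates on the minus strand.
        end = five_prime - shift
        start = end - extsize
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end


# A cut site at position 1000 ends up in the middle of a 200 bp window
# regardless of which strand the read maps to.
print(shifted_interval(1000, "+", -100, 200))
print(shifted_interval(1000, "-", -100, 200))
```

Whenever the shift is minus half of the extension size, the window is symmetric around the cut site, which is the property the thread is after.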

An update of **MACS2** (ver 2.1.0 20140616) was made after the discussion, so you can freely manipulate the read positions with the combination of the "--shift" and the "--extsize" flags. Then, why convert paired BAM to simple BED to use "-f BED"? 5/7

According to Issue #145 from the MACS2 GitHub https://github.com/macs3-project/MACS/issues/145… , when you set "-f BAM" or "-f BAMPE", MACS2 only takes the left read, ignoring the other. However, in ATAC the 5' ends of both R1 and R2 are of interest. Converting to BED will solve this problem, I guess? 6/7
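The idea of turning each pair into two independent single-end records, so that both 5' ends are counted, can be sketched as follows. The thread does not say how the conversion is done; in practice a tool such as `bedtools bamtobed` would produce the BED records from real alignments. The function name, the default read length, and the assumption of a properly oriented pair (left mate on the plus strand, right mate on the minus strand) are all illustrative.

```python
def fragment_to_read_records(chrom, frag_start, frag_end, read_len=50):
    """Split one paired-end fragment into two single-end BED6 records.

    Assumes a properly oriented pair: the left mate maps to the plus strand
    and the right mate to the minus strand. read_len is a hypothetical value.
    """
    # A read cannot be longer than the fragment it came from.
    read_len = min(read_len, frag_end - frag_start)
    # Plus-strand mate: its 5' end is the fragment start.
    left = (chrom, frag_start, frag_start + read_len, ".", 0, "+")
    # Minus-strand mate: its 5' end is the fragment end.
    right = (chrom, frag_end - read_len, frag_end, ".", 0, "-")
    return [left, right]


# One 180 bp fragment becomes two BED lines, one per cut site.
for record in fragment_to_read_records("chr1", 1000, 1180):
    print("\t".join(str(field) for field in record))
```

With "-f BED", each of these records is then treated as its own read, so the cut sites at both ends of the fragment contribute to the pileup.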

  • Hi, very interesting, thank you! I notice that ENCODE3 and others use **--nomodel --shift -37 --extsize 73**. Do you have any comments on this? Thanks
  • We use 200 out of habit. The choice is arbitrary. Not sure how much difference it makes. Intuitively, a smaller fragLen gives better res., but it can’t be too short. Check Fig1 in this ref (not ATAC though) https://ncbi.nlm.nih.gov/pmc/articles/PMC2596141/… 75 gives better res. Maybe that’s the reason ENCODE uses it.

Not sure if all those modifications will make a huge difference, but the results are different. I haven't tried HMMRATAC, Genrich, MACS3 etc. yet. Will do in future. Hope this helps those who are new to this type of analysis. 7/7

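So which approach is actually better? I hope everyone will join the discussion, and you are welcome to share your views in the MACS3 community.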
https://github.com/macs3-project/MACS/discussions

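I am only a reposter: I carry the material over, and you take from it what is useful. Please don't flame me if something is wrong. If you object to my reposting, I can also choose to stop, since none of this is for profit anyway. I simply want to share what I have seen and heard in the hope that it reaches the people who need it. If it helps even one person, that is enough.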