Chinese Spring methylation data


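Methylation is an important research area in epigenetics. The reference genome of the wheat cultivar Chinese Spring was recently published in Science, and the paper includes methylation data. To make these data easy to use in day-to-day research, we invited Guo Weilong's team at China Agricultural University to analyze and process them, and the results are now available on our wheat multi-omics website. The details are described below.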

1 Data source

NCBI accession: SRP133674

Paper: Shifting the limits in wheat research and breeding using a fully annotated reference genome

Sampling stage

Cytosine methylation was profiled in DNA extracted from two-week old CS leaf tissue in three different contexts: CpG dinucleotides, CHG and CHH (where H corresponds to A, T or C). The frozen leaves from the five samples at 3-leaf stage (Zadok stage 13) were ground and divided as input for the preparation of both RNA-seq libraries (detailed in Chinese Spring tissues study) and whole genome bisulfite sequencing (WGBS) libraries.
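To make the three sequence contexts concrete, here is a minimal Python sketch that classifies a cytosine by the two bases that follow it. It is only an illustration: the function name `cytosine_context` is made up for this post, it looks at the forward strand only, and it is not code taken from BS-Seeker2 or CGmapTools.

```python
from typing import Optional

# H stands for any base other than G (A, C or T), as defined in the text above.
H_BASES = set("ACT")


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for a forward-strand C at index i.

    Returns None when position i is not a C or when the downstream
    bases are missing or ambiguous, so the context cannot be decided.
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    # C followed directly by G is a CpG site; the third base is irrelevant.
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    # CHG and CHH both need two known downstream bases.
    if len(downstream) < 2:
        return None
    first, second = downstream
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None
```

For a cytosine on the reverse strand, the same function can be applied to the reverse complement of the sequence.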

2 Summary of results

As mentioned above, these data come from the Chinese Spring reference genome paper in Science. Below we summarize that paper's methylation results.

Wheat DNA methylation frequency of cytosines was profiled in the sequence contexts of CpG (average 92.7%), CHG (average 51.3%) and CHH (average 2.7%). The observed levels of cytosine methylation are among the highest observed in angiosperms (161), likely reflecting the abundance of repetitive elements throughout the wheat genome. Methylation patterns in wheat largely follow those observed in other species, showing enrichment in CpG and CHG sequence contexts at pericentromeric regions (gene poor) and depletion toward the chromosome ends (gene rich).
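As a rough illustration of how per-context averages like these can be derived, the sketch below sums methylated and total read counts per context from CGmap-style lines. It assumes the tab-separated CGmap layout documented for BS-Seeker2 (chromosome, nucleotide, position, context, dinucleotide context, methylation level, methylated count, total count). The function name `context_levels` is hypothetical, and the paper may have averaged differently (for example, as a mean of per-site levels), so this will not necessarily reproduce the published percentages.

```python
from collections import defaultdict
from typing import Dict, Iterable


def context_levels(cgmap_lines: Iterable[str]) -> Dict[str, float]:
    """Read-weighted methylation level per context (CG, CHG, CHH).

    Each line is expected to have 8 tab-separated CGmap columns;
    shorter lines are skipped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for line in cgmap_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        context = fields[3]
        methylated[context] += int(fields[6])  # reads supporting a methylated C
        total[context] += int(fields[7])       # all reads covering the site
    return {c: methylated[c] / total[c] for c in total if total[c] > 0}
```

With a real file, the lines can be passed directly from an open file handle, e.g. `context_levels(open(path))`, where `path` points to an uncompressed CGmap file.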

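First, consider the methylation pattern of high confidence (HC) genes. As the figure below shows, CpG and CHG methylation is relatively low in the gene coding region and relatively high in the upstream promoter and downstream regions, while CHH methylation stays relatively flat. When analyzing your own gene, you can check whether it follows this pattern.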

Figure: methylation pattern of high confidence genes (TSS = transcription start site; TTS = transcription termination site)

High rates of DNA methylation likely serve to prevent transposition by restricting the expression of transposable elements. However, where repetitive elements are proximal to gene sequences, the enriched methylation can perform a regulatory function, predominantly silencing expression. The distinct and highly conserved methylation patterns observed in regions of HC genes and their regulatory regions showed higher levels of DNA methylation associated with the 5’ regulatory regions in all contexts that diminished rapidly at the transcriptional start site (TSS).

What about low confidence (LC) genes? As the figure below shows, all three contexts are relatively flat.

Figure: methylation pattern of low confidence genes (TSS = transcription start site; TTS = transcription termination site)

DNA methylation increased in the gene body where the CpG methylation formed a peak, whereas gene body methylation levels remained at extremely low levels at CHG and CHH sites. In the 3’ regulatory region after the transcriptional termination site (TTS) methylation rapidly reverted to the levels in 5’ sequences. This contrasted with the pattern observed for LC genes, where a near uniform level of methylation was observed in all sequence contexts. As a conclusion, many of the features included in the LC annotation are either not genes, are truncated, or have lost their function through mutation (i.e. pseudogenes).

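One important point: methylation is a dynamic process that changes across developmental stages and environments, so some of these conclusions should be read with that caveat in mind.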

Figure: methylation patterns of Copia repeat elements and Gypsy (RLG) repeat elements (image: https://wheat-1252088472.picsh.myqcloud.com/2018-10-12-080817.png). By comparison, TE sequences are much more heavily methylated.

3 Methylation analysis

Guo Weilong of China Agricultural University developed the bisulfite-read mapping software BS-Seeker2 (BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data) and the downstream methylation analysis software CGmapTools (CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data).

The detailed analysis workflow is described here.

Points to note:

1. Each chromosome has to be split into two parts; that is, build the genome index from the officially provided 161010_Chinese_Spring_v1.0_pseudomolecules_parts.fasta.

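2. When using bs_seeker2-call_methylation.py, do not call methylation on the whole genome in a single run. First, it is too slow; second, a whole-genome run triggers a bug (it is not yet clear whether others have hit it too). Briefly, my test went as follows: (a) run call methylation on the whole genome and stop the run as soon as the program reports that the 1A portion has finished; (b) extract the 1A alignments into a separate BAM file and call methylation on 1A alone; (c) merge 1A and 2A and call methylation on them together. The whole-genome result differed from the other two, whereas the 1A-alone and 1A-plus-2A runs gave identical results.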
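The per-chromosome approach and the comparison described in point 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the scripts used for the site: the helper names `split_commands`, `parse_cgmap` and `differing_sites` are made up, sequence names such as `chr1A_part1` are hypothetical placeholders for whatever names appear in the parts FASTA, `samtools view` with a region needs a coordinate-sorted, indexed BAM, and the CGmap column layout is assumed to be the one documented for BS-Seeker2.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]          # (sequence name, position)
Counts = Tuple[int, int]        # (methylated reads, total reads)


def split_commands(bam: str, seq_names: Iterable[str]) -> List[List[str]]:
    """Build one `samtools view` command per sequence to extract its reads.

    The commands are returned as argument lists (e.g. for subprocess.run);
    nothing is executed here.
    """
    return [
        ["samtools", "view", "-b", "-o", f"{name}.bam", bam, name]
        for name in seq_names
    ]


def parse_cgmap(lines: Iterable[str]) -> Dict[Site, Counts]:
    """Map each CGmap site to its (methylated, total) read counts."""
    sites: Dict[Site, Counts] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed or truncated lines
        sites[(fields[0], int(fields[2]))] = (int(fields[6]), int(fields[7]))
    return sites


def differing_sites(lines_a: Iterable[str], lines_b: Iterable[str],
                    seq_name: str) -> List[Site]:
    """List sites on seq_name whose counts differ between two CGmap outputs.

    A site present in only one of the two outputs also counts as different.
    """
    a = {k: v for k, v in parse_cgmap(lines_a).items() if k[0] == seq_name}
    b = {k: v for k, v in parse_cgmap(lines_b).items() if k[0] == seq_name}
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from `differing_sites` for the 1A sequences would correspond to the "identical results" case above, while a non-empty list would flag the kind of discrepancy seen with the whole-genome run. Each extracted BAM would then be passed to bs_seeker2-call_methylation.py separately; check its `--help` output for the exact options.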

4 JBrowse display

You can currently look up the methylation level of any gene of interest on our website (http://202.194.139.32).

Enter a transcript name, such as *TraesCS7A02G208100.1*, in the box marked by the green arrow.


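Let's look at an example. In rice, the GS5 gene controls grain shape and grain weight. In wheat, GS5 (TraesCS3A02G212900LC, TraesCS3B02G277100LC and TraesCS3D02G172900) has also been cloned by homology by several groups. The 3B copy carries two large insertions that disrupt the gene structure, and the methylation level of both inserted sequences is relatively high (see the figure below).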

Figure: TaGS5


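A final reminder: the methylation shown here was measured in seedling leaves and does not necessarily reflect methylation levels in other tissues.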
