Motif Analysis with MEME

1 What is motif analysis

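In homologous DNA or protein sequences, positions differ in how strongly they are conserved. In general, positions that have a large effect on the function or structure of the DNA or protein are well conserved, while the remaining positions are much less so. These conserved positions are called a "motif". Motifs were first discovered experimentally. The word "motif" describes a recurring pattern, and a sequence motif is typically a recurring pattern in DNA that is presumed to have a biological function. Such motifs are often binding sites for sequence-specific proteins (e.g. transcription factors) or are involved in important biological processes (e.g. RNA initiation, RNA termination, RNA splicing). The number of identified motifs keeps growing; the TRANSFAC and JASPAR databases, for example, both hold large collections of transcription factor motifs.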
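To make the idea of "conserved positions" concrete, here is a minimal Python sketch that turns a handful of aligned sites into per-position letter frequencies and a consensus string. The four sites are made up for illustration, and the function names `position_frequencies` and `consensus` are hypothetical helpers, not part of MEME or any database API.

```python
from collections import Counter

# Hypothetical aligned binding sites (invented for this example).
sites = [
    "TGACTCA",
    "TGAGTCA",
    "TGACTCA",
    "TGACTAA",
]

def position_frequencies(aligned_sites):
    """Return one {letter: frequency} dict per alignment column."""
    width = len(aligned_sites[0])
    if any(len(s) != width for s in aligned_sites):
        raise ValueError("all sites must have the same length")
    n = len(aligned_sites)
    matrix = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        matrix.append({letter: count / n for letter, count in counts.items()})
    return matrix

def consensus(matrix):
    """Pick the most frequent letter in each column."""
    return "".join(max(col, key=col.get) for col in matrix)

pfm = position_frequencies(sites)
print(consensus(pfm))
for position, col in enumerate(pfm, start=1):
    print(position, col)
```

Columns where a single letter has frequency 1.0 are the fully conserved positions; columns 4 and 6 in this toy data are the variable ones.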

2 Software for motif analysis

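Many programs are available for motif analysis; common ones include motif-x, MochiView, and CisGenome. Most of them, however, are web-based, so they cannot be run in batch and are hard to automate. MEME is a classic motif analysis package. Besides the online version it also has a Linux version, and it works with DNA, RNA, and protein sequences. It provides several functions, including motif prediction, motif enrichment analysis, and motif comparison.
MEME website: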

2.1 How MEME works

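MEME is a toolkit made up of several programs. MEME itself performs motif discovery and does not allow gaps within a motif. MAST takes a motif found by MEME and searches for it in other sequences. It is a follow-up analysis to MEME and can be started either by following the hyperlink once a MEME run has finished or from a saved MEME text-format file. GLAM2 is similar to MEME but allows gaps in the motif. GLAM2SCAN is similar to MAST; the difference is that MAST does not allow gaps in the motif, whereas GLAM2SCAN does. MEME comes in a web version and a Linux version. The overall design logic of the toolkit is as follows: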


[Figure: design logic of the MEME toolkit]
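The "discover a motif, then look for it in other sequences without gaps" workflow can be sketched in a few lines of Python. This is only an illustration of ungapped scanning with a log-odds matrix; it is not MAST's actual algorithm or scoring. The sites, the uniform background, the pseudocount of 1, and the helper names `log_odds_matrix` and `best_ungapped_hit` are all assumptions made for this example.

```python
import math

# Hypothetical sites standing in for a motif that was already discovered.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]

def log_odds_matrix(aligned_sites, alphabet="ACGT", pseudocount=1.0):
    """Per-column log2(frequency / background), assuming a uniform background."""
    n = len(aligned_sites)
    background = 1.0 / len(alphabet)
    matrix = []
    for column in zip(*aligned_sites):
        scores = {}
        for letter in alphabet:
            freq = (column.count(letter) + pseudocount) / (n + pseudocount * len(alphabet))
            scores[letter] = math.log2(freq / background)
        matrix.append(scores)
    return matrix

def best_ungapped_hit(sequence, matrix):
    """Slide a fixed-width window along the sequence; no gaps are allowed."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score, window)
    return best  # None if the sequence is shorter than the motif

matrix = log_odds_matrix(sites)
print(best_ungapped_hit("CCGTTGACTCAGGA", matrix))
```

Because every window has exactly the motif's width, an insertion or deletion inside a site cannot be matched this way, which is the limitation that the gapped tools (GLAM2, GLAM2SCAN) address. Letters outside the alphabet (such as N) would raise a KeyError in this sketch.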

2.2 Running MEME

2.2.1 Usage example

meme test.fa -protein -oc result -nostatus -time 1800000 -mod zoops -nmotifs 3 -minw 6 -maxw 13 -objfun classic -markov_order 0
(these are the same parameters the web version uses)

2.2.2 Parameter notes

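-protein the input sequences are protein sequences
-oc result output directory (an existing directory is replaced)
-nostatus do not print progress reports to the terminal
-time 1800000 quit before this many CPU seconds have been consumed
-mod zoops distribution of motif sites (one of oops, zoops, anr)
-nmotifs 3 maximum number of motifs to find
-minw 6 minimum motif width
-maxw 13 maximum motif width
-objfun classic objective function (classic is the default)
-markov_order 0 (maximum) order of the background Markov model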

2.2.3 Detailed parameter reference

Usage: meme <dataset> [optional arguments]
<dataset> file containing sequences in FASTA format
[-h] print this message
[-o <output dir>] name of directory for output files; will not replace existing directory
[-oc <output dir>] name of directory for output files; will replace existing directory
[-text] output in text format (default is HTML)
[-objfun classic|de|se|cd|ce] objective function (default: classic)
[-test mhg|mbn|mrs] statistical test type (default: mhg)
[-use_llr] use LLR in search for starts in Classic mode
[-neg <negdataset>] file containing control sequences
[-shuf <kmer>] preserve frequencies of k-mers of size <kmer> when shuffling (default: 2)
[-hsfrac <hsfrac>] fraction of primary sequences in holdout set (default: 0.5)
[-cefrac <cefrac>] fraction sequence length for CE region (default: 0.25)
[-searchsize <ssize>] maximum portion of primary dataset to use for motif search (in characters)
[-maxsize <maxsize>] maximum dataset size in characters
[-norand] do not randomize the order of the input sequences with -searchsize
[-csites <csites>] maximum number of sites for EM in Classic mode
[-seed <seed>] random seed for shuffling and sampling
[-dna] sequences use DNA alphabet
[-rna] sequences use RNA alphabet
[-protein] sequences use protein alphabet
[-alph <alph file>] sequences use custom alphabet
[-revcomp] allow sites on + or - DNA strands
[-pal] force palindromes (requires -dna)
[-mod oops|zoops|anr] distribution of motifs
[-nmotifs <nmotifs>] maximum number of motifs to find
[-evt <ev>] stop if motif E-value greater than <ev>
[-time <t>] quit before <t> CPU seconds consumed
[-nsites <sites>] number of sites for each motif
[-minsites <minsites>] minimum number of sites for each motif
[-maxsites <maxsites>] maximum number of sites for each motif
[-wnsites <wnsites>] weight on expected number of sites
[-w <w>] motif width
[-minw <minw>] minimum motif width
[-maxw <maxw>] maximum motif width
[-allw] test starts of all widths from minw to maxw
[-nomatrim] do not adjust motif width using multiple alignment
[-wg <wg>] gap opening cost for multiple alignments
[-ws <ws>] gap extension cost for multiple alignments
[-noendgaps] do not count end gaps in multiple alignments
[-bfile <bfile>] name of background Markov model file
[-markov_order <order>] (maximum) order of Markov model to use or create
[-psp <pspfile>] name of positional priors file
[-maxiter <maxiter>] maximum EM iterations to run
[-distance <distance>] EM convergence criterion
[-prior dirichlet|dmix|mega|megap|addone] type of prior to use
[-b <b>] strength of the prior
[-plib <plib>] name of Dirichlet prior file
[-spfuzz <spfuzz>] fuzziness of sequence to theta mapping
[-spmap uni|pam] starting point seq to theta mapping type
[-cons <cons>] consensus sequence to start EM from
[-brief <n>] omit sites and sequence tables in output if more than <n> primary sequences
[-nostatus] do not print progress reports to terminal
[-p <np>] use parallel version with <np> processors
[-sf <sf>] print <sf> as name of sequence file
[-V] verbose mode
[-version] display the version number and exit

2.2.4 Output files and interpretation

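meme.html - results in an interactive, human-readable HTML format
meme.txt - plain-text results, compatible with earlier MEME versions
meme.xml - results in XML format, designed for machine processing
logoN.png, logoN.eps - motif logo files in PNG and EPS formats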
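For automated pipelines, the text output is often the easiest file to post-process. The sketch below pulls the letter-probability matrices out of MEME-style text. It assumes each matrix is introduced by a header line of the form `letter-probability matrix: alength= ... w= ...` followed by `w` rows of numbers; check this against your own meme.txt, since the layout may differ between versions. The `SAMPLE` text is hand-written for illustration (it is not real MEME output), and `read_probability_matrices` is a hypothetical helper.

```python
import re

# Hand-written sample in the assumed layout; not produced by MEME.
SAMPLE = """\
MOTIF TGA MEME-1 width = 3 sites = 4 llr = 10 E-value = 1.0e-002
letter-probability matrix: alength= 4 w= 3 nsites= 4 E= 1.0e-002
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  1.000000  0.000000
 0.750000  0.250000  0.000000  0.000000
"""

# Captures the motif width from the matrix header line.
HEADER = re.compile(r"letter-probability matrix:.*?\bw=\s*(\d+)")

def read_probability_matrices(text):
    """Return a list of matrices; each matrix is a list of rows of floats."""
    lines = text.splitlines()
    matrices = []
    i = 0
    while i < len(lines):
        match = HEADER.search(lines[i])
        if match:
            width = int(match.group(1))
            rows = [[float(x) for x in row.split()]
                    for row in lines[i + 1:i + 1 + width]]
            matrices.append(rows)
            i += width  # skip past the rows that were just consumed
        i += 1
    return matrices

for matrix in read_probability_matrices(SAMPLE):
    for row in matrix:
        print(row)
```

Each row is one motif position, and the columns follow the letter order of the alphabet declared in the file, so a protein run would have 20 numbers per row instead of 4.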


[Figure: example motif logo from the MEME results]

Note: the size of each amino acid letter reflects how frequently that amino acid occurs at that position.
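As a rough illustration of how letter sizes in a logo relate to frequencies, the sketch below uses the common sequence-logo convention: the total height of a column is its information content in bits, and each letter gets a share proportional to its frequency. This is an assumption about the general convention, not a statement of exactly how MEME draws its logos (for example, no small-sample correction is applied here). The two columns are invented, and `letter_heights` is a hypothetical helper.

```python
import math

def letter_heights(column, alphabet_size):
    """Height of each letter in bits: frequency * column information content."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    information = math.log2(alphabet_size) - entropy
    return {letter: p * information for letter, p in column.items()}

# Hypothetical protein columns (20-letter alphabet).
conserved = {"L": 1.0}            # always leucine
mixed = {"L": 0.5, "V": 0.5}      # leucine or valine, equally often

print(letter_heights(conserved, 20))
print(letter_heights(mixed, 20))
```

A fully conserved position yields one tall letter, while a position shared by two residues yields two shorter letters of equal height, which matches what the logo shows visually.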

2.3 Notes

a) MEME does not support gaps within a motif.
b) On Linux, motif detection uses the same parameters as the web version of MEME.

2.4 Citation

Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.

