生信漫谈如何利用MEGA7构建系统进化树

前言

生物技术近年发展越来迅猛,掌握一门生信语言或者一个生信软件的使用,这将为我们的科研学习之路提供非常大的便利。今天我们主要来介绍如何用MEGA7进行进化树。

1、下面以MEGA7为例来进行讲解,下面是下载地址,大家根据自己的系统进行下载即可。

http://www.megasoftware.net/

生信漫谈如何利用MEGA7构建系统进化树_第1张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第2张图片

 

2、序列的准备,必须是fasta结尾的格式,其他像txt格式,软件不能识别,以下以拟南芥SPL15基因的蛋白序列为例,进行同源序列查找

>NP_191351.1 SPL15 [organism=Arabidopsis thaliana] [GeneID=824961]
MELLMCSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSN
VKAYYSRHKVCCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALF
TSHYSRIAPSLYGNPNAAMIKSVLGDPTAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPE
MINNNSTDSSCALSLLSNSYPIHQQQLQTPTNTWRPSSGFDSMISFSDKVTMAQPPPISTHQPPISTHQQ
YLSQTWEVIAGEKSNSHYMSPVSQISEPADFQISNGTTMGGFELYLHQQVLKQYMEPENTRAYDSSPQHF
NWSL

生信漫谈如何利用MEGA7构建系统进化树_第3张图片

选择其中同源序列高的前19条蛋白序列进行下载进行示范

生信漫谈如何利用MEGA7构建系统进化树_第4张图片

 

下载后的序列形式:

>AST51816.1 Venus [Cloning vector pSTB205]
MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKLICTTGKLPVPWPTLVTTLGYGLQCFARYPDHMK
QHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKN
GIKANFKIRHNIEDGGVQLADHYQQNTPIGDGPVLLPDNHYLSYQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKE
LLMCSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSNVKAYYSRHKVCC
IHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSHYSRIAPSLYGNPNAAMIKS
VLGDPTAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPEMINNNSTDSSCALSLLSNSYPIHQQQLQTPTN
TWRPSSGFDSMISFSDKVTMAQPPPISTHQPPISTHQQYLSQTWEVIAGEKSNSHYMSPVSQISEPADFQISNGTTMGGF
ELYLHQQVLKQYMEPENTRAYDSSPQHFNWSL
>NP_191351.1 squamosa promoter binding protein-like 15 [Arabidopsis thaliana]
MELLMCSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSHYSRIAPSLYGNPNAAMI
KSVLGDPTAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPEMINNNSTDSSCALSLLSNSYPIHQQQLQTP
TNTWRPSSGFDSMISFSDKVTMAQPPPISTHQPPISTHQQYLSQTWEVIAGEKSNSHYMSPVSQISEPADFQISNGTTMG
GFELYLHQQVLKQYMEPENTRAYDSSPQHFNWSL
>KAG7634825.1 SBP domain superfamily [Arabidopsis suecica]
MELLMGSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSHYSRIAPSLYGNPNAAMI
KSVLGDPTAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPEMINNNSTDSSCALSLLSNSYPIHQQQLQTP
TNTWRPSSGFDSMISFSDKVTMAQPPPISTHQPPISTHQQYLSQTWEVIAGEKSNSHYMSPVSQISEPADFQISNGTTMG
GFELYLHQQVLKQYMEPENTRAYDSSPQHFNWSL
>CAA0387110.1 unnamed protein product [Arabidopsis thaliana]
MELLMGSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSHYSRIAPSLYGNPNAAMI
KSVLGDPTAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPEMINNNSTDSSCALSLLSNSYPIHQQQLQTP
TNTWRPSSGFDSMISFSDKVTMAQPPPISTHQPPISTHQQYLSQTWEVIAGEKSNSHYMSPVSQISEPVDFQISNGTTMG
GFELYLHQQVLKQYMEPENTRAYDSSPQHFNWSL
>CAD5326126.1 unnamed protein product [Arabidopsis thaliana]
MELLMGSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSRSKNRVNTVRKSSTTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSHYSRIAPSLYGNPNAAMIKSVLGDP
TAWSTARSVMQRPGPWQINPVRETHPHMNVLSHGSSSFTTCPEMINNNSTDSSCALSLLSNSYPIHQQQLQTPTNTWRPS
SGFDSMISFSDKVTMAQPPPISTHQPPISTHQQYLSQTWEVIAGEKSNSHYMSPVSQISEPVDFQISNGTTMGGFELYLH
QQVLKQYMEPENTRAYDSSPQHFNWSL
>KAG7561265.1 SBP domain superfamily [Arabidopsis thaliana x Arabidopsis arenosa]
MELLMGSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSGSKNRVNTGRKSTMTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSRYSRIAPSLYGNPNAAMI
KSVLGDPMAWSTAKSVMRRSGPWQINPERESHQLLNVLSHGSSSFTTCPEIINNNSTDSSCALSLLSNSNPIQQQQLQTP
TNLWRPSSGFDSLISFSDRVTMAQPPPISTHHQYLSQTWEVMAGEKSNSHYISPVSQISEPADFQISNGTTMGGFELSLH
QQVLRQYMEPENTRAYDSSPQHFNWSL
>XP_002878178.1 squamosa promoter-binding-like protein 15 [Arabidopsis lyrata subsp. lyrata]
MELLMGSGQAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSGSKNRVNTGRKSTMTARCQVEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTALFTSRYTRIAPSLYGNPNAAMI
KSVLGDPTAWSTARSVMRRSGPWQINPERESHQIMNVLSHGSSSFTTCPEITNNNSTDSSCALSLLSNSNPIQQQQLQTP
TNLWRPSSGFDSMISFSDRVTMAQPPPISTHHQYLSQTWDVMAGGKSNSHYMSPVSQISEPAEFQISNGTTMGGFELSLH
QQVLRQYMEPENTRAYDSSPQHFNWSL
>KAG7566101.1 SBP domain [Arabidopsis suecica]
MELLMGSGHAESGGSSSTESSSLSGGLRFGQKIYFEDGSGSGSKNRVNTGRKSTMTARCQLEGCRMDLSNVKAYYSRHKV
CCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTSSLFTSRYSRIAPSLYGNPNAAMI
KSVLGDPMAWSTAKSVMRRSGPWQINPERESHQLLNVLSHGSSSFTTCPEIINNNSTDSSCALSLLSNSNPIQQQQLQTP
TNLWRPSSGFDSLISFSDRVTMAQPPPISTHHQYLSQTWEVMAGEKSNSHYISPVSQISEPAGFQISNGTTMGGFELSLH
QQVLRQYMEPENTRAYDSSPQHFNWSL
>CAE6076605.1 unnamed protein product [Arabidopsis arenosa]
MRRGRGKGKRQNATAREDRGSGEEEKIPAFRRRGRPQKPVKDEIEEEEVELVKKTEEEEDKDDDTNGSVTSKEDVTENGR
KRKKPVESKESNITEEENGVGSKSSTEDSMKSSSSIGFRQNGSRRKNKPRRAAEAVVECNGAESGGSSSTESSSLSGGLR
FGQKIYFEDGSGSGSKNRVNTGRKSTMTARCQVEGCRMDLSNVKAYYSRHKVCCIHSKSSKVIVSGLHQRFCQQCSRFHQ
LSEFDLEKRSCRRRLACHNERRRKPQSTTSLFTSRYSRIAPSLYGNPNAAMIKSVLGDPMAWSTAKSVMRRSGPWQINPE
RESHQLLNVLSHGSSSFTTCPEIINNNSTDSSCALSLLSNSNPIQQQQLQTPTNLWRPSSGFDSLISFSDRVTMAQPPPI
STHHQYLSQTWEVMAGEKSNSHYISPVSQISEPADFQISNGSTMGGFELSLHQQVLRQYMEPENTRAYDSSPQHFNWSL
>XP_006291402.1 squamosa promoter-binding-like protein 15 [Capsella rubella]
MELLMGSGQAESGGSSSTESSLLSGGLRFGQKIYFEDGSGSGSKNRVSTGHKSSMTTVARCQVEGCKMDLSNAKAYYSRH
KVCCIHSKSSKVIVSGLHQRFCQQCSRFHHLSEFDLEKRSCRRRLACHNERRRKPQPATLFTSHYTRIAPSLYGNANAAM
IKSVLGDPTAWSTSRSVMRSSGPWQINPVKESNQLMNVYSQESSSFTITCPEMMNNNSTDSGCALSLLSNSNPIQQQQQQ
PQTQTNIWRSSSGFDSMILDRVTMAQPPPISGHHQYLNQTLAFMAGEKSNSHYMSPVLGPSQISEPDEFQISNGTTMDGF
ELSLHQQVLRQYMEPENTRAYDSSPHYFNWSL
>CAH2063751.1 unnamed protein product [Thlaspi arvense]
MELLMGSGQNRTESYGSSSTESSSLSGGLRFGQKIYFEDGSGSGGGSNKNRVNTGRKSRTARCQVEGCRMDLSNVKTYYS
RHKVCCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQATTSLLTSRYSRIAPSLYGNAN
TAMIRSVLGDPTAWSTARSVMRRSAPWQINPERESHQLMNVFSHDSSSFTTTCPEMMNSNGTDSSCALSLLSNSNTNQQQ
QLLQTSTNIWRPSSGFDSANADRATMAQPPPVSNQHQYLNQTWEFMAGEKSNSHYLSPVLGLSQISEPVDFQISNGTTMG
GFELSIHQQVLRHYMEPENTRAYDSSAQHFNWSL
>XP_010516431.1 PREDICTED: squamosa promoter-binding-like protein 15 [Camelina sativa]
MELLIGGSGQTESGGASSTKSSSLSGGLRFGQKIYFEDGSGSGSKNRVGTGHKSSTTTTTARCQVEGCKMDLSNAKAYYS
RHKVCCIHSKSSKVIVSGLRQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTLYTSQYTRIAPSLYGDANA
AMMKSVLGDPTVWSTARSVMRRSGPWQISPVKESHHQLMNVFSQESSSFTITCPEMMNNNSTDSSCALSLLSNSNSNSNP
IQQQQQQLQTQTHIWRPSLGFDSMTVDRVTMAQPPPISSHHQYLNQTLEFMAGEKSSSHYMSPVLGPSQISEPDEFQISN
GTTMDGFELSLHQQVLRQYMEPENTRAYDSSPHHFNWSL
>AKC05620.1 squamosa promoter-binding-like protein 15 [Cardamine hirsuta]
MELLMGSGQSESGASSSNESSSLSGGLRFGQKIYFEDGSGSGSKNRVSSTGRKSSTTTARCQVEGCRMDLSNAKTYYSRH
KVCCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPATTLFTSRFTRTAPSHYGNANAA
MIKSVLGDPTAWTAERSVMRRSAPWQSNPSHQVMIDFSHGSSSLTTTCPEMMNNTSTDSSCALSLLSNSNQTQQLQQQLQ
TPANIWRASSGFDSMIADRVTMAQPPPISTHHQYLNQSWEFMPGEKNDSHYMSPMSQISEPADLHMRNRTTMGGFEVSLH
QQVMRQYMAPENTRAYDSSPQHFNWSL
>XP_010504729.1 PREDICTED: squamosa promoter-binding-like protein 15 [Camelina sativa]
MELLMGGSGQTESGGASSTESSSLSGGLRFGQKIYFEDGSGSGSKNRVGAGHKSSTTARCQVEGCKMDLSNAKAYYSRHK
VCCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTLYTRIAASLYGNANAAMIKSVL
GDPTVWSTARSVMRRSGPWQINPVKESHHQHMNVFSQESSSFTITCPEMMNNNSTDSSCALSLLSNSNSNPIQQQQQQLQ
TQTNIWRPSSGFDYMTVDRVTLAQPPPIPSHHQYLNQTLEFMTGEKNSSHYMSPALGPSQISAPDEFQISNGTTMDGFEL
SLHQQVLRQYMAPENTRAYDSSPHHFNWSL
>CAA7060637.1 unnamed protein product [Microthlaspi erraticum]
MELLMDSSQTESGGSSSIESSSLTGGLRFGQKIYFEDGSGSGAKSSKNRVNTARKSSTSTARCQVEGCRMDLSNAKTYYS
RHKVCCIHSKSSNVIVSGLHQRFHLLSEFDLEKRSCRRRLACHNERRRKPHATTNLLTSRYSRIAPSLYENANTAIFRSV
LGDTTAWSAARPVMRRSGPWQINPERESNLNVFSHGSSSFTTCPAMMNNNSTDSSCALSLLSNSNTNTNQQQQQPLQTST
DTWRPSSGFDSMIADRVTMAQPPPVSIHNQYLNQSWDFMEGEKSNSHHMSPVLGLSQISEPADFQLSNGMGGGFELSLHQ
QVLKQYMEPENTRAYDSSPQHFNWSL
>KAG2324838.1 hypothetical protein Bca52824_007566 [Brassica carinata]
MELLMGSGQDHPQSAGSSSTLSGGLRFGQKIYFEDGSGAGLSRNRVNNTGRKSMTARCQVEGCRMDLSNAKTYYSRHKVC
CVHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQTTTTLLTSHYSSIAPSLYGNAIRSVLG
DPTLWSTARGSSAPWQINPERESHHQLMNIISFGSSSFTNSTDSSCALSLLSNSNRNQQEQQPLQTPTNAWRPSLDFDSI
VADRVTMAQPPPVSIQNQYLNQTWEFMSGEKSNAHCISPVLGLSQISEPVDFQTSNGATMSGVELSLHQQVLRQYLEPEN
TRAYDSSHQHFNWSL
>CAH8384605.1 unnamed protein product [Eruca vesicaria subsp. sativa]
MELEMGSGQKKPESAGSSSTLSGGLRFGQKIYFEDGSGAGLSKNRVSSTGRKSMTARCQVEGCRTDLSNAKTYYSRHKVC
CVHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQATTTLLTSRYSSLYGNAIRSVLGDPTT
WSTARGSAPWKINQESDRHQLMNVISFGSSSFTTCPEMMNNNSTDSSCALSLLSNSNPNQQEQQPLQTSNTIWRPSLDFD
STVADRVTMAQPPPVSMQNQYLNQTWEFMSGEKSNAQCISPVLGQSQISEPVDFQIGTTMGGGFELSLHQQVLRQYMEPE
NTRAYDTSPQYFNWSL
>KAF8114775.1 hypothetical protein N665_0034s0114 [Sinapis alba]
MELLMGSGQNQPESAGSSSSTLSGGLRFGQKIYFEDGSGAGLSKNRVNTGRKSTTARCQVEGCRMDLSSAKTYYSRHKVC
CIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQATTTFLTSHYSSIAPSLYGNAIRGVLG
DSTTWSTARGSAPLQINPERESHRLMNVFSFGSSSFTNNSTDSSCALSLLSNSNPNQQEQQPLQTPTNTWRPSLDFDSIV
ADRVTMAQPPPVSVQNQYLNQTWEFMSGEKSNGQHYISPVLGLSQISEPVDFQISNGATMSGVELSLHQQVLRQYLEPEN
TRAYDSSPQHFNWSL
>XP_010427684.1 PREDICTED: squamosa promoter-binding-like protein 15 [Camelina sativa]
MELLMGGTESGGASSTESSSLSGGLRFGQKIYFEDGSGSGSKNRVVTGHKSSTTTTTARCQVEGCKMDLSNAKAYYSRHK
VCCIHSKSSKVIVSGLHQRFCQQCSRFHQLSEFDLEKRSCRRRLACHNERRRKPQPTTLFTSHYTRIAPSLYGNANAAMI
KSVLGDPTVWSTARSVMRRSGPWQINPVKESHHQLMNVFSQESSSFTITCPEMMNNNNSTDSSCALSLLSNSNSNPIQQQ
QQQLQTQTNIWRPSLGFDSMTVDRVTLAQPPPILSHHQYMSPVLGPSQISAPDEFQISNVTTMDGFELSLHQQVLRQYME
PQNTRAYDSSPHHFNWSL
 
  
 
  

3、导入蛋白序列,点击File菜单栏导入或者直接拖进MEGA7软件都可以

生信漫谈如何利用MEGA7构建系统进化树_第5张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第6张图片

 

以上任一种形式都可以。

4、多序列比对,导入成功后如下图所示。

生信漫谈如何利用MEGA7构建系统进化树_第7张图片

 

选择Alignment > Align by ClustalW > OK > 默认参数

生信漫谈如何利用MEGA7构建系统进化树_第8张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第9张图片

 

比对后的结果如下图:

生信漫谈如何利用MEGA7构建系统进化树_第10张图片

 

5、系统进化树构建,选择NJ法进行构建系统进化树

生信漫谈如何利用MEGA7构建系统进化树_第11张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第12张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第13张图片

 

6、结果展示

第一种,步长树

生信漫谈如何利用MEGA7构建系统进化树_第14张图片

 

第二种步长对齐树

生信漫谈如何利用MEGA7构建系统进化树_第15张图片

 

其他形状的树,点击以上按钮进行展示

生信漫谈如何利用MEGA7构建系统进化树_第16张图片

 

7、保存nwk树文件,导入树图片进行美化

生信漫谈如何利用MEGA7构建系统进化树_第17张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第18张图片

 

生信漫谈如何利用MEGA7构建系统进化树_第19张图片

 

生信漫谈

生信漫谈,认识生信,学习生信,跨越生信入门路上的障碍,从而利用生信技术解决科研学习路上的绊脚石!

你可能感兴趣的:(生信干货,学习方法)