Ka/Ks Calculation

Introduction

A simple, multithreaded way to quickly compute Ka/Ks for homologous gene pairs.

Dependencies

  • ParaAT2.0
  • KaKs_Calculator2.0
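
Before a long run it can help to confirm that the tools are reachable on PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper, not part of either package, and the executable names you pass to it are assumptions that depend on how the tools were installed.

```shell
# check_tools: hypothetical helper that reports every command it cannot
# find on PATH and returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Once PATH has been exported, a call such as `check_tools ParaAT.pl KaKs_Calculator muscle || exit 1` stops the script early. The names `KaKs_Calculator` and `muscle` are assumed here; adjust them to the binaries on your system.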

ParaAT usage

export PATH=/storage_wut/user/software/ParaAT2.0:$PATH

cd /storage_wut/user/software/ParaAT2.0

ParaAT.pl -h test.homologs -n test.cds -a test.pep -p proc -o output -f axt
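--------------------------------
-h, list of homologous gene pairs
-n, nucleotide (CDS) sequence file
-a, protein sequence file
-p, thread-count file                   ## the file contains the number of threads; default is 6
-m, alignment tool                      ## muscle
-g, remove codons that contain gaps in the alignment
-k, run KaKs_Calculator                 ## computes the Ka and Ks values
-o, output directory
-f, format of the output alignment files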
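
A common cause of failed runs is an ID in the homolog list that has no matching sequence. The sketch below lists such IDs. It assumes the list is tab-separated and that a FASTA ID is the header text up to the first whitespace; `missing_ids` is a hypothetical helper, not a ParaAT feature.

```shell
# missing_ids: print every ID from the homolog list ($1) that has no
# header in the FASTA file ($2). Assumes a non-empty FASTA file.
missing_ids() {
    awk -F'\t' '
        NR == FNR {                       # first file read: the FASTA
            if (/^>/) {
                id = substr($0, 2)        # drop the leading ">"
                sub(/[ \t].*/, "", id)    # keep text before first whitespace
                seen[id] = 1
            }
            next
        }
        {                                 # second file read: the homolog list
            for (i = 1; i <= NF; i++)
                if ($i != "" && !($i in seen)) print $i
        }
    ' "$2" "$1"
}
```

For example, `missing_ids test.homologs test.cds` and `missing_ids test.homologs test.pep` should both print nothing before ParaAT is started. The file given to `-p` can be created with `echo 6 > proc`.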

Computing Ka/Ks

echo start at time `date +%F'  '%H:%M:%S`

export PATH=/storage_wut/user/software/ParaAT2.0:$PATH
export PATH=/storage_wut/user/software/KaKs_Calculator2.0/bin/Linux/:$PATH

cd /storage_wut/user/project/06lumeng_project/19.homologs_kaks/01.kaks

ParaAT.pl -h ../00.data/A_CC.collinearity_one2one.dat -n ../00.data/homo.gene.cds.fa -a ../00.data/homo.gene.pep.fa -p proc -m muscle -f axt -g -k -o result_dir

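# Merge all per-pair results, keeping only the header line of the first file
awk 'FNR==1 && NR!=1 {next} {print}' ./result_dir/*kaks > ../all.kaks.result.xls
# Extract the Ka/Ks column (column 5), skipping the header and NA values
tail -n +2 ../all.kaks.result.xls | cut -f 5 | grep -v 'NA' > kaks.list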

echo finish at time `date +%F'  '%H:%M:%S`

### all.kaks.result.xls file format
Sequence        Method  Ka      Ks      Ka/Ks   P-Value(Fisher) Length  S-Sites N-Sites Fold-Sites(0:2:4)       Substitutions   S-Substitutions N-Substitutio
Cg-F_10146-gene7838     MA      0.0194491       0.172237        0.112921        6.96313e-06     303     67.5573 235.443 NA      14      10.0464 3.95362 NA
Cg-F_11450-gene46992    MA      0.018447        0.18238 0.101146        8.74657e-22     1335    376.13  958.87  NA      75      59.6254 15.3746 NA      NA
Cg-F_11533-gene3021     MA      0.0364833       0.133713        0.272848        3.03892e-07     984     254.578 729.422 NA      56      31.4295 24.5705 NA
Cg-F_11705-gene4507     MA      0.043183        0.281557        0.153372        5.71615e-10     450     99.3644 350.636 NA      37      24.007  12.993  NA
Cg-F_11829-gene26952    MA      0.0670496       0.195014        0.343819        0.000123585     528     128.586 399.414 NA      47      22.7275 24.2725 NA
Cg-F_12075-gene67778    MA      0.163755        0.446331        0.366892        4.00233e-08     510     129.087 380.913 NA      96      46.0956 49.9044 NA
Cg-F_12095-gene37099    MA      0.0459748       0.131137        0.350585        3.28611e-05     1056    236.285 819.715 NA      64      28.8778 35.1222 NA
Cg-F_12212-gene32496    MA      0.0351454       0.113734        0.309015        0.000255903     639     182.649 456.351 NA      34      19.1865 14.8135 NA
Cg-F_12217-gene33956    MA      0.0545515       0.128713        0.423823        0.00831507      552     132.318 419.682 NA      37      15.7831 21.2169 NA
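
Ka/Ks ratios above 1 are commonly read as a sign of positive selection and ratios below 1 as purifying selection, so a quick tally of the fifth column is a useful first look at the merged table. The sketch below assumes a tab-separated file with a single header line, as shown above; `summarize_kaks` is a hypothetical helper name.

```shell
# summarize_kaks: count gene pairs by Ka/Ks class in the merged table ($1).
summarize_kaks() {
    awk -F'\t' '
        NR == 1 { next }                        # skip the header line
        $5 == "NA" || $5 == "" { na++; next }   # no usable ratio for this pair
        {
            n++
            if ($5 + 0 > 1) above++             # Ka/Ks > 1
            else below++                        # Ka/Ks <= 1
        }
        END {
            printf "pairs_with_ratio\t%d\n", n
            printf "ratio_le_1\t%d\n", below
            printf "ratio_gt_1\t%d\n", above
            printf "ratio_NA\t%d\n", na
        }
    ' "$1"
}
```

Running `summarize_kaks all.kaks.result.xls` prints four tab-separated counts. The threshold of 1 is only the conventional cut-off; it says nothing about statistical significance, for which the P-Value(Fisher) column is the relevant one.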

Plotting the Ka/Ks histogram

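rm(list = ls())                        # start from a clean workspace
library(ggplot2)
# Register Times New Roman under the family name "myFont" (Windows only)
windowsFonts(myFont = windowsFont("Times New Roman"))
setwd("D:\\gooagle_data\\work_r\\kaks")
# kaks.list holds one Ka/Ks value per line; read.table names the column V1
data <- read.table("kaks.list",sep='\t')
# Quick look at the distribution with coarse bins
ggplot(data,aes(V1))+ geom_histogram(color='#39A0FE',fill='#39A0FE', binwidth = 0.5)

# Final figure: finer bins, labelled axes; values outside -0.1..5 are dropped
ggplot(data,aes(V1))+ geom_histogram(fill='#39A0FE', binwidth = 0.03,color='white')+ 
  ylab(label = 'Number of gene pairs')+xlab(label = 'Ka/Ks')+theme_classic()+
  theme(axis.title = element_text(size=20),axis.text = element_text(size = 18,color = "black"))+
  scale_x_continuous(limits = c(-0.1,5),breaks=c(0,1,2,3,4,5))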
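
To check a bin width on the server before moving the data into R, the values can be binned with awk. This is a rough sketch: `bin_ratios` is a hypothetical helper, and its bins start at 0, so the counts will not match ggplot2's bin edges exactly.

```shell
# bin_ratios: fixed-width binning of one value per line.
# $1 = file with one Ka/Ks value per line, $2 = bin width
bin_ratios() {
    awk -v w="$2" '
        $1 ~ /^[0-9.eE+-]+$/ {          # ignore header text and NA
            b = int($1 / w)             # index of the bin [b*w, (b+1)*w)
            count[b]++
            if (b > max) max = b
        }
        END {
            # print the lower edge and the count of every non-empty bin
            for (b = 0; b <= max; b++)
                if (count[b]) printf "%.2f\t%d\n", b * w, count[b]
        }
    ' "$1"
}
```

For example, `bin_ratios kaks.list 0.03` prints the lower edge of each non-empty bin followed by the number of gene pairs in it.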

[Figure: Ka/Ks histogram]

