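The hidden Markov model (HMM) profiles for each carbohydrate-active enzyme (CAZyme) family were downloaded from the dbCAN database (Yin et al., 2012). CAZymes were annotated with hmmscan (Eddy, 2009), using the family profiles as the target database and the files containing each fungal proteome as the queries. The raw output was then processed with the hmmscan-parser script provided by dbCAN to give the final CAZyme annotation.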
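The kind of filtering applied to the raw hmmscan output can be sketched as follows. This is not the dbCAN hmmscan-parser script itself, only a minimal approximation that assumes the raw output was written with HMMER's `--domtblout` option. The function name `filter_domtblout` and the default cutoffs (i-Evalue 1e-15, HMM coverage 0.35) are assumptions, since these notes do not state which thresholds were used.

```shell
# Keep domain hits from an hmmscan --domtblout table that pass an
# i-Evalue cutoff and a minimum HMM coverage.
# Assumed defaults (not stated in these notes): 1e-15 and 0.35.
filter_domtblout() {
    # $1: domtblout file, $2: maximum i-Evalue, $3: minimum HMM coverage
    awk -v emax="${2:-1e-15}" -v cmin="${3:-0.35}" '
        /^#/ { next }                      # skip the comment lines HMMER writes
        {
            # column 3 = HMM length, columns 16-17 = HMM start/end,
            # column 13 = independent E-value of the domain
            cov = ($17 - $16 + 1) / $3
            if ($13 + 0 <= emax + 0 && cov >= cmin)
                printf "%s\t%s\t%s\t%.2f\n", $4, $1, $13, cov
        }' "$1"
}

# Example call (hypothetical file name):
# filter_domtblout species1.dbCAN.domtblout 1e-15 0.35 > species1.CAZymes.tsv
```

Each output line gives the protein ID, the CAZyme family profile, the i-Evalue and the fraction of the profile covered by the alignment.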
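Kinases were identified by BLASTP searches (run with DIAMOND in the commands below) against the KinBase database (http://kinase.com/kinbase/), keeping hits with an e-value < 1e-10. A Perl package was then used to classify the hits and count the number of genes in each protein family, giving the count matrix for the KinBase database.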
KinBase: Kinase Database at Manning's Group
The Mushroom Kinome
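The per-family counting step can be sketched in shell as well. This is not the Perl package mentioned above, only an illustration under several assumptions: the best-hit tables are tab-delimited with one header line and the subject ID in column 2, they are named `<species>.KinBase10.tab.best`, and a hypothetical two-column file maps each KinBase subject ID to its kinase family. The function name `long_counts` is made up for this sketch.

```shell
# Count hits per kinase family for each species.
# Output is in long format: species <TAB> family <TAB> count.
long_counts() {
    # $1: mapping file "subject_id<TAB>family" (hypothetical)
    # remaining arguments: best-hit tables named <species>.KinBase10.tab.best
    map=$1; shift
    for f in "$@"; do
        sp=$(basename "$f" .KinBase10.tab.best)
        awk -F'\t' -v sp="$sp" '
            NR == FNR { fam[$1] = $2; next }   # first file: subject -> family
            FNR == 1  { next }                 # assumed header line of the hit table
            ($2 in fam) { n[fam[$2]]++ }       # count by family of the subject
            END { for (k in n) printf "%s\t%s\t%d\n", sp, k, n[k] }
        ' "$map" "$f"
    done | LC_ALL=C sort
}

# Example call (hypothetical mapping file):
# long_counts kinbase_family.tsv best/*.KinBase10.tab.best > KinBase.counts.tsv
```

The long format can be pivoted into a species-by-family matrix in a spreadsheet or in R; species without any hit to a family simply have no row for it.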
# Location of the downloaded KinBase files
/media/aa/DATA/SZQ2/bj_software/KinBase/C.cinerea_kin_dom.fasta
/media/aa/DATA/SZQ2/bj_software/KinBase/C.cinerea_AAprotein.fasta
# Run in the pfam_scan conda environment
conda activate pfam_scan
# Build the DIAMOND database next to the FASTA file so that --db below points at it
diamond makedb --in /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom.fasta -d /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom
# Create the output folder
mkdir 18.KinBase && cd 18.KinBase
# 1) diamond blastp with an e-value cutoff of 1e-10
for i in $(cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt)
do
echo "diamond blastp --db /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom --query /media/aa/Expansion/szq2/bj/b.OrthoFinder/compliantFasta4/compliantFasta/$i.fasta --out $i.KinBase10.xml --outfmt 5 --sensitive --max-target-seqs 20 --evalue 1e-10 --id 20 --tmpdir /dev/shm --index-chunks 1"
done > command.KinBase.list
ParaFly -c command.KinBase.list -CPU 48
# Create a folder for the parsed tables
mkdir KinBase10.tab && cd KinBase10.tab
# 2) Convert the XML output to tabular format with parsing_blast_result.pl
for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
do
echo "/media/aa/DATA2/bin/parsing_blast_result.pl --evalue 1e-10 --HSP-num 1 --out-hit-confidence --suject-annotation ../$i.KinBase10.xml > $i.KinBase10.tab"
done > command.KinBase.list
ParaFly -c command.KinBase.list -CPU 48
# Run in the jcvi conda environment
conda activate jcvi
# 3) Keep only the best subject for each query
for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
do
echo "python -m jcvi.formats.blast best -n 1 $i.KinBase10.tab"
done > command.jcvi.list
ParaFly -c command.jcvi.list -CPU 48
# 4) Copy the best-hit tables into their own folder
mkdir best && cd best
cp ../*.KinBase10.tab.best ./
# Count the lines in each file; the number of annotated hits is the line count minus 1
wc -l *.KinBase10.tab.best
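Before reading the line counts, it may be worth checking that every species in the list actually produced a best-hit table, since a failed job would otherwise only show up as a missing row in the `wc -l` output. The sketch below is one way to do that; the function name `missing_outputs` and its arguments are hypothetical and not part of the original workflow.

```shell
# Print the species whose output file is missing or empty.
missing_outputs() {
    # $1: species list (one ID per line)
    # $2: directory holding the output files
    # $3: file suffix, e.g. .KinBase10.tab.best
    while IFS= read -r sp || [ -n "$sp" ]; do
        [ -n "$sp" ] || continue                 # ignore blank lines
        [ -s "$2/$sp$3" ] || printf '%s\n' "$sp" # -s: file exists and is non-empty
    done < "$1"
}

# Example call from inside the best/ folder:
# missing_outputs /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt . .KinBase10.tab.best
```

An empty result means all species have a table; any ID printed here points at a job to rerun.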
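HMM profiles built from the seed sequences of the tubulin family (PF00091.20) and the Misato family (PF10644_misato) were downloaded from the Pfam database, and candidate sequences were identified with hmmscan. The tubulin sequences obtained in this first pass were then assigned to families by phylogenetic analysis.
Here, the hits are filtered directly from the earlier Pfam results.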
Pfam: Family: Tubulin (PF00091)
Family: Misat_Tub_SegII (PF10644)
Family: NAD_binding_10 (PF13460)
Family: Tubulin_2 (PF13809)
Family: Tubulin_3 (PF14881)
Family: Tubulin_C (PF03953)
GCP_C_terminal | PF04130.16 | Gamma tubulin complex component C-terminal |
GCP_N_terminal | PF17681.4 | Gamma tubulin complex component N-terminal |
MOZART1 | PF12554.11 | Mitotic-spindle organizing gamma-tubulin ring associated |
TBCA | PF02970.19 | Tubulin binding cofactor A |
TBCC | PF07986.15 | Tubulin binding cofactor C |
TBCC_N | PF16752.8 | Tubulin-specific chaperone C N-terminal domain |
TFCD_C | PF12612.11 | Tubulin folding cofactor D C terminal |
TTL | PF03133.18 | Tubulin-tyrosine ligase family |
Tubulin | PF00091.28 | Tubulin/FtsZ family, GTPase domain |
Tubulin_3 | PF14881.9 | Tubulin domain |
Tubulin_C | PF03953.20 | Tubulin C-terminal domain |
Misat_Tub_SegII | PF10644.12 | Misato Segment II tubulin-like domain |
NAD_binding_10 | PF13460.9 | NAD_binding_10 |
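Filtering the earlier Pfam results for the families listed above could look like the sketch below. It assumes the results are in the whitespace-delimited output format of pfam_scan.pl, with the sequence ID in column 1, the versioned Pfam accession in column 6 and the family name in column 7. The function name `select_pfam` and the example file name are hypothetical.

```shell
# Select rows of a pfam_scan-style table whose Pfam accession
# (ignoring the version suffix) is in a given list.
select_pfam() {
    # $1: Pfam result table; remaining arguments: accessions without version
    tbl=$1; shift
    awk -v accs="$*" '
        BEGIN {
            n = split(accs, a, " ")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        /^#/ || NF == 0 { next }            # skip comments and blank lines
        {
            acc = $6
            sub(/\..*$/, "", acc)           # PF00091.28 -> PF00091
            if (acc in want) print $1 "\t" acc "\t" $7
        }' "$tbl"
}

# Example call (hypothetical file name), using accessions from the list above:
# select_pfam species1.pfam.out PF00091 PF03953 PF10644 > species1.tubulin.tsv
```

Matching on the unversioned accession avoids missing hits when the Pfam release used for the scan differs from the versions listed here.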
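Homeodomain transcription factor (HD) genes and pheromone/receptor genes were identified using the Non-Redundant Protein Database (NR) (https://www.ncbi.nlm.nih.gov/protein/), Swiss-Prot (https://www.uniprot.org/) and Pfam (http://pfam.xfam.org/) databases, and the genes flanking the MAT loci were further identified by BLAST searches. The following sequences (downloaded from NCBI) were used as queries:
Used to identify the mitochondrial intermediate peptidase gene (mip):
XP_007265184.1 from Fomitiporia mediterranea
XP_003038723.1 from Schizophyllum commune
XP_036634433.1 from the oyster mushroom Pleurotus ostreatus
XP_007325204.1 from the button mushroom Agaricus bisporus
XP_008032819.1 from the turkey tail Trametes versicolor
Used to identify the β-flanking gene (β-fg):
XP_009540982.1 from Heterobasidion irregulare
XP_006454075.1 from the button mushroom Agaricus bisporus
XP_001829147.2 from Coprinopsis cinerea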
# Location of the downloaded MAT query sequences
/media/aa/DATA/SZQ2/bj_software/MAT/mip/
XP_003038723_1.fasta XP_007265184_1.fasta XP_007325204_1.fasta XP_008032819_1.fasta XP_036634433_1.fasta
/media/aa/DATA/SZQ2/bj_software/MAT/βfg/
XP_001829147_2.fasta XP_006454075_1.fasta XP_009540982_1.fasta
# Run in the pfam_scan conda environment
conda activate pfam_scan
# Folder holding the query sequences, and the queries in the order they are run
MAT=/media/aa/DATA/SZQ2/bj_software/MAT
QUERIES="mip/XP_003038723_1 mip/XP_007265184_1 mip/XP_007325204_1 mip/XP_008032819_1 mip/XP_036634433_1 βfg/XP_001829147_2 βfg/XP_006454075_1 βfg/XP_009540982_1"
# Build one DIAMOND database per query sequence, next to its FASTA file so that --db below points at it
for q in $QUERIES
do
    diamond makedb --in $MAT/$q.fasta -d $MAT/$q
done
# Create the output folder
mkdir 20.MAT && cd 20.MAT
# 1) diamond blastp with an e-value cutoff of 1e-10, one command list per query sequence
for q in $QUERIES
do
    name=$(basename $q)
    for i in $(cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt)
    do
        echo "diamond blastp --db $MAT/$q --query /media/aa/Expansion/szq2/bj/b.OrthoFinder/compliantFasta4/compliantFasta/$i.fasta --out $i.${name}10.xml --outfmt 5 --sensitive --max-target-seqs 20 --evalue 1e-10 --id 20 --tmpdir /dev/shm --index-chunks 1"
    done > command.$name.list
    ParaFly -c command.$name.list -CPU 48
done
# Create a folder for the parsed tables
mkdir MAT10.tab && cd MAT10.tab
# 2) Convert the XML output to tabular format with parsing_blast_result.pl, one command list per query sequence
for name in XP_003038723_1 XP_007265184_1 XP_007325204_1 XP_008032819_1 XP_036634433_1 XP_001829147_2 XP_006454075_1 XP_009540982_1
do
    for i in $(cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt)
    do
        echo "/media/aa/DATA2/bin/parsing_blast_result.pl --evalue 1e-10 --HSP-num 1 --out-hit-confidence --suject-annotation ../$i.${name}10.xml > $i.${name}10.tab"
    done > command.$name.list
    ParaFly -c command.$name.list -CPU 48
done
# Run in the jcvi conda environment
conda activate jcvi
# 3) Keep only the best subject for each query, one command list per query sequence
for name in XP_003038723_1 XP_007265184_1 XP_007325204_1 XP_008032819_1 XP_036634433_1 XP_001829147_2 XP_006454075_1 XP_009540982_1
do
    for i in $(cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt)
    do
        echo "python -m jcvi.formats.blast best -n 1 $i.${name}10.tab"
    done > command.jcvi$name.list
    ParaFly -c command.jcvi$name.list -CPU 48
done
# 4) Copy the best-hit tables into their own folder
mkdir best && cd best
cp ../*.tab.best ./
# Count the lines in each file; the number of annotated hits is the line count minus 1
for name in XP_003038723_1 XP_007265184_1 XP_007325204_1 XP_008032819_1 XP_036634433_1 XP_001829147_2 XP_006454075_1 XP_009540982_1
do
    wc -l *.${name}10.tab.best
done
Summary statistics
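One way to collect the MAT results into a single table is sketched below. It assumes the best-hit tables follow the naming used above (`<species>.<query>10.tab.best`, where the trailing `10` marks the 1e-10 cutoff) and carry one header line, which is why `wc -l` reports one more than the number of hits. The function name `summarise_best` is hypothetical.

```shell
# Print species <TAB> query accession <TAB> number of hits
# for each best-hit table given on the command line.
summarise_best() {
    for f in "$@"; do
        base=$(basename "$f" .tab.best)     # e.g. spA.XP_003038723_110
        sp=${base%%.XP_*}                   # species: text before ".XP_"
        q=XP_${base#*.XP_}                  # query plus the "10" suffix
        q=${q%10}                           # drop the e-value suffix
        # count data lines, skipping the assumed header line
        n=$(awk 'NR > 1 && NF > 0 { c++ } END { print c + 0 }' "$f")
        printf '%s\t%s\t%d\n' "$sp" "$q" "$n"
    done
}

# Example call from inside the best/ folder:
# summarise_best *.tab.best > MAT.summary.tsv
```

The resulting table has one row per species and query sequence, so species lacking a mip or β-fg homolog at this cutoff stand out as rows with a count of 0.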