Identification of genes related to fruiting-body morphology

Functional annotation of carbohydrate-active enzymes

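Hidden Markov model (HMM) profiles for each carbohydrate-active enzyme (CAZyme) family were downloaded from the dbCAN database (Yin et al., 2012). CAZymes were annotated with hmmscan (Eddy, 2009), using the HMM profiles as the search target and the file containing each fungal proteome as the query. The raw output was then processed with the hmmscan-parser script provided by dbCAN.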
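The notes do not show the commands for this step. A minimal sketch, following the command-list pattern used later in these notes, might look like the following. The HMM file name `dbCAN-fam-HMMs.txt`, the parser name `hmmscan-parser.sh`, and the output file names are assumptions based on the dbCAN distribution, and the function name `make_cazyme_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Print one "hmmscan + hmmscan-parser" command line per species.
#   $1: text file with one species prefix per line (assumed layout)
#   $2: directory holding <prefix>.fasta proteomes (assumed layout)
#   $3: dbCAN HMM file, already indexed with `hmmpress` (assumed name)
make_cazyme_commands() {
    local list=$1 proteomes=$2 hmmdb=$3 sp
    # "|| [ -n "$sp" ]" keeps a last line that lacks a trailing newline
    while read -r sp || [ -n "$sp" ]; do
        [ -n "$sp" ] || continue   # skip empty lines
        echo "hmmscan --domtblout $sp.out.dm $hmmdb $proteomes/$sp.fasta > $sp.out && sh hmmscan-parser.sh $sp.out.dm > $sp.out.dm.ps"
    done < "$list"
}

# Usage (all paths are placeholders):
#   hmmpress dbCAN-fam-HMMs.txt
#   make_cazyme_commands 94listssp.txt compliantFasta dbCAN-fam-HMMs.txt > command.CAZymes.list
#   ParaFly -c command.CAZymes.list -CPU 48
```

The function only prints the commands, so the list can be inspected before it is handed to ParaFly.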

Functional annotation of protein kinases

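The proteomes were aligned against the KinBase database (http://kinase.com/kinbase/) with BLASTP, and hits were filtered at e-value < 1e-10. A Perl package was then used to tally the number of genes in each protein family, giving the count matrix for the FUNCAT database.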

KinBase: Kinase Database at Manning's Group

The Mushroom Kinome

  • Human: protein, kinase domain, RNA
  • Mouse: protein, kinase domain, RNA
  • Sea Urchin: protein, kinase domain, RNA
  • Fruit Fly: protein, kinase domain, RNA
  • Nematode worm: protein, kinase domain, RNA
  • A.queenslandica: protein, kinase domain, RNA
  • M.brevicollis: protein, kinase domain, RNA
  • Bakers Yeast (Saccharomyces cerevisiae): protein, kinase domain, RNA
  • C.cinerea (Coprinopsis cinerea): protein, kinase domain, RNA (downloaded)
  • Slime mold: protein, kinase domain, RNA
  • Tetrahymena: protein, kinase domain, RNA
  • G.lamblia: protein, kinase domain, RNA
  • L.major: protein, kinase domain, RNA
  • T.vaginalis: protein, kinase domain, RNA
  • S.moellendorffii: protein, kinase domain, RNA
# Location of the downloaded KinBase files
/media/aa/DATA/SZQ2/bj_software/KinBase/C.cinerea_kin_dom.fasta
/media/aa/DATA/SZQ2/bj_software/KinBase/C.cinerea_AAprotein.fasta
# In the pfam_scan conda environment
conda activate pfam_scan
# Build the DIAMOND database next to the FASTA file
# (the file name must match the kinase-domain FASTA listed above)
diamond makedb --in /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom.fasta -d /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom
# Create a new directory
mkdir 18.KinBase && cd 18.KinBase
# 1) diamond blastp with an e-value cutoff of 1e-10; --db points to the database built by makedb
for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
do
    echo "diamond blastp --db /media/aa/DATA/SZQ2/bj_software/KinBase/Ccinerea_kin_dom --query /media/aa/Expansion/szq2/bj/b.OrthoFinder/compliantFasta4/compliantFasta/$i.fasta --out $i.KinBase10.xml --outfmt 5 --sensitive --max-target-seqs 20 --evalue 1e-10 --id 20 --tmpdir /dev/shm --index-chunks 1"
done > command.KinBase.list
ParaFly -c command.KinBase.list -CPU 48
# Create a new directory
mkdir KinBase10.tab && cd KinBase10.tab
# 2) Parse the BLAST XML output with parsing_blast_result.pl
for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
do
    echo "/media/aa/DATA2/bin/parsing_blast_result.pl --evalue 1e-10 --HSP-num 1 --out-hit-confidence --suject-annotation ../$i.KinBase10.xml > $i.KinBase10.tab"
done > command.KinBase.list
ParaFly -c command.KinBase.list -CPU 48
# In the jcvi conda environment
conda activate jcvi
# 3) Keep the best subject for each query
for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
do
    echo "python -m jcvi.formats.blast best -n 1 $i.KinBase10.tab"
done > command.jcvi.list
ParaFly -c command.jcvi.list -CPU 48
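# 4) Copy the best-hit tables into a separate directory
mkdir best && cd best
cp ../*.KinBase10.tab.best ./
# Count the lines in each file; the line count minus 1 is the total number of annotated hits
wc -l *.KinBase10.tab.best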
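To turn the `wc -l` output into a per-species table, a small helper along these lines could be used. It is only a sketch: the function name `count_best_hits` is hypothetical, and it relies on the "line count minus 1" rule from the note above, that is, on each file starting with one header line.

```shell
#!/usr/bin/env bash
# Print "<species><TAB><hits>" for every best-hit table in a directory.
#   $1: directory containing the best-hit files
#   $2: file suffix to strip in order to recover the species prefix
# Assumption: every file begins with exactly one header line.
count_best_hits() {
    local dir=$1 suffix=$2 f n
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] || continue            # no matching files
        n=$(wc -l < "$f")
        n=$((n > 0 ? n - 1 : 0))           # subtract the header line
        printf '%s\t%d\n' "$(basename "$f" "$suffix")" "$n"
    done
}

# Usage, from inside the "best" directory:
#   count_best_hits . .KinBase10.tab.best > KinBase.hit_counts.tsv
```

If the tables turn out to have no header line, drop the subtraction.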

Functional annotation of tubulins

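HMM profiles built from the seed sequences of the tubulin family (PF00091.20) and the Misato family (PF10644_misato) were downloaded from the Pfam database. Sequences were identified with hmmscan. The candidate tubulin sequences were then classified into families by phylogenetic analysis.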

In practice, the hits were filtered directly from the earlier Pfam results.

Pfam: Family: Tubulin (PF00091)

Family: Misat_Tub_SegII (PF10644)

Family: NAD_binding_10 (PF13460)

Family: Tubulin_2 (PF13809)

Family: Tubulin_3 (PF14881)

Family: Tubulin_C (PF03953)

GCP_C_terminal PF04130.16 Gamma tubulin complex component C-terminal
GCP_N_terminal PF17681.4 Gamma tubulin complex component N-terminal
MOZART1 PF12554.11 Mitotic-spindle organizing gamma-tubulin ring associated
TBCA PF02970.19 Tubulin binding cofactor A
TBCC PF07986.15 Tubulin binding cofactor C
TBCC_N PF16752.8 Tubulin-specific chaperone C N-terminal domain
TFCD_C PF12612.11 Tubulin folding cofactor D C terminal
TTL PF03133.18 Tubulin-tyrosine ligase family
Tubulin PF00091.28 Tubulin/FtsZ family, GTPase domain
Tubulin_3 PF14881.9 Tubulin domain
Tubulin_C PF03953.20 Tubulin C-terminal domain
Misat_Tub_SegII PF10644.12 Misato Segment II tubulin-like domain
NAD_binding_10 PF13460.9 NAD_binding_10
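Filtering the earlier Pfam results could be sketched as follows. This assumes the results are in the whitespace-delimited output format of pfam_scan.pl, with the sequence ID in column 1, the HMM accession in column 6, and the HMM name in column 7. The function name `filter_tubulin_hits` is hypothetical, and the accession set is the six families listed under the Pfam headings above; extend it if the other tubulin-related families are also wanted.

```shell
#!/usr/bin/env bash
# Print "<sequence ID><TAB><Pfam accession><TAB><Pfam name>" for every hit
# whose accession (ignoring the version suffix) is in the tubulin set.
#   $1: Pfam result table (assumed pfam_scan.pl output format)
filter_tubulin_hits() {
    awk '
        BEGIN {
            n = split("PF00091 PF03953 PF10644 PF13460 PF13809 PF14881", a, " ")
            for (k = 1; k <= n; k++) keep[a[k]] = 1
        }
        /^#/ || NF < 7 { next }            # skip comments and short lines
        {
            acc = $6
            sub(/\..*$/, "", acc)          # PF00091.28 -> PF00091
            if (acc in keep) print $1 "\t" $6 "\t" $7
        }
    ' "$1"
}

# Usage (file names are placeholders):
#   filter_tubulin_hits species.pfam_scan.out > species.tubulin.tsv
```

Stripping the version suffix means the filter still works when the Pfam release behind the results differs from the versions quoted here.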

Genomic structure analysis of the mating-type (MAT) loci

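Homeodomain transcription factor (HD) genes and pheromone/receptor genes were identified using the Non-Redundant Protein Database (NR; https://www.ncbi.nlm.nih.gov/protein/), Swiss-Prot (https://www.uniprot.org/) and Pfam (http://pfam.xfam.org/) databases, and the genes flanking the MAT loci were further identified by BLAST searches. The following sequences, downloaded from NCBI, were used as queries.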

Used to identify the mitochondrial intermediate peptidase gene (mip):

XP_007265184.1 from Fomitiporia mediterranea

XP_003038723.1 from Schizophyllum commune

XP_036634433.1 from Pleurotus ostreatus (oyster mushroom)

XP_007325204.1 from Agaricus bisporus (button mushroom)

XP_008032819.1 from Trametes versicolor (turkey tail)

Used to identify the β-flanking gene (β-fg):

XP_009540982.1 from Heterobasidion irregulare

XP_006454075.1 from Agaricus bisporus (button mushroom)

XP_001829147.2 from Coprinopsis cinerea
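The notes do not say how the query sequences were fetched. One option, assuming NCBI Entrez Direct (`efetch`) is installed, is to generate one download command per accession. The function name `make_efetch_commands` is hypothetical; the output names replace the dot in each accession with an underscore, matching the FASTA file names used in the commands of this section.

```shell
#!/usr/bin/env bash
# Print one efetch command per protein accession given as an argument.
# Each command saves the sequence as <accession with "." replaced by "_">.fasta
make_efetch_commands() {
    local acc
    for acc in "$@"; do
        echo "efetch -db protein -id $acc -format fasta > ${acc//./_}.fasta"
    done
}

# Usage: review the printed commands, then run them with bash.
#   make_efetch_commands XP_009540982.1 XP_006454075.1 XP_001829147.2 | bash
```

The sequences can equally be downloaded by hand from the NCBI protein pages; the sketch only saves retyping the accessions.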

# Location of the downloaded MAT query sequences
/media/aa/DATA/SZQ2/bj_software/MAT/mip/
XP_003038723_1.fasta  XP_007265184_1.fasta  XP_007325204_1.fasta  XP_008032819_1.fasta  XP_036634433_1.fasta
/media/aa/DATA/SZQ2/bj_software/MAT/βfg/
XP_001829147_2.fasta  XP_006454075_1.fasta  XP_009540982_1.fasta
# In the pfam_scan conda environment
conda activate pfam_scan
# Build one DIAMOND database per reference sequence, next to its FASTA file
MAT=/media/aa/DATA/SZQ2/bj_software/MAT
REFS="mip/XP_003038723_1 mip/XP_007265184_1 mip/XP_007325204_1 mip/XP_008032819_1 mip/XP_036634433_1 βfg/XP_001829147_2 βfg/XP_006454075_1 βfg/XP_009540982_1"
for ref in $REFS
do
    diamond makedb --in $MAT/$ref.fasta -d $MAT/$ref
done
# Create a new directory
mkdir 20.MAT && cd 20.MAT
# 1) diamond blastp with an e-value cutoff of 1e-10; one command list per reference sequence
for ref in $REFS
do
    acc=`basename $ref`
    for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
    do
        echo "diamond blastp --db $MAT/$ref --query /media/aa/Expansion/szq2/bj/b.OrthoFinder/compliantFasta4/compliantFasta/$i.fasta --out $i.${acc}10.xml --outfmt 5 --sensitive --max-target-seqs 20 --evalue 1e-10 --id 20 --tmpdir /dev/shm --index-chunks 1"
    done > command.$acc.list
    ParaFly -c command.$acc.list -CPU 48
done

# Create a new directory
mkdir MAT10.tab && cd MAT10.tab
# 2) Parse the BLAST XML output with parsing_blast_result.pl; one command list per reference sequence
for acc in XP_003038723_1 XP_007265184_1 XP_007325204_1 XP_008032819_1 XP_036634433_1 XP_001829147_2 XP_006454075_1 XP_009540982_1
do
    for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
    do
        echo "/media/aa/DATA2/bin/parsing_blast_result.pl --evalue 1e-10 --HSP-num 1 --out-hit-confidence --suject-annotation ../$i.${acc}10.xml > $i.${acc}10.tab"
    done > command.$acc.list
    ParaFly -c command.$acc.list -CPU 48
done

# In the jcvi conda environment
conda activate jcvi
# 3) Keep the best subject for each query; one command list per reference sequence
for acc in XP_003038723_1 XP_007265184_1 XP_007325204_1 XP_008032819_1 XP_036634433_1 XP_001829147_2 XP_006454075_1 XP_009540982_1
do
    for i in `cat /media/aa/DATA/SZQ2/bj/functional_annotation/94listssp.txt`
    do
        echo "python -m jcvi.formats.blast best -n 1 $i.${acc}10.tab"
    done > command.jcvi$acc.list
    ParaFly -c command.jcvi$acc.list -CPU 48
done
# 4) Copy the best-hit tables into a separate directory
mkdir best && cd best
cp ../*.tab.best ./
# Count the lines in each file; the line count minus 1 is the total number of annotated hits
wc -l *.XP_003038723_110.tab.best
wc -l *.XP_007265184_110.tab.best
wc -l *.XP_007325204_110.tab.best
wc -l *.XP_008032819_110.tab.best
wc -l *.XP_036634433_110.tab.best
wc -l *.XP_001829147_210.tab.best
wc -l *.XP_006454075_110.tab.best
wc -l *.XP_009540982_110.tab.best


Summary statistics
