1. Data download
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ wget https://ndownloader.figshare.com/files/11857577 -O Prochlorococcus_31_genomes.tar.gz
--2018-12-18 14:14:36-- https://ndownloader.figshare.com/files/11857577
Resolving ndownloader.figshare.com (ndownloader.figshare.com)... 34.240.49.185
Connecting to ndownloader.figshare.com (ndownloader.figshare.com)|34.240.49.185|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/11857577/Prochlorococcus_31_genomes.tar.gz [following]
--2018-12-18 14:14:38-- https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/11857577/Prochlorococcus_31_genomes.tar.gz
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 52.218.36.170
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.36.170|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32657898 (31M) [binary/octet-stream]
Saving to: ‘Prochlorococcus_31_genomes.tar.gz’
Prochlorococcus_31_ 100%[===================>] 31.14M 51.8KB/s in 13m 50s
2018-12-18 14:28:30 (38.4 KB/s) - ‘Prochlorococcus_31_genomes.tar.gz’ saved [32657898/32657898]
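The transfer above took almost 14 minutes, so it is worth confirming that the archive is complete before unpacking it. The sketch below compares the file's size on disk with the `Length` that wget reported (32657898 bytes). `check_size` is a hypothetical helper name, not part of wget or anvi'o. If a download is interrupted, wget's `-c` option can resume the partial file instead of starting over.

```shell
# Sketch: confirm that a downloaded file has the expected number of bytes.
# check_size is an illustrative helper, not a wget or anvi'o command.
check_size() {
    local file="$1"
    local expected="$2"
    local actual

    if [ ! -f "$file" ]; then
        echo "MISSING: $file" >&2
        return 2
    fi

    # Reading from stdin makes wc print only the count, without the file name.
    actual=$(wc -c < "$file" | tr -d ' ')

    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $file is $actual bytes"
    else
        echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Usage for the archive downloaded above:
# check_size Prochlorococcus_31_genomes.tar.gz 32657898
```

A size match does not prove the contents are intact, but it catches the common case of a truncated download.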
2. Data extraction
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ tar -zxvf Prochlorococcus_31_genomes.tar.gz
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ cd Prochlorococcus_31_genomes/
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ ls
AS9601.db MIT9311.db
CCMP1375.db MIT9312.db
EQPAC1.db MIT9313.db
external-genomes.txt MIT9314.db
fix_functional_occurence_table.py MIT9321.db
GP2.db MIT9322.db
layer-additional-data.txt MIT9401.db
LG.db MIT9515.db
MED4.db NATL1A.db
MIT9107.db NATL2A.db
MIT9116.db PAC1.db
MIT9123.db pan-state.json
MIT9201.db PROCHLORO-functions-collection.txt
MIT9202.db PROCHLORO-manual-default-state.json
MIT9211.db SB.db
MIT9215.db SS2.db
MIT9301.db SS35.db
MIT9302.db SS51.db
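Before building the genomes storage, it can help to check that every contigs database named in `external-genomes.txt` is actually present in this directory. The sketch below assumes the file is tab-separated, starts with a header line, and keeps the database path in its second column; `missing_dbs` is a hypothetical helper name.

```shell
# Sketch: report contigs databases listed in an external-genomes file
# that cannot be found on disk. Run it from the directory holding the file.
# Assumption: tab-separated, one header line, path in the second column.
missing_dbs() {
    local listing="$1"
    local tab
    tab=$(printf '\t')

    # tail -n +2 skips the header row; the "|| [ -n ... ]" part keeps a
    # final line that has no trailing newline.
    tail -n +2 "$listing" | while IFS="$tab" read -r name path rest || [ -n "$name" ]; do
        [ -n "$path" ] || continue
        [ -e "$path" ] || echo "$name: $path not found"
    done
}

# Usage:
# missing_dbs external-genomes.txt
```

No output means every listed path was found.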
3. Building the pangenome database
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-migrate-db *.db
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-gen-genomes-storage -e external-genomes.txt -o PROCHLORO-GENOMES.db
WARNING
===============================================
Good news! Anvi'o found all these functions that are common to all of your
genomes and will use them for downstream analyses and is very proud of you:
'COG_CATEGORY, COG_FUNCTION'.
Internal genomes .............................: 0 have been initialized.
External genomes .............................: 31 found.
PLEASE READ CAREFULLY
===============================================
Some of your genomes had gene calls identified by gene callers other than the
gene caller anvi'o used, which was set to 'prodigal' either by default, or
because you asked for it. The following genomes contained genes that were not
processed (this may be exactly what you expect to happen, but if was not, you
may need to use the `--gene-caller` flag to make sure anvi'o is using the gene
caller it should be using): AS9601 (2 gene calls by "Ribosomal_RNAs"), CCMP1375
(2 gene calls by "Ribosomal_RNAs"), EQPAC1 (2 gene calls by "Ribosomal_RNAs"),
GP2 (2 gene calls by "Ribosomal_RNAs"), LG (2 gene calls by "Ribosomal_RNAs"),
MED4 (2 gene calls by "Ribosomal_RNAs"), MIT9107 (2 gene calls by
"Ribosomal_RNAs"), MIT9116 (2 gene calls by "Ribosomal_RNAs"), MIT9123 (2 gene
calls by "Ribosomal_RNAs"), MIT9201 (2 gene calls by "Ribosomal_RNAs"), MIT9202
(2 gene calls by "Ribosomal_RNAs"), MIT9211 (2 gene calls by "Ribosomal_RNAs"),
MIT9215 (2 gene calls by "Ribosomal_RNAs"), MIT9301 (2 gene calls by
"Ribosomal_RNAs"), MIT9302 (2 gene calls by "Ribosomal_RNAs"), MIT9303 (4 gene
calls by "Ribosomal_RNAs"), MIT9311 (2 gene calls by "Ribosomal_RNAs"), MIT9312
(2 gene calls by "Ribosomal_RNAs"), MIT9313 (4 gene calls by "Ribosomal_RNAs"),
MIT9314 (2 gene calls by "Ribosomal_RNAs"), MIT9321 (2 gene calls by
"Ribosomal_RNAs"), MIT9322 (2 gene calls by "Ribosomal_RNAs"), MIT9401 (2 gene
calls by "Ribosomal_RNAs"), MIT9515 (2 gene calls by "Ribosomal_RNAs"), NATL1A
(2 gene calls by "Ribosomal_RNAs"), NATL2A (2 gene calls by "Ribosomal_RNAs"),
PAC1 (2 gene calls by "Ribosomal_RNAs"), SB (2 gene calls by "Ribosomal_RNAs"),
SS2 (2 gene calls by "Ribosomal_RNAs"), SS35 (2 gene calls by "Ribosomal_RNAs"),
SS51 (2 gene calls by "Ribosomal_RNAs").
* AS9601 is stored with 1,869 genes (0 of which were partial)
* CCMP1375 is stored with 1,826 genes (0 of which were partial)
* EQPAC1 is stored with 1,892 genes (6 of which were partial)
* GP2 is stored with 1,825 genes (22 of which were partial)
* LG is stored with 1,840 genes (24 of which were partial)
* MED4 is stored with 1,891 genes (0 of which were partial)
* MIT9107 is stored with 1,924 genes (20 of which were partial)
* MIT9116 is stored with 1,914 genes (40 of which were partial)
* MIT9123 is stored with 1,931 genes (31 of which were partial)
* MIT9201 is stored with 1,907 genes (38 of which were partial)
* MIT9202 is stored with 1,918 genes (0 of which were partial)
* MIT9211 is stored with 1,740 genes (0 of which were partial)
* MIT9215 is stored with 1,951 genes (0 of which were partial)
* MIT9301 is stored with 1,846 genes (0 of which were partial)
* MIT9302 is stored with 1,957 genes (25 of which were partial)
* MIT9303 is stored with 2,715 genes (0 of which were partial)
* MIT9311 is stored with 1,921 genes (32 of which were partial)
* MIT9312 is stored with 1,900 genes (0 of which were partial)
* MIT9313 is stored with 2,556 genes (0 of which were partial)
* MIT9314 is stored with 1,924 genes (26 of which were partial)
* MIT9321 is stored with 1,884 genes (16 of which were partial)
* MIT9322 is stored with 1,881 genes (17 of which were partial)
* MIT9401 is stored with 1,893 genes (25 of which were partial)
* MIT9515 is stored with 1,871 genes (0 of which were partial)
* NATL1A is stored with 2,030 genes (0 of which were partial)
* NATL2A is stored with 1,991 genes (1 of which were partial)
* PAC1 is stored with 2,059 genes (10 of which were partial)
* SB is stored with 1,855 genes (8 of which were partial)
* SS2 is stored with 1,844 genes (33 of which were partial)
* SS35 is stored with 1,835 genes (17 of which were partial)
* SS51 is stored with 1,833 genes (18 of which were partial)
The new genomes storage ......................: PROCHLORO-GENOMES.db (v6, signature: hash0cde9439)
Number of genomes ............................: 31 (internal: 0, external: 31)
Number of gene calls .........................: 60,223
Number of partial gene calls .................: 409
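As a quick sanity check, the per-genome lines in the log can be added up and compared with the summary at the end. The sketch below parses only lines of the form `* NAME is stored with N genes (P of which were partial)` from a saved copy of the terminal output; `sum_stored_genes` and the log file name are assumptions made for illustration.

```shell
# Sketch: total the per-genome gene counts from a saved copy of the log.
# sum_stored_genes is an illustrative helper, not an anvi'o program.
sum_stored_genes() {
    awk '/is stored with/ {
        genes = $6                 # e.g. 1,869
        partial = $8               # e.g. (6
        gsub(/,/, "", genes)       # drop thousands separators
        sub(/^\(/, "", partial)    # drop the opening parenthesis
        total += genes
        part += partial
        n++
    }
    END {
        printf "%d genomes, %d gene calls, %d partial\n", n, total, part
    }' "$1"
}

# Usage, assuming the output above was saved to storage.log:
# sum_stored_genes storage.log
```

For the log above, the 31 per-genome lines add up to 60,223 gene calls and 409 partial gene calls, which agrees with the summary anvi'o printed.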
4. Pangenome analysis
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-pan-genome -g PROCHLORO-GENOMES.db --project-name "Prochlorococcus_Pan" --output-dir PROCHLORO --num-threads 12 --minbit 0.5 --mcl-inflation 10 --use-ncbi-blast
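The command above packs several settings into one long line. One way to keep them readable, sketched below, is to put each value in a variable and print the assembled command before running it. The variable names and the `pan_cmd` helper are illustrative; the flags and values are exactly the ones used above, and the comments on `--minbit` and `--mcl-inflation` are my reading of those options rather than something stated in this post.

```shell
# Sketch: keep the run settings in one place and preview the command.
# Variable names and pan_cmd are illustrative; flags are the ones used above.
GENOMES_DB=PROCHLORO-GENOMES.db
PROJECT=Prochlorococcus_Pan
OUT_DIR=PROCHLORO
THREADS=12
MINBIT=0.5       # minbit cutoff used to discard weak BLAST hits
INFLATION=10     # MCL inflation; larger values give finer-grained clusters

pan_cmd() {
    # echo only prints the command line; nothing is executed here.
    echo anvi-pan-genome -g "$GENOMES_DB" \
        --project-name "$PROJECT" \
        --output-dir "$OUT_DIR" \
        --num-threads "$THREADS" \
        --minbit "$MINBIT" \
        --mcl-inflation "$INFLATION" \
        --use-ncbi-blast
}

pan_cmd
```

Once the printed line looks right, removing the leading `echo` inside `pan_cmd` turns the preview into the real run.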
# Add additional information to the genomes
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-import-misc-data layer-additional-data.txt -p PROCHLORO/Prochlorococcus_Pan-PAN.db --target-data-table layers
New data for 'layers' in data group 'default'
===============================================
Data key "clade" .............................: Predicted type: str
Data key "light" .............................: Predicted type: str
NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: layers
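To see which data keys a file like `layer-additional-data.txt` will contribute before importing it, you can look at its header row. The sketch below assumes a tab-separated file whose first column holds the layer (genome) names and whose remaining columns are the data keys; `layer_keys` is a hypothetical helper name.

```shell
# Sketch: list the data keys (all header columns except the first)
# of a tab-separated additional-data file.
# Assumption: the first column holds the layer (genome) names.
layer_keys() {
    head -n 1 "$1" | tr '\t' '\n' | tail -n +2
}

# Usage:
# layer_keys layer-additional-data.txt
```

For the file used here, this should list the keys that appear in the import log above, "clade" and "light".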
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-import-state -p PROCHLORO/Prochlorococcus_Pan-PAN.db \
> --state pan-state.json \
> --name default
(anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-display-pan -g PROCHLORO-GENOMES.db -p PROCHLORO/Prochlorococcus_Pan-PAN.db
Interactive mode .............................: pan
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PROCHLORO/Prochlorococcus_Pan-PAN.db (v. 12)
Gene cluster homogeneity estimates ..........................: Functional: [YES]; Geometric: [YES]
* Gene clusters are initialized for all 7383 gene clusters in the database.
* The server is now listening the port number "8080". When you are finished, press
CTRL+C to terminate the server.
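From here you can go on to analyze the functions in the pangenome. If you are interested, head over to the anvi'o website to learn more!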