weixin_33806300

Building local NCBI BLAST databases (NR, NT, etc.) | How to use blastx/diamond | Building a BLAST index | makeblastdb...

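References:

FTP README

How do I download the NCBI NR and NT databases?

Download BLAST: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+

First, get to know the BLAST databases: BLAST FTP Site

How do I download the NCBI BLAST databases?

NCBI provides a handy script, update_blastdb.pl, that downloads the BLAST databases automatically.

Usage: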

perl update_blastdb.pl nr

Which BLAST databases are available for download?

perl update_blastdb.pl --showall

This command lists every BLAST database available for download; pick the ones you need:

16SMicrobial
cdd_delta
env_nr
env_nt
est
est_human
est_mouse
est_others
gss
gss_annot
htgs
human_genomic
landmark
nr
nt
other_genomic
pataa
patnt
pdbaa
pdbnt
ref_prok_rep_genomes
ref_viroids_rep_genomes
ref_viruses_rep_genomes
refseq_genomic
refseq_protein
refseq_rna
refseqgene
sts
swissprot
taxdb
tsa_nr
tsa_nt
vector

Here I chose the nr database.

nohup perl update_blastdb.pl --decompress nr >out.log 2>&1 &

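The download runs in the background and the archives are decompressed automatically. (My connection dropped halfway through; running the command again resumed the download without overwriting the files that had already been downloaded.)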

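How do I use BLAST?

Only blastx is demonstrated here.

The nr database downloaded above is a protein database, and blastx aligns nucleotide sequences against a protein database. (nt is the nucleotide database.)

Because the downloaded database is already indexed, the makeblastdb step can be skipped.

The most commonly used options are:

-query  the nucleotide sequences to search with

-db  database name

-out  output file

-evalue  E-value threshold

-outfmt  output format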
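As a rough illustration of how these five options fit together, the sketch below assembles a blastx command line in Python without running it. The helper name `build_blastx_cmd` and the default E-value and output format are illustrative assumptions, not settings from the original run; only the flags listed above are used.

```python
import shlex

def build_blastx_cmd(query, db, out, evalue=1e-5, outfmt=6):
    """Assemble a blastx argument list from the five options described above.

    Hypothetical helper: the default evalue and outfmt are illustrative choices.
    """
    return [
        "blastx",
        "-query", query,         # nucleotide sequences to search with
        "-db", db,               # database name, e.g. the downloaded nr
        "-out", out,             # output file
        "-evalue", str(evalue),  # E-value threshold
        "-outfmt", str(outfmt),  # output format (6 = tabular)
    ]

cmd = build_blastx_cmd("test.merged.transcript.fasta", "nr", "test.blastx.out")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

On a machine with BLAST+ installed, the list could be handed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with unusual file names.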

Building a BLAST index | makeblastdb

makeblastdb -in mature.fa -input_type fasta -dbtype nucl -title miRBase -parse_seqids -out miRBase -logfile File_Name

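-in: the input file, i.e. the FASTA sequences to format
-dbtype: the sequence type; nucl for nucleotide, prot for protein
-title: a display title for the database, purely cosmetic (it cannot be used as the -db argument in later searches)
-parse_seqids: recommended (the README copied below uses it in its suggested makeblastdb commands); I have not yet worked out exactly why
-out: the database name; choose something meaningful, because this is the value passed to -db in later blast+ searches
-logfile: the log file; if omitted, log messages are printed to the screen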
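The option list above can be sketched as a small Python helper that assembles the same makeblastdb call. The function name `build_makeblastdb_cmd` and its keyword arguments are hypothetical; the flags themselves are the ones used in the command above.

```python
def build_makeblastdb_cmd(fasta, dbtype, name, title=None, logfile=None):
    """Assemble a makeblastdb argument list (hypothetical helper, not part of BLAST+)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    cmd = [
        "makeblastdb",
        "-in", fasta,            # FASTA file to format
        "-input_type", "fasta",
        "-dbtype", dbtype,       # nucl = nucleotide, prot = protein
        "-parse_seqids",         # index the sequence IDs
        "-out", name,            # this name is what -db expects later
    ]
    if title is not None:
        cmd += ["-title", title]      # cosmetic only
    if logfile is not None:
        cmd += ["-logfile", logfile]  # otherwise messages go to the screen
    return cmd

cmd = build_makeblastdb_cmd("mature.fa", "nucl", "miRBase", title="miRBase")
print(" ".join(cmd))
```

Rejecting any dbtype other than nucl or prot up front is a design choice of this sketch; it catches a typo before a long formatting job starts.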

Resource usage

blastx -query test.merged.transcript.fasta -db nr -out test.blastx.out

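The FASTA file has only 19,938 lines.

Yet the run consumed a lot of resources:

Average memory usage: 51.45 GB; peak: 115.37 GB

CPUs: 1

Run time: 06:00:24 (can you believe it? This was only a small test.)

So I strongly recommend using diamond instead of blast for database searches.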

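Interpreting BLAST results

Every alignment that passes the thresholds is reported in a block like the one below (a query sequence that matches several subjects produces several such blocks):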

>AAB70410.1 Similar to Schizosaccharomyces CCAAT-binding factor (gb|U88525).
EST gb|T04310 comes from this gene [Arabidopsis thaliana]
Length=208

 Score = 238 bits (607),  Expect = 7e-76, Method: Compositional matrix adjust.
 Identities = 116/145 (80%), Positives = 127/145 (88%), Gaps = 2/145 (1%)
 Frame = +1

Query  253  FWASQYQEIEQTSDFKNHSLPLARIKKIMKADEDVRMISAEAPVVFARACEMFILELTLR  432
            FW +Q++EIE+T+DFKNHSLPLARIKKIMKADEDVRMISAEAPVVFARACEMFILELTLR
Sbjct  39   FWENQFKEIEKTTDFKNHSLPLARIKKIMKADEDVRMISAEAPVVFARACEMFILELTLR  98

Query  433  SWNHTEENKRRTLQKNDIAAAITRNEIFDFLVDIVPREDLKDEVLASIPRGTLPMGAPTE  612
            SWNHTEENKRRTLQKNDIAAA+TR +IFDFLVDIVPREDL+DEVL SIPRGT+P  A
Sbjct  99   SWNHTEENKRRTLQKNDIAAAVTRTDIFDFLVDIVPREDLRDEVLGSIPRGTVPEAA-AA  157

Query  613  GLPYYYMQPQHAPQVGAPGMFMGKP  687
            G PY Y+    AP +G PGM MG P
Sbjct  158  GYPYGYLPAGTAP-IGNPGMVMGNP  181

There are plenty of guides to interpreting these results online, so I will not go into detail here.
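For readers who want a starting point anyway, here is a minimal sketch that pulls the counts out of the statistics line of the alignment above. Identities are exact matches, Positives are matches that score above zero in the substitution matrix, and Gaps are gap columns; each is reported over the alignment length. The regular expression and the function name are my own assumptions about the pairwise text layout, not an official parser.

```python
import re

STATS_RE = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")

def parse_alignment_stats(line):
    """Return {name: (count, alignment_length, percent)} for one statistics line."""
    stats = {}
    for name, count, total in STATS_RE.findall(line):
        count, total = int(count), int(total)
        stats[name] = (count, total, 100.0 * count / total)
    return stats

line = " Identities = 116/145 (80%), Positives = 127/145 (88%), Gaps = 2/145 (1%)"
stats = parse_alignment_stats(line)

# The query spans nucleotides 253..687 in frame +1, i.e. this many codons:
aligned_codons = (687 - 253 + 1) // 3

for name, (count, total, percent) in stats.items():
    print(f"{name}: {count}/{total} = {percent:.1f}%")
print("query codons covered:", aligned_codons)
```

The codon count equals the alignment length of 145 here because both gaps fall in the subject rows, so every alignment column holds a translated query residue.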

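Below are my measurements for diamond under the same conditions:

Average memory usage: 11.01 GB; peak: 12.44 GB

CPUs: 1 requested (571.17% utilization), meaning it automatically used 5-6 CPUs

Run time: 00:26:15

Moreover, the diamond documentation states that its strength is handling more than 1M queries: the larger the input, the greater the speed advantage.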
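To put the two measurements side by side, the short calculation below converts the run times to seconds and takes ratios. It only uses the figures reported in this post; treating 571.17% as the average CPU utilization over the whole diamond run is my assumption.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

blastx_seconds = hms_to_seconds("06:00:24")
diamond_seconds = hms_to_seconds("00:26:15")

speedup = blastx_seconds / diamond_seconds
peak_memory_ratio = 115.37 / 12.44

# blastx used 1 CPU; diamond averaged about 5.7 CPUs (assumed from 571.17%).
cpu_adjusted = blastx_seconds / (diamond_seconds * 5.7117)

print(f"wall-clock speedup: {speedup:.1f}x")
print(f"peak memory ratio: {peak_memory_ratio:.1f}x")
print(f"speedup per CPU-second: {cpu_adjusted:.1f}x")
```

On wall-clock time diamond comes out roughly 13.7 times faster with about a ninth of the peak memory. Part of that gap comes from diamond using several cores while blastx ran on one (its -num_threads default is 1), so the per-CPU advantage in this small test is closer to 2.4 times.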

Basic diamond usage:

diamond makedb --in nr.fa -d nr
diamond blastx -d nr -q test.merged.transcript.fasta -o test.matches.m8

However, diamond has a limitation: it can only be used to search against protein databases.
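The .m8 file written by the diamond command above can be post-processed with a few lines of Python. This is a minimal sketch that assumes diamond was left on its default output format, which is BLAST-style tabular with the same 12 standard columns that the blastx help text reproduced in this post lists as the tabular default. The function name, the E-value cutoff, and the two sample rows are made up for illustration.

```python
import io

# The 12 standard tabular columns (the 'std' keyword in the blastx help).
STD_COLUMNS = ("qaccver saccver pident length mismatch gapopen "
               "qstart qend sstart send evalue bitscore").split()

def read_tabular(handle, max_evalue=1e-5):
    """Yield one dict per hit whose E-value is at most max_evalue."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(STD_COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] <= max_evalue:
            yield hit

# Made-up rows for demonstration only; they are not real search output.
sample = (
    "query_1\tAAB70410.1\t80.0\t145\t27\t2\t253\t687\t39\t181\t7e-76\t238\n"
    "query_2\tXYZ00001.1\t31.2\t64\t40\t2\t10\t201\t5\t66\t0.5\t30.1\n"
)
hits = list(read_tabular(io.StringIO(sample)))
print([(hit["qaccver"], hit["saccver"], hit["evalue"]) for hit in hits])
```

For a real file, replace the `io.StringIO` object with `open("test.matches.m8")`. The same reader should work for blastx output written with -outfmt 6.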

OrfPredictor recommends the following parameter settings:

To minimize the file size of BLASTX output for loading, the following parameters are recommended if the BLASTX in the 'NCBI-blastall' package is used: "-v 1 -b 1 -e 1e-5" (Note: we used version 2.2.19 - earlier or later versions may not work properly).
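The flags in that recommendation belong to the legacy blastall program. In BLAST+ the corresponding options are named -num_descriptions (for -v), -num_alignments (for -b) and -evalue (for -e). The sketch below renames them; the helper name is hypothetical, and it says nothing about whether OrfPredictor accepts BLAST+ output, which the note above explicitly leaves open.

```python
# Legacy blastall flag -> BLAST+ option name (only the three flags quoted above).
LEGACY_TO_BLAST_PLUS = {
    "-v": "-num_descriptions",  # number of one-line descriptions to show
    "-b": "-num_alignments",    # number of alignments to show
    "-e": "-evalue",            # E-value threshold
}

def translate_legacy_args(args):
    """Rename legacy flags in an argument list, leaving everything else untouched."""
    return [LEGACY_TO_BLAST_PLUS.get(arg, arg) for arg in args]

legacy = "-v 1 -b 1 -e 1e-5".split()
modern = translate_legacy_args(legacy)
print(" ".join(modern))
```

According to the blastx help text, -num_descriptions and -num_alignments are incompatible with -max_target_seqs, so the translated options should not be combined with that flag.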

The full blastx help text follows, for reference:

$ blastx -help
USAGE
  blastx [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-qcov_hsp_perc float_value]
    [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-max_intron_length length] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-window_size int_value] [-ungapped] [-lcase_masking] [-query_loc range]
    [-strand strand] [-parse_deflines] [-query_gencode int_value]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-comp_based_stats compo] [-use_sw_tback] [-version]

DESCRIPTION
   Translated Query-Protein Subject BLAST 2.7.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'
 -query_gencode <Integer, values between: 1-6, 9-16, 21-25>
   Genetic code to use to translate query (see user manual for details)
   Default = `1'

 *** General search options
 -task <String, Permissible values: 'blastx' 'blastx-fast' >
   Task to execute
   Default = `blastx'
 -db <String>
   BLAST database name
    * Incompatible with:  subject, subject_loc
 -out <File_Out>
   Output file name
   Default = `-'
 -evalue <Real>
   Expectation value (E) threshold for saving hits
   Default = `10'
 -word_size <Integer, >=2>
   Word size for wordfinder algorithm
 -gapopen <Integer>
   Cost to open a gap
 -gapextend <Integer>
   Cost to extend a gap
 -max_intron_length <Integer, >=0>
   Length of the largest intron allowed in a translated nucleotide sequence
   when linking multiple distinct alignments
   Default = `0'
 -matrix <String>
   Scoring matrix name (normally BLOSUM62)
 -threshold <Real, >=0>
   Minimum word score such that the word is added to the BLAST lookup table
 -comp_based_stats <String>
   Use composition-based statistics:
       D or d: default (equivalent to 2 )
       0 or F or f: No composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t : Composition-based score adjustment as in Bioinformatics
   21:902-911,
       2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
       2005, unconditionally
   Default = `2'

 *** BLAST-2-Sequences options
 -subject <File_In>
   Subject sequence(s) to search
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask
 -subject_loc <String>
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with:  db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt <String>
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report

   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
           qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accesion
          qaccver means Query accesion.version
             qlen means Query sequence length
           sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
           sallgi means All subject GIs
             sacc means Subject accession
          saccver means Subject accession.version
          sallacc means All subject accessions
             slen means Subject sequence length
           qstart means Start of alignment in query
             qend means End of alignment in query
           sstart means Start of alignment in subject
             send means End of alignment in subject
             qseq means Aligned part of query sequence
             sseq means Aligned part of subject sequence
           evalue means Expect value
         bitscore means Bit score
            score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
         mismatch means Number of mismatches
         positive means Number of positive-scoring matches
          gapopen means Number of gap openings
             gaps means Total number of gaps
             ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
             btop means Blast traceback operations (BTOP)
           staxid means Subject Taxonomy ID
         ssciname means Subject Scientific Name
         scomname means Subject Common Name
       sblastname means Subject Blast Name
        sskingdom means Subject Super Kingdom
          staxids means unique Subject Taxonomy ID(s), separated by a ';'
                (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
        scomnames means unique Subject Common Name(s), separated by a ';'
       sblastnames means unique Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
       sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order)
           stitle means Subject Title
       salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
            qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
           qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs
 -line_length <Integer, >=1>
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?

 *** Query filtering options
 -seg <String>
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `12 2.2 2.5'
 -soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc
 -negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -negative_seqidlist <String>
   Restrict search of database to everything except the listed SeqIDs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
 -entrez_query <String>
   Restrict search with the given Entrez query
    * Requires:  remote
 -db_soft_mask <String>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with:  db_hard_mask, subject, subject_loc
 -db_hard_mask <String>
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with:  db_soft_mask, subject, subject_loc
 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp
 -max_hsps <Integer, >=1>
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
 -best_hit_overhang <Real, (>0 and <0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -best_hit_score_edge <Real, (>0 and <0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with:  culling_limit
 -max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep
   Not applicable for outfmt <= 4
   Default = `500'
    * Incompatible with:  num_descriptions, num_alignments

 *** Statistical options
 -dbsize <Int8>
   Effective length of the database
 -searchsp <Int8, >=0>
   Effective length of the search space
 -sum_stats <Boolean>
   Use sum statistics

 *** Search strategy options
 -import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with:  export_search_strategy
 -export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with:  import_search_strategy

 *** Extension options
 -xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
 -window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads <Integer, (>=1 and =<24)>
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with:  remote
 -remote
   Execute search remotely?
    * Incompatible with:  gilist, seqidlist, negative_gilist,
   negative_seqidlist, subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
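The "Incompatible with" notes scattered through the help text above are easy to miss when a command line grows. As a sketch, the check below encodes a few of those pairs (a subset copied from the help text, not the full list) and reports conflicts among a chosen set of options. The table layout and the function name are my own.

```python
# A subset of the incompatibilities stated in the help text above.
INCOMPATIBLE = {
    "db": {"subject", "subject_loc"},
    "max_target_seqs": {"num_descriptions", "num_alignments"},
    "culling_limit": {"best_hit_overhang", "best_hit_score_edge"},
    "import_search_strategy": {"export_search_strategy"},
    "remote": {"gilist", "seqidlist", "negative_gilist",
               "negative_seqidlist", "subject_loc", "num_threads"},
}

def find_conflicts(options):
    """Return sorted (option, option) pairs that the help text marks as incompatible."""
    chosen = set(options)
    conflicts = set()
    for option in chosen:
        for other in INCOMPATIBLE.get(option, ()):
            if other in chosen:
                conflicts.add(tuple(sorted((option, other))))
    return sorted(conflicts)

print(find_conflicts(["query", "db", "max_target_seqs", "num_alignments"]))
print(find_conflicts(["query", "db", "out", "evalue"]))
```

Option names are given without the leading dash. Each pair is stored in sorted order so that a conflict is reported once, whichever side of the table it was found from.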

Below is the detailed English guide, copied from the FTP README:

1. Quick Start

Get all numbered files for a database with the same base name: Each of these files represents a subset (volume) of that database, and all of them are needed to reconstitute the database.
After extraction, there is no need to concatenate the resulting files: call the database with the base name; for nr database files, use "-db nr". (These databases have already been processed with makeblastdb, so they can be used directly after download.)
For easy download, use the update_blastdb.pl script from the blast+ package.
Incremental update is not available.

2. General Introduction

BLAST search pages under the Basic BLAST section of the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/) use a standard set of BLAST databases for nucleotide, protein, and translated BLAST searches. These databases are made available as compressed archives in pre-formatted form and can be downloaded from the /db directory of the BLAST ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). The FASTA files reside under the /FASTA subdirectory.

The pre-formatted databases offer the following advantages:

Pre-formatting removes the need to run makeblastdb (no database-building command is required);
Species-level taxonomy ids are included for each database entry;
Databases are broken into smaller-sized volumes and are therefore easier to download;
Sequences in FASTA format can be generated from the pre-formatted databases by using the blastdbcmd utility (that is, FASTA files can be exported from these database files);
A convenient script (update_blastdb.pl) is available in the blast+ package to download the pre-formatted databases (the same script can be used to update them).

Pre-formatted databases must be downloaded using the update_blastdb.pl script or via FTP in binary mode. Documentation for this script can be obtained by running the script without any arguments; Perl installation is required.

The compressed files downloaded must be inflated with gzip or other decompress tools. The BLAST database files can then be extracted out of the resulting tar file using the tar utility on Unix/Linux, or WinZip and StuffIt Expander on Windows and Macintosh platforms, respectively. (The downloaded databases are compressed archives and must be decompressed.)

Large databases are formatted in multiple one-gigabyte volumes, which are named using the basename.##.tar.gz convention. All volumes with the same base name are required. An alias file is provided to tie individual volumes together so that the database can be called using the base name (without the .nal or .pal extension). For example, to call the est database, simply use "-db est" option in the command line (without the quotes). (Large databases are usually split into several archives; nr, for example, has 11. All of the archives for a database must be downloaded and extracted. Extraction produces the corresponding database files along with an nr.pal file. To search nr, simply pass -db nr, or -d nr in the legacy blastall syntax.)
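Since every volume is required, it can be worth checking a download directory for holes before extracting anything. The sketch below groups archive names that follow the basename.##.tar.gz convention and reports volume numbers missing below the highest one found. It assumes volumes are numbered consecutively from 00, and it cannot notice missing trailing volumes; the function name and sample file names are illustrative.

```python
import re
from collections import defaultdict

VOLUME_RE = re.compile(r"^(?P<base>.+)\.(?P<num>\d+)\.tar\.gz$")

def missing_volumes(filenames):
    """Map each database base name to the volume numbers missing below its highest one."""
    seen = defaultdict(set)
    for name in filenames:
        match = VOLUME_RE.match(name)
        if match:  # single-archive databases such as taxdb.tar.gz are ignored
            seen[match["base"]].add(int(match["num"]))
    return {
        base: sorted(set(range(max(numbers) + 1)) - numbers)
        for base, numbers in seen.items()
    }

files = ["nr.00.tar.gz", "nr.01.tar.gz", "nr.03.tar.gz", "taxdb.tar.gz"]
print(missing_volumes(files))
```

In practice the list could come from `os.listdir()` on the download directory. An empty list for a base name means no gaps were found among the volumes present.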

Additional BLAST databases that are not provided in pre-formatted formats may be available in the FASTA subdirectory. For other genomic BLAST databases, please check the genomes ftp directory at: ftp://ftp.ncbi.nlm.nih.gov/genomes/

3. Contents of the /blast/db/ directory

The pre-formatted BLAST databases are archived in this directory. The names of these databases and their contents are listed below.

+-----------------------------+------------------------------------------------+
 File Name        #  Content Description   
+-----------------------------+------------------------------------------------+
16SMicrobial.tar.gz          #  Bacterial and Archaeal 16S rRNA sequences from BioProjects 33175 and 33117
FASTA/        #  Subdirectory for FASTA formatted sequences
README        #  README for this subdirectory (this file)
Representative_Genomes.*tar.gz        #  Representative bacterial/archaeal genomes database
cdd_delta.tar.gz          #  Conserved Domain Database sequences for use with stand alone deltablast
cloud/          #  Subdirectory of databases for BLAST AMI; see http://1.usa.gov/TJAnEt
env_nr.*tar.gz        #  Protein sequences for metagenomes
env_nt.*tar.gz        #  Nucleotide sequences for metagenomes
est.tar.gz        #  This file requires est_human.*.tar.gz, est_mouse.*.tar.gz, and est_others.*.tar.gz files to function. It contains the est.nal alias so that searches against est (-db est) will include est_human, est_mouse and est_others. 
est_human.*.tar.gz        #  Human subset of the est database from the est division of GenBank, EMBL and DDBJ.
est_mouse.*.tar.gz        #  Mouse subset of the est database
est_others.*.tar.gz           #  Non-human and non-mouse subset of the est database
gss.*tar.gz           #  Sequences from the GSS division of GenBank, EMBL, and DDBJ
htgs.*tar.gz          #  Sequences from the HTG division of GenBank, EMBL, and DDBJ
human_genomic.*tar.gz         #  Human RefSeq (NC_) chromosome records with gap adjusted concatenated NT_ contigs
nr.*tar.gz        #  Non-redundant protein sequences from GenPept, Swissprot, PIR, PRF, PDB, and NCBI RefSeq
nt.*tar.gz        #  Partially non-redundant nucleotide sequences from all traditional divisions of GenBank, EMBL, and DDBJ excluding GSS, STS, PAT, EST, HTG, and WGS.
other_genomic.*tar.gz         #  RefSeq chromosome records (NC_) for non-human organisms
pataa.*tar.gz         #  Patent protein sequences
patnt.*tar.gz         #  Patent nucleotide sequences. Both patent databases are directly from the USPTO, or from the EPO/JPO via EMBL/DDBJ
pdbaa.*tar.gz         #  Sequences for the protein structure from the Protein Data Bank
pdbnt.*tar.gz         #  Sequences for the nucleotide structure from the Protein Data Bank. They are NOT the protein coding sequences for the corresponding pdbaa entries.
refseq_genomic.*tar.gz        #  NCBI genomic reference sequences
refseq_protein.*tar.gz        #  NCBI protein reference sequences
refseq_rna.*tar.gz        #  NCBI Transcript reference sequences
sts.*tar.gz           #  Sequences from the STS division of GenBank, EMBL, and DDBJ
swissprot.tar.gz          #  Swiss-Prot sequence database (last major update)
taxdb.tar.gz          #  Additional taxonomy information for the databases listed here providing common and scientific names
tsa_nt.*tar.gz        #  Sequences from the TSA division of GenBank, EMBL, and DDBJ
vector.tar.gz         #  Vector sequences from 2010, see Note 2 in section 4.
wgs.*tar.gz           #  Sequences from Whole Genome Shotgun assemblies
+-----------------------------+------------------------------------------------+

4. Contents of the /blast/db/FASTA directory

This directory contains FASTA formatted sequence files. The file names and database contents are listed below. These files must be unpacked and processed through makeblastdb before they can be used by the BLAST programs.

+-----------------------+-----------------------------------------------------+
File Name          #  Content Description         # 
+-----------------------+-----------------------------------------------------+
alu.a.gz        #  translation of alu.n repeats
alu.n.gz        #  alu repeat elements (from 2003)
drosoph.aa.gz           #  CDS translations from drosophila.nt  
drosoph.nt.gz           #  genomic sequences for drosophila (from 2003)
env_nr.gz*          #  Protein sequences for metagenomes, taxid 408169
env_nt.gz*          #  Nucleotide sequences for metagenomes, taxid 408169
est_human.gz*           #  human subset of the est database (see Note 1)
est_mouse.gz*           #  mouse subset of the est database
est_others.gz*          #  non-human and non-mouse subset of the est database
gss.gz*         #  sequences from the GSS division of GenBank, EMBL, and DDBJ
htgs.gz*        #  sequences from the HTG division of GenBank, EMBL, and DDBJ
human_genomic.gz*           #  human RefSeq (NC_) chromosome records with gap adjusted concatenated NT_ contigs
igSeqNt.gz          #  human and mouse immunoglobulin variable region nucleotide sequences
igSeqProt.gz        #  human and mouse immunoglobulin variable region protein sequences
mito.aa.gz          #  CDS translations of complete mitochondrial genomes
mito.nt.gz          #  complete mitochondrial genomes
nr.gz*          #  non-redundant protein sequence database with entries from GenPept, Swissprot, PIR, PRF, PDB, and RefSeq
nt.gz*          #  nucleotide sequence database, with entries from all traditional divisions of GenBank, EMBL, and DDBJ; excluding bulk divisions (gss, sts, pat, est, htg) and wgs entries. Partially non-redundant.
other_genomic.gz*           #  RefSeq chromosome records (NC_) for organisms other than human
pataa.gz*           #  patent protein sequences
patnt.gz*           #  patent nucleotide sequences. Both patent sequence files are from the USPTO, or EPO/JPO via EMBL/DDBJ
pdbaa.gz*           #  protein sequences from pdb protein structures
pdbnt.gz*           #  nucleotide sequences from pdb nucleic acid structures. They are NOT the protein coding sequences for the corresponding pdbaa entries.
sts.gz*         #  database for sequence tag site entries 
swissprot.gz*           #  swiss-prot database (last major release)
vector.gz           #  vector sequences from 2010. (See Note 2)
wgs.gz*         #  whole genome shotgun genome assemblies
yeast.aa.gz         #  protein translations from yeast.nt
yeast.nt.gz         #  yeast genomes (from 2003)
+-----------------------+---------------------------------------------------+

NOTE:
(1) NCBI does not provide the complete est database in FASTA format. One needs to get all three subsets (est_human, est_mouse, and est_others) and concatenate them into the complete est FASTA database.
(2) For screening for vector contamination, use the UniVec database: ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec/
* marked files have pre-formatted counterparts.

5. Database updates

The BLAST databases are updated regularly. There is no established incremental update scheme. We recommend downloading the complete databases regularly to keep their content current.

6. Non-redundant defline syntax

The non-redundant databases are nr, nt (partially) and pataa. In them, identical sequences are merged into one entry. To be merged, two sequences must have identical lengths and every residue at every position must be the same. The FASTA deflines for the different entries that belong to one record are separated by control-A characters, which are invisible to most programs. In the example below both entries gi|1469284 and gi|1477453 have the same sequence, in every respect:

>gi|3023276|sp|Q57293|AFUC_ACTPL   Ferric transport ATP-binding protein afuC 
^Agi|1469284|gb|AAB05030.1|   afuC gene product ^Agi|1477453|gb|AAB17216.1|   
afuC [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVT
KSSIQNRDICIVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQ
QQRVALARALVLKPKVLILDEPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMN
KGTIMQKARQKIFIYDRILYSLRNFMGESTICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPE
AIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLINANPDQFDPDATKAFIHFTEQGIFLLNKE
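Because the merged entries are joined by control-A (byte 0x01), a defline like the one above can be split back into its individual records with ordinary string handling. A minimal sketch follows; the function name is hypothetical, and the defline is retyped from the example with `\x01` standing in for each `^A`.

```python
def split_merged_defline(defline):
    """Split an nr-style defline into one (identifier, description) pair per merged entry."""
    entries = []
    for part in defline.lstrip(">").split("\x01"):
        # The identifier is everything up to the first space.
        identifier, _, description = part.strip().partition(" ")
        entries.append((identifier, description.strip()))
    return entries

defline = (
    ">gi|3023276|sp|Q57293|AFUC_ACTPL Ferric transport ATP-binding protein afuC"
    "\x01gi|1469284|gb|AAB05030.1| afuC gene product"
    "\x01gi|1477453|gb|AAB17216.1| afuC [Actinobacillus pleuropneumoniae]"
)
entries = split_merged_defline(defline)
for identifier, description in entries:
    print(identifier, "->", description)
```

Splitting on whitespace alone would not work here, because the descriptions themselves contain spaces; the control-A separator is the only reliable boundary between merged entries.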

The syntax of sequence header lines used by the NCBI BLAST server depends on the database from which each sequence was obtained. The table at http://www.ncbi.nlm.nih.gov/toolkit/doc/book/ch_demo/?report=objectonly#ch_demo.T5 lists the supported FASTA identifiers. (Some BLAST databases are not provided in pre-formatted form; these can be downloaded from the FASTA directory.)

For databases whose entries are not from official NCBI sequence databases, such as the Trace database, the gnl| convention is used. Custom databases should follow this convention, and the id for each sequence must be unique, if one would like to take advantage of an indexed database, which enables specific sequence retrieval using the blastdbcmd program included in the blast executable package. Refer to the documents distributed in the standalone BLAST package for more details.

7. Formatting a FASTA file into a BLASTable database

FASTA files need to be formatted with makeblastdb before they can be used in local blast search. For those from NCBI, the following makeblastdb commands are recommended:

For nucleotide fasta file:

makeblastdb -in input_db -dbtype nucl -parse_seqids

For protein fasta file:

makeblastdb -in input_db -dbtype prot -parse_seqids
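Choosing between the two commands comes down to knowing whether a FASTA file holds nucleotide or protein sequences. When that is not obvious from the file name, a crude check like the one below can help. It is a heuristic sketch only: the function name, the 1000-residue sample and the 90% threshold are all assumptions, and short or unusual sequences can fool it.

```python
def guess_dbtype(fasta_text, sample_size=1000):
    """Guess 'nucl' or 'prot' from the first residues of a FASTA string (heuristic)."""
    residues = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip deflines
        residues.extend(line.strip().upper())
        if len(residues) >= sample_size:
            break
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_like = sum(residue in "ACGTUN" for residue in residues)
    # Assumed threshold: treat the file as nucleotide if >= 90% of residues fit.
    return "nucl" if nucleotide_like / len(residues) >= 0.9 else "prot"

print(guess_dbtype(">seq1\nACGTACGTNNACGT\n"))
print(guess_dbtype(">afuC\nMNNDFLVLKNITKSFGKATV\n"))
```

The returned string can be passed straight to -dbtype, but a quick look at the input file remains the safer check.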

你可能感兴趣的:(构建NCBI本地BLAST数据库 (NR NT等) | blastx/diamond使用方法 | blast构建索引 | makeblastdb...)

髋关节控制器 - OpenExo 強云 OpenExo OpenExo
髋关节控制器FranksCollinsHipController简介Parameters补充说明ProportionalHipMomentController简介Parameters补充说明FranksCollinsHipController简介FranksCollinsHipController通过一系列分段spline曲线（包括伸展和屈曲两个阶段），为髋关节提供助力。它基于步态百分比进行控制，
闷的人 112233D
躺了一会儿终于缓过来了，晚上和小伙伴们一起吃完饭走回来，离宿舍大概还有三四百米的时候，特别想上厕所，后来就加紧步伐走，越走身体的感觉就越明显，最后感觉都快要憋不住了，后来等电梯和坐电梯的时候，整个人浑身冒汗。也就几十秒的功夫，整个人感觉真的是快要憋不住了，后来下了电梯第一时间就冲进了厕所。后来发现自己浑身都出汗了，那种湿透湿透的汉，整个人也感觉特别虚，真的就是那种突然没有能量的感觉。晚上四个人一起
数字孪生工厂 Frontop_2002
一、前言近几年“数字孪生”“三维可视化”等新一代技术热词频发，以及结合近期的“元宇宙”概念的大火，各巨头纷纷入局元宇宙，也顺道带火了一波数字孪生。再加之《“十四五”数字经济发展规划》再次重点提及“数字孪生”这一技术，也使得更多的个人以及企业开始关注到这一项技术。另外在规划中也提到重点发展的就是智能制造。其实现在很多智能制造行业的巨头在早些年就已经开始重视以及布局数字孪生工厂的建设。相信有很多制造类
微服务架构监控：四大黄金指标解析 AI云原生与云计算技术学院架构微服务云原生 ai
微服务架构监控：四大黄金指标解析关键词：微服务架构、监控体系、四大黄金指标、SRE、延迟、流量、错误、饱和度摘要：本文深入解析微服务架构监控的核心方法论——四大黄金指标（延迟、流量、错误、饱和度），基于GoogleSRE最佳实践，结合具体技术实现与数学模型，阐述指标设计原理、数据采集方法、可视化实践及异常诊断逻辑。通过完整的项目实战案例，演示如何构建端到端监控体系，帮助技术团队建立可观测性基线，提
【Web APIs】JavaScript 节点操作 ③ ( 子节点操作 | firstChild 属性 | firstElementChild 属性 | children[0] 属性 ) 韩曙亮 JavaScript 前端 javascript 开发语言 Web APIs 节点操作子节点操作 js
文章目录一、JavaScript子节点操作1、获取子节点需求2、firstChild和lastChild属性(不推荐-基于所有类型节点)3、firstElementChild和lastElementChild属性(不推荐-兼容性问题)4、children[0]和children[element.children.length-1]属性(推荐-实际用法)5、完整代码示例在【WebAPIs】JavaS
Xss漏洞总结
一、XSS漏洞简介XSS（Cross-SiteScripting，跨站脚本攻击）是一种常见的Web前端安全漏洞，其主要危害对象是网站的访问用户。攻击者通过在网页中注入恶意脚本代码（如JavaScript、Flash等），诱使用户访问后在其浏览器中执行这些代码，从而达到窃取数据、控制会话等攻击目的。二、XSS漏洞原理XSS的根本原因在于服务器未对用户提交的输入内容进行严格过滤和转义处理，导致用户提供
YFCMF-TP6 改后台模块路径方法 huluang php
1、创建表单，表民phpthinkcrud-tbeian_dignji-u12、app路径controllerlang/zh-cnmodelvalidateview以及public/assets/js/backend/找到beian_dignji每个路径下创建需要增加的文件夹名称，如beian，并将以上目录的文件或文件夹剪切到当前文件及下3、编辑表fa_auth_rule规则名称主要为105，上级
SAP Word 模板与 XML 数据流合并过程深度剖析——以表格结构为例汪子熙 ABAP 百科全书 word xml CRM ABAP NetWeaver SAP
在CRMWebClientUI的Office集成功能里，Word模板与XML数据流的动态合并，是合同、报价单等文档自动生成的技术核心。本文结合SAP官方示例代码与OpenXML规范，从模板绑定、数据预处理、运行时递归填充到实际排错技巧，全景展示表格结构合并的幕后细节，并给出一段源自真实项目的实战案例，帮助读者迅速掌握这一看似神秘的“魔术”。(document567.rssing.com,docum
2023-11-01 倪俊卿
侯庄的传说（故事连载）且说那年外地有个江洋大盗听说张员外家很是有钱，便来到阳翟县城，预谋对张员外家进行偷窃。有天夜里，月黑星稀，乌云密布，淅淅沥沥的濛濛雨下个不停，这大盗趁此月黑雨夜，在三更时分拨门潜入张员外宅院，摸入员外女儿房中，把屋中的珍珠玛瑙、金银首饰和银两物件等尽数打成包裹，正准备离开时，张小姐梦中惊醒，点亮灯查看。盗贼一见张员外的女儿长得如花似玉，便心生歹意，图谋不轨。如饿虎一般扑向张小
HMAC API 接口签名 Message安全验证潘多编程 java高级哈希算法算法
什么是HMAC？HMAC全称（Hash-basedMessageAuthenticationCode，即基于Hash的消息的认证码）。-基本过程为对某个消息，利用提前共享的对称密钥和Hash算法进行加密处理，得到HMAC值。-该HMAC值提供方可以证明自己拥有共享密钥的对称密钥，并且消息自身可以利用HMAC确保未经篡改。为什么需要API接口签名？对外开放的API接口都会面临一些安全问题，例如伪装攻
现在的花钱市场怎么样？荒唐忆梦
太平天国背圣宝大花钱估价NTD8,600,000-8,600,000成交价RMB2,847,460专场钱币拍卖时间2018-06-28拍卖公司中正拍卖有限公司拍卖会2018台湾中正春季艺术品拍卖太平天国”背竖“圣宝”，阔缘楷书，直径48.5mm，少见，极美品贤孝传家花钱估价NTD8,600,000-8,600,000成交价RMB2,722,500专场钱币拍卖时间2018-01-26拍卖公司中正拍卖
AE电脑中文版软件下载及安装教程安装包百度网盘地址免费破解版一键安装激活方法心墙
提示：以下是安装教程，安装包资源等放在下面，请往下翻。其他版本安装方法类似。安装教程：1.鼠标右击【Ae2024(64bit)】压缩包（win11及以上系统需先点击“显示更多选项”）【解压到Ae2024(64bit)】。2.打开解压后的文件夹，鼠标右击【Setup】选择【以管理员身份运行】。3.点击【文件夹图标】，点击【更改位置】。4.①双击打开需要将软件安装的磁盘（如：D盘）②新建一个【Ae】文
Verilator的src目录(腾讯元宝) dadaobusi verilator
src/目录是Verilator的核心源代码所在目录，包含了实现Verilator主要功能的C++源文件（.cpp文件）以及部分头文件（.h文件）。这些文件共同构成了Verilator的仿真引擎、信号管理、波形生成等核心功能。由于Verilator的代码规模较大且功能复杂，src/目录下的文件通常按照功能模块进行组织，但并没有像lib/目录那样明确地划分为多个子目录。因此，我们需要逐个分析src/
AI 驱动自动化运维平台架构与实现大富大贵7 程序员知识储备1 程序员知识储备2 程序员知识储备3 算法机器学习人工智能决策树大数据
摘要：随着云计算、容器化和大规模分布式系统的普及，传统人工运维方法已难以满足现代IT环境中海量指标、日志和拓扑关系的实时分析与故障响应需求。AI驱动的自动化运维（AIOps）平台通过融合机器学习、深度学习、图分析以及强化学习等多学科技术，实现对海量运维数据的智能感知、预测、诊断和自动化修复。本文深入探讨AI驱动自动化运维平台的整体架构设计与核心技术实现，涵盖数据采集与预处理、AI引擎设计、自动化执
果冻宝盒新人邀请码千万别乱填写，否则会后悔！小小编007
如果你是果冻宝盒老用户请直接忽略此篇文章。如果你还没有注册下载果冻宝盒，务必要往下看，因为填写一个好的邀请码，可以让你直接升级到果冻宝盒最高等级，享受最高返利权限。众所周知，果冻宝盒是一个综合导购返利软件，覆盖了淘宝，京东，拼多多等各大电商平台的商品优惠券和返利服务。还有话费充值，汽车加油，电影票，外卖红包，会员充值等各种低折扣生活权益。果冻宝盒相比其它同行，返利更高，模式更简单，运营成熟稳定。果
第八次作业