gaorongchao1990626

samtools: learning notes, usage examples, and a detailed walk-through of the official documentation

This post mainly follows the Sina blog of "菜鸟" ("Rookie"). I have simply recorded the steps I ran myself, for others' reference.

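# Step 1: convert the SAM file to BAM, producing map.bam
system "samtools view -bS map.sam > map.bam";
# Step 2: sort the BAM file, producing map.sorted.bam
system "samtools sort map.bam map.sorted";
# Step 3: build an index for the sorted BAM, producing map.sorted.bam.bai
system "samtools index map.sorted.bam";
# Step 4: call SNPs. Use the sorted BAM here; an unsorted one keeps producing errors.
# Note: in bcftools view, -D takes a sequence-dictionary FILE (see its options below),
# not a depth cutoff; the official examples filter depth with vcfutils.pl varFilter -D.
system "samtools mpileup -ugf TAIR10.fas map.sorted.bam | bcftools view -vcg -D100 - > snp.vcf";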

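Those four steps are the whole workflow. I wrote them as Perl system calls so the commands stay on record. Run them one at a time: comment out the other lines with # and work from top to bottom.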
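The same run-one-step-at-a-time idea can be sketched in Python (my choice of language; the post itself uses Perl). This is a minimal sketch, not the author's script: `STEPS` and `run_steps` are hypothetical names, the file names are the ones from the Perl example, and `-D100` is left out because in the bcftools view reference below `-D` takes a sequence dictionary file rather than a depth cutoff.

```python
import subprocess

# The four steps above as shell command strings (file names as in the
# Perl example; adjust them to your own data).
STEPS = [
    "samtools view -bS map.sam > map.bam",
    "samtools sort map.bam map.sorted",
    "samtools index map.sorted.bam",
    "samtools mpileup -ugf TAIR10.fas map.sorted.bam | bcftools view -vcg - > snp.vcf",
]


def run_steps(steps, start=0, dry_run=True):
    """Handle steps[start:] in order and return the commands handled.

    dry_run=True only collects the commands without executing anything.
    With dry_run=False each command goes through the shell (needed for
    '>' and '|'); check=True raises CalledProcessError at the first command
    whose exit status is non-zero. For a pipeline, the shell reports only
    the status of the last command in it.
    """
    handled = []
    for cmd in steps[start:]:
        if not dry_run:
            subprocess.run(cmd, shell=True, check=True)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    # Show what would run when resuming from step 3 (indexing).
    for cmd in run_steps(STEPS, start=2):
        print(cmd)
```

Calling `run_steps(STEPS, start=2, dry_run=False)` would resume from the indexing step, which replaces commenting lines out by hand.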

To get information on every covered site rather than only the variant sites, simply drop -v from the last command:

system "samtools mpileup -ugf TAIR10.fas map.sorted.bam | bcftools view -cg -D100 - > snp.vcf";

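The -v option is described further below as "Output variant sites only (force -c)". With -v, only variant sites are written (SNPs, plus indels unless indel calling is turned off); without -v, every site covered by the sequencing reads is written.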

Whenever a command from the script above appears in the detailed reference below, I add my own notes next to it.

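BAM stands for Binary Alignment/Map; "binary" refers to the file encoding. A BAM file holds the same content as a SAM file, so the two formats can be converted into each other.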

That is the simplest possible example.

Now let's look at the official documentation in detail.

The URL is: http://samtools.sourceforge.net/samtools.shtml

Manual Reference Pages - samtools (1)

NAME

samtools - Utilities for the Sequence Alignment/Map (SAM) format

bcftools - Utilities for the Binary Call Format (BCF) and VCF

Synopsis
Description
Samtools Commands And Options
Bcftools Commands And Options
Sam Format
Vcf Format
Examples
Limitations
Author
See Also

SYNOPSIS: the synopsis lists the commands in full; unless you have special requirements, you can use them as they are. Everything below describes these commands.

samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

samtools sort aln.bam aln.sorted

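This is the sort command: aln.bam is the file to sort, and aln.sorted is an output prefix of your choosing (the result is aln.sorted.bam). It is best to keep the prefix consistent with the input name.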

samtools index aln.sorted.bam

After sorting, build the index file directly with this command.

samtools idxstats aln.sorted.bam

samtools view aln.sorted.bam chr2:20,100,000-20,200,000

samtools merge out.bam in1.bam in2.bam in3.bam

samtools faidx ref.fasta

samtools pileup -vcf ref.fasta aln.sorted.bam

samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

A command of this form is what we used above in the final SNP-extraction step.

samtools tview aln.sorted.bam ref.fasta

bcftools index in.bcf

bcftools view in.bcf chr2:100-200 > out.vcf

bcftools view -vc in.bcf > out.vcf 2> out.afs

DESCRIPTION

Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.

Samtools is designed to work on a stream. It regards an input file ‘-’ as the standard input (stdin) and an output file ‘-’ as the standard output (stdout). Several commands can thus be combined with Unix pipes. Samtools always outputs warning and error messages to the standard error output (stderr).

Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the BAM file name starts with ‘ftp://’ or ‘http://’. Samtools checks the current working directory for the index file and will download the index upon absence. Samtools does not retrieve the entire alignment file unless it is asked to do so.

SAMTOOLS COMMANDS AND OPTIONS

view samtools view [-bchuHS] [-t in.refList] [-o output] [-f reqFlag] [-F skipFlag] [-q minMapQ] [-l library] [-r readGroup] [-R rgFile] <in.bam>|<in.sam> [region1 [...]]
Extract/print all or sub alignments in SAM or BAM format. If no region is specified, all the alignments will be printed; otherwise only alignments overlapping the specified regions will be output. An alignment may be given multiple times if it is overlapping several regions. A region can be presented, for example, in the following format: ‘chr2’ (the whole chr2), ‘chr2:1000000’ (region starting from 1,000,000bp) or ‘chr2:1,000,000-2,000,000’ (region between 1,000,000 and 2,000,000bp including the end points). The coordinate is 1-based.
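To make the region syntax concrete, here is a small Python sketch that parses the three forms listed above and converts the 1-based inclusive coordinates into the 0-based half-open convention many programming libraries use. `parse_region` is a hypothetical helper, not part of samtools, and it assumes reference names contain no ':'.

```python
import re

# name, optionally followed by :start and optionally -end (commas allowed)
_REGION = re.compile(r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$")


def parse_region(region):
    """Parse 'chr2', 'chr2:1000000' or 'chr2:1,000,000-2,000,000'.

    Returns (name, start, end) in 1-based inclusive coordinates;
    start/end are None when they are not given.
    """
    m = _REGION.match(region)
    if m is None:
        raise ValueError(f"bad region: {region!r}")

    def to_int(s):
        return int(s.replace(",", "")) if s else None

    return m.group("name"), to_int(m.group("start")), to_int(m.group("end"))


def to_zero_based_half_open(start, end):
    """1-based inclusive [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end
```

For instance, `parse_region("chr2:1,000,000-2,000,000")` yields `("chr2", 1000000, 2000000)`, and both end points belong to the region.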

OPTIONS:

-b Output in the BAM format. (In step 1 above, the -b in -bS is what makes the output a BAM file.)

-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]

-F INT Skip alignments with bits present in INT [0]

-h Include the header in the output.

-H Output the header only.

-l STR Only output reads in library STR [null]

-o FILE Output file [stdout]

-q INT Skip alignments with MAPQ smaller than INT [0]

-r STR Only output reads in read group STR [null]

-R FILE Output reads in read groups listed in FILE [null]

-S Input is in SAM. If @SQ header lines are absent, the ‘-t’ option is required. (This is the S in -bS above: the input is SAM. If the SAM file has no header, -t is needed.)

-c Instead of printing the alignments, only count them and print the total number. All filter options, such as ‘-f’, ‘-F’ and ‘-q’ , are taken into account.

-t FILE This file is TAB-delimited. Each line must contain the reference name and the length of the reference, one line for each distinct reference; additional fields are ignored. This file also defines the order of the reference sequences in sorting. If you run ‘samtools faidx <ref.fa>’, the resultant index file <ref.fa>.fai can be used as this <in.ref_list> file.

-u Output uncompressed BAM. This option saves time spent on compression/decompression and is thus preferred when the output is piped to another samtools command.
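Since a `.fai` file can serve as the `-t` reference list, a short Python sketch of reading one may help: it keeps only the first two TAB-separated columns (reference name and length), which is all the `-t` description asks for. `fai_to_ref_list` is a hypothetical helper.

```python
def fai_to_ref_list(fai_text):
    """Turn the text of a <ref.fa>.fai file into (name, length) pairs.

    Only the first two TAB-separated columns are used; the remaining
    columns are ignored, as the -t description allows.
    """
    refs = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        refs.append((fields[0], int(fields[1])))
    return refs
```

The order of the pairs follows the file, which matters because the `-t` file also defines the reference order used in sorting.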

tview samtools tview <in.sorted.bam> [ref.fasta]
Text alignment viewer (based on the ncurses library). In the viewer, press ‘?’ for help and press ‘g’ to check the alignment start from a region in the format like ‘chr10:10,000,000’ or ‘=10,000,000’ when viewing the same reference sequence.

This is the viewer command: it shows how the mapped reads cover the reference. Usage: samtools tview <file.bam> <reference file>

mpileup samtools mpileup [-EBug] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-M capMapQ] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]
Generate BCF or pileup for one or multiple BAM files. Alignment records are grouped by sample identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.

In the pileup format (without -u or -g), each line represents a genomic position, consisting of chromosome name, coordinate, reference base, read bases, read qualities and alignment mapping qualities. Information on match, mismatch, indel, strand, mapping quality and start and end of a read are all encoded at the read base column. At this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, a ’>’ or ’<’ for a reference skip, ‘ACGTN’ for a mismatch on the forward strand and ‘acgtn’ for a mismatch on the reverse strand. A pattern ‘\+[0-9]+[ACGTNacgtn]+’ indicates there is an insertion between this reference position and the next reference position. The length of the insertion is given by the integer in the pattern, followed by the inserted sequence. Similarly, a pattern ‘-[0-9]+[ACGTNacgtn]+’ represents a deletion from the reference. The deleted bases will be presented as ‘*’ in the following lines. Also at the read base column, a symbol ‘^’ marks the start of a read. The ASCII of the character following ‘^’ minus 33 gives the mapping quality. A symbol ‘$’ marks the end of a read segment.
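The read-base column is the densest part of this format. The Python sketch below walks one column and tallies the symbols described above; `parse_read_bases` and the keys of the returned dictionary are my own naming, and the input is assumed to be a single well-formed column.

```python
def parse_read_bases(bases):
    """Tally the symbols of one pileup read-base column."""
    out = {"match_fwd": 0, "match_rev": 0, "mismatch": {}, "ins": [], "del": [],
           "skip": 0, "deleted_here": 0, "starts": 0, "ends": 0, "mapq": []}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                      # read start; next char is MAPQ + 33
            out["starts"] += 1
            out["mapq"].append(ord(bases[i + 1]) - 33)
            i += 2
            continue
        if c in "+-":                     # indel: sign, length, then the bases
            j = i + 1
            while bases[j].isdigit():
                j += 1
            n = int(bases[i + 1:j])
            out["ins" if c == "+" else "del"].append(bases[j:j + n])
            i = j + n
            continue
        if c == "$":                      # end of a read segment
            out["ends"] += 1
        elif c == ".":                    # match, forward strand
            out["match_fwd"] += 1
        elif c == ",":                    # match, reverse strand
            out["match_rev"] += 1
        elif c in "<>":                   # reference skip
            out["skip"] += 1
        elif c == "*":                    # base deleted by an earlier '-' pattern
            out["deleted_here"] += 1
        elif c in "ACGTNacgtn":           # mismatch; case gives the strand
            out["mismatch"][c] = out["mismatch"].get(c, 0) + 1
        i += 1
    return out
```

For example, `parse_read_bases(".,^I.A+2AC$,a-1T")` counts two forward and two reverse matches, mismatches `A` and `a`, an insertion `AC`, a deletion `T`, one read end, and one read start with MAPQ 40 (ASCII `I` is 73, minus 33).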

Input Options:

-6 Assume the quality is in the Illumina 1.3+ encoding.

-A Do not skip anomalous read pairs in variant calling.

-B Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned. Computing BAQ (the default, i.e. without -B) greatly helps to reduce false SNPs caused by misalignments.

-b FILE List of input BAM files, one file per line [null]

-C INT Coefficient for downgrading mapping quality for reads containing excessive mismatches. Given a read with a phred-scaled probability q of being generated from the mapped position, the new mapping quality is about sqrt((INT-q)/INT)*INT. A zero value disables this functionality; if enabled, the recommended value for BWA is 50. [0]

-d INT At a position, read maximally INT reads per input BAM. [250]

-E Extended BAQ computation. This option helps sensitivity especially for MNPs, but may hurt specificity a little bit.

-f FILE The faidx-indexed reference file in the FASTA format. The file can be optionally compressed by razip. [null] (A reference sequence is needed here, e.g. TAIR10.fas in the example above.)

-l FILE BED or position list file containing a list of regions or sites where pileup or BCF should be generated [null]

-q INT Minimum mapping quality for an alignment to be used [0]

-Q INT Minimum base quality for a base to be considered [13]

-r STR Only generate pileup in region STR [all sites]
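The -C adjustment listed above is just a formula, and evaluating it for a few values shows how quickly the mapping quality falls as q approaches INT. This Python sketch covers the formula only, not how samtools derives q from a read's mismatches; `capped_mapq` is a hypothetical name and clamping q into [0, INT] is my own assumption.

```python
import math


def capped_mapq(q, coef=50):
    """Evaluate the -C formula: sqrt((coef - q) / coef) * coef.

    q is the Phred-scaled value described under -C; coef is the -C value
    (50 is the value the manual recommends for BWA). -C 0 disables the
    feature in samtools, so coef must be positive here. Clamping q to
    [0, coef] is an assumption that keeps the square root defined.
    """
    if coef <= 0:
        raise ValueError("coef must be positive (-C 0 disables the feature)")
    q = min(max(q, 0), coef)
    return math.sqrt((coef - q) / coef) * coef
```

With -C50: q = 0 keeps 50, q = 25 gives about 35.4, and q = 50 gives 0.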

Output Options:

-D Output per-sample read depth. (This is a switch without a value; it adds per-sample depth to the output and does not set a depth limit.)

-g Compute genotype likelihoods and output them in the binary call format (BCF).

-S Output per-sample Phred-scaled strand bias P-value

-u Similar to -g except that the output is uncompressed BCF, which is preferred for piping.

Options for Genotype Likelihood Computation (for -g or -u):

-e INT Phred-scaled gap extension sequencing error probability. Reducing INT leads to longer indels. [20]

-h INT Coefficient for modeling homopolymer errors. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as INT*s/l. [100]

-I Do not perform INDEL calling

-L INT Skip INDEL calling if the average per-sample depth is above INT. [250]

-o INT Phred-scaled gap open sequencing error probability. Reducing INT leads to more indel calls. [40]

-P STR Comma-delimited list of platforms (determined by @RG-PL) from which indel candidates are obtained. It is recommended to collect indel candidates from sequencing technologies that have low indel error rate such as ILLUMINA. [all]

reheader samtools reheader <in.header.sam> <in.bam>
Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM->SAM->BAM conversion.

cat samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]
Concatenate BAMs. The sequence dictionary of each input BAM must be identical, although this command does not check this. This command uses a similar trick to reheader which enables fast BAM concatenation.

sort samtools sort [-no] [-m maxMem] <in.bam> <out.prefix>
Sort alignments by leftmost coordinates. File <out.prefix>.bam will be created. This command may also create temporary files <out.prefix>.%d.bam when the whole alignment cannot be fitted into memory (controlled by option -m).

OPTIONS:

-o Output the final alignment to the standard output.

-n Sort by read names rather than by chromosomal coordinates

-m INT Approximately the maximum required memory. [500000000]

merge samtools merge [-nur1f] [-h inh.sam] [-R reg] <out.bam> <in1.bam> <in2.bam> [...]
Merge multiple sorted alignments. The header reference lists of all the input BAM files, and the @SQ headers of inh.sam, if any, must all refer to the same set of reference sequences. The header reference list and (unless overridden by -h) ‘@’ headers of in1.bam will be copied to out.bam, and the headers of other files will be ignored.

OPTIONS:

-1 Use zlib compression level 1 to compress the output

-f Force to overwrite the output file if present.

-h FILE Use the lines of FILE as ‘@’ headers to be copied to out.bam, replacing any header lines that would otherwise be copied from in1.bam. (FILE is actually in SAM format, though any alignment records it may contain are ignored.)

-n The input alignments are sorted by read names rather than by chromosomal coordinates

-R STR Merge files in the specified region indicated by STR [null]

-r Attach an RG tag to each alignment. The tag value is inferred from file names.

-u Uncompressed BAM output

index samtools index <aln.bam>
Index sorted alignment for fast random access. Index file <aln.bam>.bai will be created.

idxstats samtools idxstats <aln.bam>
Retrieve and print stats in the index file. The output is TAB delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads.
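Because the idxstats output is plain TAB-delimited text, totalling it takes only a few lines. This Python sketch (hypothetical helper name) returns the rows plus total mapped and unmapped counts. The output may also end with a `*` row for unplaced unmapped reads; the code simply treats it as another row.

```python
def summarize_idxstats(text):
    """Parse 'samtools idxstats' output: name, length, #mapped, #unmapped.

    Returns (rows, total_mapped, total_unmapped).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")[:4]
        rows.append((name, int(length), int(mapped), int(unmapped)))
    total_mapped = sum(r[2] for r in rows)
    total_unmapped = sum(r[3] for r in rows)
    return rows, total_mapped, total_unmapped
```

Because it only reads the index, idxstats is a quick way to check mapped-read counts per chromosome before running the slower steps.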

faidx samtools faidx <ref.fasta> [region1 [...]]
Index reference sequence in the FASTA format or extract subsequence from indexed reference sequence. If no region is specified, faidx will index the file and create <ref.fasta>.fai on the disk. If regions are specified, the subsequences will be retrieved and printed to stdout in the FASTA format. The input file can be compressed in the RAZF format.

fixmate samtools fixmate <in.nameSrt.bam> <out.bam>
Fill in mate coordinates, ISIZE and mate related flags from a name-sorted alignment.

rmdup samtools rmdup [-sS] <input.srt.bam> <out.bam>
Remove potential PCR duplicates: if multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality. In the paired-end mode, this command ONLY works with FR orientation and requires ISIZE is correctly set. It does not work for unpaired reads (e.g. two ends mapped to different chromosomes or orphan reads).

OPTIONS:

-s Remove duplicate for single-end reads. By default, the command works for paired-end reads only.

-S Treat paired-end reads as single-end reads.
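The single-end rule (-s) can be sketched in Python: group reads by position and strand and keep the one with the highest mapping quality. This is a simplified illustration, not samtools' implementation. The `Read` record and `rmdup_single_end` are hypothetical, and using (reference, position, strand) as the grouping key is my assumption about what identical coordinates means for single-end reads. The real command streams a sorted BAM.

```python
from collections import namedtuple

# Hypothetical minimal read record; pos is the coordinate used for grouping.
Read = namedtuple("Read", "name ref pos reverse mapq")


def rmdup_single_end(reads):
    """Among reads sharing (ref, pos, strand), keep only the one with the
    highest MAPQ (the first one wins on ties). Input order is preserved
    for the reads that are kept."""
    best = {}
    for r in reads:
        key = (r.ref, r.pos, r.reverse)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return [r for r in reads if best[(r.ref, r.pos, r.reverse)] is r]
```

Keeping the highest-MAPQ copy mirrors the paired-end rule described above ("only retain the pair with highest mapping quality").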

calmd samtools calmd [-EeubSr] [-C capQcoef] <aln.bam> <ref.fasta>
Generate the MD tag. If the MD tag is already present, this command will give a warning if the MD tag generated is different from the existing tag. Output SAM by default.

OPTIONS:

-A When used jointly with -r this option overwrites the original base quality.

-e Convert the read base to = if it is identical to the aligned reference base. Indel caller does not support the = bases at the moment.

-u Output uncompressed BAM

-b Output compressed BAM

-S The input is SAM with header lines

-C INT Coefficient to cap mapping quality of poorly mapped reads. See the pileup command for details. [0]

-r Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

-E Extended BAQ calculation. This option trades specificity for sensitivity, though the effect is minor.

targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>
This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and outputs a SAM with each sequence corresponding to a target. When option -f is in use, BAQ will be applied. This command is only designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].

phase samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>
Call and phase heterozygous SNPs.

OPTIONS:

-A Drop reads with ambiguous phase.

-b STR Prefix of BAM output. When this option is in use, phase-0 reads will be saved in file STR.0.bam and phase-1 reads in STR.1.bam. Phase unknown reads will be randomly allocated to one of the two files. Chimeric reads with switch errors will be saved in STR.chimeric.bam. [null]

-F Do not attempt to fix chimeric reads.

-k INT Maximum length for local phasing. [13]

-q INT Minimum Phred-scaled LOD to call a heterozygote. [40]

-Q INT Minimum base quality to be used in het calling. [13]

BCFTOOLS COMMANDS AND OPTIONS

view bcftools view [-AbFGNQSucgv] [-D seqDict] [-l listLoci] [-s listSample] [-i gapSNPratio] [-t mutRate] [-p varThres] [-P prior] [-1 nGroup1] [-d minFrac] [-U nPerm] [-X permThres] [-T trioType] in.bcf [region]
Convert between BCF and VCF, call variant candidates and estimate allele frequencies.

Input/Output Options:

-A Retain all possible alternate alleles at variant sites. By default, the view command discards unlikely alleles.

-b Output in the BCF format. The default is VCF.

-D FILE Sequence dictionary (list of chromosome names) for VCF->BCF conversion [null]

-F Indicate PL is generated by r921 or before (ordering is different).

-G Suppress all individual genotype information.

-l FILE List of sites at which information are outputted [all sites]

-N Skip sites where the REF field is not A/C/G/T

-Q Output the QCALL likelihood format

-s FILE List of samples to use. The first column in the input gives the sample names and the second gives the ploidy, which can only be 1 or 2. When the 2nd column is absent, the sample ploidy is assumed to be 2. In the output, the ordering of samples will be identical to the one in FILE. [null]

-S The input is VCF instead of BCF.

-u Uncompressed BCF output (force -b).
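The -s sample file is simple enough to check before running bcftools. This Python sketch (hypothetical `parse_sample_list`) applies the rules stated under -s: sample names in the first column, an optional ploidy of 1 or 2 that defaults to 2, and order preserved. Treating any whitespace as the column separator is an assumption.

```python
def parse_sample_list(text):
    """Parse a 'bcftools view -s' file into (sample, ploidy) pairs."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ploidy = int(fields[1]) if len(fields) > 1 else 2  # default ploidy 2
        if ploidy not in (1, 2):
            raise ValueError(f"ploidy must be 1 or 2, got {ploidy}")
        samples.append((fields[0], ploidy))
    return samples
```

For trio calling (-T), the same file must list the child first, then the father, then the mother.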

Consensus/Variant Calling Options:

-c Call variants using Bayesian inference. This option automatically invokes option -e.

-d FLOAT When -v is in use, skip loci where the fraction of samples covered by reads is below FLOAT. [0]

-e Perform max-likelihood inference only, including estimating the site allele frequency, testing Hardy-Weinberg equilibrium and testing associations with LRT.

-g Call per-sample genotypes at variant sites (force -c)

-i FLOAT Ratio of INDEL-to-SNP mutation rate [0.15]

-p FLOAT A site is considered to be a variant if P(ref|D)<FLOAT [0.5]

-P STR Prior or initial allele frequency spectrum. STR can be full, cond2, flat, or the file consisting of error output from a previous variant calling run.

-t FLOAT Scaled mutation rate for variant calling [0.001]

-T STR Enable pair/trio calling. For trio calling, option -s is usually needed to be applied to configure the trio members and their ordering. In the file supplied to the option -s, the first sample must be the child, the second the father and the third the mother. The valid values of STR are ‘pair’, ‘trioauto’, ‘trioxd’ and ‘trioxs’, where ‘pair’ calls differences between two input samples, and ‘trioxd’ (‘trioxs’) specifies that the input is from the X chromosome non-PAR regions and the child is a female (male). [null]

-v Output variant sites only (force -c). (With -v only variant sites are output; without it, every site covered by the sequencing reads is output.)

Contrast Calling and Association Test Options:

-1 INT Number of group-1 samples. This option is used for dividing the samples into two groups for contrast SNP calling or association test. When this option is in use, the following VCF INFO will be outputted: PC2, PCHI2 and QCHI2. [0]

-U INT Number of permutations for association test (effective only with -1) [0]

-X FLOAT Only perform permutations for P(chi^2)<FLOAT (effective only with -U) [0.01]

index bcftools index in.bcf
Index sorted BCF for random access.

cat bcftools cat in1.bcf [in2.bcf [...]]
Concatenate BCF files. The input files are required to be sorted and have identical samples appearing in the same order.

SAM FORMAT

Sequence Alignment/Map (SAM) format is TAB-delimited. Apart from the header lines, which are started with the ‘@’ symbol, each alignment line consists of:

Col Field Description

1 QNAME Query template/pair NAME

2 FLAG bitwise FLAG

3 RNAME Reference sequence NAME

4 POS 1-based leftmost POSition/coordinate of clipped sequence

5 MAPQ MAPping Quality (Phred-scaled)

6 CIGAR extended CIGAR string

7 MRNM Mate Reference sequence NaMe (‘=’ if same as RNAME)

8 MPOS 1-based Mate POSition

9 TLEN inferred Template LENgth (insert size)

10 SEQ query SEQuence on the same strand as the reference

11 QUAL query QUALity (ASCII-33 gives the Phred base quality)

12+ OPT variable OPTional fields in the format TAG:VTYPE:VALUE
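Putting the column table to work, here is a Python sketch that splits one alignment line into the eleven mandatory fields (named as in the table), decodes QUAL with the ASCII-33 rule, and parses TAG:VTYPE:VALUE optional fields. `parse_sam_line` is hypothetical. Treating a QUAL of `*` as missing and converting only the `i` and `f` value types come from the SAM specification rather than this manual, and header lines are assumed to have been skipped already.

```python
# Mandatory SAM columns, named as in the table above.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "MRNM", "MPOS", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "MPOS", "TLEN")


def parse_sam_line(line):
    """Split one SAM alignment line (not an '@' header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        rec[key] = int(rec[key])
    # QUAL: ASCII code minus 33 gives the Phred base quality; '*' = absent.
    rec["QUAL_PHRED"] = [] if rec["QUAL"] == "*" else [ord(c) - 33 for c in rec["QUAL"]]
    # Optional fields TAG:VTYPE:VALUE; only 'i' and 'f' are converted here.
    opt = {}
    for field in cols[11:]:
        tag, vtype, value = field.split(":", 2)
        if vtype == "i":
            opt[tag] = int(value)
        elif vtype == "f":
            opt[tag] = float(value)
        else:
            opt[tag] = value
    rec["OPT"] = opt
    return rec
```

Splitting on the first two colons only (`split(":", 2)`) keeps values that themselves contain colons intact.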

Each bit in the FLAG field is defined as:

Flag Chr Description

0x0001 p the read is paired in sequencing

0x0002 P the read is mapped in a proper pair

0x0004 u the query sequence itself is unmapped

0x0008 U the mate is unmapped

0x0010 r strand of the query (1 for reverse)

0x0020 R strand of the mate

0x0040 1 the read is the first read in a pair

0x0080 2 the read is the second read in a pair

0x0100 s the alignment is not primary

0x0200 f the read fails platform/vendor quality checks

0x0400 d the read is either a PCR or an optical duplicate

where the second column gives the string representation of the FLAG field.
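The FLAG table translates directly into bit tests. This Python sketch builds the one-letter string from the second column and mimics the `-f`/`-F` rules of `samtools view` (every `-f` bit must be set, any `-F` bit rejects the read). The function names are hypothetical; a hex value such as the ones `-f` accepts can be turned into an integer with `int("0x2", 16)`.

```python
# (bit, character, meaning) in the order of the FLAG table above.
FLAG_BITS = [
    (0x0001, "p", "paired in sequencing"),
    (0x0002, "P", "mapped in a proper pair"),
    (0x0004, "u", "query unmapped"),
    (0x0008, "U", "mate unmapped"),
    (0x0010, "r", "query on the reverse strand"),
    (0x0020, "R", "mate on the reverse strand"),
    (0x0040, "1", "first read in a pair"),
    (0x0080, "2", "second read in a pair"),
    (0x0100, "s", "not the primary alignment"),
    (0x0200, "f", "fails platform/vendor quality checks"),
    (0x0400, "d", "PCR or optical duplicate"),
]


def flag_to_string(flag):
    """String representation of FLAG, e.g. 99 -> 'pPR1'."""
    return "".join(ch for bit, ch, _ in FLAG_BITS if flag & bit)


def passes_filters(flag, req=0, skip=0):
    """Like 'samtools view -f req -F skip': all req bits set, no skip bits set."""
    return (flag & req) == req and (flag & skip) == 0
```

For example, 99 (0x63) is a paired, properly paired first read whose mate is on the reverse strand, and 147 (0x93) is its reverse-strand second read.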

VCF FORMAT

The Variant Call Format (VCF) is a TAB-delimited format in which each data line consists of the following fields:

Col Field Description

1 CHROM CHROMosome name

2 POS the left-most POSition of the variant

3 ID unique variant IDentifier

4 REF the REFerence allele

5 ALT the ALTernate allele(s), separated by comma

6 QUAL variant/reference QUALity

7 FILTER FILTers applied

8 INFO INFOrmation related to the variant, separated by semi-colon

9 FORMAT FORMAT of the genotype fields, separated by colon (optional)

10+ SAMPLE SAMPLE genotypes and per-sample information (optional)

The following table gives the INFO tags used by samtools and bcftools.

Tag Format Description

AF1 double Max-likelihood estimate of the site allele frequency (AF) of the first ALT allele

DP int Raw read depth (without quality filtering)

DP4 int[4] # high-quality reference forward bases, ref reverse, alternate for and alt rev bases

FQ int Consensus quality. Positive: sample genotypes different; negative: otherwise

MQ int Root-Mean-Square mapping quality of covering reads

PC2 int[2] Phred probability of AF in group1 samples being larger (,smaller) than in group2

PCHI2 double Posterior weighted chi^2 P-value between group1 and group2 samples

PV4 double[4] P-value for strand bias, baseQ bias, mapQ bias and tail distance bias

QCHI2 int Phred-scaled PCHI2

RP int # permutations yielding a smaller PCHI2

CLR int Phred log ratio of genotype likelihoods with and without the trio/pair constraint

UGT string Most probable genotype configuration without the trio constraint

CGT string Most probable configuration with the trio constraint
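A Python sketch for reading the INFO column: split on ';' and '=', then use DP4 to get the fraction of high-quality bases supporting the alternate allele. The helper names are hypothetical, values are left as strings, and entries without '=' are treated as flags.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; entries without '=' map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        else:
            out[item] = True
    return out


def alt_fraction_from_dp4(dp4):
    """DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse base counts."""
    rf, rr, af, ar = (int(x) for x in dp4.split(","))
    total = rf + rr + af + ar
    return (af + ar) / total if total else 0.0
```

For `DP4=5,5,4,6`, ten of twenty high-quality bases support the alternate allele, a fraction of 0.5.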

EXAMPLES

o Import SAM to BAM when @SQ lines are present in the header:
samtools view -bS aln.sam > aln.bam

If @SQ lines are absent:

samtools faidx ref.fa
samtools view -bt ref.fa.fai aln.sam > aln.bam

where ref.fa.fai is generated automatically by the faidx command.

o Attach the RG tag while merging sorted alignments:
perl -e 'print "\@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n\@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
samtools merge -rh rg.txt merged.bam ga.bam 454.bam

The value in a RG tag is determined by the file name the read is coming from. In this example, in merged.bam, reads from ga.bam will be tagged RG:Z:ga, while reads from 454.bam will be tagged RG:Z:454.
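The header file and the file-name-to-RG mapping in this example can be sketched in Python. `rg_header` writes the same two @RG lines as the Perl one-liner above. `rg_id_from_filename` is a hypothetical helper: the manual only shows ga.bam becoming ga, so dropping any directory part and the .bam suffix is my assumption about the general rule.

```python
import os


def rg_header(read_groups):
    """Build @RG header lines from dicts with ID, SM, LB and PL keys."""
    lines = []
    for rg in read_groups:
        fields = "\t".join(f"{k}:{rg[k]}" for k in ("ID", "SM", "LB", "PL"))
        lines.append("@RG\t" + fields)
    return "\n".join(lines) + "\n"


def rg_id_from_filename(path):
    """Assumed rule: strip the directory and a trailing '.bam' (ga.bam -> ga)."""
    base = os.path.basename(path)
    return base[:-4] if base.endswith(".bam") else base
```

The IDs in rg.txt must match the names derived from the input files (ga and 454 here) for the tags to line up with the header.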

o Call SNPs and short INDELs for one diploid individual:
samtools mpileup -ugf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf

The -D option of varFilter controls the maximum read depth, which should be adjusted to about twice the average read depth. One may consider adding -C50 to mpileup if mapping quality is overestimated for reads containing excessive mismatches. Applying this option usually helps BWA-short but may not help other mappers.

o Generate the consensus sequence for one diploid individual:
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

o Call somatic mutations from a pair of samples:
samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT pair - > var.bcf

In the output INFO field, CLR gives the Phred-log ratio between the likelihood by treating the two samples independently, and the likelihood by requiring the genotype to be identical. This CLR is effectively a score measuring the confidence of somatic calls. The higher the better.

o Call de novo and somatic mutations from a family trio:
samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT trioauto -s samples.txt - > var.bcf

File samples.txt should consist of three lines specifying the member and order of samples (in the order of child-father-mother). Similarly, CLR gives the Phred-log likelihood ratio with and without the trio constraint. UGT shows the most likely genotype configuration without the trio constraint, and CGT gives the most likely genotype configuration satisfying the trio constraint.

o Phase one individual:
samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

The calmd command is used to reduce false heterozygotes around INDELs.

o Call SNPs and short indels for multiple diploid individuals:
samtools mpileup -P ILLUMINA -ugf ref.fa *.bam | bcftools view -bcvg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D 2000 > var.flt.vcf

Individuals are identified from the SM tags in the @RG header lines. Individuals can be pooled in one alignment file; one individual can also be separated into multiple files. The -P option specifies that indel candidates should be collected only from read groups with the @RG-PL tag set to ILLUMINA. Collecting indel candidates from reads sequenced by an indel-prone technology may affect the performance of indel calling.

o Derive the allele frequency spectrum (AFS) on a list of sites from multiple individuals:
samtools mpileup -Igf ref.fa *.bam > all.bcf
bcftools view -bl sites.list all.bcf > sites.bcf
bcftools view -cGP cond2 sites.bcf > /dev/null 2> sites.1.afs
bcftools view -cGP sites.1.afs sites.bcf > /dev/null 2> sites.2.afs
bcftools view -cGP sites.2.afs sites.bcf > /dev/null 2> sites.3.afs
......

where sites.list contains the list of sites with each line consisting of the reference sequence name and position. The following bcftools commands estimate AFS by EM.
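The iteration above is easier to generate than to type. This Python sketch only builds the command strings (it does not run them): round 1 uses the `cond2` prior, and each later round feeds in the `.afs` file written by the previous round. The function name and the fixed number of rounds are assumptions; the manual does not say how many EM rounds are enough.

```python
def afs_em_commands(sites_bcf="sites.bcf", rounds=3):
    """Build the iterated 'bcftools view -cGP' commands of the AFS example."""
    cmds = []
    prior = "cond2"                       # starting prior for round 1
    for i in range(1, rounds + 1):
        out = f"sites.{i}.afs"            # the AFS is written to stderr
        cmds.append(f"bcftools view -cGP {prior} {sites_bcf} > /dev/null 2> {out}")
        prior = out                       # next round starts from this AFS
    return cmds
```

With the defaults, the three strings match the three bcftools commands listed in the example.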

o Dump BAQ applied alignment for other SNP callers:
samtools calmd -bAr aln.bam ref.fa > aln.baq.bam

It adds and corrects the NM and MD tags at the same time. The calmd command also comes with the -C option, the same as the one in pileup and mpileup. Apply if it helps.

LIMITATIONS

o Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.

o Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan reads or ends mapped to different chromosomes). If this is a concern, please use Picard’s MarkDuplicates, which correctly handles these cases, although a little slower.

AUTHOR

Heng Li from the Sanger Institute wrote the C version of samtools. Bob Handsaker from the Broad Institute implemented the BGZF library and Jue Ruan from Beijing Genomics Institute wrote the RAZF library. John Marshall and Petr Danecek contribute to the source code and various people from the 1000 Genomes Project have contributed to the SAM format specification.

Android开发中，ADB server didn't ACK 解决方法一炮送你回车库 Android开发
首先通知：凡是安装360、豌豆荚、腾讯管家的全部卸载，然后再尝试。一直没搞明白这个问题咋出现的，但今天看到一个方法，搞定了！原来是豌豆荚占用了 5037 端口导致。参见原文章：一个豌豆荚引发的血案——关于ADB server didn't ACK的问题简单来讲，首先将Windows任务进程中的豌豆荚干掉，如果还是不行，再继续按下列步骤排查。 &nb
canvas中的像素绘制问题换个号韩国红果果 JavaScript canvas
pixl的绘制，1.如果绘制点正处于相邻像素交叉线，绘制x像素的线宽，则从交叉线分别向前向后绘制x/2个像素，如果x/2是整数，则刚好填满x个像素，如果是小数，则先把整数格填满，再去绘制剩下的小数部分，绘制时，是将小数部分的颜色用来除以一个像素的宽度，颜色会变淡。所以要用整数坐标来画的话（即绘制点正处于相邻像素交叉线时），线宽必须是2的整数倍。否则会出现不饱满的像素。 2.如果绘制点为一个像素的
编码乱码问题灵静志远 java jvm jsp 编码
1、JVM中单个字符占用的字节长度跟编码方式有关，而默认编码方式又跟平台是一一对应的或说平台决定了默认字符编码方式；2、对于单个字符：ISO-8859-1单字节编码，GBK双字节编码，UTF-8三字节编码；因此中文平台(中文平台默认字符集编码GBK)下一个中文字符占2个字节，而英文平台(英文平台默认字符集编码Cp1252(类似于ISO-8859-1))。 3、getBytes()、getByte
java 求几个月后的日期 darkranger calendar getinstance
Date plandate = planDate.toDate(); SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd"); Calendar cal = Calendar.getInstance(); cal.setTime(plandate); // 取得三个月后时间 cal.add(Calendar.M
数据库设计的三大范式（通俗易懂） aijuans 数据库复习
关系数据库中的关系必须满足一定的要求。满足不同程度要求的为不同范式。数据库的设计范式是数据库设计所需要满足的规范。只有理解数据库的设计范式，才能设计出高效率、优雅的数据库，否则可能会设计出错误的数据库. 目前，主要有六种范式：第一范式、第二范式、第三范式、BC范式、第四范式和第五范式。满足最低要求的叫第一范式，简称1NF。在第一范式基础上进一步满足一些要求的为第二范式，简称2NF。其余依此类推。
想学工作流怎么入手 atongyeye jbpm
工作流在工作中变得越来越重要，很多朋友想学工作流却不知如何入手。很多朋友习惯性的这看一点，那了解一点，既不系统，也容易半途而废。好比学武功，最好的办法是有一本武功秘籍。研究明白，则犹如打通任督二脉。系统学习工作流，很重要的一本书《JBPM工作流开发指南》。本人苦苦学习两个月，基本上可以解决大部分流程问题。整理一下学习思路，有兴趣的朋友可以参考下。 1 首先要
Context和SQLiteOpenHelper创建数据库百合不是茶 android Context创建数据库
一直以为安卓数据库的创建就是使用SQLiteOpenHelper创建,但是最近在android的一本书上看到了Context也可以创建数据库,下面我们一起分析这两种方式创建数据库的方式和区别,重点在SQLiteOpenHelper 一:SQLiteOpenHelper创建数据库: 1,SQLi
浅谈group by和distinct bijian1013 oracle 数据库 group by distinct
group by和distinct只了去重意义一样，但是group by应用范围更广泛些，如分组汇总或者从聚合函数里筛选数据等。譬如：统计每id数并且只显示数大于3 select id ,count(id) from ta
vi opertion 征客丶 mac opration vi
进入 command mode （命令行模式）按 esc 键再按 shift + 冒号注：以下命令中带 $ 【在命令行模式下进行】，不带 $ 【在非命令行模式下进行】一、文件操作 1.1、强制退出不保存 $ q! 1.2、保存 $ w 1.3、保存并退出 $ wq 1.4、刷新或重新加载已打开的文件 $ e 二、光标移动 2.1、跳到指定行数字
【Spark十四】深入Spark RDD第三部分RDD基本API bit1129 spark
对于K/V类型的RDD,如下操作是什么含义？ val rdd = sc.parallelize(List(("A",3),("C",6),("A",1),("B",5)) rdd.reduceByKey(_+_).collect reduceByKey在这里的操作，是把
java类加载机制 BlueSkator java 虚拟机
java类加载机制 1.java类加载器的树状结构引导类加载器 ^ | 扩展类加载器 ^ | 系统类加载器 java使用代理模式来完成类加载，java的类加载器也有类似于继承的关系，引导类是最顶层的加载器，它是所有类的根加载器，它负责加载java核心库。当一个类加载器接到装载类到虚拟机的请求时，通常会代理给父类加载器，若已经是根加载器了，就自己完成加载。虚拟机区分一个Cla
动态添加文本框 BreakingBad 文本框
<script> var num=1; function AddInput() { var str=""; str+="<input
读《研磨设计模式》-代码笔记-单例模式 bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ public class Singleton { } /* * 懒汉模式。注意，getInstance如果在多线程环境中调用，需要加上synchronized，否则存在线程不安全问题 */ class LazySingleton
iOS应用打包发布常见问题 chenhbc ios iOS发布 iOS上传 iOS打包
这个月公司安排我一个人做iOS客户端开发，由于急着用，我先发布一个版本，由于第一次发布iOS应用，期间出了不少问题，记录于此。 1、使用Application Loader 发布时报错：Communication error.please use diagnostic mode to check connectivity.you need to have outbound acc
工作流复杂拓扑结构处理新思路 comsci 设计模式工作算法企业应用 OO
我们走的设计路线和国外的产品不太一样，不一样在哪里呢？国外的流程的设计思路是通过事先定义一整套规则(类似XPDL)来约束和控制流程图的复杂度(我对国外的产品了解不够多，仅仅是在有限的了解程度上面提出这样的看法)，从而避免在流程引擎中处理这些复杂的图的问题，而我们却没有通过事先定义这样的复杂的规则来约束和降低用户自定义流程图的灵活性，这样一来，在引擎和流程流转控制这一个层面就会遇到很
oracle 11g新特性Flashback data archive daizj oracle
1. 什么是flashback data archive Flashback data archive是oracle 11g中引入的一个新特性。Flashback archive是一个新的数据库对象，用于存储一个或多表的历史数据。Flashback archive是一个逻辑对象，概念上类似于表空间。实际上flashback archive可以看作是存储一个或多个表的所有事务变化的逻辑空间。
多叉树:2-3-4树 dieslrae 树
平衡树多叉树,每个节点最多有4个子节点和3个数据项,2,3,4的含义是指一个节点可能含有的子节点的个数,效率比红黑树稍差.一般不允许出现重复关键字值.2-3-4树有以下特征: 1、有一个数据项的节点总是有2个子节点(称为2-节点) 2、有两个数据项的节点总是有3个子节点(称为3-节
C语言学习七动态分配 malloc的使用 dcj3sjt126com c language malloc
/* 2013年3月15日15:16:24 malloc 就memory(内存) allocate(分配)的缩写本程序没有实际含义，只是理解使用 */ # include <stdio.h> # include <malloc.h> int main(void) { int i = 5; //分配了4个字节静态分配 int * p
Objective-C编码规范[译] dcj3sjt126com 代码规范
原文链接 : The official raywenderlich.com Objective-C style guide 原文作者 : raywenderlich.com Team 译文出自 : raywenderlich.com Objective-C编码规范译者 : Sam Lau
0.性能优化-目录 frank1234 性能优化
从今天开始笔者陆续发表一些性能测试相关的文章，主要是对自己前段时间学习的总结，由于水平有限，性能测试领域很深，本人理解的也比较浅，欢迎各位大咖批评指正。主要内容包括：一、性能测试指标吞吐量、TPS、响应时间、负载、可扩展性、PV、思考时间 http://frank1234.iteye.com/blog/2180305 二、性能测试策略生产环境相同基准测试预热等 htt
Java父类取得子类传递的泛型参数Class类型 happyqing java 泛型父类子类 Class
import java.lang.reflect.ParameterizedType; import java.lang.reflect.Type; import org.junit.Test; abstract class BaseDao<T> { public void getType() { //Class<E> clazz =
跟我学SpringMVC目录汇总贴、PDF下载、源码下载 jinnianshilongnian springMVC
----广告-------------------------------------------------------------- 网站核心商详页开发掌握Java技术，掌握并发/异步工具使用，熟悉spring、ibatis框架；掌握数据库技术，表设计和索引优化，分库分表/读写分离；了解缓存技术，熟练使用如Redis/Memcached等主流技术；了解Ngin
the HTTP rewrite module requires the PCRE library 流浪鱼 rewrite
./configure: error: the HTTP rewrite module requires the PCRE library. 模块依赖性Nginx需要依赖下面3个包 1. gzip 模块需要 zlib 库 ( 下载: http://www.zlib.net/ ) 2. rewrite 模块需要 pcre 库 ( 下载: http://www.pcre.org/ ) 3. s
第12章 Ajax（中） onestopweb Ajax
index.html <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/
Optimize query with Query Stripping in Web Intelligence blueoxygen BO
http://wiki.sdn.sap.com/wiki/display/BOBJ/Optimize+query+with+Query+Stripping+in+Web+Intelligence and a very straightfoward video http://www.sdn.sap.com/irj/scn/events?rid=/library/uuid/40ec3a0c-936
Java开发者写SQL时常犯的10个错误 tomcat_oracle java sql
1、不用PreparedStatements 　　有意思的是，在JDBC出现了许多年后的今天，这个错误依然出现在博客、论坛和邮件列表中，即便要记住和理解它是一件很简单的事。开发者不使用PreparedStatements的原因可能有如下几个：　　他们对PreparedStatements不了解　　他们认为使用PreparedStatements太慢了　　他们认为写Prepar
世纪互联与结盟有感阿尔萨斯
10月10日，世纪互联与（Foxcon）签约成立合资公司，有感。全球电子制造业巨头（全球500强企业）与世纪互联共同看好IDC、云计算等业务在中国的增长空间，双方迅速果断出手，在资本层面上达成合作，此举体现了全球电子制造业巨头对世纪互联IDC业务的欣赏与信任，另一方面反映出世纪互联目前良好的运营状况与广阔的发展前景。众所周知，精于电子产品制造（世界第一），对于世纪互联而言，能够与结盟

o Import SAM to BAM when @SQ lines are present in the header:

  samtools view -bS aln.sam > aln.bam

  If @SQ lines are absent:

  samtools faidx ref.fa
  samtools view -bt ref.fa.fai aln.sam > aln.bam

  where ref.fa.fai is generated automatically by the faidx command.
o Attach the RG tag while merging sorted alignments:

  perl -e 'print "\@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n\@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
  samtools merge -rh rg.txt merged.bam ga.bam 454.bam

  The value in an RG tag is determined by the name of the file the read comes from. In this example, in merged.bam, reads from ga.bam will be tagged RG:Z:ga, while reads from 454.bam will be tagged RG:Z:454. (The @ is escaped as \@ so that Perl does not interpolate @RG as an empty array inside the double-quoted string.)
o Call SNPs and short INDELs for one diploid individual:

  samtools mpileup -ugf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf
  bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf

  The -D option of varFilter controls the maximum read depth. Set it to about twice the average read depth. Consider adding -C50 to mpileup if mapping quality is overestimated for reads containing excessive mismatches. This option usually helps with BWA-short but may not help with other mappers.
o Generate the consensus sequence for one diploid individual:

  samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
o Call somatic mutations from a pair of samples:

  samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT pair - > var.bcf

  In the output INFO field, CLR gives the Phred-log ratio between two likelihoods. The first treats the two samples independently. The second requires their genotypes to be identical. CLR is effectively a score measuring the confidence of a somatic call: the higher, the better.
o Call de novo and somatic mutations from a family trio:

  samtools mpileup -DSuf ref.fa aln.bam | bcftools view -bvcgT pair -s samples.txt - > var.bcf

  The file samples.txt should consist of three lines naming the samples in the order child, father, mother. Similarly, CLR gives the Phred-log likelihood ratio with and without the trio constraint. UGT shows the most likely genotype configuration without the trio constraint, and CGT gives the most likely genotype configuration satisfying the trio constraint.
o Phase one individual:

  samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

  The calmd command is used to reduce false heterozygotes around INDELs.
o Call SNPs and short indels for multiple diploid individuals:

  samtools mpileup -P ILLUMINA -ugf ref.fa *.bam | bcftools view -bcvg - > var.raw.bcf
  bcftools view var.raw.bcf | vcfutils.pl varFilter -D 2000 > var.flt.vcf

  Individuals are identified from the SM tags in the @RG header lines. Several individuals can be pooled in one alignment file, and one individual can also be split across multiple files. The -P option specifies that indel candidates should be collected only from read groups whose @RG PL tag is set to ILLUMINA. Collecting indel candidates from reads sequenced by an indel-prone technology may hurt the performance of indel calling.
o Derive the allele frequency spectrum (AFS) on a list of sites from multiple individuals:

  samtools mpileup -Igf ref.fa *.bam > all.bcf
  bcftools view -bl sites.list all.bcf > sites.bcf
  bcftools view -cGP cond2 sites.bcf > /dev/null 2> sites.1.afs
  bcftools view -cGP sites.1.afs sites.bcf > /dev/null 2> sites.2.afs
  bcftools view -cGP sites.2.afs sites.bcf > /dev/null 2> sites.3.afs
  ......

  where sites.list contains the list of sites, and each line consists of the reference sequence name and position. The repeated bcftools view -cGP commands estimate the AFS by EM. Each run starts from the spectrum written by the previous run.
o Dump BAQ-applied alignments for other SNP callers:

  samtools calmd -bAr aln.bam ref.fa > aln.baq.bam

  This also adds and corrects the NM and MD tags. The calmd command also accepts the -C option, the same as the one in pileup and mpileup. Apply it if it helps.
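The merge example writes rg.txt with a Perl one-liner. Below is a minimal POSIX sh alternative that builds the same header lines from the BAM file names. With this approach, the ID equals the RG:Z: value that `samtools merge -r` attaches to reads from that file. The helper name `make_rg_header` and its FILE=PLATFORM argument style are hypothetical. Setting LB equal to ID simply copies the manual's example and is not a rule from the manual.

```shell
#!/bin/sh
# make_rg_header SAMPLE FILE=PLATFORM ...
# Hypothetical helper: prints one @RG line per BAM file. ID and LB are the
# file name without ".bam", matching the RG:Z: value merge -r derives from it.
make_rg_header() {
  sample=$1
  shift
  for spec in "$@"; do
    file=${spec%%=*}        # text before the first "="
    platform=${spec#*=}     # text after the first "="
    id=$(basename "$file" .bam)
    printf '@RG\tID:%s\tSM:%s\tLB:%s\tPL:%s\n' "$id" "$sample" "$id" "$platform"
  done
}

# Reproduces rg.txt from the merge example:
#   make_rg_header hs ga.bam=Illumina 454.bam=454 > rg.txt
```

The resulting rg.txt is passed to `samtools merge -rh rg.txt ...` exactly as in the example. Using `printf` also sidesteps the Perl `@` interpolation issue.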
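The one-individual SNP example says to set `vcfutils.pl varFilter -D` to about twice the average read depth. One rough way to get that number is to average a per-position depth column and double it, as in the sketch below. It assumes tab-separated input with reference name, position and depth in the first three columns. That is the layout `samtools depth` prints, but check it against your samtools version. `suggest_max_depth` is a hypothetical helper name.

```shell
#!/bin/sh
# suggest_max_depth: read "ref<TAB>pos<TAB>depth" lines on stdin and print
# twice the mean depth, rounded, as a candidate value for varFilter -D.
# Only positions present in the input are averaged.
suggest_max_depth() {
  awk -F'\t' '
    { sum += $3; n++ }
    END { mean = n ? sum / n : 0; printf "%d\n", 2 * mean + 0.5 }'
}
```

For example, `samtools depth aln.sorted.bam | suggest_max_depth` would print a candidate -D value. Zero-coverage positions may not be reported by default, so this averages over covered positions only and can overestimate the genome-wide mean.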
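The multi-individual and trio examples both depend on the SM tags in the @RG header lines. Before calling, it can help to list the individuals samtools will see, for instance to spell the names in the trio's samples.txt correctly. The minimal sketch below reads SAM header text, such as the output of `samtools view -H aln.bam`, and prints each distinct SM value. `list_samples` is a hypothetical helper.

```shell
#!/bin/sh
# list_samples: read SAM header text on stdin and print each distinct
# SM value found on @RG lines, one per line, sorted.
list_samples() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)   # drop the leading "SM:"
  }' | sort -u
}
```

Usage: `samtools view -H aln.bam | list_samples`. Read groups that share an SM value collapse to one individual, which matches the note that one individual may span several read groups or files.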
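In the pair and trio examples, CLR is described as a Phred-log likelihood ratio. Assuming the usual Phred scaling, CLR = 10 * log10(L1 / L0). Here L1 is the unconstrained likelihood (samples treated independently, or no trio constraint) and L0 is the constrained one. The sketch below only illustrates that arithmetic. It works from log10 likelihoods to avoid underflow. bcftools computes the real likelihoods internally, and both the helper `phred_llr` and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# phred_llr LOG10_L1 LOG10_L0
# Illustrative only: 10 * log10(L1 / L0) = 10 * (log10 L1 - log10 L0),
# rounded to an integer. Larger values favour the unconstrained model.
phred_llr() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 10 * (a - b) }'
}

# Example: L1 = 1e-3 and L0 = 1e-6 give 10 * log10(1000):
#   phred_llr -3 -6
```

A CLR of 30 therefore means the unconstrained explanation is about 1000 times more likely than the constrained one. This is why a higher CLR indicates a more confident somatic or de novo call.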
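The AFS example needs a sites.list with one site per line, giving the reference sequence name and position. If the sites already exist in a VCF such as var.flt.vcf, a minimal sketch for deriving the list is below. The tab separator and the helper name `vcf_to_sites` are assumptions. The manual only says that each line holds the reference name and position.

```shell
#!/bin/sh
# vcf_to_sites: read VCF text on stdin and print "reference<TAB>position"
# for every data record, skipping "#" header lines.
vcf_to_sites() {
  awk -F'\t' '!/^#/ && NF >= 2 { print $1 "\t" $2 }'
}
```

Usage: `vcf_to_sites < var.flt.vcf > sites.list`, then continue with `bcftools view -bl sites.list all.bcf > sites.bcf` as in the example.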
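The EM rounds in the AFS example repeat one `bcftools view -cGP` call. Each run takes as its prior the spectrum that the previous run wrote to stderr. The sketch below prints that chain for a chosen number of rounds, using the example's file names, so you can review it before running it. `afs_em_commands` is hypothetical, and the manual does not say how many rounds are enough.

```shell
#!/bin/sh
# afs_em_commands BCF N
# Print N chained bcftools calls: round 1 uses the "cond2" prior, and each
# later round uses the spectrum written by the round before it.
afs_em_commands() {
  bcf=$1
  n=$2
  prior=cond2
  i=1
  while [ "$i" -le "$n" ]; do
    printf 'bcftools view -cGP %s %s > /dev/null 2> sites.%d.afs\n' "$prior" "$bcf" "$i"
    prior="sites.$i.afs"
    i=$((i + 1))
  done
}
```

`afs_em_commands sites.bcf 3` prints exactly the three `-cGP` lines shown in the example. Piping the output to `sh` would execute them, assuming the older bcftools described in this manual, with `view -P`, is on your PATH.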