whiffen_cann

bam deal

pibase: tools for validation and comparative analysis of BAM files

pibase is an open-source package of linux command line tools for validating next-generation sequencing loci (SNPs and loci of interest where no SNPs are known) and for comparative analyses using Fisher's exact test.

Acknowledgement: The development of pibase was partly funded by the German Ministry of Education and Research (BMBF); the National Genome Research Network (NGFN); the Deutsche Forschungsgemeinschaft (DFG) Cluster of Excellence 'Inflammation at Interfaces'; and the EU Seventh Framework Programme [FP7/2007-2013, grant numbers 201418, READNA and 262055, ESGI].

Disclaimer: pibase is provided free of charge for non-commercial use, but you are required to read our disclaimer and to cite us when publishing results.

Download: pibase 1.4.7; example data (12GB); example output only (130kb)

Overview

pibase: Acronym for "get Position Information at BASE position of interest".
Interoperability: Input and output file types.
Work flows: Preparing BAM-files and using the pibase tools.
QuickStart Tutorial: Prerequisites, installation, and pibase examples using BAM-files from the 1000 Genomes project.

Essentials:
pibase_bamref: Extract information from a BAM-file and a reference sequence file and table this information into a tab-separated text file.
pibase_consensus: Infer 'best' genotypes and their 'quality' classification, and optionally merge multiple pibase_bamref files (e.g. a control panel or several runs of the same patient) into a single file.
pibase_fisherdiff: Compare two pibase_consensus files using Fisher's exact test on original data (aligned reads) rather than comparing processed data (SNP-calls or genotypes).

Annotate:
pibase_tosnpacts: Works with our unpublished annotation pipeline. Contact us if you wish us to annotate your pibase files with our annotations.
pibase_annot: Works with our unpublished annotation pipeline.
Contact us if you wish us to annotate your pibase files with our annotations.
pibase_tag: Add primer region tags to a pibase_bamref, pibase_consensus, pibase_fisherdiff, or pibase_annot file.

Phylogenetics:
pibase_to_rdf: Generate rdf file (for phylogenetic network analysis) from a set of pibase_fisherdiff files.
pibase_rdf_ref: Generate a reference sample file from a pibase_consensus file, prior to generating the set of pibase_fisherdiff files required for pibase_to_rdf.
pibase_chrm_to_crs: Extract Cambridge Reference Sequence (Anderson 1981) variants from reads mapped to chrM (hg18, hg19, or NCBI36).

Utilities:
pibase_to_vcf: Convert a single-sample pibase file into VCFv4.1 format.
pibase_c_to_contig: Convert pibase-contig-numbers into contig names (e.g. 25 -> chrM).
pibase_flagsnp: Flag non-reference genotypes in a pibase_consensus file as potential mismatches ("SNPs" in NGS-parlance).
pibase_diff: Compare two pibase_consensus files using (BestGen) genotypes and a (BestQual) quality threshold.
pibase_gen_from_snpacts: Works with our unpublished annotation pipeline. Contact us if you wish us to merge SNPs from SNP-callers, SNP-chips, or Sanger sequences into your pibase files for a genotype-comparison.
pibase_ref: Gets 500 flanking nucleotides of reference sequence around each coordinate (of the input list) and outputs a pseudo-FASTA file, for example for manual Sanger-sequencing primer design.

Interoperability

pibase reads genomic coordinates of interest from a VCF*, samtools pileup, SOLiD Bioscope gff3, or a tab-separated file.

pibase extracts data at the coordinates of interest from an indexed FASTA reference and from a BAM-file** generated by BFAST, BWA, SSAHA2, samtools, SOAP (after conversion using soap2sam.pl), and SOLiD Bioscope.
To extract the most complete information (including homologous region information and low-coverage genotypes), please use the raw unfiltered BAM-file (which includes non-uniquely mapped reads and duplicate reads).

pibase outputs tab-separated text files which can then be used in popular spreadsheet software, or filtered from the linux command line using grep, awk, and cut. pibase can also output variants into VCF, rdf, and snpActs formats.

* VCF is the variant list format accepted by the European Nucleotide Archive. (VCF files are e.g. generated by GATK and samtools.)
** BAM is the sequence format preferred by the European Nucleotide Archive.

Work flows

"Essentials" work flow

Prepare an indexed sorted BAM-file with MD tags using samtools: Create BAM file with MD tag. If you have several BAM-files for the same sample, and some BAM-files were generated from the same library: merge all BAM-files from the same library into one BAM-file (because pibase_consensus counts the number of unique start points independently per BAM-file).
    [samtools sort -o unsorted.bam > bamfile.bam]
    samtools calmd -b bamfile.bam referencefile.fasta > bamfile.md.bam
    [samtools merge outfile.bam infile1.bam infile2.bam [...]]
    samtools index bamfile.md.bam
pibase_bamref: extract position info from BAM file and reference sequence file.
pibase_consensus over single run: infer multi-filter-level genotypes from a single pibase_bamref-file and classify the genotypes into stable or dubious genotypes (BestQual flag).
pibase_consensus over multiple runs: infer multi-filter-level "consensus" genotypes from pibase_bamref-files from multiple runs and classify the genotypes into stable or dubious genotypes (BestQual flag).
[Optional: pibase_fisherdiff: compare two samples by unique start point counts (Fisher's exact test 2x4), using the pibase_consensus-files]

"Annotate" work flow

First, carry out the "Essentials" workflow, in order to get a pibase_consensus file or a pibase_fisherdiff file.
[Optional: pibase_tosnpacts: export
genotypes from pibase_consensus or pibase_fisherdiff or pibase_diff to snpact-format (snpact -ft 'own' -sp 1).]
[Optional: pibase_annot: annotate pibase_diff files with rsID, gene name, and other information]
[Optional: pibase_tag: for PCR enriched samples, tag SNPs which are in PCR primer regions +- 1 base, using pibase_consensus-files or pibase_diff-files or pibase_annot files]

"Phylogenetics" work flow

First, carry out the "Essentials" workflow, in order to get a set of pibase_consensus files.
[Optional: pibase_rdf_ref: create a reference sample from one of the pibase_consensus files.]
Create a tab-separated text file detailing the sample files in the group, and the sample names which should be displayed in the phylogenetic network (see pibase_to_rdf). The first sample in this text file can either be one of the group of samples, or the reference sample created by pibase_rdf_ref.
pibase_to_rdf: create an rdf file from a set of pibase_fisherdiff files for subsequent phylogenetic network analysis, e.g. evolutionary analysis, or sample mix-up (or confusion) analysis.
[Optional: pibase_chrm_to_crs: extract Cambridge Reference Sequence (Anderson et al., 1981) variants from reads mapped to chrM (hg18, hg19, or NCBI36)]

"Utilities" (Tools for special interests)

First, carry out the "Essentials" workflow, in order to get a pibase_consensus file or optionally a pibase_fisherdiff file.
[Optional: pibase_to_vcf: From a pibase_consensus file, create a VCF file that can e.g. be submitted to the European Nucleotide Archive or processed with VCFtools.]
[Optional: pibase_c_to_contig: Convert pibase-contig-numbering (e.g. in a pibase_consensus file or a pibase_fisherdiff file) to contig names, e.g.
25 to chrM.]
[Optional: pibase_flagsnp: Flag non-reference genotypes in a pibase_consensus file as potential mismatches ("SNPs" in NGS-parlance), using the pibase_consensus-files, pibase_fisherdiff-files or annotated versions of these files.]
[Optional: pibase_diff: compare two samples (conventionally) by genotypes, using the pibase_consensus-files. This is much less accurate than pibase_fisherdiff and only included for those users interested in a conventional comparison!]
[Optional: pibase_gen_from_snpacts: Merge SNPs from SNP-callers, SNP-chips, or Sanger sequences into pibase_consensus, pibase_fisherdiff, or annotated versions of these files. The idea is to generate a side-by-side genotype and nucleotide signals comparison table.]

QuickStart Tutorial

Prerequisites

Linux operating system (we use CentOS 5.5 / linux 2.6.18-194.32.1.el5 on our linux cluster and Ubuntu 8, 9, or 10 on our PCs.)
Python v2.4.3 or v2.6.5 or v2.7.2 (recommended) (usually already installed on linux clusters or linux PCs)
pysam v0.6 (if using other versions, test for pysam bugs using large data sets)
GNU Fortran (usually already installed on linux clusters, or installable using the Synaptic package manager on Ubuntu PCs)
1GB of RAM (2GB for pibase_fisherdiff)
Bash command line, or a linux cluster job scheduler such as PBS.

Installation

Download pibase.v1.4.7.tar.gz and unzip.
From the linux command line, navigate into the pibase directory and enter ./f.sh to compile the pibase_643 subprogram.
Finally, include the pibase directory in your linux PATH variable.

Test run

Test whether the installations were successful using our example data (12GB size!): Download the 12GB tar.gz file and unzip.
From the linux command line, navigate into the example data folder and then run the example shell script by entering
    ./pibase_test.sh
(If problems occur, check the prerequisites.)
When pibase_test.sh runs successfully, it creates a directory called output, into which the results files are written.
Next, run the phylogenetic example shell script by entering
    ./pibase_to_rdf.sh
To validate the test runs, compare directories output and output_validated from the linux command line:
    diff output output_validated

Windows users

Windows users (and impatient Linux users) can download just the small zipped output_validated folder (130kb) for a quick impression of the output files. Import these tab-separated text files into Excel for viewing, filtering, or sorting. Note that there are decimal-point and decimal-comma versions of the files, the latter of which are generated using simple linux commands in the example shell script pibase_test.sh.

Detailed examples

Please look at the web pages for the shell scripts pibase_test.sh and pibase_to_rdf.sh, which give several examples of how to start the pibase tools.

Essentials | pibase_bamref

pibase_bamref extracts information at genomic coordinates of interest from a BAM file and a reference sequence file and tables this information into a tab-separated text file. Optionally, information can be displayed on the command line. A single genomic coordinate can be specified on the command line, or a list of coordinates can be specified in a file. Note that BAM-files must include MD-tags.

Usage:
    pibase_bamref [-[v][d][p chr]] {table or poi} ref bam out LR QVmin [[[MLmin] [[MMmax] [RQVmin] [maxr]]]]

    []      optional
    -vdp    [v]erbose output into terminal, [d]etailed list of reads, at single [p]osition: chr poi
    chr:    chromosome name (or contig name)
    poi:    position of interest (chromosomal coordinate, 1-based)
    table:  file of poi's: vcf, gff3, pileup, or plain tab-file: chr TAB poi LINEBREAK
            The format is detected by pibase_bamref.
            If the plain tab-file is used, the lines must be sorted by chr-names, and the chr-names must be ordered in the sequence of the reference sequence file or BAM-file-header.
    ref:    reference sequence file
    bam:    sorted bam file, bam index file must also exist
    out:    name for (tab-separated) output file
    LR:     max length of reads (e.g. 50)
    QVmin:  optional min base quality (default: 20)
    MLmin:  optional min mapped read length (default: 49)
    MMmax:  optional max number of base space mismatches in read (default: 1)
    RQVmin: optional min read mapping quality (default: 20)
    maxr:   optional max number of reads considered at a poi (default: -1, i.e. unlimited. For ultra-high coverage files: use an integer value of e.g. 1000 as the limit).

Examples:
    pibase_bamref -vp chrM 16189 hg20.fasta na12752.bam na12752_chrM_16189.txt
    pibase_bamref list.txt hg18.fasta na12752.bam na12752_list.txt
    pibase_bamref pileupsnps.txt hg18.fasta na12752.bam na12752_pileupsnps.txt
    pibase_bamref gatksnps.vcf hg18.fasta na12752.bam na12752_gatksnps.txt

Notes:
If you are generating your own input lists with positions of interest: The genomic coordinates must be 1-based (the UCSC browser, VCF SNPlists, samtools pileup SNPlists, and Bioscope SNPlists use 1-based coordinates). SNPlists from Bioscope 1.2.1 or samtools pileup can be used without checking.
pibase_bamref speed depends on depth of coverage and read length, and on the python/pysam version. For high coverage BAM-files (50x-100x) and 50bp read lengths, pibase_bamref v1.4.5 processes about 50,000-100,000 genotypes per hour on a single core using 2GB RAM. At least python v2.7 and at least pysam v0.5 are required for this speed, otherwise pibase_bamref is up to 100x slower. As there are critical but silent bugs in other pysam versions, please do not use other pysam versions unless you test that version thoroughly (e.g.
by running our chr22 and exome example data sets and performing a diff)!
For the most complete information: give pibase the original BAM file (which should include the ambiguously mapped reads, and which can also include "duplicate" reads).

Essentials | pibase_consensus

pibase_consensus adds information to a pibase_bamref file, inferring 'best' genotypes and 'best qualities', and optionally merging read counts from multiple pibase_bamref files (e.g. a control panel or several runs of the same patient) into a single pibase_consensus file. A BestQual question mark ("?") indicates that the BestGen is unstable with respect to filtering, and therefore should not be used without further validation.

Usage:
    pibase_consensus [-t[h][v [ver]] t1 t2 t3 t4 t5 {h1 h2}] in1 [in2 ... inN] out

    []=optional  {}=only for "-th" option
    -t[h][v]  (optional) minimal thresholds for allele calling (A, C, G, or T), (optionally) followed by "h" to override h1 and h2 defaults, (optionally) followed by "v" to revert to an older version
    ver       (optional) older pibase_consensus version to revert to. To list all available older versions, specify "-tv" without ver (older version 1.4.3: arguments h1 and h2 not available)
    t1        min fraction of reads indicating allele (default: 2.2%, i.e. 0.022)
    t2        min number of reads indicating allele (default: 8)
    t3        min fraction of unique start points indicating allele (default: 0.04)
    t4        min number of unique start points indicating allele (default: 4)
    t5        both-stranded confirmation only if, for at least one filter level, min-strand-counts >= t5 * max-strand-counts (default t5: 0.2)
    h1        min fraction of unique start points that need to be different between filters F2-F3 or F3-F4 in order to flag a hypervariable or homologous locus, respectively (default: 2.2%, i.e. 0.022)
    h2        min number of unique start points difference between F2-F3 or F3-F4 (respectively) in order to flag a hypervariable or
              homologous locus (default: 2)
    inX       input file[s] = pibase_bamref output file[s]; must have same reference sequence (not checked!) and same positions of interest
    out       name for (tab-separated) output file

Examples:
    # Single file
    pibase_consensus na12752_pileupsnps.txt genotypes_na12752_pileupsnps.txt
    # Two files
    pibase_consensus na12752_1.txt na12752_2.txt gen_na12752_1to2.txt
    # Decreased sensitivity to sequencing or mapping errors or contamination
    pibase_consensus -t 0.1 8 0.15 8 0.01 na12752_pileupsnps.txt genotypes_na12752_pileupsnps.txt

Notes:
What pibase_consensus does:
Reads the pibase_bamref file and adds further columns of information. For each genomic coordinate of interest: 10 rule-based genotype decisions and one "BestGen" (best genotype) from these 10 genotypes, strand support for the A and B alleles of the "BestGen", and summary allele counts for each of the 10 genotypes.
If multiple pibase_bamref files are specified, consensus genotypes are computed for pooled allele counts over all files (e.g. for a library which was sequenced in multiple runs or multiple lanes). The output file format is identical to the single-file output except that all specified input files are listed in the file header.
Low quality genotypes are annotated with BestQual tags ?1 to ?8, where ?1 denotes only slightly reduced quality, and genotypes with higher values should be rejected or inspected further. If there is no tag, the genotype is assumed to be validated by all 10 filter methods at a minimal coverage of t2 (default: 8) per allele, of which there are at least t4 (default: 4) unique starting points per allele. The quality grading aims to help you understand the underlying mechanisms of each problem.
Both-strandedness confirmation must be applied using the additional tags A+- and B+-.
(Because some enrichment methods are strand-biased and occasional failure of both-stranded support is common in next-generation sequencing, we did not bundle the strandedness criterion into the BestQual tag.)
For slightly higher specificity, the SNVs near indels can be filtered using the additional tag Ign.
Regions of simple repeats can be filtered using the Class tag (>1), regions of segmental duplications can be filtered using the Homologous tag ("H"), and hypervariable regions can be filtered using the Hypervariable tag ("V").

Best Quality tag:
?1: Slightly poor genotype quality. Mapping stringency versus reference sequence context class looks ok. Not all 10 genotyping filter stages lead to the same genotype. However, for the high mapping stringency filter stages, at least t4 (default: 4) unique start points and at least t2 (default: 8) reads support this genotype.
?2: Somewhat poor genotype quality. Mapping stringency versus reference sequence context class looks ok. This genotype is supported by fewer than 5 filter stages, but by at least 2 filter stages, of which one stage is in the unique start points category, and the other stage is in the coverage category.
?3: Really poor quality. Reference sequence context looks difficult (homopolymeric run>4, or STRs) and mapping stringency was low. But at least one stringent filter supports this genotype.
?4: Very poor quality. Reference sequence context looks difficult (homopolymeric run>4, or STRs) and mapping stringency was low. But at least one of the unique-start-point filters supports this genotype.
?5: Highly problematic quality. The best unique-start-point derived genotype is in conflict with the best coverage-derived genotype.
?6: Highly problematic quality. The best unique-start-point derived genotype is in conflict with the best coverage-derived genotype, and the best coverage-derived genotype is "senior" to the best usp-derived genotype.
?7: Low-coverage guess.
The coverage is below t2 (default: 8) (can be as low as 1).
?8: Low-coverage guess. The coverage is below t2 (default: 8) (can be as low as 1). Reference sequence context looks difficult (homopolymeric run>4, or STRs), and there are no stringently mappable reads.

Poor quality genotypes and single-strand genotypes can be filtered using linux commands as follows:
    # copy header lines into new file:
    grep '^#' pibase_consensus_file > validated_pibase_consensus_file
    # append filtered table lines (indel-reads < 3, tags A+- and B+- and BestQual) to the new file:
    awk 'BEGIN {FS="\t"};($7<3)&&($10=="+-")&&($11=="+-")' pibase_consensus_file | grep -v '?' >> validated_pibase_consensus_file

Potentially problematic regions can be filtered or counted using linux commands as follows:
    # count number of genotypes in homologous loci:
    grep -v '^#' pibase_consensus_file | grep 'H' | wc -l
    # count number of genotypes in hypervariable loci:
    grep -v '^#' pibase_consensus_file | grep 'V' | wc -l
    # count number of genotypes near indels:
    awk 'BEGIN {FS="\t"};($7>0)' pibase_consensus_file | wc -l

Essentials | pibase_fisherdiff

pibase_fisherdiff merges two pibase_consensus files and tags differences between these two files (e.g. control sample and case sample) based on Fisher's exact test for five different filter levels. (See example.)

Usage:
    pibase_fisherdiff control case out [[cov] [[p] [fac]]]

    control: pibase_consensus file of control (healthy) sample
    case:    pibase_consensus file of case (diseased) sample
    out:     output file
    []       optional
    [cov]:   min coverage threshold (optional. Default: 100)
    [p]:     max median p-value threshold (optional. Default: 0.01)
    [fac]:   max factor (optional.
             Default: 3.5).

Example:
    # min coverage 50, p <= 0.01, fac <= 10
    pibase_fisherdiff normal.txt tumor.txt diff_out.txt 50 0.01 10

Notes:
Eight leading columns are added to the pibase_consensus output, and the two samples are merged into one file which can then be filtered using grep or awk or python/perl/java/etc. scripts, or imported into Excel by Windows users and viewed/filtered further in Excel:
In the leading column, significant similarity at a genomic coordinate is denoted by "=-" (control sample) or "=+" (case sample).
Significant difference at a genomic coordinate is denoted by "-" (control) and "+" (case).
Low coverage in one or both samples is denoted by "?-" and "?+".
For those who want to understand the underlying mechanism: The control and case samples are computed to be significantly different at a genomic coordinate if the median p-value <= p and the coverage in each sample >= cov and the factor <= fac. The factor is defined as follows: If the major alleles are identical in case and control, major allele counts / minor allele counts must be <= fac; if the major alleles are different, the factor criterion is not used. The coverage and factor criteria must be applied because the p-value changes considerably at coverages of about 50 or lower when just 1 read is shifted from the major allele to the minor allele (i.e. when a normal sequencing error occurs). In other words, for special cases the p-value alone is not a sufficiently accurate indicator of similarity or difference.
In the second column, the tag D_error indicates whether the (default or user-specified) thresholds for the applicability of Fisher's exact test have been violated.
In detail, if the coverage falls below the threshold, D_error at the genomic coordinate displays an '!' and bit 0 (control) or 1 (case) is set. If the major/minor allele counts threshold is exceeded, D_error is output with an '!' and bit 2 (control) or 3 (case) is set.
For example '!1' indicates low coverage in the control sample, and '!12' indicates the major/minor allele threshold violation in both samples.
Fisher's exact test p-values (two-tailed) are given in column 3 (median p-value) and columns 4-8 (for each of the 5 filter levels, computed for ACGT counts of unique start points).
Both-strand-support testing is not included in the first 3 columns, so explicit filtering using the A+- and B+- tags should be performed where appropriate; see pibase_consensus filtering.
Mathematical detail: The p-values are computed with ACM algorithm 643 (http://portal.acm.org/citation.cfm?id=214326) for a 2x4 matrix using a hybrid approximation. This network method overcomes numeric overflow problems associated with online web calculators.
Compute speed: about 20,000-50,000 genomic coordinates per hour on a single core using 2GB RAM.
To filter differences a posteriori more stringently than in the original pibase_fisherdiff run, the linux awk command can be used. Single-strand genotypes and more stringent p-values can be filtered using linux commands as follows:
    # copy header lines (i.e. starting with '#') into new file:
    grep '^#' pibase_fisherdiff_file > filtered_file
    # append filtered table lines to the new file (p-value <= 0.005, and tags A+- and B+-):
    awk 'BEGIN {FS="\t"};($3<=0.005)&&($18=="+-")&&($19=="+-")' pibase_fisherdiff_file >> filtered_file
    # count differences:
    grep '^+' filtered_file | wc -l

Annotate | pibase_tosnpacts

pibase_tosnpacts reads the "BestGen" genotypes from a pibase_consensus file or a pibase_fisherdiff file and creates a file (or file pair, if pibase_fisherdiff) for import into snpacts.
After annotation within snpacts, the snpacts-annotations are exported and added to the pibase_consensus or pibase_fisherdiff file using pibase_annot.

Usage:
    pibase_tosnpacts in out1 [out2]

    in:   file from pibase_consensus or pibase_fisherdiff
    out1: output file (for input into snpacts, in 'own' format)
    out2: 2nd output file (case sample) if pibase_fisherdiff

Examples:
    # pibase_consensus file:
    pibase_tosnpacts geno_na12752.txt for_snpacts_na12752.txt
    # pibase_fisherdiff file:
    pibase_tosnpacts diff.txt for_snpacts_control.txt for_snpacts_case.txt

Notes:
How to import the pibase_tosnpacts file into snpacts:
    snpact -ft own -sp 1 -pchr 0 -psnp 1 -prb 2 -pbb 3 -psb 4 -pmt 5 -all -hg hg18 -t tablename filename
Remove problematic EOLs from BLOBs in annotated table:
    tableupdate -eol tablename
Export the annotated, de-BLOBbed table to csv:
    outputact -t tablename -based 1 -csv -worst -hg hg18 inputfilename outputfilename
where inputfilename is the file from pibase_tosnpacts, and outputfilename is the file for pibase_annot.

Annotate | pibase_annot

pibase_annot annotates a pibase_consensus or pibase_fisherdiff file with SNP information from an exported snpacts table (see above section on pibase_tosnpacts).

Usage:
    pibase_annot type an1 [an2] in out

    type: 1=pibase_consensus; 2=pibase_fisherdiff
    an1:  snpacts-annotations file 1 (control sample, created by outputact)
    an2:  snpacts-annotations file 2 (case sample, created by outputact); an2 is not specified for pibase_consensus (type=1)
    in:   tab-separated file from pibase_consensus or pibase_fisherdiff
    out:  results file with merging of annotations and infile

Examples:
    # pibase_consensus file:
    pibase_annot 1 annot.txt geno_na12752.txt anno_geno_na12752.txt
    # pibase_fisherdiff file:
    pibase_annot 2 anno_control.txt anno_tumor.txt diff.txt anno_diff.txt

Annotate | pibase_tag

pibase_tag adds primer region tags to a pibase_consensus, pibase_fisherdiff or pibase_annot file.
The pibase file is annotated using information from a primer region file.

Usage:
    pibase_tag typ pr in chr pos out [tag]

    typ: type of input file (always enter 0)
    pr:  primer region file, tab-separated: chr start end strand (int, 1-based)
    in:  tab-separated file, e.g. from pibase_diff or pibase_annot
    chr: column number (1-based) in in-file with chromosome number
    pos: column number (1-based) in in-file with position
    out: output file ( = tagged input file)
    tag: (optional, default=column 2) column number, at which the 4 primer tag columns are inserted.

Example:
    # pibase_annot file:
    pibase_tag 0 primer_regions.txt pibase_annot.txt 4 5 pibase_tag_annot.txt 2

Notes:
The primer region file must have a special table header (see ## in example below) and is tab-separated: chromosome, start, end, strand (integer, 1-based).
    #dummy test file for pibase_tag
    ##chr	start	end	strand
    1	152644520	152644530	1
    1	152644740	152644750	-1

Phylogenetics | pibase_to_rdf

pibase_to_rdf generates an rdf file (for phylogenetic network analysis) from a set of pibase_fisherdiff files.

Usage:
    pibase_to_rdf list rdf [[pmax] [both] [elim]]

    list:   text file which lists the set of pibase_fisherdiff files: file_name TAB sample_name EOL
    rdf:    name of output file (for analysis using Network)
    []:     optional arguments:
    [pmax]: max p-value to consider (default=0.01)
    [both]: both-stranded confirmation (y/n, default=y)
    [elim]: eliminate "N"-columns and invariable columns (y/n, default=n)

Example: See pibase_to_rdf.sh

Note:
The example above refers to our pibase paper. Please consult this paper if you are unfamiliar with phylogenetic network analyses.

Phylogenetics | pibase_rdf_ref

pibase_rdf_ref generates a reference-base pibase_consensus file which may be used as the reference sample for pibase_fisherdiff comparisons in a group of samples, e.g.
for a pibase_to_rdf run and subsequent phylogenetic network analysis.

Usage:
    pibase_rdf_ref in out [cov]

    in:  pibase_consensus file (any sample from the group of samples)
    out: pibase_consensus file with genotypes taken from Ref column of in-file
    cov: coverage (optional. default=50. Divisible by two.)

Example: See pibase_to_rdf.sh

Note:
The example above refers to our pibase paper. Please consult this paper if you are unfamiliar with phylogenetic network analyses.

Phylogenetics | pibase_chrm_to_crs

pibase_chrm_to_crs extracts Cambridge Reference Sequence (CRS)* variants from chrM** alignments. The CRS numbering is identical to the numbering of the revised CRS (rCRS)***. The reference sequence bases within the mtDNA control region are identical for the CRS and the rCRS.
(In hg18, hg19, and NCBI36, chrM is used as the mitochondrial reference genome; in NCBI build 37 rCRS is used and in hg20, rCRS will be used; most mtDNA databases are based on the CRS/rCRS genomic coordinates.)

Usage:
    pibase_chrm_to_crs in out own pro sam [[t] [min]]

    in:    pibase_consensus genotype file (MUST be chrM ONLY!)
    out:   base name of output files. The following files are output:
             out_f4maj.txt: major variants for highest-stringency filter (f4)
             out_f4min.txt: minor variants for highest-stringency filter (f4)
             out_f0maj.txt: major variants for lowest-stringency filter (f0)
             out_f0min.txt: minor variants for lowest-stringency filter (f0)
             out_all.txt: complete pibase_consensus file in CRS notation
             out_sum.txt: summary file in CRS notation
    own:   abbreviation of owner/Principal Investigator of the project
    pro:   for summary: project abbreviation
    sam:   for summary: sample abbreviation
    []:    optional arguments:
    [t]:   threshold for minor allele cut-off (default: 0.015, i.e. 1.5% of total read counts at a given coordinate)
    [min]: min number of read counts required for an allele-call (default: 2)

Example:
    pibase_chrm_to_crs pat123_chrM.txt pat123_chrs.txt FRA fro Pat123

Note:
The pibase_consensus file must contain only chrM lines.
If there are other lines in the file, the header lines and the chrM lines must be extracted before using pibase_chrm_to_crs:
    # copy header lines into new file:
    grep '^#' pibase_consensus_file > chrM_pibase_consensus_file
    # append lines with "chromosome 25" (i.e. chrM) to the new file:
    grep '^25' pibase_consensus_file >> chrM_pibase_consensus_file

References:
* Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. (1981) Sequence and organization of the human mitochondrial genome. Nature, 290, 457-465.
** Ingman, M., Kaessmann, H., Pääbo, S., Gyllensten, U. (2000) Mitochondrial genome variation and the origin of modern humans. Nature, 408, 708-713.
*** Andrews, R.M., Kubacka, I., Chinnery, P.F., Lightowlers, R.N., Turnbull, D.M., Howell, N. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet., 23, 147.

Utilities | pibase_to_vcf

pibase_to_vcf converts a single-sample pibase file into VCFv4.1 format.

Usage:
    pibase_to_vcf [-h] [-v] [-d] [-u s s s s s s s s s s] in out nam bam

    in:  tab-separated file (pibase_consensus)
    out: VCF file
    nam: name of sample in VCF file (column header)
    bam: name of bam file from which in file was generated

Example:
    pibase_to_vcf in.txt out.txt mysample mysample.bam

Utilities | pibase_c_to_contig

pibase_c_to_contig converts pibase-contig-numbers into contig names (e.g. 25 -> chrM).

Usage:
    pibase_c_to_contig in [in2] out

    in:  file to be converted (from pibase_consensus, pibase_fisherdiff, ...)
         Note: pibase_fisherdiff has a shortened file head, so a second input file with the conversion table must be given.
    in2: file with contig numbering/name table in header (required, if the first input file is a pibase_fisherdiff file)
    out: output file

Examples:
    pibase_c_to_contig gen.txt gen_contigs.txt
    pibase_c_to_contig diff.txt table.txt diff_contigs.txt

Utilities | pibase_flagsnp

pibase_flagsnp flags a BestGen genotype as being an "NGS SNP" (i.e. a mismatch) if it is different from the reference base. Three columns are inserted before the "BestGen" column: "IsSnp" (1/0 flag), "SnpReads" (non-reference-allele coverage at filter level 0), and "CovAll" (coverage at filter level 0, consisting of all reads indicating A, C, G, or T but not N).
Purpose: to sort the pibase_consensus "BestGen" genotypes into Non-Reference (i.e. SNP) and Reference.

Usage:
    pibase_flagsnp in out

    in:  tab-separated file (pibase_consensus, pibase_fisherdiff, or pibase_diff)
    out: results file with merging of annotations and infile

Example:
    pibase_flagsnp gen_id100.txt snp_gen_id100.txt

Note: Outside the NGS world, "SNPs" include reference genotypes which are polymorphisms occurring in 1% or more of a population. This is often a matter of confusion between NGS and non-NGS specialists.

Utilities | pibase_diff

pibase_diff compares two pibase_consensus files using (BestGen) genotypes and a (BestQual) quality threshold. (Accuracy can be much lower than pibase_fisherdiff, i.e. up to 10x or more false positives and false negatives, but compute time is fast.)

Usage:
    pibase_diff control case out worst

    control: pibase_consensus file of control (healthy) sample
    case:    pibase_consensus file of case (diseased) sample
    out:     output file
    worst:   worst BestQual level to consider (0-8)

Examples:
    # Consider BestGens up to a worst BestQual of ?1 (i.e.
ignore ?2 to ?8):pibase_diff normal.txt tumor.txt diff_out.txt 1Notes:One leading column is added to the pibase_consensus output, and the two samples are merged into one file which can then be filtered using grep or python/perl/java/etc scripts, or imported into Excel by Windows users and viewed/filtered further in Excel:In the leading column, identical BestGen genotypes are denoted by "=-" (control sample) or "=+" (case sample).Different BestGen genotypes at a genomic coordinate are denoted by "-" (control) and "+" (case).Differences and single-strand genotypes can be filtered using linux commands as follows:# copy header lines into new file:grep '^#' pibase_diff_file > filtered_diff_file# append filtered lines to the new file (filtering by first column and tags A+- and B+-):awk 'BEGIN {FS="\t"};($11=="+-")&&($12=="+-")' pibase_diff_file | grep -v '^=' >> filtered_diff_file# count differences:grep '^+' filtered_diff_file | wc -lUtilities | pibase_gen_from_snpactspibase_gen_from_snpacts inserts snpacts genotypes into a pibase file before the "BestGen" column. Multiple runs of this script can be performed to insert genotypes from several different sources (e.g. samtools, Bioscope, GATK). Only chromosomes chr1-chr22, chrX, chrY are recognised in snpacts files.To compare genotypes from different sources with signals and genotypes from pibase_consensus.Usage:pibase_gen_from_snpacts typ s1 [s2] in out hdr typ: type of input file (1=pibase_consensus or pibase_gen_from_snpacts) s1: snpacts file1, created by outputact s2: snpacts file2, (not specified for typ=1) in: tab-separated file, from pibase_consensus or pibase_gen_from_snpacts out: results file with merging of annotations and infile hdr: header of inserted snpacts-column (e.g. 
samtools, Bioscope, GATK)Example:# add samtools SNPs into pibase_consensus file:pibase_gen_from_snpacts 1 id100_samtools.txt gen_id100.txt sam_gen_id100.txt samtools# add Sanger SNPs into samtools-pibase_consensus file:pibase_gen_from_snpacts 1 id100_sanger.txt sam_gen_id100.txt san_sam_gen_id100.txt sangerUtilities | pibase_refpibase_ref gets 500 flanking nucleotides of reference sequence around each coordinate (of the input list) and outputs a pseudo-FASTA file.Primary use: manual Sanger-sequencing primer design.Usage:pibase_ref in ref out in: file with list of genomic coordinates (e.g. a VCF file): chromosome_name TAB 1_based_coordinate_in_chromosome ref: indexed reference sequence file (fai must exist) out: FASTA-formatted output fileExample:pibase_ref snplist.vcf hg19.fa flanked_snps.fasta原文地址：http://www.ikmb.uni-kiel.de/pibase/
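The grep/awk recipes given for pibase_chrm_to_crs (extracting the chrM lines) and for pibase_diff (filtering differences) can be wrapped into small shell functions. The following is a minimal sketch and not part of pibase: the function names extract_contig, filter_diff, and count_diffs are hypothetical, and the sketch assumes, as the recipes imply, that the contig number is the first tab-separated field and that the strand tags are in columns 11 and 12.

```shell
#!/bin/sh
# Sketch only: column positions are assumptions taken from the recipes
# in the text, not from a pibase file-format specification.

# Print header lines plus rows whose first tab-separated field equals the
# given contig number (25 = chrM). Unlike grep '^25', this cannot match a
# longer first field that merely starts with "25".
extract_contig() {
    # $1 = contig number, $2 = pibase_consensus file
    awk -F '\t' -v c="$1" '/^#/ || $1 == c' "$2"
}

# Print header lines plus rows that differ between control and case
# (leading column does not start with "=") and whose tags in columns 11
# and 12 are both "+-".
filter_diff() {
    # $1 = pibase_diff file
    awk -F '\t' '/^#/ || ($11 == "+-" && $12 == "+-" && $1 !~ /^=/)' "$1"
}

# Count the case-side ("+") rows in a filtered pibase_diff file.
count_diffs() {
    # $1 = filtered pibase_diff file
    grep -c '^+' "$1"
}

# Possible usage, with the file names from the recipes:
#   extract_contig 25 pibase_consensus_file > chrM_pibase_consensus_file
#   filter_diff pibase_diff_file > filtered_diff_file
#   count_diffs filtered_diff_file
```

Doing the header and row selection in a single awk call avoids the separate "copy header, then append" steps; whether the exact-match test on the first field is needed depends on how many contigs the BAM header defines.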
