xflee0608

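Using Cufflinks

I. Introduction

The Cufflinks package consists of several main programs: cufflinks, cuffmerge, cuffcompare and cuffdiff. It is mainly used to quantify gene expression and to find differentially expressed genes.

II. Installation

Cufflinks download page.
1. Building Cufflinks requires the Boost C++ libraries. Download and install Boost. By default it installs under /usr/local.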

$ tar jxvf boost_1_53_0.tar.bz2
$ cd boost_1_53_0
$ ./bootstrap.sh
$ sudo ./b2 install

2. Install SAM tools.

Download SAM tools.
$ tar jxvf samtools-0.1.18.tar.bz2
$ cd samtools-0.1.18
$ make
$ sudo su 
# mkdir /usr/local/include/bam
# cp libbam.a /usr/local/lib
# cp *.h /usr/local/include/bam/
# cp samtools /usr/bin/

3. Install the Eigen libraries.

Download Eigen.
$ tar jxvf 3.1.2.tar.bz2
$ cd eigen-eigen-5097c01bcdc4
$ sudo cp -r Eigen/ /usr/local/include/

4. Install Cufflinks.

$ tar zxvf cufflinks-2.0.2.tar.gz
$ cd cufflinks-2.0.2
$ ./configure --prefix=/path/to/cufflinks/install --with-boost=/usr/local/ --with-eigen=/usr/local/include//Eigen/
$ make
$ make install

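5. Alternatively, download the Linux x86_64 binary release. This skips the tedious steps above, and the unpacked programs can be used directly. (Recommended)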

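III. Using Cufflinks

1. About Cufflinks

Starting from TopHat alignments, and with or without a reference GTF annotation, the cufflinks program computes the FPKM of the isoforms of each gene and writes the assembled transcriptome to transcripts.gtf.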
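To make the FPKM unit concrete, here is a minimal Python sketch of the textbook definition (fragments per kilobase of transcript per million mapped fragments). It is only an illustration: Cufflinks itself estimates abundances with a statistical model and an effective-length correction, so its numbers will not match this naive formula exactly. The function name and the input values are made up for the example.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments / (length in kb * mapped fragments in millions)."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical numbers: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
example_fpkm = fpkm(fragments=500, transcript_length_bp=2_000,
                    total_mapped_fragments=10_000_000)
print(example_fpkm)
```

Doubling the transcript length or the sequencing depth while keeping the fragment count fixed halves the value, which is what makes FPKM comparable across transcripts and libraries.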

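Notes:

1. Fragment length estimation: for paired-end data, Cufflinks estimates the fragment length distribution itself. For single-end data it assumes a Gaussian distribution by default, or you can supply the parameters yourself (-m and -s).

2. Multi-mapped reads are split evenly across their alignments: a read mapping to 10 positions counts as 10% of a read at each position.

3. Cufflinks is generally not recommended for assembling bacterial transcriptomes; Glimmer is recommended instead. If an annotation is available, however, cufflinks and cuffdiff can still be used to measure gene expression and differential expression.

4. cufflinks/cuffdiff cannot compute the FPKM of individual exons or splicing events.

5. To process time-series data with cuffdiff, use the -T (--time-series) option.

6. A run may reach 99% and then appear to hang. A few loci with very many aligned reads need far more CPU time, and these loci are usually processed only after all the others have finished. Use -M to supply a mask file that filters out rRNA or mitochondrial RNA.

7. If cufflinks or cuffdiff crashes with a 'bad_alloc' error, or takes a very long time to finish, the machine has run out of memory while assembling or quantifying a highly expressed gene. Fix: lower the --max-bundle-frags option. Try 500000 first, and keep lowering it if the error persists.

8. If cuffdiff reports FPKM=0 for every gene and transcript, the chromosome names in the GTF do not match the names in the BAM.

9. A weakness of cuffdiff and cufflinks: some of the reported genes and transcripts are false positives (causes: sequencing depth, sequencing quality, the number of times the sample was sequenced, and annotation errors).

10. A large fold change does not by itself mean the change is significant (the gene may have many isoforms, or few sequenced reads and therefore low overall expression). Genes that cuffdiff reports with a large fold change still carry uncertainty.

11. In the transcripts.gtf file written by cufflinks, transcripts labelled CUFF are new transcripts. Likewise, in the output of the other programs the CUFF label marks new transcripts.

12. If you see the following error:

You are using Cufflinks v2.2.1, which is the most recent release.
open: No such file or directory
File 30 doesn't appear to be a valid BAM file, trying SAM...
Error: cannot open alignment file 30 for reading
then one of your options is malformed. For example, "--min-intron-length" was typed as "-min-intron-length".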
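Note 8 (all FPKM values are 0 because chromosome names differ between the GTF and the BAM) can be checked before running anything. The sketch below is a hypothetical helper, not part of Cufflinks: it compares the first column of a GTF with the SN fields of the @SQ header lines of a SAM/BAM header. It assumes the header text has already been obtained, for instance with `samtools view -H`.

```python
def gtf_seqnames(gtf_text):
    """Collect the sequence names found in column 1 of a GTF."""
    names = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t")[0])
    return names


def sam_header_seqnames(header_text):
    """Collect the SN: values of the @SQ lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


# Illustrative inputs: the GTF says "1", the alignment header says "chr1".
gtf_example = '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
header_example = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"

shared = gtf_seqnames(gtf_example) & sam_header_seqnames(header_example)
if not shared:
    print("No sequence names in common: expect FPKM=0 everywhere")
```

An empty intersection, as in this example, is the situation described in note 8.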

2. Usage

$ cufflinks [options]* <aligned_reads.(sam/bam)>

A typical example:
$ cufflinks -p 8 -G transcript.gtf --library-type fr-unstranded -o cufflinks_output tophat_out/accepted_hits.bam
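When the same command has to be run for many samples, it can help to build the argument list in a script. The following Python sketch assembles exactly the command shown above; the function name and its parameters are made up for illustration, and only flags that appear in this article are used.

```python
import shlex


def cufflinks_command(bam, gtf, out_dir, threads=8, library_type="fr-unstranded"):
    """Return the cufflinks invocation as an argument list."""
    return [
        "cufflinks",
        "-p", str(threads),
        "-G", gtf,
        "--library-type", library_type,
        "-o", out_dir,
        bam,
    ]


cmd = cufflinks_command("tophat_out/accepted_hits.bam", "transcript.gtf",
                        "cufflinks_output")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where cufflinks is installed; keeping it as a list avoids shell quoting problems with unusual file names.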

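3. General options

 -h | --help

 -o | --output-dir default: ./
    Name of the output directory.

 -p | --num-threads default: 1
    Number of CPU threads to use.

 -G | --GTF
    Supply a GFF file and estimate isoform expression against it. No novel transcripts are assembled, and alignments that are incompatible with the reference transcripts are ignored.

 -g | --GTF-guide
    Supply a GFF file to guide transcript assembly (RABT assembly). The output then contains the reference transcripts as well as novel genes and isoforms.

 -M | --mask-file
    Supply a GFF file. Cufflinks ignores all reads that align to transcripts in this file. It usually holds rRNA annotations, and can also include mitochondrial and any other transcripts you want to ignore. Removing these unwanted RNAs improves the estimation of mRNA expression.

 -b | --frag-bias-correct
    Supply a FASTA file so that Cufflinks runs its bias detection and correction algorithm, which can markedly improve the accuracy of transcript abundance estimates.

 -u | --multi-read-correct
    Have Cufflinks run an initial estimation step to weight reads that map to multiple genomic locations more accurately.

 --library-type default: fr-unstranded
    Describes the strandedness of the library. For strand-specific reads the alignments carry an XS tag. For ordinary Illumina data the library type is usually fr-unstranded.

 --library-norm-method
    See the official manual for details. Three methods: classic-fpkm (the default); geometric (as used by DESeq); quartile (fragment and total mapped counts are scaled by the 75th percentile).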

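4. Abundance estimation options

 -m | --frag-len-mean default: 200
    Mean fragment length. Cufflinks now learns the fragment length mean itself, so setting this value by hand is not recommended.

 -s | --frag-len-std-dev default: 80
    Standard deviation of the fragment length. Cufflinks now learns this as well, so setting this value by hand is not recommended.

 -N | --upper-quartile-norm
    Normalize by the upper quartile (75th percentile) of the number of fragments mapping to individual loci instead of the total. This helps when looking for differential expression among low-abundance genes and transcripts.

 --total-hits-norm default: TRUE
    When computing FPKM, count all fragments and aligned reads. The opposite of the next option. Active by default.

 --compatible-hits-norm
    When computing FPKM, count only fragments and aligned reads that are compatible with the reference transcripts. Inactive by default. It only takes effect together with --GTF and has no effect in RABT or ab initio mode.

 --max-mle-iterations
    Number of iterations allowed in the maximum-likelihood estimation. Default: 5000.

 --max-bundle-frags
    Maximum number of fragments a locus may have before it is skipped. Default: 1000000.

 --no-effective-length-correction
    Cufflinks will not employ its "effective" length normalization to transcript FPKM.

 --no-length-correction
    Cufflinks will not normalize fragment counts by transcript length at all. Use this when the fragment count is independent of the size of the features being quantified (e.g. for small RNA libraries, where no fragmentation takes place, or 3 prime end sequencing, where sampled RNA fragments are all essentially the same length). Use with caution.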

5. Common assembly options

 -L | --label default: CUFF
    Cufflinks reports transfrags in GTF format, with the prefix given by this option.

 -F/--min-isoform-fraction <0.0-1.0>
    After estimating the isoform abundances of a gene, filter out transcripts of very low abundance, because they cannot be trusted. This also filters exons supported by very few reads. The default is 0.1, i.e. 10% of the abundance of the gene's most abundant isoform (the major isoform).

 -j/--pre-mrna-fraction <0.0-1.0>
    The minimum depth of coverage in the intronic region covered by the alignment is divided by the number of spliced reads, and if the result is lower than this parameter value, the intronic alignments are ignored. The default is 15%.

 -I/--max-intron-length
    Cufflinks will not report transcripts with introns longer than this, and will ignore SAM alignments with REF_SKIP CIGAR operations longer than this. The default is 300,000.

 -a/--junc-alpha <0.0-1.0>
    Alpha value for the binomial test used to filter false-positive spliced alignments. Default: 0.001.

 -A/--small-anchor-fraction <0.0-1.0>
    Spliced reads with less than this fraction of their length on one side of a junction are considered suspicious and may be filtered out before assembly. Default: 0.09.

 --min-frags-per-transfrag default: 10
    Assembled transfrags supported by fewer RNA-seq fragments than this are not reported.

 --overhang-tolerance
    Number of bp allowed to enter the intron of a transcript when deciding whether a read or another transcript is compatible with it. The default is 8 bp, matching the bowtie/TopHat default.

 --max-bundle-length
    Maximum genomic length allowed for a given bundle. The default is 3,500,000 bp.

 --min-intron-length default: 50
    Minimum intron size.

 --trim-3-avgcov-thresh
    Minimum average coverage required to attempt 3' trimming. The default is 10.

 --trim-3-dropoff-frac
    The fraction of average coverage below which to trim the 3' end of an assembled transcript. The default is 0.1.

 --max-multiread-fraction <0.0-1.0>
    The fraction of a transfrag's supporting reads that may be multiply mapped to the genome. A transcript composed of more than this fraction will not be reported by the assembler. Default: 0.75 (75% multireads or more is suppressed).

 --overlap-radius default: 50
    Transfrags separated by less than this distance are merged.
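The effect of -F/--min-isoform-fraction described above can be pictured with a small Python sketch. This is not Cufflinks' actual implementation, only the rule as stated in the option description: isoforms whose abundance is below the given fraction of the major isoform are dropped. Whether an isoform exactly at the threshold is kept is an assumption made here, and the isoform names and abundances are invented.

```python
def filter_minor_isoforms(abundances, min_isoform_fraction=0.1):
    """Keep isoforms with abundance >= fraction * abundance of the major isoform."""
    if not abundances:
        return {}
    major = max(abundances.values())
    threshold = min_isoform_fraction * major
    return {name: value for name, value in abundances.items() if value >= threshold}


# Hypothetical isoform abundances for one gene.
isoforms = {"CUFF.1.1": 100.0, "CUFF.1.2": 12.0, "CUFF.1.3": 4.0}
kept = filter_minor_isoforms(isoforms)
print(sorted(kept))
```

With the default of 0.1 the threshold is 10.0 here, so the third isoform is filtered out.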

Advanced Reference Annotation Based Transcript (RABT) Assembly Options: options to consider when you use -g/--GTF-guide.

 --3-overhang-tolerance
    The number of bp allowed to overhang the 3' end of a reference transcript when determining if an assembled transcript should be merged with it (i.e., the assembled transcript is not novel). The default is 600 bp.

 --intron-overhang-tolerance
    The number of bp allowed to enter the intron of a reference transcript when determining if an assembled transcript should be merged with it (i.e., the assembled transcript is not novel). The default is 50 bp.

 --no-faux-reads
    This option disables tiling of the reference transcripts with faux reads. Use this if you only want to use sequencing reads in assembly but do not want to output assembled transcripts that lie within reference transcripts. All reference transcripts in the input annotation will also be included in the output.

Other options (minor)

 -v/--verbose
    Verbose output (version information and so on).

 -q/--quiet
    Print nothing except warnings and errors.

 --no-update-check
    Turn off the automatic update check of cufflinks.

6. Cufflinks output

The input to cufflinks is a SAM or BAM file, and it must be sorted by reference position. The SAM/BAM written by TopHat is already sorted. An unsorted SAM file from another source can be sorted as follows:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
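The sort command above orders every line by columns 3 and 4, so a SAM file that carries header lines (lines starting with "@") would have them mixed into the records. The Python sketch below shows the same ordering while keeping the header on top. It is an illustration under simple assumptions (the whole file fits in memory, records are well-formed), and the example records are invented.

```python
def sort_sam_lines(lines):
    """Sort SAM records by reference name, then position; header lines stay first."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if line and not line.startswith("@")]
    records.sort(key=lambda line: (line.split("\t")[2], int(line.split("\t")[3])))
    return header + records


# Made-up records: read name, flag, reference, position, then the other SAM fields.
sam_lines = [
    "@HD\tVN:1.0",
    "r1\t0\tchr2\t50\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t300\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t20\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
sorted_lines = sort_sam_lines(sam_lines)
print([line.split("\t")[0] for line in sorted_lines])
```

Reference names are compared as plain strings here, as with `sort -k 3,3`.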



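1. transcripts.gtf

This file contains the isoforms assembled by Cufflinks. The first 7 columns are standard GTF, and the last column holds the attributes. The meaning of each column:

Column  Name        Example     Description
1       seqname     chrX        Chromosome or contig name
2       source      Cufflinks   Name of the program that generated the file
3       feature     exon        Record type, usually transcript or exon
4       start       1           Start position (1-based)
5       end         1000        End position
6       score       1000
7       strand      +           Cufflinks' guess for which strand of the reference the isoform came from: '+', '-' or '.'
8       frame       .           Cufflinks does not predict the position of start or stop codon frames
9       attributes  ...         See below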
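The nine-column layout above is easy to read programmatically. The following Python sketch parses one GTF line into a dictionary; the function name is made up, and the example line is built from the example values in the table.

```python
def parse_gtf_line(line):
    """Split one GTF record into its 8 fixed columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    keys = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]
    record = dict(zip(keys, cols[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    record["attributes"] = attributes
    return record


gtf_line = ('chrX\tCufflinks\texon\t1\t1000\t1000\t+\t.\t'
            'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "101.267";')
record = parse_gtf_line(gtf_line)
print(record["attributes"]["transcript_id"], record["end"] - record["start"] + 1)
```

Attribute values are kept as strings; convert them with `float()` where a number is needed.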


Each GTF record carries the following attributes:

Attribute          Example    Description
gene_id            CUFF.1     Cufflinks gene id
transcript_id      CUFF.1.1   Cufflinks transcript id
FPKM               101.267    Isoform-level abundance, in Fragments Per Kilobase of exon model per Million mapped fragments
frac               0.7647     Reserved. Ignore it, as it may be removed in the future
conf_lo            0.07       Lower bound of the 95% confidence interval of the isoform abundance: lower bound = FPKM * ( 1.0 - conf_lo )
conf_hi            0.1102     Upper bound of the 95% confidence interval of the isoform abundance: upper bound = FPKM * ( 1.0 + conf_hi )
cov                100.765    Estimated read coverage across the whole transcript
full_read_support  yes        With RABT assembly, reports whether all introns and exons are fully covered by reads
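As a worked example of the conf_lo and conf_hi formulas given above, the sketch below computes the interval for the example values in the table. It follows the convention as stated in this article (bounds expressed as fractions of FPKM); if your Cufflinks version writes absolute bounds instead, this conversion does not apply, so check a few records first.

```python
def fpkm_interval(fpkm, conf_lo, conf_hi):
    """95% interval using: lower = FPKM * (1 - conf_lo), upper = FPKM * (1 + conf_hi)."""
    return fpkm * (1.0 - conf_lo), fpkm * (1.0 + conf_hi)


lower, upper = fpkm_interval(101.267, 0.07, 0.1102)
print(round(lower, 3), round(upper, 3))
```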


2. isoforms.fpkm_tracking

FPKM results for the isoforms (the individual transcripts of each gene).

3. genes.fpkm_tracking

FPKM results for the genes.

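IV. Using Cuffmerge

1. About Cuffmerge

Cuffmerge merges the transcripts.gtf files produced by several Cufflinks runs into one more complete annotation file, merged.gtf, which is then convenient to use for differential expression analysis with Cuffdiff.

2. Usage

$ cuffmerge [options]* <assembly_GTF_list.txt>
The input is a text file that lists the paths of the GTF files. A typical example:
$ cuffmerge -o ./merged_asm -p 8 assembly_list.txt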
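The list file passed to cuffmerge is just a text file of GTF paths. A minimal Python sketch for writing it follows; one path per line is assumed, and the sample directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_assembly_list(gtf_paths, list_path):
    """Write one GTF path per line and return the path of the list file."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{p}\n" for p in gtf_paths))
    return list_path


# Demonstration in a temporary directory; in practice write next to your data.
with tempfile.TemporaryDirectory() as tmp:
    path = write_assembly_list(
        ["sample1/transcripts.gtf", "sample2/transcripts.gtf"],
        Path(tmp) / "assembly_list.txt",
    )
    written = path.read_text()
print(written, end="")
```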

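3. Options

 -h | --help

 -o default: ./merged_asm
    Write the results to this directory.

 -g | --ref-gtf
    Merge this reference GTF into the final result as well.

 -p | --num-threads default: 1
    Number of CPU threads to use.

 -s | --ref-sequence <seq_dir>/<seq_fasta>
    Points to the genomic DNA sequence. If it is a directory, each contig must be its own FASTA file in it. If it is a single FASTA file, it must contain all contigs. Cuffmerge uses the reference sequence to help classify transfrags and to exclude repeats. For example, transcripts that contain lower-case bases are classified as repeats.

4. Cuffmerge output

By default the result is written to merged.gtf inside the output directory (./merged_asm/merged.gtf with the default -o).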

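V. Using Cuffcompare

1. About Cuffcompare

Cuffcompare takes the GTF output of Cufflinks and compares the GTF files with each other. It can also compare them with a reference GTF, for example to find novel transcripts.

2. Usage

$ cuffcompare [options]* <cuff1.gtf> [cuff2.gtf] ... [cuffN.gtf]

Example:
$ cuffcompare -o cuffcmp cuff1.gtf cuff2.gtf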

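3. Options

 -h

 -V
    Show progress.

 -C
    Also write "contained" transcripts to the .combined.gtf file.

 -o default: cuffcmp
    Prefix of the output files.

 -r
    Reference GFF file, used to assess the accuracy of the gene models in the input GTF files. The isoforms of each input GTF are compared with this reference and tagged as overlapping, matching or novel.

 -R
    Only with -r: ignore reference transcripts that are not overlapped by any transcript in the input GTF files.

 -s
    Points to the genomic DNA sequence. If it is a directory, each contig must be its own FASTA file in it. If it is a single FASTA file, it must contain all contigs. Lower-case bases are used to classify the corresponding transcripts as repeats.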

4. Output

Three files are written to the current directory:

.stats
    Reports accuracy statistics from the comparison with the reference annotation. Sn and Sp are the sensitivity and specificity values. The fSn and fSp columns are "fuzzy" variants of the same accuracy calculations, which allow some tolerance. (If -o is not set, the file prefix defaults to cuffcmp.)

.combined.gtf
    Reports all transfrags from all samples. A transfrag present in several samples is reported only once.

.tracking
    This file matches transcripts up between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-Seq reads against a reference transcriptome), cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand. Matching transcripts are allowed to differ on the length of the first and last exons, since these lengths will naturally vary from sample to sample due to the random nature of sequencing.

Example:

TCONS_00000045 XLOC_000023 Tcea|uc007afj.1    j            q1:exp.115|exp.115.0|100|3.061355|0.350242|0.350207      q2:60hr.292|60hr.292.0|100|4.094084|0.000000|0.000000

In this example, a transcript present in the two input files, called exp.115.0 in the first and 60hr.292.0 in the second, doesn't match any reference transcript exactly, but shares exons with uc007afj.1, an isoform of the gene Tcea, as indicated by the class code j. The leading columns are as follows:

Column  Name                     Example         Description
1       Cufflinks transfrag id   TCONS_00000045  Internal transfrag id
2       Cufflinks locus id       XLOC_000023     Internal locus id
3       Reference gene id        Tcea            Gene id in the reference annotation, or "-" if no reference transcript matches
4       Reference transcript id  uc007afj.1      Transcript id in the reference annotation, or "-" if no reference transcript matches
5       Class code               j               Type of match between the transcript and the reference transcript

Each column after these describes the transcript in one input file, in the form:

qJ:<gene_id>|<transcript_id>|<FMI>|<FPKM>|<conf_lo>|<conf_hi>|<cov>|<len>
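A row of the .tracking file can be split mechanically. The Python sketch below parses the example row shown above; the function and key names are made up, and the per-sample fields are returned as plain lists of strings because the number of fields may differ between Cufflinks versions.

```python
def parse_tracking_row(row):
    """Parse one .tracking row into ids, class code and per-sample field lists."""
    cols = row.split()
    ref_gene, _, ref_transcript = cols[2].partition("|")
    samples = {}
    for col in cols[4:]:
        label, _, payload = col.partition(":")
        samples[label] = payload.split("|")
    return {
        "transfrag_id": cols[0],
        "locus_id": cols[1],
        "ref_gene_id": ref_gene,
        "ref_transcript_id": ref_transcript,
        "class_code": cols[3],
        "samples": samples,
    }


row = ("TCONS_00000045 XLOC_000023 Tcea|uc007afj.1 j "
       "q1:exp.115|exp.115.0|100|3.061355|0.350242|0.350207 "
       "q2:60hr.292|60hr.292.0|100|4.094084|0.000000|0.000000")
parsed = parse_tracking_row(row)
print(parsed["class_code"], parsed["samples"]["q2"][1])
```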

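The .refmap and .tmap files are written to the same directory as each input GTF.

.refmap columns:

1  Reference gene name      Gene name in the reference annotation GTF
2  Reference transcript id  Reference transcript id
3  Class code               How the transcript assembled by Cufflinks matches the reference transcript: c means a partial match, = means a full match
4  Cufflinks matches        Ids of the Cufflinks transcripts that match the reference transcript

.tmap columns:

1   Reference gene name      Gene name in the reference annotation GTF
2   Reference transcript id  Reference transcript id
3   Class code               How the transcript assembled by Cufflinks matches the reference transcript (see the class codes below)
4   Cufflinks gene id
5   Cufflinks transcript id
6   Fraction of major isoform (FMI)
7   FPKM
8   FPKM_conf_lo
9   FPKM_conf_hi
10  Coverage
11  Length
12  Major isoform ID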



Class codes:

Priority	Code	Description
1	=	Complete match of intron chain
2	c	Contained
3	j	Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4	e	Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
5	i	A transfrag falling entirely within a reference intron
6	o	Generic exonic overlap with a reference transcript
7	p	Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8	r	Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
9	u	Unknown, intergenic transcript
10	x	Exonic overlap with reference on the opposite strand
11	s	An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
12	.	(.tracking file only, indicates multiple classifications)

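VI. Using Cuffdiff

1. About Cuffdiff

Cuffdiff finds significant differences in transcript expression.

2. Usage

cuffdiff detects significant changes in transcript expression, splicing and promoter use.

$ cuffdiff [options]* <transcripts.gtf> <sample1_1.sam[,...,sample1_M.sam]> <sample2_1.sam[,...,sample2_M.sam]> ... [sampleN_1.sam[,...,sampleN_M.sam]]
Here transcripts.gtf is a file produced by cufflinks, cuffcompare or cuffmerge, or by another program. The replicates of one sample are separated by commas. When more than one sample is given, cuffdiff tests for differences in gene expression between the samples.

A typical example:
$ cuffdiff --labels label1,label2 -p 8 --time-series --multi-read-correct --library-type fr-unstranded --poisson-dispersion transcripts.gtf sample1.sam sample2.sam

cuffdiff accepts BAM/SAM files or the CXB files written by cuffquant. BAM and SAM files may be mixed, but BAM/SAM files cannot be mixed with CXB files.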
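The comma-separated replicate syntax is easy to get wrong by hand. The Python sketch below builds a cuffdiff argument list from a mapping of sample label to replicate files, using the -L and -p options described in this article. The function name and file names are hypothetical.

```python
def cuffdiff_command(gtf, samples, threads=8):
    """samples maps each label to its list of replicate files, in comparison order."""
    cmd = ["cuffdiff", "-L", ",".join(samples), "-p", str(threads), gtf]
    for replicates in samples.values():
        # Replicates of one sample are joined by commas into a single argument.
        cmd.append(",".join(replicates))
    return cmd


samples = {
    "label1": ["sample1_rep1.sam", "sample1_rep2.sam"],
    "label2": ["sample2_rep1.sam", "sample2_rep2.sam"],
}
cmd = cuffdiff_command("transcripts.gtf", samples)
print(" ".join(cmd))
```

Dictionaries keep insertion order in current Python versions, so the labels and the replicate groups stay aligned.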

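3. Options

 -h | --help

 -o | --output-dir default: ./
    Output directory.

 -L | --labels default: q1,q2,...qN
    A label for each sample or condition.

 -p | --num-threads default: 1
    Number of CPU threads to use.

 -T | --time-series
    Compare the samples in the order given instead of comparing all samples pairwise: the second SAM against the first, the third against the second, the fourth against the third, and so on.

 -N | --upper-quartile-norm
    Normalize by the upper quartile (75th percentile) of the number of fragments mapping to individual loci instead of the total. This helps when looking for differential expression among low-abundance genes and transcripts.

 --total-hits-norm
    When computing FPKM, count all fragments and aligned reads. The opposite of the next option. Inactive by default.

 --compatible-hits-norm
    When computing FPKM, count only fragments and aligned reads that are compatible with the reference transcripts. Active by default. It reduces the influence of ribosomal RNA reads on the expression estimates.

 -b | --frag-bias-correct (usually genome.fa)
    Supply a FASTA file so that Cufflinks runs its bias detection and correction algorithm, which can markedly improve the accuracy of transcript abundance estimates.

 -u | --multi-read-correct
    Have Cufflinks run an initial estimation step to weight reads that map to multiple genomic locations more accurately.

 -c | --min-alignment-count default: 10
    If fewer fragments than this align to a locus, no significance test is run for that locus, and its expression is treated as not significantly different.

 -M | --mask-file
    Supply a GFF file. Cufflinks ignores all reads that align to transcripts in this file. It usually holds rRNA annotations, and can also include mitochondrial and any other transcripts you want to ignore. Removing these unwanted RNAs improves the estimation of mRNA expression.

 --FDR default: 0.05
    The allowed false discovery rate.

 --library-type default: fr-unstranded
    Describes the strandedness of the library. For strand-specific reads the alignments carry an XS tag. For ordinary Illumina data the library type is usually fr-unstranded.

 --dispersion-method

Other advanced options:

 -m | --frag-len-mean default: 200
    Mean fragment length. Cufflinks now learns the fragment length mean itself, so setting this value by hand is not recommended.

 -s | --frag-len-std-dev default: 80
    Standard deviation of the fragment length. Cufflinks now learns this as well, so setting this value by hand is not recommended.

 -v/--verbose
    Verbose output (version information and so on).

 -q/--quiet
    Print nothing except warnings and errors.

 --no-update-check
    Turn off the automatic update check of cufflinks.

 -F/--min-isoform-fraction <0.0-1.0>
    Best left unchanged. Alternative isoforms quantified below this fraction of the major isoform are rounded down to 0. Default: 1e-5.

 --max-bundle-frags
    Maximum number of fragments a locus may have before it is skipped. Default: 1000000.

 --max-frag-count-draws default: 100

 --max-frag-assign-draws default: 50

 --min-reps-for-js-test
    Cuffdiff won't test genes for differential regulation unless the conditions in question have at least this many replicates. Default: 3.

 --no-effective-length-correction
    Cuffdiff will not employ its "effective" length normalization to transcript FPKM.

 --no-length-correction
    Cuffdiff will not normalize fragment counts by transcript length at all. Use this when the fragment count is independent of the size of the features being quantified (e.g. for small RNA libraries, where no fragmentation takes place, or 3 prime end sequencing, where sampled RNA fragments are all essentially the same length). Use with caution.

 --max-mle-iterations
    Number of iterations allowed in the maximum-likelihood estimation. Default: 5000.

 --poisson-dispersion
    Use the Poisson fragment dispersion model instead of learning one in each condition.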
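The difference between the default all-against-all comparison and -T/--time-series (each sample against the one before it) can be spelled out with a short Python sketch. It only enumerates the sample pairs as described in the option list and does not reproduce any of cuffdiff's statistics; the sample names are invented.

```python
from itertools import combinations


def comparisons(samples, time_series=False):
    """List the sample pairs that would be compared."""
    if time_series:
        # Consecutive pairs only: (1st, 2nd), (2nd, 3rd), ...
        return list(zip(samples, samples[1:]))
    # Default: every pair of samples.
    return list(combinations(samples, 2))


timepoints = ["t0", "t1", "t2", "t3"]
print(len(comparisons(timepoints)), len(comparisons(timepoints, time_series=True)))
```

For N samples the default gives N*(N-1)/2 comparisons, while the time-series mode gives N-1.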

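4. Cuffdiff output

1. FPKM tracking files: cuffdiff computes the FPKM of each transcript, primary transcript and gene in each sample. The FPKM of a gene or primary transcript is the sum of the FPKMs of the transcripts in that gene group or primary transcript group.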

isoforms.fpkm_tracking	Transcript FPKMs
genes.fpkm_tracking	Gene FPKMs. Tracks the summed FPKM of transcripts sharing each gene_id
cds.fpkm_tracking	Coding sequence FPKMs. Tracks the summed FPKM of transcripts sharing each p_id, independent of tss_id
tss_groups.fpkm_tracking	Primary transcript FPKMs. Tracks the summed FPKM of transcripts sharing each tss_id


2. Count tracking files: estimates of the number of fragments that originate from each transcript, primary transcript and gene in each sample. The count for a primary transcript or gene is the sum of the counts of the transcripts in that primary transcript group or gene group.

isoforms.count_tracking	Transcript counts
genes.count_tracking	Gene counts. Tracks the summed counts of transcripts sharing each gene_id
cds.count_tracking	Coding sequence counts. Tracks the summed counts of transcripts sharing each p_id, independent of tss_id
tss_groups.count_tracking	Primary transcript counts. Tracks the summed counts of transcripts sharing each tss_id

3. Read group tracking files: the expression and fragment count of each transcript, primary transcript and gene in each replicate.

isoforms.read_group_tracking	Transcript read group tracking
genes.read_group_tracking	Gene read group tracking. Tracks the summed expression and counts of transcripts sharing each gene_id in each replicate
cds.read_group_tracking	Coding sequence FPKMs. Tracks the summed expression and counts of transcripts sharing each p_id, independent of tss_id in each replicate
tss_groups.read_group_tracking	Primary transcript FPKMs. Tracks the summed expression and counts of transcripts sharing each tss_id in each replicate

4. Differential expression tests: tests for expression differences between samples for spliced transcripts, primary transcripts, genes and coding sequences. For each pair of samples x and y, the following four files are produced:

isoform_exp.diff	Transcript differential FPKM.
gene_exp.diff	Gene differential FPKM. Tests differences in the summed FPKM of transcripts sharing each gene_id
tss_group_exp.diff	Primary transcript differential FPKM. Tests differences in the summed FPKM of transcripts sharing each tss_id
cds_exp.diff	Coding sequence differential FPKM. Tests differences in the summed FPKM of transcripts sharing each p_id independent of tss_id

Each file has the following format:

Column number	Column name	Example	Description
1	Tested id	XLOC_000001	A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
2	gene	Lypla1	The gene_name(s) or gene_id(s) being tested
3	locus	chr1:4797771-4835363	Genomic coordinates for easy browsing to the genes or transcripts being tested.
4	sample 1	Liver	Label (or number if no labels provided) of the first sample being tested
5	sample 2	Brain	Label (or number if no labels provided) of the second sample being tested
6	Test status	NOTEST	Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing.
7	FPKM_x	8.01089	FPKM of the gene in sample x
8	FPKM_y	8.551545	FPKM of the gene in sample y
9	log2(FPKM_y/FPKM_x)	0.06531	The (base 2) log of the fold change y/x
10	test stat	0.860902	The value of the test statistic used to compute significance of the observed change in FPKM
11	p value	0.389292	The uncorrected p-value of the test statistic
12	q value	0.985216	The FDR-adjusted p-value of the test statistic
13	significant	no	Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing
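A common next step is to pull the significant rows out of a *_exp.diff file. The Python sketch below does this for tab-separated text. The header names used here (status, significant, and so on) are assumptions chosen to mirror the column table above, so check the first line of your own file and adjust them. The second data row is invented for the example.

```python
import csv
import io


def significant_rows(diff_text):
    """Return rows whose test ran successfully (OK) and was called significant."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    return [row for row in reader
            if row["status"] == "OK" and row["significant"] == "yes"]


header = ["test_id", "gene", "locus", "sample_1", "sample_2", "status", "value_1",
          "value_2", "log2(fold_change)", "test_stat", "p_value", "q_value", "significant"]
rows = [
    ["XLOC_000001", "Lypla1", "chr1:4797771-4835363", "Liver", "Brain", "NOTEST",
     "8.01089", "8.551545", "0.06531", "0.860902", "0.389292", "0.985216", "no"],
    # Invented row, for illustration only.
    ["XLOC_000099", "GeneX", "chr1:100-200", "Liver", "Brain", "OK",
     "1.0", "8.0", "3.0", "5.0", "0.0001", "0.01", "yes"],
]
diff_text = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

hits = significant_rows(diff_text)
print([row["gene"] for row in hits])
```

To read a real file, pass `open(path).read()` as `diff_text`.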

5. Differential splicing tests (splicing.diff): for each primary transcript, tests for differences between samples in the relative abundance of its isoforms. Only primary transcripts with two or more isoforms are listed.

Column number	Column name	Example	Description
1	Tested id	TSS10015	A unique identifier describing the primary transcript being tested.
2	gene name	Rtkn	The gene_name or gene_id that the primary transcript being tested belongs to
3	locus	chr6:83087311-83102572	Genomic coordinates for easy browsing to the genes or transcripts being tested.
4	sample 1	Liver	Label (or number if no labels provided) of the first sample being tested
5	sample 2	Brain	Label (or number if no labels provided) of the second sample being tested
6	Test status	OK	Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing.
7	Reserved	0
8	Reserved	0
9	√JS(x,y)	0.22115	The splice overloading of the primary transcript, as measured by the square root of the Jensen-Shannon divergence computed on the relative abundances of the splice variants
10	test stat	0.22115	The value of the test statistic used to compute significance of the observed overloading, equal to √JS(x,y)
11	p value	0.000174982	The uncorrected p-value of the test statistic.
12	q value	0.985216	The FDR-adjusted p-value of the test statistic
13	significant	yes	Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing
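The √JS(x,y) column is the square root of the Jensen-Shannon divergence between the relative isoform abundances in the two samples. The Python sketch below computes that quantity from two abundance vectors. It assumes base-2 logarithms, which may differ from what cuffdiff uses internally, and it says nothing about how cuffdiff derives the p value. The input proportions are invented.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def sqrt_js(p, q):
    """Square root of the Jensen-Shannon divergence of two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    js = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(js, 0.0))  # guard against tiny negative rounding errors


# Relative abundances of two isoforms in sample x and sample y (made up).
print(sqrt_js([0.5, 0.5], [0.5, 0.5]))  # identical usage
print(sqrt_js([1.0, 0.0], [0.0, 1.0]))  # complete isoform switch
```

With base-2 logarithms the value ranges from 0 (same isoform proportions) to 1 (no isoform shared).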


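6. Differential coding output (cds.diff): for each gene, tests for differences between samples in the relative output of its coding sequences. Only genes with two or more CDSs (multi-protein genes) are listed in this file.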

Column number	Column name	Example	Description
1	Tested id	XLOC_000002-[chr1:5073200-5152501]	A unique identifier describing the gene being tested.
2	gene name	Atp6v1h	The gene_name or gene_id
3	locus	chr1:5073200-5152501	Genomic coordinates for easy browsing to the genes or transcripts being tested.
4	sample 1	Liver	Label (or number if no labels provided) of the first sample being tested
5	sample 2	Brain	Label (or number if no labels provided) of the second sample being tested
6	Test status	OK	Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing.
7	Reserved	0
8	Reserved	0
9	√JS(x,y)	0.0686517	The CDS overloading of the gene, as measured by the square root of the Jensen-Shannon divergence computed on the relative abundances of the coding sequences
10	test stat	0.0686517	The value of the test statistic used to compute significance of the observed overloading, equal to √JS(x,y)
11	p value	0.00546783	The uncorrected p-value of the test statistic
12	q value	0.985216	The FDR-adjusted p-value of the test statistic
13	significant	yes	Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing


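7. Differential promoter use (promoters.diff): differences in promoter use between samples. Only genes producing two or more distinct primary transcripts (multi-promoter genes) are listed here.

8. Read group info (read_groups.info): lists, for each replicate, the key properties cuffdiff used when quantifying it.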

Column number	Column name	Example	Description
1	file	mCherry_rep_A/accepted_hits.bam	BAM or SAM file containing the data for the read group
2	condition	mCherry	Condition to which the read group belongs
3	replicate_num	0	Replicate number of the read group
4	total_mass	4.72517e+06	Total number of fragments for the read group
5	norm_mass	4.72517e+06	Fragment normalization constant used during calculation of FPKMs.
6	internal_scale	1.23916	Internal scaling factor, used to transform replicates of a single condition onto the "internal" common count scale.
7	external_scale	0.96	External scaling factor, used to transform counts from different conditions onto an internal common count scale.


9. Run info (run.info): information about the run.

The FPKM tracking files have the following format:

1 tracking_id TCONS_00000001 A unique identifier describing the object (gene, transcript, CDS, primary transcript)

2 class_code = The class_code attribute for the object, or "-" if not a transcript, or if class_code isn't present

3 nearest_ref_id NM_008866.1 The reference transcript to which the class code refers, if any

4 gene_id NM_008866 The gene_id(s) associated with the object

5 gene_short_name Lypla1 The gene_short_name(s) associated with the object

6 tss_id TSS1 The tss_id associated with the object, or "-" if not a transcript/primary transcript, or if tss_id isn't present

7 locus chr1:4797771-4835363 Genomic coordinates for easy browsing to the object

8 length 2447 The number of base pairs in the transcript, or '-' if not a transcript/primary transcript

9 coverage 43.4279 Estimate for the absolute depth of read coverage across the object

10 q0_FPKM 8.01089 FPKM of the object in sample 0

11 q0_FPKM_lo 7.03583 The lower bound of the 95% confidence interval on the FPKM of the object in sample 0

12 q0_FPKM_hi 8.98595 The upper bound of the 95% confidence interval on the FPKM of the object in sample 0

13 q0_status OK Quantification status for the object in sample 0. Can be one of OK (deconvolution successful), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents deconvolution.
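The FPKM tracking format above is a tab-separated table with a header line, so it can be read with Python's csv module. The sketch below keeps only objects whose status is OK in one sample. It assumes the header names equal the column names listed above; the first data row reuses the example values from the list and the second row is invented.

```python
import csv
import io


def ok_fpkms(tracking_text, sample="q0"):
    """Map tracking_id to FPKM for objects quantified successfully in one sample."""
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    return {row["tracking_id"]: float(row[f"{sample}_FPKM"])
            for row in reader if row[f"{sample}_status"] == "OK"}


header = ["tracking_id", "class_code", "nearest_ref_id", "gene_id", "gene_short_name",
          "tss_id", "locus", "length", "coverage",
          "q0_FPKM", "q0_FPKM_lo", "q0_FPKM_hi", "q0_status"]
rows = [
    ["TCONS_00000001", "=", "NM_008866.1", "NM_008866", "Lypla1", "TSS1",
     "chr1:4797771-4835363", "2447", "43.4279", "8.01089", "7.03583", "8.98595", "OK"],
    # Invented row showing a status that gets filtered out.
    ["TCONS_00000002", "-", "-", "GENE2", "Gene2", "TSS2",
     "chr1:100-900", "800", "0.5", "0.1", "0", "0.4", "LOWDATA"],
]
tracking_text = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

fpkms = ok_fpkms(tracking_text)
print(fpkms)
```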

The count tracking files have the following format:

1 tracking_id TCONS_00000001 A unique identifier describing the object (gene, transcript, CDS, primary transcript)

2 q0_count 201.334 Estimated (externally scaled) number of fragments generated by the object in sample 0

3 q0_count_variance 5988.24 Estimated variance in the number of fragments generated by the object in sample 0

4 q0_count_uncertainty_var 170.21 Estimated variance in the number of fragments generated by the object in sample 0 due to fragment assignment uncertainty.

5 q0_count_dispersion_var 4905.63 Estimated variance in the number of fragments generated by the object in sample 0 due to cross-replicate variability.

6 q0_status OK Quantification status for the object in sample 0. Can be one of OK (deconvolution successful), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents deconvolution.




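VII. Problems encountered when using Cufflinks

1. With the latest version of cuffdiff, comparing RNA-seq samples without replicates yields no differentially expressed genes?

From v2.0.1 onward, cuffdiff no longer seems to support RNA-seq data without replicates. Using an earlier version works around this.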

VIII. Cuffquant

cuffquant quantifies gene and transcript expression levels for a single BAM file. It writes a CXB file, abundances.cxb, which can be used as input to cuffdiff (this speeds cuffdiff up) and as input to Cuffnorm.

Usage: cuffquant [options]*

Its options (same meaning as described above):

-h/--help; -o/--output-dir; -p/--num-threads; -M/--mask-file; -b/--frag-bias-correct; -u/--multi-read-correct; --library-type; -m/--frag-len-mean; -s/--frag-len-std-dev; --max-mle-iterations; --max-bundle-frags; --no-effective-length-correction; --no-length-correction; -v/--verbose; -q/--quiet; --no-update-check

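IX. Cuffnorm

cuffnorm takes the output files of cuffquant as input and simply computes normalized expression levels for genes and transcripts. Use cuffnorm when all you want is a set of comparable expression values for genes, transcripts, CDS groups and TSS groups, for example when you only want to draw a heat map or a scatter plot of gene expression values.

cuffnorm [options]* ... [sampleN_replicate1.sam[,...,sampleN_replicateM.sam]]

Options: they are similar to the options described above.

-h/--help; -o/--output-dir; -L/--labels; -p/--num-threads;
--total-hits-norm (inactive by default); --compatible-hits-norm (active by default); --library-type; --library-norm-method; --output-format; -v/--verbose; -q/--quiet; --no-update-check

The output of cuffnorm is the normalized expression level of each gene, transcript, TSS group and CDS group in the experiment. No differential expression analysis is performed. By default cuffnorm writes "simple-table" files, whose format differs from the files written by cuffdiff. If you want files in the cuffdiff format, add the option: --output-format cuffdiff

cuffnorm reports FPKM values and normalized estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. These results are already normalized, so they should not be given to downstream tools that expect raw counts.

You can create a file, for example sample_sheet.txt, that lists the paths of the SAM files and use it as input to cuffdiff or cuffnorm. The file format is: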

sample_id       group_label
C1_R1.sam       C1
C1_R2.sam       C1
C2_R1.sam       C2
C2_R2.sam       C2
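A sample sheet like the one above can be generated from a list of (file, group) pairs. The Python sketch below writes it as tab-separated text; the tab delimiter is an assumption here, so confirm it against the manual of your Cufflinks version. The function name is made up.

```python
import csv
import io


def sample_sheet(rows):
    """Render (sample file, group label) pairs as a tab-separated sample sheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "group_label"])
    writer.writerows(rows)
    return buffer.getvalue()


sheet = sample_sheet([
    ("C1_R1.sam", "C1"),
    ("C1_R2.sam", "C1"),
    ("C2_R1.sam", "C2"),
    ("C2_R2.sam", "C2"),
])
print(sheet, end="")
```

Write the returned string to sample_sheet.txt with `open("sample_sheet.txt", "w").write(sheet)`.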

The output files are:

FPKM tracking files: estimated expression levels of the genes.

Count tracking files: estimated fragment count values of the genes.

Read group tracking files: per-replicate expression and count data.

For genes, transcripts, TSS groups and CDS groups, cuffnorm reports two kinds of files: *.fpkm_table files and *.count_table files.

会议 | 宏基因组和生物信息学进行病原检测的进展和未来胡童远
文献信息文章：Currentprogressandfutureopportunitiesinapplicationsofbioinformaticsforbiodefenseandpathogendetection:reportfromtheWinterMid-AtlanticMicrobiomeMeet-up,CollegePark,MD,January10,2018杂志：Microbiome时
Frontiers in Bioinformatics这本期刊是否值得投纯生信？ SCI狂人团队
有粉丝说FrontiersinBioinformatics这本期刊是否值得投纯生信？这个就要看你的发文目的。如果你需要发SCI论文，这本期刊就不适合你，因为它不是SCI期刊，不被SCI数据库收录。这本期刊仅被下面这些数据库收录：GoogleScholar,CrossRef,SemanticScholar,CLOCKSS,OpenAIRE。如果你不在意这本期刊不是SCI期刊，那就可以投这本期刊。Fr
Venn-韦恩图绘制陈洪瑜
在线工具http://bioinfogp.cnb.csic.es/tools/venny/index.html最多四个http://bioinformatics.psb.ugent.be/webtools/Venn/最多五个，多于五个仅列出共用数目http://jvenn.toulouse.inra.fr/app/example.html最多六个http://genevenn.sourceforg
点点点 | 真香！Simple GO GSEA 富集分析 ~ 生信石头
写在前面时间拨回去2015年，那时我接触生信已有一年，TBtools开发尚在萌芽阶段。那会，我写了几款小的软件，包括“blast3go”，为的是应对即将收费的“blast2go”。当然，后来相关功能都整合到TBtools中。而其中有一个重点功能，即GO富集分析。那会在Bioinformatics中国社群，我们开始了理论上是国内最早的公开社群学术Seminar（网络直播），我在其上也分享了相关学习经
5+氧化应激+WGCNA+ceRNA+分子对接，网药纯生信也能轻松发5+？生信风暴论文阅读
今天给同学们分享一篇生信文章“NetworkPharmacologyandBioinformaticsStudyofGeniposideRegulatingOxidativeStressinColorectalCancer”，这篇文章发表在IntJMolSci期刊上，影响因子为5.6。结果解读：丁香苷的目标网络图构建作者分别通过SwissTargetPrediction、TargetNet、CTD
跟着Briefings in Bioinformatics学数据分析：植物线粒体基因组组装流程GSAT初步尝试小明的数据分析笔记本
论文Mastergraph:anessentialintegratedassemblymodelfortheplantmitogenomebasedonagraph-basedframeworkhttps://academic.oup.com/bib/article-abstract/24/1/bbac522/6854450?redirectedFrom=fulltext&login=falseb
signalP6.0本地化安装与使用 CAAS_IFR_zp python
今天来下载本地化的signalP6.0在线网址如下（也挺好用的）：SignalP6.0-DTUHealthTech-BioinformaticServices还是很方便的本地化主要是处理服务器上很多文件的时候，一个个在线提交太麻烦了，就直接在服务器预测吧！在刚刚的官网点击Downloads，有fast和slow模式，主要是得到的结果不一样吧，这里选择fast填写名字，机构，邮箱等信息，阅读许可之后
5+共病+WGCNA+机器学习+免疫浸润，经典共病生信思路，轻松拿5+ 生信风暴论文阅读
今天给同学们分享一篇生信文章“Identificationofbiomarkersforthediagnosisofchronickidneydisease(CKD)withnon-alcoholicfattyliverdisease(NAFLD)bybioinformaticsanalysisandmachinelearning”，这篇文章发表在FrontEndocrinol(Lausanne)
7+衰老+WGCNA+机器学习+实验，非肿瘤领域的衰老相关研究生信风暴论文阅读
今天给同学们分享一篇生信文章“Identificationofaging-relatedbiomarkersandimmuneinfiltrationcharacteristicsinosteoarthritisbasedonbioinformaticsanalysisandmachinelearning”，这篇文章发表在FrontImmunol期刊上，影响因子为7.3。结果解读：数据处理作者整合
52.《Bioinformatics Data Skills》之实战：获取基因组基因间区域与内含子区域 DataScience
今天我们通过2个实战来掌握函数gaps，setdiff与reduce在GenomicRanges中的使用：获取基因间区域；获取基因的内含子区域。Gaps函数了解首先了解一下提取基因间区域需要用到的gaps函数，此函数用于返回范围间的空白区域，之前在IRanges包中介绍过。但GRanges对象增加了strand信息，结果有所不同，通过一个简单的例子说明：首先创建一个GRanges对象gr2：>li
bioconda中国镜像(北外备用，清华已恢复，中科大暂时没恢复) zd200572 生物信息 biconda tuna mirror anaconda
bioconda是conda上一个分发生物信息软件的频道，现在已经有超过2700款软件。由于国内没有基镜像，下载安装生物信息软件速度十分缓慢，经常中断，生物信息人迫切需要一个国内镜像。Biocondaisachannelforthecondapackagemanagerspecializinginbioinformaticssoftware。2019.6.15高兴地看到大家相互转告清华源恢复了。1
41.《Bioinformatics Data Skills》之IRanges操作 DataScience
上节内容介绍了IRange对象的创建与提取，它的数据形式虽然类似于dataframe，实际上为IRange对象。这种类别允许进行专门的处理操作，接下面我们学习一下对IRange对象的运算，转换与集合的操作。计算计算包换+，-，*（没有/，因为没有意义）。加法或者减法的作用是在范围的两侧对称地延伸或者缩减特定的长度（图1）：>xx+4LIRangesoflength2startendwidth[1]