03--DeconRNASeq: An R package for deconvolution of RNA-seq expression data

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data

Overview of DeconRNASeq

The DeconRNASeq package uses a nonnegative decomposition algorithm, solved through quadratic programming, to estimate the mixing proportions of distinct tissue types in next-generation sequencing data. It requires two R data frames as input:

  • datasets : the raw mRNA expression data matrix (genes by samples)
  • signatures : known expression signatures of specific cell types or tissues (genes by cell types)

The two are related by the linear model datasets = signatures * A, where A is the cell type concentration matrix (cell types by samples) that DeconRNASeq estimates.
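
To make the model datasets = signatures * A concrete, here is a minimal sketch that recovers the proportions for one simulated sample by constrained least squares. It is a toy illustration, not the package's internal code: the quadprog package (assumed to be installed), the simulated matrix S, and the names a_true and a_hat are all assumptions introduced for the example.

```r
## Toy sketch of deconvolution as a quadratic program.
## All data and names here are hypothetical, not taken from DeconRNASeq.
library(quadprog)  # assumed to be installed

set.seed(1)
n_genes <- 50
n_types <- 3

## Hypothetical signature matrix S (genes by cell types)
S <- matrix(runif(n_genes * n_types, min = 1, max = 10),
            nrow = n_genes, ncol = n_types,
            dimnames = list(NULL, c("typeA", "typeB", "typeC")))

## Hypothetical true proportions for one sample, and the noise-free mixture
a_true <- c(typeA = 0.2, typeB = 0.5, typeC = 0.3)
x <- as.vector(S %*% a_true)

## solve.QP minimises 1/2 a'Da - d'a subject to t(Amat) %*% a >= bvec,
## treating the first meq constraints as equalities.
## Minimising ||S a - x||^2 corresponds to D = S'S and d = S'x.
Dmat <- crossprod(S)
dvec <- as.vector(crossprod(S, x))
Amat <- cbind(rep(1, n_types), diag(n_types))  # sum(a) = 1, then a >= 0
bvec <- c(1, rep(0, n_types))

fit <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
a_hat <- setNames(fit$solution, colnames(S))
print(round(a_hat, 4))
```

Because the simulated mixture contains no noise, the estimate should match a_true up to numerical tolerance. With real expression data the fit is only approximate.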

Pipeline for using DeconRNASeq

  • install the DeconRNASeq package

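    ## biocLite() was the installer for older Bioconductor releases;
    ## current releases install packages through BiocManager
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("DeconRNASeq")
    library(DeconRNASeq)
    ## view the documentation
    browseVignettes("DeconRNASeq")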
    
  • run the example

    ## multi_tissue: expression profiles for 10 mixed samples from multiple tissues
    data(multi_tissue)
    ## column 1 is dropped; columns 2 to 11 hold the 10 samples
    datasets <- x.data[,2:11]
    ## tissue-specific signatures for five human tissues (columns 2 to 6)
    signatures <- x.signature.filtered.optimal[,2:6]
    ## known mixing proportions, used to evaluate the estimates
    proportions <- fraction
    ## deconvolution
    DeconRNASeq(datasets, signatures, proportions, checksig=FALSE,
                  known.prop = TRUE, use.scale = TRUE, fig = TRUE)
    
    • datasets:

      The datasets matrix contains 28745 genes and 10 samples. Column names are the sample names; row names are the gene names.

      > head(datasets,3)
                  reads.1.RPKM reads.2.RPKM reads.3.RPKM reads.4.RPKM reads.5.RPKM
      NR_024540      3.6682100      3.78953     8.254980     7.693440     5.637220
      NR_028325.1    0.0796274      0.14644     0.104652     0.376109     0.104008
      NR_028322.1    0.0796274      0.14644     0.104652     0.376109     0.104008
                  reads.6.RPKM reads.7.RPKM reads.8.RPKM reads.9.RPKM reads.10.RPKM
      NR_024540       6.358460     5.941820     6.555140     7.784240     5.9895300
      NR_028325.1     0.160564     0.188188     0.133709     0.244789     0.0885794
      NR_028322.1     0.160564     0.188188     0.133709     0.244789     0.0885794
      > dim(datasets)
      [1] 28745    10
      
    • signatures:

      The filtered signature matrix contains 1570 genes for the five tissues. Row names are the gene names; column names are the tissues (or cell types) in the mixture.

      > head(signatures,3)
                      brain    muscle     lung    liver     heart
      NR_024540   2.4742600 3.3782600 3.093570 1.279540 0.8652710
      NR_028325.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
      NR_028322.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
      > dim(signatures)
      [1] 1570    5
      
    • proportions:

      This matrix gives the known proportions of each tissue (or cell type) in each sample. Here it holds the proportions of 5 tissues in 10 samples.

      > head(proportions,3)
                    brain muscle   lung  liver  heart
      reads.1.RPKM 0.0463 0.0323 0.0805 0.0747 0.7662
      reads.2.RPKM 0.0606 0.1156 0.0278 0.6960 0.1000
      reads.3.RPKM 0.0728 0.6058 0.1051 0.1262 0.0900
      > dim(proportions)
      [1] 10  5
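
Before running the deconvolution, it can help to check that the inputs are consistent with each other. The sketch below rebuilds only the rows printed above as small data frames and checks two things: that each sample's proportions sum to about 1, and that the tissue columns of signatures and proportions match. The tolerance of 1e-3 and the variable names sums_ok and tissues_ok are assumptions made for this illustration, not part of the package.

```r
## Known proportions: the three rows printed above (samples by tissues)
proportions <- data.frame(
  brain  = c(0.0463, 0.0606, 0.0728),
  muscle = c(0.0323, 0.1156, 0.6058),
  lung   = c(0.0805, 0.0278, 0.1051),
  liver  = c(0.0747, 0.6960, 0.1262),
  heart  = c(0.7662, 0.1000, 0.0900),
  row.names = c("reads.1.RPKM", "reads.2.RPKM", "reads.3.RPKM")
)

## Signatures: the three rows printed above (genes by tissues)
signatures <- data.frame(
  brain  = c(2.4742600, 0.0675838, 0.0675838),
  muscle = c(3.3782600, 0.0556031, 0.0556031),
  lung   = c(3.093570, 0.515925, 0.515925),
  liver  = c(1.279540, 0.085452, 0.085452),
  heart  = c(0.8652710, 0.0830035, 0.0830035),
  row.names = c("NR_024540", "NR_028325.1", "NR_028322.1")
)

## Each sample's proportions should sum to about 1
## (the 1e-3 tolerance is an assumption that allows for rounding)
row_sums <- rowSums(proportions)
sums_ok <- all(abs(row_sums - 1) < 1e-3)

## Tissue columns should appear in the same order in both tables
tissues_ok <- identical(colnames(signatures), colnames(proportions))

print(row_sums)
print(c(sums_ok = sums_ok, tissues_ok = tissues_ok))
```

The same two checks can be run on the full fraction and x.signature.filtered.optimal objects once data(multi_tissue) has been loaded.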
      
