Reference-Based Transcriptomics Study Notes 3: Reference Genome and Gene Annotation

Author: ligc
Date: 19/5/15

Downloading the reference genome

Our experimental organism is the mouse (Mus musculus), so we go to the UCSC website and download the mouse genome files.


[Figure: UCSC]

[Figure: mm10]
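
The download step can also be scripted. The sketch below is only an illustration: it assumes the mm10 FASTA sits under UCSC's goldenPath "bigZips" directory, and the helper names (genome_fasta_url, download_genome) are made up for this note. Check the path on the UCSC download page before relying on it.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumed base location of UCSC bulk downloads; verify on the UCSC site.
UCSC_GOLDENPATH = "https://hgdownload.soe.ucsc.edu/goldenPath"


def genome_fasta_url(assembly: str) -> str:
    """Build the URL of the gzipped whole-genome FASTA for a UCSC assembly."""
    return f"{UCSC_GOLDENPATH}/{assembly}/bigZips/{assembly}.fa.gz"


def download_genome(assembly: str, out_dir: str = ".") -> Path:
    """Download <assembly>.fa.gz and decompress it to <assembly>.fa."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    gz_path = out / f"{assembly}.fa.gz"
    fa_path = out / f"{assembly}.fa"
    # Fetch the compressed FASTA, then stream-decompress it to disk.
    urllib.request.urlretrieve(genome_fasta_url(assembly), str(gz_path))
    with gzip.open(gz_path, "rb") as src, open(fa_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return fa_path


# Example call (downloads a large file, so it is left commented out):
# download_genome("mm10", "reference")
```

Keeping the URL construction in its own function makes it easy to print and inspect the address before starting a large download.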
Downloading the genome annotation file

Go to the GENCODE website (https://www.gencodegenes.org/) and download the GTF or GFF file that corresponds to the mouse genome.

[Figure: GTF]
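
Because the annotation has to correspond to the genome, one quick sanity check is to compare sequence names between the two files. The following is a minimal sketch, not a standard tool: the function names are hypothetical and the toy inputs are invented for illustration.

```python
from typing import Iterable, Set


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the first column of every non-comment annotation line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_lines: Iterable[str],
                        gtf_lines: Iterable[str]) -> Set[str]:
    """Return annotation seqnames that have no matching FASTA sequence."""
    return gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines)


# Toy data: the annotation uses "MT" while the genome uses "chrM".
toy_fasta = [">chr1", "ACGT", ">chrM", "ACGT"]
toy_gtf = [
    "##description: toy annotation",
    'chr1\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g1";',
    'MT\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g2";',
]
print(missing_from_genome(toy_fasta, toy_gtf))
```

For real files, the same functions accept open file handles, for example one returned by gzip.open(path, "rt") for a compressed GTF.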

GTF (Gene Transfer Format; the Ensembl documentation expands it as General Transfer Format) consists of the following columns:
  • seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
  • source - name of the program that generated this feature, or the data source (database or project name)
  • feature - feature type name, e.g. Gene, Variation, Similarity
  • start - Start position of the feature, with sequence numbering starting at 1.
  • end - End position of the feature, with sequence numbering starting at 1.
  • score - A floating point value.
  • strand - defined as + (forward) or - (reverse).
  • frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
  • attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
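
The nine columns above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: GtfRecord and parse_gtf_line are hypothetical names, the sample line is only illustrative (written in GENCODE style), and when a tag occurs more than once only its first value is kept.

```python
import re
from typing import Dict, NamedTuple, Optional


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int                # 1-based, inclusive
    end: int                  # 1-based, inclusive
    score: Optional[float]    # None when the column is "."
    strand: str
    frame: Optional[int]      # None when the column is "."
    attributes: Dict[str, str]


# One attribute looks like: tag "quoted value"; or: tag bare_value;
_ATTR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one tab-separated GTF line into its nine typed columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attributes: Dict[str, str] = {}
    for match in _ATTR.finditer(cols[8]):
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Simplification: keep only the first value of a repeated tag.
        attributes.setdefault(match.group(1), value)
    return GtfRecord(
        seqname=cols[0],
        source=cols[1],
        feature=cols[2],
        start=int(cols[3]),
        end=int(cols[4]),
        score=None if cols[5] == "." else float(cols[5]),
        strand=cols[6],
        frame=None if cols[7] == "." else int(cols[7]),
        attributes=attributes,
    )


# Illustrative line only; not copied from a specific release file.
example = ('chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\t'
           'gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; level 2;')
record = parse_gtf_line(example)
# Coordinates are 1-based and inclusive, so the length is end - start + 1.
print(record.feature, record.end - record.start + 1, record.attributes["gene_id"])
```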
[Figure: GFF3]
GFF3 (General Feature Format, version 3) has the following columns:
  1. seqid - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seq ID must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
  2. source - name of the program that generated this feature, or the data source (database or project name)
  3. type - type of feature. Must be a term or accession from the SOFA sequence ontology
  4. start - Start position of the feature, with sequence numbering starting at 1.
  5. end - End position of the feature, with sequence numbering starting at 1.
  6. score - A floating point value.
  7. strand - defined as + (forward) or - (reverse).
  8. phase - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
  9. attributes - A semicolon-separated list of tag-value pairs, providing additional information about each feature. Some of these tags are predefined, e.g. ID, Name, Alias, Parent - see the GFF documentation for more details.
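
The main practical difference from GTF is the ninth column: GFF3 writes attributes as tag=value pairs, separates multiple values with commas, and uses the ID and Parent tags to link features into a hierarchy. The sketch below illustrates this with invented toy lines; the function names are hypothetical, and percent-encoded characters are decoded with the standard library's urllib.parse.unquote.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> Dict[str, List[str]]:
    """Turn 'ID=x;Parent=a,b' into {'ID': ['x'], 'Parent': ['a', 'b']}."""
    attributes: Dict[str, List[str]] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Values may be percent-encoded, e.g. %3B stands for ';'.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


def children_by_parent(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each Parent ID to the IDs of the features that point to it."""
    children = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_gff3_attributes(cols[8])
        feature_id = attrs.get("ID", [""])[0]
        for parent in attrs.get("Parent", []):
            children[parent].append(feature_id)
    return dict(children)


# Toy records invented for illustration.
toy_gff3 = [
    "##gff-version 3",
    "chr1\ttoy\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=Toy%3BGene",
    "chr1\ttoy\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\ttoy\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\ttoy\tCDS\t150\t200\t.\t+\t0\tID=cds1;Parent=tx1",
]
print(children_by_parent(toy_gff3))
```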
Integrative Genomics Viewer (IGV)

https://software.broadinstitute.org/software/igv/
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

This post is based mainly on articles by my senior labmate Xu Zhougeng:

https://www.jianshu.com/nb/14291282
