Using the org.Hs.eg.db package

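An intermediate R assignment set by Jimmy mentioned several R packages. I looked up the Bioconductor reference manual for org.Hs.eg.db and wrote the following notes to help myself understand and use it.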

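First, some background on gene chips (DNA microarrays). A microarray can directly measure which mRNAs are present and how abundant they are. It relies on DNA base pairing: a nucleic acid of known sequence is used as a probe to detect the complementary sequences that hybridize to it. Depending on how the probes are prepared and immobilized, microarrays fall into two main types: (1) oligonucleotide microarrays and (2) printed cDNA microarrays.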

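Bioconductor provides many gene annotation packages. org.Hs.eg.db is the annotation package for human genes ("Hs" is Homo sapiens, "eg" is Entrez Gene, the package's central identifier). Most annotation packages are built on the AnnotationDbi infrastructure, whose base database class is AnnotationDb.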
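The relationship to AnnotationDb can be checked directly. The following is a minimal sketch and assumes org.Hs.eg.db is already installed. It checks that the package's main object inherits from AnnotationDb. It then lists the identifier types that can be used as keys and the fields that can be retrieved.

```r
# Assumes org.Hs.eg.db (which depends on AnnotationDbi) is already installed
suppressPackageStartupMessages(library(org.Hs.eg.db))

# The package exports a database object with the same name as the package
print(class(org.Hs.eg.db))

# Its class (OrgDb) extends AnnotationDb, the base class from AnnotationDbi
is_annotation_db <- is(org.Hs.eg.db, "AnnotationDb")
print(is_annotation_db)

# Identifier types accepted as input keys
kt <- keytypes(org.Hs.eg.db)
print(kt)

# Fields that can be returned for those keys
cols <- columns(org.Hs.eg.db)
print(cols)
```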

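> if (!requireNamespace("BiocManager", quietly = TRUE))
+   install.packages("BiocManager")
> BiocManager::install()                 # optional: install/update the core Bioconductor packages
> BiocManager::install("org.Hs.eg.db")   # install org.Hs.eg.db together with its dependencies
> library(org.Hs.eg.db)                  # attach it; ls("package:...") only works on attached packages
> ls("package:org.Hs.eg.db")             # list the objects exported by the package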
 "org.Hs.eg" # Bioconductor annotation data package
 "org.Hs.eg.db"  # Bioconductor annotation data package
 "org.Hs.eg_dbconn"
 "org.Hs.eg_dbfile" 
 "org.Hs.eg_dbInfo"
 "org.Hs.eg_dbschema"
 "org.Hs.egACCNUM" #Map Entrez Gene identifiers to GenBank Accession Numbers 
 "org.Hs.egACCNUM2EG" 
 "org.Hs.egALIAS2EG"  #Map between Common Gene Symbol Identifiers and Entrez Gene 
 "org.Hs.egCHR"  #  Map Entrez Gene IDs to Chromosomes 
 "org.Hs.egCHRLENGTHS"   # A named vector for the length of each of the chromosomes 
 "org.Hs.egCHRLOC"   # Entrez Gene IDs to Chromosomal Location
 "org.Hs.egCHRLOCEND" 
 "org.Hs.egENSEMBL"   # Map Ensembl gene accession numbers with Entrez Gene identifiers 
 "org.Hs.egENSEMBL2EG" 
 "org.Hs.egENSEMBLPROT"  # Map Ensembl protein accession numbers with Entrez Gene identifiers 
 "org.Hs.egENSEMBLPROT2EG" 
 "org.Hs.egENSEMBLTRANS"   # Map Ensembl transcript accession numbers with Entrez Gene identifiers
 "org.Hs.egENSEMBLTRANS2EG"
 "org.Hs.egENZYME"    # Map between Entrez Gene IDs and Enzyme Commission (EC) Numbers
 "org.Hs.egENZYME2EG"
 "org.Hs.egGENENAME"   # Map between Entrez Gene IDs and Genes
 "org.Hs.egGO"       # Maps between Entrez Gene IDs and Gene Ontology (GO) IDs
 "org.Hs.egGO2ALLEGS"
 "org.Hs.egGO2EG"
 "org.Hs.egMAP"      # Map between Entrez Gene Identifiers and cytogenetic maps/bands
 "org.Hs.egMAP2EG" 
 "org.Hs.egMAPCOUNTS"     # Number of mapped keys for the maps in package org.Hs.eg.db
 "org.Hs.egOMIM"       # Map between Entrez Gene Identifiers and Mendelian Inheritance in Man (MIM) identifiers 
 "org.Hs.egOMIM2EG"
 "org.Hs.egORGANISM"      # The Organism for org.Hs.eg
 "org.Hs.egPATH"         # Mappings between Entrez Gene identifiers and KEGG pathway identifiers
 "org.Hs.egPATH2EG"
 "org.Hs.egPFAM"       #Maps between Manufacturer Identifiers and PFAM Identifiers
 "org.Hs.egPMID"      # Map between Entrez Gene Identifiers and PubMed Identifiers
 "org.Hs.egPMID2EG"
 "org.Hs.egPROSITE"   # Maps between Manufacturer Identifiers and PROSITE Identifiers 
 "org.Hs.egREFSEQ"     # Map between Entrez Gene Identifiers and RefSeq Identifiers
 "org.Hs.egREFSEQ2EG"
 "org.Hs.egSYMBOL"       # Map between Entrez Gene Identifiers and Gene Symbols
 "org.Hs.egSYMBOL2EG"
 "org.Hs.egUCSCKG"        # This mapping has been deprecated and will no longer be available after Bioconductor 2.6. See the details section for how you can live without it. For now, it is a map of UCSC "Known Gene" accession numbers with Entrez Gene identifiers
 "org.Hs.egUNIGENE"   #Map between Entrez Gene Identifiers and UniGene cluster identifiers
 "org.Hs.egUNIGENE2EG" 
 "org.Hs.egUNIPROT"     #Map Uniprot accession numbers with Entrez Gene identifiers
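Each of the map objects listed above is a Bimap, and a Bimap can be queried directly. The following is a minimal sketch. TP53 (Entrez Gene ID 7157) and BRCA1 (672) are used only as illustrative keys.

```r
suppressPackageStartupMessages(library(org.Hs.eg.db))

# Forward map: Entrez Gene ID -> gene symbol; [[ ]] looks up a single key
tp53_symbol <- org.Hs.egSYMBOL[["7157"]]
print(tp53_symbol)

# Reverse map: gene symbol -> Entrez Gene ID
tp53_id <- org.Hs.egSYMBOL2EG[["TP53"]]
print(tp53_id)

# mget() looks up several keys at once and returns a named list
ids <- c("7157", "672")
chrs <- mget(ids, org.Hs.egCHR)
print(chrs)

# Number of Entrez IDs that actually have a symbol mapped
n_mapped <- count.mappedkeys(org.Hs.egSYMBOL)
print(n_mapped)
```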

Following the examples in the official documentation, I ran a few snippets in RStudio to understand how they work.

## select() interface:
## Objects in this package can be accessed using the select() interface
## from the AnnotationDbi package. See ?select for details.

## Bimap interface:
x <- org.Hs.egACCNUM   # the Bimap from Entrez Gene IDs to GenBank accession numbers
# Get the Entrez Gene IDs that are mapped to at least one ACCNUM
mapped_genes <- mappedkeys(x)
# Convert to a list (names are Entrez IDs, values are accession vectors)
xx <- as.list(x[mapped_genes])
if (length(xx) > 0) {
  # Get the ACCNUMs for the first five genes
  print(xx[1:5])   # print() is needed: intermediate values inside { } are not auto-printed
  # Get the first one
  print(xx[[1]])
}
# For the reverse map ACCNUM2EG:
# Convert to a list (names are accession numbers, values are Entrez IDs)
xx <- as.list(org.Hs.egACCNUM2EG)
if (length(xx) > 0) {
  # Get the Entrez Gene IDs for the first five accession numbers
  print(xx[1:5])
  # Get the first one
  print(xx[[1]])
}
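The same kind of lookup can be done through the select() interface mentioned in the comments of the example. The following is a minimal sketch and assumes org.Hs.eg.db is installed. The three Entrez IDs (TP53, BRCA1, EGFR) are arbitrary examples. `AnnotationDbi::select` is written with its namespace so that it is not masked by `dplyr::select` if dplyr is loaded.

```r
suppressPackageStartupMessages(library(org.Hs.eg.db))

ids <- c("7157", "672", "1956")

# select() returns a data.frame with one row per key/value pair;
# it may print a message such as "'select()' returned 1:1 mapping between keys and columns"
res <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = ids,
                             columns = c("SYMBOL", "GENENAME"),
                             keytype = "ENTREZID")
print(res)

# mapIds() returns a named vector with exactly one value per key;
# multiVals = "first" keeps only the first hit if a key maps to several values
syms <- mapIds(org.Hs.eg.db, keys = ids, column = "SYMBOL",
               keytype = "ENTREZID", multiVals = "first")
print(syms)
```

`select()` suits cases where you want several fields at once, or where a key can map to several values. `mapIds()` is convenient when you need one value per key aligned with the input, for example to add a symbol column to a results table.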

That's all.

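The fastest way to get started in bioinformatics: search for 生信技能树.

  1. 生信技能树 global free lecture tour
    https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
  2. 74-hour free bioinformatics-engineer video course collection on Bilibili
    https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
  3. Apprentice recruitment
    https://mp.weixin.qq.com/s/KgbilzXnFjbKKunuw7NVfw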
