Converting gene IDs with the R package org.Hs.eg.db

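Table of contents

  • install
  • Use org.Hs.egENSEMBL to map between Entrez Gene IDs and Ensembl gene IDs
  • Use org.Hs.egGENENAME to convert Entrez Gene IDs to gene names
  • Use org.Hs.egSYMBOL to convert between Entrez Gene IDs and gene symbols
  • I have some Ensembl IDs: how do I convert them to gene names?
  • Notes
  • Some records are missing: genes present in the GTF file but absent from org.Hs.eg.db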

install

# install 
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")

# BiocManager::install("AnnotationDbi")
# BiocManager::install("org.Hs.eg.db")

# or install from the source tarballs (download with wget in a shell, then install locally)

# wget https://www.bioconductor.org/packages/release/bioc/src/contrib/AnnotationDbi_1.62.2.tar.gz
# install.packages("/public/home/djs/software/AnnotationDbi_1.62.2.tar.gz", repos = NULL, type="source")
# wget https://www.bioconductor.org/packages/release/data/annotation/src/contrib/org.Hs.eg.db_3.17.0.tar.gz
# install.packages("/public/home/djs/software/org.Hs.eg.db_3.17.0.tar.gz", repos = NULL, type="source")
library(org.Hs.eg.db)

help(package="org.Hs.eg.db")
Index:

org.Hs.eg.db            Bioconductor annotation data package
org.Hs.egACCNUM         Map Entrez Gene identifiers to GenBank Accession Numbers
org.Hs.egALIAS2EG       Map between Common Gene Symbol Identifiers and Entrez Gene
org.Hs.egCHR            Map Entrez Gene IDs to Chromosomes
org.Hs.egCHRLENGTHS     A named vector for the length of each of the chromosomes
org.Hs.egCHRLOC         Entrez Gene IDs to Chromosomal Location
org.Hs.egENSEMBL        Map Ensembl gene accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLPROT    Map Ensembl protein acession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLTRANS   Map Ensembl transcript acession numbers with Entrez Gene identifiers
org.Hs.egENZYME         Map between Entrez Gene IDs and Enzyme Commission (EC) Numbers
org.Hs.egGENENAME       Map between Entrez Gene IDs and Genes
org.Hs.egGENETYPE       Map between Entrez Gene Identifiers and Gene Type
org.Hs.egGO             Maps between Entrez Gene IDs and Gene Ontology (GO) IDs
org.Hs.egMAP            Map between Entrez Gene Identifiers and cytogenetic maps/bands
org.Hs.egMAPCOUNTS      Number of mapped keys for the maps in package org.Hs.eg.db
org.Hs.egOMIM           Map between Entrez Gene Identifiers and Mendelian Inheritance in Man (MIM) identifiers
org.Hs.egORGANISM       The Organism for org.Hs.eg
org.Hs.egPATH           Mappings between Entrez Gene identifiers and KEGG pathway identifiers
org.Hs.egPFAM           Maps between Manufacturer Identifiers and PFAM  Identifiers
org.Hs.egPMID           Map between Entrez Gene Identifiers and PubMed  Identifiers
org.Hs.egPROSITE        Maps between Manufacturer Identifiers and  PROSITE Identifiers
org.Hs.egREFSEQ         Map between Entrez Gene Identifiers and RefSeq  Identifiers
org.Hs.egSYMBOL         Map between Entrez Gene Identifiers and Gene  Symbols
org.Hs.egUNIPROT        Map Uniprot accession numbers with Entrez Gene  identifiers
org.Hs.eg_dbconn        Collect information about the package  annotation DB

Use org.Hs.egENSEMBL to map between Entrez Gene IDs and Ensembl gene IDs

x <- org.Hs.egENSEMBL
# Get the entrez gene IDs that are mapped to an Ensembl ID
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])

xx[1:5]  # the list names are Entrez Gene IDs; the list elements are the Ensembl IDs
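
When only a handful of IDs matter, converting the whole map to a list is more than you need. Here is a minimal sketch that goes in the Ensembl-to-Entrez direction through the reverse map org.Hs.egENSEMBL2EG, using mget() with ifnotfound = NA so that an unknown ID comes back as NA instead of raising an error. The three example IDs (TP53, BRCA1, and a deliberately fake one) are my own choice for illustration, and the requireNamespace() guard is only there so the snippet does nothing when the package is not installed.

```r
# Example Ensembl gene IDs chosen for illustration (not from the original post):
# TP53, BRCA1, and a made-up ID that should not be found.
ensembl_ids <- c("ENSG00000141510", "ENSG00000012048", "ENSG99999999999")

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Hs.eg.db))

  # mget() on a Bimap returns a named list: the names are the keys that were
  # asked for, the elements are character vectors of Entrez Gene IDs (or NA).
  entrez <- mget(ensembl_ids, org.Hs.egENSEMBL2EG, ifnotfound = NA)
  print(entrez)
}
```

Each element can hold more than one Entrez Gene ID, so check lengths(entrez) before flattening the result with unlist().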


Use org.Hs.egGENENAME to convert Entrez Gene IDs to gene names

x <- org.Hs.egGENENAME
# Get the Entrez Gene identifiers that are mapped to a gene name
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])


Use org.Hs.egSYMBOL to convert between Entrez Gene IDs and gene symbols

x <- org.Hs.egSYMBOL
# Get the Entrez Gene identifiers that are mapped to a gene symbol
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])

# For the reverse map:
x <- org.Hs.egSYMBOL2EG
# Get the gene symbols that are mapped to an Entrez Gene identifier
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])


I have some Ensembl IDs: how do I convert them to gene names?

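# Pull out every Ensembl gene ID known to the package
k <- keys(org.Hs.eg.db, keytype = "ENSEMBL")
# Then look up the Entrez Gene ID and gene symbol for each Ensembl ID
# (the result is named id_map rather than list, to avoid shadowing base::list)
id_map <- select(org.Hs.eg.db, keys = k, columns = c("ENTREZID", "SYMBOL"), keytype = "ENSEMBL")

head(id_map, 5)

# In practice ID would be your own vector of Ensembl IDs; here I simply sample 10 for the demo
ID <- sample(id_map$ENSEMBL, 10)
# match() keeps only the first matching row for each ID
ID_list <- id_map[match(ID, id_map[, "ENSEMBL"]), ]
ID_list

# Alternatively, pass your own Ensembl IDs directly as the keys
ID_list <- select(org.Hs.eg.db, keys = ID, columns = c("ENTREZID", "SYMBOL"), keytype = "ENSEMBL")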
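
select() returns one row per match, so the result can have more rows than input IDs. If what you want is exactly one value per input ID, in input order, mapIds() from AnnotationDbi is a convenient alternative. The sketch below is only one way to do it: the example IDs are my own (TP53, BRCA1, and a fake one), and multiVals = "first" is an assumption about how you want ties resolved.

```r
# Illustrative Ensembl IDs (not from the original post); the last one is fake.
ids <- c("ENSG00000141510", "ENSG00000012048", "ENSG99999999999")

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Hs.eg.db))

  # multiVals = "first" keeps the first hit per key, so the result is a named
  # character vector with one element per input ID; unknown IDs give NA.
  symbols <- mapIds(org.Hs.eg.db, keys = ids, column = "SYMBOL",
                    keytype = "ENSEMBL", multiVals = "first")
  print(symbols)
}
```

Note that select() and mapIds() raise an error when none of the supplied keys are valid for the chosen keytype, so a vector made up entirely of unknown IDs will not quietly return NA.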


Notes

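The correspondence between these IDs can drift as the underlying databases are upgraded and maintained, so mappings taken from different releases may disagree.
The IDs are also not in a one-to-one relationship: both one-to-many and many-to-one mappings occur.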
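
To see how much of a mapping table is ambiguous, you can count the distinct partners on each side. The sketch below uses a made-up table in the same shape as the output of select() (all IDs are placeholders), and dropping every ambiguous row is just one possible policy, not a recommendation.

```r
# Toy mapping table in the shape that select() returns (values are made up).
id_map <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  ENTREZID = c("101",    "102",    "103",    "104",    "104"),
  stringsAsFactors = FALSE
)

# Count the distinct partners on each side of the mapping.
count_unique <- function(v) length(unique(v))
entrez_per_ensembl <- vapply(split(id_map$ENTREZID, id_map$ENSEMBL), count_unique, integer(1))
ensembl_per_entrez <- vapply(split(id_map$ENSEMBL, id_map$ENTREZID), count_unique, integer(1))

# One Ensembl ID with several Entrez IDs, and several Ensembl IDs sharing one Entrez ID.
one_to_many <- names(entrez_per_ensembl)[entrez_per_ensembl > 1]
many_to_one <- names(ensembl_per_entrez)[ensembl_per_entrez > 1]

# Keep only the unambiguous rows (one policy among several possible ones).
unambiguous <- id_map[!(id_map$ENSEMBL %in% one_to_many) &
                      !(id_map$ENTREZID %in% many_to_one), ]
print(unambiguous)
```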

Some records are missing: genes present in the GTF file but absent from org.Hs.eg.db

The GTF file contains 61,544 genes.

x <- org.Hs.egENSEMBL

sum(is.na(unlist(as.list(x))))
[1] 105167
sum(!is.na(unlist(as.list(x))))
[1] 45727
# org.Hs.egENSEMBL has only 45727 non-NA records
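
To find out which GTF genes are missing, rather than only how many, compare the two sets of IDs directly. One thing to watch: GENCODE gene IDs carry a version suffix (for example ENSG00000223972.5), while org.Hs.eg.db stores unversioned IDs, so the suffix has to be stripped first. The sketch below uses small hand-written vectors as stand-ins; the version numbers and the contents of db_ids are illustrative assumptions, and in real use db_ids would come from keys(org.Hs.eg.db, keytype = "ENSEMBL").

```r
# Hypothetical inputs: versioned IDs as they appear in a GENCODE GTF, and a
# made-up subset standing in for the unversioned IDs stored in the package.
gtf_ids <- c("ENSG00000141510.18", "ENSG00000012048.23", "ENSG00000223972.5")
db_ids  <- c("ENSG00000141510", "ENSG00000012048")

# Drop the ".<version>" suffix (and anything after it, such as "_PAR_Y").
gtf_ids_clean <- sub("\\.[0-9]+.*$", "", gtf_ids)

# IDs present in the GTF but absent from the package, and the covered fraction.
missing_from_db <- setdiff(gtf_ids_clean, db_ids)
coverage <- mean(gtf_ids_clean %in% db_ids)

print(missing_from_db)
print(coverage)
```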

So grab a GTF file yourself, extract the ID-to-name information from it, and do the conversion with that.

awk 'BEGIN{FS="\t"} $3=="gene"{print $9}' gencode.v40.annotation.gtf | cut -d ";" -f1,3 | cut -d " " -f2,4 | sed 's/\..*;//g' | sed 's/"//g' > ENSEMBL_TO_GENE.txt
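
The shell pipeline picks attributes by position (the 1st and 3rd fields of column 9), which silently breaks if the attribute order differs in another GTF. As a rough alternative, the same extraction can be done in base R by looking attributes up by name. The sketch below runs on two made-up lines modelled on the GENCODE layout; get_attr() is a hypothetical helper of my own, not part of any package, and a real file would be read with readLines() or a dedicated GTF reader.

```r
# Two made-up lines in GENCODE GTF layout (9 tab-separated columns).
gtf_lines <- c(
  paste("chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2;',
        sep = "\t"),
  paste("chr1", "HAVANA", "transcript", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_name "DDX11L1";',
        sep = "\t")
)

# Split into columns and keep the attribute column of "gene" features only.
fields  <- strsplit(gtf_lines, "\t", fixed = TRUE)
feature <- vapply(fields, `[`, character(1), 3)
attrs   <- vapply(fields, `[`, character(1), 9)[feature == "gene"]

# Hypothetical helper: pull a quoted attribute value out by key name.
get_attr <- function(attr_string, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attr_string, perl = TRUE),
         sub(pattern, "\\1", attr_string, perl = TRUE),
         NA_character_)
}

ensembl_to_gene <- data.frame(
  ENSEMBL   = sub("\\..*$", "", get_attr(attrs, "gene_id")),  # strip the version suffix
  GENE_NAME = get_attr(attrs, "gene_name"),
  stringsAsFactors = FALSE
)
print(ensembl_to_gene)
```

The resulting data frame can be written out with write.table() to get a two-column file equivalent to ENSEMBL_TO_GENE.txt.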
