Human-Mouse Gene Conversion

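Background: human and mouse research have progressed at different paces, so human gene function and annotation are usually studied in more depth, and some databases only provide human annotations. In the other direction, findings are often validated in mouse models. Both situations require converting gene names/IDs between the two species.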

The common conversion approaches are summarized below:

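Conversion of specific gene IDs/names: the R package biomaRt

Genome-wide ID/name conversion: download the mapping file directly from Ensembl and convert with it

Another interesting R package (queries annotations from the major databases for model organisms): AnnotationDbi

Orthology databases: List of orthology databases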

1. Using the R package biomaRt

Install biomaRt:

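if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("biomaRt")

library("biomaRt")

listMarts()    # list the available BioMart databases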

##              biomart                version

##1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 106

##2  ENSEMBL_MART_MOUSE      Mouse strains 106

##3    ENSEMBL_MART_SNP  Ensembl Variation 106

##4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 106


Convert mouse genes to human genes:

library("biomaRt")

human = useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

mouse = useEnsembl(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")

# Basic function to convert mouse gene symbols (MGI) to human gene symbols (HGNC)

convertMouseGeneList <- function(x){

  genesV2 = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = x, mart = mouse, attributesL = c("hgnc_symbol"), martL = human, uniqueRows = TRUE)

  # column 1 holds the mouse symbols, column 2 the matching human symbols
  humanx <- unique(genesV2[, 2])

  # return the unique human symbols (the mouse-human pairing is not kept)
  return(humanx)

}


## Test

musGenes <- c("Hmmr", "Tlx3", "Cpeb4")

convertMouseGeneList(musGenes)

## [1] "HMMR" "CPEB4" "TLX3"
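convertMouseGeneList() returns only the unique human symbols, so you cannot tell which mouse gene produced which human gene, or which inputs had no ortholog at all. A minimal sketch of keeping that pairing, assuming `genesV2` is the data frame returned by getLDS() and that its columns are named `MGI.symbol` and `HGNC.symbol` (an assumption; check `names(genesV2)` in your own session). The helper `convert_with_pairs` and the toy table, including the made-up `GeneA` row, are hypothetical.

```r
# Hypothetical helper: turn a getLDS()-style table of mouse/human pairs into
# a named vector aligned with the input genes.
# ASSUMPTION: column names "MGI.symbol" / "HGNC.symbol"; adjust as needed.
convert_with_pairs <- function(x, pairs, from = "MGI.symbol", to = "HGNC.symbol") {
  pairs <- unique(pairs[, c(from, to)])
  keep <- !is.na(pairs[[to]]) & pairs[[to]] != ""
  pairs <- pairs[keep, , drop = FALSE]
  # one mouse gene may have several human orthologs: join them with ";"
  collapsed <- tapply(pairs[[to]], pairs[[from]],
                      function(v) paste(sort(unique(v)), collapse = ";"))
  lookup <- setNames(as.character(collapsed), names(collapsed))
  # genes without an ortholog come back as NA instead of silently disappearing
  setNames(unname(lookup[x]), x)
}

# Toy input shaped like a getLDS() result (illustrative values only)
toy <- data.frame(
  MGI.symbol  = c("Hmmr", "Tlx3", "Cpeb4", "Cpeb4", "GeneA", "GeneA"),
  HGNC.symbol = c("HMMR", "TLX3", "CPEB4", "CPEB4", "GENEA1", "GENEA2"),
  stringsAsFactors = FALSE
)
res <- convert_with_pairs(c("Hmmr", "Cpeb4", "GeneA", "Xkr4"), toy)
res
```

Inside convertMouseGeneList() this would mean returning `convert_with_pairs(x, genesV2)` instead of `unique(genesV2[, 2])`, which keeps the output the same length and order as the input.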

# Put the genes to be converted in a file and read them in

mmu_genes = read.table("Gene.mmu", header = TRUE, sep = "\t")

head(mmu_genes$Gene)

## [1] "Xkr4"    "Gm1992"  "Gm19938" "Rp1"    "Sox17"  "Gm37587"

Running convertMouseGeneList() on this full list fails with:

##Error: biomaRt has encountered an unexpected server error.

##Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

Convert human genes to mouse genes:

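# The input file should list human (HGNC) symbols in a column named Gene.
# Note: the sample rows shown below (Xkr4, Gm1992) are mouse-style symbols and will not match the hgnc_symbol filter.
hsa = read.table("hsa.raw", header = TRUE, sep = "\t")

head(hsa)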

##    Gene

##1    Xkr4

##2  Gm1992

convertHumanGeneList <- function(x){

  genesV2 = getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x, mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows = TRUE)

  # column 1 holds the human symbols, column 2 the matching mouse symbols
  mousex <- unique(genesV2[, 2])

  # return the unique mouse symbols
  return(mousex)

}

humGenes <- hsa$Gene

convertHumanGeneList(humGenes)

## Error: biomaRt has encountered an unexpected server error.

##Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

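These attempts show that converting a short list of gene names works well, but converting the gene names of the whole genome still fails. For a discussion of possible solutions, see: link.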
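One workaround sometimes tried for such server errors, offered here as an untested assumption rather than the solution from the linked discussion, is to send the genome-wide list in smaller batches so that each request stays small. A minimal base-R sketch; the helper `convert_in_chunks`, the default `chunk_size = 500`, and the optional `pause` are hypothetical choices.

```r
# Hypothetical helper: apply a conversion function to a long gene vector in
# batches, so that one failed request does not abort the whole run.
convert_in_chunks <- function(x, fun, chunk_size = 500, pause = 0) {
  x <- unique(x[!is.na(x) & x != ""])
  # batch index: 1,1,...,2,2,... with at most chunk_size genes per batch
  batches <- split(x, ceiling(seq_along(x) / chunk_size))
  out <- lapply(batches, function(b) {
    res <- tryCatch(fun(b), error = function(e) {
      message("batch failed (", b[1], " ...): ", conditionMessage(e))
      character(0)
    })
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
    res
  })
  unique(unlist(out, use.names = FALSE))
}

# Real usage would be, e.g.:
# human_genes <- convert_in_chunks(mmu_genes$Gene, convertMouseGeneList, pause = 1)
# Offline demo with toupper() standing in for the biomaRt call:
convert_in_chunks(c("Hmmr", "Tlx3", "Cpeb4"), toupper, chunk_size = 2)
# returns c("HMMR", "TLX3", "CPEB4")
```

Failed batches are reported and skipped rather than retried, so it is worth comparing the number of converted genes with the input size afterwards.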

2. Download the mapping file directly from Ensembl and convert with it

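Step 1: Ensembl website -> BioMart; choose the corresponding genome: link

Step 2: Under Attributes, select "Homologues": Gene stable ID, Gene name;

Step 3: Select the ortholog species (listed by first letter);

Step 4: Download: Results -> Go

Step 5: Inspect the downloaded file and write your own script to do the conversion;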

Conversion result: 24784 mouse genes originally; 16412 genes after conversion.
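Step 5 leaves the conversion script to the reader. A minimal base-R sketch, assuming the export was saved as tab-separated text (the file name `mart_export.txt` is an assumption) and that, for a mouse-to-human export, the relevant header columns are `Gene name` and `Human gene name` (also an assumption; check them against your own download). The helper `map_with_biomart_table` is hypothetical, and the toy table only mimics the file layout.

```r
# Hypothetical helper: build a mouse -> human mapping table from a BioMart
# orthologue export. ASSUMPTION: column names "Gene name" / "Human gene name".
map_with_biomart_table <- function(genes, orth,
                                   from = "Gene name", to = "Human gene name") {
  # keep rows for the requested genes that actually have a human ortholog
  hit <- orth[[from]] %in% genes & !is.na(orth[[to]]) & orth[[to]] != ""
  mapping <- unique(orth[hit, c(from, to), drop = FALSE])
  names(mapping) <- c("mouse", "human")
  rownames(mapping) <- NULL
  mapping
}

# Reading a real export would look like (file name assumed):
# orth <- read.delim("mart_export.txt", check.names = FALSE)
# Toy table for illustration only:
orth <- data.frame(
  `Gene name`       = c("Hmmr", "Tlx3", "Cpeb4", "Gm1992"),
  `Human gene name` = c("HMMR", "TLX3", "CPEB4", ""),
  check.names = FALSE, stringsAsFactors = FALSE
)
mouse_genes <- c("Hmmr", "Tlx3", "Cpeb4", "Gm1992", "Xkr4")
mapping <- map_with_biomart_table(mouse_genes, orth)

# counts before and after conversion
cat("input genes:", length(unique(mouse_genes)),
    "| mapped mouse genes:", length(unique(mapping$mouse)),
    "| human genes:", length(unique(mapping$human)), "\n")
```

Keeping the result as a two-column table rather than a plain vector preserves one-to-many orthologs as separate rows, which makes it easy to decide later how to collapse them.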

3. Other

I also came across some interesting R packages that help with exploring gene function annotations and with enrichment analysis: AnnotationDbi and org.Hs.eg.db.

Installation:

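library(BiocManager)

BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db", "Orthology.eg.db"))

library(org.Hs.eg.db)    # also attaches AnnotationDbi, which provides keytypes() and columns()

keytypes(org.Hs.eg.db)    ## list the key types that can be used for queries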

## [1] "ACCNUM"      "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"

## [6] "ENTREZID"    "ENZYME"      "EVIDENCE"    "EVIDENCEALL"  "GENENAME"   

## [11] "GO"          "GOALL"        "IPI"          "MAP"          "OMIM"       

## [16] "ONTOLOGY"    "ONTOLOGYALL"  "PATH"        "PFAM"        "PMID"       

## [21] "PROSITE"      "REFSEQ"      "SYMBOL"      "UCSCKG"      "UNIGENE"   

## [26] "UNIPROT"   

columns(org.Hs.eg.db)  # list the annotation columns that can be retrieved

## [1] "ACCNUM"      "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"

## [6] "ENTREZID"    "ENZYME"      "EVIDENCE"    "EVIDENCEALL"  "GENENAME"   

## [11] "GO"          "GOALL"        "IPI"          "MAP"          "OMIM"       

## [16] "ONTOLOGY"    "ONTOLOGYALL"  "PATH"        "PFAM"        "PMID"       

## [21] "PROSITE"      "REFSEQ"      "SYMBOL"      "UCSCKG"      "UNIGENE"   

## [26] "UNIPROT"

For concrete usage, refer to the package documentation.

4. Recommended orthology databases: link
