2020-01-10 Analyzing SNP loci: visualizing linkage disequilibrium with the R package LDheatmap

Reference material: https://cran.r-project.org/web/packages/LDheatmap/LDheatmap.pdf

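  • Some of the SNPs found by next-generation sequencing act together. How can we tell which ones? A visualization package can help here. As I understand it, linkage disequilibrium between loci can be shown as a correlation heat map, and that is what the R package LDheatmap does.
  • The package depends on several others, including snpStats, rtracklayer, GenomicRanges, GenomeInfoDb and IRanges, essentially all of which are available from BioConductor (https://bioconductor.org).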
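To make the "correlation heat map" idea concrete, here is a minimal base R sketch. It assumes genotypes coded as minor-allele counts (0, 1, 2) and uses the squared correlation between those counts as a rough stand-in for pairwise r². The toy matrix is made up, and this is not necessarily how LDheatmap computes its LD measures internally.

```r
# Toy genotype matrix (made-up values): rows are individuals,
# columns are SNPs, entries are minor-allele counts (0, 1 or 2).
geno <- cbind(
  snp1 = c(0, 1, 2, 0, 1, 2, 0, 1),
  snp2 = c(0, 1, 2, 0, 1, 2, 0, 2),
  snp3 = c(2, 0, 1, 1, 0, 2, 1, 0)
)

# Squared Pearson correlation between allele counts, used here as an
# approximation to pairwise r^2 (an assumption of this sketch).
r2 <- cor(geno)^2
print(round(r2, 3))

# Draw the matrix as a simple heat map when run interactively.
if (interactive()) {
  image(seq_len(ncol(r2)), seq_len(ncol(r2)), r2,
        xlab = "SNP index", ylab = "SNP index")
}
```

In this toy data snp1 and snp2 are almost identical, so their r² is high, while snp1 and snp3 are nearly uncorrelated.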

Source code: https://sfustatgen.github.io/LDheatmap/

Built-in test data

CEUData

  • CEUSNP: genotypes for 15 SNPs from 60 individuals.
  • CEUDist: physical positions of the 15 SNPs in CEUSNP.
    How to load:
rm(list = ls())
options(stringsAsFactors = F)
install.packages("LDheatmap")
library("LDheatmap")
data(CEUData)

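These data cover SNPs with a minor allele frequency (MAF) above 5% in a 9 kb region of chromosome 7 (base positions 126273659 to 126282556, release 7 of the International HapMap Project). Genotypes were obtained from 30 parent-offspring trios. The 30 trios come from multi-generation Utah families with northern and western European ancestry in the CEPH collection (CEPH is the Centre d'Etude du Polymorphisme Humain, not a disease). From these 90 people, the 60 parents were extracted.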
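The MAF > 5% filter mentioned above can be illustrated with a small base R sketch. It assumes genotypes stored as counts of one allele (0, 1, 2) in a plain numeric matrix. The toy values and the names geno, maf and geno_common are hypothetical, and this is not the procedure used to build CEUData.

```r
# Toy allele-count matrix (made-up values): 10 individuals, 3 SNPs.
geno <- cbind(
  snpA = c(0, 1, 2, 0, 1, 0, 0, 1, 0, 0),  # allele frequency 5/20 = 0.25
  snpB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # monomorphic, frequency 0
  snpC = c(2, 2, 2, 1, 2, 2, 2, 2, 1, 2)   # allele frequency 18/20 = 0.90
)

# Frequency of the counted allele, then fold to the minor allele.
allele_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(allele_freq, 1 - allele_freq)
print(round(maf, 3))

# Keep only SNPs whose MAF exceeds 5%.
geno_common <- geno[, maf > 0.05, drop = FALSE]
print(colnames(geno_common))
```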

References:

  • International HapMap Project ftp://ftp.ncbi.nlm.nih.gov/hapmap/
  • The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299-1320. 2005.

Example

rm(list = ls())
options(stringsAsFactors = F)
library("LDheatmap")
require(snpStats)
# Load the package's data set
data("CEUData")
# Produce an LDheatmap object
MyLDheatmap <- LDheatmap(CEUSNP, genetic.distances = CEUDist, flip = TRUE)

The argument flip = TRUE draws the heat map horizontally (flipped below a horizontal line); by default it is not flipped.

[Figure: LD heat map for CEUData]

CHBJPTData

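  • CHBJPTSNP: genotypes for 13 SNPs from 45 Chinese and 45 Japanese individuals.
  • CHBJPTDist: physical positions of the 13 SNPs in CHBJPTSNP.
    How to load:
data(CHBJPTData)
  • CHBJPTSNP: a data frame of SNP genotypes. Each row is one person and each column is one SNP.
  • CHBJPTDist: an integer vector giving the genomic positions of the SNPs on the chromosome.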

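The CHBJPTSNP data contain genotypes for 13 SNPs on chromosome 7 from 45 Chinese and 45 Japanese individuals. The Chinese individuals are unrelated Han Chinese from Beijing Normal University with at least 3 Han Chinese grandparents. The Japanese individuals are unrelated people from Tokyo whose grandparents are all from Tokyo as well. The data come from release 21 of the International HapMap Project (International HapMap Consortium 2005).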

References:

  • International HapMap Project ftp://ftp.ncbi.nlm.nih.gov/hapmap/
  • The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299-1320. 2005.

Example:

#Now do our panel plot with LDheatmaps in the panels
library(lattice)
pop<-factor(c(rep("chinese", 45), rep("japanese", 45)))
xyplot(1:nrow(CHBJPTSNP) ~ 1:nrow(CHBJPTSNP) | pop, type="n",
       scales=list(draw=FALSE), xlab="", ylab="",
       panel=function(x, y, subscripts,...) {
         LDheatmap(CHBJPTSNP[subscripts,], CHBJPTDist, newpage=FALSE)})
[Figure: LD heat maps for CHBJPTSNP, one panel per population]
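In the panel plot above, lattice passes each panel a subscripts vector holding the row indices that belong to that panel's level of pop, which is why CHBJPTSNP[subscripts, ] selects one population at a time. The base R sketch below mimics that grouping with split(). It assumes, as the construction of pop implies, that the first 45 rows are the Chinese individuals and the last 45 the Japanese individuals; rows_by_pop is a hypothetical name.

```r
# Same population factor as in the example above.
pop <- factor(c(rep("chinese", 45), rep("japanese", 45)))

# Group row indices by population, similar to what each panel
# receives through its 'subscripts' argument.
rows_by_pop <- split(seq_along(pop), pop)

print(lengths(rows_by_pop))            # number of individuals per panel
print(range(rows_by_pop$japanese))     # first and last row of the second panel
```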

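I am not sure what went wrong, but my first attempt reported missing data, probably because a dependency had not been installed properly. After some debugging (most likely it was installing one more package that fixed it), I got the correct result.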

rm(list = ls())
options(stringsAsFactors = F)
# Install the latest release version from CRAN and the
# imported/suggested BioConductor packages with
install.packages("LDheatmap")
BiocManager::install("snpStats") # needs to be installed from source; choose not to update other packages when prompted
# BiocManager::install("GenomeInfoDb")
# BiocManager::install(c("snpStats","rtracklayer","GenomicRanges","IRanges"))
# Install the latest development version from GitHub with
#devtools::install_github("SFUStatgen/LDheatmap")
library("LDheatmap")
data(CHBJPTData)
#Now do our panel plot with LDheatmaps in the panels
library(lattice)
pop<-factor(c(rep("chinese", 45), rep("japanese", 45)))
xyplot(1:nrow(CHBJPTSNP) ~ 1:nrow(CHBJPTSNP) | pop, type="n",
       scales=list(draw=FALSE), xlab="", ylab="",
       panel=function(x, y, subscripts,...) {
         LDheatmap(CHBJPTSNP[subscripts,], CHBJPTDist, newpage=FALSE)})
[Figure: first linkage disequilibrium example]

GIMAP5

Linkage disequilibrium analysis of SNPs in the GIMAP5 gene, based on HapMap

SNP genotypes from HapMap release 27 for SNPs in a 10KB region spanning the GIMAP5 gene. Data are on founders from each of the 11 HapMap phase III populations:
ASW African ancestry in Southwest USA
CEU Utah residents with Northern and Western European ancestry from the CEPH collection
CHB Han Chinese in Beijing, China
CHD Chinese in Metropolitan Denver, Colorado
GIH Gujarati Indians in Houston, Texas
JPT Japanese in Tokyo, Japan
LWK Luhya in Webuye, Kenya
MEX Mexican ancestry in Los Angeles, California
MKK Maasai in Kinyawa, Kenya
TSI Toscani in Italia
YRI Yoruba in Ibadan, Nigeria

Only SNPs with MAF > 5% in the populations are included. Genomic positions are relative to NCBI build 36 (UCSC genome hg18).
References:

  • International HapMap Project ftp://ftp.ncbi.nlm.nih.gov/hapmap/
  • The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299-1320. 2005.

Example

data(GIMAP5)
#Now do a lattice plot with LDheatmaps in the panels
library(lattice)
pop<-GIMAP5$subject.support$pop
n<-nrow(GIMAP5$snp.data)
xyplot(1:n ~ 1:n | pop, type="n", scales=list(draw=FALSE), xlab="", ylab="",
       panel=function(x, y, subscripts,...) {
         LDheatmap(GIMAP5$snp.data[subscripts,],GIMAP5$snp.support$Position,
                   newpage=FALSE)})
rm(pop,n)
[Figure: SNPs of one gene compared across several populations]
require(snpStats) # for the SnpMatrix data structure
data(GIMAP5.CEU)
LDheatmap(GIMAP5.CEU$snp.data,GIMAP5.CEU$snp.support$Position)
[Figure: SNP associations within one region of a gene]

You can also change how the colored blocks are displayed, and add annotations, by adjusting the relevant arguments:

rm(list = ls())
options(stringsAsFactors = F)
library("LDheatmap")
require(snpStats) # for the SnpMatrix data structure
data(GIMAP5.CEU)
MyHeatmap <-LDheatmap(GIMAP5.CEU$snp.data,GIMAP5.CEU$snp.support$Position)

old.prompt <- devAskNewPage(ask = TRUE)
# Highlight a certain LD block of interest:
LDheatmap.highlight(MyHeatmap, i = 13, j = 19, col = "black",
                    fill = "grey",flipOutline=FALSE, crissCross=FALSE)
# Plot a "*" symbol at heat map coordinates (17.5, 14.5), which fall
# inside the block highlighted above (SNPs 13 to 19):
LDheatmap.marks(MyHeatmap,  17.5,  14.5,  gp=grid::gpar(cex=2),  pch = "*")
#### Use an RGB palette for the color scheme ####
rgb.palette <- colorRampPalette(rev(c("blue", "orange", "red")), space = "rgb")
LDheatmap(MyHeatmap, color=rgb.palette(18))
[Figure: a subset of closely associated SNPs highlighted]

The position marked with * is the linked region.


[Figure: the same heat map drawn with a different color palette]
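The palette passed through the color argument in the code above can be inspected on its own. This base R sketch rebuilds the same rgb.palette and looks at the 18 colors it generates; cols is a hypothetical name, and the sketch makes no claim about which end of the palette LDheatmap maps to strong LD.

```r
# Same palette definition as in the example above: the color order is
# reversed, so the ramp runs from red through orange to blue.
rgb.palette <- colorRampPalette(rev(c("blue", "orange", "red")), space = "rgb")

# Generate 18 interpolated colors as hex strings.
cols <- rgb.palette(18)
print(cols)

# Preview the ramp as a strip of colored bars when run interactively.
if (interactive()) {
  barplot(rep(1, length(cols)), col = cols, border = NA, axes = FALSE)
}
```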

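Alternatively, the linked region can be marked in a different color.
Each line segment can also be labeled with its rs ID. First get the rs IDs: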

> colnames(GIMAP5.CEU$snp.data)
 [1] "rs6955828"  "rs3807383"  "rs6965571"  "rs9657890"  "rs9657891" 
 [6] "rs9657879"  "rs11973400" "rs9657894"  "rs4725936"  "rs4725359" 
[11] "rs9657898"  "rs13235400" "rs10239400" "rs9657886"  "rs9657900" 
[16] "rs759011"   "rs1046355"  "rs10361"    "rs6598"     "rs2286899" 
[21] "rs2286898"  "rs9657901"  "rs11760839"
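The block highlighted earlier used i = 13 and j = 19 in the LDheatmap.highlight call. A small sketch for translating between those indices and rs IDs, using the names printed above copied into a plain character vector (snp_names is a hypothetical name; in a live session you would use colnames(GIMAP5.CEU$snp.data) directly):

```r
# rs IDs copied from the console output above, in column order.
snp_names <- c(
  "rs6955828", "rs3807383", "rs6965571", "rs9657890", "rs9657891",
  "rs9657879", "rs11973400", "rs9657894", "rs4725936", "rs4725359",
  "rs9657898", "rs13235400", "rs10239400", "rs9657886", "rs9657900",
  "rs759011", "rs1046355", "rs10361", "rs6598", "rs2286899",
  "rs2286898", "rs9657901", "rs11760839"
)

# From block indices to rs IDs.
block_ids <- snp_names[c(13, 19)]
print(block_ids)

# And back: from rs IDs to column indices.
block_idx <- match(c("rs10239400", "rs6598"), snp_names)
print(block_idx)
```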

The block highlighted earlier starts at the 13th SNP and ends at the 19th, so label those two:

require(grid)
LDheatmap(MyHeatmap, SNP.name = c("rs10239400", "rs6598"))

The result:


[Figure: SNPs of particular interest labeled by rs ID]

The plot can be refined further:

getNames()
#[1] "ldheatmap"
# Find the names of the component grobs of "ldheatmap"
childNames(grid.get("ldheatmap"))
#[1] "heatMap" "geneMap" "Key"
#Find the names of the component grobs of heatMap
childNames(grid.get("heatMap"))
#[1] "heatmap" "title"
#Find the names of the component grobs of geneMap
childNames(grid.get("geneMap"))
#[1] "diagonal" "segments" "title"    "symbols"  "SNPnames"
#Find the names of the component grobs of Key
childNames(grid.get("Key"))
#[1] "colorKey" "title"    "labels"   "ticks"    "box"
#Change the plotting symbols that identify SNPs rs10239400 and rs6598
#on the plot to bullets
grid.edit("symbols", pch = 20, gp = gpar(cex = 1))
#Change the color of the main title
grid.edit(gPath("ldheatmap", "heatMap", "title"), gp = gpar(col = "red"))
#Change size of SNP labels
grid.edit(gPath("ldheatmap", "geneMap","SNPnames"), gp = gpar(cex=1.5))
#Add a grid of white lines to the plot to separate pairwise LD measures
grid.edit(gPath("ldheatmap", "heatMap", "heatmap"), gp = gpar(col = "white",
                                                              lwd = 2))
#### Modify a heat map using 'editGrob' function ####
MyHeatmap <- LDheatmap(MyHeatmap, color = grey.colors(20))
new.grob <- editGrob(MyHeatmap$LDheatmapGrob, gPath("geneMap", "segments"),gp=gpar(col="orange"))
##Clear the old graphics object from the display before drawing the modified heat map:
grid.newpage()
grid.draw(new.grob)
# now the colour of line segments connecting the SNP
# positions to the LD heat map has been changed from black to orange.
#### Draw a resized heat map (in a 'blue-to-red' color scale) ####
grid.newpage()
pushViewport(viewport(width=0.5, height=0.5))
LDheatmap(MyHeatmap, SNP.name = c("rs10239400", "rs6598"), newpage=FALSE,
          color="blueToRed")
popViewport()
[Figure: combination 1]

[Figure: combination 2]

That covers not only how to draw this plot but also how to read it. New skill acquired!
