2021-07-30 统计测序中插入片段长度insert_sizes

对于双端测序而言,事情就变得简单了。bam文件的第9列直接存储了fragment length的信息,直接提取之后就可以用来画图,提取的代码如下

从bam文件中提取插入片段长度

samtools  view input.bam | \
awk -F '\t' 'function abs(x){return ((x < 0.0) ? -x : x)} {print $1"\t"abs($9)}' | \
sort | uniq | cut -f2 > fragment.length.txt

用R作图

data <- read.table("fragment.length.txt", header = F)
# 设置插入片段长度的阈值,过滤掉太长的片段
length_cutoff <- 1200
fragment <- data$V1[data$V1 <= length_cutoff]
# 利用直方图统计频数分布,设置柱子个数
breaks_num <- 500
res <- hist(fragment, breaks = breaks_num, plot = FALSE)
# 添加坐标原点
plot(x = c(0, res$breaks),
     y = c(0, 0, res$counts) / 10^2,
     type = "l", col = "red",
     xlab = "Fragment length(bp)",
     ylab = expression(Normalized ~ read ~ density ~ 10^2),
     main = "Sample Fragment sizes")

参考https://blog.csdn.net/weixin_43569478/article/details/108079757

你可能感兴趣的:(2021-07-30 统计测序中插入片段长度insert_sizes)