统计fastq长度分布

截取长度列
j是线程数 f 是长度所在列

seqkit fx2tab -j 8 -l  -n -i -H HNHP_rep1_1h_S46_L003_R1_val_1.fq.gz| cut -f 2 >Length.txt

得到一列长度数

length
128
150
150
150
150
150
147
150
150

然后在R 中运行画

library(tidyverse)

length <- read_tsv("Length.txt") %>% group_by(length) %>%
  summarise(Count = n())
length$length <- as.character(length$length)
sum <- sum(length$Count)
ggplot(length) + geom_col(aes(length, Count), width = 0.8) + 
  geom_line(aes(length, Count), group = 1) + geom_point(aes(length, Count)) + 
  scale_y_continuous(sec.axis = sec_axis(~.*100/sum, name = "% Relative Abundance")) + xlab("Length") +
  theme_bw() + theme(panel.grid = element_blank(), 
                     axis.title = element_text(size = 15))

ggsave("Length.png", height = 5, width = 8)
ggsave("Length.pdf", height = 5, width = 8)

最后得出一个分布图

image.png

参考https://www.jianshu.com/p/31244fb42da1

你可能感兴趣的:(统计fastq长度分布)