Bioinformatics Quick Notes 2020-02-10: String Operations in awk

While reproducing a lncRNA paper, I found that the authors used the Animal QTL database (Animal QTLdb).

The Animal Quantitative Trait Loci (QTL) Database (Animal QTLdb) strives to collect all publicly available trait mapping data, i.e. QTL (phenotype/expression, eQTL), candidate gene and association data (GWAS), and copy number variations (CNV) mapped to livestock animal genomes, in order to facilitate locating and comparing discoveries within and between species. New data and database tools are continually developed to align various trait mapping data to map-based genome features such as annotated genes.

After downloading the file, check its contents:

wc -l qdwnld82711OVKG.txt 
30195 qdwnld82711OVKG.txt
grep -v '^#' qdwnld82711OVKG.txt |less -SN


A few rows have no coordinates and need to be removed.
Use awk to view the second column:

grep -v '^#' qdwnld82711OVKG.txt |awk '{print($2)}' |less -SN

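As you can see, for rows without coordinates the second column contains a text string instead of a number.
View the first character of the second column: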

grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |less -SN
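
For reference, substr(s, start, length) in awk is 1-indexed, so substr($2,1,1) returns the first character of the second field. A minimal sketch on made-up input (the three sample rows are hypothetical and only stand in for the QTLdb file, whose real columns may differ):

```shell
# Hypothetical sample: the second field is either a coordinate or a text label.
sample=$(printf 'Chr.1\t1234\tQTL_a\nChr.2\tunknown\tQTL_b\nChr.3\t987\tQTL_c\n')

# substr(string, start, length): start=1, length=1 gives the first character.
first_chars=$(printf '%s\n' "$sample" | awk '{print substr($2, 1, 1)}')
printf '%s\n' "$first_chars"
```

Note that awk splits fields on any whitespace by default; if a field could itself contain spaces, adding -F'\t' would be the safer choice.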

Count how often each first character occurs:

grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |sort | uniq -c
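
The same tally can also be done inside awk with an associative array, which avoids the sort | uniq -c step. This is only an alternative sketch on hypothetical sample rows, not the command used in this note:

```shell
# Hypothetical sample rows; the second field is a coordinate or a text label.
sample=$(printf 'Chr.1\t1234\nChr.2\tunknown\nChr.3\t987\nChr.4\t1500\n')

# n[] is keyed by the first character of field 2; END prints "count character".
# The order of "for (k in n)" is unspecified in awk, so sort the result afterwards.
counts=$(printf '%s\n' "$sample" \
  | awk '{n[substr($2, 1, 1)]++} END {for (k in n) print n[k], k}' \
  | LC_ALL=C sort -k2,2)
printf '%s\n' "$counts"
```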

We only need the rows whose second column starts with a digit (0-9).

grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |less -SN
grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |wc -l
28594
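
As a possible simplification, the grep step and the substr() call can be folded into a single awk expression by anchoring the regex with ^. The sketch below checks on a hypothetical four-line file (created with mktemp, not the real download) that both forms keep the same rows:

```shell
# Hypothetical stand-in for the downloaded file: one comment line, three data rows.
tmp=$(mktemp)
printf '# header comment\nChr.1\t1234\tQTL_a\nChr.2\tunknown\tQTL_b\nChr.3\t987\tQTL_c\n' > "$tmp"

# Two-step form from this note: drop comment lines, then test the first character of field 2.
two_step=$(grep -v "^#" "$tmp" | awk '(substr($2,1,1) ~ /[0-9]/){print($0)}')

# One-step form: skip comment lines in awk and anchor the match at the start of field 2.
one_step=$(awk '!/^#/ && $2 ~ /^[0-9]/' "$tmp")

printf '%s\n' "$one_step"
rm -f "$tmp"
```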

What I learned

  1. The awk string function substr()
  2. The awk pattern-matching operator, ~ /regex/
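
The match operator also has a negated form, !~, which is handy for inspecting the rows a filter would drop before actually discarding them. A small sketch on hypothetical rows:

```shell
# Hypothetical sample rows; the second field is a coordinate or a text label.
sample=$(printf 'Chr.1\t1234\nChr.2\tunknown\nChr.3\t987\n')

# ~ keeps rows where field 2 matches the regex; !~ keeps rows where it does not.
kept=$(printf '%s\n' "$sample" | awk '$2 ~ /^[0-9]/ {print $2}')
dropped=$(printf '%s\n' "$sample" | awk '$2 !~ /^[0-9]/ {print $2}')

printf 'kept:\n%s\n' "$kept"
printf 'dropped:\n%s\n' "$dropped"
```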
