Extended awk topics

  • Using external shell variables in awk

    Note: the -v option defines an awk variable; here it assigns the value of the shell variable A to GET_A.
    Each variable to be assigned needs its own -v option.

    [root@aminglinux-02 awk]# A=44 ;awk -v GET_A=$A -F ':' '{print GET_A ":" $1}' 2.txt
    44:1111111
    44:2222222
    44:1111111
    44:3333333
    44:2222222
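The following is a minimal sketch of two ways to hand shell values to awk, assuming a POSIX shell and awk. The names A, B, GET_A and GET_B are illustrative only. It shows one -v per variable, and the ENVIRON array as an alternative for exported variables.

```shell
#!/bin/sh
# A and B are illustrative shell variables; GET_A and GET_B are the awk-side names.
A=44
B="hello world"

# One -v option per variable. Quoting "$B" keeps a value with spaces in one piece.
via_v=$(awk -v GET_A="$A" -v GET_B="$B" 'BEGIN { print GET_A ":" GET_B }')
echo "$via_v"

# Alternative: export the variable and read it through awk's ENVIRON array.
export A
via_env=$(awk 'BEGIN { print ENVIRON["A"] }')
echo "$via_env"
```

One difference worth knowing: values passed with -v have backslash escape sequences interpreted by awk, while values read from ENVIRON are taken literally.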

    Used in a script:

    [root@aminglinux-02 awk]# cat 2.txt 
    1111111:13443253456
    2222222:13211222122
    1111111:13643543544
    3333333:12341243123
    2222222:12123123123
    [root@aminglinux-02 awk]# cat 1.sh 
    #!/bin/bash
    sort -n 2.txt |awk -F ':' '{print $1}'|uniq >id.txt
    for id in `cat id.txt`; do
    echo "[$id]"
    awk -v id2=$id -F ':' '$1==id2 {print $2}' 2.txt 
    done 
    ## Alternatively: awk -F ':' '$1=="'"$id"'" {print $2}' 2.txt
    [root@aminglinux-02 awk]# sh 1.sh 
    [1111111]
    13443253456
    13643543544
    [2222222]
    13211222122
    12123123123
    [3333333]
    12341243123
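The script above re-reads 2.txt once per ID. As a possible alternative, the sketch below does the same grouping in a single awk pass. The array names seen, order and vals are made up for illustration, and IDs are printed in first-seen order rather than sorted order (for this sample data the two orders coincide).

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Same sample data as 2.txt above.
printf '%s\n' 1111111:13443253456 2222222:13211222122 1111111:13643543544 \
    3333333:12341243123 2222222:12123123123 > "$tmp/2.txt"

grouped=$(awk -F ':' '
    # Remember each ID the first time it appears, so output order is stable.
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
    # Append the second field to the list collected for this ID.
    { vals[$1] = vals[$1] $2 "\n" }
    END { for (i = 1; i <= n; i++) printf "[%s]\n%s", order[i], vals[order[i]] }
' "$tmp/2.txt")
echo "$grouped"
rm -rf "$tmp"
```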
  • Merging lines from two files that share the same first column onto one line. For example, given two files with the following contents:
    [root@aminglinux-02 awk]# cat 1.txt 
    1 aa
    2 bb
    3 ee
    4 ss
    [root@aminglinux-02 awk]# cat 2.txt 
    1 ab
    2 cd
    3 ad
    4 bd
    5 de
    [root@aminglinux-02 awk]# awk 'NR==FNR{a[$1]=$2}NR>FNR{print $0,a[$1]}' 1.txt 2.txt 
    1 ab aa
    2 cd bb
    3 ad ee
    4 bd ss
    5 de 
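  • Explanation: NR is the total number of records read so far across all input files, while FNR is the record number within the current file.
    So NR==FNR is true only while 1.txt (the first file) is being read, and NR>FNR is true while 2.txt is being read.
    The array a effectively acts as a map from the first column to the second column of 1.txt.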
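To see how the two counters behave, the following small sketch prints FILENAME, NR and FNR for every line of two files. The file contents are shortened samples, not the exact files above.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '1 aa\n2 bb\n' > "$tmp/1.txt"
printf '1 ab\n2 cd\n3 ad\n' > "$tmp/2.txt"

# NR keeps counting across files; FNR restarts at 1 for each new file.
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' 1.txt 2.txt)
echo "$counters"
rm -rf "$tmp"
```

NR and FNR are equal only on the lines of the first file, which is why NR==FNR selects 1.txt in the command above.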
  • A file has one number per line, and the numbers need to be joined with "+".
    [root@aminglinux-02 awk]# cat a 
    96
    1093
    1855
    1253
    1364
    1332
    2308
    2589
    2531
    1239
    2164
    2826
    2787
    2145
    2617
    4311
    1810
    2115
    1235
    [root@aminglinux-02 awk]# awk '{printf ("%s+",$0)}'  a; echo ""
    96+1093+1855+1253+1364+1332+2308+2589+2531+1239+2164+2826+2787+2145+2617+4311+1810+2115+1235+
    [root@aminglinux-02 awk]# cat a|xargs|sed 's/ /+/g'
    96+1093+1855+1253+1364+1332+2308+2589+2531+1239+2164+2826+2787+2145+2617+4311+1810+2115+1235
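    Note that the output of the awk printf command ends with a trailing "+", while the xargs/sed version does not. The echo "" only adds the final newline.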
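If the trailing "+" is unwanted, one option is to print the separator before every line except the first. The sketch below shows that in awk, plus a paste variant; it uses a shortened three-line sample instead of the full file a.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 96 1093 1855 > "$tmp/a"

# Print "+" in front of every line except the first, then one final newline.
joined=$(awk '{ printf "%s%s", (NR > 1 ? "+" : ""), $0 } END { print "" }' "$tmp/a")
echo "$joined"

# paste -s joins all lines of a file into one; -d sets the delimiter.
joined_paste=$(paste -s -d '+' "$tmp/a")
echo "$joined_paste"
rm -rf "$tmp"
```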
  • Using the gsub function in awk
    [root@aminglinux-02 awk]# awk -F ':' 'gsub(/www/,"abc",$1){print $0}' test.txt 
    abc x 1 1 bin /bin /sbin/nologin
    abcwwon x 2 2 daemon /sbin /sbin/nologin
    [root@aminglinux-02 awk]# awk 'gsub(/www/,"abc")' test.txt 
    abc:x:1:1:bin:/bin:/sbin/nologin
    abcwwon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:abc:3:4:adm:/var/adm:/sbin/nologin
    lp:abc:4:7:lp:/var/spool/lpd:/sbin/nologin
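Two details explain the output above. gsub returns the number of substitutions it made, so using it as a pattern prints only the lines that changed. Also, modifying a field such as $1 makes awk rebuild $0 with the output field separator OFS, which defaults to a space, so the colons disappear in the first command. The sketch below illustrates this with hypothetical sample data (the real test.txt is not shown in full) and keeps the colons by setting OFS.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical sample lines, loosely modelled on the output above.
printf '%s\n' 'www:x:1:1:bin' 'adm:www:3:4:adm' > "$tmp/sample.txt"

# Substituting inside $1 rebuilds $0 with the default OFS (a space).
default_ofs=$(awk -F ':' 'gsub(/www/, "abc", $1)' "$tmp/sample.txt")
echo "$default_ofs"

# Setting OFS to ":" keeps the original separator in the rebuilt line.
kept_ofs=$(awk -F ':' -v OFS=':' 'gsub(/www/, "abc", $1)' "$tmp/sample.txt")
echo "$kept_ofs"
rm -rf "$tmp"
```

The second sample line is not printed in either case, because its first field contains no "www" and gsub therefore returns 0.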
  • Extracting multiple specified fields into one line with awk
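    # Writes columns 1..2100 of example.txt to /tmp/test.txt, one column per output line.
    for j in $(seq 0 20); do
        x=$((100 * j))
        y=$((x + 1))
        z=$((x + 100))
        for i in $(seq "$y" "$z"); do
                # Print field number i of every input line, separated by spaces.
                # An explicit "%s " format keeps a "%" in the data from being read as a format.
                awk -v a="$i" '{printf "%s ", $a}' example.txt >>/tmp/test.txt
                echo " " >>/tmp/test.txt
        done
    done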
  • Filtering for two or more keywords with grep, egrep, or awk
    [root@aminglinux-02 awk]# grep -E '123|abc' test.txt 
    abc:www:3:4:adm:/var/adm:/sbin/nologin
    123:www:4:7:lp:/var/spool/lpd:/sbin/nologin
    [root@aminglinux-02 awk]# egrep '123|abc' test.txt 
    abc:www:3:4:adm:/var/adm:/sbin/nologin
    123:www:4:7:lp:/var/spool/lpd:/sbin/nologin
    [root@aminglinux-02 awk]# awk '/123|abc/' test.txt 
    abc:www:3:4:adm:/var/adm:/sbin/nologin
    123:www:4:7:lp:/var/spool/lpd:/sbin/nologin
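The commands above match lines containing either keyword. In awk the same patterns can be combined in other ways; the sketch below, using hypothetical sample lines, requires both keywords on a line, and separately restricts the match to the first field.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical sample lines for illustration.
printf '%s\n' 'abc:www:3:4' '123:www:4:7' 'abc:123:5:6' 'lp:x:7:8' > "$tmp/sample.txt"

# Lines that contain both keywords (logical AND of two regex matches).
both=$(awk '/123/ && /abc/' "$tmp/sample.txt")
echo "$both"

# Lines whose first field is exactly one of the keywords.
first_field=$(awk -F ':' '$1 ~ /^(123|abc)$/' "$tmp/sample.txt")
echo "$first_field"
rm -rf "$tmp"
```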
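  • Write an awk program that generates a file with the following structure. (The last column is the current time in YYYYMMDDHHMISS format.) The column values are as shown below, each increasing by 1 per row, for 5 million rows in total. The sample run below prints only 10 rows for brevity, and its strftime format stops at minutes (append %S to include seconds). Note that strftime is not part of POSIX awk; gawk provides it.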
    [root@aminglinux-02 awk]# awk 'BEGIN{for(i=1;i<=10;i++)printf("%d,%d,%010d,%010d,%010d,%010d,%010d,%010d,%d\n",i,i,i,i,i,i,i,i,strftime("%Y%m%d%H%M"))}'
    1,1,0000000001,0000000001,0000000001,0000000001,0000000001,0000000001,201707212356
    2,2,0000000002,0000000002,0000000002,0000000002,0000000002,0000000002,201707212356
    3,3,0000000003,0000000003,0000000003,0000000003,0000000003,0000000003,201707212356
    4,4,0000000004,0000000004,0000000004,0000000004,0000000004,0000000004,201707212356
    5,5,0000000005,0000000005,0000000005,0000000005,0000000005,0000000005,201707212356
    6,6,0000000006,0000000006,0000000006,0000000006,0000000006,0000000006,201707212356
    7,7,0000000007,0000000007,0000000007,0000000007,0000000007,0000000007,201707212356
    8,8,0000000008,0000000008,0000000008,0000000008,0000000008,0000000008,201707212356
    9,9,0000000009,0000000009,0000000009,0000000009,0000000009,0000000009,201707212356
    10,10,0000000010,0000000010,0000000010,0000000010,0000000010,0000000010,201707212356
    #!/bin/bash
    # Shell-only alternative. It is much slower than the awk version,
    # because it runs date once for every row.
    for i in $(seq 1 5000000); do
        p=$(printf '%010d' "$i")    # zero-pad i to 10 digits
        echo "$i,$i,$p,$p,$p,$p,$p,$p,$(date +%Y%m%d%H%M%S)"
    done
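For awk implementations without strftime, one possible approach is to take the timestamp from date once and pass it in with -v. The sketch below does that and includes the seconds; rows and ts are illustrative names, and rows is kept small here (set it to 5000000 for the full file). The trade-off is that every row carries the same start time instead of the time at which the row was written.

```shell
#!/bin/sh
rows=5                              # illustrative; use 5000000 for the full file
ts=$(date +%Y%m%d%H%M%S)            # timestamp taken once, 14 digits

generated=$(awk -v n="$rows" -v ts="$ts" 'BEGIN {
    for (i = 1; i <= n; i++)
        printf "%d,%d,%010d,%010d,%010d,%010d,%010d,%010d,%s\n",
            i, i, i, i, i, i, i, i, ts
}')
echo "$generated"
```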
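  • Printing a single quote with print in awk: inside a single-quoted awk program the backslash escape \' has no effect, because the shell does not process escapes within single quotes. To print such a character, use the '""' pattern: reading left to right it is a single quote, two double quotes, and a single quote, where the single quotes close and reopen the awk program and the double quotes enclose the character to print. For $ that gives '"$"', and for a single quote '"'"'.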
    [root@aminglinux-02 awk]# awk -F ':' '{print "This is a '"'"'"$1}' test.txt 
    This is a 'root
    This is a 'bin
    This is a 'daemon
    This is a 'adm
    This is a 'lp
    This is a 'sync
    This is a 'shutdown
    This is a 'halt
    This is a 'mail
    This is a 'operator
    This is a 'games
    This is a 'ftp
    This is a 'nobody
    This is a 'systemd-bus-proxy
    This is a 'systemd-network
    This is a 'dbus
    This is a 'polkitd
    This is a 'tss
    This is a 'postfix
    This is a 'sshd
    This is a 'chrony
    This is a 'aming
    This is a 'user1
    This is a 'user3
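The quote juggling above can be hard to read. Two other ways to get a single quote into the output, sketched below, are the octal escape \047 inside an awk string and a variable passed with -v (the name q is illustrative).

```shell
#!/bin/sh
# \047 is the octal code of the single quote; awk expands it inside string literals.
octal=$(awk 'BEGIN { print "This is a \047root\047" }')
echo "$octal"

# Pass the quote character in as an awk variable and concatenate it.
via_var=$(awk -v q="'" 'BEGIN { print "This is a " q "root" q }')
echo "$via_var"
```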
  • Merging corresponding lines (same line number) of two files into one line
    [root@aminglinux-02 awk]# cat a.txt 
    1 2 3
    4 5 6
    a b c
    [root@aminglinux-02 awk]# cat b.txt 
    3 2 1
    6 5 4
    c b a
    [root@aminglinux-02 awk]# paste a.txt b.txt 
    1 2 3   3 2 1
    4 5 6   6 5 4
    a b c   c b a
    [root@aminglinux-02 awk]# paste -d '+' a.txt b.txt 
    1 2 3+3 2 1
    4 5 6+6 5 4
    a b c+c b a
    [root@aminglinux-02 awk]#  awk 'NR==FNR {a[FNR]=$0} NR>FNR {print a[FNR],$0}' a.txt b.txt 
    1 2 3 3 2 1
    4 5 6 6 5 4
    a b c c b a
    [root@aminglinux-02 awk]#  awk 'NR==FNR {a[FNR]=$0} NR>FNR {print a[FNR],"+",$0}' a.txt b.txt 
    1 2 3 + 3 2 1
    4 5 6 + 6 5 4
    a b c + c b a
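The array-based commands above hold all of a.txt in memory. A possible variant, sketched here, reads the second file line by line with getline while awk iterates over the first. The variable names other and line are assumptions for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\na b c\n' > "$tmp/a.txt"
printf '3 2 1\n6 5 4\nc b a\n' > "$tmp/b.txt"

merged=$(awk -v other="$tmp/b.txt" '{
    # Read the next line of the second file; use an empty string if it has run out.
    if ((getline line < other) <= 0) line = ""
    print $0 " + " line
}' "$tmp/a.txt")
echo "$merged"
rm -rf "$tmp"
```

Here the first file drives the loop, so extra lines in the second file are ignored, whereas the array version above drops extra lines of the first file instead.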
  • Reference tutorials for awk