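awk is a programming language for processing text. Short awk programs can process standard input or files, sort and compute over data, and generate reports. awk works somewhat like a database because it handles data as records and fields, which grep and sed are not designed to do. By default awk treats each line of input as one record and processes the input one line at a time. It splits each record into fields, numbered 1, 2, 3, ... from left to right and referenced as $1, $2, $3, ...; $0 is the whole record. When print is given several fields separated by commas, it outputs them separated by the output field separator (a space by default).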
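The following is a minimal sketch of records and fields. It assumes one sample line taken from the /tmp/services file used below, and show_fields is a hypothetical helper name. With the default field separator, any run of blanks separates fields. The comment words therefore also become fields, so NF is 5.

```shell
# show_fields (hypothetical helper): print the record, two fields and the field count.
show_fields() {
  awk '{
    print "record:", $0      # $0 is the whole line
    print "fields:", $1, $2  # the comma inserts OFS (a space by default)
    print "NF:", NF          # number of fields in this record
  }'
}

printf 'blp5   48129/tcp   # Bloomberg locator\n' | show_fields
```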
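Common options (the long options are gawk extensions; awk on RHEL 8 is gawk):
- -f progfile  read the awk program source from the file progfile
- -F fs  use fs as the input field separator (FS); a value longer than one character is treated as a regular expression
- -v var=value  assign value to the awk variable var before the program starts, so it is already set in BEGIN
- --posix  run in strict POSIX compatibility mode, which disables the GNU extensions
- --dump-variables[=file]  when the program ends, write a sorted list of the global variables with their types and final values to file (default awkvars.out)
- --profile[=file]  write a formatted (pretty-printed) profile of the program to file (default awkprof.out)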
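Common patterns:
- BEGIN{ }  runs once before any input is read; sets up the program's initial state
- END{ }  runs once after all input has been read; used for wrap-up work
- /regular expression/  matches each input record against the regular expression
- pattern && pattern  logical AND: both patterns must match
- pattern || pattern  logical OR: either pattern matches
- !pattern  logical NOT: the pattern does not match
- pattern1,pattern2  range pattern: matches from a record that matches pattern1 through the next record that matches pattern2, both included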
Reading the program from a file with -f (test.awk contains a single action that prints field 2):
[root@rhel8 ~]# cat /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@rhel8 ~]# cat test.awk
{print $2}
[root@rhel8 ~]# cat /tmp/services | awk -f test.awk
48003/udp
48049/tcp
48128/tcp
48128/udp
48129/tcp
48129/udp
48556/tcp
48556/udp
48619/tcp
48619/udp
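A program file can hold several rules and comments, not just one action. The sketch below is only an illustration. run_prog is a hypothetical helper that writes a made-up two-rule program to a temporary file created with mktemp, then runs it with -f.

```shell
# run_prog (hypothetical helper): save an awk program to a temp file and run it with -f.
run_prog() {
  prog=$(mktemp) || return 1
  cat > "$prog" <<'EOF'
# Print the port/protocol column, then a count at the end.
{ print $2 }
END { print NR " records" }
EOF
  awk -f "$prog"        # reads standard input because no input file is named
  status=$?
  rm -f "$prog"         # clean up the temporary program file
  return "$status"
}

printf 'blp5 48129/tcp\nblp5 48129/udp\n' | run_prog
```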
Setting the field separator with -F, here to print the user names in /etc/passwd:
[root@rhel8 ~]# awk -F ':' '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
.......
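-F is often combined with OFS to print several fields with a custom separator between them. The sketch below uses two made-up passwd-format lines as input instead of the real /etc/passwd, so the output is predictable. user_shells is a hypothetical name.

```shell
# user_shells (hypothetical helper): print user name and login shell from passwd-format input.
user_shells() {
  # Split on ":" and join the printed fields with " -> " via OFS.
  awk -F ':' -v OFS=' -> ' '{ print $1, $7 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' | user_shells
```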
-F also accepts several separator characters at once, and any one of them ends a field:
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $3}'
com-bardac-dw
iqobject
iqobject
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $1}'
com-bardac-dw 48556
iqobject 48619
iqobject 48619
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $2}'
udp
tcp
udp
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[ /]+' '{print $2}'
48556
48619
48619
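The bracket expression [] matches any single character listed inside it, so '[/#]' starts a new field at every / and every #. In '[ /]+', the + makes a whole run of spaces and slashes count as one separator, which is why $2 is the port number. Using several separators this way makes it easier to pick out exactly the fields you need.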
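To see how the two separators split a record, the sketch below prints NF and then every field in brackets, so leading and trailing spaces stay visible. dump_fields is a hypothetical helper, and its argument is passed straight to -F.

```shell
# dump_fields (hypothetical helper): show NF and each field in [brackets] for a given FS regex.
dump_fields() {
  awk -F "$1" '{
    out = "NF=" NF
    for (i = 1; i <= NF; i++) out = out " [" $i "]"
    print out
  }'
}

line='iqobject 48619/udp # iqobject'
printf '%s\n' "$line" | dump_fields '[/#]'    # split at every / or #
printf '%s\n' "$line" | dump_fields '[ /]+'   # a run of spaces and slashes is one separator
```

For the sample line it should print NF=3 [iqobject 48619] [udp ] [ iqobject], then NF=5 [iqobject] [48619] [udp] [#] [iqobject]. In the first case the spaces stay inside the fields.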
-v assigns an awk variable before the program runs, so the value is available even in BEGIN:
[root@rhel8 ~]# awk -v a=123 'BEGIN{print a}'
123
Passing a shell variable as the value of an awk variable:
[root@rhel8 ~]# a=123
[root@rhel8 ~]# awk -v a=$a 'BEGIN{print a}'
123
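Or close and reopen the single quotes around the shell variable, so the shell substitutes $a directly into the awk program text: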
[root@rhel8 ~]# awk 'BEGIN{print '$a'}'
123
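The quote-splicing trick works here only because 123 is a number. The shell pastes the text of $a into the program, so awk sees print 123. A string value would be pasted in as an awk variable name instead. The sketch below compares splicing, -v, and the ENVIRON array, which requires the shell variable to be exported. demo_vars is a hypothetical helper.

```shell
# demo_vars (hypothetical helper): pass the shell value $1 to awk in three ways.
demo_vars() {
  name=$1
  # Spliced into the source: for a word like blp5 awk sees an unset variable, so it prints [].
  awk 'BEGIN { print "[" '"$name"' "]" }'
  # Passed with -v: awk receives the value as a string.
  awk -v n="$name" 'BEGIN { print "[" n "]" }'
  # Read from the environment: the variable must be exported first.
  export name
  awk 'BEGIN { print "[" ENVIRON["name"] "]" }'
}

demo_vars blp5
```

For the argument blp5 this should print [] and then [blp5] twice. One further difference: awk processes backslash escapes in -v values, while ENVIRON passes the value through unchanged.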
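--dump-variables writes awk's global variables and their final values to awkvars.out when the program ends:
[root@rhel8 ~]# seq 5 |awk --dump-variables '{print $0}'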
1
2
3
4
5
[root@rhel8 ~]# cat awkvars.out
ARGC: 1
ARGIND: 0
ARGV: array, 1 elements
BINMODE: 0
CONVFMT: "%.6g"
ENVIRON: array, 29 elements
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "-"
FNR: 5
FPAT: "[^[:space:]]+"
FS: " "
FUNCTAB: array, 41 elements
IGNORECASE: 0
LINT: 0
NF: 1
NR: 5
OFMT: "%.6g"
OFS: " "
ORS: "\n"
PREC: 53
PROCINFO: array, 20 elements
RLENGTH: 0
ROUNDMODE: "N"
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
SYMTAB: array, 28 elements
TEXTDOMAIN: "messages"
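Two of the variables in the dump are used constantly: NR (the number of records read so far) and NF (the number of fields in the current record). The small sketch below assumes made-up input that includes an empty line. number_lines is a hypothetical helper name.

```shell
# number_lines (hypothetical helper): print each record with NR and NF, then the total in END.
number_lines() {
  awk '{ print NR ": " $0 " (NF=" NF ")" } END { print "total records:", NR }'
}

printf 'a b\nc\n\n' | number_lines
```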
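The BEGIN action runs before any input is read. It is commonly used to set built-in variables, assign variables, and print a header or title for the output.
For example, printing a header: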
[root@rhel8 ~]# tail /tmp/services |awk 'BEGIN{print "Service\t\tPort\t\t\tDescription\n==="}{print $0}'
Service Port Description
===
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
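BEGIN runs before the first record is split, so it is the usual place to set FS and OFS. The sketch below uses two made-up passwd-format lines as input. uid_table is a hypothetical name.

```shell
# uid_table (hypothetical helper): set FS and OFS once in BEGIN, print a header, then user and UID.
uid_table() {
  awk 'BEGIN { FS = ":"; OFS = "\t"; print "USER", "UID" } { print $1, $3 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\nadm:x:3:4:adm:/var/adm:/sbin/nologin\n' | uid_table
```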
The END action runs after all input has been processed.
For example, printing a footer:
[root@rhel8 ~]# tail /tmp/services |awk '{print $0}END{print "===\nEND......"}'
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
===
END......
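END is the natural place for totals, because every record has been read by the time it runs. The sketch below tallies protocols and assumes the port/protocol pair is always field 2, as in /tmp/services. count_proto is a hypothetical helper, and the three input lines are taken from the sample file.

```shell
# count_proto (hypothetical helper): tally the protocol after the "/" in field 2.
count_proto() {
  awk '{
    split($2, parts, "/")      # "48129/tcp" -> parts[1]="48129", parts[2]="tcp"
    n[parts[2]]++
  }
  END { print "tcp:", n["tcp"] + 0; print "udp:", n["udp"] + 0 }'
}

printf 'blp5 48129/tcp # Bloomberg locator\nblp5 48129/udp # Bloomberg locator\niqobject 48619/tcp # iqobject\n' | count_proto
```

Run against the full sample with count_proto < /tmp/services, it should report tcp: 5 and udp: 5.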
Running the same program with --profile also writes a formatted copy of it to awkprof.out:
[root@rhel8 ~]# tail /tmp/services |awk --profile 'BEGIN{print"Service\t\tPort\t\t\tDescription\n==="}{print $0}END{print "===\nEND......"}'
Service Port Description
===
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
===
END......
[root@rhel8 ~]# cat awkprof.out
# gawk profile, created Thu Sep 15 07:45:12 2022
# BEGIN rule(s)
BEGIN {
print "Service\t\tPort\t\t\tDescription\n==="
}
# Rule(s)
{
print $0
}
# END rule(s)
END {
print "===\nEND......"
}
Match lines that contain tcp:
[root@rhel8 ~]# cat /tmp/services |awk '/tcp/{print $0}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
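/tcp/ matches the string tcp anywhere in the record, including the comment. To test a single field, use the ~ operator instead. In the sketch below the second input line is made up, so that its comment mentions tcp while its protocol is udp.

```shell
# Sample input: the second line is hypothetical; its comment mentions tcp but its port is udp.
sample='blp5 48129/tcp # Bloomberg locator
demo 40000/udp # tunnels tcp over udp'

# Whole-record match: /tcp/ finds "tcp" anywhere, so both lines print.
printf '%s\n' "$sample" | awk '/tcp/'

# Field match: ~ tests only field 2, and the regex requires it to end in /tcp.
printf '%s\n' "$sample" | awk '$2 ~ /\/tcp$/'
```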
Match records that contain both blp5 and tcp:
[root@rhel8 ~]# cat /tmp/services |awk '/blp5/ && /tcp/{print $0}'
blp5 48129/tcp # Bloomberg locator
Match records that contain blp5 or tcp:
[root@rhel8 ~]# cat /tmp/services |awk '/blp5/ || /tcp/{print $0}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
Skip lines that start with # as well as empty lines:
[root@rhel8 ~]# awk '! /^#/ && ! /^$/{print $0}' /etc/httpd/conf/httpd.conf
Or, with a single regular expression (a pattern with no action prints the matching records):
[root@rhel8 ~]# awk '! /^#|^$/' /etc/httpd/conf/httpd.conf
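Both commands above keep comment lines that are indented, and configuration files such as httpd.conf often contain them. The sketch below also drops indented comments and lines holding only spaces or tabs. It assumes a small made-up config snippet as input, and strip_comments is a hypothetical name.

```shell
# strip_comments (hypothetical helper): drop comments (even indented ones) and blank lines.
strip_comments() {
  awk '!/^[ \t]*#/ && !/^[ \t]*$/'
}

printf '# global\nListen 80\n\n    # indented comment\n   \n<Directory "/var/www">\n    Require all granted\n</Directory>\n' | strip_comments
```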
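A range pattern prints from the first line starting with blp5 through the next line starting with com:
[root@rhel8 ~]# cat /tmp/services |awk '/^blp5/,/^com/'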
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
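A range includes both boundary records. It starts again whenever the first pattern matches later in the input. The sketch below uses made-up START/END marker lines, and between_markers is a hypothetical helper.

```shell
# between_markers (hypothetical helper): print each START...END block, boundaries included.
between_markers() {
  awk '/^START$/,/^END$/'
}

printf 'a\nSTART\nb\nEND\nc\nSTART\nd\nEND\ne\n' | between_markers
```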