The Three Musketeers of Text Processing: awk

I. Introduction

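awk is a programming-language tool for processing text. With short programs it can process standard input or files, sort data, do calculations, generate reports, and more. awk works somewhat like a database in that it operates on records and fields, which grep and sed do not directly provide. By default awk treats each line of the input as one record and processes the input one record at a time, and each part of a line (separated by whitespace by default) is a field of that record. Fields are numbered 1, 2, 3, ... from left to right: $ followed by a number refers to the corresponding field (so $1 is the first field), and $0 refers to the whole line. In a print statement, fields listed with commas between them are printed separated by the output field separator OFS, a single space by default.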
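A minimal sketch of records and fields, assuming a POSIX awk on the PATH. The input line and the function name show_fields are made up for illustration; the script prints the whole record, the field count NF, and the difference between separating fields with a comma and writing them next to each other.

```shell
# Hypothetical helper: feed one made-up record to awk and inspect its fields.
show_fields() {
  printf 'blp5 48129/tcp Bloomberg\n' |
    awk '{
      print "whole record:", $0   # $0 is the entire line
      print "field count:", NF    # NF is the number of fields in this record
      print $1, $2                # comma: fields joined with OFS (a space)
      print $1 $2                 # no comma: plain string concatenation
    }'
}

show_fields
```

Run as-is, it prints `field count: 3`, then `blp5 48129/tcp` for the comma form and `blp548129/tcp` for the concatenated form.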

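Options:
  • -f progfile  read the awk program from the file progfile
  • -F fs  use fs as the input field separator
  • -v var=value  assign value to the variable var before the program starts (before BEGIN runs)
  • --posix  (gawk) run in strict POSIX compatibility mode, which disables the GNU extensions, including the GNU-specific regular expression operators
  • --dump-variables[=file]  (gawk) after the program finishes, write the global variables to file; the default file is awkvars.out
  • --profile[=file]  (gawk) write a formatted copy of the awk program to file; the default file is awkprof.out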
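Patterns:

Commonly used patterns:

  • BEGIN{ }  runs once before any input is read; used to set up the initial state
  • END{ }  runs once after all input has been read; used for wrap-up work
  • /regular expression/  matches each input record against the regular expression
  • pattern && pattern  logical AND: both patterns must match
  • pattern || pattern  logical OR: at least one of the patterns matches
  • !pattern  logical NOT: the pattern does not match
  • pattern1,pattern2  range pattern: matches every record from one that matches pattern1 through the next one that matches pattern2, inclusive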
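The pattern types above can be combined in one program. This sketch assumes a POSIX awk; the three input lines and the function name count_patterns are hypothetical. BEGIN initializes counters, a regex pattern and its negation split the records, a range pattern counts the records from the line starting with a through the line starting with b, and END prints the totals.

```shell
# Hypothetical helper: count records matched by different pattern types.
count_patterns() {
  printf 'a 1/tcp\nb 2/udp\nc 3/tcp\n' |
    awk '
      BEGIN     { tcp = 0; other = 0; inrange = 0 }  # runs before any input
      /tcp/     { tcp++ }                            # regex pattern
      !/tcp/    { other++ }                          # negated regex pattern
      /^a/,/^b/ { inrange++ }                        # range: from "a ..." through "b ..."
      END       { print "tcp=" tcp, "other=" other, "range=" inrange }  # runs after the last record
    '
}

count_patterns
```

It prints `tcp=2 other=1 range=2`.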

II. Examples

[root@rhel8 ~]# cat /tmp/services 
nimgtw 		48003/udp 		# Nimbus Gateway
3gpp-cbsp 		48049/tcp 		# 3GPP Cell Broadcast Service Protocol
isnetserv 		48128/tcp 		# Image Systems Network Services
isnetserv 		48128/udp 		# Image Systems Network Services
blp5 			48129/tcp 		# Bloomberg locator
blp5 			48129/udp 		# Bloomberg locator
com-bardac-dw 	48556/tcp 		# com-bardac-dw
com-bardac-dw 	48556/udp 		# com-bardac-dw
iqobject 		48619/tcp 		# iqobject
iqobject 		48619/udp 		# iqobject
1. Read the awk program from a file
[root@rhel8 ~]# cat test.awk 
{print $2}
[root@rhel8 ~]# cat /tmp/services | awk -f test.awk 
48003/udp
48049/tcp
48128/tcp
48128/udp
48129/tcp
48129/udp
48556/tcp
48556/udp
48619/tcp
48619/udp
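Keeping the program in a file pays off once it has more than one rule. A sketch assuming mktemp is available; the function name run_script, the temporary file, and the sample lines are hypothetical. The program prints field 2 of every record, like test.awk, plus an END rule that reports the record count.

```shell
# Hypothetical helper: write a two-rule awk program to a temp file, run it with -f.
run_script() {
  prog=$(mktemp) || return 1
  cat > "$prog" <<'EOF'
# print the port/protocol column of each record
{ print $2 }
# after the last record, report how many records were read
END { print NR " records" }
EOF
  awk -f "$prog"   # reads standard input, inherited from the caller
  status=$?
  rm -f "$prog"
  return "$status"
}

printf 'nimgtw 48003/udp\nblp5 48129/tcp\n' | run_script
```

This prints `48003/udp`, `48129/tcp`, and finally `2 records`.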
2. Set the field separator and print a given field
[root@rhel8 ~]# awk -F ':' '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
.......

Several separator characters can also be given at once; each of them separates fields:

[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $3}'
 com-bardac-dw
 iqobject
 iqobject
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $1}'
com-bardac-dw 	48556
iqobject 	48619
iqobject 	48619
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[/#]' '{print $2}'
udp 	
tcp 	
udp 	
[root@rhel8 ~]# tail -n3 /tmp/services |awk -F'[ /]+' '{print $2}'
	48556
	48619
	48619

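[] is a bracket expression that matches any single one of the characters inside it, so with -F'[/#]' every / or # starts a new field. In the last command, [ /]+ matches a run of one or more spaces or slashes; tabs are not in the set, which is why the tab before the port number is still part of $2. Using several separators makes it easier to pick out the fields you need.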
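Because the sample file mixes spaces and tabs, one option is to add the tab to the bracket expression. A sketch assuming a POSIX awk; the sample line imitates one line of /tmp/services, and the function names are hypothetical. FS is set inside BEGIN so that "\t" is an ordinary awk string escape for a tab.

```shell
# The sample line mixes spaces and tabs, like /tmp/services.
sample=$(printf 'iqobject \t48619/udp \t\t# iqobject')

# Separator set without a tab: the tab stays attached to $2.
port_no_tab() { printf '%s\n' "$1" | awk -F'[ /]+' '{ print $2 }'; }

# Separator set with a tab: one or more spaces, tabs, or slashes.
port_with_tab() { printf '%s\n' "$1" | awk 'BEGIN { FS = "[ \t/]+" } { print $2 }'; }

port_no_tab "$sample"     # a tab followed by 48619
port_with_tab "$sample"   # 48619
```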

3. Assign variables
[root@rhel8 ~]# awk -v a=123 'BEGIN{print a}'
123
Use a shell variable as the value of an awk variable:
[root@rhel8 ~]# a=123
[root@rhel8 ~]# awk -v a="$a" 'BEGIN{print a}'
123
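Or temporarily close the single quotes so that the shell substitutes the value into the program text itself. This works here because 123 is a number; a string value would be read as awk code (for example, as a variable name):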
[root@rhel8 ~]# awk 'BEGIN{print '$a'}'
123
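When the shell value may contain spaces or other non-numeric text, splicing it into the program with '$a' stops working, because the value becomes part of the awk source. A sketch of two safer options, assuming a POSIX awk: -v with the expansion in double quotes, or an environment variable read through awk's ENVIRON array. The function names and the DESC variable are hypothetical. One difference worth knowing: -v processes backslash escapes in the value, while ENVIRON passes it through unchanged.

```shell
# Hypothetical helpers: pass an arbitrary shell string into awk.
via_v()   { awk -v d="$1" 'BEGIN { print d }'; }                # double quotes keep it one argument
via_env() { DESC="$1" awk 'BEGIN { print ENVIRON["DESC"] }'; }  # value never touches the program text

desc="Bloomberg locator"
via_v "$desc"
via_env "$desc"
```

Both calls print `Bloomberg locator`.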
4. Write awk's global variables to a file
[root@rhel8 ~]# seq 5 |awk --dump-variables '{print $0}'
1
2
3
4
5

[root@rhel8 ~]# cat awkvars.out 
ARGC: 1
ARGIND: 0
ARGV: array, 1 elements
BINMODE: 0
CONVFMT: "%.6g"
ENVIRON: array, 29 elements
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "-"
FNR: 5
FPAT: "[^[:space:]]+"
FS: " "
FUNCTAB: array, 41 elements
IGNORECASE: 0
LINT: 0
NF: 1
NR: 5
OFMT: "%.6g"
OFS: " "
ORS: "\n"
PREC: 53
PROCINFO: array, 20 elements
RLENGTH: 0
ROUNDMODE: "N"
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
SYMTAB: array, 28 elements
TEXTDOMAIN: "messages"
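Several of the dumped variables are handy inside programs. In the dump above FNR and NR are both 5 because there was a single input; they differ once awk reads more than one file. A sketch assuming mktemp is available; the function name and the file contents are hypothetical.

```shell
# Hypothetical helper: compare FNR (per-file count) with NR (running total).
count_records() {
  a=$(mktemp) || return 1
  b=$(mktemp) || { rm -f "$a"; return 1; }
  printf 'x\ny\n' > "$a"
  printf 'z\n' > "$b"
  awk '{ print FNR, NR } END { print "total", NR }' "$a" "$b"
  rm -f "$a" "$b"
}

count_records
```

The output is `1 1`, `2 2`, `1 3`, then `total 3`: FNR restarts at 1 for the second file while NR keeps counting.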
5. BEGIN and END

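The action of the BEGIN pattern runs before any input is read. It is commonly used to set built-in variables, assign variables, and print a header or title.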

For example, to print a header:

[root@rhel8 ~]# tail /tmp/services |awk 'BEGIN{print "Service\t\tPort\t\t\tDescription\n==="}{print $0}'
Service		Port			Description
===
nimgtw 		48003/udp 	# Nimbus Gateway
3gpp-cbsp 		48049/tcp 	# 3GPP Cell Broadcast Service Protocol
isnetserv 		48128/tcp 	# Image Systems Network Services
isnetserv 		48128/udp 	# Image Systems Network Services
blp5 			48129/tcp 	# Bloomberg locator
blp5 			48129/udp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
com-bardac-dw 	48556/udp 	# com-bardac-dw
iqobject 		48619/tcp 	# iqobject
iqobject 		48619/udp 	# iqobject

The action of the END pattern runs after all input has been processed.

For example, to print a footer:

[root@rhel8 ~]# tail /tmp/services |awk '{print $0}END{print "===\nEND......"}'
nimgtw 		48003/udp 	# Nimbus Gateway
3gpp-cbsp 	48049/tcp 	# 3GPP Cell Broadcast Service Protocol
isnetserv 	48128/tcp 	# Image Systems Network Services
isnetserv 	48128/udp 	# Image Systems Network Services
blp5 		48129/tcp 	# Bloomberg locator
blp5 		48129/udp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
com-bardac-dw 	48556/udp 	# com-bardac-dw
iqobject 	48619/tcp 	# iqobject
iqobject 	48619/udp 	# iqobject
===
END......
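BEGIN and END are also the natural place for summaries. A sketch assuming a POSIX awk: BEGIN prints a header, the main rule splits field 2 on "/" and tallies the protocol in an associative array, and END prints the totals. The function name and sample lines are hypothetical; the keys are printed explicitly because a for (k in n) loop does not guarantee any order.

```shell
# Hypothetical helper: count records per protocol (field 2 looks like 48129/tcp).
summarize() {
  awk '
    BEGIN { print "protocol count" }
          { split($2, p, "/"); n[p[2]]++ }   # p[1] is the port, p[2] the protocol
    END   { print "tcp", n["tcp"] + 0       # + 0 prints 0 instead of an empty string
            print "udp", n["udp"] + 0 }
  '
}

printf 'blp5 48129/tcp\nblp5 48129/udp\niqobject 48619/tcp\n' | summarize
```

This prints `protocol count`, `tcp 2`, and `udp 1`.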
6. Write a formatted copy of the awk program to a file
[root@rhel8 ~]# tail /tmp/services |awk --profile 'BEGIN{print"Service\t\tPort\t\t\tDescription\n==="}{print $0}END{print "===\nEND......"}'
Service		Port			Description
===
nimgtw 		48003/udp 	# Nimbus Gateway
3gpp-cbsp 	48049/tcp 	# 3GPP Cell Broadcast Service Protocol
isnetserv 	48128/tcp 	# Image Systems Network Services
isnetserv 	48128/udp 	# Image Systems Network Services
blp5 		48129/tcp 	# Bloomberg locator
blp5 		48129/udp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
com-bardac-dw 	48556/udp 	# com-bardac-dw
iqobject 	48619/tcp 	# iqobject
iqobject 	48619/udp 	# iqobject
===
END......

[root@rhel8 ~]# cat awkprof.out 
	# gawk profile, created Thu Sep 15 07:45:12 2022

	# BEGIN rule(s)

	BEGIN {
		print "Service\t\tPort\t\t\tDescription\n==="
	}

	# Rule(s)

	{
		print $0
	}

	# END rule(s)

	END {
		print "===\nEND......"
	}
7. Match a regular expression with /re/

Match lines that contain tcp:

[root@rhel8 ~]# cat /tmp/services |awk '/tcp/{print $0}'
3gpp-cbsp 		48049/tcp 	# 3GPP Cell Broadcast Service Protocol
isnetserv 		48128/tcp 	# Image Systems Network Services
blp5 			48129/tcp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
iqobject 		48619/tcp 	# iqobject
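/tcp/ tests the whole record, so it would also match a line whose description merely mentions tcp. To test only one field, use the match operator ~ with the field on the left. A sketch assuming a POSIX awk; the function name and sample lines are hypothetical.

```shell
# Hypothetical helper: print the service name when field 2 ends in /tcp.
tcp_services() {
  awk '$2 ~ /\/tcp$/ { print $1 }'   # \/ is a literal slash inside /.../
}

printf 'blp5 48129/tcp x\nfoo 1/udp mentions tcp\n' | tcp_services   # prints only blp5
```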
8. Logical AND, OR, and NOT

Match records that contain both blp5 and tcp:

[root@rhel8 ~]# cat /tmp/services |awk '/blp5/ && /tcp/{print $0}'
blp5 		48129/tcp 	# Bloomberg locator

Match records that contain blp5 or tcp:

[root@rhel8 ~]# cat /tmp/services |awk '/blp5/ || /tcp/{print $0}'
3gpp-cbsp 		48049/tcp 	# 3GPP Cell Broadcast Service Protocol
isnetserv 		48128/tcp 	# Image Systems Network Services
blp5 			48129/tcp 	# Bloomberg locator
blp5 			48129/udp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
iqobject 		48619/tcp 	# iqobject

Skip lines that begin with # as well as empty lines:

[root@rhel8 ~]# awk '! /^#/ && ! /^$/{print $0}' /etc/httpd/conf/httpd.conf

Or, with a single regular expression that uses alternation (|):

[root@rhel8 ~]# awk '! /^#|^$/' /etc/httpd/conf/httpd.conf
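Both commands above keep indented comments and lines that contain only spaces or tabs. If those should go too, one option is to allow leading blanks before the # or the end of the line. A sketch assuming a POSIX awk; the config lines and the function name are hypothetical.

```shell
# Hypothetical helper: drop comment lines (even indented ones) and blank lines.
strip_comments() {
  awk '!/^[ \t]*(#|$)/'   # optional spaces/tabs, then "#" or end of line
}

printf '# comment\n\nListen 80\n    # indented comment\n \t \nServerName localhost\n' | strip_comments
```

Only `Listen 80` and `ServerName localhost` are printed.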
9. Match a range
[root@rhel8 ~]# cat /tmp/services |awk '/^blp5/,/^com/'
blp5 		48129/tcp 	# Bloomberg locator
blp5 		48129/udp 	# Bloomberg locator
com-bardac-dw 	48556/tcp 	# com-bardac-dw
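Two details of range patterns that the example above does not show: the range can start again after it has ended, and if the end pattern never matches, the range runs to the end of the input. A sketch with hypothetical lines and a hypothetical function name, assuming a POSIX awk.

```shell
# Hypothetical helper: print every block from a "start" line through the next "stop" line.
ranges() {
  awk '/^start/,/^stop/'
}

# The second range never sees a "stop" line, so it runs to the end of input.
printf 'a\nstart 1\nb\nstop 1\nc\nstart 2\nd\n' | ranges
```

It prints `start 1`, `b`, `stop 1`, `start 2`, and `d`; the lines a and c fall outside any range.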
