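Introduction to AWK
- Basic concepts
- Basic command form
- Command-line options
  - -f: specify an awk program file
  - -F: specify the field separator
    - Setting the RS variable
  - -v: variable assignment
    - Passing a shell variable into an awk program
- pattern
- Action blocks
- awk language basics
  - Ways to run awk
  - Built-in variables
  - Built-in functions
  - Data types
    - Implicit conversion
  - Arrays
    - Basic operations
    - Iteration
  - Statements
  - Control statements: if, for
  - Regular expressions
  - Operators
  - User-defined functions
- Miscellaneous
  - awk command-line arguments
  - Capturing the output of other shell commands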

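Introduction to AWK

Basic concepts

The awk language is described by Aho, Kernighan and Weinberger in The AWK Programming Language (1988). It is well suited to batch processing and formatting of data files and text. It provides variables and functions, ships with many built-in functions and variables, and has good support for regular expressions.

mawk is the awk interpreter used in the test environment for these notes.

Basic command form

awk 'pat1{handle1} pat2{handle2} ... ' file

pattern{handle block}: the basic building block of an awk program. As awk runs, the input stream is split into records by the RS (Record Separator) variable, and each record is tested against every pattern in turn; when a pattern matches, its action block runs. The default record separator is \n, and the default action is { print $0 }, which prints the current record.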

$ cat test.txt 
1 a
2 bc
3 def
4 ghij
5 klmno
$ awk 'BEGIN{ print "line word"} /^1/{print} /^3/{print} END{print "end..."}' test.txt 
# BEGIN and END are special patterns
line word
1 a
3 def
end...
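
The two defaults described above (a missing action prints the record, a missing pattern matches every record) can be sketched as follows. This is only an illustration, assuming a POSIX awk on the PATH; the sample lines are fed inline with printf instead of being read from test.txt, and the variable name out is arbitrary.

```shell
# Feed three sample records to awk and capture what it prints.
out=$(printf '1 a\n2 bc\n3 def\n' | awk '
/^2/                      # pattern only: the default action prints the record
{ n++ }                   # action only: runs for every record
END { print n " records" }
')
printf '%s\n' "$out"
```

The first rule prints only the record starting with 2, while the second rule counts all three records.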

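Command-line options

`-f`: specify an awk program file

awk -f awk_code_file file_to_handle... A simple awk program can be written directly on the command line, as in awk 'awk_code' file_to_handle...; a longer program can be saved to a file and passed with -f. See awk language basics > Ways to run awk.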
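
A minimal sketch of the -f form, assuming mktemp is available; the temporary file stands in for awk_code_file and its one-line program is made up for illustration.

```shell
# Write a tiny awk program to a temporary file, then run it with -f.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# print the second field of every record
{ print $2 }
EOF
out=$(printf '1 a\n2 bc\n' | awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"
```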

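`-F`: specify the field separator

-F sets the field separator FS, which controls how each record is split into $1, $2, ...; it does not change the record separator RS, whose default is "\n". That is why -F appeared to have no effect on record splitting on the test machine (mawk 1.3.3 @ ubuntu 16.04.10). To change how records are split, set RS instead.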
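
As a small illustration of what -F does affect, the sketch below splits colon-separated lines into fields. The sample lines are invented for the example; only POSIX awk behavior is assumed.

```shell
# -F: makes ":" the field separator, so each line has three fields.
out=$(printf 'root:x:0\nbin:x:1\n' | awk -F: '{ print $1, NF }')
printf '%s\n' "$out"
```

Records are still split on newlines here; only the field boundaries change.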

Setting the RS variable

The record separator can be changed by assigning to the RS variable directly.

$ cat test.txt 
1 a - 2 bc - 3 def - 4 ghij - 5 klmno
$ awk 'BEGIN{ RS="( - )?[0-9]"} {print($0)}' test.txt 
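# a regular expression is accepted as the separator (mawk and gawk support this; POSIX leaves a multi-character RS unspecified)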
 a
 bc
 def
 ghij
 klmno
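
For a version that does not rely on regex support in RS, a single-character separator should behave the same in any POSIX awk. A minimal sketch with made-up input:

```shell
# Split the input into records on ";" instead of newline.
out=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```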

`-v`: variable assignment

-v var=value assigns value to program variable var.

Passing a shell variable into an awk program

$ RS='[0-9]'
$ awk -v RS="${RS}" 'BEGIN{print RS} {print($0)}' test.txt 
[0-9]

 a - 
 bc - 
 def - 
 ghij - 
 klmno

$ awk 'BEGIN{print RS} {print($0)}' RS="$RS" test.txt 

 a - 
 bc - 
 def - 
 ghij - 
 klmno
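
The two transcripts above differ in what BEGIN sees: a -v assignment is made before the program starts, while a var=value operand is only processed when awk reaches it in the argument list, after BEGIN has run. A minimal sketch of that difference, using a hypothetical variable g and reading one dummy line from standard input (the "-" operand):

```shell
greeting='hello'
# -v: the value is already set inside BEGIN.
a=$(echo x | awk -v g="$greeting" 'BEGIN { print "BEGIN:[" g "]" } { print "main:[" g "]" }')
# operand assignment: BEGIN still sees an empty g; the main rule sees the value.
b=$(echo x | awk 'BEGIN { print "BEGIN:[" g "]" } { print "main:[" g "]" }' g="$greeting" -)
printf '%s\n%s\n' "$a" "$b"
```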

pattern

A pattern can be:

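BEGIN : optional; runs before any input is read. Typically used to initialize variables or print a header.

END : optional; runs after all input has been processed. Typically used to print summary results.

expression : a test on variables, or a regex match against the record.

expression , expression : the form pattern1, pattern2 is called a range pattern. It matches every input record from one that matches pattern1 through the next one that matches pattern2, inclusive. It cannot be combined with any other kind of pattern expression. If pattern1 matches again later in the input, the range reopens and that block is printed as well; see the post "Questions about awk's range-pattern behavior".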

$ seq 15 | awk '/1$/,/4$/{print $0}' # with regex endpoints the range reopens on later matches (here at 11); use with care
1
2
3
4
11
12
13
14
$ seq 15 | awk 'NR==1,NR==4{print $0}' # selecting a range of line numbers is straightforward
1
2
3
4
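
The plain expression form of a pattern can be a numeric comparison, a regex match on a field, or any other expression. A minimal sketch with inline sample data (the labels printed in front of each record are only there to show which rule fired):

```shell
out=$(printf '1 a\n2 bc\n3 def\n4 ghij\n' | awk '
$1 >= 3          { print "numeric:", $0 }
$2 ~ /^b/        { print "regex:", $0 }
length($2) == 1  { print "function:", $0 }
')
printf '%s\n' "$out"
```

Each record is tested against all three rules in order, so a record could be printed more than once if several patterns matched it.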

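Action blocks

awk language basics

The AWK Programming Language

Ways to run awk

awk 'simple code' file1 file2 ...: run a short program over several input files
awk -f awk_code_file file1 file2 ...: run a longer script over several input files
awk 'simple code' k1=v1 k2=v2 file1 file2 ...: pass variable assignments to the running awk program as command-line arguments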

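Built-in variables

Variable	Meaning	Notes
NR	number of the current record (the line number by default)	-
NF	number of fields in the current record	-
$0	the whole current record	-
$n	the n-th field of the current record	-

$ awk 'BEGIN{ print "line\tNF\tcontent"} {printf "%d\t%d\t%s\n", NR, NF, $2 } END{print "end ..."}' test.txt 
line    NF  content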
1   2   a
2   2   bc
3   2   def
4   2   ghij
5   2   klmno
end ...
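
Because NF is an ordinary number, $NF refers to the last field of the record regardless of how many fields it has. A small sketch with made-up input of varying width:

```shell
out=$(printf '1 a\n2 bc x\n' | awk '{ print NR ": " NF " fields, last=" $NF }')
printf '%s\n' "$out"
```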

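Built-in functions

Function	Description	Notes
`length(str)`	length of the string	-
`index(str, subStr)`	1-based position of subStr in str, 0 if absent	-
`split(str, arr, delimiter)`	split str into arr (indices start at 1); returns the number of pieces	-
`substr(str, start, length)`	substring of at most length characters starting at position start (1-based)	-
`sub(regex, replace_str, string)`	replace the first regex match in string	-
`gsub(regex, replace_str, string)`	replace every regex match in string	-
`match(string, regex)`	regex match; returns the match position and sets RSTART and RLENGTH	-
`printf( format, ... )`	formatted output	-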

$ cat test.awk 
BEGIN{ 
    print( "line\t" "$2\t" "length\t" "index of a\t" "substr(1,3)") 
} 

{ # show the basic usage of length, index and substr
    printf("%d\t" "%s\t" "%d\t" "%d\t\t" "%s" "\n", 
        NR, 
        $2, 
        length($2), 
        index($0,"a"), 
        substr($0,1,3) );   # third argument is a length, not an end index
}

{ # show the basic usage of split, match and sub
    if( match( $0, "h.*$" ) ){
       split( $0, arr, "h" );               # arr is indexed from 1
       mat = substr($0, RSTART, RLENGTH);   # the text matched by match()
       printf("get character %s, %s -- %s\n", mat, arr[1], arr[2] );
       replace = "-" mat;
       sub( "h.*$", replace, $0);
       print($0); 
    }
}

{ # show the basic usage of gsub
    printf( "%s ==> ", $0 )
    gsub( "[a-z]", "-", $0 );
    print($0);
}

END{
    print("end ...")
}
$ awk -f test.awk test.txt 
line    $2  length  index of a  substr(1,3)
1   a   1   3       1 a
1 a ==> 1 -
2   bc  2   0       2 b
2 bc ==> 2 --
3   def 3   0       3 d
3 def ==> 3 ---
4   ghij    4   0       4 g
get character hij, 4 g -- ij
4 g-hij
4 g-hij ==> 4 -----
5   klmno   5   0       5 k
5 klmno ==> 5 -----
end ...
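
The string functions all count from 1: split fills arr[1], arr[2], ..., and match reports the start and length of the match in RSTART and RLENGTH. A minimal sketch on the single record "4 ghij", with hypothetical variable names parts, n and m:

```shell
out=$(echo '4 ghij' | awk '{
    n = split($0, parts, "h")           # n is 2; the pieces are "4 g" and "ij"
    if (match($0, /h.*$/))              # sets RSTART to 4 and RLENGTH to 3
        m = substr($0, RSTART, RLENGTH) # the matched text
    print n, parts[1] "|" parts[2], RSTART, RLENGTH, m
}')
printf '%s\n' "$out"
```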

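Data types

Numeric constants: integers, floating point, and scientific notation are supported; all numeric computation is done in floating point. A true boolean result has the value 1.0.
String constants: enclosed in double quotes; special characters must be escaped.
Variables: a user-defined variable has both a numeric value and a string value, converted automatically as the computation requires. It is created on first reference with the initial value null, which is 0 as a number and "" as a string.

Implicit conversion

The type of an expression is determined by its context, and automatic type conversion occurs if needed. For example, to evaluate the statements
y = x + 2  ;  z = x  "hello"
x is converted to a number for the addition because 2 is numeric, so y is numeric; x is converted to a string for the concatenation because "hello" is a string, so z is a string.
In mawk, strings are converted to numbers with the C function atof, and numbers to strings with sprintf.
In boolean contexts such as if ( expr ) statement, a string expression evaluates true if and only if it is not the empty string ""; a numeric value is true if and only if it is not numerically zero. In short, the empty string and 0 are false and everything else is true.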
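
The conversion rules can be checked directly in a BEGIN block. The sketch below is only an illustration; note that the string constant "0" is a non-empty string, so it counts as true even though the number 0 is false.

```shell
out=$(awk 'BEGIN {
    x = "3"
    y = x + 2              # numeric context: x is used as the number 3
    z = x "hello"          # string context: x is used as the string "3"
    print y, z
    print "empty string:", ("" ? "true" : "false")
    print "number 0:", (0 ? "true" : "false")
    print "string 0:", ("0" ? "true" : "false")
}')
printf '%s\n' "$out"
```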

Arrays

Basic operations

$ seq 5 | awk '{arr[NR]=$0} END{ for(l in arr) print l "-" }'
1-
2-
3-
4-
5-

Iteration

for ( var in array ) statement
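
awk arrays are associative, so the index can be any string, and the order in which for ( var in array ) visits the indices is not specified. Beyond assignment and iteration, the usual operations are the in membership test and delete. A minimal sketch that counts repeated words, with hypothetical names count, has_a and has_b:

```shell
out=$(printf 'a\nb\na\n' | awk '
{ count[$1]++ }                 # elements spring into existence at 0
END {
    delete count["b"]           # remove one element
    has_a = ("a" in count)      # 1 if the index exists
    has_b = ("b" in count)      # 0 after the delete
    print has_a, has_b, count["a"]
}')
printf '%s\n' "$out"
```

Using in for the test matters: merely referencing count["b"] would create the element again.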

Statements

Statement syntax follows C; # starts a comment that runs to the end of the line.

Control statements: if, for

if ( expr ) statement

if ( expr ) statement else statement

while ( expr ) statement

do statement while ( expr )

for ( opt_expr ; opt_expr ; opt_expr ) statement

for ( var in array ) statement

continue

break
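
The forms listed above behave as they do in C. A short sketch that exercises for, if, continue, break, while and do ... while in one BEGIN block (the numbers are arbitrary):

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 6; i++) {
        if (i % 2) continue      # skip odd numbers
        if (i > 4) break         # stop before printing 6
        printf "%d ", i
    }
    n = 3
    while (n > 0) printf "%d ", n--
    do { printf "once" } while (0)   # the body runs once even though the test is false
    printf "\n"
}')
printf '%s\n' "$out"
```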

Regular expressions

Linux basics: regular expressions (grep, sed, awk)

Regex match operator: str ~ /pattern/
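
The ~ operator and its negation !~ can be applied to any field or string, not just the whole record. A minimal sketch with inline sample data:

```shell
out=$(printf '1 a\n2 bc\n3 def\n' | awk '
$2 ~ /^[a-c]+$/  { print $1, "matches" }
$2 !~ /^[a-c]+$/ { print $1, "does not match" }
')
printf '%s\n' "$out"
```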

Operators

assignment = += -= *= /= %= ^=

conditional ? :

logical or ||

logical and &&

array membership in

matching ~ !~

relational < > <= >= == !=

concatenation (no explicit operator)

add ops + -

mul ops * / %

unary + -

logical not !

exponentiation ^

inc and dec ++ -- (both post and pre)

field $
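
The list above follows the mawk manual, which gives the operators in order of increasing precedence. A few consequences that tend to surprise are sketched below; the expressions are chosen only for illustration.

```shell
out=$(awk 'BEGIN {
    print 1 " " 2 + 3      # + binds tighter than concatenation: "1 5"
    print 2 ^ 3 ^ 2        # ^ is right-associative: 2^(3^2) = 512
    print -2 ^ 2           # ^ binds tighter than unary minus: -4
    print 1 + 2 * 3        # 7
}')
printf '%s\n' "$out"
```

Since concatenation has no explicit operator, parenthesizing arithmetic inside a concatenation is the safer habit.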

User-defined functions

$ cat fun_test.awk 
function hello(n) { return "hello " n }
{ print( hello($0)) }
$ seq 2 | awk -f fun_test.awk 
hello 1
hello 2
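
awk has no declaration for local variables; the common convention is to list them as extra parameters that callers never pass. A minimal sketch with a hypothetical fact function, showing that the global i is left untouched:

```shell
out=$(awk '
function fact(n,    i, r) {   # i and r are locals by convention
    r = 1
    for (i = 2; i <= n; i++) r *= i
    return r
}
BEGIN { i = 99; print fact(5), i }')
printf '%s\n' "$out"
```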

Miscellaneous

awk command-line arguments

$ awk 'BEGIN{print RS} {print($0)}' RS="$RS" test.txt # RS is passed as a command-line argument

 a - 
 bc - 
 def - 
 ghij - 
 klmno

Capturing the output of other shell commands

echo | awk '{ "expression_shell_cmd" | getline cmdout; statement... }'

Of marginal use in practice.
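
For completeness, a runnable version of the cmd | getline form might look like the sketch below. The command string is a placeholder, the variable names cmd and line are arbitrary, and running everything in BEGIN avoids the need for the echo | prefix.

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello from the shell"
    if ((cmd | getline line) > 0)   # getline returns 1 when a line was read
        print "got: " line
    close(cmd)                      # release the pipe so cmd can be run again
}')
printf '%s\n' "$out"
```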
