awk array example 1: counting and sorting the client IPs in a web access log

[root@localhost logs]# cat access_log

10.12.29.250 - - [10/Oct/2017:10:41:19 +0800] "GET /favicon.ico HTTP/1.1" 404 209

10.12.29.250 - - [10/Oct/2017:10:41:49 +0800] "GET /favicon.ico HTTP/1.1" 404 209

10.12.29.250 - - [10/Oct/2017:10:42:00 +0800] "GET /kaoshi HTTP/1.1" 301 234

10.12.29.250 - - [10/Oct/2017:10:43:09 +0800] "GET / HTTP/1.1" 200 2271

10.12.29.250 - - [10/Oct/2017:10:43:24 +0800] "GET /kaoshi HTTP/1.1" 301 234

10.12.29.250 - - [10/Oct/2017:10:44:43 +0800] "GET /kaoshi HTTP/1.1" 301 234

10.11.37.15 - - [10/Oct/2017:10:46:26 +0800] "GET / HTTP/1.1" 200 2271

10.11.37.15 - - [10/Oct/2017:10:46:27 +0800] "GET /favicon.ico HTTP/1.1" 404 209

10.11.37.15 - - [10/Oct/2017:10:46:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209

10.12.29.250 - - [10/Oct/2017:10:46:53 +0800] "GET /kaoshi/ HTTP/1.1" 200 528

...................................

Method 1:

Use awk to extract the first column, then sort to place identical lines next to each other (-r reverses the order, -n sorts numerically), and uniq to de-duplicate adjacent lines (-c also prints how many times each line repeated):

[root@localhost logs]# awk '{print $1}' access_log |sort | uniq -c | sort -rn | head  -5

25 10.13.61.250

11 10.13.15.134

2 10.13.65.251

2 10.11.45.198

1 10.12.46.235

[root@localhost logs]#
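The same pipeline can be tried on a tiny inline sample (the IPs below are made up for illustration, not taken from the log above); note that uniq -c left-pads its counts with spaces:

```shell
# Count occurrences of each line: sort groups duplicates, uniq -c counts
# adjacent duplicates, and the final sort -rn ranks by count, descending.
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' | sort | uniq -c | sort -rn
```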

Method 2: awk arrays

[root@localhost logs]# awk '{array[$1]++}END{for(key in array) print array[key],key }' access_log | sort -rn | head -5

25 10.13.61.250

11 10.13.15.134

2 10.13.65.251

2 10.11.45.198

1 10.12.46.235

[root@localhost logs]#

Explanation: awk splits fields on whitespace by default, and in access_log the first column is the client IP. The command uses $1 (the IP) as the subscript of the array `array`, and the value stored under that subscript is the number of times the IP has been seen: array[$1]++ adds 1 each time a line with that IP is read. For example, array["10.13.15.134"] starts at 0 and is incremented once for every line carrying that IP. After the whole file has been read, the END block's for (key in array) loop walks the array and prints array[key] (the count) followed by key (the IP).
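The array idiom described above can be verified against a small inline sample (again with made-up IPs); the trailing sort -rn is still needed because for-in traversal order is unspecified in awk:

```shell
# Each line increments array[$1]; the END block prints "count ip" pairs.
printf '10.0.0.1 - - req\n10.0.0.2 - - req\n10.0.0.1 - - req\n' |
  awk '{array[$1]++} END{for (key in array) print array[key], key}' |
  sort -rn
```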

[root@localhost logs]# awk '{array[$1]++}END{for(key in array) if (array[key] > 1000){print array[key],key} }' access_log

2317 10.11.37.15

1556 10.11.33.16

1156 10.13.48.251

1681 10.11.43.120

1356 10.12.29.250

2441 10.11.21.17

1208 10.11.35.12

1220 10.13.15.235

1038 10.12.36.247

1711 10.12.21.252

1262 10.10.41.163

1278 10.11.42.26

1501 10.10.41.68

[root@localhost logs]#
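The threshold filter above can be tried with a smaller cutoff on inline sample keys (the keys a/b/c here are placeholders standing in for IPs):

```shell
# Keys appearing more than once pass the filter in the END block;
# a counted 3 times and b counted twice qualify, c (once) does not.
printf 'a\nb\na\na\nb\nc\n' |
  awk '{count[$1]++} END{for (k in count) if (count[k] > 1) print count[k], k}' |
  sort -rn
```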
