[root@localhost logs]# cat access_log
10.12.29.250 - - [10/Oct/2017:10:41:19 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:41:49 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:42:00 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:43:09 +0800] "GET / HTTP/1.1" 200 2271
10.12.29.250 - - [10/Oct/2017:10:43:24 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:44:43 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.11.37.15 - - [10/Oct/2017:10:46:26 +0800] "GET / HTTP/1.1" 200 2271
10.11.37.15 - - [10/Oct/2017:10:46:27 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.11.37.15 - - [10/Oct/2017:10:46:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:46:53 +0800] "GET /kaoshi/ HTTP/1.1" 200 528
...................................
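Method 1:
Use awk to extract the first column (the client IP). sort puts identical IPs on adjacent lines. uniq -c collapses each run of duplicates into one line prefixed with its count. sort -rn then orders those lines numerically (-n) and in descending order (-r), and head -5 keeps the top five.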
[root@localhost logs]# awk '{print $1}' access_log |sort | uniq -c | sort -rn | head -5
25 10.13.61.250
11 10.13.15.134
2 10.13.65.251
2 10.11.45.198
1 10.12.46.235
[root@localhost logs]#
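Method 1 can also be wrapped in a small reusable function. The following is a minimal sketch. It assumes a log in Common Log Format with the client IP in field 1. The name top_ips is hypothetical, and the sample file is just the excerpt shown at the top of this note.

```shell
#!/bin/sh
# Sketch: Method 1 wrapped in a reusable function.
# top_ips is a hypothetical helper name, not part of any tool.
top_ips() {
    log=$1          # access log, client IP assumed to be field 1
    n=${2:-5}       # number of IPs to report, default 5
    awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n "$n"
}

# Throwaway sample built from the log excerpt above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
10.12.29.250 - - [10/Oct/2017:10:41:19 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:41:49 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:42:00 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:43:09 +0800] "GET / HTTP/1.1" 200 2271
10.12.29.250 - - [10/Oct/2017:10:43:24 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:44:43 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.11.37.15 - - [10/Oct/2017:10:46:26 +0800] "GET / HTTP/1.1" 200 2271
10.11.37.15 - - [10/Oct/2017:10:46:27 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.11.37.15 - - [10/Oct/2017:10:46:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:46:53 +0800] "GET /kaoshi/ HTTP/1.1" 200 528
EOF

top_ips "$sample" 5
```

On this ten-line sample, the function reports 7 requests for 10.12.29.250 and 3 for 10.11.37.15. On the real log, top_ips access_log 5 should match the pipeline above.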
Method 2: awk associative array
[root@localhost logs]# awk '{array[$1]++}END{for(key in array) print array[key],key }' access_log |sort -rn | head -5
25 10.13.61.250
11 10.13.15.134
2 10.13.65.251
2 10.11.45.198
1 10.12.46.235
[root@localhost logs]#
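Explanation: by default awk splits each line into fields on whitespace, and in access_log the first field is the client IP. The IP ($1) is used as the subscript (key) of the array named array, and the element's value counts how many times that IP appears. For example, array["10.13.15.134"] starts out unset, which awk treats as 0. It is incremented by 1 each time a line containing that IP is read. After the last line, the END block walks the array with for (key in array) and prints each count, array[key], next to its IP, key. Because for (key in array) visits keys in no particular order, the output is piped to sort -rn.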
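To see that the increment happens once per input line, and not while the array is being walked, the counter can be printed as it grows. This is an illustrative sketch only. The helper name trace_counts and the four-line sample are hypothetical.

```shell
#!/bin/sh
# Sketch: show when array[$1]++ fires (once per input line) and what END sees.
# trace_counts is a hypothetical helper name used only for illustration.
trace_counts() {
    awk '{
        array[$1]++    # runs for every line; an unset element starts at 0
        printf "line %d: array[\"%s\"] = %d\n", NR, $1, array[$1]
    }
    END {
        # for-in visits keys in an unspecified order, hence sort -rn in Method 2
        for (key in array) print "END:", array[key], key
    }' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
10.11.37.15 - - [10/Oct/2017:10:46:26 +0800] "GET / HTTP/1.1" 200 2271
10.11.37.15 - - [10/Oct/2017:10:46:27 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.11.37.15 - - [10/Oct/2017:10:46:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:46:53 +0800] "GET /kaoshi/ HTTP/1.1" 200 528
EOF

trace_counts "$sample"
```

The "line" rows show array["10.11.37.15"] climbing 1, 2, 3 as its lines are read. The END rows then report the final totals (3 and 1), in whatever order awk happens to visit the keys.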
To list only the IPs with more than 1000 requests, add a condition inside the END loop:
[root@localhost logs]# awk '{array[$1]++}END{for(key in array) if (array[key] > 1000){print array[key],key} }' access_log
2317 10.11.37.15
1556 10.11.33.16
1156 10.13.48.251
1681 10.11.43.120
1356 10.12.29.250
2441 10.11.21.17
1208 10.11.35.12
1220 10.13.15.235
1038 10.12.36.247
1711 10.12.21.252
1262 10.10.41.163
1278 10.11.42.26
1501 10.10.41.68
[root@localhost logs]#
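The 1000 threshold is hard-coded above, and the matching IPs come out unsorted. The sketch below passes the threshold in with awk -v and sorts the result by count. The helper name busy_ips is hypothetical, and the sample is the excerpt at the top of this note.

```shell
#!/bin/sh
# Sketch: the "more than N requests" filter with N passed in via awk -v,
# sorted by count (the for-in loop alone prints in arbitrary order).
# busy_ips is a hypothetical helper name.
busy_ips() {
    awk -v min="$2" '
        { count[$1]++ }                      # tally requests per client IP
        END { for (ip in count) if (count[ip] > min) print count[ip], ip }
    ' "$1" | sort -rn
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
10.12.29.250 - - [10/Oct/2017:10:41:19 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:41:49 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:42:00 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:43:09 +0800] "GET / HTTP/1.1" 200 2271
10.12.29.250 - - [10/Oct/2017:10:43:24 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.12.29.250 - - [10/Oct/2017:10:44:43 +0800] "GET /kaoshi HTTP/1.1" 301 234
10.11.37.15 - - [10/Oct/2017:10:46:26 +0800] "GET / HTTP/1.1" 200 2271
10.11.37.15 - - [10/Oct/2017:10:46:27 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.11.37.15 - - [10/Oct/2017:10:46:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209
10.12.29.250 - - [10/Oct/2017:10:46:53 +0800] "GET /kaoshi/ HTTP/1.1" 200 528
EOF

busy_ips "$sample" 2
```

On the real log, busy_ips access_log 1000 should list the same IPs as the command above, ordered from busiest to least busy. The comparison is strictly greater than, so an IP with exactly N requests is excluded.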