awk practice exercises

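1. The file ip_list.txt has the format shown below. Extract the host name in front of ".magedu.com" from each line and append the result to the same file: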

1 blog.magedu.com 
2 www.magedu.com 
...
999 study.magedu.com 
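# Collect the host names in a temporary file first, so awk never reads the lines it is appending
awk -F'[ .]' '{print $2}' ip_list.txt > ip_list.tmp && cat ip_list.tmp >> ip_list.txt && rm -f ip_list.tmp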
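
To see why `$2` holds the host name, it can help to run the same field separator on a couple of sample lines. This is only a sketch: the function name `extract_hosts` is hypothetical, and it reads standard input rather than ip_list.txt.

```shell
# Hypothetical helper: print the host-name part of lines such as "1 blog.magedu.com".
# With -F'[ .]' every space or dot is a separator, so the fields are:
#   $1 = "1", $2 = "blog", $3 = "magedu", $4 = "com"
extract_hosts() {
    awk -F'[ .]' '{print $2}'
}

printf '%s\n' '1 blog.magedu.com' '2 www.magedu.com' | extract_hosts
```

Reading from standard input keeps the helper independent of the file it would later be appended to.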

2. Count how many times each filesystem type appears in /etc/fstab.

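# Match entries that start with "/" or "UUID="; [\/UUID] would be a character class, not the word UUID
root@qqq:~# awk '/^(\/|UUID=)/ {fs[$3]++} END {for (a in fs) print a,fs[a]}' /etc/fstab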

root@qqq:~# awk '/^[^#]/ {print $3}' /etc/fstab | sort | uniq -c
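
The counting idiom above can be tried without touching the real /etc/fstab. The sketch below uses a made-up sample file and a hypothetical helper name, `count_fstypes`; it skips blank lines and comments instead of matching specific line prefixes, which is an assumption about what should count as an entry.

```shell
# Hypothetical helper: count filesystem types in an fstab-style file.
count_fstypes() {
    # Skip blank lines and comment lines; field 3 is the filesystem type.
    awk '!/^[ \t]*(#|$)/ {fs[$3]++} END {for (t in fs) print t, fs[t]}' "$1" | sort
}

# Made-up sample data, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# /etc/fstab: static file system information.
UUID=aaaa-1111 /     ext4 defaults 0 1
UUID=bbbb-2222 /boot ext4 defaults 0 2
/dev/sdb1      /data xfs  defaults 0 0
/swapfile      none  swap sw       0 0
EOF
count_fstypes "$sample"
rm -f "$sample"
```

The trailing `sort` is there because awk does not guarantee any order for `for (t in fs)`.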

3. Count how many times each word appears in /etc/fstab.

root@qqq:~# awk  '{for(i=1;i<=NF;i++)word[$i]++}END{for (a in word) print a,word[a]}'   /etc/fstab 
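
The same per-field counting can be checked on a tiny input. This is a sketch with a hypothetical helper, `word_counts`, that reads standard input and sorts the result by frequency so the output order is predictable.

```shell
# Hypothetical helper: count whitespace-separated words on standard input.
word_counts() {
    # n[$i]++ creates the array entry on first sight and increments it afterwards.
    awk '{for (i = 1; i <= NF; i++) n[$i]++} END {for (w in n) print n[w], w}' |
        sort -k1,1nr -k2    # highest count first, ties broken by word
}

printf '%s\n' 'a b a' 'c a' | word_counts
```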

4. Extract all the digits from the string Yd$C@M05MB%9Bdh7dq+YVixp3vpw.

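# Single quotes keep the shell from expanding $C; grep -o prints each digit on its own line
root@qqq:~# echo 'Yd$C@M05MB%9Bdh7dq+YVixp3vpw' | grep -o '[0-9]'
# gsub replaces every non-digit character with "", then the remaining text is printed
root@qqq:~# echo 'Yd$C@M05MB%9Bdh7dq+YVixp3vpw' | awk '{gsub(/[^0-9]/,"");print $0}'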
05973
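
For comparison, the same result can be obtained without awk. This is a minimal sketch using `tr`; the helper name `digits_only` is hypothetical.

```shell
# Hypothetical helper: keep only the digits of its first argument.
digits_only() {
    # -c complements the set and -d deletes, so everything except 0-9 and newline is removed.
    printf '%s\n' "$1" | tr -cd '0-9\n'
}

digits_only 'Yd$C@M05MB%9Bdh7dq+YVixp3vpw'   # prints 05973
```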

5. A file records 5000 random integers between 1 and 100000, stored in the format 100,50,35,89,... Find the largest and the smallest integer.

# Generate 5000 random integers (repeats allowed) as one comma-separated line
root@qqq:~# shuf -r -i 1-100000 -n 5000 | paste -sd, - > num.txt
# Find the smallest and the largest value
root@qqq:~# awk -F, '{min=max=$1; for(i=2;i<=NF;i++){if($i+0>max)max=$i; if($i+0<min)min=$i} print min,max}' num.txt
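
A slightly more defensive variant of the min/max scan is sketched below. It assumes the input may end with a trailing comma or span several lines, which the exercise does not state; the helper name `min_max` is hypothetical.

```shell
# Hypothetical helper: print "min max" for comma-separated integers on standard input.
min_max() {
    awk -F, '{
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue          # ignore empty fields, e.g. after a trailing comma
            v = $i + 0                      # force a numeric value
            if (!seen || v < min) min = v
            if (!seen || v > max) max = v
            seen = 1
        }
    }
    END { if (seen) print min, max }'
}

echo '100,50,35,89,' | min_max   # prints 35 100
```

Tracking a `seen` flag avoids seeding `min` and `max` from a field that might be empty.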

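6. Production case, mitigating a DoS attack: based on the web log or the number of network connections, check whether any IP reaches 100 concurrent connections or 100 page views within a short period; if so, call the firewall command to block that IP. The check runs every 5 minutes, and the firewall command is iptables -A INPUT -s IP -j REJECT.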

# -n keeps addresses numeric; block only the IPs that have at least 100 TCP connections
netstat -tn | awk '/^tcp/ {split($5,tmps,":"); ss[tmps[1]]++} END {for (i in ss) if (ss[i] >= 100) system("iptables -A INPUT -s " i " -j REJECT")}'

vim deny_dos.sh
#!/bin/bash
while true; do
    # Print every client IP that has 100 or more requests in the access log
    awk '/^[0-9]/{IP[$1]++}END{for(i in IP){if (IP[i]>=100)print i}}' /var/log/httpd/access_log | while read -r line; do
        iptables -A INPUT -s "$line" -j REJECT
    done
    sleep 300
done
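
Before letting a script like this change firewall rules, it may be worth checking which addresses it would block. The sketch below is a dry run under stated assumptions: the helper name `ips_over_threshold` is hypothetical, the log path is the one used above, and the commands are only echoed, never executed.

```shell
# Hypothetical helper: list client IPs with at least $2 requests in access log $1.
ips_over_threshold() {
    awk -v limit="$2" '/^[0-9]/ {hits[$1]++}
        END {for (ip in hits) if (hits[ip] >= limit) print ip}' "$1" | sort
}

# Dry run: print the firewall commands instead of running them.
log=/var/log/httpd/access_log
if [ -r "$log" ]; then
    ips_over_threshold "$log" 100 | while read -r ip; do
        echo iptables -A INPUT -s "$ip" -j REJECT
    done
fi
```

Passing the threshold with `-v limit=...` keeps the awk program itself free of hard-coded numbers.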

7. Extract the FQDN from each line of the following file and list the FQDNs by number of occurrences, from highest to lowest:

http://mail.magedu.com/index.html

http://www.magedu.com/test.html

http://study.magedu.com/index.html

http://blog.magedu.com/index.html

http://www.magedu.com/images/logo.jpg

http://blog.magedu.com/20080102.html

[root@textbox ~]# sed -rn 's#.*//([^/]+)/.*#\1#p' 1.log | sort | uniq -c | sort -rn
[root@textbox ~]# awk -F'/+' '/^http/ {DOMAIN[$2]++} END {for(i in DOMAIN) print DOMAIN[i],i}' 1.log | sort -rn
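
The awk approach can be run directly against the six sample URLs. This is a sketch: the helper name `count_fqdns` is hypothetical, and the second sort key is added only to make the order of ties deterministic.

```shell
# Hypothetical helper: count host names in a list of URLs read from standard input.
count_fqdns() {
    # With -F'/+' the string "http://host/path" splits into "http:", "host", "path", ...
    awk -F'/+' '/^http/ {n[$2]++} END {for (d in n) print n[d], d}' |
        sort -k1,1nr -k2    # highest count first, ties broken by name
}

count_fqdns <<'EOF'
http://mail.magedu.com/index.html
http://www.magedu.com/test.html
http://study.magedu.com/index.html
http://blog.magedu.com/index.html
http://www.magedu.com/images/logo.jpg
http://blog.magedu.com/20080102.html
EOF
```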

8. Using inode as the key in the following text, sum the counts of all rows that share an inode, and find the smallest beginnumber and the largest endnumber for each inode:

inode|beginnumber|endnumber|counts|
106|3363120000|3363129999|10000|
106|3368560000|3368579999|20000|
310|3337000000|3337000100|101|
310|3342950000|3342959999|10000|
310|3362120960|3362120961|2|
311|3313460102|3313469999|9898|
311|3313470000|3313499999|30000|
311|3362120962|3362120963|2|

The output should have this format:

310|3337000000|3362120961|10103|
311|3313460102|3362120963|39900|
106|3363120000|3368579999|30000|

# The empty last field in print produces the trailing "|" required by the output format
[root@textbox ~]# awk -F'|' -v OFS='|' '/^[0-9]/{inode[$1]++; if(!bn[$1]){bn[$1]=$2}else if(bn[$1]>$2) {bn[$1]=$2}; if(en[$1]<$3)en[$1]=$3;cnt[$1]+=$(NF-1)} END{for(i in inode)print i,bn[i],en[i],cnt[i],""}' 1.log
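
The aggregation can also be written out in a more readable multi-line form. This is a sketch with a hypothetical helper name, `aggregate_inodes`; it sorts the result by inode, which is an assumption, since awk does not guarantee the iteration order the exercise output happens to show.

```shell
# Hypothetical helper: per inode, print min beginnumber, max endnumber and total counts.
aggregate_inodes() {
    awk -F'|' -v OFS='|' '
        /^[0-9]/ {
            id = $1
            if (!(id in cnt) || $2 + 0 < bn[id]) bn[id] = $2   # smallest beginnumber
            if (!(id in cnt) || $3 + 0 > en[id]) en[id] = $3   # largest endnumber
            cnt[id] += $4                                      # running total of counts
        }
        END { for (id in cnt) print id, bn[id], en[id], cnt[id], "" }
    ' | sort -t'|' -k1,1n
}

aggregate_inodes <<'EOF'
inode|beginnumber|endnumber|counts|
106|3363120000|3363129999|10000|
106|3368560000|3368579999|20000|
310|3337000000|3337000100|101|
310|3342950000|3342959999|10000|
310|3362120960|3362120961|2|
311|3313460102|3313469999|9898|
311|3313470000|3313499999|30000|
311|3362120962|3362120963|2|
EOF
```

The header line is skipped because it does not start with a digit.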

9. Use awk to count the current host's concurrent connections.

[root@192-168-38-140 ~]# netstat -tan | awk '/^tcp/ {++state[$NF]} END {for(key in state) print key,"\t",state[key]}'
LISTEN   8
ESTABLISHED      6
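
On hosts where netstat is not installed, the same tally can be sketched with `ss` from iproute2. This swaps in a different tool from the one used above: `ss -tan` puts the state in the first column and abbreviates some names (for example ESTAB), and the helper name `count_states` is hypothetical.

```shell
# Hypothetical helper: count connection states from "ss -tan"-style output on standard input.
count_states() {
    # NR > 1 skips the header line; the state is the first column.
    awk 'NR > 1 {state[$1]++} END {for (k in state) print k, state[k]}' | sort
}

# Only run the live query when ss is available.
if command -v ss >/dev/null 2>&1; then
    ss -tan | count_states
fi
```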

10. Use awk to calculate the total size of the files in a directory.

[root@192-168-38-140 ~]# ll  | awk 'BEGIN {sum=0} {sum+=$5} END {print sum}'
12529
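
Because `ll` is an alias and its listing also includes directories, a variant that counts only regular files can be sketched with `find`. This assumes GNU find (for `-printf`), and the helper name `dir_size` is hypothetical.

```shell
# Hypothetical helper: total size in bytes of the regular files directly inside a directory.
dir_size() {
    # %s prints each file's size in bytes; "sum + 0" prints 0 for an empty directory.
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\n' | awk '{sum += $1} END {print sum + 0}'
}

dir_size .
```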

11. List the top 10 IPs by number of requests in the Apache access log.

`awk '{a[$1] += 1} END {for (i in a) printf("%d %s\n", a[i], i)}' LOG_FILE | sort -rn | head -n 10`

12. The nginx access.log looks like the following. Using shell, list the top 10 IPs by number of requests whose status code is 200:

172.18.116.232 - - [18/May/2018:00:20:29 -0400] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36" "-" 

172.18.116.232 - - [18/May/2018:00:20:29 -0400] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36" "-" 
awk '$9 == 200 {ip[$1]++} END {for (i in ip) print ip[i], i}' access.log | sort -rn | head -n 10
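
Since the two sample lines above both carry status 304, the filter is easier to verify on a small made-up log. The sketch below is illustrative only: the helper name `top_ips_200` and the sample entries are hypothetical, and it assumes the default log layout in which the status code is field 9.

```shell
# Hypothetical helper: top 10 client IPs by number of requests with status 200.
top_ips_200() {
    awk '$9 == 200 {ip[$1]++} END {for (i in ip) print ip[i], i}' "$1" |
        sort -k1,1nr -k2 | head -n 10
}

# Made-up sample log, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [18/May/2018:00:20:29 -0400] "GET / HTTP/1.1" 200 612 "-" "curl" "-"
10.0.0.2 - - [18/May/2018:00:20:30 -0400] "GET / HTTP/1.1" 304 0 "-" "curl" "-"
10.0.0.1 - - [18/May/2018:00:20:31 -0400] "GET /a HTTP/1.1" 200 100 "-" "curl" "-"
10.0.0.3 - - [18/May/2018:00:20:32 -0400] "GET /b HTTP/1.1" 200 100 "-" "curl" "-"
EOF
top_ips_200 "$sample"
rm -f "$sample"
```

Comparing with `$9 == 200` avoids the partial matches that a regular expression such as `/200/` would allow.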