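This post is labeled as original, but it actually relies on a lot of help from friends, and I would like to thank them here!
The day before yesterday a developer colleague asked me to help analyze an nginx access log. I used AWStats to turn it into charts, only to be told he did not want charts, just four values from the access log... (could have said so earlier). I took a look at the nginx log format; here is an excerpt: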
124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "uid=355288&cuid=355287&timestamp=1264484517&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -
124.240.39.49 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "cid=2&lid=4&oid=2&action=researchLayer&cuid=496990&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18" 200 219 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30618)" -
121.236.118.126 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "check=8ec1521fc3df9c03d83af9a4d933dbb0&cuid=509590&timestamp=1264398703&oid=2&action=oreInfo" 200 261 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 1.7; .NET CLR 2.0.50727)" -
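My colleague wanted four things from each line: the IP address, the time, and the values of cuid= and action=.
The log looks messy, but there is a pattern to it. Many lines have neither action nor cuid, so I filtered those out first: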
awk '/action/{print $0}' access.log > action.log
Any line that has action is certain to have cuid as well, so filtering on action alone is enough.
Now every remaining line has both cuid and action.
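The claim that every action line also carries cuid can be checked rather than taken on faith. Below is a minimal sketch, assuming the parameters appear literally as action= and cuid= in the request body. The function name count_action_without_cuid and the three sample lines are made up for illustration and are not from the real log.

```shell
# Count lines that contain "action=" but no "cuid=".
# A result of 0 supports filtering on action alone.
count_action_without_cuid() {
    awk '/action=/ && !/cuid=/ { n++ } END { print n + 0 }' "$@"
}

# Made-up sample: one complete line, one missing cuid, one unrelated request.
printf '%s\n' \
    '1.1.1.1 "uid=1&cuid=2&action=plantInfo" 200' \
    '2.2.2.2 "uid=3&action=initData" 200' \
    '3.3.3.3 "GET /index.html" 200' |
    count_action_without_cuid
```

On a real log the function can be given the file name directly, for example count_action_without_cuid access.log. Note that this matches action= rather than the bare word action, which is slightly stricter than the filter used above.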
Next, I reworked the format to make it easier to read:
awk -F "[ '&''[']" '{print $1"\t"$5"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15}' action.log > newlog
This is a bit cumbersome, but it does make things clearer. Here is the result:
117.83.131.36 25/Jan/2010:14:31:34 "uid=438824 cuid=511252 timestamp=1264401079 check=fbb9ad922f01888e6c0757d117bf304e action=plantInfo" 200
221.9.32.181 25/Jan/2010:14:31:34 "cuid=506517 action=plantInfo timestamp=1264401075 check=01661377f346538eba790e856dd3713a uid=539860" 200
221.178.128.146 25/Jan/2010:14:31:34 "timestamp=1264401105 check=7d5e41feeb3ae0482e1fe990f27ddc67 cuid=303367 display=1 action=plantInfuid=303367"
124.131.80.68 25/Jan/2010:14:31:34 "cuid=393678 timestamp=1264401093 action=checkResearchLayer check=2f2cc50cc99aa9e05f02b6f6a47cbef6"200 765
125.107.199.28 25/Jan/2010:14:31:34 "timestamp=1264401094 oid=4 uid=350003 action=oreInfo check=5d835e252b841c86da041b8b63b4b67e cuid=356549"
111.167.145.209 25/Jan/2010:14:31:34 "action=plantInfo cuid=154228 timestamp=1264401094 check=5d835e252b841c86da041b8b63b4b67e uid=372981" 200
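At this point I got stuck. The columns that hold cuid and action are not fixed, so a simple awk filter will not do; it takes awk loops and conditionals, which I had never used. So I asked for help in a chat group, and two friends replied: 辉太郎 and jeremy.zhang.
Their solutions were different: one used a Perl script, the other used awk directly.
Let's start with the Perl one. I don't really know Perl, so I will just paste the script he wrote: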
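#!/usr/bin/perl -w
# Print selected pieces of every line in the filtered log.
open(MYFILE, "/mnt/disk/newlog") || die "$!";
while (<MYFILE>)
{
    $str = $_;
    # Text before the first "["
    if ($str =~ m/(.*?)\[/s)
    {
        $var1 = $1;
        print $var1;
    }
    # Text between the first "[" and the next double quote
    if ($str =~ m/\[(.*?)\"/s)
    {
        $var4 = $1;
        print $var4;
    }
    # Numeric value of cuid
    if ($str =~ m/cuid=(\d+)/s)
    {
        $var2 = $1;
        print "cuid=", $var2, "\t";
    }
    # Value of action; the newline is printed only in this branch
    if ($str =~ m/action=(\w+)/s)
    {
        $var3 = $1;
        print "action=", $var3, "\n";
    }
}
close(MYFILE);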
/mnt/disk/newlog is the file I filtered out earlier. Run the script with perl:
perl 1.sh > newlog1
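When I ran it, though, the output came out slightly misaligned. The script prints the newline only in the action branch, so any line where action=(\w+) does not match runs straight into the next one: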
124.197.61.124 25/Jan/2010:14:42:17 cuid=430334 action=plantInfo
124.79.7.236 25/Jan/2010:14:42:17 cuid=318701 action=petsInfo
122.230.66.90 25/Jan/2010:14:42:17 cuid=223422 action=compQuest
113.128.147.225 25/Jan/2010:14:42:17 cuid=362043 action=plantInfo
220.184.20.99 25/Jan/2010:14:42:17 cuid=484582 action=wordInfo
222.161.49.201 25/Jan/2010:14:42:17 cuid=304167 218.95.48.90 25/Jan/2010:14:42:17 cuid=476480 action=plantInfo
218.106.242.20 25/Jan/2010:14:42:17 cuid=501942 action=oreInfo
221.137.223.58 25/Jan/2010:14:42:17 cuid=445595 action=takeQuest
124.126.155.202 25/Jan/2010:14:42:17 cuid=0 action=initData
113.224.227.68 25/Jan/2010:14:42:17 cuid=529218 action=editName
121.4.66.146 25/Jan/2010:14:42:17 cuid=187626 action=researchLayer
220.190.82.170 25/Jan/2010:14:42:17 cuid=62789 action=steal
218.5.38.250 25/Jan/2010:14:42:17 cuid=456212 124.90.203.86 25/Jan/2010:14:42:17 cuid=492016 action=oreInfo
Overall, though, the result is acceptable. Thanks, 辉太郎!
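The run-together lines are easy to locate afterwards. Here is a rough sketch, assuming a well-formed record has exactly four whitespace-separated fields (IP, time, cuid=..., action=...) once any editor line numbers are removed. The function name find_malformed is hypothetical, and the two sample lines follow the shape of the output above.

```shell
# Print the line number and content of every line that does not have
# exactly four whitespace-separated fields.
find_malformed() {
    awk 'NF != 4 { print NR ": " $0 }' "$@"
}

# One well-formed record and one where two records ran together.
printf '%s\n' \
    '218.106.242.20 25/Jan/2010:14:42:17 cuid=501942 action=oreInfo' \
    '222.161.49.201 25/Jan/2010:14:42:17 cuid=304167 218.95.48.90 25/Jan/2010:14:42:17 cuid=476480 action=plantInfo' |
    find_malformed
```

Run against the real output as find_malformed newlog1, this lists only the lines that need a second look.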
Now let's look at jeremy's awk commands.
Step 1: awk '/action/{print $0}' access.log >tmp.log keeps only the lines that contain action.
Step 2
awk '{print $1"\t"$4"\t"$9}' tmp.log > action.log
This drops the columns we don't need.
Step 3
Filter and print the IP, the time, cuid= and action=:
awk -F'[ \t["&=]+' '{printf "%s\t%s\t", $1, $2; for (i = 3; i <= NF; i++) if ($i == "cuid" || $i == "action") printf "%s=%s\t", $i, $(i+1); printf "\n"}' action.log > cuid_action.log
Here is the final result:
202.113.30.144 25/Jan/2010:13:42:07 cuid=181188 action=compound
124.227.66.162 25/Jan/2010:13:42:07 cuid=355287 action=plantInfo
124.240.39.49 25/Jan/2010:13:42:07 action=researchLayer cuid=496990
121.236.118.126 25/Jan/2010:13:42:07 cuid=509590 action=oreInfo
113.139.18.82 25/Jan/2010:13:42:07 cuid=512461 action=oreInfo
222.184.232.183 25/Jan/2010:13:42:07 cuid=520595 action=oreInfo
218.59.80.95 25/Jan/2010:13:42:07 cuid=293339 action=questInfo
221.6.38.37 25/Jan/2010:13:42:07 action=plantInfo cuid=518015
125.39.143.96 25/Jan/2010:13:42:07 cuid=133987 action=pkResult
119.180.17.218 25/Jan/2010:13:42:07 cuid=452667 action=wordInfo
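To see why the loop in step 3 can find cuid and action wherever they sit, it helps to print the fields awk produces for one record. This is only an illustrative sketch: show_fields is a made-up helper, the separator class is a simplified version of the one in step 3 with a tab added (action.log is tab-delimited), and the sample record is abridged.

```shell
# Print every field of each record together with its index, using a
# separator made of spaces, tabs, "[", double quotes, "&" and "=".
show_fields() {
    awk -F'[ \t["&=]+' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }' "$@"
}

# One abridged, tab-delimited record in the shape of action.log.
printf '%s\t%s\t%s\n' '124.240.39.49' '[25/Jan/2010:13:42:07' \
    '"action=researchLayer&cuid=496990"' | show_fields
```

Because "=" is part of the separator, each key name lands in the field immediately before its value, which is what lets the loop print $i and $(i+1) as a pair no matter where the key appears.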
The three steps above could be merged into one, but doing them separately is easier to follow.
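For reference, a merged version might look like the sketch below. It is not jeremy's command, just one way to do it in a single pass, and it assumes the request body is the ninth whitespace-separated field and contains no spaces, as in the sample lines. The function name extract_cuid_action is made up, and the sample record is abridged from the first log line.

```shell
# One pass over the raw access log: keep records whose request body has
# action=, then print IP, time and the cuid/action pairs, tab-separated.
extract_cuid_action() {
    awk '
    $9 ~ /action=/ {
        ts = $4
        sub(/^\[/, "", ts)                  # drop the leading "[" from the time
        out = $1 "\t" ts
        n = split($9, kv, /["&]+/)          # break the body into key=value tokens
        for (i = 1; i <= n; i++)
            if (kv[i] ~ /^(cuid|action)=/)  # keep only the two wanted keys
                out = out "\t" kv[i]
        print out
    }' "$@"
}

# Abridged sample record.
printf '%s\n' \
    '124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "uid=355288&cuid=355287&action=plantInfo" 200 1019' |
    extract_cuid_action
```

On the real log this would be run as extract_cuid_action access.log > cuid_action.log. Matching on the start of each token keeps uid= from being mistaken for cuid=.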
You can adapt the commands above to extract whichever fields you need. I hope this is useful.
Thanks again, jeremy!