Counting ad clicks and unique IPs with awk

A log file is generated every day and stored by year and month, for example: /2010/6/20_enter.log

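The log is comma-separated and looks like this. The fourth column is the dotted IP address and the fifth column is the same IP in numeric (integer) form: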

22973,42,3795,218.1.122.166,3657530022,2010-06-17 00:00:00
22402,48,3171,220.248.176.230,3707285734,2010-06-17 00:00:00
22973,42,3795,221.230.17.1,3722842369,2010-06-17 00:00:01
23007,53,4133,58.60.5.146,977012114,2010-06-17 00:00:01
22973,42,3795,113.12.191.122,1896660858,2010-06-17 00:00:01
22873,49,3670,221.200.38.128,3720881792,2010-06-17 00:00:02
22973,42,3795,61.175.135.106,1034913642,2010-06-17 00:00:02
21083,18,1454,125.118.162.87,2104926807,2010-06-17 00:00:02
22973,42,3795,59.50.95.190,993157054,2010-06-17 00:00:02
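
The relationship between the fourth and fifth columns can be checked with a few lines of shell. This is only a sketch: the function name ip_to_int is made up for illustration, and it assumes the numeric form is the usual big-endian packing of the four octets, which is consistent with the sample rows above.

```shell
# Hypothetical helper (not part of the original scripts):
# convert a dotted IPv4 address to its integer form.
# The body runs in a subshell, so changing IFS does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1                       # split "a.b.c.d" into $1 $2 $3 $4
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

ip_to_int 218.1.122.166           # prints 3657530022
ip_to_int 58.60.5.146             # prints 977012114
```

Because both columns identify the same address, either one could serve as the key for counting unique IPs; the script in this post uses the fifth.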

 

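The task is to group the records by the first three fields and, for each group, report the number of occurrences (clicks) and the number of unique IPs, in this form: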

20738_29_1033,3,2
20406_12_610,9,9
22838_49_3631,5,5
21313_45_2197,1,1
20135_14_252,1,1

 

Write the awk script semstat.awk as follows:

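# Run with -F "," so that $1..$3 are the grouping fields and $5 is the numeric IP.
{
  group = sprintf("%s_%s_%s", $1, $2, $3)
  sum[group]++                      # total clicks for this group
  if (flag[group, $5] != 1) {       # first time this IP is seen in the group
    b[group]++                      # count it as a unique IP
  }
  flag[group, $5] = 1
}
END {
  for (i in sum) {
    printf("%s,%d,%d\n", i, sum[i], b[i])
  }
}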
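
To see the grouping logic at work without any files, the same idea can be run inline on a few rows. This is a minimal sketch, not the author's script: the function name semstat and the array names clicks, seen and uniq are invented here, and the first sample row is deliberately repeated to simulate a second click from the same IP. The output is piped through sort because awk does not guarantee any order for a for-in loop over an array.

```shell
# Hypothetical wrapper: read comma-separated log rows on stdin and
# print "group,clicks,unique_ips" for each group, sorted by group.
semstat() {
  awk -F, '
    {
      group = $1 "_" $2 "_" $3
      clicks[group]++
      # Count each (group, numeric IP) pair only the first time it appears.
      if (!seen[group, $5]++) uniq[group]++
    }
    END {
      for (g in clicks) printf "%s,%d,%d\n", g, clicks[g], uniq[g]
    }
  ' | sort
}

# The first row is repeated on purpose (an assumed duplicate click).
semstat <<'EOF'
22973,42,3795,218.1.122.166,3657530022,2010-06-17 00:00:00
22973,42,3795,218.1.122.166,3657530022,2010-06-17 00:00:00
22402,48,3171,220.248.176.230,3707285734,2010-06-17 00:00:00
22973,42,3795,221.230.17.1,3722842369,2010-06-17 00:00:01
EOF
# prints:
# 22402_48_3171,1,1
# 22973_42_3795,3,2
```

The group 22973_42_3795 has three clicks but only two unique IPs, which is exactly the difference between the second and third output columns.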

 

Write the shell script semstat.sh, which is meant to be run on a schedule:

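#!/bin/sh
# Note: a "#" only starts a comment when it follows whitespace,
# so each comment is separated from the value by spaces.
semdir=/username                  # base directory
log=${semdir}/semlog.log          # run log
awkfile=${semdir}/semstat.awk     # awk script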

year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)
starttime=$(date +%s)

# Strip a leading zero so the path matches the layout, e.g. /2010/6/20_enter.log
month=${month#0}
day=${day#0}

logfile=$semdir/$year/$month/${day}_enter.log
csvfile=$semdir/$year/$month/${day}.csv
echo "$(date +%F) $(date +%T) logfile is ${logfile}" >> "$log"
echo "$(date +%F) $(date +%T) csvfile is ${csvfile}" >> "$log"

# Only run the statistics if today's log file exists.
if [ -f "$logfile" ];
then
  gawk -F "," -f "$awkfile" "$logfile" > "$csvfile"
  echo "$(date +%F) $(date +%T) ${logfile} processed ok" >> "$log"
else
  echo "$(date +%F) $(date +%T) ${logfile} does not exist" >> "$log"
fi
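
The path convention used by the script (year, then month and day without leading zeros) can be tried in isolation. The sketch below is illustrative only: the function name log_path is hypothetical, and it takes the date as a YYYY-MM-DD argument instead of calling date, so that the result is reproducible.

```shell
# Hypothetical helper: build the log path for a given base directory and date.
# Usage: log_path BASE YYYY-MM-DD
log_path() {
  base=$1
  ymd=$2
  year=${ymd%%-*}                 # "2010-06-07" -> "2010"
  rest=${ymd#*-}                  # "2010-06-07" -> "06-07"
  month=${rest%%-*}               # "06-07"      -> "06"
  day=${rest#*-}                  # "06-07"      -> "07"
  # ${var#0} removes one leading zero, if present.
  echo "$base/$year/${month#0}/${day#0}_enter.log"
}

log_path /username 2010-06-07     # prints /username/2010/6/7_enter.log
log_path /username 2010-06-20     # prints /username/2010/6/20_enter.log
```

Two-digit months and days such as 10 or 20 pass through unchanged, because the expansion only removes a zero at the very start of the value.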

 

Give semstat.sh execute permission, for example: chmod 755 semstat.sh

 

Running ./semstat.sh then produces the CSV file.

 

 
