Analyzing nginx access logs with Python

After the project went live, we needed to analyze the contents of nginx's access log, so I wrote the following script:


#! /usr/bin/env python
# -*- coding: utf-8 -*-
#@author [email protected]
#@version 2011-04-12 16:34
# Nginx log analysis, initial version

import os
import fileinput
import re

# Location of the log files
dir_log = r"D:\python cmd\nginxlog"

# nginx default log format in use: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
# Regular expressions for parsing the log

#203.208.60.230
ipP = r"?P<ip>[\d.]*"

#[21/Jan/2011:15:04:41 +0800]
timeP = r"""?P
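
The per-field patterns above are meant to be combined into one expression that matches a whole log line. A minimal sketch of that idea follows, assuming the log format quoted in the comment above. The group names, `LOG_PATTERN` and `parse_line` are illustrative choices, not taken from the original script.

```python
import re

# One named group per field of the log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
# All names here are hypothetical, chosen for this sketch.
LOG_PATTERN = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)" '
    r'"(?P<forwarded_for>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

Lines that do not fit the format return `None`, so a caller can skip or log them instead of crashing partway through a large file.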


The resulting counts per HTTP status code:

{'200': 287559, '302': 6743, '304': 4074, '404': 152918, '499': 887, '400': 14, '504': 93, '502': 300, '503': 5, '500': 88353}
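
A tally like this can be produced with `collections.Counter`. The sketch below is one possible way to do it, not necessarily how the original script works: it assumes the status code is the three-digit number right after the closing quote of the request field, and `STATUS_RE` and `count_statuses` are hypothetical names.

```python
import re
from collections import Counter

# The request is the first quoted field, so the first '" NNN ' in a line
# is the closing quote of the request followed by the status code.
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_statuses(lines):
    """Count how many times each HTTP status code appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # lines without a recognizable status are skipped
            counts[match.group(1)] += 1
    return counts
```

Since `count_statuses` accepts any iterable of lines, it could be fed from `fileinput.input(paths)`, where `paths` is a list of log files.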


Number of requests per IP (top 10 IPs):

[('220.178.14.98', 323230), ('220.181.94.225', 120870), ('203.208.60.230', 14342), ('61.135.249.220', 6479), ('203.208.60.88', 5426), ('61.135.249.216', 4867), ('123.125.71.94', 1290), ('123.125.71.104', 1282), ('123.125.71.108', 1280), ('123.125.71.110', 1278), rest not shown]
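
A ranked list of `(ip, count)` pairs in this shape is what `Counter.most_common` returns. As a rough sketch, assuming the client IP is the first whitespace-separated token of each line (true for the log format used here), it could look like this; `top_ips` is a hypothetical helper name.

```python
from collections import Counter


def top_ips(lines, n=10):
    """Return the n most frequent client IPs as (ip, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)  # split off the first token only
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts.most_common(n)
```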

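Once the IPs have been extracted from the raw log, further analysis becomes possible, such as finding the top 10 IPs by request count. When the data set is large, hash each IP, take the hash modulo the number of buckets, and count each bucket separately.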
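
One reading of the hash-and-modulo idea is sketched below. It is an interpretation, not the author's code, and the function names, bucket file names and the choice of `zlib.crc32` are all assumptions made for illustration. Each IP is written to bucket file `crc32(ip) % n_buckets`, so all occurrences of an IP land in the same file and each bucket can be counted on its own with far less memory. `crc32` is used instead of the built-in `hash` because string hashes in Python 3 vary between runs.

```python
import heapq
import os
import zlib
from collections import Counter


def partition_by_ip(lines, out_dir, n_buckets=16):
    """Write each line's IP into the bucket file chosen by crc32(ip) % n_buckets."""
    paths = [os.path.join(out_dir, "bucket_%02d.txt" % i) for i in range(n_buckets)]
    files = [open(path, "w") for path in paths]
    try:
        for line in lines:
            fields = line.split(None, 1)
            if not fields:
                continue
            ip = fields[0]
            bucket = zlib.crc32(ip.encode("utf-8")) % n_buckets
            files[bucket].write(ip + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def top_ips_from_buckets(paths, n=10):
    """Count one bucket at a time, then merge the per-bucket top n lists."""
    candidates = []
    for path in paths:
        with open(path) as f:
            counts = Counter(line.strip() for line in f)
        # An IP lives in exactly one bucket, so its count here is its total.
        candidates.extend(counts.most_common(n))
    return heapq.nlargest(n, candidates, key=lambda item: item[1])
```

Because every IP appears in exactly one bucket, the global top `n` is always contained in the union of the per-bucket top `n` lists, so the merge step loses nothing.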
