Python Regular Expressions: An Introduction

1. Special Symbols and Characters Used in Regular Expressions

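Notation      Description                                                                  Example
literal       Matches the literal string value                                             foo
re1|re2       Matches regular expression re1 or re2                                        foo|bar
.             Matches any single character except a newline                                b.b
^             Matches the start of the string                                              ^Dear
$             Matches the end of the string                                                /bin/*sh$
*             Matches zero or more occurrences of the preceding regex                      [A-Za-z0-9]*
+             Matches one or more occurrences of the preceding regex                       [a-z]+\.com
?             Matches zero or one occurrence of the preceding regex                        goo?
{N}           Matches exactly N occurrences of the preceding regex                         [0-9]{3}
{M,N}         Matches from M to N occurrences of the preceding regex                       [0-9]{5,9}
[…]           Matches any single character from the character class                        [aeiou]
[..x-y..]     Matches any single character in the range from x to y                        [0-9], [A-Za-z]
[^…]          Matches any single character not in the class, including any listed ranges   [^aeiou], [^A-Za-z0-9_]
(*|+|?|{})?   Non-greedy versions of the repetition operators above (*, +, ?, {})          .*?[a-z]
(…)           Matches the enclosed regex and saves it as a subgroup                        ([0-9]{3})?, f(oo|u)bar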
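
To see a few of these symbols in action, here is a minimal sketch using Python's re module. The sample strings and variable names are made up for illustration; the point is how alternation, a bounded repetition, and the greedy and non-greedy forms of * behave.

```python
import re

# Alternation inside a group: the pattern accepts "foobar" or "fubar".
alt = re.search(r'f(oo|u)bar', 'total fubar')
alt_whole, alt_group = alt.group(), alt.group(1)

# {M,N} is greedy: it takes as many digits as allowed (up to 9 here).
digits = re.search(r'[0-9]{5,9}', 'id 1234567890').group()

# Greedy .* runs to the last '>', non-greedy .*? stops at the first one.
text = '<b>bold</b> and <i>italic</i>'
greedy = re.search(r'<.*>', text).group()
lazy = re.search(r'<.*?>', text).group()

print(alt_whole, alt_group)  # fubar u
print(digits)                # 123456789
print(greedy)                # <b>bold</b> and <i>italic</i>
print(lazy)                  # <b>
```

The patterns are written as raw strings (r'...') so that backslashes reach the regex engine unchanged.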

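Special characters
\d            Matches any decimal digit, same as [0-9] (\D is the inverse: any non-digit)  data\d.txt
\w            Matches any alphanumeric character or _, same as [A-Za-z0-9_] (\W: inverse)  [A-Za-z_]\w+
\s            Matches any whitespace character, same as [ \n\t\r\v\f] (\S is the inverse)  of\sthe
\b            Matches a word boundary (\B is the inverse)                                  \bThe\b
\nn           Matches saved subgroup nn (see (…) above)                                    price:\16
\c            Matches the special character c literally, cancelling its special meaning    \., \\, \*
\A (\Z)       Matches the start (end) of the string                                        \ADear

Note: the bracket equivalents above hold for ASCII text. With Python 3 str patterns, \d, \w, and \s also match Unicode digits, word characters, and whitespace unless the re.ASCII flag is given.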
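
The special characters can be tried out the same way. The following is a small sketch with invented sample strings. It escapes the dot in the file-name pattern (unlike the data\d.txt example above, where the unescaped dot matches any character) and uses \1 to refer back to the first saved subgroup.

```python
import re

# \d matches one digit; \. matches a literal dot.
filename = re.search(r'data\d\.txt', 'open data7.txt now').group()

# \b keeps "The" from matching inside "Theory".
whole_words = re.findall(r'\bThe\b', 'The Theory of The End')

# \1 refers back to subgroup 1, so this finds a doubled word.
repeated = re.search(r'\b(\w+)\s\1\b', 'it is is fine')

# Escaping removes the special meaning of '.'.
literal_dot = re.findall(r'a\.b', 'a.b axb')

# With re.MULTILINE, ^ matches at every line start, \A only at the string start.
text = 'Hi\nDear John'
caret_hits = re.findall(r'^Dear', text, re.MULTILINE)
start_hits = re.findall(r'\ADear', text, re.MULTILINE)

print(filename)          # data7.txt
print(whole_words)       # ['The', 'The']
print(repeated.group())  # is is
print(literal_dot)       # ['a.b']
print(caret_hits)        # ['Dear']
print(start_hits)        # []
```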


2. Python's re Module: Core Functions and Methods

Module-level function only

compile(pattern, flags=0)

compile RE pattern with any optional flags and return a regex object
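
As a rough illustration of compile(), the sketch below builds two regex objects once and reuses them. The names ID_RE and GREETING_RE and the sample strings are hypothetical.

```python
import re

# Compile once, reuse many times; the regex object has its own search/match methods.
ID_RE = re.compile(r'(\d+)-(\d+)-(\d+)')
GREETING_RE = re.compile(r'dear', re.IGNORECASE)  # flags are fixed at compile time

ids = [ID_RE.search(s).group() for s in ('a 815740728-6-12', 'b 250758590-5-12')]
greets = GREETING_RE.match('Dear John') is not None

print(ids)     # ['815740728-6-12', '250758590-5-12']
print(greets)  # True
```

Compiling mostly pays off for readability and for patterns used inside loops; the module-level functions also cache recently used patterns internally.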

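re module functions and regex object methods (when called as methods of a compiled regex object, the pattern and flags arguments are omitted)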

match(pattern, string, flags=0)

attempt to match RE pattern at the beginning of string with optional flags; return match object on success, None on failure

search(pattern, string, flags=0)

search for first occurrence of RE pattern within string with optional flags; return match object on success, None on failure
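
The difference between match() and search() is easiest to see side by side. A minimal sketch, using a timestamp in the style of the sample data:

```python
import re

line = 'Tue Nov  7 18:38:48 1995'

# match() only tries at the beginning of the string.
at_start = re.match(r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line)
not_at_start = re.match(r'Nov', line)

# search() scans forward until the pattern matches somewhere.
found = re.search(r'Nov', line)

print(at_start.group(1))            # Tue
print(not_at_start)                 # None
print(found.group(), found.start()) # Nov 4
```

Because both return None on failure, the result should be checked before calling group() on it.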

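findall(pattern, string, flags=0)

find all non-overlapping occurrences of pattern in string and return them as a list; if the pattern contains groups, the list holds the group contents (as tuples when there is more than one group) instead of the whole matches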
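
What findall() returns depends on how many groups the pattern has. A short sketch with illustrative inputs:

```python
import re

# No groups: a list of whole matches.
no_groups = re.findall(r'\d+', '815740728-6-12')

# One group: a list of that group's contents.
one_group = re.findall(r'-(\d+)', '815740728-6-12')

# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(\d+)-(\d+)', '1-2 3-4')

# No match at all: an empty list, never None.
nothing = re.findall(r'\d+', 'no digits')

print(no_groups)   # ['815740728', '6', '12']
print(one_group)   # ['6', '12']
print(two_groups)  # [('1', '2'), ('3', '4')]
print(nothing)     # []
```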

split(pattern, string, maxsplit=0)

split string into a list using RE pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default, 0, splits at every occurrence)
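
Here is a small sketch of split() on a line shaped like the sample data. The address user@example.com is a made-up placeholder, not part of the original file.

```python
import re

# Hypothetical line in "timestamp::address::id" form; the address is a placeholder.
line = 'Tue Nov  7 18:38:48 1995::user@example.com::815740728-6-12'

fields = re.split(r'::', line)
first_only = re.split(r'::', line, maxsplit=1)

# A capturing group in the delimiter keeps the delimiters in the result.
with_delims = re.split(r'(-)', '6-12')

print(fields)       # ['Tue Nov  7 18:38:48 1995', 'user@example.com', '815740728-6-12']
print(first_only)   # ['Tue Nov  7 18:38:48 1995', 'user@example.com::815740728-6-12']
print(with_delims)  # ['6', '-', '12']
```

For a fixed delimiter such as '::', the plain str.split() method would do the same job; re.split() earns its keep when the delimiter is itself a pattern.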

sub(pattern, repl, string, count=0)

replace occurrences of RE pattern in string with repl, replacing all of them unless count is given (see also subn(), which additionally returns the number of substitutions made)
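
A brief sketch of sub() and subn(), again with invented inputs. The replacement string may refer to saved subgroups with \1, \2, and so on.

```python
import re

masked_all = re.sub(r'\d', '#', 'a1b22')
masked_one = re.sub(r'\d', '#', 'a1b22', count=1)

# subn() returns a (new_string, number_of_substitutions) tuple.
masked, how_many = re.subn(r'\d', '#', 'a1b22')

# Backreferences in the replacement swap the two captured numbers.
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', '6-12')

print(masked_all)        # a#b##
print(masked_one)        # a#b22
print(masked, how_many)  # a#b## 3
print(swapped)           # 12-6
```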

Match object methods

group(num=0)

return entire match (or specific subgroup num)

groups()

return all matching subgroups in a tuple (empty if there weren't any)
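
The match object methods can be sketched as follows, using an id field in the style of the sample data:

```python
import re

m = re.search(r'(\d+)-(\d+)-(\d+)', 'id 815740728-6-12')

whole = m.group()         # same as m.group(0): the entire match
middle = m.group(2)       # one specific subgroup
first_last = m.group(1, 3)  # several subgroups at once, as a tuple
everything = m.groups()   # all subgroups as a tuple

# A pattern without groups yields an empty tuple.
no_groups = re.search(r'\d+', 'x42').groups()

print(whole)       # 815740728-6-12
print(middle)      # 6
print(first_last)  # ('815740728', '12')
print(everything)  # ('815740728', '6', '12')
print(no_groups)   # ()
```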


3. Using Regular Expressions

Input file to filter (data.log)
Tue Nov  7 18:38:48 1995::[email protected]::815740728-6-12
Mon Dec 12 15:09:50 1977::[email protected]::250758590-5-12
Thu Apr 13 15:40:25 1972::[email protected]::71998825-6-7
Mon Jan 31 07:58:42 1994::[email protected]::759974322-4-7
Mon Nov 16 12:30:48 1970::[email protected]::27577848-7-11
Fri Jul  5 01:07:39 1996::[email protected]::836500059-5-8

Python script

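import re

# Scan data.log line by line. The commented-out statements are alternative
# re calls that can be swapped in for the active search below.
with open('data.log') as log:
    for line in log:
        # result = re.findall(r'.*(h).*(h).*', line)
        line = line.rstrip()
        # result = re.split(r'::|\n', line)
        # result = re.match(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line)
        # result = re.search(r'.+?(\d+-\d+-\d+)', line)
        # Capture a single digit enclosed by hyphens, as in the 815740728-6-12 field.
        result = re.search(r'-(\d)-', line)
        # if len(result) != 0:    # use this test with findall/split, which return lists
        if result is not None:
            # print(result)
            # print(result.group())
            print(result.group(1))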
The commented-out statements show some other common uses of the re module.
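
For experimenting without a data.log file, the same idea can be sketched as a function over in-memory lines. This is only one possible arrangement, not the original script: the helper name middle_numbers, the ID_FIELD pattern, and the sample addresses are all assumptions made for illustration. It splits each line on '::' first and then applies a compiled pattern to the last field only.

```python
import re

# Hypothetical sample lines in the same "timestamp::address::id" layout;
# the addresses are made-up placeholders, not the original data.
SAMPLE_LINES = [
    'Tue Nov  7 18:38:48 1995::alice@example.com::815740728-6-12\n',
    'Mon Dec 12 15:09:50 1977::bob@example.org::250758590-5-12\n',
    'Thu Apr 13 15:40:25 1972::carol@example.net::71998825-6-7\n',
]

# Three hyphen-separated numbers filling the whole id field.
ID_FIELD = re.compile(r'(\d+)-(\d+)-(\d+)$')


def middle_numbers(lines):
    """Return the middle number of the id field for each well-formed line."""
    results = []
    for line in lines:
        fields = line.rstrip().split('::')
        if len(fields) != 3:
            continue  # skip lines that do not have exactly three fields
        match = ID_FIELD.match(fields[2])
        if match is not None:
            results.append(match.group(2))
    return results


print(middle_numbers(SAMPLE_LINES))  # ['6', '5', '6']
```

Restricting the pattern to the last field avoids accidental matches elsewhere in the line, which the unanchored '-(\d)-' search cannot rule out.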
