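At work I needed to extract specific data from business logs, and found that Python's built-in text-processing methods do not cope well with complex log formats. I therefore chose Pyparsing, a powerful module for parsing text.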
Basic Word example:
from pyparsing import Word, alphas, ParseException

def test_01():
    text = 'hello world'
    # Word(alphas) matches a run of one or more letters
    wd = Word(alphas)
    # '+' chains expressions; whitespace between tokens is skipped by default
    pattern = wd + wd
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
['hello', 'world']
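Word also accepts a second character set. As a minimal sketch (the sample string is made up for illustration), the first argument lists the characters allowed in the first position and the second lists the characters allowed in the rest of the word:

```python
from pyparsing import Word, alphas, alphanums

# Letters only: matching stops at the first digit
letters_only = Word(alphas)
# First character must be a letter; later characters may be letters or digits
identifier = Word(alphas, alphanums)

letters_result = letters_only.parseString('abc123').asList()
identifier_result = identifier.parseString('abc123').asList()
print(letters_result)     # ['abc']
print(identifier_result)  # ['abc123']
```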
Basic Literal example:
from pyparsing import Literal, ParseException

def test_02():
    text = 'hello world'
    # Literal matches the exact string given
    wd1 = Literal('hello')
    wd2 = Literal('world')
    pattern = wd1 + wd2
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
['hello', 'world']
Example of a Literal parse failure:
from pyparsing import Literal, ParseException

def test_03():
    text = 'hello world'
    wd1 = Literal('hello')
    # 'aaaworld' does not occur in the text, so parsing fails here
    wd2 = Literal('aaaworld')
    pattern = wd1 + wd2
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
No match: Expected 'aaaworld', found 'world' (at char 6), (line:1, col:7)
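When processing logs it can be handy to read the failure position programmatically instead of printing the message. The following is a small sketch; the helper name describe_failure is made up for illustration, while lineno, col and loc are attributes of ParseException:

```python
from pyparsing import Literal, ParseException

def describe_failure(text):
    """Return (line number, column, offset) of the parse failure, or None on success."""
    pattern = Literal('hello') + Literal('aaaworld')
    try:
        pattern.parseString(text)
    except ParseException as pe:
        # lineno and col are 1-based; loc is the 0-based character offset
        return pe.lineno, pe.col, pe.loc
    return None

print(describe_failure('hello world'))     # (1, 7, 6)
print(describe_failure('hello aaaworld'))  # None
```

The values agree with the "(at char 6), (line:1, col:7)" part of the message above.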
Basic Group example:
from pyparsing import Group, Word, ZeroOrMore, nums, ParseException

def test_04():
    text = 'a,b,c,d,1,2'
    # Group wraps the tokens matched by an expression in their own sub-list
    wd = Group(Word('abcd'))
    wd2 = Group(Word(nums))
    pattern = wd + Group(ZeroOrMore(',' + wd)) + ZeroOrMore(',' + wd2)
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
[['a'], [',', ['b'], ',', ['c'], ',', ['d']], ',', ['1'], ',', ['2']]
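To make the effect of Group easier to see, here is a minimal sketch that parses the same text with and without grouping. The variable names are illustrative, and Suppress (covered later in this article) is used only to drop the commas:

```python
from pyparsing import Group, Suppress, Word, ZeroOrMore, nums

letters = Word('abcd')
digits = Word(nums)
comma = Suppress(',')

# Without Group every token lands in one flat list
flat = letters + ZeroOrMore(comma + letters) + ZeroOrMore(comma + digits)
# With Group the letters and the digits each get their own sub-list
grouped = (Group(letters + ZeroOrMore(comma + letters))
           + Group(ZeroOrMore(comma + digits)))

flat_result = flat.parseString('a,b,c,d,1,2').asList()
grouped_result = grouped.parseString('a,b,c,d,1,2').asList()
print(flat_result)     # ['a', 'b', 'c', 'd', '1', '2']
print(grouped_result)  # [['a', 'b', 'c', 'd'], ['1', '2']]
```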
Basic Combine example:
from pyparsing import Combine, Word, ZeroOrMore, ParseException

def test_05():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    # Combine joins the matched tokens into a single string
    pattern = Combine(wd + ZeroOrMore(',' + wd))
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
['a,b,c,d']
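Combine does more than join tokens: by default it also requires the tokens to be adjacent, with no whitespace between them. A minimal sketch using a decimal number (the sample inputs are made up for illustration):

```python
from pyparsing import Combine, ParseException, Word, nums

integer = Word(nums)
# Three separate tokens; whitespace between them is allowed
loose = integer + '.' + integer
# One joined token; the parts must touch each other
joined = Combine(integer + '.' + integer)

loose_result = loose.parseString('3.14').asList()
joined_result = joined.parseString('3.14').asList()
print(loose_result)   # ['3', '.', '14']
print(joined_result)  # ['3.14']

# The loose pattern still accepts '3 . 14'; the Combine pattern rejects it
loose_spaced = loose.parseString('3 . 14').asList()
try:
    joined.parseString('3 . 14')
    joined_accepts_spaces = True
except ParseException:
    joined_accepts_spaces = False
print(joined_accepts_spaces)  # False
```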
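Purpose of originalTextFor: it preserves the original matched text, regardless of any tokenizing or conversion performed by the expression it contains.
Example without originalTextFor:
from pyparsing import Group, Word, ZeroOrMore, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    pattern = wd + ZeroOrMore(',' + wd)
    try:
        result = pattern.parseString(text)
        print(result)  # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d']]
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Example with originalTextFor:
from pyparsing import Group, Suppress, Word, ZeroOrMore, originalTextFor, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    # Grouping and suppression inside are ignored; the raw matched text is returned
    pattern = originalTextFor(wd + ZeroOrMore(Suppress(',') + wd))
    try:
        result = pattern.parseString(text)
        print(result)  # Output: ['a,b,c,d']
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))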
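Purpose of nestedExpr: it parses the text between an opening and a closing delimiter, and supports nesting.
Basic nestedExpr example:
from pyparsing import nestedExpr, ParseException

def test_03():
    text = '(abc (123) (456)) abc'
    # Each nesting level becomes a nested list; the trailing 'abc' lies outside the parentheses and is not parsed
    pattern = nestedExpr('(', ')')
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
[['abc', ['123'], ['456']]]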
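The delimiters are not limited to parentheses. As a small sketch (the bracketed sample text is made up for illustration), the same call works with square brackets, and asList() turns the result into plain Python lists that are easy to walk:

```python
from pyparsing import nestedExpr

# Same idea as above, but with '[' and ']' as the delimiters
brackets = nestedExpr('[', ']')

nested_result = brackets.parseString('[a [b c] d]').asList()
print(nested_result)  # [['a', ['b', 'c'], 'd']]

# The inner list is an ordinary Python list
inner = nested_result[0][1]
print(inner)  # ['b', 'c']
```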
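Purpose of Optional: it marks an element as optional, meaning the text is valid whether or not the element is present. In pyparsing, Optional is commonly used when building more complex expressions; it can mark one or more grammar elements as optional.
Example without Optional:
from pyparsing import Group, Word, ZeroOrMore, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    pattern = wd + ZeroOrMore(',' + wd)
    try:
        result = pattern.parseString(text)
        print(result)  # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d']]
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Example with Optional:
from pyparsing import Group, Optional, Word, ZeroOrMore, restOfLine, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    # restOfLine captures whatever is left on the current line
    pattern = wd + ZeroOrMore(',' + wd) + Optional(restOfLine)
    try:
        result = pattern.parseString(text)
        print(result)  # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d'], ',1,2']
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Note: the remainder of the text (',1,2') is now captured in the result as well.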
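Optional can also supply a value when the element is missing, through its default argument. The following is a minimal sketch; the host/port format and the default of '80' are assumptions made up for illustration:

```python
from pyparsing import Optional, Suppress, Word, alphas, nums

host = Word(alphas)
# If ':<digits>' is absent, the token '80' is inserted instead
port = Optional(Suppress(':') + Word(nums), default='80')
endpoint = host + port

with_port = endpoint.parseString('localhost:8080').asList()
without_port = endpoint.parseString('localhost').asList()
print(with_port)     # ['localhost', '8080']
print(without_port)  # ['localhost', '80']
```

Using a default keeps the result the same length in both cases, which simplifies the code that consumes it.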
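Purpose of ZeroOrMore: it matches the given expression zero or more times, which is mainly useful when the text contains a repeating pattern.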
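A minimal sketch of that behaviour follows. The sample text is made up for illustration, and OneOrMore (another pyparsing class, not otherwise used in this article) is included only for contrast:

```python
from pyparsing import OneOrMore, ParseException, Word, ZeroOrMore, alphas, nums

header = Word(alphas)
zero_or_more = header + ZeroOrMore(Word(nums))
one_or_more = header + OneOrMore(Word(nums))

several = zero_or_more.parseString('ids 1 2 3').asList()
none_at_all = zero_or_more.parseString('ids').asList()
print(several)      # ['ids', '1', '2', '3']
print(none_at_all)  # ['ids']

# OneOrMore requires at least one repetition, so 'ids' alone fails
try:
    one_or_more.parseString('ids')
    one_or_more_matched = True
except ParseException:
    one_or_more_matched = False
print(one_or_more_matched)  # False
```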
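Purpose of Suppress: it drops certain matched characters from the parse results.
There are three ways to apply it:
Example without Suppress:
from pyparsing import Word, ZeroOrMore, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    pattern = wd + ZeroOrMore(',' + wd)
    try:
        result = pattern.parseString(text)
        print(result)  # Output: ['a', ',', 'b', ',', 'c', ',', 'd']
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Example with Suppress:
from pyparsing import Suppress, Word, ZeroOrMore, ParseException

def test_03():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    # The commas must still be present in the text, but are left out of the result
    pattern = wd + ZeroOrMore(Suppress(',') + wd)
    try:
        result = pattern.parseString(text)
        print(result)  # Output: ['a', 'b', 'c', 'd']
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))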
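Two spellings of suppression that pyparsing does provide are the Suppress class and the suppress() method available on any expression. The sketch below shows that both give the same result; it is not meant as a complete list of the ways mentioned above:

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore

item = Word('abcd')

# Form 1: wrap the expression in the Suppress class
with_class = item + ZeroOrMore(Suppress(',') + item)
# Form 2: call .suppress() on an existing expression
with_method = item + ZeroOrMore(Literal(',').suppress() + item)

class_result = with_class.parseString('a,b,c,d').asList()
method_result = with_method.parseString('a,b,c,d').asList()
print(class_result)   # ['a', 'b', 'c', 'd']
print(method_result)  # ['a', 'b', 'c', 'd']
```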
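Purpose of SkipTo: it skips over text until a given marker or sub-expression is found, and parsing of the remaining text continues from there.
Basic SkipTo example:
from pyparsing import Literal, SkipTo, Word, ZeroOrMore, alphas, ParseException

def test_03():
    text = 'a,b-c,d,1,2,e,f'
    wd = Word(alphas)
    # Skip everything before '-', match '-' itself, then parse "letters," pairs
    pattern = SkipTo(Literal("-")) + Literal("-") + ZeroOrMore(wd + ',')
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print(" No match: {0}".format(str(pe)))
Output:
['a,b', '-', 'c', ',', 'd', ',']
Explanation: the parser skips the text "a,b" and stops skipping when it reaches the marker "-". The skipped text is returned as a single token, and from the marker onward the text is processed by the remaining rules. Matching stops at "1" because it is not a letter.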
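Coming back to the log-parsing motivation, the pieces above can be combined. The following is only a sketch: the log line, its layout, and the cost= field are hypothetical and made up for illustration, not taken from a real system:

```python
from pyparsing import Literal, SkipTo, Suppress, Word, alphas, nums

# Hypothetical log line, invented for this example
line = '2024-05-01 12:00:03 INFO OrderService - order created cost=35ms'

date = Word(nums + '-')    # e.g. 2024-05-01
time = Word(nums + ':')    # e.g. 12:00:03
level = Word(alphas)       # e.g. INFO
marker = Literal('cost=')

# Skip the free-form message, drop the marker, keep the digits after it
cost = Suppress(SkipTo(marker)) + Suppress(marker) + Word(nums)
log_pattern = date + time + level + cost

fields = log_pattern.parseString(line).asList()
print(fields)  # ['2024-05-01', '12:00:03', 'INFO', '35']
```

Suppressing the skipped text keeps the result limited to the fields of interest; the trailing "ms" is simply left unparsed.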