An Introduction to the Pyparsing Module

1. Background

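At work I needed to extract specific data by parsing business logs, and found that Python's built-in text-processing methods do not cope well with complex log formats. I therefore chose Pyparsing, a powerful module for processing text data.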

2. Examples

2.1 Word vs. Literal: usage and differences

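  • Word: loose matching; matches a run of one or more characters drawn from a given character set
  • Literal: exact matching; matches only the exact string given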

Basic Word example:

# Only the names actually used by the examples in this article
from pyparsing import Word, Literal, Group, Combine, ZeroOrMore, Optional, Suppress, SkipTo, \
    nestedExpr, originalTextFor, restOfLine, ParseException, alphas, nums

def test_01():
    text = 'hello world'
    wd = Word(alphas)   # any run of letters
    pattern = wd + wd   # two such words in a row

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

['hello', 'world']

Basic Literal example:

def test_02():
    text = 'hello world'
    wd1 = Literal('hello')
    wd2 = Literal('world')
    pattern = wd1 + wd2

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

['hello', 'world']

Example of a failed Literal match:

def test_03():
    text = 'hello world'
    wd1 = Literal('hello')
    wd2 = Literal('aaaworld')
    pattern = wd1 + wd2

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

No match: Expected 'aaaworld', found 'world'  (at char 6), (line:1, col:7)
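To make the difference concrete, here is a minimal sketch (the input strings are made up for illustration, and it uses the same camelCase method names as the rest of this article): Word(alphas) accepts any run of letters, while Literal("hello") accepts only that exact text. Word also takes an optional second character set for the body of the word, which is handy for identifier-like tokens.

```python
from pyparsing import Word, Literal, ParseException, alphas, alphanums

word = Word(alphas)         # one or more letters, whatever they are
hello = Literal("hello")    # exactly the text "hello"

print(word.parseString("goodbye").asList())

try:
    hello.parseString("goodbye")
except ParseException as pe:
    print("No match:", pe)

# Two-argument form: first character from alphas, the rest from alphanums + "_"
identifier = Word(alphas, alphanums + "_")
print(identifier.parseString("user_id1").asList())
```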

2.2 Group vs. Combine: usage and differences

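  • Group: does not change what is matched, only the shape of the results. Tokens matched inside a Group are collected into their own nested list; tokens outside it stay at the level of the enclosing result
  • Combine: concatenates all the tokens it matches into a single string, so the result is one element rather than a list of separate tokens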

Basic Group example:

def test_04():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    wd2 = Group(Word(nums))
    # Each Group wraps its matches in a sub-list; the middle Group collects the ',b', ',c', ',d' matches
    pattern = wd + Group(ZeroOrMore(',' + wd)) + ZeroOrMore(',' + wd2)

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

[['a'], [',', ['b'], ',', ['c'], ',', ['d']], ',', ['1'], ',', ['2']]

Basic Combine example:

def test_05():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    # Combine joins the matched tokens into one string; matching stops at ',1'
    pattern = Combine(wd + ZeroOrMore(',' + wd))  # Output: ['a,b,c,d']
    
    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

['a,b,c,d']
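A minimal sketch contrasting a plain sequence, Group, and Combine on a decimal number (the inputs are illustrative). It also shows a second property of Combine: by default it only matches when the combined pieces are adjacent, whereas a plain sequence skips whitespace between tokens.

```python
from pyparsing import Word, Group, Combine, ParseException, nums

number = Word(nums) + "." + Word(nums)   # plain sequence: three separate tokens

print(number.parseString("3.14").asList())            # separate tokens
print(Group(number).parseString("3.14").asList())     # same tokens, nested in one sub-list
print(Combine(number).parseString("3.14").asList())   # one joined string

# A plain sequence skips whitespace between its tokens...
print(number.parseString("3 . 14").asList())

# ...but Combine (adjacent=True by default) rejects the same input
try:
    Combine(number).parseString("3 . 14")
except ParseException as pe:
    print("No match:", pe)
```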

2.3 originalTextFor

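Purpose: returns the original input text matched by the enclosed expression, regardless of any token processing or transformation (such as Group or Suppress) that the expression performs.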

Example without originalTextFor:

def test_06():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    pattern = wd + ZeroOrMore(',' + wd)   # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d']]

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Example with originalTextFor:

def test_07():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    # The Group and Suppress effects are discarded; the original matched slice is returned
    pattern = originalTextFor(wd + ZeroOrMore(Suppress(',') + wd))  # Output: ['a,b,c,d']

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))
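originalTextFor also undoes changes made by parse actions and keeps the original whitespace. A minimal sketch, assuming a hypothetical parse action that upper-cases each word (the input is illustrative):

```python
from pyparsing import Word, alphas, originalTextFor

# Hypothetical parse action: replace each matched word with its upper-case form
word = Word(alphas).setParseAction(lambda t: t[0].upper())
pattern = word + word

# Without originalTextFor: transformed tokens, whitespace dropped
print(pattern.parseString("hello   world").asList())

# With originalTextFor: the exact input slice, spaces included
print(originalTextFor(pattern).parseString("hello   world").asList())
```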

2.4 nestedExpr

Purpose: parses the text between an opening delimiter and a closing delimiter, with support for nesting.

Basic nestedExpr example:

def test_08():
    text = '(abc (123) (456)) abc'
    pattern = nestedExpr('(', ')')

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

[['abc', ['123'], ['456']]]
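Note that parseString only needs a prefix of the input to match: in the example above the trailing " abc" is silently ignored. A minimal sketch of the parseAll=True option, which turns any unparsed leftover text into a ParseException:

```python
from pyparsing import nestedExpr, ParseException

text = '(abc (123) (456)) abc'
pattern = nestedExpr('(', ')')

# Default: the trailing " abc" is left unparsed without complaint
print(pattern.parseString(text).asList())

# parseAll=True: leftover text raises ParseException
try:
    pattern.parseString(text, parseAll=True)
except ParseException as pe:
    print("No match:", pe)
```

When parsing whole log lines, parseAll=True is a cheap way to notice lines the grammar does not fully cover.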

2.5 Optional

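Purpose: marks an element as optional, so the pattern is valid whether or not that element appears. In pyparsing, Optional is commonly used when building larger expressions, to mark one or more of their grammar elements as optional.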

Example without Optional:

def test_09():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    pattern = wd + ZeroOrMore(',' + wd)   # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d']]

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Example with Optional:

def test_10():
    text = 'a,b,c,d,1,2'
    wd = Group(Word('abcd'))
    # restOfLine picks up whatever is left after ZeroOrMore stops at ',1'
    pattern = wd + ZeroOrMore(',' + wd) + Optional(restOfLine)   # Output: [['a'], ',', ['b'], ',', ['c'], ',', ['d'], ',1,2']

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

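Note: the trailing text ',1,2' is now captured as well. It is restOfLine that consumes it; wrapping restOfLine in Optional marks that trailing part as not required.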
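A more typical use of Optional is an element that may or may not appear inside a pattern, such as the sign of a number. The minimal sketch below (inputs are illustrative) also uses Optional's default argument, which supplies a token when the optional element is absent.

```python
from pyparsing import Optional, Word, oneOf, nums

# The sign may be missing; if so, "+" is inserted as a default token
sign = Optional(oneOf("+ -"), default="+")
signed_int = sign + Word(nums)

print(signed_int.parseString("-42").asList())
print(signed_int.parseString("42").asList())
```

Supplying a default keeps the result shape fixed (always sign plus digits), which simplifies the code that consumes the tokens.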

2.6 ZeroOrMore

Purpose: matches the enclosed expression zero or more times; mainly useful when part of the text format repeats.
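A minimal sketch of a log-style line made of a level followed by any number of key=value fields. The line format and field names are hypothetical, chosen only to illustrate repetition; because ZeroOrMore also accepts zero occurrences, a bare level still parses.

```python
from pyparsing import Group, Suppress, Word, ZeroOrMore, alphas, alphanums

# Hypothetical format: LEVEL key=value key=value ...
pair = Group(Word(alphas) + Suppress("=") + Word(alphanums))
log_line = Word(alphas) + ZeroOrMore(pair)

print(log_line.parseString("INFO user=alice id=42").asList())
print(log_line.parseString("INFO").asList())   # zero repetitions is fine
```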

2.7 Suppress

Purpose: matches an element but drops it from the parse results.
There are three equivalent ways to write it:

  • Suppress(',')
  • Literal(',').suppress()
  • Suppress(Literal(','))
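A quick way to check that the three forms are interchangeable is to plug each one into the same pattern (a minimal sketch; the input is illustrative):

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore, alphas

separators = [
    Suppress(","),
    Literal(",").suppress(),
    Suppress(Literal(",")),
]

for sep in separators:
    pattern = Word(alphas) + ZeroOrMore(sep + Word(alphas))
    print(pattern.parseString("a,b,c").asList())   # the commas never appear
```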

Example without Suppress:

def test_11():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    pattern = wd + ZeroOrMore(',' + wd)  # Output: ['a', ',', 'b', ',', 'c', ',', 'd']

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Example with Suppress:

def test_12():
    text = 'a,b,c,d,1,2'
    wd = Word('abcd')
    pattern = wd + ZeroOrMore(Suppress(',') + wd)  # Output: ['a', 'b', 'c', 'd'] (commas matched but dropped)

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

2.8 SkipTo

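Purpose: skips over input text until a given token or sub-expression is found, then matches the rest of the text against the following rules.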

Basic SkipTo example:

def test_13():
    text = 'a,b-c,d,1,2,e,f'
    wd = Word(alphas)
    pattern = SkipTo(Literal("-")) + Literal("-") + ZeroOrMore(wd + ',')

    try:
        result = pattern.parseString(text)
        print(result)
    except ParseException as pe:
        print("  No match: {0}".format(str(pe)))

Output:

['a,b', '-', 'c', ',', 'd', ',']

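Explanation: the parser skips the text "a,b" and stops when it reaches the marker "-". From there, the remaining text is processed by the following rules: ZeroOrMore(wd + ',') matches "c," and "d," and then stops at "1", which is not a letter.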
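SkipTo also accepts include=True, which consumes the target expression as well and returns it together with the skipped text, so the separate Literal("-") in the example above can be dropped. A minimal sketch on the same input:

```python
from pyparsing import Literal, SkipTo, Word, ZeroOrMore, alphas

text = 'a,b-c,d,1,2,e,f'
wd = Word(alphas)

# include=True: skip up to "-" and consume the "-" as well
pattern = SkipTo(Literal("-"), include=True) + ZeroOrMore(wd + ',')

print(pattern.parseString(text).asList())   # same tokens as the explicit version
```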
