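Extracting Basic Email Header Information with Python

1 Email Content

Suppose the email is saved in a file named "1.txt" with the following content: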

From:   [email protected] on behalf of Bieber
Leader [[email protected]]
Sent:   2017-07-01 12:48
To: '[email protected]'; [email protected];
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell


I have just met then. More details as soon as possible. So far, so good.

Sent via iPhone 7 plus

2 Extraction Approach

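  • Extract the email header information, specifically:
    • the sender (From:), the sent time (Sent:), the recipients (To:), and the subject (Subject:)
  • As a first pass, it is enough to extract the contents of the lines that hold this information.
  • A single extraction function puts the four keywords in a list and matches each one with a regular expression.
  • Each of the four fields has a global counter variable; once a field has been matched, its counter is incremented by 1 to mark it as seen.
  • If one field has already been matched but the next field has not appeared yet, the current line is returned as well, because a header value can wrap onto the following line (as the From: and To: values do in the sample).
  • Lines for which the extraction function returns None are skipped.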
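Each keyword is matched with the pattern ".*(keyword.*)". The short demo below illustrates what that pattern captures; the helper name capture is hypothetical and exists only for this illustration. Because re.match anchors at the start of the line, the leading ".*" lets the keyword appear anywhere in the line. Since ".*" is greedy, the capture starts at the last occurrence of the keyword, and a line containing "Reply-To:" would also be picked up as "To:".

```python
import re


def capture(keyword, line):
    """Hypothetical helper: apply the ".*(keyword.*)" pattern and return group(1) or None."""
    # Leading ".*" skips any prefix; the group runs from the keyword to the end of the line
    match_obj = re.match(".*({0}.*)".format(re.escape(keyword)), line)
    return match_obj.group(1) if match_obj else None


print(capture("Sent:", "Sent:   2017-07-01 12:48\n"))  # "." does not match "\n", so it is dropped
print(capture("To:", "Reply-To: someone"))            # matches inside "Reply-To:"
print(capture("To:", "Fwd To: a; To: b"))             # greedy ".*" captures from the last "To:"
print(capture("Subject:", "So far, so good."))        # no keyword, so None
```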
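# coding: utf-8
import re

# Global counters: a counter becomes > 0 once its keyword has been seen
from_count = 0
sent_count = 0
to_count = 0
subject_count = 0


def inspect_string(string):
    """Return the header content in this line, or None if the line is not part of the header."""
    global from_count
    global sent_count
    global to_count
    global subject_count

    # Record which keywords appear in this line (done once, not once per keyword)
    if re.match(".*(From:.*)", string):
        from_count += 1
    if re.match(".*(Sent:.*)", string):
        sent_count += 1
    if re.match(".*(To:.*)", string):
        to_count += 1
    if re.match(".*(Subject:.*)", string):
        subject_count += 1

    # If the line contains a keyword, return the text from the keyword onwards
    keyword_list = ['From:', 'Sent:', 'To:', 'Subject:']
    for keyword in keyword_list:
        regex_str = ".*({0}.*)".format(re.escape(keyword))
        match_obj = re.match(regex_str, string)
        if match_obj:
            return match_obj.group(1)

    # Continuation line: a field has started but the next field has not appeared yet
    if from_count > 0 and sent_count < 1:
        return string
    if sent_count > 0 and to_count < 1:
        return string
    if to_count > 0 and subject_count < 1:
        return string

    return None


# Open in text mode: in Python 3, str() on a bytes line would produce "b'...'"
with open('1.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Drop the trailing newline so print() does not add blank lines
        result = inspect_string(line.rstrip('\r\n'))
        if result is None:
            continue
        print(result)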
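The global counters above essentially track which header field is currently being read. As an alternative sketch (not the author's method), the same idea can be written as an explicit state machine that collects each field and its wrapped continuation lines into a dict. It assumes, as in the sample, that each header line starts with its keyword and that the header block ends at the first blank line; parse_header, KEYWORDS and HEADER_RE are hypothetical names.

```python
import re

# Header keywords in the order they appear in the sample email
KEYWORDS = ['From', 'Sent', 'To', 'Subject']
# A header line starts with one of the keywords followed by a colon
HEADER_RE = re.compile(r'^({0}):\s*(.*)$'.format('|'.join(KEYWORDS)))


def parse_header(lines):
    """Hypothetical helper: collect header values, joining wrapped continuation lines."""
    headers = {}
    current = None  # keyword of the field currently being read
    for raw in lines:
        line = raw.rstrip('\r\n')
        if not line.strip():
            # The first blank line after the headers ends the header block
            if headers:
                break
            continue
        match_obj = HEADER_RE.match(line)
        if match_obj:
            current = match_obj.group(1)
            headers[current] = match_obj.group(2).strip()
        elif current is not None:
            # Continuation line: append it to the field being read
            headers[current] += ' ' + line.strip()
    return headers


sample = """From:   [email protected] on behalf of Bieber
Leader [[email protected]]
Sent:   2017-07-01 12:48
To: '[email protected]'; [email protected];
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell


I have just met then.
"""

for key, value in parse_header(sample.splitlines()).items():
    print('{0}: {1}'.format(key, value))
```

Unlike inspect_string, this version only recognizes a keyword at the start of a line and stops at the first blank line, so body text is never examined; any other non-keyword line inside the header block is treated as a continuation of the previous field.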

3 Output

From:   [email protected] on behalf of Bieber
Leader [[email protected]]

Sent:   2017-07-01 12:48

To: '[email protected]'; [email protected];

Willim Johnson; John Snow

Subject:    The battlefield in Winterfell
