PythonShowMeTheCode(0011-12): Detecting sensitive words

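1. Problem

Problem 0011: The sensitive-word text file filtered_words.txt contains the words listed below. When the user enters a sensitive word, print Freedom; otherwise print Human Rights.

Problem 0012: The sensitive-word text file filtered_words.txt has the same contents as in problem 0011. When the user's input contains a sensitive word, replace that word with asterisks (*). For example, the input 「北京是个好城市」 ("Beijing is a good city") becomes 「**是个好城市」.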

北京
程序员
公务员
领导
牛比
牛逼
你娘
你妈
love
sex
jiangge

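2. Approach

First, read the sensitive words from the file and store them in a container.
Then read the user's input, check whether it contains any sensitive word, and print the corresponding result (problem 0012 additionally requires replacing the matched text).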
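
The two steps can be sketched without any console I/O, which makes them easy to test. This is only a minimal sketch: the names `load_words` and `contains_sensitive` are hypothetical, and it assumes the word file is UTF-8 encoded with one word per line.

```python
import os
import tempfile


def load_words(path):
    """Return the non-empty lines of a UTF-8 word file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def contains_sensitive(text, words):
    """True if any word occurs anywhere in text (substring match)."""
    return any(word in text for word in words)


# Demo with a throwaway file standing in for filtered_words.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "filtered_words.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("北京\nlove\n\n")
    words = load_words(demo_path)

print(words)
print("Freedom" if contains_sensitive("北京是个好城市", words) else "Human Rights")
```

Because this is a plain substring match, an input such as "glove" also counts as containing "love"; matching on word boundaries would be a different design choice.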

3. Implementation

# -*- coding: utf-8 -*-


def get_filters(path):
    """Read the sensitive words from path, one word per line."""
    if path is None:
        return []

    filters = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            # Skip blank lines: an empty string is "contained" in every input.
            if word:
                filters.append(word)
    return filters


def main_0011():
    filters = get_filters("filtered_words.txt")
    while True:
        tmp = input("plz input: ")
        if tmp == "0":  # enter 0 to quit
            print("Exit")
            break
        # Per the approach in section 2, match any input that contains a sensitive word.
        if any(word in tmp for word in filters):
            print("Freedom")
        else:
            print("Human Rights")


if __name__ == "__main__":
    main_0011()

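import re  # needed for the Chinese-character check below


# Reuses get_filters() from the listing above.
# To run this version, call main_0012() instead of main_0011() in the __main__ guard.
def main_0012():
    filters = get_filters("filtered_words.txt")
    while True:
        tmp = input("plz input: ")
        if tmp == "0":  # enter 0 to quit
            print("Exit")
            break
        for filter_word in filters:
            if filter_word and filter_word in tmp:
                # Chinese words get one star per character; other words get one star.
                if re.search("[\u4e00-\u9fa5]", filter_word):
                    new_str = "*" * len(filter_word)
                else:
                    new_str = "*"
                tmp = tmp.replace(filter_word, new_str)

        print(tmp)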

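Problem 0012 requires 北京 to be replaced by two asterisks (**), so the code first counts the characters in the sensitive word, which len() does directly. When the sensitive word is an English word, it is replaced with a single asterisk.

This example uses a regular expression to check whether a sensitive word contains Chinese characters.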
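
The replacement rule can also be isolated as a pure function. The sketch below is a variant written for illustration, not the listing itself: `mask_word` and `mask_text` are hypothetical names, and the check for Chinese characters reuses the same `[\u4e00-\u9fa5]` range.

```python
import re

# Same range as the listing above: the common CJK Unified Ideographs.
CJK = re.compile("[\u4e00-\u9fa5]")


def mask_word(word):
    """One star per character for words containing Chinese, else a single star."""
    return "*" * len(word) if CJK.search(word) else "*"


def mask_text(text, words):
    """Replace every occurrence of each non-empty word with its mask."""
    for word in words:
        if word:
            text = text.replace(word, mask_word(word))
    return text


print(mask_text("北京是个好城市", ["北京", "love"]))  # **是个好城市
print(mask_text("I love Python", ["北京", "love"]))   # I * Python
```

If English words should also be masked with one star per letter, `mask_word` could simply return `"*" * len(word)` in every case.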

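4. What the `r` and `u` string prefixes mean

An r prefix tells the interpreter that the literal is a raw string: backslashes ('\') are not treated as escape characters. It is most often used for regular expressions with the re module.
A u prefix marks the string as a Unicode string. In Python 3 every str is already Unicode, so the prefix is optional.
A b prefix makes the literal a bytes object, so the data is represented as bytes rather than text.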
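
A few short checks make the three prefixes concrete. This is a small illustrative sketch for Python 3; the variable names are arbitrary.

```python
import re

# r: backslashes stay literal, so r"\n" is two characters, not a newline.
normal = "\n"
raw = r"\n"
print(len(normal), len(raw))  # 1 2

# Raw strings keep regex escapes such as \d readable.
print(re.findall(r"\d+", "0011 and 0012"))  # ['0011', '0012']

# u: accepted in Python 3 but redundant, since every str is Unicode.
print(u"北京" == "北京")  # True

# b: a bytes literal; encoding a str yields the same kind of object.
encoded = "北京".encode("utf-8")
print(type(b"abc").__name__, encoded)  # bytes b'\xe5\x8c\x97\xe4\xba\xac'
```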
