Daily Algorithm 6: UVa10815 Andy's First Dictionary

Starting today, the problems will get harder and the write-ups more detailed (feel free to join the discussion in the comments!)

I. Question

1. Problem Description

Andy, 8, has a dream - he wants to produce his very own dictionary. This is not an easy task for him, as the number of words that he knows is, well, not quite enough. Instead of thinking up all the words himself, he has a brilliant idea. From his bookshelf he would pick one of his favourite story books, from which he would copy out all the distinct words. By arranging the words in alphabetical order, he is done! Of course, it is a really time-consuming job, and this is where a computer program is helpful. You are asked to write a program that lists all the different words in the input text. In this problem, a word is defined as a consecutive sequence of alphabets, in upper and/or lower case. Words with only one letter are also to be considered. Furthermore, your program must be CaSe InSeNsItIvE. For example, words like “Apple”, “apple” or “APPLE” must be considered the same.

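Task: read a text, find all distinct words (a word is a consecutive sequence of letters), and print them in lower case, sorted in ascending lexicographic order. Words are case-insensitive.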
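To make the word definition concrete, here is a minimal Python sketch that scans a line character by character and collects maximal runs of letters, lowercased. The helper name `words_in` is hypothetical (it is not part of either solution below), and the sketch assumes ASCII input: Python's `str.isalpha` also accepts non-ASCII letters, which the problem does not consider.

```python
def words_in(line):
    """Hypothetical helper: return the words of `line` as defined by the
    problem, i.e. maximal runs of letters, converted to lower case."""
    words = []
    current = []
    for ch in line:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            # any non-letter ends the current word
            words.append("".join(current))
            current = []
    if current:
        # the line may end in the middle of a word
        words.append("".join(current))
    return words


# "Apple", "apple" and "APPLE" all become the same word
print(words_in('"Apple", apple or APPLE!'))  # ['apple', 'apple', 'or', 'apple']
# one-letter words count, and an apostrophe splits "Don't" into two words
print(words_in("Don't take a fork."))        # ['don', 't', 'take', 'a', 'fork']
```

Note the second example: by the problem's definition, "don't" is the two words "don" and "t".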

2. Input

The input file is a text with no more than 5000 lines. An input line has at most 200 characters. Input is terminated by EOF.
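Since the input ends at EOF rather than after a given number of lines, the program has to keep reading until the stream is exhausted. The limits (at most 5000 lines of at most 200 characters, i.e. on the order of 1 MB) are small enough to keep the whole text in memory. A minimal Python sketch, assuming the stream is passed in as a parameter so it can be tried with `io.StringIO`; the name `read_lines` is hypothetical:

```python
import io
import sys


def read_lines(stream=sys.stdin):
    """Hypothetical helper: read every line until EOF and return them
    without trailing newlines.

    Iterating over a text stream simply stops at EOF, so no EOFError
    handling is needed (unlike a loop around input()).
    """
    return [line.rstrip("\n") for line in stream]


# Simulate a two-line input file without touching the real stdin
print(read_lines(io.StringIO("Adventures in Disneyland\nSo they went home.\n")))
# ['Adventures in Disneyland', 'So they went home.']
```

Calling `read_lines()` with no argument reads from standard input.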

3. Output

Your output should give a list of different words that appear in the input text, one in a line. The words should all be in lower case, sorted in alphabetical order. You can be sure that the number of distinct words in the text does not exceed 5000.

4. Sample Input

Adventures in Disneyland
Two blondes were going to Disneyland when they came to a fork in the road. The sign read: "Disneyland Left."
So they went home.

5. Sample Output

a
adventures
blondes
came
disneyland
fork
going
home
in
left
read
road
sign
so
the
they
to
two
went
were
when

II. Solutions

1. C++

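The task asks for all distinct words, which means duplicates must be removed, and the result must be sorted in ascending lexicographic order. Anyone familiar with the STL will immediately think of C++'s set: a std::set keeps its elements in ascending order and, by definition, contains no duplicates, so it is a natural fit for this problem. Since the problem is case-insensitive, the tolower function is used to convert every letter to lower case. The overall approach is: create an empty set and keep reading the text; pass over each piece of input once from start to end, turning every non-letter into a space and every letter into lower case; then read the result with a stringstream, which splits on whitespace by default (once all non-letters are spaces, this matches the problem's definition of a word exactly), and insert each word into the set. Finally, traverse the set with an iterator and print each word. No manual sorting is needed, because the set already keeps its elements in ascending order.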

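C++ topics covered: (1) set; (2) the tolower and isalpha functions; (3) stringstream; (4) iterators.
#include <cctype>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
using namespace std;
// dict holds the words; each element is a string. A set stores its
// elements in ascending order and never keeps duplicates.
set <string> dict;

int main()
{
    string s, buf;
    // cin >> s reads one whitespace-separated token at a time until EOF
    while (cin >> s){
        for(size_t i = 0; i < s.length(); ++i)
        {
            // cast to unsigned char: passing a negative char to isalpha/tolower is undefined
            unsigned char c = static_cast<unsigned char>(s[i]);
            if(isalpha(c))
                s[i] = static_cast<char>(tolower(c));  // letter: lower case
            else
                s[i] = ' ';                            // non-letter: separator
        }
        // re-split the cleaned token, e.g. "Left." -> "left", "don't" -> "don" and "t"
        stringstream ss(s);
        while (ss >> buf)
            dict.insert(buf);
    }
    // iterating over a set visits its elements in ascending order
    for(set<string>::iterator it = dict.begin(); it != dict.end(); ++it)
        cout << *it << '\n';
    return 0;
}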
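The C++ code re-splits each `cin` token with a stringstream because a single whitespace-separated token can contain several words once its punctuation is removed. The sketch below mirrors that per-token step in Python; the name `clean_token` is hypothetical, and `str.split()` with no argument stands in for `while (ss >> buf)`, since both skip any run of whitespace.

```python
def clean_token(token):
    """Hypothetical helper mirroring the C++ loop: letters become lower
    case, every other character becomes a space, then split."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in token)
    # split() with no argument drops empty pieces, like `while (ss >> buf)`
    return cleaned.split()


print(clean_token('"Disneyland'))    # ['disneyland']
print(clean_token("rock-and-roll"))  # ['rock', 'and', 'roll']
print(clean_token("2021"))           # []
```

Without the re-split, "rock-and-roll" would be stored as the single string "rock and roll".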

2. Python

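Writing OJ solutions in Python is actually harder than it looks: most people know Python's syntax less well than C/C++, and Python lacks some of the ready-made libraries that C++ provides. Still, perhaps because the C++ track has become so competitive, some students have switched to the Python track. The Python code in this article is based on https://blog.csdn.net/CxsGhost/article/details/103973216. First, let's revisit the code that processes the input text in the C++ solution: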

        for(int i = 0; i < s.length(); ++i)
        {
            if(isalpha(s[i]))
                s[i] = tolower(s[i]);
            else
                s[i] = ' ';
        }

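Python also has a str.isalpha method, but the code below takes a different route. The input text consists only of letters, digits, spaces, and punctuation; string.punctuation contains all ASCII punctuation characters, and there are only ten digits, 0 to 9. So it is enough to turn punctuation and digits into spaces and convert letters to lower case, then use the space as the separator. This causes a small problem: because every non-letter becomes a space and we split on a single space, adjacent separators produce empty strings, so when printing we must check each string and output it only if it is not empty. Also, a Python set is unordered, so the words must be sorted with sorted before they are printed. The code is as follows: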

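import string

def wash_data(s):
    # replace every ASCII punctuation character with a space
    for k in string.punctuation:
        s = s.replace(k, " ")
    # replace the digits 0-9 with spaces as well
    for n in range(10):
        s = s.replace(str(n), " ")
    # lower-case and split on single spaces; runs of spaces yield empty strings
    list_1 = s.lower().split(' ')
    set_1 = set(list_1)
    return set_1

list_str = []    # collects the input lines
set_all = set()  # empty set, merged with each line's words below
while True:
    try:
        str_ = input()
        list_str.append(str_)
    except EOFError:
        break
list_set = map(wash_data, list_str)
for i in list_set:
    set_all = set_all | i
for j in sorted(set_all):
    if j:  # skip the empty string produced by adjacent separators
        print(j)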
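The empty-string check is needed because `str.split(' ')` splits at every single space, so adjacent separators leave empty strings behind, whereas `str.split()` with no argument treats any run of whitespace as one separator and never returns empty strings. The sketch below shows both on one cleaned sample line. It swaps in `str.maketrans`/`str.translate` to map punctuation and digits to spaces in one pass instead of the repeated `str.replace` calls above; `TO_SPACE` and `split_both_ways` are hypothetical names.

```python
import string

# Hypothetical table: every ASCII punctuation character and digit -> space
# (same effect as the repeated str.replace calls in wash_data)
SEPARATORS = string.punctuation + string.digits
TO_SPACE = str.maketrans(SEPARATORS, " " * len(SEPARATORS))


def split_both_ways(line):
    """Hypothetical helper: clean `line`, then split it both ways."""
    cleaned = line.translate(TO_SPACE).lower()
    return cleaned.split(" "), cleaned.split()


with_empty, without_empty = split_both_ways('read: "Disneyland Left."')
print(with_empty)     # ['read', '', '', 'disneyland', 'left', '', '']
print(without_empty)  # ['read', 'disneyland', 'left']
```

With `split()` the `if j:` check becomes unnecessary, and tabs are treated as separators too.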

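An even more concise approach uses Python's regular expressions: split() from the re module splits on a pattern rather than a fixed string, which makes it much more flexible than the built-in str.split. The regular expression [^a-z] matches any character other than a lowercase letter, so after the text is lowercased, splitting on it leaves only the words (plus empty strings where separators are adjacent, which are skipped when printing). The code is as follows: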

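from re import split
from sys import stdin

def Dict():
    str_1 = stdin.read()                  # read the whole input up to EOF
    str_2 = str_1.lower()                 # case-insensitive: lower-case everything
    # split on every character that is not a lowercase letter;
    # adjacent separators leave empty strings in the result
    str_3 = set(split(r'[^a-z]', str_2))
    for i in sorted(str_3):               # a set is unordered, so sort first
        if i:                             # skip the empty string
            print(i)

Dict()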
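A related variant, sketched here as an assumption rather than the referenced author's code: `re.findall` with the pattern `[a-z]+` returns only the maximal runs of letters, so no empty strings appear and no check is needed. The function `distinct_words` (a hypothetical name) takes the text as an argument so it can be checked against the sample; in a submission one would pass `sys.stdin.read()` and print the result.

```python
import re


def distinct_words(text):
    """Hypothetical helper: distinct words of `text`, lower case, sorted.

    re.findall(r"[a-z]+", ...) yields maximal runs of the letters a-z,
    so, unlike re.split, it never produces empty strings.
    """
    return sorted(set(re.findall(r"[a-z]+", text.lower())))


sample = (
    "Adventures in Disneyland\n"
    "Two blondes were going to Disneyland when they came to a fork in the "
    'road. The sign read: "Disneyland Left."\n'
    "So they went home.\n"
)
# prints the sample output, one word per line
print("\n".join(distinct_words(sample)))
```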
