Natural Language Processing with Python: 1.8 Exercises

Natural Language Processing with Python


5. Compare the lexical diversity scores for humor and romance fiction in Table 1-1. Which genre is more lexically diverse?

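  • [x] romance fiction: 8.3
  • [√] humor: 4.3

The score in Table 1-1 is tokens divided by types, i.e. how many times each word is used on average, so the lower score is the more diverse one: humor reuses each word about 4.3 times, romance fiction about 8.3 times.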
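
To see why a lower tokens-per-type score means a more varied vocabulary, here is a minimal sketch on two made-up token lists. The helper name tokens_per_type and both lists are illustrative assumptions, not taken from the book; the formula is assumed to be the one behind Table 1-1.

```python
def tokens_per_type(tokens):
    """Average number of times each distinct word is used (hypothetical helper)."""
    # float() keeps the division exact under Python 2 as well as Python 3.
    return float(len(tokens)) / len(set(tokens))


# Two made-up token lists of the same length, purely for illustration.
repetitive = ['the', 'cat', 'the', 'cat', 'the', 'cat', 'the', 'cat']
varied = ['the', 'cat', 'sat', 'on', 'a', 'mat', 'by', 'itself']

print(tokens_per_type(repetitive))  # 8 tokens over 2 types
print(tokens_per_type(varied))      # 8 tokens over 8 types
```

The repetitive list scores higher because each of its two words is reused four times, while the varied list never repeats a word.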

6. Produce a dispersion plot of the four main protagonists in Sense and Sensibility: Elinor, Marianne, Edward, and Willoughby. What can you observe about the different roles played by the males and females in this novel? Can you identify the couples?

text2.dispersion_plot(["Elinor","Marianne","Edward","Willoughby"])
[Figure: dispersion plot of Elinor, Marianne, Edward, and Willoughby in text2]

7. Find the collocations in text5.

text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time
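
For intuition about what a collocation list contains, the sketch below counts adjacent word pairs in a made-up token list. This is only a rough frequency-based approximation and not how text5.collocations() scores pairs; the token list is an illustrative assumption standing in for text5.

```python
from collections import Counter

# Made-up chat-style tokens, standing in for text5.
tokens = ['wanna', 'chat', 'last', 'night', 'wanna', 'chat',
          'long', 'time', 'last', 'night', 'wanna', 'chat']

# Pair each token with its right-hand neighbour, then count the pairs.
counts = Counter(zip(tokens, tokens[1:]))

# Keep only pairs that occur at least twice.
frequent = [(pair, n) for pair, n in counts.most_common() if n >= 2]
for pair, n in frequent:
    print(' '.join(pair) + ': ' + str(n))
```

Raw counts like these also pick up accidental neighbours, which is one reason a real collocation finder applies filtering and a statistical score on top of the counts.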

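8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

It counts the number of word types (distinct tokens) in text4.
Step 1: set(text4) builds the vocabulary of text4, i.e. the set of its distinct word types.
Step 2: len() returns the size of that vocabulary (the number of word types).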
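
The two steps can be tried on a small list without loading a corpus. This is a minimal sketch; the token list is the sentence from exercise 25, used here as an assumed stand-in for text4.

```python
# Toy token list (the sentence from exercise 25), standing in for text4.
tokens = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

# Step 1: set() drops duplicates, leaving one entry per word type.
vocabulary = set(tokens)

# Step 2: len() counts the entries in that vocabulary.
print(len(tokens))      # 8 tokens
print(len(vocabulary))  # 7 types, since 'sea' occurs twice
```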


9.


25. ◑ Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write code to perform the following tasks:

a. Print all words beginning with sh.
[w for w in sent if w.startswith('sh')]
b. Print all words longer than four characters.
  • 1st Solution
[w for w in sent if len(w) > 4]
  • 2nd Solution
for w in sent:
    if len(w) > 4:
        print w,

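26. ◑ What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?

It adds up the length of every token in text1: 999044 characters in total (punctuation tokens plus the length of each word). Dividing that total by len(text1) gives the average token length; under Python 2 the / below is integer division, so the result is truncated to 3.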

>>> sum([len(w) for w in text1])/len(text1)
3
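
The whole-number result hides the fractional part of the average. As a minimal sketch, the same calculation on the short sentence from exercise 25 (an assumed stand-in for text1) shows the truncated and the exact value side by side; the variable names are illustrative.

```python
# Toy token list (the sentence from exercise 25), standing in for text1.
sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

# Total number of characters across all tokens.
total_chars = sum(len(w) for w in sent)

# Floor division reproduces the truncated Python 2 result.
truncated = total_chars // len(sent)

# Converting to float first gives the exact average in Python 2 and 3.
average = float(total_chars) / len(sent)

print(total_chars)  # 30
print(truncated)    # 3
print(average)      # 3.75
```

Applied to text1, the float version would be float(sum(len(w) for w in text1)) / len(text1).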
