miniAI学堂

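Study notes on "Natural Language Processing with Python (2nd edition)" by Steven Bird et al.: Chapter 1, Language Processing and Python

Chapter 1: Language Processing and Python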

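1.1 Computing with Language: Texts and Words
- Getting Started with Python
- Getting Started with NLTK
- Searching Text
- Counting Vocabulary
1.2 A Closer Look at Python: Texts as Lists of Words
- Lists
- Indexing Lists
- Variables
- Strings
1.3 Computing with Language: Simple Statistics
- Frequency Distributions
- Fine-grained Selection of Words
- Collocations and Bigrams
- Counting Other Things
1.4 Back to Python: Making Decisions and Taking Control
- Conditionals
- Operating on Every Element
- Nested Code Blocks
- Looping with Conditions
1.5 Automatic Natural Language Understanding
- Word Sense Disambiguation
- Pronoun Resolution
- Generating Language Output
- Machine Translation
- Spoken Dialog Systems
- Textual Entailment
- Limitations of NLP
1.6 Summary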

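1.1 Computing with Language: Texts and Words

Getting Started with Python

Type in some expressions of your own

Type directly into the interactive interpreter, the program that will run your Python code. On Windows you can find it under "Programs → Python".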

1+5*2-3

1/3

0.3333333333333333

1.0/3.0

0.3333333333333333

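A nonsensical expression, such as the incomplete 1 +, makes the interpreter report a SyntaxError.

Getting Started with NLTK

First install NLTK, which is a free download from http://www.nltk.org/. Follow the instructions there to download the version required for your platform. Once NLTK is installed, start the Python interpreter as before.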

import nltk

nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True

from nltk.book import *  # load everything from NLTK's book module

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

Whenever we want to look at one of these texts, we just enter its name at the Python prompt.

text1

text2

Searching Text

A concordance view shows every occurrence of a given word, together with some context.

text1.concordance("monstrous")  # look up the word monstrous in Moby Dick

Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us , 
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But 
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u

text2.concordance("affection")  # search Sense and Sensibility for the word affection

Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This 
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without 
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if

text3.concordance("lived")  # search the Book of Genesis to find out how long people lived

Displaying 25 of 38 matches:
ay when they were created . And Adam lived an hundred and thirty years , and be
ughters : And all the days that Adam lived were nine hundred and thirty yea and
nd thirty yea and he died . And Seth lived an hundred and five years , and bega
ve years , and begat Enos : And Seth lived after he begat Enos eight hundred an
welve years : and he died . And Enos lived ninety years , and begat Cainan : An
 years , and begat Cainan : And Enos lived after he begat Cainan eight hundred 
ive years : and he died . And Cainan lived seventy years and begat Mahalaleel :
rs and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hund
years : and he died . And Mahalaleel lived sixty and five years , and begat Jar
s , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred a
and five yea and he died . And Jared lived an hundred sixty and two years , and
o years , and he begat Eno And Jared lived after he begat Enoch eight hundred y
 and two yea and he died . And Enoch lived sixty and five years , and begat Met
 ; for God took him . And Methuselah lived an hundred eighty and seven years , 
 , and begat Lamech . And Methuselah lived after he begat Lamech seven hundred 
nd nine yea and he died . And Lamech lived an hundred eighty and two years , an
ch the LORD hath cursed . And Lamech lived after he begat Noah five hundred nin
naan shall be his servant . And Noah lived after the flood three hundred and fi
xad two years after the flo And Shem lived after he begat Arphaxad five hundred
at sons and daughters . And Arphaxad lived five and thirty years , and begat Sa
ars , and begat Salah : And Arphaxad lived after he begat Salah four hundred an
begat sons and daughters . And Salah lived thirty years , and begat Eber : And 
y years , and begat Eber : And Salah lived after he begat Eber four hundred and
 begat sons and daughters . And Eber lived four and thirty years , and begat Pe
y years , and begat Peleg : And Eber lived after he begat Peleg four hundred an

text4.concordance("nation")  # text4 is the Inaugural Address Corpus, with English going back to 1789; search for words such as nation, terror, and god to see how their use has changed over time

Displaying 25 of 302 matches:
 to the character of an independent nation seems to have been distinguished by
f Heaven can never be expected on a nation that disregards the eternal rules o
first , the representatives of this nation , then consisting of little more th
, situation , and relations of this nation and country than any which had ever
, prosperity , and happiness of the nation I have acquired an habitual attachm
an be no spectacle presented by any nation more pleasing , more noble , majest
party for its own ends , not of the nation for the national good . If that sol
tures and the people throughout the nation . On this subject it might become m
if a personal esteem for the French nation , formed in a residence of seven ye
f our fellow - citizens by whatever nation , and if success can not be obtaine
y , continue His blessing upon this nation and its Government and give it all 
powers so justly inspire . A rising nation , spread over a wide and fruitful l
ing now decided by the voice of the nation , announced according to the rules 
ars witness to the fact that a just nation is trusted on its word when recours
e union of opinion which gives to a nation the blessing of harmony and the ben
uil suffrage of a free and virtuous nation , would under any circumstances hav
d spirit and united councils of the nation will be safeguards to its honor and
iction that the war with a powerful nation , which forms so prominent a featur
out breaking down the spirit of the nation , destroying all confidence in itse
ed on the military resources of the nation . These resources are amply suffici
the war to an honorable issue . Our nation is in number more than half that of
ndividually have been happy and the nation prosperous . Under this Constitutio
rights , and is able to protect the nation against injustice from foreign powe
 great agricultural interest of the nation prospers under its protection . Loc
ak our Union , and demolish us as a nation . Our distance from Europe and the

text5.concordance("im")  # text5 is the NPS Chat Corpus; search it for informal chat words such as im, ur, and lol

Displaying 25 of 149 matches:
now im left with this gay name :P PART hey e
what did you but on e-bay i feel like im in the wrong room yeee haw U30 im con
ike im in the wrong room yeee haw U30 im considering changing my nickname to "
 the hell outta my freaking PM box .. Im with my fiance !!!!!!!!!!!!!!!! answe
m impressed . PART hiya room lmao !!! im doin alright thanks omg Finger .. Dee
th lol JOIN so read it . thanks U7 .. Im happy to have my fiance here !! forwa
i didnt me phone you . . . sheesh now im that phone perv guy lets hope not U12
to spain ? i need to go this summer . im a HUGE phone perv ok seriously who wa
an ... . ACTION video tapes . hey U20 Im blind now . ACTION has left the room 
T u got that right , i dont do shit , im the supervisor Hello U165 . hey U165 
 him in the " untouchable " list U115 im good U6 lmao U7 how r u U128 hehe how
can I ask where ya all are from ..... im here in kentucky as I said ... too wi
ic but had to resize and stuff U37 no im an equal oppertunity hater LOL Hi , U
he cover weeeeeeeee thanks U19 ! PART im out in cal now U3 looking at some new
 :) hi U58 lol wb U29 hi U29 U13 .... im down to time now PART Hello U24 , wel
, I 'd never kick you outta my box hi im good thanks U16 yerself ?? PART inter
ke wth . . who are you even ty U34 yw Im glad he 's back . awwww U16 i like ps
 ha U23 !!! wow ... are you the U39 ? Im talkin about all yer typin . . It 's 
... you ??? Apparently , I 'm not U41 im good U23 dear . How are you U23 ~wink
~ U35 ... I love that 5 am phone call im good ... me and eric r back together 
 , I am happy . You know i LuverZ YOU im the same busy busy oh ok then U1 nm l
))) . ACTION stretches . ty U19 Ugh , Im so sore ! Repeatedly , with a big sti
'm a size queen U41 Why U45 ? naw U23 im cheating on you with Jayse hes hawt t
oeer is sum1 gonna ghet fuked up ? :) im always hungry yeah U45 .. i believe i
without first asking permission . U35 im sorry U35 i tried to refrain me too U

A concordance lets us see words in context.
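The idea behind a concordance can be sketched in plain Python. This is only a minimal illustration, not NLTK's implementation: the function name concordance, its width parameter, and the toy token list are assumptions made for the example.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical toy data standing in for a real corpus.
toy_tokens = "the monstrous size of the whale and the monstrous pictures of whales".split()
for line in concordance(toy_tokens, "monstrous"):
    print(line)
```

Unlike the NLTK display, this sketch measures context in tokens rather than characters, which keeps the slicing logic easy to follow.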

We saw monstrous occur in contexts such as the ___ pictures and the ___ size. What other words appear in a similar range of contexts? The similar function finds them.

text1.similar("monstrous")

reliable curious imperial gamesome vexatious pitiable impalpable
maddens delightfully tyrannical exasperate subtly passing loving
candid perilous mystifying lamentable lazy doleful

text2.similar("monstrous")

very so exceedingly heartily sweet great extremely good amazingly vast
a remarkably as

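Observe that we get different results for different texts. Austen (Jane Austen, the English novelist) uses this word quite differently from Melville; for her, monstrous has positive connotations, and it sometimes functions as an intensifier like the word very.

The common_contexts function lets us examine the contexts shared by two or more words.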

text2.common_contexts(["monstrous", "very"])

be_glad a_pretty am_glad a_lucky is_pretty
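One way to picture what a shared context is: record the (previous word, next word) pair around every token, then intersect the sets for the words of interest. The sketch below assumes that simplified definition; the helper names contexts and shared_contexts and the toy sentence are hypothetical, and NLTK's own method may differ in detail.

```python
from collections import defaultdict

def contexts(tokens):
    """Map each word to the set of (previous, next) word pairs around it."""
    table = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        table[word.lower()].add((prev.lower(), nxt.lower()))
    return table

def shared_contexts(tokens, words):
    """Contexts common to every word in `words`, formatted as 'prev_next'."""
    table = contexts(tokens)
    shared = set.intersection(*(table[w.lower()] for w in words))
    return sorted(f"{p}_{n}" for p, n in shared)

# Hypothetical toy text.
toy = ("I am monstrous glad and I am very glad it is a very pretty day "
       "a monstrous pretty day").split()
print(shared_contexts(toy, ["monstrous", "very"]))
```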

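We can also determine the location of a word in the text: how many words from the beginning it appears. This positional information can be displayed using a dispersion plot, in which each stripe represents an instance of a word and each row represents the entire text.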

text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
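The positional information behind a dispersion plot is just a list of token offsets. As a rough text-only sketch (the helper names offsets and strip_chart and the toy token list are assumptions, and no plotting library is involved):

```python
def offsets(tokens, word):
    """Return the 0-based positions at which `word` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == word]

def strip_chart(tokens, word):
    """One text row for the whole token list: '|' where the word occurs, '.' elsewhere."""
    return "".join("|" if tok == word else "." for tok in tokens)

# Hypothetical toy data.
toy_tokens = ["freedom", "and", "duties", "of", "citizens", "and", "freedom"]
for w in ["citizens", "freedom"]:
    print(f"{w:>10} {strip_chart(toy_tokens, w)} {offsets(toy_tokens, w)}")
```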

The generate function produces some random text in the style of the text it is called on.

text3.generate()

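Counting Vocabulary

Tokens
Words and punctuation symbols are called tokens. A token is the technical name for a sequence of characters, such as hairy or his, that we want to treat as a group.

len(text3)  # the Book of Genesis

Types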

print(sorted(set(text3)))  # set(text3) gives the vocabulary of text3

['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel', 'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham', 'Abram', 'Accad', 'Achbor', ... 'yielded', 'yielding', 'yoke', 'yonder', 'you', 'young', 'younge', 'younger', 'youngest', 'your', 'yourselves', 'youth']

len(set(text3))

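These are the distinct words, or word types. A word type is the form or spelling of a word independently of its specific occurrences in a text; that is, the word considered as a unique item of vocabulary. The 2,789 items we count include punctuation symbols, so we call them unique item types rather than word types.

Lexical diversity

len(text3)/len(set(text3))  # a measure of lexical richness: on average, each item is used about 16 times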

16.050197203298673

text3.count("smote")  # count how often a word occurs in a text

100*text4.count("a")/len(text4)  # compute what percentage of the text is taken up by a specific word

1.4643016433938312

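Functions
Use the keyword def to give a task a short name by defining a function.

def lexical_diversity(text):    # takes one parameter, text, a placeholder for the actual text whose lexical diversity we want to compute
    return len(text) / len(set(text))

def percentage(count,total):    # takes two parameters: count and total
    return 100 * count / total

To call a function such as lexical_diversity(), we type its name followed by an open parenthesis, the data the task is to be performed on (for example text3), and a close parenthesis. The data value placed in the parameter position when the function is called is the argument to the function.

lexical_diversity(text3)   # call the lexical_diversity() function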

16.050197203298673

lexical_diversity(text5)

7.420046158918563

percentage(4,5)

80.0

percentage(text4.count("a"),len(text4))

1.4643016433938312
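The same two functions can be tried without downloading any corpus. The sketch below reruns them on a small hand-made token list; the list toy is an assumption made for illustration, not data from the book.

```python
def lexical_diversity(tokens):
    """Average number of times each distinct item is used."""
    return len(tokens) / len(set(tokens))

def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total

# Hypothetical toy text: 7 tokens, 5 distinct items.
toy = ["to", "be", "or", "not", "to", "be", "."]
print(len(toy))                # number of tokens
print(len(set(toy)))           # number of distinct items (types, including punctuation)
print(lexical_diversity(toy))
print(percentage(toy.count("to"), len(toy)))
```

Both functions divide by a length, so an empty token list would raise ZeroDivisionError; real code may want to check for that case first.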

1.2 A Closer Look at Python: Texts as Lists of Words

Lists

sent1 = ['Call','me','Ishmeal','.']  # a text is nothing more than a sequence of words and punctuation

sent1

['Call', 'me', 'Ishmeal', '.']

len(sent1)

The opening sentence of each text is predefined as sent2 … sent9

print(sent2)  # if you get an error saying that sent2 is not defined, first enter from nltk.book import *

['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']

print(sent3)

['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

ex1 = ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']

sorted(ex1)

['Grail', 'Holy', 'Monty', 'Python', 'and', 'the']

len(set(ex1))

ex1.count('the')

['Monty', 'Python'] + ['and', 'the', 'Holy', 'Grail']  # adding two lists

['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']

print(sent4 + sent1)  # this special use of addition is called concatenation; it combines several lists into a single list

['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of', 'Representatives', ':', 'Call', 'me', 'Ishmeal', '.']

sent1.append("Some")  # appending adds a single item to the end of a list

sent1

['Call', 'me', 'Ishmeal', '.', 'Some']

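Indexing Lists

Indexes
The number that gives an item's position in a text is called the item's index.

text4[173]  # the word at index 173

'awaken'

text4.index('awaken')  # the reverse: find the index at which a word first occurs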

sent = ['word1', 'word2', 'word3', 'word4', 'word5',
... 'word6', 'word7', 'word8', 'word9', 'word10']

sent[0]

'word1'

sent[9]

'word10'

sent[10]

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

 in ()
----> 1 sent[10]

IndexError: list index out of range

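Note that indexes start from zero: element zero, written sent[0], is the first word, 'word1', whereas element 9 is 'word10'. Asking for sent[10] therefore fails, because a list of ten items has no element 10.

Slices
Slicing extracts a sublist, an arbitrary piece of language, from a larger text.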

print(text5[16715:16735])

['U86', 'thats', 'why', 'something', 'like', 'gamefly', 'is', 'so', 'good', 'because', 'you', 'can', 'actually', 'play', 'a', 'full', 'game', 'without', 'buying', 'it']

print(text6[1600:1625])

['We', "'", 're', 'an', 'anarcho', '-', 'syndicalist', 'commune', '.', 'We', 'take', 'it', 'in', 'turns', 'to', 'act', 'as', 'a', 'sort', 'of', 'executive', 'officer', 'for', 'the', 'week']

print(sent[5:8])

['word6', 'word7', 'word8']

By convention, m:n means elements m … n-1.

sent[:3]

['word1', 'word2', 'word3']

sent[8:]

['word9', 'word10']
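The m:n convention has a few consequences that are easy to check directly. The snippet below is a small sketch on a rebuilt ten-word list; the values of m and n are arbitrary choices for the example.

```python
sent = ['word%d' % i for i in range(1, 11)]  # 'word1' ... 'word10'
m, n = 5, 8

# sent[m:n] holds elements m, m+1, ..., n-1, so it has n - m items.
assert sent[m:n] == [sent[i] for i in range(m, n)]
assert len(sent[m:n]) == n - m

# Omitting m or n means "from the start" or "to the end",
# so the two halves always add back up to the whole list.
assert sent[:3] + sent[3:] == sent

print(sent[-2:])   # negative numbers count from the end
print(sent[20:])   # slicing past the end gives an empty list rather than an error
```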

Modifying list elements

sent[0] = 'First'

sent[9] = 'Last'

len(sent)

sent[1:9] = ['Second', 'Third']

sent

['First', 'Second', 'Third', 'Last']

sent[9]

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

 in ()
----> 1 sent[9]

IndexError: list index out of range
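Slice assignment can change the length of a list, which is why sent[9] stops being valid above. The sketch below replays those steps and adds a small helper, get_or_default, which is a hypothetical name introduced here to show one way of reading an index that may no longer exist.

```python
sent = ['word%d' % i for i in range(1, 11)]
sent[0] = 'First'
sent[9] = 'Last'
sent[1:9] = ['Second', 'Third']   # eight items replaced by two
print(len(sent))                  # the list is now shorter

def get_or_default(items, index, default=None):
    """Return items[index], or `default` when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

print(get_or_default(sent, 3))    # last valid index after the slice assignment
print(get_or_default(sent, 9))    # index 9 no longer exists
```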

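Variables

A variable name must start with a letter or an underscore, and may also contain digits and underscores. It cannot be any of Python's reserved words, such as def, if, not, and import. Names are case-sensitive, which means that myVar and myvar are distinct variables.

Assignment

my_sent = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode',
... 'forth', 'from', 'Camelot', '.']

The ... prompt indicates that Python expects more input. It doesn't matter how much indentation is used in these continuation lines, but some indentation usually makes them easier to read.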

noun_phrase = my_sent[1:4]

noun_phrase

['bold', 'Sir', 'Robin']

wOrDs = sorted(noun_phrase)  # in sorted lists, capitalized words appear before lowercase words

wOrDs

['Robin', 'Sir', 'bold']

not = 'Camelot'  # using a reserved word produces a syntax error

  File "", line 1
    not = 'Camelot'  # using a reserved word produces a syntax error
        ^
SyntaxError: invalid syntax
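Whether a name is usable as a variable can be checked before trying it, using the standard library. In the sketch below, is_valid_variable_name is a hypothetical helper name; str.isidentifier and keyword.iskeyword are standard Python.

```python
import keyword

def is_valid_variable_name(name):
    """True if `name` is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["my_sent", "wOrDs", "_tmp", "2fast", "not", "import"]:
    print(candidate, is_valid_variable_name(candidate))
```

The list of reserved words for the running interpreter is available as keyword.kwlist.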

Use variables to hold intermediate steps of a computation, especially when doing so makes the code easier to follow.

vocab = set(text1)

vocab_size = len(vocab)

vocab_size

Strings

Some of the methods we used to access the elements of a list also work with individual words, or strings.

name = 'Monty'

name[0]  # index a string

'M'

name[:4]  # slice a string

'Mont'

name * 2  # multiply a string

'MontyMonty'

name + '!'  # add to a string

'Monty!'

''.join(['Monty','Python'])  # join the words of a list into a single string

'MontyPython'

'Monty Python'.split()  # split a string into a list

['Monty', 'Python']
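The string that join() is called on is the separator, which explains why joining with an empty string above runs the words together. A short sketch of the round trip between lists and strings, using only built-in string methods:

```python
words = ['Monty', 'Python']

print(''.join(words))    # empty separator: the words run together
print(' '.join(words))   # a space as separator gives a readable phrase

# split() with no argument splits on any run of whitespace,
# so joining with a space and splitting again restores the list.
assert ' '.join(words).split() == words

print('Monty   Python\n'.split())   # extra spaces and the newline are ignored
print('Monty,Python'.split(','))    # an explicit separator can be given instead
```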

1.3 Computing with Language: Simple Statistics

saying = ['After', 'all', 'is', 'said', 'and', 'done',
... 'more', 'is', 'said', 'than', 'done']

tokens = set(saying)

tokens = sorted(tokens)

tokens[-2:]

['said', 'than']

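Frequency Distributions

How can we automatically identify the words of a text that are most informative about its topic and genre? A frequency distribution tells us the frequency of each vocabulary item in the text.

Use FreqDist to find the 20 most frequent words of Moby Dick.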

fdist1 = FreqDist(text1)

print(fdist1)

print(fdist1.most_common(20))

[(',', 18713), ('the', 13721), ('.', 6862), ('of', 6536), ('and', 6024), ('a', 4569), ('to', 4542), (';', 4072), ('in', 3916), ('that', 2982), ("'", 2684), ('-', 2552), ('his', 2459), ('it', 2209), ('I', 2124), ('s', 1739), ('is', 1695), ('he', 1661), ('with', 1659), ('was', 1632)]

fdist1['whale']

fdist1.plot(20, cumulative=True)  # high-frequency words

len(fdist1.hapaxes())  # low-frequency words: those that occur only once
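In NLTK 3, FreqDist is built on Python's collections.Counter, so the core ideas can be tried with the standard library alone. The sketch below uses Counter directly on the saying list from earlier in this section; the hapaxes line is a hand-written equivalent, not a call into NLTK.

```python
from collections import Counter

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

counts = Counter(saying)          # item -> number of occurrences
print(counts['said'])             # count for one item
print(counts.most_common(3))      # the three most frequent items with their counts

# Items that occur exactly once (what FreqDist calls hapaxes).
hapaxes = sorted(w for w, c in counts.items() if c == 1)
print(hapaxes)
```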

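Fine-grained Selection of Words

Long words
a. {w | w ∈ V & P(w)}
b. [w for w in V if p(w)]
Let P be the property of being a long word, so that P(w) is true if and only if w is more than a chosen number of characters long. The set in (a) contains every w such that w is an element of V (the vocabulary) and w has property P; (b) is the equivalent Python expression.

V = set(text1)

long_words = [w for w in V if len(w) > 15]  # words in the vocabulary that are more than 15 characters long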

print(sorted(long_words))

['CIRCUMNAVIGATION', 'Physiognomically', 'apprehensiveness', 'cannibalistically', 'characteristically', 'circumnavigating', 'circumnavigation', 'circumnavigations', 'comprehensiveness', 'hermaphroditical', 'indiscriminately', 'indispensableness', 'irresistibleness', 'physiognomically', 'preternaturalness', 'responsibilities', 'simultaneousness', 'subterraneousness', 'supernaturalness', 'superstitiousness', 'uncomfortableness', 'uncompromisedness', 'undiscriminating', 'uninterpenetratingly']

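Neither short frequent words (such as the) nor long infrequent words (such as antiphilosophists) characterize a text well, so we look for long words that are also frequent.

fdist5 = FreqDist(text5)

print(sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7))  # all words in the chat corpus that are longer than 7 characters and occur more than 7 times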

['#14-19teens', '#talkcity_adults', '((((((((((', '........', 'Question', 'actually', 'anything', 'computer', 'cute.-ass', 'everyone', 'football', 'innocent', 'listening', 'remember', 'seriously', 'something', 'together', 'tomorrow', 'watching']

We have now automatically identified the frequently occurring content-bearing words of the text.
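The two conditions, length and frequency, combine in a single comprehension. Here is a minimal sketch on made-up data, with collections.Counter standing in for FreqDist; the token list and the thresholds are assumptions chosen so the result is easy to verify by eye.

```python
from collections import Counter

# Hypothetical toy text.
tokens = ("something something anything something tomorrow the the the "
          "anything remember cat").split()
counts = Counter(tokens)
vocab = set(tokens)

# Long words only: the property P(w) is len(w) > 7.
long_words = sorted(w for w in vocab if len(w) > 7)

# Long words that also occur at least twice.
long_and_frequent = sorted(w for w in vocab if len(w) > 7 and counts[w] >= 2)

print(long_words)
print(long_and_frequent)
```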

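Collocations and Bigrams

Collocations
A collocation is a sequence of words that occur together unusually often. A characteristic of collocations is that they are resistant to substitution with words that have similar senses: red wine is a collocation whereas the wine is not, and maroon wine sounds very odd.
Bigrams
Collocations are essentially just frequent bigrams (word pairs).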

list(bigrams(['more', 'is', 'said', 'than', 'done']))

[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
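Bigrams are simply adjacent pairs, so they can be produced by zipping a list with itself shifted by one. The sketch below uses the hypothetical name simple_bigrams to avoid clashing with NLTK's bigrams, and counts pairs in a toy sentence; raw pair counts are only a crude stand-in for collocation finding, which also has to discount pairs made of very common words.

```python
from collections import Counter

def simple_bigrams(tokens):
    """Return the list of adjacent (word, next_word) pairs."""
    return list(zip(tokens, tokens[1:]))

print(simple_bigrams(['more', 'is', 'said', 'than', 'done']))

# Hypothetical toy text: which pair occurs most often?
toy = "red wine and red wine or white wine".split()
pair_counts = Counter(simple_bigrams(toy))
print(pair_counts.most_common(1))
```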

text4.collocations()  # bigrams that occur more often than we would expect from the frequency of the individual words

United States; fellow citizens; four years; years ago; Federal
Government; General Government; American people; Vice President; Old
World; Almighty God; Fellow citizens; Chief Magistrate; Chief Justice;
God bless; every citizen; Indian tribes; public debt; one another;
foreign nations; political parties

text8.collocations()

would like; medium build; social drinker; quiet nights; non smoker;
long term; age open; Would like; easy going; financially secure; fun
times; similar interests; Age open; weekends away; poss rship; well
presented; never married; single mum; permanent relationship; slim
build

Counting Other Things

text1_w_len = [len(w) for w in text1]

text1_w_len[:10]

[1, 4, 4, 2, 6, 8, 4, 1, 9, 1]

fdist = FreqDist([len(w) for w in text1])

list(fdist)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20]

fdist.items()

dict_items([(1, 47933), (2, 38513), (3, 50223), (4, 42345), (5, 26597), (6, 17111), (7, 14399), (8, 9966), (9, 6428), (10, 3528), (11, 1873), (12, 1053), (13, 567), (14, 177), (15, 70), (16, 22), (17, 12), (18, 1), (20, 1)])

fdist.max()

fdist[3]

fdist.freq(3)

0.19255882431878046
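The same word-length statistics can be reproduced on a small scale with collections.Counter. In this sketch the token list is a hypothetical sample, and the relative frequency is computed by hand as count divided by total, which is what freq() reports.

```python
from collections import Counter

# Hypothetical toy tokens.
tokens = ['Call', 'me', 'Ishmael', '.', 'Some', 'years', 'ago', '-', 'never', 'mind']

length_counts = Counter(len(w) for w in tokens)   # word length -> how many tokens have it
total = sum(length_counts.values())               # total number of tokens counted

most_common_length, n = length_counts.most_common(1)[0]
print(most_common_length, n, n / total)
```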

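Table 1-2. Functions defined for NLTK's frequency distributions

Example	Description
fdist = FreqDist(samples)	create a frequency distribution containing the given samples
fdist[sample] += 1	increment the count for this sample
fdist['monstrous']	count of the number of times a given sample occurred
fdist.freq('monstrous')	frequency of a given sample
fdist.N()	total number of samples
fdist.most_common(n)	the n most common samples and their frequencies
for sample in fdist:	iterate over the samples
fdist.max()	sample with the greatest count
fdist.tabulate()	tabulate the frequency distribution
fdist.plot()	graphical plot of the frequency distribution
fdist.plot(cumulative=True)	cumulative plot of the frequency distribution
fdist1 |= fdist2	update fdist1 with counts from fdist2
fdist1 < fdist2	test if samples in fdist1 occur less frequently than in fdist2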

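1.4 Back to Python: Making Decisions and Taking Control

Conditionals

Relational operators
Table 1-3. Numerical comparison operators

Operator	Relationship
<	less than
<=	less than or equal to
==	equal to (note this is two "=" signs, not one)
!=	not equal to
>	greater than
>=	greater than or equal to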

print(sent7)

['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

[w for w in sent7 if len(w) < 4]

[',', '61', 'old', ',', 'the', 'as', 'a', '29', '.']

[w for w in sent7 if len(w)<=4]

[',', '61', 'old', ',', 'will', 'join', 'the', 'as', 'a', 'Nov.', '29', '.']

[w for w in sent7 if len(w)==4]

['will', 'join', 'Nov.']

[w for w in sent7 if len(w)!=4]

['Pierre',
 'Vinken',
 ',',
 '61',
 'years',
 'old',
 ',',
 'the',
 'board',
 'as',
 'a',
 'nonexecutive',
 'director',
 '29',
 '.']

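Table 1-4. Some word comparison operators

Function	Meaning
s.startswith(t)	test if s starts with t
s.endswith(t)	test if s ends with t
t in s	test if t is a substring of s
s.islower()	test if s contains cased characters and all are lowercase
s.isupper()	test if s contains cased characters and all are uppercase
s.isalpha()	test if s is non-empty and all characters in s are alphabetic
s.isalnum()	test if s is non-empty and all characters in s are alphanumeric
s.isdigit()	test if s is non-empty and all characters in s are digits
s.istitle()	test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals)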
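These tests behave in slightly surprising ways on tokens that contain no letters at all. The sketch below tabulates a few of them for sample tokens chosen for illustration; note in particular that islower() is False for punctuation and digits, so a filter such as not w.islower() lets those through as well as capitalized words.

```python
samples = ['nonexecutive', 'Pierre', 'Nov.', 'U.S.', '29', ',']
tests = ['islower', 'isupper', 'istitle', 'isalpha', 'isdigit']

# For every sample token, record the result of each string test.
results = {s: {t: getattr(s, t)() for t in tests} for s in samples}

for s, row in results.items():
    passed = [t for t in tests if row[t]]
    print(repr(s), passed)
```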

sorted(w for w in set(text1) if w.endswith('ableness'))  # words ending in -ableness

['comfortableness',
 'honourableness',
 'immutableness',
 'indispensableness',
 'indomitableness',
 'intolerableness',
 'palpableness',
 'reasonableness',
 'uncomfortableness']

sorted([term for term in set(text4) if 'gnt' in term])  # words containing gnt

['Sovereignty', 'sovereignties', 'sovereignty']

sorted([item for item in set(text6) if item.istitle()])  # words with an initial capital

['A',
 'Aaaaaaaaah',
 'Aaaaaaaah',
 'Aaaaaah',
 'Aaaah',....

sorted([item for item in set(sent7) if item.isdigit()])  # words made up entirely of digits

['29', '61']

sorted(w for w in set(text7) if '-' in w and 'index' in w)

['Stock-index',
 'index-arbitrage',
 'index-fund',
 'index-options',
 'index-related',
 'stock-index']

sorted(wd for wd in set(text3) if wd.istitle() and len(wd) > 10)

['Abelmizraim',
 'Allonbachuth',
 'Beerlahairoi',
 'Canaanitish',
 'Chedorlaomer',
 'Girgashites',
 'Hazarmaveth',
 'Hazezontamar',
 'Ishmeelites',
 'Jegarsahadutha',
 'Jehovahjireh',
 'Kirjatharba',
 'Melchizedek',
 'Mesopotamia',
 'Peradventure',
 'Philistines',
 'Zaphnathpaaneah']

sorted(w for w in set(sent7) if not w.islower())

[',', '.', '29', '61', 'Nov.', 'Pierre', 'Vinken']

sorted(t for t in set(text2) if 'cie' in t or 'cei' in t)

['ancient',
 'ceiling',
 'conceit',
 'conceited',
 'conceive',
 'conscience',
 'conscientious',
 'conscientiously',
 'deceitful',
 'deceive',
 'deceived',
 'deceiving',
 'deficiencies',
 'deficiency',
 'deficient',
 'delicacies',
 'excellencies',
 'fancied',
 'insufficiency',
 'insufficient',
 'legacies',
 'perceive',
 'perceived',
 'perceiving',
 'prescience',
 'prophecies',
 'receipt',
 'receive',
 'received',
 'receiving',
 'society',
 'species',
 'sufficient',
 'sufficiently',
 'undeceive',
 'undeceiving']

Operating on Every Element

[len(w) for w in text1]  # expressions of the form [f(w) for ...] or [w.f() for ...], where f is a function or method

[1,4, 4,2,...,

[w.upper() for w in text1]

['[','MOBY','DICK', 'BY',.......]

len(text1)

len(set(text1))

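len(set([word.lower() for word in text1]))  # do not double-count words such as This and this, which differ only in capitalization

len(set([word.lower() for word in text1 if word.isalpha()]))  # filter out all non-alphabetic items to remove numbers and punctuation from the vocabulary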
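The effect of each normalization step is easiest to see on a handful of tokens. This sketch runs the same four counts on a hypothetical toy list, so every number can be checked by hand.

```python
# Hypothetical toy tokens with case variants, a number, and punctuation.
tokens = ['This', 'is', 'this', ',', 'and', 'THIS', 'is', '1851', '.']

n_tokens = len(tokens)                                        # every occurrence
n_items = len(set(tokens))                                    # distinct items, case-sensitive
n_lower = len(set(w.lower() for w in tokens))                 # case differences collapsed
n_words = len(set(w.lower() for w in tokens if w.isalpha()))  # alphabetic items only

print(n_tokens, n_items, n_lower, n_words)
```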

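Nested Code Blocks

The if statement

word = 'cat'
if len(word) < 5:  # an if statement is known as a control structure
    print("word length is less than 5")
    # in the interactive interpreter, we must enter an extra blank line here so that it can detect the end of the nested block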

word length is less than 5

The for loop

for word in ['Call', 'me', 'Ishmael', '.']:
    print(word)

Call
me
Ishmael
.

Looping with Conditions

sent1 = ['Call', 'me', 'Ishmael', '.']
for xyzzy in sent1:  # the colon indicates that the current statement relates to the indented block that follows
    if xyzzy.endswith('l'):
        print(xyzzy)

Call
Ishmael

for token in sent1:
    if token.islower():
        print(token, 'is a lowercase word')
    elif token.istitle():
        print(token, 'is a titlecase word')
    else:
        print(token, 'is punctuation')

Call is a titlecase word
me is a lowercase word
Ishmael is a titlecase word
. is punctuation
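In the loop above, the final else branch labels everything that is neither lowercase nor titlecase as punctuation, which would also catch numbers such as 29 and all-caps words. One possible refinement, sketched here with a hypothetical function name describe and labels chosen for the example, adds more elif branches; the order matters, because the first condition that is true wins.

```python
def describe(token):
    """Classify a token by trying a series of conditions in order."""
    if token.islower():
        return 'lowercase word'
    elif token.istitle():
        return 'titlecase word'
    elif token.isupper():
        return 'uppercase word'
    elif token.isdigit():
        return 'number'
    else:
        return 'other'   # punctuation and anything else

for token in ['Call', 'me', 'Ishmael', '.', 'USA', '29']:
    print(token, 'is:', describe(token))
```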

tricky = sorted([w for w in set(text2) if 'cie' in w or 'cei' in w])
for word in tricky:
    print(word,end=' ')  # print everything on one line

ancient ceiling conceit conceited conceive conscience conscientious conscientiously deceitful deceive deceived deceiving deficiencies deficiency deficient delicacies excellencies fancied insufficiency insufficient legacies perceive perceived perceiving prescience prophecies receipt receive received receiving society species sufficient sufficiently undeceive undeceiving

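1.5 Automatic Natural Language Understanding

Word Sense Disambiguation

Pronoun Resolution

Generating Language Output

Machine Translation

Spoken Dialog Systems

Textual Entailment

Recognizing Textual Entailment (RTE)

Limitations of NLP

Despite the research-led advances in tasks such as RTE, natural language systems that have been deployed for real-world applications still cannot perform common-sense reasoning or draw on world knowledge in a general and robust manner. While we wait for these difficult artificial intelligence problems to be solved, it is necessary to accept natural language systems that are severely limited in their reasoning and knowledge capabilities. Accordingly, right from the beginning, an important goal of NLP research has been to make progress on the difficult task of building technologies that "understand language", using superficial yet powerful techniques instead of unrestricted knowledge and reasoning capabilities.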

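1.6 Summary

Texts are represented in Python using lists: ['Monty', 'Python']. We can use indexing, slicing, and the len() function on lists.
A word "token" is a particular appearance of a given word in a text; a word "type" is the unique form of the word as a particular sequence of letters. We count word tokens using len(text) and word types using len(set(text)).
We obtain the vocabulary of a text t using sorted(set(t)).
We operate on each item of a text using [f(x) for x in text].
To derive the vocabulary, collapsing case distinctions and ignoring punctuation, we can write set([w.lower() for w in text if w.isalpha()]).
We process each word in a text using a for statement, such as for w in t: or for word in text:. This must be followed by the colon character and an indented block of code, to be executed each time through the loop.
We test a condition using an if statement: if len(word) < 5:. This must be followed by the colon character and an indented block of code, to be executed only if the condition is true.
A frequency distribution is a collection of items along with their frequency counts (e.g., the words of a text and their frequency of appearance).
A function is a block of code that has been assigned a name and can be reused. Functions are defined using the def keyword, as in def mult(x, y); x and y are parameters of the function, and act as placeholders for actual data values.
A function is called by specifying its name followed by zero or more arguments inside parentheses, like this: mult(3, 4) or len(text1).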

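Acknowledgments
"Natural Language Processing with Python" ¹²³⁴, by Steven Bird, Ewan Klein & Edward Loper, is a very hands-on introductory text; the first edition appeared in 2009 and the second in 2015. These study notes draw on both editions and extend some of the material with further study and exercises. I am sharing them in the hope that they are useful. You are welcome to add me on WeChat (verification: NLP) to study and discuss together, and corrections are welcome.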

References

1. http://nltk.org/
2. Steven Bird, Ewan Klein & Edward Loper, Natural Language Processing with Python, 2009
3. Bird (UK), Klein (UK), Loper (US), 《Python自然语言处理》 (Natural Language Processing with Python, Chinese edition), Southeast University Press, 2010
4. Steven Bird, Ewan Klein & Edward Loper, Natural Language Processing with Python, 2015

你可能感兴趣的:(2015年度,自然语言处理,Python,第二版,Steven,Bird)

SessionNotCreatedException:消息:无法创建新服务:通过 Python 使用 ChromeDriver 和 SeleniumGrid 的 ChromeDriverService 潮易 python 开发语言
SessionNotCreatedException:消息:无法创建新服务:通过Python使用ChromeDriver和SeleniumGrid的ChromeDriverService首先，你需要确保你的系统中已经安装了Chrome浏览器以及对应的ChromeDriver版本。然后，你需要在你的项目中安装Selenium库，可以通过pipinstallselenium命令进行安装。接下来，你需要
使用 Nocalhost 开发 Rainbond 上的微服务应用 u012804784 android 微服务 microservices 架构计算机
优质资源分享学习路线指引（点击解锁）知识定位人群定位Python实战微信订餐小程序进阶级本课程是pythonflask+微信小程序的完美结合，从项目搭建到腾讯云部署上线，打造一个全栈订餐系统。Python量化交易实战入门级手把手带你打造一个易扩展、更安全、效率更高的量化交易系统本文将介绍如何使用Nocalhost快速开发Rainbond上的微服务应用的开发流程以及实践操作步骤。Nocalhost可
Dapr 远程调试之 Nocalhost 虚幻私塾 python 计算机
优质资源分享学习路线指引（点击解锁）知识定位人群定位Python实战微信订餐小程序进阶级本课程是pythonflask+微信小程序的完美结合，从项目搭建到腾讯云部署上线，打造一个全栈订餐系统。Python量化交易实战入门级手把手带你打造一个易扩展、更安全、效率更高的量化交易系统虽然Visualstudio、Visualstudiocode都支持debug甚至远程debug，Dapr搭配Bridge
【数据治理】数据治理框架概述野老杂谈数据治理数据治理框架 DAMA-DMBOK COBIT 企业数据治理数据管理
欢迎来到我的博客，很高兴能够在这里和您见面！欢迎订阅相关专栏：⭐️全网最全IT互联网公司面试宝典：收集整理全网各大IT互联网公司技术、项目、HR面试真题.⭐️AIGC时代的创新与未来：详细讲解AIGC的概念、核心技术、应用领域等内容。⭐️大数据平台建设指南：全面讲解从数据采集到数据可视化的整个过程，掌握构建现代化数据平台的核心技术和方法。⭐️《遇见Python：初识、了解与热恋》：涵盖了Pytho
如何使用 Python 进行文件读写操作？大G哥 python 前端 linux 数据库开发语言
大家好，我是V哥。今天的内容来介绍Python中进行文件读写操作的方法，这在学习Python时是必不可少的技术点，希望可以帮助到正在学习python的小伙伴。以下是Python中进行文件读写操作的基本方法：一、文件读取：#打开文件withopen('example.txt','r')asfile:#读取文件的全部内容content=file.read()print(content)#将文件指针重置
使用SolarChat实现中英韩翻译的实战指南 azzxcvhj python
在这篇文章中，我们将探索如何利用SolarChat这一强大的聊天模型来实现中英韩翻译功能。SolarChat是一个方便的语言模型接口，能够帮助我们将自然语言处理任务集成到项目中。本文将详细介绍这个模型的核心原理，并通过示例代码展示如何使用它进行翻译。技术背景介绍随着人工智能的发展，语言模型在各种自然语言处理任务中扮演了重要角色。特别是在翻译、对话生成等领域，先进的语言模型如SolarChat为我们
python数据处理的全流程若木胡 tools python 开发语言
Python数据处理全流程一、数据收集（一）从文件中读取数据读取文本文件CSV文件（逗号分隔值）CSV文件是一种常见的简单数据存储格式，使用逗号来分隔数据值。Python中的csv模块可以方便地读取和写入CSV文件。例如，读取一个简单的CSV文件，其中包含姓名和年龄两列数据：importcsvdata=[]withopen('example.csv','r')asfile:reader=csv.r
Python的输入函数input() 蜗牛_Chenpangzi Python学习笔记总集 python 字符串编程语言
前言此篇文章是我在B站学习时所做的笔记，部分为亲自动手演示过的，方便复习用。此篇文章仅供学习参考。提示：以下是本篇文章正文内容，下面案例可供参考input函数input函数的基本使用#输入函数inputpresent=input('大圣想要什么礼物呢?')print(present,
mysql plugin 没有_无法打开mysql.plugin表。某些插件可能未加载 ChinaTerran mysql plugin 没有
IhaveanissuewithMySQL.WhenI'mtryingtostartit,thatgivesmeanerrormessage,whichis2015-12-1010:52:3113f4InnoDB:Warning:Usinginnodb_additional_mem_pool_sizeisDEPRECATED.Thisoptionmayberemovedinfuturereleas
python multiprocessing模块_Python multiprocessing模块 weixin_39646084 python
一、简介python多线程有个讨厌的限制，全局解释器锁(globalinterpreterlock)，这个锁的意思是任一时间只能有一个线程使用解释器，跟单cpu跑多个程序一个意思，大家都是轮着用的，这叫“并发”，不是“并行”。手册上的解释是为了保证对象模型的正确性！这个锁造成的困扰是如果有一个计算密集型的线程占着cpu，其他的线程都得等着....，试想你的多个线程中有这么一个线程，得多悲剧，多线程
python自动化扫描，多线程枚举获取wifi信息，让你走在任何一个地方都能上网代码讲故事深耕技术之源 python 自动化扫描无线网络网络连接
python自动化扫描，多线程枚举获取wifi信息，让你走在任何一个地方都能上网。无线网络在无线局域网的范畴是指“无线相容性认证”，实质上是一种商业认证，同时也是一种无线联网技术，以前通过网线连接电脑，而Wi-Fi则是通过无线电波来连网；常见的就是一个无线路由器，那么在这个无线路由器的电波覆盖的有效范围都可以采用Wi-Fi连接方式进行联网，如果无线路由器连接了一条ADSL线路或者别的上网线路，则又
【分享】一个查看无线网络密钥的小方法（查看 WiFi密码，热点密码）| 区块链面试题：区块链技术中，如何保证交易的匿名性和隐私性？| 公钥加密，数字签名，零知识证明追光者♂ 工具技巧解决办法百题千解计划(项目实战案例）网络 wlan 热点密码 WiFi密码区块链面试 WiFi
“你不是我，你不会懂。”作者主页：追光者♂个人简介：[1]计算机专业硕士研究生[2]2023年城市之星领跑者TOP1(哈尔滨)[3]2022年度博客之星人工智能领域TOP4[4]阿里云社区特邀专家博主[5]CSDN-人工智能领域优质创作者无限进步，一起追光！！！感谢大家点赞收藏⭐留言！！！目录一、基础回顾步骤1、win+R:cmd，进入Dos命令窗口
潇洒郎： Python获取设备已连接的所有WIFi账号和密码潇洒郎 Python学习 python WiFi账号和密码
Python获取设备已连接的所有WIFi账号和密码如果你忘记了密码，可以使用这个脚本获取，不要使用非法用途哦！#coding=utf8#User:Administrator#Date:2024/11/5#Time:13:02importsubprocessimportjsondefsub_cmd(cmd):res=subprocess.getoutput(cmd)returnresdefget_a
一.组合数据类型：列表 muxue178 python 开发语言
1.下标下标从零开始name_list=['python','php','java']print(name_list)print(name_list[0])print(name_list[2])运行结果['python','php','java']pythonjava2.查找函数index()count()len()1.index()name_list=['zhangsan','lisi','wa
第19篇：python高级编程进阶：使用Flask进行Web开发猿享天开 python从入门到精通 python 开发语言
第19篇：python高级编程进阶：使用Flask进行Web开发内容简介在第18篇文章中，我们介绍了Web开发的基础知识，并使用Flask框架构建了一个简单的Web应用。本篇文章将深入探讨Flask的高级功能，涵盖模板引擎（Jinja2）、表单处理、数据库集成以及用户认证等主题。通过系统的讲解和实战案例，您将掌握构建功能更为丰富和复杂的Web应用所需的技能。目录Flask的深入使用Flask扩展蓝
第18篇：python高级编程进阶：Web开发基础详解猿享天开 python从入门到精通 python 开发语言
第18篇：Web开发基础内容简介本篇文章将为您介绍Web开发基础的核心概念和实用技能。您将了解Web开发的基本概念和流程，掌握HTTP协议的基础知识，学习如何使用Flask框架构建简单的Web应用，并深入理解路由与视图函数的工作原理。通过丰富的代码示例和实战案例，您将能够快速入门Web开发，搭建自己的第一个Web应用。目录Web开发概述什么是Web开发前端与后端开发Web开发的技术栈HTTP协议基
大数据学习（七）Python3操作livy（使用pylivy模块）猪笨是念来过倒大数据大数据 python
Livy是一个用于与Spark交互的开源REST接口。pylivy是Livy的Python客户端，可以在Spark集群上轻松实现远程代码执行。安装$pipinstall-Ulivy请注意，pylivy需要Python3.6或更高版本。用法所述LivySession类的主界面提供由pylivy：from
