Edit distance is a very common way to measure similarity in text processing. Here are two definitions:
Wikipedia
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Given two strings a and b on an alphabet Σ (e.g. the set of ASCII characters, the set of bytes [0..255], etc.), the edit distance d(a, b) is the minimum-weight series of edit operations that transforms a into b. One of the simplest sets of edit operations is that defined by Levenshtein in 1966.
Baidu
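Edit distance, also known as Levenshtein distance, is the minimum number of edit operations needed to turn one string into another. The permitted operations are substituting one character for another, inserting a character, and deleting a character. In general, the smaller the edit distance, the more similar the two strings.
For example, turning kitten into sitting:
kitten->sitten (k→s)
sitten->sittin (e→i)
sittin->sitting (insert g)
The Russian scientist Vladimir Levenshtein proposed the concept in 1965.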
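To make the definition concrete, here is a minimal sketch of the classic dynamic-programming way to compute this distance in pure Python. The function name levenshtein is my own choice for illustration; this is not the code inside the editdistance package, just the textbook recurrence with a cost of 1 for each substitution, insertion, and deletion.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; row 0 is "insert j characters".
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The result for kitten and sitting is 3, matching the three steps listed above. Keeping only two rows of the table means memory grows with the length of b rather than with the product of both lengths.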
In Python, we usually just import editdistance
and call it directly. On Windows, however, a plain pip install editdistance
fails with an error:
C:\Users\Work>pip install editdistance
Collecting editdistance
Using cached editdistance-0.3.1.tar.gz
Building wheels for collected packages: editdistance
Running setup.py bdist_wheel for editdistance ... error
Complete output from command C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d c:\users\work\appdata\local\temp\tmpnrvewtpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\editdistance
copying editdistance\__init__.py -> build\lib.win-amd64-2.7\editdistance
copying editdistance\_editdistance.h -> build\lib.win-amd64-2.7\editdistance
copying editdistance\def.h -> build\lib.win-amd64-2.7\editdistance
running build_ext
building 'editdistance.bycython' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\editdistance
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I./editdistance -IC:\ProgramData\Anaconda3\envs\python2.7\include -IC:\ProgramData\Anaconda3\envs\python2.7\PC /Tpeditdistance/_editdistance.cpp /Fobuild\temp.win-amd64-2.7\Release\editdistance/_editdistance.obj
_editdistance.cpp
editdistance/_editdistance.cpp : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
editdistance/_editdistance.cpp(117) : error C2059: syntax error : 'if'
editdistance/_editdistance.cpp(118) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(119) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(120) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(121) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(122) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(123) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(124) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(125) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(126) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(127) : error C2059: syntax error : 'return'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
editdistance/_editdistance.cpp(128) : error C2143: syntax error : missing ';' before '}'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
error: command 'C:\\Users\\Work\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
----------------------------------------
Failed building wheel for editdistance
Running setup.py clean for editdistance
Failed to build editdistance
Installing collected packages: editdistance
Running setup.py install for editdistance ... error
Complete output from command C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\work\appdata\local\temp\pip-xfiuou-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\editdistance
copying editdistance\__init__.py -> build\lib.win-amd64-2.7\editdistance
copying editdistance\_editdistance.h -> build\lib.win-amd64-2.7\editdistance
copying editdistance\def.h -> build\lib.win-amd64-2.7\editdistance
running build_ext
building 'editdistance.bycython' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\editdistance
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I./editdistance -IC:\ProgramData\Anaconda3\envs\python2.7\include -IC:\ProgramData\Anaconda3\envs\python2.7\PC /Tpeditdistance/_editdistance.cpp /Fobuild\temp.win-amd64-2.7\Release\editdistance/_editdistance.obj
_editdistance.cpp
editdistance/_editdistance.cpp : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
C:\Users\Work\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
editdistance/_editdistance.cpp(117) : error C2059: syntax error : 'if'
editdistance/_editdistance.cpp(118) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(119) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(120) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(121) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(122) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(123) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(124) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(125) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(126) : error C2059: syntax error : 'else'
editdistance/_editdistance.cpp(127) : error C2059: syntax error : 'return'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
editdistance/_editdistance.cpp(128) : error C2143: syntax error : missing ';' before '}'
editdistance/_editdistance.cpp(128) : error C2059: syntax error : '}'
error: command 'C:\\Users\\Work\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
----------------------------------------
Command "C:\ProgramData\Anaconda3\envs\python2.7\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\work\\appdata\\local\\temp\\pip-build-w4myie\\editdistance\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\work\appdata\local\temp\pip-xfiuou-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\work\appdata\local\temp\pip-build-w4myie\editdistance\
So I went to PyPI and downloaded editdistance-0.3.1.tar.gz,
then unpacked it and installed it locally:
$ python setup.py install
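Same error, but this time I noticed something:
editdistance/_editdistance.cpp(117) : error C2059: syntax error : 'if'
So I found that .cpp file in the folder and opened it:
OK, I get it... The file contains comments written in Japanese. That fits the C4819 warning in the log: the file has characters that cannot be represented in code page 936, so the compiler presumably misread the bytes of those comments and swallowed part of the code that followed.
I deleted every Japanese comment in the file, then went back to the directory containing setup.py: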
$ python setup.py install
$ python -c "import editdistance"
And that's it, done!
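I removed the comments by hand, but the same cleanup could be scripted. The sketch below is only an illustration under some assumptions: that the Japanese text sits in // line comments, that the file is saved as UTF-8, and that no string literal in the file contains //. The names strip_non_ascii_line_comments and clean_file are made up for this example.

```python
import io


def strip_non_ascii_line_comments(source):
    """Return source with every // comment that contains non-ASCII text removed."""
    cleaned = []
    for line in source.splitlines(True):  # True keeps the line endings
        code, marker, comment = line.partition("//")
        if marker and any(ord(ch) > 127 for ch in comment):
            # Keep the code before the comment and the original line ending.
            ending = line[len(line.rstrip("\r\n")):]
            line = code.rstrip() + ending
        cleaned.append(line)
    return "".join(cleaned)


def clean_file(path, encoding="utf-8"):
    """Rewrite the file at path in place (the UTF-8 encoding is an assumption)."""
    with io.open(path, encoding=encoding, newline="") as f:
        source = f.read()
    with io.open(path, "w", encoding=encoding, newline="") as f:
        f.write(strip_non_ascii_line_comments(source))


# Example call, run from the directory that contains setup.py:
# clean_file("editdistance/_editdistance.cpp")
```

Comments that are plain ASCII are left alone, so only the lines that trip up the compiler change. It is worth keeping a copy of the original file before running something like this.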