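Damn, it took me several days to get this working. Blame me for being slow. I'm writing it down now so I don't forget.
1. The article already provides fully working code. Follow the link in the article and download it.
The code does not even need to be opened in PyCharm; it runs directly from cmd.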
2.
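3. The compile step. The downloaded folder is named topical_word_embeddings-master, which does not satisfy the naming rules for compiling, so I renamed it to topicalembeddings. Then:
1. Download and install the compiler. On my machine, the install path is: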
C:\Users\Administrator\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0
2. Edit Lib\distutils\msvc9compiler.py under the Python install directory (depending on the system, msvccompiler.py may need the same change). Find the get_build_version function and make it return 9.0 directly:
def get_build_version():
    """Return the version of MSVC that was used to build Python.
    For Python 2.3 and up, the version number is included in
    sys.version. For earlier versions, assume the compiler is MSVC 6.
    """
    # Added line: always report MSVC 9.0; the original logic below is never reached.
    return 9.0
    prefix = "MSC v."
    i = sys.version.find(prefix)
    if i == -1:
        return 6
    i = i + len(prefix)
    s, rest = sys.version[i:].split(" ", 1)
    majorVersion = int(s[:-2]) - 6
    minorVersion = int(s[2:3]) / 10.0
    # I don't think paths are affected by minor version in version 6
    if majorVersion == 6:
        minorVersion = 0
    if majorVersion >= 6:
        return majorVersion + minorVersion
    # else we don't know what version of the compiler this is
    return None
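To see why 9.0 is the value to hard-code, it can help to trace the original parsing logic on a concrete version string. The sketch below copies that logic into a standalone function. The name msvc_version_from and the sample string are illustrative assumptions; the sample imitates a Python 2.7 Windows build whose banner reports "MSC v.1500".

```python
def msvc_version_from(version_string):
    """Apply the stock get_build_version parsing to an arbitrary string.

    Hypothetical helper for illustration; it is not part of distutils.
    """
    prefix = "MSC v."
    i = version_string.find(prefix)
    if i == -1:
        # No compiler tag found: the stock code assumes MSVC 6.
        return 6
    i += len(prefix)
    # s is the compiler number that follows the prefix, e.g. "1500".
    s, _rest = version_string[i:].split(" ", 1)
    major = int(s[:-2]) - 6       # "15" -> 9
    minor = int(s[2:3]) / 10.0    # "0"  -> 0.0
    if major == 6:
        minor = 0
    if major >= 6:
        return major + minor
    return None


# Illustrative banner string (an assumption, not copied from a real machine).
sample = "2.7.13 (default) [MSC v.1500 64 bit (AMD64)]"
print(msvc_version_from(sample))
```

For a banner containing "MSC v.1500" the function evaluates to 9.0, which matches the value returned by the edited function.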
Then find the find_vcvarsall function and make it return the path to vcvarsall.bat directly (use the install path on your own machine):
def find_vcvarsall(version):
    """Find the vcvarsall.bat file
    At first it tries to find the productdir of VS 2008 in the registry. If
    that fails it falls back to the VS90COMNTOOLS env var.
    """
    # Added line: return the installed path directly; the original lookup below is never reached.
    return r'C:\Users\Administrator\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\vcvarsall.bat'
    vsbase = VS_BASE % version
    try:
        productdir = Reg.get_value(r"%s\Setup\VC" % vsbase,
                                   "productdir")
    except KeyError:
        productdir = None
    # trying Express edition
    if productdir is None:
        vsbase = VSEXPRESS_BASE % version
        try:
            productdir = Reg.get_value(r"%s\Setup\VC" % vsbase,
                                       "productdir")
        except KeyError:
            productdir = None
            log.debug("Unable to find productdir in registry")
    if not productdir or not os.path.isdir(productdir):
        toolskey = "VS%0.f0COMNTOOLS" % version
        toolsdir = os.environ.get(toolskey, None)
        if toolsdir and os.path.isdir(toolsdir):
            productdir = os.path.join(toolsdir, os.pardir, os.pardir, "VC")
            productdir = os.path.abspath(productdir)
            if not os.path.isdir(productdir):
                log.debug("%s is not a valid directory" % productdir)
                return None
        else:
            log.debug("Env var %s is not set or invalid" % toolskey)
    if not productdir:
        log.debug("No productdir found")
        return None
    vcvarsall = os.path.join(productdir, "vcvarsall.bat")
    if os.path.isfile(vcvarsall):
        return vcvarsall
    log.debug("Unable to find vcvarsall.bat")
    return None
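The fallback branch of the original function builds an environment variable name from the version and then looks two directories up for a VC folder. The sketch below isolates just that path arithmetic. The helper name vcvarsall_from_env and the C:\VS9 path are made up for illustration, and ntpath is used so the result is the same on any operating system. The location it computes ends in VC\vcvarsall.bat, whereas the install path in this post has vcvarsall.bat directly under 9.0, which may be why the stock lookup has to be short-circuited.

```python
import ntpath


def vcvarsall_from_env(version, environ):
    """Return the vcvarsall.bat path the env-var fallback would look for.

    Hypothetical helper: it mirrors only the path computation and does
    not check whether anything exists on disk.
    """
    # "%0.f" formats 9.0 as "9", so version 9.0 gives "VS90COMNTOOLS".
    toolskey = "VS%0.f0COMNTOOLS" % version
    toolsdir = environ.get(toolskey)
    if not toolsdir:
        return None
    # Go two levels up from the tools directory, then into "VC".
    productdir = ntpath.normpath(
        ntpath.join(toolsdir, ntpath.pardir, ntpath.pardir, "VC")
    )
    return ntpath.join(productdir, "vcvarsall.bat")


# Made-up environment for illustration only.
fake_env = {"VS90COMNTOOLS": "C:\\VS9\\Common7\\Tools"}
print(vcvarsall_from_env(9.0, fake_env))
```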
3. Once the steps above are done, Python C extensions compile normally on Windows.
First, create a file named setup.py in the models directory under gensim:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy

extensions = [
    Extension("word2vec_inner", ["word2vec_inner.pyx"],
              include_dirs=[numpy.get_include()])
]

setup(
    name="word2vec_inner",
    ext_modules=cythonize(extensions),
)
In cmd, change to the models directory under gensim and run:
python setup.py install
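After the install finishes, a quick import check can confirm that the compiled extension is actually visible to Python. This is only a sketch: extension_available is a hypothetical helper, and it assumes the module can be imported under the bare name word2vec_inner given in setup.py, which depends on where the install step placed it.

```python
import importlib


def extension_available(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# "word2vec_inner" is the extension name used in setup.py above.
print(extension_available("word2vec_inner"))
```

If this prints False, the build output from the previous command is the first place to look.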
The steps to run TWE-3:
1. Get GibbsLDA++, run it, and get the tassign file and wordmap.txt.
2. Run the command: python train.py wordmap_filename tassign_filename
3. The output files, word_vector.txt and topic_vector.txt, are written under the output directory.
(Note, and this is very important: the command in step 2 is run directly in cmd. There is no need to open the code at all.)
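Before running train.py it can be useful to sanity-check the tassign file. The sketch below assumes the format described in the GibbsLDA++ documentation, where each line is one document made of space-separated wordid:topicid tokens. That format is an assumption here, not something stated in this post, and parse_tassign_line is a hypothetical helper that is not part of TWE.

```python
def parse_tassign_line(line):
    """Split one tassign line into (word_id, topic_id) integer pairs.

    Assumes space-separated "wordid:topicid" tokens (see the note above).
    """
    pairs = []
    for token in line.split():
        word_id, topic_id = token.split(":")
        pairs.append((int(word_id), int(topic_id)))
    return pairs


# Made-up example line, for illustration only.
print(parse_tassign_line("0:3 1:3 2:7"))
```

If a line fails to parse this way, the file is probably not the tassign output, or its format differs from what is assumed here.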
OK, after wrestling with this code for so long I finally understand what is going on. Now I can write my paper.