Environment: Windows 7 64-bit + Python 2.7
Dependencies: the NumPy and SciPy packages
Main text:
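1> Building and installing Theano needs the two packages above plus the g++ compiler. Unfortunately, what is usually available is only gcc. The fix is to install MinGW; be sure to get the 64-bit version (MinGW-w64), otherwise it will not be compatible.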
Download: https://sourceforge.net/projects/mingw-w64/
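Follow the installer prompts step by step. Afterwards, add the installation's bin directory to the Path environment variable (the bin level is enough), then run gcc -v to check that the installation succeeded.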
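The Path check can also be scripted. The sketch below is only an illustration: the helper name find_on_path is made up for this post, and it simply walks the directories listed in PATH looking for the compiler executables (trying the PATHEXT extensions such as .exe on Windows). It does not verify that the compiler found is the 64-bit build.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of `program` if a PATH directory contains it, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables normally carry an extension such as .exe
    exts = [""]
    if os.name == "nt":
        exts += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in exts:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    for tool in ("gcc", "g++"):
        print("%s -> %s" % (tool, find_on_path(tool) or "not found on PATH"))
```

If g++ is reported as not found, the bin directory was probably not added to Path, or the console was opened before the variable was changed.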
2> pip was already installed, so Theano can be installed with pip directly:
pip install theano
The first import theano then triggers the initial compilation.
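Official documentation: http://deeplearning.net/software/theano/tutorial/
Translation 1 (Chinese): http://www.gumpcs.com/index.php/archives/576
Translation 2 (Chinese): http://blog.csdn.net/walegahaha/article/details/50884627
Translation 3 (Chinese): http://www.cnblogs.com/charleshuang/p/3648804.html
Translation 4 (Chinese): http://www.cnblogs.com/charleshuang/p/3651843.html
There is also a complete translation by another blogger:
http://www.cnblogs.com/shouhuxianjian/category/699462.html
I will write up my own notes when I have time.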
A rough softmax implementation, again using a text classification task:
# coding=utf-8
from __future__ import print_function

import warnings

import numpy as np
import theano
import theano.tensor as T
from sklearn import metrics
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer

warnings.filterwarnings("ignore")


def calculate_result(actual, pred):
    # The targets are multiclass, so the averaging mode is stated explicitly
    m_precision = metrics.precision_score(actual, pred, average='weighted')
    m_recall = metrics.recall_score(actual, pred, average='weighted')
    m_f1 = metrics.f1_score(actual, pred, average='weighted')
    m_acc = metrics.accuracy_score(actual, pred)
    print('predict info:')
    print('accuracy:{0:.3f}'.format(m_acc))
    print('precision:{0:.3f}'.format(m_precision))
    print('recall:{0:.3f}'.format(m_recall))
    print('f1-score:{0:.3f}'.format(m_f1))


if __name__ == '__main__':
    # Load the data sets (one sub-directory per category)
    doc_train = load_files('training')
    doc_test = load_files('test')
    # Boolean features: 1 if a term occurs in the document, 0 otherwise
    count_vec = CountVectorizer(binary=True, decode_error='replace')
    doc_train_bool = count_vec.fit_transform(doc_train.data)
    doc_test_bool = count_vec.transform(doc_test.data)
    m, n = doc_train_bool.shape
    n_classes = 10  # number of target classes (hard-coded)

    # Training set: D = (input_values, target_class)
    # y is declared as an int32 vector below, so the targets are cast explicitly
    D = (doc_train_bool.toarray(), doc_train.target.astype('int32'))
    training_steps = 100

    # Declare Theano symbolic variables
    x = T.matrix("x")
    y = T.ivector("y")
    w = theano.shared(np.zeros([n, n_classes]), name="w")
    # Initialize the bias term
    b = theano.shared(np.zeros(n_classes), name="b")
    print("Initial model:")
    # print(w.get_value())

    # Construct the Theano expression graph
    p = T.nnet.softmax(T.dot(x, w) + b)  # p[i, k]: probability that sample i is in class k
    prediction = T.argmax(p, axis=1)  # predicted class: the most probable one
    xent = T.log(p[T.arange(y.shape[0]), y])  # log-probability of the true class
    cost = -T.mean(xent)  # the cost to minimize (mean negative log-likelihood)
    gw, gb = T.grad(cost, [w, b])  # gradients of the cost

    # Compile
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent, cost],
        updates=((w, w - 2 * gw), (b, b - 2 * gb)))  # gradient descent, step size 2

    # Train
    for i in range(training_steps):
        # New names keep the symbolic xent and cost from being overwritten
        pred, xent_val, cost_val = train(D[0], D[1])
        print(i, cost_val)

    # Test
    D1 = (doc_test_bool.toarray(), doc_test.target)
    test = theano.function(inputs=[x], outputs=prediction)
    calculate_result(doc_test.target, test(D1[0]))
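For readers who want to see what the expression graph computes without Theano, here is a plain NumPy sketch of the same softmax, cost and gradient-descent update. It is an illustration under assumptions, not the Theano internals: the function names softmax and cost_and_grads, the toy data and the step size 0.5 are all made up for this example, and the gradient is written out by hand where the script relies on T.grad.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cost_and_grads(x, y, w, b):
    """Mean negative log-likelihood and its gradients with respect to w and b."""
    p = softmax(x.dot(w) + b)
    rows = np.arange(y.shape[0])
    cost = -np.mean(np.log(p[rows, y]))
    # d(cost)/d(logits) = (p - one_hot(y)) / batch_size
    delta = p.copy()
    delta[rows, y] -= 1.0
    delta /= y.shape[0]
    return cost, x.T.dot(delta), delta.sum(axis=0)


if __name__ == "__main__":
    # Tiny made-up data set: 4 samples, 3 binary features, 2 classes
    x = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]], dtype=float)
    y = np.array([0, 0, 1, 1])
    w = np.zeros((3, 2))
    b = np.zeros(2)
    for step in range(50):
        cost, gw, gb = cost_and_grads(x, y, w, b)
        w -= 0.5 * gw  # same update rule as the script, with a smaller step
        b -= 0.5 * gb
    print(cost)
    print(softmax(x.dot(w) + b).argmax(axis=1))
```

With all-zero weights every class gets the same probability, so the first cost value is log(number of classes); it should decrease from there as training proceeds.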
Results of the Theano script on the test set:
predict info:
accuracy:0.879
precision:0.873
recall:0.879
f1-score:0.872
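Accuracy and recall are both 0.879 above. That is what one would expect if the per-class scores were combined with support-weighted averaging (an assumption here, though it was the behaviour of older scikit-learn releases for multiclass targets): weighting each class's recall by its share of the samples gives exactly the overall accuracy. The NumPy sketch below shows this on made-up labels; the function name weighted_scores is hypothetical and the code is not scikit-learn's implementation.

```python
import numpy as np


def weighted_scores(actual, pred):
    """Return (accuracy, weighted precision, weighted recall) for multiclass labels."""
    actual = np.asarray(actual)
    pred = np.asarray(pred)
    total = float(len(actual))
    precision = 0.0
    recall = 0.0
    for c in np.unique(actual):
        tp = np.sum((pred == c) & (actual == c))   # correctly predicted as c
        support = np.sum(actual == c)              # true members of class c
        predicted = np.sum(pred == c)              # samples predicted as c
        weight = support / total
        recall += weight * tp / float(support)
        if predicted > 0:
            precision += weight * tp / float(predicted)
    accuracy = np.mean(actual == pred)
    return accuracy, precision, recall


if __name__ == "__main__":
    acc, prec, rec = weighted_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
    print(acc, prec, rec)
```

Weighted precision has no such identity, which is why it can differ from the accuracy, as it does in the results above.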