maxent test cases


Author: Jin Liang ([email protected]), CSDN blog: http://blog.csdn.net/u012176591

1.easyME in C++

This is a simple implementation of Maximum Entropy model.
Algorithms implemented include: GIS, SCGIS, LBFGS, Gaussian
smoothing and Exponential smoothing.

We provide ample interface to build your own applications,
please refer to src/maxEntModel.h for more information.

We provide two sample applications at ./examples, one for
training and one for testing.

The folder ./data contains a small test corpus, which is borrowed
from Lin’s toolkit.

Use the following commands to build and run our example programs
with the small corpus:
$ ./configure
$ make
$ ./train ./data/train Model
$ ./test Model ./data/test

NOTE: the example programs are likely to be insufficient for your
needs; refer to src/maxEntModel.h and build your own application.

This is a stripped-down maxent model written in C/C++. Download: https://github.com/nicyun/easyME

File directory:

Install automake 1.11, autoconf, gcc, and g++.

Running the build command make then fails with an error:

make[1]: Entering directory `/home/jin/Desktop/easyME-simple'
source='./src/dataManager.cpp' object='dataManager.o' libtool=no \
DEPDIR=.deps depmode=none /bin/bash ./depcomp \
g++ -DHAVE_CONFIG_H -I. -g -O2 -c -o dataManager.o
`test -f './src/dataManager.cpp' || echo './'`./src/dataManager.cpp
/usr/share/automake-1.11/depcomp: No command. Try
`/usr/share/automake-1.11/depcomp --help' for more information.
make[1]: *** [dataManager.o] Error 1

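According to the message, the problem lies in the line /usr/share/automake-1.11/depcomp: No command.

Running it directly on the command line produces the same error.

Yet the depcomp script does exist in the directory /usr/share/automake-1.11/.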
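So depcomp is clearly not a global command and cannot be invoked directly from other directories. The make output shows the build calling ./depcomp from the project directory, so the fix is to create a symbolic link to depcomp there: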

sudo ln -s -f /usr/share/automake-1.11/depcomp depcomp

Then run the build again; it now completes normally.
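The same workaround can be scripted. The sketch below is a hypothetical helper (force_symlink is not part of easyME or automake) that mirrors what ln -s -f does for a plain file or link: it replaces any existing entry at the link path with a symbolic link to the given target. It assumes a POSIX system where symlinks are available.

```python
import os


def force_symlink(target, link_path):
    """Create link_path as a symlink to target, replacing an existing file or link."""
    # lexists() is also true for a dangling symlink, which exists() would miss.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return link_path
```

Called from the project directory as force_symlink("/usr/share/automake-1.11/depcomp", "depcomp"), it would have the same effect as the command above, provided the process has write permission there. The automake path is the one from this post and may differ on other systems.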

Training:

jin@master11:~/Desktop/easyME-simple$ ./train ./data/train Model
iter = 100
method = LBFGS
tol = 0.001
sigma2 = 100
alpha = 0
begin Initialization…
Initialization successful.
num of train events: 47
max fetId is: 35
max classId is: 7
sum of all events’ count: 80
number of feature count is: 151
begin trainning…
begin LBFGS training …
Iter = 1 Loglike = 94.9416
Iter = 2 Loglike = 73.3042
Iter = 3 Loglike = 48.8047
Iter = 4 Loglike = 29.6198
Iter = 5 Loglike = 15.5418
Iter = 6 Loglike = 8.41153
Iter = 7 Loglike = 4.92776
Iter = 8 Loglike = 3.10029
Iter = 9 Loglike = 2.16672
Iter = 10 Loglike = 1.64891
Iter = 11 Loglike = 1.63946
Iter = 12 Loglike = 1.40477
Iter = 13 Loglike = 1.37438
Iter = 14 Loglike = 1.35029
Iter = 15 Loglike = 1.31948
Iter = 16 Loglike = 1.26783
Iter = 17 Loglike = 1.22105
Iter = 18 Loglike = 1.18695
Iter = 19 Loglike = 1.15854
Iter = 20 Loglike = 1.13925
Iter = 21 Loglike = 1.11949
Iter = 22 Loglike = 1.09984
Iter = 23 Loglike = 1.08094
Iter = 24 Loglike = 1.07462
Iter = 25 Loglike = 1.07144
Iter = 26 Loglike = 1.06549
Iter = 27 Loglike = 1.06473
Iter = 28 Loglike = 1.06206
Iter = 29 Loglike = 1.06127
Iter = 30 Loglike = 1.05915
Iter = 31 Loglike = 1.0575
Iter = 32 Loglike = 1.05679
Iter = 33 Loglike = 1.05583
Iter = 34 Loglike = 1.05542
Model Training successful.
time consumed by training is : 0
save model to Model …
model saved ok!
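The log reports sigma2 = 100 together with the LBFGS method, and the Loglike column decreases, so it appears to be a quantity being minimized. Assuming sigma2 is the variance of a zero-mean Gaussian prior (the README lists Gaussian smoothing, but this post does not show easyME's exact formula), the smoothed objective is usually written as the negative log-likelihood plus a quadratic penalty on the weights. A minimal sketch with hypothetical function names:

```python
def gaussian_penalty(weights, sigma2):
    """Quadratic penalty sum(w^2) / (2 * sigma2) from a zero-mean Gaussian prior."""
    return sum(w * w for w in weights) / (2.0 * sigma2)


def smoothed_objective(neg_loglike, weights, sigma2):
    """Negative log-likelihood plus the Gaussian penalty (smaller is better)."""
    return neg_loglike + gaussian_penalty(weights, sigma2)


# Illustrative weights only; these are not values read from the trained Model file.
example_weights = [1.0, -2.0, 0.5]
print(smoothed_objective(1.05542, example_weights, sigma2=100))
```

A larger sigma2 shrinks the penalty, so the weights are constrained less; a smaller sigma2 pulls them harder toward zero.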

Testing:

jin@master11:~/Desktop/easyME-simple$ ./test Model ./data/test
load model from Model …
model loaded ok!
num of fets: 35
num of classes: 7
tot feature count is: 151
file load and string map consumes: 0s
start testing …
all test time (including read file time) is : 0s
only predict time is : 0s
test ok and corr is : 0.809524

See https://github.com/nicyun/easyME for the complete code.

2.A simple C++ library for maximum entropy classification

This is a simple C++ library for maximum entropy classification
(also known as “multinomial logistic regression”).
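As a reminder of what such a classifier computes: a maximum entropy model scores each class by summing weight times feature value over the observed features, then normalizes the exponentiated scores into probabilities. The sketch below is plain Python with hypothetical names, toy labels, and made-up weights; it is not the library's API.

```python
import math


def class_probabilities(weights, features):
    """Return p(label | features) for a maxent (multinomial logistic) model.

    weights:  {label: {feature_name: weight}}
    features: {feature_name: value}; real-valued features are allowed
    """
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up weights for illustration only.
toy_weights = {
    "CAR": {"four wheels": 1.0},
    "BICYCLE": {"two wheels": 1.0},
}
probs = class_probabilities(toy_weights, {"two wheels": 1.0, "length": 1.7})
print(probs)
```

Features without a weight for a given label contribute nothing to that label's score, which is why the "length" value does not change the result here.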

The main features of this library are:

  • supporting L1/L2 regularization
  • fast parameter estimation algorithms (LBFGS, OWLQN,
    and SGD)
  • supporting real-valued features
  • saving/loading the model to/from a file

Follow the steps below to test the examples:

  • Compile the programs

    make

  • Run the examples

    ./bicycle

    this is a toy example of binary classification

    ./postagging

    a more realistic example (part-of-speech tagging)

jin@master11:~/Desktop/maxent-3.0.2$ ./bicycle
L1 regularizer = 1
preparing for estimation...done
number of samples = 4
number of features = 8
calculating empirical expectation...done
performing OWLQN
  1  obj(err) = -0.693147 (0.5000)
  2  obj(err) = -0.667614 (0.5000)
  3  obj(err) = -0.662032 (0.5000)
  4  obj(err) = -0.661814 (0.5000)
  5  obj(err) = -0.660544 (0.5000)
  6  obj(err) = -0.658878 (0.5000)
  7  obj(err) = -0.657125 (0.2500)
  8  obj(err) = -0.657001 (0.2500)
  9  obj(err) = -0.657000 (0.2500)
 10  obj(err) = -0.657000 (0.2500)
 11  obj(err) = -0.657000 (0.2500)
number of active features = 3

it's a BICYCLE !

0.499998    CAR
0.500002    BICYCLE

     0.000  CAR        red
     0.000  BICYCLE    red
     0.087  CAR        length
    -0.087  BICYCLE    length
     0.000  CAR        blue
     0.000  CAR        four wheels
     0.000  BICYCLE    yellow
     0.314  BICYCLE    two wheels
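The line "number of active features = 3" is consistent with the weight listing: L1 regularization tends to drive weights to exactly zero, and only three of the eight printed weights are nonzero. A small check, with the rows copied from the ./bicycle output above (the tuple layout is just for illustration, and it assumes "active" means a nonzero weight):

```python
# (weight, label, feature) rows copied from the ./bicycle output above.
rows = [
    (0.000, "CAR", "red"),
    (0.000, "BICYCLE", "red"),
    (0.087, "CAR", "length"),
    (-0.087, "BICYCLE", "length"),
    (0.000, "CAR", "blue"),
    (0.000, "CAR", "four wheels"),
    (0.000, "BICYCLE", "yellow"),
    (0.314, "BICYCLE", "two wheels"),
]

# Treat a feature as active when its printed weight is nonzero.
active = [(label, feature) for weight, label, feature in rows if weight != 0.0]
print(len(active), "of", len(rows), "weights are nonzero")
```

The printed weights are rounded to three decimals, so this only checks the listing as shown, not the full-precision model.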
jin@master11:~/Desktop/maxent-3.0.2$ ./postagging
L1 regularizer = 1
preparing for estimation...done
number of samples = 2086
number of features = 17720
calculating empirical expectation...done
performing OWLQN
  1  obj(err) = -3.465736 (0.9856)  heldout_logl(err) = -3.465736 (1.0000)
  2  obj(err) = -3.369402 (0.2651)  heldout_logl(err) = -3.371396 (0.3200)
  3  obj(err) = -1.823736 (0.2143)  heldout_logl(err) = -1.601867 (0.2600)
  4  obj(err) = -1.288232 (0.1405)  heldout_logl(err) = -0.950698 (0.1700)
  5  obj(err) = -1.088961 (0.1170)  heldout_logl(err) = -0.801912 (0.1600)
  6  obj(err) = -0.948978 (0.0964)  heldout_logl(err) = -0.748310 (0.1700)
  7  obj(err) = -0.828432 (0.0911)  heldout_logl(err) = -0.772196 (0.1900)
  8  obj(err) = -0.786655 (0.0695)  heldout_logl(err) = -0.724498 (0.1900)
  9  obj(err) = -0.750608 (0.0657)  heldout_logl(err) = -0.709734 (0.1800)
 10  obj(err) = -0.725125 (0.0724)  heldout_logl(err) = -0.661388 (0.1600)
 11  obj(err) = -0.706894 (0.0575)  heldout_logl(err) = -0.613361 (0.1600)
 12  obj(err) = -0.696036 (0.0518)  heldout_logl(err) = -0.600375 (0.1500)
 13  obj(err) = -0.678451 (0.0499)  heldout_logl(err) = -0.579126 (0.1400)
 14  obj(err) = -0.663965 (0.0513)  heldout_logl(err) = -0.585004 (0.1500)
 15  obj(err) = -0.655236 (0.0537)  heldout_logl(err) = -0.586777 (0.1600)
 16  obj(err) = -0.648777 (0.0499)  heldout_logl(err) = -0.587966 (0.1600)
 17  obj(err) = -0.642385 (0.0475)  heldout_logl(err) = -0.588471 (0.1600)
 18  obj(err) = -0.638407 (0.0460)  heldout_logl(err) = -0.580763 (0.1600)
 19  obj(err) = -0.634670 (0.0451)  heldout_logl(err) = -0.580354 (0.1300)
 20  obj(err) = -0.631843 (0.0465)  heldout_logl(err) = -0.585395 (0.1300)
 21  obj(err) = -0.630647 (0.0446)  heldout_logl(err) = -0.584640 (0.1400)
 22  obj(err) = -0.629160 (0.0460)  heldout_logl(err) = -0.583585 (0.1300)
 23  obj(err) = -0.628097 (0.0431)  heldout_logl(err) = -0.584579 (0.1300)
 24  obj(err) = -0.627372 (0.0422)  heldout_logl(err) = -0.585412 (0.1300)
 25  obj(err) = -0.626417 (0.0422)  heldout_logl(err) = -0.588624 (0.1300)
 26  obj(err) = -0.625662 (0.0412)  heldout_logl(err) = -0.591937 (0.1300)
 27  obj(err) = -0.624526 (0.0407)  heldout_logl(err) = -0.600038 (0.1400)
 28  obj(err) = -0.623989 (0.0407)  heldout_logl(err) = -0.600827 (0.1400)
 29  obj(err) = -0.623642 (0.0412)  heldout_logl(err) = -0.602044 (0.1400)
 30  obj(err) = -0.623010 (0.0407)  heldout_logl(err) = -0.606101 (0.1500)
 31  obj(err) = -0.622608 (0.0403)  heldout_logl(err) = -0.607224 (0.1500)
 32  obj(err) = -0.622305 (0.0407)  heldout_logl(err) = -0.612063 (0.1500)
 33  obj(err) = -0.622125 (0.0398)  heldout_logl(err) = -0.611105 (0.1400)
 34  obj(err) = -0.621901 (0.0407)  heldout_logl(err) = -0.613722 (0.1500)
 35  obj(err) = -0.621700 (0.0398)  heldout_logl(err) = -0.611328 (0.1400)
 36  obj(err) = -0.621500 (0.0393)  heldout_logl(err) = -0.613714 (0.1400)
 37  obj(err) = -0.621345 (0.0407)  heldout_logl(err) = -0.615883 (0.1500)
 38  obj(err) = -0.621195 (0.0417)  heldout_logl(err) = -0.616459 (0.1500)
 39  obj(err) = -0.621063 (0.0417)  heldout_logl(err) = -0.617795 (0.1500)
 40  obj(err) = -0.620851 (0.0422)  heldout_logl(err) = -0.620535 (0.1500)
 41  obj(err) = -0.620704 (0.0417)  heldout_logl(err) = -0.623514 (0.1500)
 42  obj(err) = -0.620607 (0.0417)  heldout_logl(err) = -0.623967 (0.1400)
 43  obj(err) = -0.620523 (0.0412)  heldout_logl(err) = -0.626674 (0.1300)
 44  obj(err) = -0.620437 (0.0407)  heldout_logl(err) = -0.630054 (0.1400)
 45  obj(err) = -0.620383 (0.0403)  heldout_logl(err) = -0.631417 (0.1400)
 46  obj(err) = -0.620310 (0.0398)  heldout_logl(err) = -0.632599 (0.1400)
 47  obj(err) = -0.620254 (0.0393)  heldout_logl(err) = -0.630917 (0.1400)
 48  obj(err) = -0.620214 (0.0398)  heldout_logl(err) = -0.631962 (0.1400)
 49  obj(err) = -0.620177 (0.0398)  heldout_logl(err) = -0.631763 (0.1400)
 50  obj(err) = -0.620143 (0.0398)  heldout_logl(err) = -0.630723 (0.1500)
 51  obj(err) = -0.620119 (0.0393)  heldout_logl(err) = -0.632282 (0.1400)
 52  obj(err) = -0.620100 (0.0398)  heldout_logl(err) = -0.631808 (0.1400)
 53  obj(err) = -0.620075 (0.0403)  heldout_logl(err) = -0.632659 (0.1400)
 54  obj(err) = -0.620060 (0.0403)  heldout_logl(err) = -0.631844 (0.1400)
 55  obj(err) = -0.620040 (0.0403)  heldout_logl(err) = -0.632050 (0.1400)
 56  obj(err) = -0.620017 (0.0403)  heldout_logl(err) = -0.631608 (0.1400)
 57  obj(err) = -0.620002 (0.0407)  heldout_logl(err) = -0.631577 (0.1500)
 58  obj(err) = -0.619990 (0.0407)  heldout_logl(err) = -0.631465 (0.1500)
 59  obj(err) = -0.619978 (0.0403)  heldout_logl(err) = -0.630988 (0.1500)
 60  obj(err) = -0.619969 (0.0407)  heldout_logl(err) = -0.630216 (0.1500)
 61  obj(err) = -0.619961 (0.0407)  heldout_logl(err) = -0.629818 (0.1500)
 62  obj(err) = -0.619951 (0.0403)  heldout_logl(err) = -0.628252 (0.1500)
 63  obj(err) = -0.619941 (0.0393)  heldout_logl(err) = -0.627779 (0.1500)
 64  obj(err) = -0.619931 (0.0388)  heldout_logl(err) = -0.626529 (0.1500)
 65  obj(err) = -0.619926 (0.0388)  heldout_logl(err) = -0.626376 (0.1500)
 66  obj(err) = -0.619921 (0.0388)  heldout_logl(err) = -0.626927 (0.1500)
 67  obj(err) = -0.619918 (0.0388)  heldout_logl(err) = -0.627095 (0.1500)
 68  obj(err) = -0.619911 (0.0388)  heldout_logl(err) = -0.627512 (0.1500)
 69  obj(err) = -0.619903 (0.0388)  heldout_logl(err) = -0.627728 (0.1500)
 70  obj(err) = -0.619896 (0.0393)  heldout_logl(err) = -0.627833 (0.1400)
 71  obj(err) = -0.619887 (0.0393)  heldout_logl(err) = -0.627532 (0.1400)
 72  obj(err) = -0.619880 (0.0393)  heldout_logl(err) = -0.626991 (0.1400)
 73  obj(err) = -0.619874 (0.0393)  heldout_logl(err) = -0.627604 (0.1400)
 74  obj(err) = -0.619868 (0.0388)  heldout_logl(err) = -0.627362 (0.1500)
 75  obj(err) = -0.619865 (0.0393)  heldout_logl(err) = -0.627754 (0.1500)
 76  obj(err) = -0.619862 (0.0388)  heldout_logl(err) = -0.627513 (0.1500)
 77  obj(err) = -0.619859 (0.0393)  heldout_logl(err) = -0.627489 (0.1500)
 78  obj(err) = -0.619854 (0.0398)  heldout_logl(err) = -0.627244 (0.1500)
 79  obj(err) = -0.619851 (0.0393)  heldout_logl(err) = -0.627035 (0.1500)
 80  obj(err) = -0.619849 (0.0393)  heldout_logl(err) = -0.626969 (0.1500)
 81  obj(err) = -0.619845 (0.0393)  heldout_logl(err) = -0.627016 (0.1500)
 82  obj(err) = -0.619843 (0.0393)  heldout_logl(err) = -0.627159 (0.1400)
 83  obj(err) = -0.619842 (0.0393)  heldout_logl(err) = -0.627183 (0.1500)
 84  obj(err) = -0.619841 (0.0393)  heldout_logl(err) = -0.627268 (0.1400)
 85  obj(err) = -0.619840 (0.0393)  heldout_logl(err) = -0.627303 (0.1400)
 86  obj(err) = -0.619840 (0.0393)  heldout_logl(err) = -0.627392 (0.1400)
 87  obj(err) = -0.619839 (0.0393)  heldout_logl(err) = -0.627411 (0.1400)
 88  obj(err) = -0.619839 (0.0393)  heldout_logl(err) = -0.627521 (0.1400)
 89  obj(err) = -0.619838 (0.0393)  heldout_logl(err) = -0.627507 (0.1400)
number of active features = 752
accuracy = 2138 / 2527 = 0.846063
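The final accuracy figure is simply the number of correct predictions divided by the number of test items. A minimal sketch that reproduces the arithmetic from the last line of the ./postagging output (the function name accuracy is only for illustration):

```python
def accuracy(num_correct, num_total):
    """Fraction of test items whose predicted label matches the reference label."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_correct / num_total


# Counts taken from the last line of the ./postagging output.
print(f"accuracy = {accuracy(2138, 2527):.6f}")  # prints "accuracy = 0.846063"
```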

See http://www.logos.ic.i.u-tokyo.ac.jp/~tsuruoka/maxent/ for the complete code.

3.maxent-python_cpp

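First install g++, gcc, python, and so on.

The source tree contains a Fortran source file, lbfgs.f, so a Fortran compiler is required; without one the LBFGS optimizer cannot be used and only the GIS algorithm is available. Here we install gfortran, which is a Fortran compiler: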

sudo apt-get install gfortran

Then reboot, to make sure the Fortran compiler is installed properly.
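One way to confirm that the compiler is visible before running configure is to look it up on PATH. A minimal sketch using Python's standard shutil.which; the function name find_compiler is only for illustration:

```python
import shutil


def find_compiler(name="gfortran"):
    """Return the full path of the compiler if it is on PATH, else None."""
    return shutil.which(name)


path = find_compiler()
if path:
    print("gfortran found at", path)
else:
    print("gfortran not found; LBFGS training would be unavailable")
```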
Project directory:

Run configure to check the system environment, set the build options, and generate the Makefile:

./configure

The command prints a series of compiler-related messages. If the Fortran-related lines include the following, the Fortran compiler was installed successfully.

checking for gfortran... gfortran
checking whether we are using the GNU Fortran compiler... yes
checking whether gfortran accepts -g... yes

If there are no problems, compile with make, then run 'make unittest' to build the final executables, as follows:
[Screenshot: output of make unittest]

Then run the following command to install it on the system as a library:

make install

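Go into the python directory under the project and run the following commands to build the Python extension, so that the library can be called directly from Python:

[Screenshot: commands for building the Python extension]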

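Then go back to the test directory and run runall.py to check that all 9 test cases in that directory pass. If the following output appears, everything is OK.
[Screenshot: runall.py output]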

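Note: the doc directory contains a user manual that covers how the maximum entropy principle is applied to tagging problems, as well as how to use the library. The drawback is that the WSJ corpus used in its experiments is not readily available, so the experiments could not be verified.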
