AnyQ environment setup (macOS or Windows)
Install Docker
Get the Docker image
docker pull docker.paddlepaddlehub.com/paddle:latest-dev
Create the Docker container
docker run -it -p 22:22 -p 8999:8999 --name AnyQ (imageid)
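The (imageid) placeholder above stands for the ID of the image pulled in the previous step. A minimal sketch of filling it in automatically, assuming the image was pulled under the tag shown earlier. The helper names image_id_for and start_anyq_container are made up for illustration; docker images -q prints only image IDs.

```shell
# Look up the local image ID for a repository:tag
# (prints nothing if the image has not been pulled yet).
image_id_for() {
    docker images -q "$1" | head -n 1
}

# Start the AnyQ container from the given repository:tag,
# using the same ports and container name as the command above.
start_anyq_container() {
    id=$(image_id_for "$1")
    if [ -z "$id" ]; then
        echo "image $1 not found locally; run docker pull first" >&2
        return 1
    fi
    docker run -it -p 22:22 -p 8999:8999 --name AnyQ "$id"
}

# Example:
#   start_anyq_container docker.paddlepaddlehub.com/paddle:latest-dev
```

docker run also accepts the repository:tag name directly in place of the ID, so the lookup is only needed when the ID itself is wanted.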
clone AnyQ
git clone https://github.com/baidu/AnyQ.git
make
cd AnyQ
mkdir build && cd build && cmake .. && make
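Running cmake .. && make is the hardest part of the setup, because it downloads third-party dependencies. Whenever the network stalls, press Ctrl+C to abort, then re-run cmake .. && make in the build directory (cd into it first if you have left it); repeat until the build fully succeeds: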
cd build
cmake .. && make
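The re-run-until-it-works routine can be partly automated. The following is a rough sketch, not something shipped with AnyQ: retry_until_ok is a hypothetical helper that repeats a command until it exits successfully. It only covers attempts that fail with an error; a download that hangs still needs Ctrl+C, or the command can be wrapped in the coreutils timeout utility.

```shell
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_TRIES attempts have been used.
retry_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        # Pause between attempts, but not after the last one.
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (run inside AnyQ/build):
#   retry_until_ok 20 10 sh -c 'cmake .. && make'
```

The attempt count and delay in the example are arbitrary; pick values that suit the network.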
Run the demo
cp ../tools/anyq_deps.sh .
sh anyq_deps.sh
The next step needs a JDK, which the official Docker image does not include. The fix given by the AnyQ authors is to download and unpack one:
wget http://anyq.bj.bcebos.com/jdk-8u171-linux-x64.tar.gz
tar xzvf jdk-8u171-linux-x64.tar.gz
Add it to PATH (this only affects the current shell session):
export PATH=`pwd`/jdk1.8.0_171/bin:$PATH
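Before moving on, it can help to confirm that java really resolves from the new PATH. A small sketch under the assumption that the JDK was unpacked in the current directory; prepend_path and require_cmd are illustrative helper names, not part of AnyQ.

```shell
# Prepend a directory to PATH only if it is not already present.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Return 0 if the named command can be found, 1 (with a message) otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "required command not found: $1" >&2
    return 1
}

# Example (run from the directory where the JDK was unpacked):
#   prepend_path "$(pwd)/jdk1.8.0_171/bin"
#   require_cmd java && java -version
```

Calling prepend_path twice with the same directory leaves PATH unchanged the second time, so the snippet is safe to re-run.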
With the JDK installed, run:
cp ../tools/solr -rp solr_script
sh solr_script/anyq_solr.sh solr_script/sample_docs
All setup steps are now complete and the demo can be started:
./run_server
Call the service with curl from inside the container:
curl "127.0.0.1:8999/anyq?question=需要使用什么账号登录"
This produces output like the following:
I0415 07:02:55.794643 2268 utils.cpp:590] RAW: in json_to_analysis_item, query:需要使用什么账号登录
I0415 07:02:55.794643 2268 utils.cpp:594] RAW: in json_to_analysis_item, type:(null)
I0415 07:02:55.794643 2268 analysis_strategy.cpp:115] RAW: analysis_analysis size: 1
I0415 07:02:55.794643 2268 analysis_strategy.cpp:130] RAW: before use analysis strategy's analysis_result
I0415 07:02:55.794643 2268 utils.cpp:609] RAW: query:需要使用什么账号登录
I0415 07:02:55.794643 2268 utils.cpp:611] RAW: tokens_basic size is 0
I0415 07:02:55.794643 2268 analysis_strategy.cpp:148] RAW: method_process method_wordseg start
I0415 07:02:55.794643 2268 analysis_strategy.cpp:154] RAW: method_process method_wordseg sucess
I0415 07:02:55.794643 2268 analysis_strategy.cpp:163] RAW: after use analysis strategy's analysis_result
I0415 07:02:55.794643 2268 utils.cpp:609] RAW: query:需要使用什么账号登录
I0415 07:02:55.794643 2268 utils.cpp:611] RAW: tokens_basic size is 5
I0415 07:02:55.794643 2268 utils.cpp:614] RAW: tokens_basic buffer:需要 length:6 offset:0 analysis_term_weight:0.200000
I0415 07:02:55.794643 2268 utils.cpp:614] RAW: tokens_basic buffer:使用 length:6 offset:6 analysis_term_weight:0.200000
I0415 07:02:55.794643 2268 utils.cpp:614] RAW: tokens_basic buffer:什么 length:6 offset:12 analysis_term_weight:0.200000
I0415 07:02:55.794643 2268 utils.cpp:614] RAW: tokens_basic buffer:账号 length:6 offset:18 analysis_term_weight:0.200000
I0415 07:02:55.794643 2268 utils.cpp:614] RAW: tokens_basic buffer:登录 length:6 offset:24 analysis_term_weight:0.200000
I0415 07:02:55.794643 2268 equal_solr_q_builder.cpp:48] RAW: equal solr_fetch_q=question:需要使用什么账号登录
I0415 07:02:55.794643 2268 term_retrieval.cpp:109] RAW: solr_fetch_q=question:需要使用什么账号登录
I0415 07:02:55.794643 2268 term_retrieval.cpp:119] RAW: url = http://127.0.0.1:8900/solr/collection1/select
I0415 07:02:55.794643 2268 http_client.cpp:84] RAW: para_url: fl=id,question,answer&q=question%3A%E9%9C%80%E8%A6%81%E4%BD%BF%E7%94%A8%E4%BB%80%E4%B9%88%E8%B4%A6%E5%8F%B7%E7%99%BB%E5%BD%95&rows=15&wt=json
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=1, query=需要使用什么账号登录?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=3, query=AI服务支持推广账号使用么?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=4, query=为什么登录到百度云还要填写手机号、邮箱等信息?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=5, query=我以前是百度开发者中心用户,还需要进行开发者认证么?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=11, query=我是百度云的老用户,可以使用百度云的AK/SK么?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:185] RAW: Term retrieval item id=14, query=如果我正在做一个比较大型的落地项目,需要更多配额如何接洽?;
I0415 07:02:55.794643 2268 term_retrieval.cpp:187] RAW: keyword retrieval result len=6
I0415 07:02:55.794643 2268 retrieval_strategy.cpp:129] RAW: before rm_dup, retrieval result len=6;
I0415 07:02:55.794643 2268 retrieval_strategy.cpp:132] RAW: after rm_dup retrieval result len=6;
I0415 07:02:55.794643 2268 rank_strategy.cpp:460] RAW: after rough rank, remain 6 cands
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:91] RAW: source query:需要使用什么账号登录
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:需要使用什么账号登录?, socre:0.993676
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:AI服务支持推广账号使用么?, socre:0.712404
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:为什么登录到百度云还要填写手机号、邮箱等信息?, socre:0.442901
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:我以前是百度开发者中心用户,还需要进行开发者认证么?, socre:0.288477
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:我是百度云的老用户,可以使用百度云的AK/SK么?, socre:0.251138
I0415 07:02:55.794643 2268 simnet_paddle_sim.cpp:126] RAW: candidate query:如果我正在做一个比较大型的落地项目,需要更多配额如何接洽?, socre:0.178194
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.909091
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.277778
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.222222
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.125000
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.107143
I0415 07:02:55.794643 2268 jaccard_sim.cpp:48] RAW: jaccard = 0.054054
I0415 07:02:55.794643 2268 rank_strategy.cpp:474] RAW: after rank, remain 2 cands
I0415 07:02:55.794643 2268 rank_strategy.cpp:255] RAW: anyq query: 需要使用什么账号登录
I0415 07:02:55.794643 2268 rank_strategy.cpp:256] RAW: output
I0415 07:02:55.794643 2268 rank_strategy.cpp:258] RAW: query:需要使用什么账号登录?
I0415 07:02:55.794643 2268 rank_strategy.cpp:259] RAW: confidence:0.976759
I0415 07:02:55.794643 2268 anyq_strategy.cpp:77] RAW: input_query=需要使用什么账号登录;analysis_time=3.00ms;retrieval_num=6;retrieval_time=14.00ms;anyq_answer=您需要拥有一个百度账号,用来登录百度云,可以点击此处注册百度账户。如您以前拥有百度推广账户,同样可以登录百度云。;rank_item_num=2;rank_time=61.00ms;
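The para_url line in the log shows the question percent-encoded as UTF-8 bytes before it is sent to Solr. The same encoding can be applied to the question parameter on the client side if a shell or HTTP client mishandles raw Chinese text. A rough sketch, assuming a UTF-8 terminal; urlencode is an illustrative helper, not part of AnyQ.

```shell
# Percent-encode a string byte by byte, leaving only the RFC 3986
# unreserved characters (A-Z a-z 0-9 - . _ ~) untouched.
urlencode() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -z "$hex" ] && continue
        case "$hex" in
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # Unreserved byte: print it as the literal character.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Anything else: print %XX with uppercase hex digits.
                printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
        esac
    done
}

# Example:
#   curl "127.0.0.1:8999/anyq?question=$(urlencode '需要使用什么账号登录')"
```

For the sample question, urlencode yields the same %E9%9C%80... string that appears after q=question%3A in the log. curl can also do the encoding itself with -G together with --data-urlencode "question=...".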
Problems encountered
- The port mapping must be correct; otherwise git operations will fail
- git fails with the error message
RPC failed; curl 18 transfer closed with outstanding read data remaining
In that case, try adding this configuration:
git config --global http.sslVerify false
git config --global http.postBuffer 524288000
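If adding it inside the Docker container has no effect, add it on the host machine as well; that is what fixed it for me.
- If downloading the git repositories keeps failing, consider copying all the dependencies under AnyQ/build/third_party/ from someone who has already built successfully into the same directory on your side: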
docker cp /${addr}/AnyQ/build/third_party ${container_id}:/${workspace}/AnyQ/build/
Or refer to AnyQ/cmake/external/
Remember to delete every CMakeCache.txt before compiling; otherwise the build may fail:
cd AnyQ/build/
find ./ -name 'CMakeCache.txt' | xargs rm -rf
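The find | xargs pipeline above splits file names on whitespace, so it can miss paths that contain spaces. A slightly more defensive variant is sketched below; clean_cmake_cache is a hypothetical helper name, and it also prints each file it removes so the cleanup is visible.

```shell
# Delete every CMakeCache.txt under DIR (default: current directory)
# and print the path of each file that is removed.
clean_cmake_cache() {
    find "${1:-.}" -type f -name 'CMakeCache.txt' -print -exec rm -f {} +
}

# Example:
#   clean_cmake_cache AnyQ/build
```

Only files named exactly CMakeCache.txt are touched; CMakeLists.txt and the downloaded sources stay in place.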
- If the following error appears during compilation
c++: internal compiler error: Killed (program cc1plus)
the compiler ran out of memory; increase the memory available to Docker or add swap space
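To see how much memory and swap the build can actually use, the totals can be read from /proc/meminfo inside the container. This is a minimal sketch under stated assumptions: mem_plus_swap_mib is an illustrative name, /proc/meminfo exists on Linux only, and its figures describe the Docker host or VM rather than any per-container limit set with docker run -m.

```shell
# Print MemTotal + SwapTotal in MiB, read from a meminfo-style file
# (default: /proc/meminfo, whose values are reported in kB).
mem_plus_swap_mib() {
    awk '/^MemTotal:/  { mem = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { printf "%d\n", (mem + swap) / 1024 }' "${1:-/proc/meminfo}"
}

# Example:
#   mem_plus_swap_mib
```

Comparing the number before and after changing the Docker memory setting confirms that the change took effect.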
- xgboost compile errors
1.
make[2]: *** [CMakeFiles/anyq.dir/src/rank/rank_strategy.cpp.o] Error 1
make[1]: *** [CMakeFiles/anyq.dir/all] Error 2
make: *** [all] Error 2
2.
/root/projects/AI/AnyQ/build/third_party/include/dmlc/./logging.h:323:31: error: ‘DMLC_LOG_STACK_TRACE_SIZE’ was not declared in this scope
xgboost has changed upstream, and the cp step in the cmake script now overwrites the correct rabit headers, so xgboost.cmake needs to be modified:
COMMAND cp -r ${XGBOOST_INSTALL_DIR}/dmlc-core/include/* ${XGBOOST_INSTALL_DIR}/rabit/include/rabit third_party/include/
Some Docker operations
- Enter the container
docker exec -it AnyQ /bin/bash
- Start the container
docker start ${container_id}
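Other
- The Docker app must be signed in with a username and password, and the username must be the same one used to sign in to Docker Hub
- Installing Docker on Linux (CentOS)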
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/docker-ce-17.09.0.ce-1.el7.centos.x86_64.rpm
yum localinstall docker-ce-17.09.0.ce-1.el7.centos.x86_64.rpm