elasticsearch-analysis-ik source code download:
https://github.com/medcl/elasticsearch-analysis-ik
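Note:
The IK plugin already lets users extend its dictionary with custom words. You only need to edit its configuration file:
1. Under elasticsearch-7.2.0/plugins/ik/config, create a custom directory, and inside it create a file named mydict.dic;
2. Add your custom hot words to mydict.dic, one per line;
3. Edit IKAnalyzer.cfg.xml in elasticsearch-7.2.0/plugins/ik/config so that the extension dictionary entry points at the new file:
<entry key="ext_dict">/custom/mydict.dic</entry>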
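The plugin also supports hot-updating the IK dictionary, through these entries of the IK configuration file mentioned above:
<entry key="remote_ext_dict">location</entry>
<entry key="remote_ext_stopwords">location</entry>
Here location is a URL, e.g. http://yoursite.com/getCustomDict. The request only has to meet two conditions for hot updates to work:
1. The HTTP response must return two headers, Last-Modified and ETag. Both are strings; whenever either one changes, the plugin fetches the new words and updates its dictionary.
2. The response body must contain one word per line, with \n as the line separator.
Once both conditions are met, the dictionary is hot-updated without restarting the ES instance.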
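To make the two conditions concrete, here is a minimal sketch of the client-side check they imply. It is written for illustration and is not the plugin's own code; the class name RemoteDictContract and the sample header values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the remote-dictionary contract (not IK's implementation):
// remember the last Last-Modified / ETag seen, reload when either changes,
// and read the body as one word per line.
public class RemoteDictContract {
    private String lastModified; // value from the previous poll, null before the first one
    private String etag;

    /** Returns true, and remembers the new values, if either header differs from the last poll. */
    public boolean shouldReload(String newLastModified, String newEtag) {
        boolean changed = (newLastModified != null && !newLastModified.equals(lastModified))
                || (newEtag != null && !newEtag.equals(etag));
        if (changed) {
            lastModified = newLastModified;
            etag = newEtag;
        }
        return changed;
    }

    /** Splits a response body into words: one per line, \n-separated, blank lines skipped. */
    public static List<String> parseWords(String body) {
        List<String> words = new ArrayList<>();
        for (String line : body.split("\n")) {
            String word = line.trim(); // trim() also drops a stray \r from CRLF files
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        RemoteDictContract client = new RemoteDictContract();
        String date = "Mon, 01 Jul 2019 08:00:00 GMT";
        System.out.println(client.shouldReload(date, "\"v1\"")); // true: first poll
        System.out.println(client.shouldReload(date, "\"v1\"")); // false: nothing changed
        System.out.println(client.shouldReload(date, "\"v2\"")); // true: ETag changed
        System.out.println(parseWords("blockchain\nmetaverse\n")); // [blockchain, metaverse]
    }
}
```

In the real plugin the headers come from a periodic HTTP request, and a changed pair triggers a dictionary reload; the sketch only isolates the comparison and the line format.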
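You can put the hot words that should be updated automatically into a UTF-8 encoded .txt file served by nginx or another simple HTTP server. When the .txt file changes, the HTTP server automatically returns the corresponding Last-Modified and ETag whenever a client requests the file. A separate tool can then extract the relevant words from the business system and update this .txt file.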
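A minimal sketch of such an update tool, assuming the words are already available as a list (extracting them from the business system is out of scope). HotWordFileWriter and the file names are hypothetical. Writing to a temporary file and renaming it keeps nginx from ever serving a half-written list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical tool: rewrite the hot-word .txt file that nginx serves.
public class HotWordFileWriter {

    /** Writes one word per line (UTF-8, \n separators), replacing the target file atomically. */
    public static void writeWords(Path target, List<String> words) throws IOException {
        StringBuilder body = new StringBuilder();
        for (String word : words) {
            String w = word.trim();
            if (!w.isEmpty()) {
                body.append(w).append('\n');
            }
        }
        // Sibling temp file; Files.write creates it with default (umask-based) permissions,
        // so the nginx worker can still read it after the rename.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, body.toString().getBytes(StandardCharsets.UTF_8));
        // Rename within one directory: readers see either the old file or the new one.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("ik-dict").resolve("hot_words.txt");
        writeWords(file, Arrays.asList("blockchain", "metaverse"));
        writeWords(file, Arrays.asList("blockchain", "metaverse", "large model"));
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8)); // [blockchain, metaverse, large model]
    }
}
```

nginx derives Last-Modified from the file's modification time and its ETag from the modification time and size, so a rewrite normally changes both headers. Two rewrites within the same second that produce a file of identical size would not change them.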
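In my experience, the nginx approach is simpler to implement, and it is the one I recommend. If you would rather serve the words straight from a database, the remote URL can also be a small Spring MVC endpoint that reads them from an Oracle table. Call /keyWord/update after changing the table so that the headers change and IK reloads: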
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.servlet.http.HttpServletResponse; // jakarta.servlet on Spring Boot 3
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/keyWord")
@Slf4j
public class KeyWordDict {

    // IK re-downloads the word list whenever one of these header values changes.
    // volatile: /update writes them and /hot reads them on different request threads.
    private volatile String lastModified = new Date().toString();
    private volatile String etag = String.valueOf(System.currentTimeMillis());

    // IK sends HEAD requests to check the headers and GET requests to fetch the words.
    @RequestMapping(value = "/hot", method = {RequestMethod.GET, RequestMethod.HEAD},
            produces = "text/html;charset=UTF-8")
    public String getHotWordByOracle(HttpServletResponse response, Integer type) {
        response.setHeader("Last-Modified", lastModified);
        response.setHeader("ETag", etag);

        // type=0: hot words, type=1: stop words, missing or other: the type=99 rows
        if (type == null) {
            type = 99;
        }
        String sql;
        switch (type) {
            case 0:
                sql = "select word from IK_HOT_WORD where type=0 and status=0";
                break;
            case 1:
                sql = "select word from IK_HOT_WORD where type=1 and status=0";
                break;
            default:
                sql = "select word from IK_HOT_WORD where type=99";
                break;
        }

        // JDBC 4 drivers such as ojdbc8 register themselves, so Class.forName is not needed.
        // try-with-resources closes ResultSet, Statement and Connection even on failure.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@192.168.114.13:1521:xe", "test", "test");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            StringBuilder words = new StringBuilder();
            while (rs.next()) {
                String theWord = rs.getString("word");
                if (theWord == null) {
                    continue; // skip NULL rows instead of emitting the text "null"
                }
                log.info("hot word from oracle: {}", theWord);
                words.append(theWord).append("\n"); // one word per line, \n-separated
            }
            return words.toString();
        } catch (Exception e) {
            log.error("Failed to load hot words:", e);
            return null;
        }
    }

    // Call after changing IK_HOT_WORD: new header values make IK reload on its next poll.
    @RequestMapping(value = "/update", method = RequestMethod.GET)
    public void updateModified() {
        // HH (24-hour clock); the original hh was 12-hour
        lastModified = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        etag = String.valueOf(System.currentTimeMillis());
    }
}
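In the controller above, the headers only change when /keyWord/update is called, so forgetting that call means IK never reloads. One alternative, not part of the original design, is to derive the ETag from the word list itself so that it changes exactly when the list changes. ContentEtag is a hypothetical helper showing that idea with SHA-256.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: an ETag computed from the word list content.
public class ContentEtag {

    /** Returns a quoted SHA-256 hex digest of the words, each followed by \n (the body IK receives). */
    public static String etagFor(List<String> words) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support SHA-256", e);
        }
        for (String word : words) {
            sha256.update(word.getBytes(StandardCharsets.UTF_8));
            sha256.update((byte) '\n'); // separator, so ["ab", "c"] and ["a", "bc"] hash differently
        }
        StringBuilder etag = new StringBuilder("\"");
        for (byte b : sha256.digest()) {
            etag.append(String.format("%02x", b));
        }
        return etag.append('"').toString();
    }

    public static void main(String[] args) {
        String v1 = etagFor(Arrays.asList("blockchain", "metaverse"));
        String v2 = etagFor(Arrays.asList("blockchain", "metaverse"));
        String v3 = etagFor(Arrays.asList("blockchain", "metaverse", "large model"));
        System.out.println(v1.equals(v2)); // true: same words, same ETag, so IK skips the reload
        System.out.println(v1.equals(v3)); // false: a word was added, so IK reloads
    }
}
```

Inside getHotWordByOracle you would build the word list first and then call response.setHeader("ETag", ContentEtag.etagFor(words)) before returning the body. Since the controller already queries the database on every GET and HEAD, this costs no extra query.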
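Note:
Configuration file: /data/elasticsearch-7.2.0/plugins/ik/config/IKAnalyzer.cfg.xml
Put the remote URL in the remote_ext_dict entry, the one under the comment "users can configure a remote extension dictionary here":
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://192.168.xx.xx:8080/keyWord/hot?type=0</entry>
</properties>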
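IKAnalyzer.cfg.xml is a java.util.Properties XML file. The IK source reads it with Properties.loadFromXML and splits the remote_ext_dict value on semicolons, so several URLs can share one entry. The sketch below mimics that parsing; the class name IkConfigPeek and both hosts are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: read an IKAnalyzer.cfg.xml-style Properties XML and split remote_ext_dict on ';'.
public class IkConfigPeek {

    public static List<String> remoteDicts(String xml) throws IOException {
        Properties props = new Properties();
        // loadFromXML expects the java.util.Properties DTD format that IKAnalyzer.cfg.xml uses
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> urls = new ArrayList<>();
        String value = props.getProperty("remote_ext_dict");
        if (value != null) {
            for (String url : value.split(";")) {
                if (!url.trim().isEmpty()) {
                    urls.add(url.trim());
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Both hosts are hypothetical
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"remote_ext_dict\">http://192.168.0.10:8080/keyWord/hot?type=0;"
                + "http://192.168.0.11/hot_words.txt</entry>\n"
                + "</properties>\n";
        System.out.println(remoteDicts(xml)); // [http://192.168.0.10:8080/keyWord/hot?type=0, http://192.168.0.11/hot_words.txt]
    }
}
```

Because the value is XML text, a URL containing & must be written as &amp; inside the entry.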
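Alternatively, you can modify the IK source code (https://github.com/medcl/elasticsearch-analysis-ik) so that the plugin polls the database itself, without an intermediate HTTP endpoint: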
In the IK source project, add the configuration file config/jdbc-reload.properties under the project root:
jdbc.url=jdbc:oracle:thin:@192.168.xxx.xx:1521:xe
jdbc.user=test
jdbc.password=test
jdbc.reload.sql=select word from IK_HOT_WORD where type=0 and status=0
jdbc.reload.stop_word.sql=select word from IK_HOT_WORD where type=1 and status=0
period_time_seconds=60
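Integer.parseInt throws NumberFormatException for a missing (null) or non-numeric period_time_seconds, and scheduleAtFixedRate rejects a period of zero or less, so it is worth validating these settings once when they are loaded. A minimal sketch; ReloadSettings and its 60-second fallback are assumptions, not part of the original.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: validate period_time_seconds when jdbc-reload.properties is loaded.
public class ReloadSettings {
    static final long DEFAULT_PERIOD_SECONDS = 60; // assumed fallback, matching the sample file

    public static long periodSeconds(Properties props) {
        String raw = props.getProperty("period_time_seconds");
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_PERIOD_SECONDS;
        }
        // trim(): Properties.load keeps trailing spaces in values, which parseLong would reject
        long period = Long.parseLong(raw.trim());
        if (period <= 0) {
            // scheduleAtFixedRate would throw IllegalArgumentException for this value anyway
            throw new IllegalArgumentException("period_time_seconds must be positive: " + period);
        }
        return period;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "jdbc.url=jdbc:oracle:thin:@192.168.xxx.xx:1521:xe\n"
                + "jdbc.reload.sql=select word from IK_HOT_WORD where type=0 and status=0\n"
                + "period_time_seconds=30 \n"));
        System.out.println(props.getProperty("jdbc.url")); // jdbc:oracle:thin:@192.168.xxx.xx:1521:xe
        System.out.println(periodSeconds(props));           // 30
        System.out.println(periodSeconds(new Properties())); // 60
    }
}
```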
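In the Dictionary class, declare the holder for the database settings and add a static block that loads the JDBC driver:
// Settings from jdbc-reload.properties; static because the static initial() method reads period_time_seconds
private static final Properties dbProps = new Properties();

// Must come after the logger field, which it uses
static {
    try {
        // Oracle is the default driver
        logger.info("Initializing JDBC driver...");
        Class.forName("oracle.jdbc.driver.OracleDriver");
        logger.info("JDBC driver initialized");
    } catch (Exception e) {
        logger.error("Failed to initialize JDBC driver", e);
    }
}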
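In the Dictionary constructor, read the database configuration file:
// Load the database settings; try-with-resources closes the stream (the original leaked it)
Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
try (InputStream in = new FileInputStream(file.toFile())) {
    dbProps.load(in);
} catch (IOException e) {
    logger.error("Failed to read the database configuration file", e);
}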
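In the Dictionary class, add a method that creates the database connection:
private Connection getConn() {
    Connection conn = null;
    try {
        // Open a new connection with the settings from jdbc-reload.properties
        conn = DriverManager.getConnection(
                dbProps.getProperty("jdbc.url"),
                dbProps.getProperty("jdbc.user"),
                dbProps.getProperty("jdbc.password"));
    } catch (Exception e) {
        logger.error("Failed to create database connection", e);
    }
    return conn; // null if the connection could not be created
}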
In the Dictionary class, add a method that loads hot words from Oracle into the main dictionary:
public void loadOracleHotDict() throws Exception {
    logger.info("hot words loading");
    String sql = dbProps.getProperty("jdbc.reload.sql");
    if (sql == null || sql.trim().isEmpty()) {
        return; // no hot-word query configured
    }
    // try-with-resources closes ResultSet, Statement and Connection even on failure;
    // a null connection from getConn() fails here and is handled by the catch below
    try (Connection conn = getConn();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
            String theWord = rs.getString("word");
            if (theWord == null || theWord.trim().isEmpty()) {
                continue; // skip NULL or blank rows
            }
            logger.info("hot word from oracle: " + theWord);
            _MainDict.fillSegment(theWord.trim().toCharArray());
        }
        logger.info("hot words load end");
    } catch (Exception e) {
        logger.error("Failed to update hot words:", e);
        throw new Exception("Failed to update hot words", e);
    }
}
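The row loop is the part of loadOracleHotDict most likely to fail on real data: rs.getString returns null for a NULL column, and calling trim() on it would abort the whole reload with a NullPointerException. The sketch below isolates that loop with a null and blank guard. WordLoader is hypothetical, and the Consumer<char[]> stands in for _MainDict::fillSegment (for stop words, presumably the stop-word dictionary's fillSegment).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper isolating the row loop of loadOracleHotDict.
public class WordLoader {

    /** Passes each trimmed, non-blank word to sink as a char[]; returns the number of words loaded. */
    public static int load(Iterable<String> rows, Consumer<char[]> sink) {
        int loaded = 0;
        for (String raw : rows) {
            if (raw == null) {
                continue; // NULL column: raw.trim() would throw NullPointerException
            }
            String word = raw.trim();
            if (word.isEmpty()) {
                continue; // blank row: nothing to add to the dictionary
            }
            sink.accept(word.toCharArray()); // in Dictionary: _MainDict.fillSegment(...)
            loaded++;
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Rows as rs.getString("word") might return them, including a NULL and a blank value
        List<String> rows = Arrays.asList(" blockchain ", null, "", "metaverse");
        List<String> received = new ArrayList<>();
        int n = load(rows, chars -> received.add(new String(chars)));
        System.out.println(n + " " + received); // 2 [blockchain, metaverse]
    }
}
```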
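In the Dictionary class, add a method that runs both database reloads (loadMyOracleStopWordDict is not shown here; it presumably mirrors loadOracleHotDict, running jdbc.reload.stop_word.sql and filling the stop-word dictionary instead of _MainDict):
public void reLoadSQLDict() throws Exception {
    this.loadOracleHotDict();
    this.loadMyOracleStopWordDict();
}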
Create an OracleDictReloadThread class in the dic package, next to Dictionary:
public class OracleDictReloadThread implements Runnable {

    private static final Logger logger = ESPluginLoggerFactory.getLogger(OracleDictReloadThread.class.getName());

    @Override
    public void run() {
        logger.info("reloading hot_word and stop_word dict from oracle");
        try {
            Dictionary.getSingleton().reLoadSQLDict();
        } catch (Exception e) {
            // Catch everything: an exception escaping run() would stop all later scheduled runs
            logger.error("Failed to run the hot-word reload:", e);
        }
    }
}
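OracleDictReloadThread.run() catches every Exception for a good reason: according to the ScheduledExecutorService.scheduleAtFixedRate contract, if any execution of the task throws, subsequent executions are suppressed, so a single database outage would silently stop all future reloads. The self-contained demo below (FixedRateDemo is a hypothetical name) shows both behaviors.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Demo: an exception escaping a fixed-rate task cancels it; a caught one does not.
public class FixedRateDemo {

    /** Schedules a task that always throws and reports how often it ran during waitMillis. */
    public static int unguardedRuns(long waitMillis) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        pool.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated DB failure"); // escapes run()
        }, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(waitMillis);
        pool.shutdownNow();
        return runs.get();
    }

    /** Same failure, caught inside run(); returns true if the task ran `times` times within 5 s. */
    public static boolean guardedKeepsRunning(int times) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch runs = new CountDownLatch(times);
        pool.scheduleAtFixedRate(() -> {
            try {
                throw new IllegalStateException("simulated DB failure");
            } catch (Exception e) {
                runs.countDown(); // log and carry on, as OracleDictReloadThread does
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        boolean completed = runs.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded runs in 200 ms: " + unguardedRuns(200)); // 1
        System.out.println("guarded task ran 5 times: " + guardedKeepsRunning(5)); // true
    }
}
```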
In the initial method of Dictionary, schedule the database reload task:
public static synchronized void initial(Configuration cfg) {
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
                singleton = new Dictionary(cfg);
                singleton.loadMainDict();
                singleton.loadSurnameDict();
                singleton.loadQuantifierDict();
                singleton.loadSuffixDict();
                singleton.loadPrepDict();
                singleton.loadStopWordDict();
                if (cfg.isEnableRemoteDict()) {
                    // Start the monitor threads: 10 is the initial delay (adjustable), 60 the interval, both in seconds
                    for (String location : singleton.getRemoteExtDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                    for (String location : singleton.getRemoteExtStopWordDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                }
                // Periodically query the database for hot words;
                // falls back to 60 s if period_time_seconds is missing
                logger.info("Starting the database reload thread");
                pool.scheduleAtFixedRate(new OracleDictReloadThread(), 10,
                        Integer.parseInt(dbProps.getProperty("period_time_seconds", "60").trim()),
                        TimeUnit.SECONDS);
            }
        }
    }
}
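Add the database driver dependency to pom.xml:
<dependency>
    <groupId>com.oracle.ojdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>19.3.0.0</version>
</dependency>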
Set the plugin version in pom.xml to match your Elasticsearch version, here 7.2.0.
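In src\main\assemblies\plugin.xml, add the following inside <dependencySets> so that the database dependency is packaged with the plugin:
<dependencySet>
    <useProjectArtifact>true</useProjectArtifact>
    <useTransitiveFiltering>true</useTransitiveFiltering>
    <includes>
        <include>com.oracle.ojdbc:ojdbc8</include>
    </includes>
</dependencySet>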
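If the following error appears after updating the IK plugin:
java.security.AccessControlException: access denied (java.net.SocketPermission 172.16.xxx.xxx:3306 connect,resolve)
it is a Java security policy error: Elasticsearch runs plugins under a Java Security Manager, which here blocks the outgoing database connection (I did not dig into the details). The fix is as follows:
1. In the config directory of the IK source, create a file named socketPolicy.policy: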
grant {
permission java.net.SocketPermission "business.mysql.youboy.com:3306","connect,resolve";
};
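Note: this example grants access to a MySQL host on port 3306; for the Oracle setup above, use your database host and port from jdbc.url (1521 here). If other permission errors appear, add the corresponding permissions yourself.
2. In jvm.options under the config directory of the Elasticsearch installation on the server, add the following line pointing at the file above (adjust the path to your installation):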
-Djava.security.policy=/data/elasticsearch-6.5.3/plugins/ik/config/socketPolicy.policy
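Another approach often used by Elasticsearch plugins, and the one IK's own Monitor class uses for its HTTP polling, is to run the network call inside AccessController.doPrivileged, so that the permission check stops at the plugin's own code instead of also requiring the permission from every caller on the stack. A minimal sketch, assuming the plugin's policy already grants the SocketPermission (not verified for this setup). Reloader and reloadPrivileged are hypothetical names; in the plugin, the action would wrap Dictionary.getSingleton().reLoadSQLDict() inside OracleDictReloadThread.run().

```java
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

// Sketch: run a reload inside doPrivileged and rethrow its checked exception unwrapped.
public class PrivilegedReload {

    /** Hypothetical stand-in for Dictionary.getSingleton().reLoadSQLDict(). */
    public interface Reloader {
        void reload() throws Exception;
    }

    @SuppressWarnings("removal") // AccessController is deprecated for removal since JDK 17
    public static void reloadPrivileged(Reloader reloader) throws Exception {
        try {
            AccessController.doPrivileged((PrivilegedExceptionAction<Void>) () -> {
                reloader.reload();
                return null;
            });
        } catch (PrivilegedActionException e) {
            throw e.getException(); // the checked exception thrown by reload()
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        reloadPrivileged(() -> calls[0]++);
        System.out.println(calls[0]); // 1
    }
}
```

Elasticsearch plugins usually call org.elasticsearch.SpecialPermission.check() before doPrivileged, as IK's Monitor does; it is left out here because it needs Elasticsearch on the classpath.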
https://blog.csdn.net/weixin_43315211/article/details/99650363
https://www.icode9.com/content-4-614375.html