Configuring hot-word (new/custom word) updates in Elasticsearch

Internet slang changes by the day. How do we get newly coined hot words (or domain-specific terms) into our search in near real time?

Let's first run a test with ik:

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '
成龙原名陈港生
'
# returns
{
  "tokens" : [ {
    "token" : "成龙",
    "start_offset" : 1,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "原名",
    "start_offset" : 3,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "陈",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "港",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "生",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "CN_CHAR",
    "position" : 4
  } ]
}
The ik main dictionary does not contain the word "陈港生", so it was split into single characters.
Now let's configure it.
Edit the IK config file: {ES directory}/plugins/ik/config/ik/IKAnalyzer.cfg.xml

Change it as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- local extension stop-word dictionary -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- remote extension dictionary -->
    <entry key="remote_ext_dict">http://192.168.1.136/hotWords.php</entry>
</properties>
I use a remote extension dictionary here because it can be updated by another program without restarting ES, which is very convenient; extending the vocabulary with a local file requires an ES restart. That said, the custom mydict.dic dictionary is also easy to use: one word per line, just add your own.
Since it is a remote dictionary, it must be a reachable URL. It can be a dynamic page or a plain .txt file, as long as the response body is UTF-8 encoded.
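As an alternative to the PHP script below, the remote dictionary can be served from any process. Here is a minimal sketch using the JDK's built-in com.sun.net.httpserver (the /hotWords path, port choice, and content-hash ETag are my own illustrative choices, not part of the original setup):

```java
import com.sun.net.httpserver.HttpServer;

import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HotWordsServer {

    // ETag derived from the content, so it only changes when the word list changes.
    static String etagFor(String words) {
        return "\"" + Integer.toHexString(words.hashCode()) + "\"";
    }

    public static void main(String[] args) throws Exception {
        String words = "陈港生\n元楼\n蓝瘦\n"; // one word per line, UTF-8

        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/hotWords", exchange -> {
            byte[] body = words.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.getResponseHeaders().set("ETag", etagFor(words));
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Self-request to show what ik would receive, then shut down.
        URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/hotWords");
        try (InputStream in = url.openStream()) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        server.stop(0);
    }
}
```

Because the ETag is a fingerprint of the word list, it stays stable between edits, so ik only reloads when the content actually changes.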

Contents of hotWords.php:

$s = <<<'EOF'  
陈港生  
元楼  
蓝瘦  
EOF;  
header('Last-Modified: '.gmdate('D, d M Y H:i:s', time()).' GMT', true, 200);  
header('ETag: "5816f349-19"');  
echo $s; 

ik reads two response headers, Last-Modified and ETag; if either one changes, a dictionary update is triggered. ik fetches the remote dictionary once per minute.
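The update rule just described can be sketched as a tiny piece of state-keeping (illustrative only, not ik's actual source): remember the last seen header values and reload whenever either differs.

```java
import java.util.Objects;

public class DictMonitor {
    private String lastModified;
    private String eTag;

    /** Returns true when the dictionary should be reloaded. */
    public boolean headersChanged(String newLastModified, String newETag) {
        boolean changed = !Objects.equals(lastModified, newLastModified)
                || !Objects.equals(eTag, newETag);
        if (changed) {
            // remember the new values so the next poll compares against them
            lastModified = newLastModified;
            eTag = newETag;
        }
        return changed;
    }
}
```

Note that the PHP script above sends a fresh Last-Modified on every request, so with that script ik will reload on every poll; sending a content-derived value avoids the redundant reloads.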

Restart Elasticsearch and check the startup log; the three words have been loaded:

[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 陈港生  
[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 元楼  
[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 蓝瘦  

Now let's test it. Running the same request as before returns:

...  
  }, {  
    "token" : "陈港生",  
    "start_offset" : 5,  
    "end_offset" : 8,  
    "type" : "CN_WORD",  
    "position" : 2  
  }, {  
... 

The ik tokenizer now matches "陈港生" as a single word.


Java server-side implementation: loading extension words, adding extension words, and a refresh endpoint

The load endpoint that ik polls:
    http://ip:port/es/dic/loadExtDict
@RestController
@RequestMapping("/es/dic")
public class DicController {
	
	private static final Logger logger = LoggerFactory.getLogger(DicController.class);
	
	@Autowired
	private DictRedis dictRedis;
	
	private static final String EXT_DICT_PATH = "E:\\ext_dict.txt";
	
	/**
	  * Description: load the extension words and write them to the response
	  * @param response
	 */
	@RequestMapping(value = "/loadExtDict")
	public void loadExtDict(HttpServletResponse response) {
		logger.info("extDict get start");
		long count = dictRedis.incr(RedisKeyConstants.ES_EXT_DICT_FLUSH);
		// every node must get the extension words, so keep serving until all nodes have polled
		if(count > getEsClusterNodesNum()) {
			return;
		}
		
		String result = FileUtil.read(EXT_DICT_PATH);
		if(StringUtils.isEmpty(result)) {
			return;
		}
		
//		String result = "黄焖鸡米饭\n腾冲大救驾\n陈港生\n大西瓜\n大南瓜";
		try {
			response.setHeader("Last-Modified", TimeUtil.currentTimeHllDT().toString());
			response.setHeader("ETag",TimeUtil.currentTimeHllDT().toString());
			response.setContentType("text/plain; charset=UTF-8");
            PrintWriter out = response.getWriter();
            out.write(result);
            out.flush();
        } catch (IOException e) {
            logger.error("DicController loadExtDict exception" , e);
        }
		
		logger.info("extDict get end,result:{}", result);
	}
	
	/**
	  * Description: refresh the extension words (reset the distribution counter)
	  * @return
	 */
	@RequestMapping(value = "/extDictFlush")
	public String extDictFlush() {
		String result = "ok";
		try {
			dictRedis.del(RedisKeyConstants.ES_EXT_DICT_FLUSH);
        } catch (Exception e) {
        	result = e.getMessage();
        }
		return result;
	}
	
	/**
	  * Description: add words to the extension dictionary; separate multiple words with a comma ","
	  * @param dict
	  * @return
	 */
	@RequestMapping(value = "/addExtDict")
	public String addExtDict(String dict) {
		String result = "ok";
		if(StringUtils.isEmpty(dict)) {
			return "the word to add must not be empty";
		}
		
		StringBuilder sb = new StringBuilder();
		String[] dicts = dict.split(",");
		for (String str : dicts) {
			sb.append("\n").append(str);
		}
		
		boolean flag = FileUtil.write(EXT_DICT_PATH, sb.toString());
		if(flag) {
			extDictFlush();
		} else {
			result = "fail";
		}
		
		return result;
	}
	
	/**
	  * Description: get the number of cluster nodes; defaults to 10 if it cannot be determined
	  * @return
	 */
	private int getEsClusterNodesNum() {
		int num = 10;
		String esAddress = PropertyConfigurer.getString("es.address","http://172.16.32.69:9300,http://172.16.32.48:9300");
		List<String> clusterNodes = Arrays.asList(esAddress.split(","));
		if(!clusterNodes.isEmpty()) {
			num = clusterNodes.size();
		}
		return num;
	}
}
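The counter-based fan-out in loadExtDict/extDictFlush is easy to miss: each node's poll increments a shared counter, words are served only while the counter is at or below the node count (so every node gets them exactly once), and a flush starts a new round. A minimal sketch of just that logic, with the Redis counter replaced by an in-memory AtomicLong for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DictFanout {
    private final AtomicLong counter = new AtomicLong();
    private final int nodeCount;

    public DictFanout(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    /** One poll from one node: serve the word list only until every node has fetched it. */
    public boolean shouldServe() {
        return counter.incrementAndGet() <= nodeCount;
    }

    /** Equivalent of extDictFlush: reset the counter so the next polls serve the new words. */
    public void flush() {
        counter.set(0);
    }
}
```

In the real controller the counter lives in Redis precisely so that all ES nodes polling the same endpoint share one round; an in-process counter would only work for a single application instance.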

File read/write utility class:

public class FileUtil {

	private static final Logger logger = LoggerFactory.getLogger(FileUtil.class);

	/**
	  * Description: read a file as UTF-8 text
	  *
	  * @param path
	  * @return
	 */
	public static String read(String path) {
		StringBuilder sb = new StringBuilder();
		BufferedReader reader = null;
		try {
			BufferedInputStream fis = new BufferedInputStream(new FileInputStream(new File(path)));
			reader = new BufferedReader(new InputStreamReader(fis, "utf-8"), 512);// read the text file with a 512-char buffer

			String line = "";
			while ((line = reader.readLine()) != null) {
				sb.append(line).append("\n");
			}
		} catch (Exception e) {
			logger.error("FileUtil read exception", e);
		} finally {
			if(reader != null) {
				try {
					reader.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		return sb.toString();
	}

	/**
	  * Description: append content to a file
	  *
	 */
	public static boolean write(String path, String content) {
		boolean flag = true;
		BufferedWriter out = null;
		try {
			out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path, true), "UTF-8")); // append mode; write UTF-8 so ik reads the words correctly
			out.write(content);
		} catch (IOException e) {
			flag = false;
			logger.error("FileUtil write exception", e);
		} finally {
			try {
				if(out != null) {
					out.close();
				}
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		return flag;
	}

}
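On Java 7+, the same UTF-8 read and append can be written with java.nio.file, which closes streams automatically. A minimal alternative sketch (not a drop-in replacement for the FileUtil above, since it throws IOException instead of logging):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioFileUtil {

    /** Read the whole file as UTF-8 text. */
    public static String read(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    /** Append content to the file in UTF-8, creating it if it does not exist. */
    public static void append(String path, String content) throws IOException {
        Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Spelling out StandardCharsets.UTF_8 matters here for the same reason as in FileUtil: the dictionary file must stay UTF-8 regardless of the platform default encoding.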


