ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词

 前言

        在很多搜索场景中,我们希望能够搜索出填写词的近义词的商品。例如在商品搜索中,我填写“瓠瓜”,希望能够搜索出“西葫芦”,但“西葫芦”商品名称因不含有“瓠瓜”无法搜索出来。
        此时就需要将“瓠瓜”解析成“瓠瓜”和“西葫芦”,es的synonym,synonym过滤器就是提供了该功能,将词转为近义词再分词。
        如下,声明了一个分词器将“瓠瓜”和“西葫芦”定义为近义词

// 定义自定义分词
PUT info_goods_v1/_settings
{
  "analysis": {
    "filter": {
      "my_synonyms": {
        "type": "synonym_graph",
        "synonyms": [
          "瓠瓜,西葫芦"
        ]
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": [
          "lowercase",
          "my_synonyms"
        ]
      }
    }
  }
}


// 使用“瓠瓜”分词
GET info_goods_v1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "瓠瓜"
}


// 结果:
{
  "tokens" : [
    {
      "token" : "西葫芦",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "瓠",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "葫芦",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "瓜",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 2
    }
  ]
}

        可以看到,“瓠瓜” 被分词成为了“西葫芦”,“葫芦”,“瓠”和“瓜”。这是因为在自定分词器中,我们将“瓠瓜”和“西葫芦”定义成了近义词“瓠瓜=》 瓠瓜,西葫芦”,相当于先将“瓠瓜”转为“瓠瓜”和“西葫芦”,再依次对近义词集合(也就是“瓠瓜”和“西葫芦”)分词得到结果。

        是不是被“瓠瓜” 和“西葫芦”,不急缓一缓我们接着看...

        假如近义词发生了更新,我们该如何更新呢?一种方案是关闭索引,更新索引的分词器后再打开;或者也可以借助elasticsearch-analysis-dynamic-synonym插件来动态更新,该插件提供了基于接口和文件的动态更新,但是没有提供连接数据库。但是不要紧,我们可以稍稍修改一下就能使用了,这也是本文的主要内容。

        过程如下

修改源码实现连接数据库获取近义词汇

        下载elasticsearch-analysis-dynamic-synonym打开项目

一、修改pom.xml

        引入依赖

       
            mysql
            mysql-connector-java
            8.0.21
       

        将版本修改成跟你的es版本号一样的,比如我的是7.17.7

7.17.7

二、 修改main/assemblies/plugin.xml

        在标签下添加

       
           
            true
            true
           
                mysql:mysql-connector-java
           

       

        在标签下添加

   
       
            ${project.basedir}/config
            config
       

   

三、jdbc配置文件

        在项目根目录下创建config/jdbc.properties文件,写入以下内容

jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://cckg.liulingjie.cn:3306/test?useUnicode=true&characterEncoding=utf8&autoReconnect=true&useSSL=false&serverTimezone=Asia/Shanghai
jdbc.username=账号
jdbc.password=密码
#近义词sql查询语句。(注意要以words字段展示)
synonym.word.sql=SELECT `keys` AS words FROM es_synonym WHERE ifdel = '0'
#获取近义词最后更新时间,用来判断是否发生了更新。(注意要以maxModitime词汇显示)
synonym.lastModitime.sql=SELECT MAX(moditime) AS maxModitime FROM es_synonym
interval=10

  四、编写加载词汇类

        在com.bellszhu.elasticsearch.plugin.synonym.analysis包下,我们可以看到很多加载近义词汇的类,比如RemoteSynonymFile类就是通过接口来加载近义词词汇的。
        我们在该包下创建类DynamicSynonymFromDb,同时继承SynonymFile接口,该类是用来读取数据库的近义词汇的,代码如下:


/**
 * @author liulingjie
 * @date 2023/4/12 19:43
 */
public class DynamicSynonymFromDb implements SynonymFile {

    /**
     * 配置文件名
     */
    private final static String DB_PROPERTIES = "jdbc.properties";

    private static Logger logger = LogManager.getLogger("dynamic-synonym");

    private String format;

    private boolean expand;

    private boolean lenient;

    private Analyzer analyzer;

    private Environment env;

    /**
     * 动态配置类型
     */
    private String location;

    /**
     * 作用类型
     */
    private String group;

    private long lastModified;

    private Path conf_dir;

    private JdbcConfig jdbcConfig;

    DynamicSynonymFromDb(Environment env, Analyzer analyzer,
                         boolean expand, boolean lenient, String format, String location, String group) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.lenient = lenient;
        this.format = format;
        this.env = env;
        this.location = location;
        this.group = group;
        // 读取配置文件
        setJdbcConfig();
        // 加载驱动
        try {
            Class.forName(jdbcConfig.getDriver());
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        // 判断是否需要加载
        isNeedReloadSynonymMap();
    }

    /**
     * 读取配置文件
     */
    private void setJdbcConfig() {
        // 读取当前 jar 包存放的路径
        Path filePath = PathUtils.get(new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
                .getLocation().getPath())
                .getParent(), "config")
                .toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);
        File file = conf_dir.toFile();
        Properties properties = null;
        try {
            properties = new Properties();
            properties.load(new FileInputStream(file));
        } catch (Exception e) {
            logger.error("load jdbc.properties failed");
            logger.error(e.getMessage());
        }
        jdbcConfig = new JdbcConfig(
                properties.getProperty("jdbc.driver"),
                properties.getProperty("jdbc.url"),
                properties.getProperty("jdbc.username"),
                properties.getProperty("jdbc.password"),
                properties.getProperty("synonym.word.sql"),
                properties.getProperty("synonym.lastModitime.sql"),
                Integer.valueOf(properties.getProperty("interval"))
        );
    }

    /**
     * 加载同义词词典至SynonymMap中
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload local synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            logger.error("reload local synonym {} error!", e, location);
            throw new IllegalArgumentException(
                    "could not reload local synonyms file to build synonyms", e);
        }
    }

    /**
     * 判断是否需要进行重新加载
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long lastModify = getLastModify();
            if (lastModified < lastModify) {
                lastModified = lastModify;
                return true;
            }
        } catch (Exception e) {
            logger.error(e);
        }

        return false;
    }

    /**
     * 获取同义词库最后一次修改的时间
     * 用于判断同义词是否需要进行重新加载
     *
     * @return getLastModify
     */
    public Long getLastModify() {
        Connection connection = null;
        Statement statement = null;
        ResultSet resultSet = null;
        Long last_modify_long = null;
        try {
            connection = DriverManager.getConnection(
                    jdbcConfig.getUrl(),
                    jdbcConfig.getUsername(),
                    jdbcConfig.getPassword()
            );
            statement = connection.createStatement();
            resultSet = statement.executeQuery(jdbcConfig.getSynonymLastModitimeSql());
            while (resultSet.next()) {
                Timestamp last_modify_dt = resultSet.getTimestamp("maxModitime");
                last_modify_long = last_modify_dt.getTime();
            }
        } catch (SQLException e) {
            logger.error("获取同义词库最后一次修改的时间",e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
                if (statement != null) {
                    statement.close();
                }
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }

        }
        return last_modify_long;
    }

    /**
     * 查询数据库中的同义词
     * @return DBData
     */
    public ArrayList getDBData() {
        ArrayList arrayList = new ArrayList<>();
        Connection connection = null;
        Statement statement = null;
        ResultSet resultSet = null;
        try {
            connection = DriverManager.getConnection(
                    jdbcConfig.getUrl(),
                    jdbcConfig.getUsername(),
                    jdbcConfig.getPassword()
            );
            statement = connection.createStatement();
            String sql = jdbcConfig.getSynonymWordSql();
            if (group != null && !"".equals(group.trim())) {
                sql = String.format("%s AND `key_group` = '%s'", sql, group);
            }
            resultSet = statement.executeQuery(sql);
            while (resultSet.next()) {
                String theWord = resultSet.getString("words");
                arrayList.add(theWord);
            }
        } catch (SQLException e) {
            logger.error("查询数据库中的同义词异常",e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
                if (statement != null) {
                    statement.close();
                }
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        return arrayList;
    }

    /**
     * 同义词库的加载
     * @return Reader
     */
    @Override
    public Reader getReader() {
        StringBuffer sb = new StringBuffer();
        try {
            ArrayList dbData = getDBData();
            for (int i = 0; i < dbData.size(); i++) {
                sb.append(dbData.get(i))
                        .append(System.getProperty("line.separator"));
            }
            logger.info("load the synonym from db");
        } catch (Exception e) {
            logger.error("reload synonym from db failed:", e);
        }
        return new StringReader(sb.toString());
    }
}

/**
 * 自己创建的配置类
 */

/**
 * @author liulingjie
 * @date 2022/11/30 16:03
 */
public class JdbcConfig {

    public JdbcConfig() {
    }

    public JdbcConfig(String driver, String url, String username, String password, String synonymWordSql, String synonymLastModitimeSql, Integer interval) {
        this.url = url;
        this.username = username;
        this.password = password;
        this.synonymWordSql = synonymWordSql;
        this.synonymLastModitimeSql = synonymLastModitimeSql;
        this.interval = interval;
        this.driver = driver;
    }

    /**
     * 驱动名
     */
    private String driver;

    /**
     * 数据库url
     */
    private String url;

    /**
     * 数据库账号
     */
    private String username;

    /**
     * 数据库密码
     */
    private String password;

    /**
     * 查询近义词汇的sql,注意是以words字段展示
     */
    private String synonymWordSql;

    /**
     * 获取近义词最近更新时间的sql
     */
    private String synonymLastModitimeSql;

    /**
     * 间隔,暂时无用
     */
    private Integer interval;
}

        然后在DynamicSynonymTokenFilterFactory类的getSynonymFile方法添加如下代码

ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词_第1张图片

         注意group字段是我自己加的,你们可以删除或者传空!!!

 五、打包

        最后点击 package 打包

ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词_第2张图片

        在~\target\releases可以看到压缩包

ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词_第3张图片

六、配置放入ES

        在es安装路径\plugins下创建dynamic-synonym文件夹,将上面的压缩包解压放入该文件夹

ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词_第4张图片

         最后重启es,可以看到以下内容

ES elasticsearch-analysis-dynamic-synonym连接数据库动态更新synonym近义词_第5张图片

七、尝试一下         

        然后,我们使用该过滤器类型。参考语句如下

POST info_goods/_close
PUT info_goods/_settings
{
  "analysis": {
    "filter": {
      "my_synonyms": {
        "type": "dynamic_synonym",
        "synonyms_path": "fromDB",
        "interval": 30    // 刷新间隔(秒)
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": [
          "lowercase",
          "my_synonyms"
        ]
      }
    }
  }
}
POST info_goods/_open

           浅浅试一下

# 解析“瓠瓜”
GET info_goods/_analyze
{
  "analyzer": "my_analyzer",
  "text": "瓠瓜"
}

# 结果
{
  "tokens" : [
    {
      "token" : "西葫芦",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "瓠",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "葫芦",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "瓜",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 2
    }
  ]
}

         

        有效果了!大功搞成!嘿嘿^_^

        知道你们懒,源码最终插件包已上传,你们看需下载吧^_^

你可能感兴趣的:(数据库,中间件,web应用,数据库,java,前端,elasticsearch)