English version of this article: https://examples.javacodegeeks.com/enterprise-java/apache-solr/solr-dataimporthandler-example/
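In this example of the DataImportHandler (DIH), we will discuss how to import and index data from a database. We will also go through the configuration needed to import data from a relational database. Many search applications store the content to be indexed in a structured data store, such as a relational database. Besides databases, DIH can also index content from RSS and Atom feeds, e-mail repositories, and structured XML.
To demonstrate the database import feature, we will use the sample hsql database that ships with the Solr server.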
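The preferred environment for this example is solr-5.0.0. Before you begin the Solr installation, make sure that a JDK is installed and that JAVA_HOME is set correctly.
To begin with, download the latest version of Apache Solr from the following location: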
http://lucene.apache.org/solr/downloads.html
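Apache Solr went through a number of changes from 4.x.x to 5.0.0, so if you have a different version of Solr, you need to download the 5.x.x version to follow this example.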
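Once the Solr zip file is downloaded, unzip it into a folder. The extracted folder contains the following:
The bin folder contains the scripts to start and stop the server.
The example folder contains a few sample files. We will use one of them to demonstrate how Solr indexes data.
The server folder contains the logs folder, where all the Solr logs are written. It helps to check the logs for errors during indexing.
The solr folder under server holds the different collections or cores. The configuration and data for each core or collection are stored in the respective core or collection folder.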
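Apache Solr comes with a built-in Jetty server. Before we start the Solr instance, we must verify that JAVA_HOME is set on the machine.
We can start the server with the command-line script. From the command prompt, go to the bin directory and issue the following command: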
solr start
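This starts the Solr server on the default port 8983.
We can now open the following URL in the browser and verify that our Solr instance is running. The details of the Solr admin tool are beyond the scope of this example.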
http://localhost:8983/solr/
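When the Solr server is started in standalone mode, the configuration is called a core; when it is started in SolrCloud mode, the configuration is called a collection. In this example we will work with a standalone server and a core. We will leave the SolrCloud discussion for another time.
First, we need to create a core for indexing the data. The solr create command accepts several options. In this example we will use the -c parameter for the core name and the -d parameter for the configuration directory. For all other parameters we use the default settings.
Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command: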
solr create -c jcg -d basic_configs
The command window shows the output of the create command.
Now we navigate to the following URL, where the jcg core appears in the core selector. You can also view the statistics of the core.
http://localhost:8983/solr
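When using the DataImportHandler, we need to deal with several pieces of configuration, spread across three files: solrconfig.xml, db-data-config.xml, and schema.xml. We will discuss all the configuration steps here.
To use the DataImportHandler and the hsql database, we first need to load their respective libraries. So let's configure solrconfig.xml under the folder server\solr\jcg\conf to load the necessary libraries. Add the following configuration to the file: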
solrconfig.xml
<luceneMatchVersion>5.0.0</luceneMatchVersion>
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
Now copy the hsqldb*.jar file from the path example\example-DIH\solr\db\lib to contrib\extraction\lib.
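To see which files a lib directive would pick up, the following Python sketch mimics the two steps involved: substituting the ${solr.install.dir:../../../..} placeholder (a property name followed by a default after the colon) and matching the regex against file names. It assumes the regex has to match the whole file name. The helpers resolve_placeholder and matching_jars are hypothetical and not part of Solr, and the directory listing is written by hand for illustration.

```python
import re

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve_placeholder(value, properties):
    """Replace ${name:default} with properties[name], else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is None:
            raise KeyError(f"no value or default for property {name!r}")
        return default
    return PLACEHOLDER.sub(substitute, value)


def matching_jars(file_names, regex):
    """Return the file names that the regex matches in full."""
    pattern = re.compile(regex)
    return [name for name in file_names if pattern.fullmatch(name)]


# Hand-written directory listing, for illustration only.
dist_files = [
    "solr-core-5.0.0.jar",
    "solr-dataimporthandler-5.0.0.jar",
    "solr-dataimporthandler-extras-5.0.0.jar",
]

# No property is set here, so the default after the colon is used.
lib_dir = resolve_placeholder("${solr.install.dir:../../../..}/dist/", {})
dih_jars = matching_jars(dist_files, r"solr-dataimporthandler-.*\.jar")
print(lib_dir)
print(dih_jars)
```

With no solr.install.dir property set, the directory falls back to the relative default, which is why the path is written relative to the core's folder.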
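The DataImportHandler must be registered in solrconfig.xml through a requestHandler tag. The only required parameter in the configuration is the config parameter, which specifies the location of the DIH configuration file. That file contains the specification of the data source, how to fetch the data, what data to fetch, and how to process it to generate the Solr documents to be posted to the index.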
solrconfig.xml
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="price" type="float" indexed="true" stored="true" />
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true" />
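Since we have modified the configuration files, we have to restart the Solr instance to load the new configuration. Let's issue the following commands.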
solr stop -all
solr start
Now we will launch the admin console and perform a full import of the data. Open the following URL and click the Execute button.
http://localhost:8983/solr/#/jcg/dataimport//dataimport
Refresh the page after a few seconds and we can see the indexing-complete status.
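The Execute button sends an HTTP request to the /dataimport handler registered in solrconfig.xml, so the same import can also be driven from a script. The following is a minimal sketch, assuming the core name jcg and the default port from this example. The helper dataimport_url is hypothetical, and clean and commit are general DIH request parameters rather than anything configured in this article.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # default port used in this example


def dataimport_url(core, command, **params):
    """Build a request URL for the /dataimport handler of a core."""
    query = {"command": command}
    query.update(params)
    return f"{SOLR_BASE}/{core}/dataimport?{urlencode(query)}"


full_import = dataimport_url("jcg", "full-import", clean="true", commit="true")
status = dataimport_url("jcg", "status")
print(full_import)
print(status)

# To send the request against a running Solr instance:
# from urllib.request import urlopen
# print(urlopen(full_import).read().decode("utf-8"))
```

Requesting the status URL after a few seconds plays the same role as refreshing the admin page.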
Now open the following URL and click the Execute Query button. We can see the indexed data.
http://localhost:8983/solr/#/jcg/query
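The query screen is a front end for the core's /select handler, so the indexed documents can also be read programmatically. The sketch below builds such a URL and pulls the document ids out of a JSON response. The helpers select_url and summarize are hypothetical, and the sample response body is written by hand in the usual Solr JSON layout; its values are invented and are not real output of this example.

```python
import json
from urllib.parse import urlencode


def select_url(core, query="*:*", rows=10):
    """Build a URL for the core's /select handler, asking for JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://localhost:8983/solr/{core}/select?{params}"


def summarize(response_text):
    """Return (numFound, list of ids) from a Solr JSON response body."""
    response = json.loads(response_text)["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]


# Hand-written response body; the document values are invented.
sample_body = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "item-1", "name": "sample item"}]}}'
)

print(select_url("jcg"))
print(summarize(sample_body))
```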
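Now we will see how to add another column to the index. We will fetch the features associated with each item. To do so, we will edit db-data-config.xml and add the nested feature entity shown in the following listing.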
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:${solr.install.dir}/example/example-DIH/hsqldb/ex" user="sa" />
  <document>
    <entity name="item" query="select id, NAME, price from item"
            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
      <field column="NAME" name="name" />
      <entity name="feature"
              query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'">
        <field name="features" column="DESCRIPTION" />
      </entity>
    </entity>
  </document>
</dataConfig>
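The nested entity means that the feature query runs once for every row returned by the item query, with ${item.ID} replaced by that row's ID, and each DESCRIPTION it returns becomes one value of the features field. The following Python sketch imitates that behaviour under loud assumptions: an in-memory SQLite database stands in for the sample hsql database, the rows are made up, and build_documents is a hypothetical helper rather than DIH code.

```python
import sqlite3

# In-memory SQLite stands in for the sample hsql database. The table layout
# follows the queries in db-data-config.xml, but the rows are made up.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE item (ID TEXT PRIMARY KEY, NAME TEXT, price REAL);
    CREATE TABLE feature (ITEM_ID TEXT, DESCRIPTION TEXT);
    INSERT INTO item VALUES ('1', 'first item', 9.5), ('2', 'second item', 20.0);
    INSERT INTO feature VALUES ('1', 'feature A'), ('1', 'feature B');
""")


def build_documents(connection):
    """Run the parent query, then the child query once per parent row."""
    documents = []
    parent_rows = connection.execute("SELECT ID, NAME, price FROM item ORDER BY ID")
    for item in parent_rows:
        # Parent entity "item": columns map to the id, name and price fields.
        doc = {"id": item["ID"], "name": item["NAME"], "price": item["price"]}
        # Child entity "feature": DIH substitutes ${item.ID} into the query
        # text; a bound parameter plays that role in this sketch.
        child_rows = connection.execute(
            "SELECT DESCRIPTION FROM feature WHERE ITEM_ID = ? ORDER BY DESCRIPTION",
            (item["ID"],),
        )
        features = [row["DESCRIPTION"] for row in child_rows]
        if features:
            doc["features"] = features  # one value per child row
        documents.append(doc)
    return documents


docs = build_documents(conn)
for doc in docs:
    print(doc)
```

Because one item can yield several feature rows, the features field ends up holding a list of values, which is why the schema has to treat it as multi-valued.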
Now modify schema.xml to configure the newly added field.
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="price" type="float" indexed="true" stored="true" />
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true" />
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true" />
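The features field is the only one declared with multiValued="true", since a single item may carry several feature descriptions. As a rough illustration, the sketch below parses the field declarations and flags document fields that are unknown or that hold several values without being multi-valued. It treats a missing multiValued attribute as false, which is an assumption of this sketch; multivalued_fields and check_document are hypothetical helpers, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

# The field declarations from schema.xml, wrapped in a root element so that
# they can be parsed on their own.
FIELDS_XML = """
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="name" type="text_general" indexed="true" stored="true" />
  <field name="price" type="float" indexed="true" stored="true" />
  <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true" />
  <field name="features" type="text_general" indexed="true" stored="true" multiValued="true" />
</fields>
"""


def multivalued_fields(xml_text):
    """Map each field name to whether it is declared multiValued="true"."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.get("multiValued") == "true" for f in root.iter("field")}


def check_document(doc, multivalued):
    """Return names of fields that are unknown, or hold a list but are single-valued."""
    problems = []
    for name, value in doc.items():
        if name not in multivalued:
            problems.append(name)
        elif isinstance(value, list) and not multivalued[name]:
            problems.append(name)
    return problems


mv = multivalued_fields(FIELDS_XML)
# Invented documents: the first fits the schema, the second does not.
good = check_document({"id": "1", "features": ["feature A", "feature B"]}, mv)
bad = check_document({"id": "1", "name": ["first", "second"]}, mv)
print(mv)
print(good, bad)
```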
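Since we have modified the configuration files, we have to restart the Solr instance to load the new configuration. Let's issue the following commands.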
solr stop -all
solr start
Now we run a full import again. Then open the following URL and click the Execute Query button. We can see the features added to each item.
http://localhost:8983/solr/#/jcg/query
This was an example of the DataImportHandler.