Download the JDBC Driver for MySQL from http://mysql.spd.co.il/Downloads/Connector-J/mysql-connector-java-3.1.14.zip
Put "mysql-connector-java-3.1.14-bin.jar" in Solr Dir/example/lib.
Solr can be configured to connect to a MySQL database using the DataImportHandler (DIH). To do so, first add a new requestHandler, handled by DIH, to solrconfig.xml (this file is in Solr Dir/example/conf).
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
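As a quick sanity check, the registration above can be read back with a few lines of Python. This is only a sketch: the function name dih_handlers and the trimmed-down SAMPLE_SOLRCONFIG are made up for illustration, and a real solrconfig.xml contains much more than this one handler.

```python
import xml.etree.ElementTree as ET

DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"

# Hypothetical, trimmed-down solrconfig.xml used only for this illustration.
SAMPLE_SOLRCONFIG = """
<config>
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
</config>
"""


def dih_handlers(solrconfig_xml):
    """Return {handler path: config file name} for every DIH requestHandler."""
    root = ET.fromstring(solrconfig_xml)
    found = {}
    for handler in root.iter("requestHandler"):
        if handler.get("class") != DIH_CLASS:
            continue
        # The config file name lives under <lst name="defaults">.
        config = handler.find("./lst[@name='defaults']/str[@name='config']")
        has_text = config is not None and config.text
        found[handler.get("name")] = config.text.strip() if has_text else None
    return found


if __name__ == "__main__":
    print(dih_handlers(SAMPLE_SOLRCONFIG))
```

The file name it reports is the one DIH will look for in the conf directory, so it should match the data-config.xml created in the next step.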
Create a new file called data-config.xml in the same directory and configure the database connection and table schema to reflect your database structure.
Sample data-config.xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://SERVER/DATABASE"
              user="USERNAME"
              password="PASSWORD" />
  <document name="content">
    <entity name="node" query="select node.nid AS nid,node_revisions.body AS body,node_revisions.title AS title from node,node_revisions where node.status = 1 and node.nid = node_revisions.nid and node.vid = node_revisions.vid">
      <field column="nid" name="id" />
      <field column="body" name="body" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>
The dataSource attributes, the query in the entity tag, and the field mappings must be modified to match your database structure. The query given in the example is a simple join of the Drupal node and node_revisions tables.
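When adapting the sample, it can help to list which query column feeds which Solr field. The sketch below does that for a data-config.xml string; the function name field_mappings is hypothetical, and the embedded sample uses a shortened query purely for illustration. The target field names (id, body, title here) presumably also need to exist in your Solr schema.

```python
import xml.etree.ElementTree as ET

# Illustrative copy of the sample config; the query is shortened on purpose.
SAMPLE_DATA_CONFIG = """
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://SERVER/DATABASE"
              user="USERNAME"
              password="PASSWORD" />
  <document name="content">
    <entity name="node" query="select nid, body, title from example_table">
      <field column="nid" name="id" />
      <field column="body" name="body" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>
"""


def field_mappings(data_config_xml):
    """Return {entity name: [(query column, Solr field), ...]}."""
    root = ET.fromstring(data_config_xml)
    mappings = {}
    for entity in root.iter("entity"):
        mappings[entity.get("name")] = [
            (field.get("column"), field.get("name"))
            for field in entity.findall("field")
        ]
    return mappings


if __name__ == "__main__":
    for entity, fields in field_mappings(SAMPLE_DATA_CONFIG).items():
        for column, solr_field in fields:
            print(f"{entity}: column {column} -> Solr field {solr_field}")
```

Each column named here has to be returned by the entity's query (under that name or alias), otherwise the corresponding Solr field stays empty.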
Now start (or restart) Apache Solr using java -jar start.jar.
Request the full-import URL (http://SERVER:PORT/solr/dataimport?command=full-import) and your website content will start being indexed. The handler reports the import statistics in a response like this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse" />
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">1056</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2010-02-22 14:46:35</str>
    <str name="">
      Indexing completed. Added/Updated: 1056 documents. Deleted 0 documents.
    </str>
    <str name="Committed">2010-02-22 14:46:42</str>
    <str name="Optimized">2010-02-22 14:46:42</str>
    <str name="Total Documents Processed">1056</str>
    <str name="Time taken ">0:0:6.562</str>
  </lst>
  <str name="WARNING">
    This response format is experimental. It is likely to change in the future.
  </str>
</response>
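If you would rather check the import from a script than from the browser, the response can be fetched and parsed with the Python standard library. This is a minimal sketch under a few assumptions: the helper names (dataimport_url, parse_status, fetch_status) are made up for this example, and it relies on DIH's status command (command=status) and on the experimental response layout shown above, which may change.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Shortened copy of the response above, used only to exercise the parser.
SAMPLE_RESPONSE = """
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Rows Fetched">1056</str>
    <str name="">
      Indexing completed. Added/Updated: 1056 documents. Deleted 0 documents.
    </str>
    <str name="Time taken ">0:0:6.562</str>
  </lst>
</response>
"""


def dataimport_url(server, port, command):
    """Build the DIH URL for a command such as full-import or status."""
    query = urllib.parse.urlencode({"command": command})
    return f"http://{server}:{port}/solr/dataimport?{query}"


def parse_status(response_xml):
    """Extract the handler status and the statusMessages entries."""
    root = ET.fromstring(response_xml)
    result = {"status": None, "messages": {}}
    # Only direct children are inspected, so the numeric "status" inside
    # responseHeader is not confused with the handler status.
    for child in root:
        name = child.get("name")
        if child.tag == "str" and name == "status":
            result["status"] = (child.text or "").strip()
        elif child.tag == "lst" and name == "statusMessages":
            for item in child.findall("str"):
                key = (item.get("name") or "").strip()
                result["messages"][key] = (item.text or "").strip()
    return result


def fetch_status(server, port):
    """Ask a running Solr instance for the current import status."""
    with urllib.request.urlopen(dataimport_url(server, port, "status")) as reply:
        return parse_status(reply.read())


if __name__ == "__main__":
    parsed = parse_status(SAMPLE_RESPONSE)
    print(parsed["status"])
    print(parsed["messages"]["Total Rows Fetched"])
```

Calling fetch_status("localhost", 8983) would query a Solr instance on the local machine; the host and port are assumptions and should be replaced with your own SERVER and PORT.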