Full-text indexing with lucene, solr, nutch, hadoop: solr

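     The previous post gave a rough overview of lucene, but lucene itself is rarely used directly in real projects. What gets used most is solr. solr wraps lucene into a standalone service dedicated to indexing, tokenization, and search. The usual project layout follows from that: the project is split into many modules, search is provided by a dedicated solr service, and any module that needs search calls solr through solrj to fetch its results.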

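    solr also comes with near-real-time search and highlighting available by default, which makes it very easy to pick up in a project. Below is a rough description of the search service I built with solr at my previous company.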

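  Relationships: in the mall, users can open shops, and a user can add products on their own shop page.

  Requirements: 1. Search for shops by keyword (matching against the comma-separated search keywords set by the shop owner and against the shop description).

             2. Search for products by keyword (matching against the comma-separated search keywords set by the shop owner and against the product description).

             3. Search products on several attributes at once, for example products whose color is red, operating system is ANDROID, form factor is bar, and price is between 2000 and 3000.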

 

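  Implementation:

   solr provides the search and tokenization service, the solrj client library calls that service, and the mmseg Chinese dictionary handles Chinese word segmentation.

    solr setup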

Figure 1: solr installation

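   The installation consists of solr_home, solr_web, and tomcat:

      solr_home is the core of solr and provides the core functionality, such as tokenization and search.

      solr_web provides the admin and query pages, i.e. the external interface.

      tomcat serves it to the outside.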

 

   solr_home setup

Figure 2: solr_home setup

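In it:

lib holds mmseg4j-all-1.8.5.jar, which provides Chinese word segmentation.

dic holds the Chinese dictionary.

data holds the index data that solr produces.

conf holds solr's configuration files; the most important one is schema.xml.

Additions to schema.xml:

Inside types, add: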

 
       <fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100" >    
            <analyzer>    
               <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="./dic"/>    
               <filter class="solr.LowerCaseFilterFactory"/>    
           </analyzer>    
       </fieldType>    
  
     <fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100" >    
        <analyzer>    
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="./dic"/>    
            <filter class="solr.LowerCaseFilterFactory"/>    
        </analyzer>    
     </fieldType>    

     <fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100" >    
        <analyzer>    
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="./dic"/>    
            <filter class="solr.LowerCaseFilterFactory"/>    
        </analyzer>    
     </fieldType>    

 Inside fields, add:

	<field name="pkey" type="text_ws" indexed="true" stored="false" />
	<field name="pshopid" type="int" indexed="true" stored="true" />
	<field name="pname" type="textSimple" indexed="false" stored="true" />    
    <field name="purl" type="textComplex" indexed="false" stored="true"/>    
    <field name="pprice" type="float" indexed="true" stored="true"/> 
	<field name="queryStr" type="text_ws" indexed="true" stored="true"/>
	<field name="cid" type="int" indexed="true" stored="true"/>

	<field name="shopName" type="textSimple" indexed="true" stored="true"/>
	<field name="shopDetail" type="text_ws" indexed="true" stored="true"/>
	<field name="shopImage" type="textSimple" indexed="false" stored="true"/>
	<field name="shopKey" type="text_ws" indexed="true" stored="false"/>
	
	<field name="simplemmseg" type="textSimple" indexed="true" stored="true"/>    
    <field name="complexmmseg" type="textComplex" indexed="true" stored="true"/>    
    <field name="maxwordmmseg" type="textMaxWord" indexed="true" stored="true"/> 


Below the fields, add:

<copyField source="shopName" dest="shopKey"/>
<copyField source="shopDetail" dest="shopKey"/>

   <copyField source="cat" dest="text"/>
   <copyField source="name" dest="text"/>
   <copyField source="manu" dest="text"/>
   <copyField source="features" dest="text"/>
   <copyField source="includes" dest="text"/>
   <copyField source="manu" dest="manu_exact"/>

   <!-- Copy the price into a currency enabled field (default USD) -->
   <copyField source="price" dest="price_c"/>


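solr_home is now configured and can already provide indexing and search, but solr_web is still needed to expose an interface we can actually work with.

solr_web

 solr_web is the web application that ships with solr; deploying it under tomcat is enough. The changes needed are as follows.

Modify tomcat's server.xml as follows: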


Figure 3: encoding

Figure 4: path mapping

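solr_web is now configured as well. Start tomcat at this point and the solr admin page becomes available.

 Calling module

 solrj is used to call solr and fetch the results. The code is as follows: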

package net.b2c.a.solr;

import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;

public class SolrProduct implements Serializable{
	/*
	 * <field name="pkey" type="text_ws" indexed="true" stored="false"/>
	<field name="pshopid" type="int" indexed="true" stored="true"/>
	<field name="pname" type="textSimple" indexed="false" stored="true"/>    
    <field name="purl" type="textComplex" indexed="false" stored="true"/>    
    <field name="pprice" type="float" indexed="true" stored="true"/> 
    <field name="queryStr" type="text_ws" indexed="true" stored="true"/>
	<field name="cid" type="int" indexed="true" stored="true"/>
    */ 
	
	@Field
	private String id;
	@Field
	private int pshopid;
	@Field
	private String pname;
	@Field
	private String purl;
	@Field("pprice")
	private float price;
	@Field
	private String pkey;
	// category id
	@Field
	private int cid;
	// attribute tokens that the product can be filtered on
	@Field
	private String queryStr; // e.g. 颜色@红  操作系统@ANDROID 外形@直板
	
	
	public String getPkey() {
		return pkey;
	}
	public void setPkey(String pkey) {
		this.pkey = pkey;
	}
	public String getId() {
		return id;
	}
	public void setId(String id) {
		this.id = id;
	}
	public String getPname() {
		return pname;
	}
	public void setPname(String pname) {
		this.pname = pname;
	}
	public String getPurl() {
		return purl;
	}
	public void setPurl(String purl) {
		this.purl = purl;
	}

	public float getPrice() {
		return price;
	}
	public void setPrice(float price) {
		this.price = price;
	}
	public int getPshopid() {
		return pshopid;
	}
	public void setPshopid(int pshopid) {
		this.pshopid = pshopid;
	}
	
	
	public int getCid() {
		return cid;
	}
	public void setCid(int cid) {
		this.cid = cid;
	}
	public String getQueryStr() {
		return queryStr;
	}
	public void setQueryStr(String queryStr) {
		this.queryStr = queryStr;
	}
	@Override
	public String toString() {
		return "SolrProduct [id=" + id + ", pshopid=" + pshopid + ", pname="
				+ pname + ", purl=" + purl + ", price=" + price + ", pkey="
				+ pkey + ", cid=" + cid + ", queryStr=" + queryStr + "]";
	}
	
	
	
	
	

}


 

package net.b2c.a.solr;

import java.io.Serializable;

import org.apache.solr.client.solrj.beans.Field;

public class SolrShop implements Serializable{
	/*
	 * <field name="shopName" type="textSimple" indexed="true" stored="true"/>
	<field name="shopDetail" type="text_ws" indexed="true" stored="true"/>
	<field name="shopImage" type="textSimple" indexed="false" stored="true"/>
	<field name="shopKey" type="text_ws" indexed="true" stored="false"/>
	*/
	
	
	
	@Field
	private String id;
	@Field
	private String shopName;
	@Field
	private String shopDetail;
	@Field
	private String shopImage;
	@Field
	private String shopKey;
	
	public String getId() {
		return id;
	}
	public void setId(String id) {
		this.id = id;
	}
	public String getShopName() {
		return shopName;
	}
	public void setShopName(String shopName) {
		this.shopName = shopName;
	}
	public String getShopDetail() {
		return shopDetail;
	}
	public void setShopDetail(String shopDetail) {
		this.shopDetail = shopDetail;
	}
	public String getShopImage() {
		return shopImage;
	}
	public void setShopImage(String shopImage) {
		this.shopImage = shopImage;
	}
	public String getShopKey() {
		return shopKey;
	}
	public void setShopKey(String shopKey) {
		this.shopKey = shopKey;
	}
	@Override
	public String toString() {
		return "SolrShop [id=" + id + ", shopName=" + shopName
				+ ", shopDetail=" + shopDetail + ", shopImage=" + shopImage
				+ ", shopKey=" + shopKey + "]";
	}
	
}


 

package net.b2c.a.solr;

import java.util.List;

public class SolrResult {
	private List<SolrProduct> queryProducts;
	private long totalNum;
	private List<SolrShop> queryShops;
	public List<SolrProduct> getQueryProducts() {
		return queryProducts;
	}
	public void setQueryProducts(List<SolrProduct> queryProducts) {
		this.queryProducts = queryProducts;
	}
	public long getTotalNum() {
		return totalNum;
	}
	public void setTotalNum(long totalNum) {
		this.totalNum = totalNum;
	}
	public List<SolrShop> getQueryShops() {
		return queryShops;
	}
	public void setQueryShops(List<SolrShop> queryShops) {
		this.queryShops = queryShops;
	}
	@Override
	public String toString() {
		return "SolrResult [queryProducts=" + queryProducts + ", totalNum="
				+ totalNum + ", queryShops=" + queryShops + "]";
	}
	
	
	
}


 

package net.b2c.a.solr;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;


import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class SolrUtil {
	private  static String URL = "";
	private final static String PKEY = "pkey";
	private final static String PNAME = "pname";
	private final static String SHOPKEY = "shopKey";
	private final static String SHOPIDPRE = "shop";
	
	private static SolrServer server = null;
	static {
		try {
			String path = SolrUtil.class.getClassLoader().getResource("").toURI().getPath();
			Properties property = new Properties();	
			property.load(new FileInputStream(path+"../resource.properties"));	
			URL = property.getProperty("solrUrl");
			System.out.println("**************URL************************"+URL);
			server = new CommonsHttpSolrServer(URL);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	
	/**
	 * Adds or updates a product in the index.
	 * @param product the product to index
	 * @return true if the product was added and committed
	 */
	public static boolean addOrUpdateProduct(SolrProduct product) {
		if(product == null)return false;
		try {		
			server.addBean(product);
			server.commit();
			return true;
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (SolrServerException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		return false;
	}
	
	/**
	 * Adds or updates a shop in the index.
	 * @param shop the shop to index
	 * @return true if the shop was added and committed
	 */
	public static boolean addOrUpdateShop(SolrShop shop) {
		if(shop == null)return false;
		try {
			shop.setId(SHOPIDPRE+shop.getId());
			server.addBean(shop);
			server.commit();
			return true;
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (SolrServerException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		return false;
	}
	
	/**
	 * Deletes a product by its product id.
	 * @param id product id
	 */
	public static void deleteProductById(long id)
	{
		
		try {
			server.deleteById(id+"");
			server.commit();			
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (SolrServerException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
	/**
	 * Deletes a shop by its id.
	 * @param id shop id
	 */
	public static void deleteShopById(long id)
	{
		
		try {
			server.deleteById(SHOPIDPRE+id);
			server.commit();			
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (SolrServerException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
	
	/**
	 * Site-wide product search.
	 * @param queryName query string
	 * @param start first row to return, 1-based
	 * @param limit page size
	 * @return SolrResult; totalNum is the total number of hits, queryProducts holds this page of results
	 */
	public static SolrResult queryProduct(String queryName,int start,int limit) {

		SolrResult result = new SolrResult();
		List<SolrProduct> products = new LinkedList<SolrProduct>();
		result.setQueryProducts(products);
		if(queryName==null ||queryName.trim().equals(""))return result;
		

		try {
			SolrQuery query = new SolrQuery(PKEY + ":" + queryName);
			query.setHighlight(true)
					.setHighlightSimplePre("<span class='solrhighligter'>")
					.setHighlightSimplePost("</span>").setStart(0).setRows(5);
			query.setParam("hl.fl", PKEY+","+PNAME);
			query.setParam("start",(start-1)+"");
			query.setParam("rows", limit+"");
			QueryResponse resp = server.query(query);
			SolrDocumentList sdl = resp.getResults();
			
			result.setTotalNum(sdl.getNumFound());
			for (SolrDocument sd : sdl) {

				String id = (String) sd.getFieldValue("id");

				SolrProduct p = new SolrProduct();
				p.setId(id);
				// Highlight snippets come back as a list; use the first one if present.
				List<String> tname=resp.getHighlighting().get(id).get(PNAME);
				if(tname == null || tname.isEmpty())
				{
					p.setPname(sd.getFieldValue(PNAME).toString());
				}else {
					p.setPname(tname.get(0));
				}
			
				p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
				p.setPurl(sd.getFieldValue("purl").toString());
				products.add(p);
			}
		} catch (SolrServerException e) {
			e.printStackTrace();
		}

		System.out.println(result);
		
		
		return result;
	}
	
	
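	/**
	 * Attribute search within a product category.
	 * @param queryStr query string, e.g. "queryStr:操作系统@ANDROID AND queryStr:颜色@红" or "queryStr:操作系统@ANDROID"; empty means no attribute restriction
	 * @param start first row to return, 1-based
	 * @param limit page size
	 * @param cid product category id, required
	 * @param priceRange price range, e.g. [100 TO 200]; empty means no price restriction
	 * @return SolrResult; totalNum is the total number of hits, queryProducts holds this page of results
	 */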
	public static SolrResult queryCategoryProduct(String queryStr,int start,int limit,Integer cid,String priceRange) {

		SolrResult result = new SolrResult();
		List<SolrProduct> products = new LinkedList<SolrProduct>();
		result.setQueryProducts(products);
		
		StringBuffer sb= new StringBuffer();
		if(queryStr != null && !queryStr.trim().equals(""))
		{
			sb.append(queryStr+" ");
		}
		
		if(priceRange != null && !priceRange.trim().equals(""))
		{
			if(!sb.toString().trim().equals(""))
			{
				sb.append("AND ");
			}
			sb.append("pprice:"+priceRange+" ");
		}
		
		if(null != cid)
		{
			if(!sb.toString().trim().equals(""))
			{
				sb.append("AND ");
			}
			sb.append("cid:"+cid);
		}
		
		String q=sb.toString().trim();
		
		System.out.println(q);
		if("".equals(q.trim()))return result;
		
		try {
			SolrQuery query = new SolrQuery(q);//PKEY + ":" + queryStr
			query.setHighlight(true)
					.setHighlightSimplePre("<span class='solrhighligter'>")
					.setHighlightSimplePost("</span>").setStart(0).setRows(5);
			query.setParam("hl.fl", PKEY+","+PNAME);
			query.setParam("start",(start-1)+"");
			query.setParam("rows", limit+"");
			QueryResponse resp = server.query(query);
			SolrDocumentList sdl = resp.getResults();
			
			result.setTotalNum(sdl.getNumFound());
			for (SolrDocument sd : sdl) {

				String id = (String) sd.getFieldValue("id");

				SolrProduct p = new SolrProduct();
				p.setId(id);
				// Highlight snippets come back as a list; use the first one if present.
				List<String> tname=resp.getHighlighting().get(id).get(PNAME);
				if(tname == null || tname.isEmpty())
				{
					p.setPname(sd.getFieldValue(PNAME).toString());
				}else {
					p.setPname(tname.get(0));
				}
				p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
				p.setPurl(sd.getFieldValue("purl").toString());
				products.add(p);
			}
		} catch (SolrServerException e) {
			e.printStackTrace();
		}

		System.out.println(result);
		
		
		return result;
	}
	
	
	
	
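	/**
	 * Product search within a single shop.
	 * @param queryName query string
	 * @param start first row to return, 1-based
	 * @param limit page size
	 * @param shopid shop id
	 * @return SolrResult; totalNum is the total number of hits, queryProducts holds this page of results
	 */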
	public static SolrResult queryProduct(String queryName,int start,int limit,int shopid) {

		SolrResult result = new SolrResult();
		List<SolrProduct> products = new LinkedList<SolrProduct>();
		result.setQueryProducts(products);
		if(queryName==null ||queryName.trim().equals(""))return result;

		try {
			SolrQuery query = new SolrQuery(PKEY + ":" + queryName);
			query.setHighlight(true)
					.setHighlightSimplePre("<span class='solrhighligter'>")
					.setHighlightSimplePost("</span>").setStart(start-1).setRows(limit);
			query.setParam("hl.fl", PKEY+","+PNAME);
			query.setParam("start",(start-1)+"");
			query.setParam("rows", limit+"");
			query.addFilterQuery("pshopid:"+shopid);
			QueryResponse resp = server.query(query);
			SolrDocumentList sdl = resp.getResults();
			
			result.setTotalNum(sdl.getNumFound());
			for (SolrDocument sd : sdl) {

				String id = (String) sd.getFieldValue("id");

				SolrProduct p = new SolrProduct();
				p.setId(id);
				// Highlight snippets come back as a list; use the first one if present.
				List<String> tname=resp.getHighlighting().get(id).get(PNAME);
				if(tname == null || tname.isEmpty())
				{
					p.setPname(sd.getFieldValue(PNAME).toString());
				}else {
					p.setPname(tname.get(0));
				}
			
				p.setPrice(Float.valueOf(sd.getFieldValue("pprice").toString()));
				p.setPurl(sd.getFieldValue("purl").toString());
				p.setPshopid(shopid);
				products.add(p);
			}
		} catch (SolrServerException e) {
			e.printStackTrace();
		}

		System.out.println(result);
		
		
		return result;
	}

	
	/**
	 * Site-wide shop search.
	 * @param queryName query string
	 * @param start first row to return, 1-based
	 * @param limit page size
	 * @return SolrResult; totalNum is the total number of hits, queryShops holds this page of results
	 */
	public static SolrResult queryShop(String queryName,int start,int limit) {

		SolrResult result = new SolrResult();
		List<SolrShop> shops = new LinkedList<SolrShop>();
		result.setQueryShops(shops);
		if(queryName==null ||queryName.trim().equals(""))return result;

		try {
			SolrQuery query = new SolrQuery(SHOPKEY + ":" + queryName);
			query.setHighlight(true)
					.setHighlightSimplePre("<span class='solrhighligter'>")
					.setHighlightSimplePost("</span>").setStart(start-1).setRows(limit);
			query.setParam("hl.fl", "shopName,shopDetail");
			query.setParam("start",(start-1)+"");
			query.setParam("rows", limit+"");
			QueryResponse resp = server.query(query);
			SolrDocumentList sdl = resp.getResults();
			
			result.setTotalNum(sdl.getNumFound());
			for (SolrDocument sd : sdl) {

				String id = (String) sd.getFieldValue("id");

				SolrShop s= new SolrShop();
				try {
					s.setId(id.substring(4));
				} catch (Exception e) {
					// TODO Auto-generated catch block
				//	e.printStackTrace();
					s.setId(id);
				}
				// Highlight snippets come back as a list; use the first one if present.
				List<String> tname=resp.getHighlighting().get(id).get("shopName");
				if(tname == null || tname.isEmpty())
				{
					s.setShopName(sd.getFieldValue("shopName").toString());
				}else {
					s.setShopName(tname.get(0));
				}
				
				tname=resp.getHighlighting().get(id).get("shopDetail");
				if(tname == null || tname.isEmpty())
				{
					s.setShopDetail(sd.getFieldValue("shopDetail").toString());
				}else {
					s.setShopDetail(tname.get(0));
				}
			
				s.setShopImage(sd.getFieldValue("shopImage").toString());
				
				shops.add(s);
			}
		} catch (SolrServerException e) {
			e.printStackTrace();
		}

		System.out.println(result);
		
		
		return result;
	}
	
	public static void testAdd() {

		SolrProduct product = new SolrProduct();
		String[] qq={"操作系统@ANDROID 颜色@红  外形@直板","操作系统@ANDROID 颜色@红","颜色@红"};
		
		for(int i=1;i<=3;i++)
		{
			product.setId(i+"");
			product.setPname("myname 垃圾 你就是个KK");
			product.setPrice(i);
			product.setPurl("../im/kk.jpg");
			String key="aa bb 垃圾 AA Ab myname     wei";
			product.setPkey(key);
			product.setPshopid(i%3);
			product.setCid(i%2);
			product.setQueryStr(qq[i-1]);
			System.out.println(product);
			addOrUpdateProduct(product);
		}
		

	}
	
	public static void testAddShop() {

		
		
		
		for(int i=3;i<=3;i++)
		{
			SolrShop shop= new SolrShop();
			shop.setId(""+"1");
			shop.setShopDetail("欢迎来到星空的专栏, zwls  aa");
			shop.setShopImage("../image.gif");
			shop.setShopName("星空专栏");
			shop.setShopKey("aa 星空  zwls");
			addOrUpdateShop(shop);
		}
		

	}
	
	public static void main(String[] args) {
		String string="操作系统@ANDROID 颜色@红";
		String dd="queryStr:操作系统@ANDROID AND queryStr:颜色@红";
		
			queryCategoryProduct(dd,1,100,1,"[0 TO 100]");//[10 TO 100]
			//					 testAdd();
			
		//	testAddShop();
		//	deleteProductById(7);
			//		testAdd();
			//	queryShop("zwls",1,2);
		//	String d=SHOPIDPRE+"100";
	//		System.out.println(d.substring(4));
		
		
		}

}

SolrUtil is a wrapper around the solr calls; the code above should be self-explanatory.
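One detail in SolrUtil that is easy to miss: products and shops live in the same index, and shop documents are kept apart by prefixing their id with "shop" (the SHOPIDPRE constant). The prefix is stripped again with substring(4) when results are read. A minimal sketch of that convention, assuming a hypothetical helper class ShopIds that is not part of the original code:

```java
public class ShopIds {
    // Same value as SHOPIDPRE in SolrUtil.
    private static final String SHOP_PREFIX = "shop";

    /** Converts a database shop id into the id stored in the shared index. */
    public static String toIndexId(long shopId) {
        return SHOP_PREFIX + shopId;
    }

    /** Returns true when an index id belongs to a shop rather than a product. */
    public static boolean isShopId(String indexId) {
        return indexId != null && indexId.startsWith(SHOP_PREFIX);
    }

    /** Strips the prefix again; ids without the prefix are returned unchanged. */
    public static String toShopId(String indexId) {
        return isShopId(indexId) ? indexId.substring(SHOP_PREFIX.length()) : indexId;
    }

    public static void main(String[] args) {
        System.out.println(toIndexId(100));       // shop100
        System.out.println(toShopId("shop100"));  // 100
        System.out.println(toShopId("7"));        // 7
    }
}
```

Checking the prefix with startsWith avoids relying on a caught exception for ids that are shorter than four characters.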

With this in place, the three requirements above are all covered.
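For requirement 3, the caller has to build the queryStr and priceRange arguments of queryCategoryProduct. The following is only a sketch of how that might be done; the class CategoryQuery and its methods are hypothetical names, and it assumes attribute names and values contain no whitespace or query-syntax characters (real input would need escaping first).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CategoryQuery {
    /**
     * Joins attribute filters into "queryStr:name@value AND ..." form,
     * matching the name@value tokens stored in the queryStr field.
     * Assumption: names and values need no escaping.
     */
    public static String attributes(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append("queryStr:").append(e.getKey()).append('@').append(e.getValue());
        }
        return sb.toString();
    }

    /** Formats an inclusive price range such as [2000 TO 3000]. */
    public static String priceRange(int min, int max) {
        return "[" + min + " TO " + max + "]";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        attrs.put("颜色", "红");
        attrs.put("操作系统", "ANDROID");
        attrs.put("外形", "直板");
        System.out.println(attributes(attrs));
        System.out.println(priceRange(2000, 3000));
    }
}
```

The two strings would then be passed along the lines of SolrUtil.queryCategoryProduct(CategoryQuery.attributes(attrs), 1, 20, cid, CategoryQuery.priceRange(2000, 3000)), where cid is the category being browsed.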

 

 

 

 

 

 

 
