[.NET Core Basics] Tokenized search in .NET Core with Lucene.Net and jieba.NET

The business requirement is fuzzy search over product titles.

For example, if a user enters 【我想查询下雅思托福考试】 ("I'd like to look up IELTS and TOEFL exams"), we first need to segment the sentence into the words 【查询】(look up), 【雅思】(IELTS), 【托福】(TOEFL), and 【考试】(exam), and then search for products whose titles contain those words.

The approach is as follows.

First, all product records in the database are automatically synced into a Lucene index directory on disk, which serves as the search cache. The result looks like this:


The scheduled sync uses the Hangfire recurring jobs covered in an earlier post:

https://www.cnblogs.com/jhli/p/10027074.html

With the cache refreshed on a schedule, we can then run tokenized searches against it. The index-update code is as follows:
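For reference, wiring the job into Hangfire might look like the sketch below. This is a minimal sketch, not code from the original post: the `MerchIndexJob` class name, the `"update-merch-index"` job id, and the hourly schedule are all assumptions to adjust to your setup.

```csharp
// Register UpdateMerchIndex as a Hangfire recurring job.
// MerchIndexJob, the job id, and the hourly schedule are illustrative.
RecurringJob.AddOrUpdate<MerchIndexJob>(
    "update-merch-index",
    job => job.UpdateMerchIndex(),
    Cron.Hourly());
```

Hangfire serializes the expression and invokes the job on schedule, which is why `UpdateMerchIndex` must be resolvable on the server, a detail that matters for the deployment problem described at the end of this post.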

        public void UpdateMerchIndex()
        {
            try
            {
                Console.WriteLine($"[{DateTime.Now}] UpdateMerchIndex job begin...");

                var indexDir = Path.Combine(System.IO.Directory.GetCurrentDirectory(), "temp", "lucene", "merchs");
                if (System.IO.Directory.Exists(indexDir) == false)
                {
                    System.IO.Directory.CreateDirectory(indexDir);
                }

                var VERSION = Lucene.Net.Util.LuceneVersion.LUCENE_48;
                var director = FSDirectory.Open(new DirectoryInfo(indexDir));
                var analyzer = new JieBaAnalyzer(TokenizerMode.Search);
                var indexWriterConfig = new IndexWriterConfig(VERSION, analyzer);

                using (var indexWriter = new IndexWriter(director, indexWriterConfig))
                {
                    if (File.Exists(Path.Combine(indexDir, "segments.gen")) == true)
                    {
                        indexWriter.DeleteAll();
                    }

                    var query = _merchService.Where(t => t.IsDel == false);

                    var index = 1;
                    var size = 200;

                    var count = query.Count();

                    if (count > 0)
                    {
                        while (true)
                        {
                            var rs = query.OrderBy(t => t.CreateTime)
                            .Skip((index - 1) * size)
                            .Take(size).ToList();

                            if (rs.Count == 0)
                            {
                                break;
                            }

                            var addDocs = new List<Document>();

                            foreach (var item in rs)
                            {
                                var merchid = item.IdentityId.ToLowerString();

                                var doc = new Document();
                                var field1 = new StringField("merchid", merchid, Field.Store.YES);
                                var field2 = new TextField("name", item.Name?.ToLower(), Field.Store.YES);
                                doc.Add(field1);
                                doc.Add(field2);
                                addDocs.Add(doc);// add the document to the batch

                            }
                            
                            if (addDocs.Count > 0)
                            {
                                indexWriter.AddDocuments(addDocs);
                            }

                            index = index + 1;
                        }

                    }

                }

                Console.WriteLine($"[{DateTime.Now}] UpdateMerchIndex job end!");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"UpdateMerchIndex ex={ex}");
            }
        }

 

 

All that remains is to query the index, collect the matched ids, and then fetch the corresponding records from the database by id.

The search code:

        protected List<Guid> SearchMerchs(string key)
        {
            var rs = new List<Guid>();

            if (string.IsNullOrEmpty(key))
            {
                return rs;
            }
            key = key.Trim().ToLower();

            try
            {
                var indexDir = Path.Combine(System.IO.Directory.GetCurrentDirectory(), "temp", "lucene", "merchs");

                if (System.IO.Directory.Exists(indexDir) == true)
                {
                    // One reader/searcher pair is enough; the original code mixed
                    // the Lucene.Net 3.x API (IndexReader.Open) with the 4.8 one.
                    using (var directory = FSDirectory.Open(new DirectoryInfo(indexDir), NoLockFactory.GetNoLockFactory()))
                    using (var reader = DirectoryReader.Open(directory))
                    {
                        var searcher = new IndexSearcher(reader);

                        // OR the segmented keywords together: a document matches
                        // if its name contains any one of the words.
                        var booleanQuery = new BooleanQuery();
                        foreach (var word in CutKeyWord(key))
                        {
                            booleanQuery.Add(new TermQuery(new Term("name", word)), Occur.SHOULD);
                        }

                        var collector = TopScoreDocCollector.Create(1000, true);
                        searcher.Search(booleanQuery, null, collector);
                        var docs = collector.GetTopDocs(0, collector.TotalHits).ScoreDocs;

                        foreach (var d in docs)
                        {
                            var document = searcher.Doc(d.Doc);// fetch the stored document

                            var merchid = document.Get("merchid");

                            if (Guid.TryParse(merchid, out Guid mid) == true)
                            {
                                rs.Add(mid);
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"SearchMerchs ex={ex}");
            }

            return rs;
        }
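A caller can then feed the returned ids back into the database query. A minimal sketch, assuming the same `_merchService` queryable that the indexing job uses:

```csharp
// Hypothetical caller: search the Lucene index first, then load the
// matching non-deleted products from the database by id.
var ids = SearchMerchs("我想查询下雅思托福考试");

var merchs = _merchService
    .Where(t => t.IsDel == false && ids.Contains(t.IdentityId))
    .ToList();
```

Because only the `merchid` and `name` fields are stored in the index, the full product row always comes from the database; the index is just a fast id lookup.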

 

The user's input is segmented with jieba.NET's JiebaSegmenter:

        protected List<string> CutKeyWord(string key)
        {
            var rs = new List<string>();
            var segmenter = new JiebaSegmenter();
            var list = segmenter.Cut(key);
            if (list != null && list.Count() > 0)
            {
                foreach (var item in list)
                {
                    if (string.IsNullOrEmpty(item) || item.Length <= 1)
                    {
                        continue;
                    }

                    rs.Add(item);
                }
            }
            return rs;
        }
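For the example sentence from the beginning of the post, the method is expected to drop the single-character tokens and keep the multi-character words (the exact token list depends on jieba.NET's dictionary):

```csharp
// Segment the sample query; single-character tokens such as 我 and 下
// are filtered out by the item.Length <= 1 check in CutKeyWord.
var words = CutKeyWord("我想查询下雅思托福考试");
// expected to contain words like 查询, 雅思, 托福, 考试
```

Filtering out one-character tokens keeps noise words like pronouns and particles out of the BooleanQuery.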

 

NuGet packages to add, with versions:

Hangfire 1.7.0-beta1

Lucene.Net 4.8.0-beta00005

Lucene.Net.Analysis.Common 4.8.0-beta00005

Lucene.Net.QueryParser 4.8.0-beta00005

 

DLL that needs to be referenced separately:

JiebaNet.Segmenter.dll 

Download link:

https://pan.baidu.com/s/1D7mQnow0FmoqedNYzugfKw

 

Local debugging worked fine, but after publishing to the server, the automatic job ran into this problem:

https://stackoverflow.com/questions/47746582/hangfire-job-throws-system-typeloadexception

 
System.TypeLoadException

Could not load type '***' from assembly '***, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.

 

This error message is not the real cause; printing the full exception makes that clear.

The actual cause was that the dictionary file dict.txt under the Resources folder had not been published to the server.
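One way to make sure the dictionary ships with the app is to copy the Resources folder to the output on build and publish. A sketch of the csproj entry; the `Resources\**` path follows jieba.NET's default layout, so adjust it to wherever your dict.txt actually lives:

```xml
<!-- Copy jieba.NET's dictionary files (dict.txt etc.) to the output. -->
<ItemGroup>
  <None Include="Resources\**" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
```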

 

This pitfall cost me half a day...

 

Reposted from: https://www.cnblogs.com/jhli/p/10027396.html
