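Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library available today, open source or proprietary.
But Lucene is just a library. To take full advantage of it you have to work in Java and integrate Lucene directly into your application. Worse, you may need a degree in information retrieval to understand how it works. Lucene is very complex.
Elasticsearch is also written in Java and uses Lucene internally for indexing and searching, but its goal is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.
However, Elasticsearch is more than Lucene, and more than just a full-text search engine. It can also be accurately described as:
A distributed, real-time document store in which every field can be indexed and searched
A distributed search engine with real-time analytics
Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data
Official clients are available for Java, .NET, PHP, Python, Ruby, Node.js, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
For documentation in Chinese, see Elasticsearch: The Definitive Guide[1]
For documentation in English, see Elasticsearch Reference[2]
Download: https://www.elastic.co/cn/downloads/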
API Conventions[3]
Document APIs[4]
Search APIs[5]
Indices APIs[6]
cat APIs[7]
Cluster APIs[8]
Javascript api[9]
Logstash Reference[10]
Configuring Logstash[11]
Input plugins[12]
Output plugins[13]
Filter plugins[14]
Ctrl+i: auto-indent
Ctrl+Enter: submit the request
Down: open the autocomplete menu
Enter or Tab: accept the selected completion
Esc: close the autocomplete menu
pretty = true
Adding the pretty parameter to any query string makes Elasticsearch pretty-print the JSON response so that it is easier to read.
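// Show disk usage and shard allocation per node
GET _cat/allocation?v
// List all indices
GET _cat/indices
// Sort indices by document count, descending
GET _cat/indices?s=docs.count:desc
GET _cat/indices?v&s=index
// List the nodes in the cluster
GET _cat/nodes
// Cluster health
GET _cluster/health?pretty=true
GET _cat/indices/*?v&s=index
// Get the shard information for a given index
GET logs/_search_shards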
...
curl -s -XGET 'http://:9200/_cluster/health?pretty'
// Response returned when the cluster is healthy
{
"cluster_name" : "es-qwerty",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 1,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
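A response like the one above can also be checked from a script. The following is a minimal sketch in Python, assuming the JSON has already been fetched from _cluster/health. The summarize_health helper is hypothetical, and SAMPLE is an abbreviated copy of the response shown above.

```python
import json

# Abbreviated copy of the sample _cluster/health response shown above.
SAMPLE = """
{
  "cluster_name": "es-qwerty",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 1,
  "active_shards": 2,
  "unassigned_shards": 0,
  "active_shards_percent_as_number": 100.0
}
"""


def summarize_health(raw):
    """Return (is_healthy, message) for a _cluster/health JSON string."""
    health = json.loads(raw)
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    # green: every shard is assigned; yellow: some replicas are unassigned;
    # red: at least one primary shard is unassigned.
    is_healthy = status == "green" and not health.get("timed_out", False)
    message = "{}: status={}, nodes={}, unassigned_shards={}".format(
        health["cluster_name"], status, health["number_of_nodes"], unassigned
    )
    return is_healthy, message


if __name__ == "__main__":
    print(summarize_health(SAMPLE))
```

Treating only green as healthy is a choice made for this sketch; a single-node development cluster commonly reports yellow because its replicas cannot be assigned.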
POST logs/_search
{
"query":{
"range":{
"createdAt":{
"gt":"2020-04-25",
"lt":"2020-04-27",
"format": "yyyy-MM-dd"
}
}
},
"size":0,
"aggs":{
"url_type_stats":{
"terms": {
"field": "urlType.keyword",
"size": 2
}
}
}
}
POST logs/_search
{
"query":{
"range":{
"createdAt":{
"gte":"2020-04-26 00:00:00",
"lte":"now",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
"size":0,
"aggs":{
"url_type_stats":{
"terms": {
"field": "urlType.keyword",
"size": 2
}
}
}
}
POST logs/_search
{
"query":{
"range": {
"createdAt": {
"gte": "2020-04-26 00:00:00",
"lte": "now",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
"size" : 0,
"aggs":{
"total_clientIp":{
"cardinality":{
"field": "clientIp.keyword"
}
},
"total_userAgent":{
"cardinality": {
"field": "userAgent.keyword"
}
}
}
}
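When requests like the ones above are issued from application code, the body is usually assembled programmatically. The sketch below (Python, with a hypothetical build_stats_query helper) builds the same range filter and cardinality aggregations. It formats the lower bound with the 24-hour pattern, which corresponds to "yyyy-MM-dd HH:mm:ss" on the Elasticsearch side. Field names are taken from the requests above; nothing is sent over the network.

```python
from datetime import datetime

# Python strftime pattern matching the Elasticsearch format "yyyy-MM-dd HH:mm:ss".
PY_FORMAT = "%Y-%m-%d %H:%M:%S"
ES_FORMAT = "yyyy-MM-dd HH:mm:ss"


def build_stats_query(since, field="createdAt"):
    """Build a search body counting distinct client IPs and user agents since `since`."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": since.strftime(PY_FORMAT),
                    "lte": "now",  # date math, independent of the format
                    "format": ES_FORMAT,
                }
            }
        },
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "total_clientIp": {"cardinality": {"field": "clientIp.keyword"}},
            "total_userAgent": {"cardinality": {"field": "userAgent.keyword"}},
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_stats_query(datetime(2020, 4, 26)), indent=2))
```

Note that cardinality is an approximate count, so the totals can deviate slightly on high-cardinality fields.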
POST logs/_search
{
"size" : 0,
"aggs":{
"date_total_ClientIp":{
"date_histogram":{
"field": "createdAt",
"interval": "quarter",
"format": "yyyy-MM-dd HH:mm:ss",
"extended_bounds":{
"min": "2020-04-26 13:00:00",
"max": "2020-04-26 14:00:00"
}
},
"aggs":{
"url_type_api": {
"terms": {
"field": "urlType.keyword",
"size": 10
}
}
}
}
}
}
POST logs/_search
{
"size" : 0,
"aggs":{
"total_clientIp":{
"terms":{
"size":30,
"field": "clientIp.keyword"
}
}
}
}
// Delete all documents in the index
POST logs/_delete_by_query
{"query":{"match_all": {}}}
// Delete the index itself
DELETE logs
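Data migration is essentially a reindex. Reindex does not attempt to set up the destination index and does not copy the settings of the source index, so create the destination index before running it, including its mappings, number of shards, and number of replicas.
// Reindex supports reindexing from a remote Elasticsearch cluster: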
POST _reindex
{
"source": {
"remote": {
"host": "http://lotherhost:9200",
"username": "user",
"password": "pass"
},
"index": "source",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
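// The host parameter must contain a scheme, host, and port (for example https://lotherhost:9200)
// The username and password parameters are optional
To use this, configure the reindex.remote.whitelist property in elasticsearch.yml. It accepts multiple entries (for example, lotherhost:9200, another:9200, 127.0.10.*:9200, localhost:*).
For details, see Reindex from Remote[15]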
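The constraints on the remote block (host must carry scheme, host, and port; credentials are optional) are easy to get wrong when the request is generated by a script. Below is a minimal Python sketch that validates the host and assembles the _reindex body. Both helper names are hypothetical, and the check covers only the URL shape, not the whitelist.

```python
from urllib.parse import urlparse


def is_valid_remote_host(host):
    """True if host looks like scheme://host:port with an http(s) scheme."""
    parsed = urlparse(host)
    try:
        port = parsed.port
    except ValueError:  # non-numeric or out-of-range port
        return False
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.hostname)
        and port is not None
    )


def build_remote_reindex(host, source_index, dest_index,
                         username=None, password=None, query=None):
    """Build the JSON body for POST _reindex with a remote source."""
    if not is_valid_remote_host(host):
        raise ValueError("host must look like scheme://host:port, got %r" % host)
    remote = {"host": host}
    # Credentials are optional; include them only when both are given.
    if username is not None and password is not None:
        remote["username"] = username
        remote["password"] = password
    source = {"remote": remote, "index": source_index}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    print(build_remote_reindex("http://lotherhost:9200", "source", "dest",
                               query={"match": {"test": "data"}}))
```

The destination index still has to exist with the desired mappings and settings before the body is posted.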
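Elasticsearch-Dump is an open-source toolkit for importing and exporting Elasticsearch data. Installation and migration can be run from a cloud host in the same availability zone, which makes it convenient to use.
It requires a Node.js environment; install elasticdump with npm: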
npm install elasticdump -g
elasticdump
// Copy an index from production to staging with analyzer and mapping:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=analyzer
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data
// Copy a single shard data:
elasticdump \
--input=http://es.com:9200/api \
--output=http://es.com:9200/api2 \
--params='{"preference" : "_shards:0"}'
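The three-step copy above (analyzer, then mapping, then data) can be scripted so that the order is never mixed up. This is a rough Python sketch that only builds the argument lists, using the --input, --output, and --type options shown above. The elasticdump_commands helper is hypothetical, and actually running the commands (for example with subprocess.run) is left to the caller.

```python
import shlex


def elasticdump_commands(source_url, target_url,
                         types=("analyzer", "mapping", "data")):
    """Return one elasticdump argv list per type, in the order given."""
    return [
        [
            "elasticdump",
            "--input=" + source_url,
            "--output=" + target_url,
            "--type=" + dump_type,
        ]
        for dump_type in types
    ]


if __name__ == "__main__":
    commands = elasticdump_commands(
        "http://production.es.com:9200/my_index",
        "http://staging.es.com:9200/my_index",
    )
    for argv in commands:
        # Print a shell-safe rendering of each command for inspection.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Copying the analyzer and mapping before the data matters because the destination index should have its settings and mappings in place before documents are written to it.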
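For the other elasticdump options, see Elasticdump Options[16]
Paging past 10,000 documents in Elasticsearch raises an error, because from + size may not exceed index.max_result_window (10,000 by default). The officially supported alternative is search_after.
search_after takes the sort values of the last hit on the previous page; here there are two, one for each sort field (createdAt and _id).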
//https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html
GET logs/_search
{
"from":9990,
"size":10,
"_source": ["url","clientIp","createdAt"],
"query":{
"match_all": {}
},
"sort":[
{
"createdAt":{
"order":"desc"
}
},
{
"_id":{
"order":"desc"
}
}
]
}
GET logs/_search
{
"from":0,
"size":10,
"_source": ["url","clientIp","createdAt"],
"query":{
"match_all": {}
},
"search_after": [1588042597000, "V363vnEBz1D1HVfYBb0V"],
"sort":[
{
"createdAt":{
"order":"desc"
}
},
{
"_id":{
"order":"desc"
}
}
]
}
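The step from one page to the next can be automated: take the sort array of the last hit and pass it as search_after in the following request. Here is a minimal Python sketch operating on plain dictionaries, assuming the response has the usual hits.hits[].sort layout. The next_page_body helper is hypothetical and no HTTP client is involved.

```python
import copy


def next_page_body(previous_body, previous_response):
    """Build the request body for the page after `previous_response`.

    Returns None when the previous page was empty, i.e. there is nothing
    left to fetch.
    """
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    body = copy.deepcopy(previous_body)  # leave the caller's body untouched
    # search_after replaces offset paging, so the offset is reset to 0.
    body["from"] = 0
    # The sort values of the last hit mark where the next page starts.
    body["search_after"] = hits[-1]["sort"]
    return body
```

The query and sort must stay identical between pages, and the sort should end with a unique tiebreaker field (as _id is used above) so that no documents are skipped or repeated.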
Installing Elasticsearch with Docker
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.8.1
docker run -p 9200:9200 --name elasticsearch -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.8.1
docker pull docker.elastic.co/kibana/kibana:7.8.1
docker run -p 5601:5601 --name kibana --link elasticsearch:elasticsearch -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" -d docker.elastic.co/kibana/kibana:7.8.1
Installing Elasticsearch on other platforms[17]
Create a new Web API project, then install two packages.
Install-Package NEST
Install-Package Swashbuckle.AspNetCore
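NEST is used to work with Elasticsearch (source: https://github.com/elastic/elasticsearch-net). Swagger is added as well to make it easier to call the endpoints later.
The following walks through create, read, update, and delete operations against Elasticsearch.
Add the entity class VisitLog.cs.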
using System;
namespace ESDemo.Domain
{
public class VisitLog
{
public string Id { get; set; }
/// <summary>
/// UserAgent
/// </summary>
public string UserAgent { get; set; }
/// <summary>
/// Method
/// </summary>
public string Method { get; set; }
/// <summary>
/// Url
/// </summary>
public string Url { get; set; }
/// <summary>
/// Referrer
/// </summary>
public string Referrer { get; set; }
/// <summary>
/// IpAddress
/// </summary>
public string IpAddress { get; set; }
/// <summary>
/// Milliseconds
/// </summary>
public int Milliseconds { get; set; }
/// <summary>
/// QueryString
/// </summary>
public string QueryString { get; set; }
/// <summary>
/// Request Body
/// </summary>
public string RequestBody { get; set; }
/// <summary>
/// Cookies
/// </summary>
public string Cookies { get; set; }
/// <summary>
/// Headers
/// </summary>
public string Headers { get; set; }
/// <summary>
/// StatusCode
/// </summary>
public int StatusCode { get; set; }
/// <summary>
/// Response Body
/// </summary>
public string ResponseBody { get; set; }
public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
}
}
With the entity class in place, wrap Elasticsearch in a simple base class for the repositories to inherit from.
Add an interface, IElasticsearchProvider.
using Nest;
namespace ESDemo.Elasticsearch
{
public interface IElasticsearchProvider
{
IElasticClient GetClient();
}
}
Implement the IElasticsearchProvider interface in ElasticsearchProvider.
using Nest;
using System;
namespace ESDemo.Elasticsearch
{
public class ElasticsearchProvider : IElasticsearchProvider
{
public IElasticClient GetClient()
{
var connectionSettings = new ConnectionSettings(new Uri("http://localhost:9200"));
return new ElasticClient(connectionSettings);
}
}
}
Add the Elasticsearch repository base class, ElasticsearchRepositoryBase.
using Nest;
namespace ESDemo.Elasticsearch
{
public abstract class ElasticsearchRepositoryBase
{
private readonly IElasticsearchProvider _elasticsearchProvider;
public ElasticsearchRepositoryBase(IElasticsearchProvider elasticsearchProvider)
{
_elasticsearchProvider = elasticsearchProvider;
}
protected IElasticClient Client => _elasticsearchProvider.GetClient();
protected abstract string IndexName { get; }
}
}
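It is just an abstract class. A class that inherits from it must override protected abstract string IndexName { get; } to specify the index name.
With this simple wrapper done, create an IVisitLogRepository repository interface with four methods: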
using ESDemo.Domain;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace ESDemo.Repositories
{
public interface IVisitLogRepository
{
Task InsertAsync(VisitLog visitLog);
Task DeleteAsync(string id);
Task UpdateAsync(VisitLog visitLog);
Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit);
}
}
The next step is to implement this repository interface. Add VisitLogRepository with the following code:
using ESDemo.Domain;
using ESDemo.Elasticsearch;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace ESDemo.Repositories
{
public class VisitLogRepository : ElasticsearchRepositoryBase, IVisitLogRepository
{
public VisitLogRepository(IElasticsearchProvider elasticsearchProvider) : base(elasticsearchProvider)
{
}
protected override string IndexName => "visitlogs";
public async Task InsertAsync(VisitLog visitLog)
{
await Client.IndexAsync(visitLog, x => x.Index(IndexName));
}
public async Task DeleteAsync(string id)
{
await Client.DeleteAsync<VisitLog>(id, x => x.Index(IndexName));
}
public async Task UpdateAsync(VisitLog visitLog)
{
await Client.UpdateAsync<VisitLog>(visitLog.Id, x => x.Index(IndexName).Doc(visitLog));
}
public async Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit)
{
// Offset paging, newest documents first
var query = await Client.SearchAsync<VisitLog>(x => x.Index(IndexName)
.From((page - 1) * limit)
.Size(limit)
.Sort(s => s.Descending(v => v.CreatedAt)));
return new Tuple<int, IList<VisitLog>>(Convert.ToInt32(query.Total), query.Documents.ToList());
}
}
}
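QueryAsync above uses offset paging, From((page - 1) * limit), which is subject to the 10,000-document window mentioned earlier. The following is a small sketch of that arithmetic in Python, assuming the default index.max_result_window of 10000. The page_to_from_size helper is hypothetical and only illustrates where offset paging stops working.

```python
MAX_RESULT_WINDOW = 10000  # Elasticsearch default for index.max_result_window


def page_to_from_size(page, limit, max_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into (from, size) for a search request."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be >= 1")
    offset = (page - 1) * limit
    # Elasticsearch rejects requests where from + size exceeds the window.
    if offset + limit > max_window:
        raise ValueError(
            "from + size = %d exceeds the result window of %d; "
            "use search_after instead" % (offset + limit, max_window)
        )
    return offset, limit


if __name__ == "__main__":
    print(page_to_from_size(1, 10))     # first page
    print(page_to_from_size(1000, 10))  # last page inside the default window
```

With limit = 10, page 1000 is the last page that offset paging can reach; anything beyond that needs search_after or a larger window setting.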
Now write the endpoints. Add a VisitLogController API controller with the following code:
using ESDemo.Domain;
using ESDemo.Repositories;
using Microsoft.AspNetCore.Mvc;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;
namespace ESDemo.Controllers
{
[Route("api/[controller]")]
[ApiController]
public class VisitLogController : ControllerBase
{
private readonly IVisitLogRepository _visitLogRepository;
public VisitLogController(IVisitLogRepository visitLogRepository)
{
_visitLogRepository = visitLogRepository;
}
[HttpGet]
public async Task<IActionResult> QueryAsync(int page = 1, int limit = 10)
{
var result = await _visitLogRepository.QueryAsync(page, limit);
return Ok(new
{
total = result.Item1,
items = result.Item2
});
}
[HttpPost]
public async Task<IActionResult> InsertAsync([FromBody] VisitLog visitLog)
{
await _visitLogRepository.InsertAsync(visitLog);
return Ok("Inserted successfully");
}
[HttpDelete]
public async Task<IActionResult> DeleteAsync([Required] string id)
{
await _visitLogRepository.DeleteAsync(id);
return Ok("Deleted successfully");
}
[HttpPut]
public async Task<IActionResult> UpdateAsync([FromBody] VisitLog visitLog)
{
await _visitLogRepository.UpdateAsync(visitLog);
return Ok("Updated successfully");
}
}
}
All done. As a last step, do not forget to register the services in Startup.cs; otherwise dependency injection will not work.
...
services.AddSingleton<IElasticsearchProvider, ElasticsearchProvider>();
services.AddSingleton<IVisitLogRepository, VisitLogRepository>();
...
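Everything is ready. Run the project and open the Swagger UI.
Call the endpoints in order: insert, update, delete, query. Insert several times, since there is no data by default and more documents make it possible to check that paging works. That is not demonstrated here.
If Kibana is installed, you can now look at the data that was just added.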
GET _cat/indices
GET visitlogs/_search
{}
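As you can see, the data is now in the index.
This post is a brief introduction to using Elasticsearch in .NET Core. Much of the query syntax is not covered here; if you need it during development, see the official query examples: https://github.com/elastic/elasticsearch-net/tree/master/examples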
[1]Elasticsearch: The Definitive Guide (Chinese edition): https://www.elastic.co/guide/cn/elasticsearch/guide/cn/index.html
[2]Elasticsearch Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
[3]API Conventions: https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html
[4]Document APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html
[5]Search APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
[6]Indices APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices.html
[7]cat APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
[8]Cluster APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster.html
[9]Javascript api: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/index.html
[10]Logstash Reference: https://www.elastic.co/guide/en/logstash/current/index.html
[11]Configuring Logstash: https://www.elastic.co/guide/en/logstash/current/configuration.html
[12]Input plugins: https://www.elastic.co/guide/en/logstash/current/input-plugins.html
[13]Output plugins: https://www.elastic.co/guide/en/logstash/current/output-plugins.html
[14]Filter plugins: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
[15]Reindex from Remote: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#reindex-from-remote
[16]Elasticdump Options: https://github.com/taskrabbit/elasticsearch-dump#options
[17]Installing Elasticsearch: https://www.elastic.co/cn/downloads/