reference book:
《Elasticsearch: The Definitive Guide》
ref
As explained previously, an index is like a database in a traditional relational database. It is the place to store related documents. The plural of index is indices or indexes.
A type is similar to a table in MySQL.
A document is similar to a row in MySQL.
A field is similar to a column in MySQL.
To index a document is to store a document in an index (noun) so that it can be retrieved and queried. It is much like the INSERT keyword in SQL except that, if the document already exists, the new document would replace the old.
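To make the replace-on-reindex behaviour concrete, here is a minimal Python sketch that models an index as an in-memory dictionary keyed by document ID. `ToyIndex` and its methods are hypothetical names for illustration only; they are not part of any Elasticsearch client.

```python
class ToyIndex:
    """In-memory stand-in for an index: maps document ID -> document."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, doc):
        """Store doc under doc_id.

        Returns True if the document was created, False if it replaced
        an existing document with the same ID.
        """
        created = doc_id not in self._docs
        self._docs[doc_id] = doc  # unlike SQL INSERT, an existing doc is overwritten
        return created

    def get(self, doc_id):
        return self._docs.get(doc_id)


megacorp = ToyIndex()
first = megacorp.index("1", {"first_name": "John", "age": 25})
second = megacorp.index("1", {"first_name": "John", "age": 26})  # same ID: replaces
```

After the second call, `megacorp.get("1")` returns only the newer document.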
Index a document from the terminal:
curl -XPUT 'localhost:9200/megacorp/employee/1?pretty' -d'
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}'
Or index the document with Sense.
Indexing from the terminal is a bit too hardcore, so from here on I'll use Sense!
Index: megacorp
type: employee
document:
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
field: first_name, last_name, age, about, interests
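Note: what is that 1 after the type (employee)? It is the ID of the doc. We set the ID to 1 ourselves when indexing; you can also let the system generate an ID automatically.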
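The two ways of supplying an ID can be sketched as a small helper that picks the HTTP method and path. This assumes the typed URL layout used in this post (`/index/type/id`, as in older Elasticsearch versions); `index_request` is a hypothetical helper, not a client API.

```python
def index_request(index, doc_type, doc_id=None):
    """Return (HTTP method, path) for indexing one document."""
    if doc_id is None:
        # No ID supplied: POST to the type and let Elasticsearch generate one.
        return "POST", f"/{index}/{doc_type}"
    # ID supplied: PUT to the full path, as in the curl example above.
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


with_id = index_request("megacorp", "employee", 1)
auto_id = index_request("megacorp", "employee")
```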
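Looking at a specific index shows all of its information and settings, including the number of shards (number_of_shards), which is 5, and the number of replicas (number_of_replicas), which is 1. More on these later.
Note: the key part is the mappings! Elasticsearch automatically determined a type for every field. Here "type" means a data type, not a table. Later, when we get to templates, we can define the types ourselves. For basic types, letting ES work them out automatically is already very convenient!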
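As a rough illustration of how a type can be inferred for each field, the sketch below guesses a type name from each JSON value. It is a toy, not Elasticsearch's actual logic: the type names (`string`, `long`, `double`, `boolean`) are assumed to follow older versions of Elasticsearch, date detection is left out, and `guess_field_type` is a hypothetical helper.

```python
def guess_field_type(value):
    """Guess a mapping type name from a parsed JSON value (toy version)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and value:
        # An array takes the type of its elements; look at the first one.
        return guess_field_type(value[0])
    return "unknown"


doc = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
mapping = {field: guess_field_type(value) for field, value in doc.items()}
```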
Two indices show up (-。-;
Note: 5, 1, and 2 are number_of_shards, number_of_replicas, and the number of documents, respectively.
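Response fields explained:
- took: the request took 1 ms
- timed_out: the request did not time out
- _shards.total: 5 shards in total
- _shards.successful: 5 shards were searched successfully
- _shards.failed: 0 shards failed
- hits: the docs that matched
- hits._index, hits._type, hits._id: self-explanatory
- hits._score: how relevant the doc is to the query; more on this later
- _source: the original document that was indexed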
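The fields listed above can be read out of a parsed response as in the sketch below. The `SAMPLE` JSON is hand-written to mirror the values described in this post (1 ms, 5 shards, the John Smith document); it was not captured from a real cluster, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample shaped like a search response; not real cluster output.
SAMPLE = """
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1.0,
        "_source": {"first_name": "John", "last_name": "Smith", "age": 25}
      }
    ]
  }
}
"""


def summarize(response):
    """Pull the fields discussed above out of a parsed search response."""
    shards = response["_shards"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "all_shards_ok": shards["failed"] == 0
        and shards["successful"] == shards["total"],
        # One (id, score, original document) tuple per matching doc.
        "docs": [
            (hit["_id"], hit["_score"], hit["_source"])
            for hit in response["hits"]["hits"]
        ],
    }


summary = summarize(json.loads(SAMPLE))
```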
Here we directly specify the ID of the doc we want to fetch. OK, I agree that this doesn't really count as searching (~_~;)
SELECT * FROM employee WHERE age = 25
Note:
- _score is meaningless here
- Changing "25" to 25 also works; ES is very tolerant!
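Index-time processing
Searching for "slee" (missing the p) returns nothing! ES does not match raw strings directly. It first analyzes the field: the sentence "i love sleep" is split at the spaces into the three terms i, love, and sleep, and an inverted index is built from those terms. "slee" is not one of them, so it finds nothing. But I digress; custom analyzers are a topic for later.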
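A toy inverted index makes this behaviour easy to see. The sketch below lowercases and splits on whitespace as a crude stand-in for a real analyzer; the document IDs and the helper names `tokenize` and `build_inverted_index` are assumptions for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


docs = {1: "I love to go rock climbing", 2: "I love sleep"}
inverted = build_inverted_index(docs)

hits_sleep = inverted.get("sleep", set())  # whole term is in the index
hits_slee = inverted.get("slee", set())    # partial term is not
```

Lookups are exact matches on whole terms, which is why the truncated "slee" finds nothing even though "sleep" is indexed.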
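Query-time processing
So why does this one match? The truth is that ES analyzes the query as well: "sleep slee" is split into "sleep" and "slee", and a doc counts as a match if it contains either one. In other words, the terms are combined with OR, but the more query terms a doc matches, the higher it scores. That is why the score drops from 0.15 to 0.02 compared with the previous query: the previous query matched all of its terms, while this one matches only half of them.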
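The OR behaviour can be sketched with a toy matcher that scores each doc by the fraction of query terms it contains. This fraction is a deliberate simplification and not Elasticsearch's real scoring formula; the helper names and the document ID are assumptions.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return text.lower().split()


def match(query, docs):
    """Return {doc_id: fraction of query terms found} for docs that
    contain at least one query term (OR semantics)."""
    query_terms = tokenize(query)
    results = {}
    for doc_id, text in docs.items():
        doc_terms = set(tokenize(text))
        matched = sum(1 for term in query_terms if term in doc_terms)
        if matched:  # a single matching term is enough
            results[doc_id] = matched / len(query_terms)
    return results


docs = {2: "I love sleep"}
full = match("sleep", docs)       # every query term matches
half = match("sleep slee", docs)  # only one of two query terms matches
```

The doc is returned in both cases, but with a lower toy score when only half of the query terms match.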
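Scoring
Both docs match love, so why does "I love sleep" score higher than "I love to go rock climbing"? The second one is longer, so a single matching term makes up a smaller share of the field and the doc is considered less relevant.
Note: matching docs are sorted by score, from highest to lowest.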
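The length effect can be sketched with the field-length norm from the classic Lucene TF/IDF similarity, norm = 1 / sqrt(number of terms in the field). This is only one factor of the real score (term frequency and inverse document frequency are ignored here), Lucene stores the norm with reduced precision, and newer Elasticsearch versions default to BM25 instead. Counting terms by splitting on whitespace is an assumption.

```python
import math


def field_length_norm(text):
    """Classic TF/IDF field-length norm: 1 / sqrt(number of terms)."""
    num_terms = len(text.split())
    return 1.0 / math.sqrt(num_terms)


abouts = ["I love to go rock climbing", "I love sleep"]

# Shorter fields get a larger norm, so they rank first when sorted
# from highest to lowest.
ranked = sorted(abouts, key=field_length_norm, reverse=True)
short_norm = field_length_norm("I love sleep")                # 3 terms
long_norm = field_length_norm("I love to go rock climbing")  # 6 terms
```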
Multiple values are supported; by default, a doc is returned if it matches any one of them.
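The "any one of them" rule amounts to a set-membership test, sketched below. `matches_any` is a hypothetical helper, and "forestry" is a made-up search value used only for the demo.

```python
def matches_any(field_values, wanted):
    """True if at least one of the doc's field values is among the wanted values."""
    wanted = set(wanted)
    return any(value in wanted for value in field_values)


interests = ["sports", "music"]  # from the example document
one_hit = matches_any(interests, ["music", "forestry"])  # "music" matches
no_hit = matches_any(interests, ["forestry"])            # nothing matches
```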
Looks pretty much like match, right? In fact, match is a typical query, while term is a typical filter.
Although we refer to the query DSL, in reality there are two DSLs: the query DSL and the filter DSL. Query clauses and filter clauses are similar in nature, but have slightly different purposes.
filter
A filter asks a yes|no question of every document and is used for fields that contain exact values:
query
A query is similar to a filter, but also asks the question: How well does this document match?
A typical use for a query is to find documents
As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filter clauses for everything else.
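The difference between the two can be sketched in a few lines: a filter returns a plain yes or no, while a query returns a number that says how well the doc matches. The scoring here (fraction of query terms found) is a toy stand-in for real relevance scoring, the helper names are hypothetical, and the second doc's age is a made-up value.

```python
def term_filter(doc, field, value):
    """Filter: a yes/no answer on an exact value, no score involved."""
    return doc.get(field) == value


def match_score(doc, field, query):
    """Query: a crude relevance score, the fraction of query terms in the field."""
    doc_terms = set(doc.get(field, "").lower().split())
    query_terms = query.lower().split()
    return sum(term in doc_terms for term in query_terms) / len(query_terms)


docs = [
    {"age": 25, "about": "I love to go rock climbing"},
    {"age": 32, "about": "I love sleep"},  # age is made up for this demo
]

# First narrow the candidates with the exact-value filter, then score
# only the docs that passed it.
kept = [doc for doc in docs if term_filter(doc, "age", 25)]
scored = [(match_score(doc, "about", "love climbing"), doc["about"]) for doc in kept]
```

Filtering first means the scoring work is only done for docs that passed the exact-value check.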