Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).
It provides a more convenient and idiomatic way to write and manipulate queries, while staying close to the terminology and structure of the Elasticsearch JSON DSL. It exposes the whole range of the DSL from Python, either directly through defined classes or through queryset-like expressions.
It also provides an optional wrapper for working with documents: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.
To use the other Elasticsearch APIs (e.g. cluster health), just use the underlying client.
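For instance, a minimal sketch of calling the cluster health API through the low-level client might look like this (assuming a node reachable on localhost:9200):
from elasticsearch import Elasticsearch

# The low-level client covers any API the DSL does not wrap,
# e.g. the cluster health endpoint.
client = Elasticsearch()
print(client.cluster.health())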
Let's start by writing a typical search request directly as a dict:
(Note: the filtered query used below was replaced by bool in Elasticsearch 2.0 and later.)
from elasticsearch import Elasticsearch

client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [{"match": {"title": "python"}}],
                        "must_not": [{"match": {"description": "beta"}}]
                    }
                },
                "filter": {"term": {"category": "search"}}
            }
        },
        "aggs": {
            "per_tag": {
                "terms": {"field": "tags"},
                "aggs": {
                    "max_lines": {"max": {"field": "lines"}}
                }
            }
        }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])
The problem with this approach is that it is very verbose, prone to syntax mistakes like incorrect nesting, hard to modify (e.g. adding another filter), and definitely not fun to write.
Let's rewrite the example using the Python DSL:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python") \
    .query(~Q("match", description="beta"))

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)
As you can see, the library took care of:
- creating the appropriate Query objects by name (e.g. "match")
- composing queries into a compound bool query (see the sketch after this list)
- creating a filtered query, since .filter() was used
- providing convenient access to the response data
- no curly or square brackets anywhere
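To make the first two points concrete, here is a small sketch (reusing the index and field names from the example above, so treat them as placeholders) of building Q objects by name and combining them with operators; .to_dict() shows the request body the library generates:
from elasticsearch_dsl import Search, Q

# Build individual queries by name, then combine them with &, | and ~;
# the library folds them into a single bool query.
q = Q("match", title="python") & ~Q("match", description="beta")

s = Search(index="my-index").query(q).filter("term", category="search")

# Inspect the generated request body.
print(s.to_dict())

The library also comes with an optional persistence layer. Let's have a simple Python class representing an article in a blogging system: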
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
connections.create_connection(hosts=['localhost'])

class Article(DocType):
    title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    body = String(analyzer='snowball')
    tags = String(index='not_analyzed')
    published_from = Date()
    lines = Integer()

    class Meta:
        index = 'blog'

    def save(self, **kwargs):
        # Keep the line count in sync every time the document is saved.
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() > self.published_from

# create the mappings in elasticsearch
Article.init()

# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()

article = Article.get(id=42)
print(article.is_published())

# Display cluster health
print(connections.get_connection().cluster.health())
In this example you can see:
- providing a default connection for the library to use
- defining fields with their mapping configuration
- setting the index name via the Meta class
- defining a custom method (is_published)
- overriding the built-in .save() method to hook into the persistence life cycle
- retrieving and saving the object into Elasticsearch
- accessing the underlying client for other APIs

You don't have to port your entire application to get the benefits of the Python DSL; you can start gradually by creating a Search object from your existing dict, modifying it using the API, and serializing it back to a dict:
body = {...} # insert complicated query here

# Convert to Search object
s = Search.from_dict(body)

# Add some filters, aggregations, queries, ...
# (these methods return a Search object, so re-assign the result)
s = s.filter("term", tags="python")

# Convert back to dict to plug back into existing code
body = s.to_dict()
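As a usage sketch, the serialized body can then be passed straight back to the low-level client (reusing the client from the first example, which stands in for your existing code):
# Plug the serialized body back into the existing low-level call.
response = client.search(index="my-index", body=body)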
For more information, see the documentation at https://elasticsearch-dsl.readthedocs.org/