Introduction to elasticsearch-dsl 2.0.0

elasticsearch-dsl 2.0.0, by Honza Král (original link). Translated by AbnerGong.

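Elasticsearch DSL is a high-level library that helps with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).
It provides a convenient and fluent way to write and manipulate queries while staying close to the Elasticsearch JSON DSL in both terminology and structure. It exposes the whole range of the DSL from Python, either through defined classes or through queryset-like expressions.
It also provides an optional wrapper for working with documents: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.
To use the other Elasticsearch APIs (for example, cluster health), simply use the underlying client.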

Compatibility

Search Example

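Let's start with a typical search request written directly as a dict:
(Translator's note: the filtered query used below has been superseded by the bool query since Elasticsearch 2.0.)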

from elasticsearch import Elasticsearch
client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": [{"match": {"title": "python"}}],
              "must_not": [{"match": {"description": "beta"}}]
            }
          },
          "filter": {"term": {"category": "search"}}
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])

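The problem with this approach is that it is very verbose, prone to syntax mistakes such as incorrect nesting, hard to modify (for example, adding another filter), and definitely not fun to write.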
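To make the "hard to modify" point concrete, here is a minimal sketch of what adding one more filter to a raw request body can involve. The helper name add_filter is hypothetical and not part of any library. It assumes the body uses a filtered query as in the example above, and it always wraps the existing filter together with the new one in a bool filter.

```python
import copy


def add_filter(body, new_filter):
    """Return a copy of `body` with `new_filter` AND-ed onto its filtered query.

    Hypothetical helper for illustration only; it assumes
    body["query"]["filtered"] exists.
    """
    body = copy.deepcopy(body)  # leave the caller's dict untouched
    filtered = body["query"]["filtered"]
    existing = filtered.get("filter")
    if existing is None:
        # No filter yet: the new one can be used directly.
        filtered["filter"] = new_filter
    else:
        # A single filter must be restructured into a bool filter
        # before a second one can sit next to it.
        filtered["filter"] = {"bool": {"must": [existing, new_filter]}}
    return body


body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "python"}},
            "filter": {"term": {"category": "search"}},
        }
    }
}

updated = add_filter(body, {"term": {"tags": "python"}})
print(updated["query"]["filtered"]["filter"])
```

Even this small change requires knowing the exact nesting of the request and rewriting part of it, which is the kind of bookkeeping a query-building API is meant to absorb.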

Let's rewrite the example using the Python DSL:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .query(~Q("match", description="beta"))

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

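As you can see, the library took care of:
- creating appropriate Query objects by name (e.g. "match")
- composing queries into a compound bool query
- creating a filtered query since .filter() was used
- providing convenient access to the response data
- avoiding curly and square brackets everywhere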
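The idea of composing queries into a bool query through operators such as ~ can be illustrated with a toy model. The classes ToyQuery and ToyBool below are hypothetical stand-ins written for this sketch. They are not the library's implementation and only show how operator overloading can build the nested structure that would otherwise be written by hand.

```python
class ToyBool:
    """Toy compound query holding must / must_not clauses."""

    def __init__(self, must=None, must_not=None):
        self.must = list(must or [])
        self.must_not = list(must_not or [])

    def __and__(self, other):
        # Merge clause lists instead of nesting one bool inside another.
        if isinstance(other, ToyBool):
            return ToyBool(self.must + other.must,
                           self.must_not + other.must_not)
        return ToyBool(self.must + [other], self.must_not)

    def to_dict(self):
        body = {}
        if self.must:
            body["must"] = [q.to_dict() for q in self.must]
        if self.must_not:
            body["must_not"] = [q.to_dict() for q in self.must_not]
        return {"bool": body}


class ToyQuery:
    """Toy leaf query created by name, e.g. ToyQuery("match", title="python")."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def __invert__(self):
        # ~query becomes a bool query with a single must_not clause.
        return ToyBool(must_not=[self])

    def __and__(self, other):
        # query & other becomes a bool query combining both sides.
        return ToyBool(must=[self]) & other

    def to_dict(self):
        return {self.name: self.params}


combined = ToyQuery("match", title="python") & ~ToyQuery("match", description="beta")
print(combined.to_dict())
```

In this toy model the combined object serializes to the same bool structure that appears in the hand-written dict at the start of this section, without the caller typing any nested braces.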

Persistence Example

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
connections.create_connection(hosts=['localhost'])

class Article(DocType):
    title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    body = String(analyzer='snowball')
    tags = String(index='not_analyzed')
    published_from = Date()
    lines = Integer()

    class Meta:
        index = 'blog'

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() > self.published_from

# create the mappings in elasticsearch
Article.init()

# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()

article = Article.get(id=42)
print(article.is_published())

# Display cluster health
print(connections.get_connection().cluster.health())

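In this example you can see:

  • providing a default connection
  • defining fields with mapping configuration
  • setting the index name
  • defining custom methods
  • overriding the built-in .save() method to hook into the persistence life cycle
  • retrieving and saving the object into Elasticsearch
  • accessing the underlying client for other APIs
    You can find more in the persistence chapter of the documentation.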
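The pattern of overriding .save() to hook into the persistence life cycle can be tried without a running cluster. The sketch below swaps the real base class for a hypothetical in-memory stand-in, InMemoryDocument, which is an assumption made for illustration and not part of elasticsearch-dsl. Only the override pattern itself mirrors the example above.

```python
class InMemoryDocument:
    """Hypothetical stand-in for a persistence base class.

    Documents are kept in a class-level dict instead of Elasticsearch.
    """

    _store = {}

    def __init__(self, doc_id, **fields):
        self.doc_id = doc_id
        for name, value in fields.items():
            setattr(self, name, value)

    def save(self, **kwargs):
        # Persist a snapshot of all attributes under the document id.
        self._store[self.doc_id] = dict(vars(self))
        return True

    @classmethod
    def get(cls, doc_id):
        # Rebuild an instance from the stored snapshot.
        return cls(**cls._store[doc_id])


class Article(InMemoryDocument):
    def save(self, **kwargs):
        # Derive a field just before persisting, then defer to the base
        # class. As in the example above, this counts whitespace-separated
        # tokens of the body.
        self.lines = len(self.body.split())
        return super().save(**kwargs)


article = Article(doc_id=42, title="Hello world!", body="some longer text here")
article.save()

loaded = Article.get(42)
print(loaded.title, loaded.lines)
```

Because the derived field is computed inside save(), every caller gets it for free and it cannot drift out of sync with the body at the time of saving.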

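Migration from elasticsearch-py

You don't have to port your entire application to get the benefits of the Python DSL. You can start gradually by creating a Search object from your existing dict, modifying it using the API, and serializing it back to a dict: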

body = {...} # insert complicated query here

# Convert to Search object
s = Search.from_dict(body)

# Add some filters, aggregations, queries, ...
# (filter() returns a modified copy, so reassign the result)
s = s.filter("term", tags="python")

# Convert back to dict to plug back into existing code
body = s.to_dict()

Documentation

https://elasticsearch-dsl.readthedocs.org/
