ElasticSearch Fundamentals (3): How ElasticSearch Relates to Lucene, Solr, and MySQL

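Information retrieval covers a very broad range of data: text (search engines), images (search by image), audio (song recognition), video, and more. To retrieve data efficiently, systems usually build an index to speed up lookups. Because each type of data differs in storage format, feature extraction method, and so on, the kind of index that gets built and the retrieval method differ as well.

For text, the indexing data structure is the inverted index. For data in a relational database, it is the B-tree. For images, the approach depends on how features are extracted; hashing and product quantization are among the mainstream methods.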
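
To make the inverted index idea concrete, here is a minimal sketch in Python. It is only an illustration: the whitespace tokenizer, the sample documents, and the function names `build_inverted_index` and `search` are assumptions made for this example, not how Lucene implements its index.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis step (an assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every term in the query (AND)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Illustrative documents only.
docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch is built on Lucene",
    3: "MySQL is a relational database",
}
index = build_inverted_index(docs)
print(search(index, "lucene is"))
```

The lookup starts from the term and goes to the documents, which is the reverse of scanning each document for the term; that reversal is what gives the structure its name.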

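Today there are different tools for building indexes over different types of data. Lucene and ElasticSearch can both be regarded as full-text indexing tools, which are one part of information retrieval. The figure below gives an overview of the basics of full-text indexing: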

[Figure 1: overview of full-text indexing basics]

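The rest of this article briefly describes how ElasticSearch relates to Lucene, MySQL, and Solr.

1. The relationship between Lucene and ElasticSearch

Put simply, Lucene is to ElasticSearch what an aircraft engine is to an aircraft: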

[Figure 2: Lucene and ElasticSearch compared to an aircraft engine and an aircraft]

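(1) Data structure: a relational database speeds up data retrieval by adding an index, such as a B-tree index, to specified columns. Elasticsearch and Lucene use a structure called an inverted index for the same purpose.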
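
For contrast with the inverted index, the sketch below shows the kind of ordered lookup a column index supports. It is not a B-tree: a sorted Python list searched with `bisect` stands in for one, since both keep keys in order and so can answer range queries without scanning every row. The table rows and the `range_lookup` helper are hypothetical.

```python
import bisect

# Hypothetical table rows as (id, age); the values are illustrative only.
rows = [(1, 34), (2, 27), (3, 41), (4, 27), (5, 30)]

# Index on the "age" column: (key, row_id) pairs kept in key order.
age_index = sorted((age, row_id) for row_id, age in rows)


def range_lookup(index, low, high):
    """Return row IDs whose indexed key lies in the closed range [low, high]."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [row_id for _, row_id in index[start:end]]


# Rough counterpart of: SELECT id FROM t WHERE age BETWEEN 27 AND 30
print(range_lookup(age_index, 27, 30))
```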

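(2) Elasticsearch is an open source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, and full-featured search engine library available today, open source or proprietary.

However, Lucene is just a library. To make full use of it, you need to work in Java and integrate Lucene directly into your application.

(3) Elasticsearch is also written in Java and uses Lucene internally for indexing and search, but its goal is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.

However, Elasticsearch is more than just Lucene, and more than just a full-text search engine. It can be described briefly as:

  • A distributed real-time document store in which every field can be indexed and searched
  • A distributed search engine with real-time analytics
  • Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data

Elasticsearch packages all of this functionality into a single standalone service. Your programs can talk to it through its simple RESTful API, using a web client written in your preferred programming language, or even from the command line.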
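
As a rough illustration of talking to that RESTful API from a program, the sketch below builds a full-text search request with Python's standard library. It assumes a node listening on `http://localhost:9200` and a hypothetical index named `articles` with a `title` field; none of these come from the original text. The request is only constructed, not sent, so the example runs without a cluster.

```python
import json
import urllib.request

# Assumed for illustration: a local node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_search_request(index, field, text):
    """Build (but do not send) a match query against one field of an index."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "articles" and "title" are hypothetical names.
request = build_search_request("articles", "title", "lucene")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a running node, passing `request` to `urllib.request.urlopen` would send it and return the JSON response. Any HTTP client in any language can issue the same call.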

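2. The relationship between ElasticSearch and Solr

Solr

Apache Solr is an open source search platform built on the Java library Lucene. It exposes Apache Lucene's search capabilities in a user-friendly way.

Having been in the industry for nearly a decade, it is a mature product with a strong and broad user community.

It provides distributed indexing, replication, load-balanced querying, and automated failover and recovery. When deployed correctly and managed well, it can serve as a highly reliable, scalable, and fault-tolerant search engine.

Many internet giants, such as Netflix, eBay, Instagram, and Amazon (CloudSearch), use Solr because it can index and search multiple sites.

Its main features include:

  • Full-text search
  • Highlighting
  • Faceted search
  • Real-time indexing
  • Dynamic clustering
  • Database integration
  • NoSQL features and rich document handling (for example, Word and PDF files)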

[Figure 3: ElasticSearch vs. Solr comparison]

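Other points of comparison:

A Solr cluster depends on ZooKeeper for metadata management, whereas an Elasticsearch cluster manages this itself with no third-party dependency, which makes an Elasticsearch cluster quicker and easier to set up.

3. The relationship between the relational database MySQL and ElasticSearch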

 

A side-by-side comparison of the two systems' concepts shows:

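  1. A database in MySQL corresponds to an index in ES.

  2. A MySQL database contains N tables, which corresponds to one ES index containing N types. (Note that indices created in Elasticsearch 6.0 or later can contain only a single type, and mapping types were removed entirely in later major versions.)

  3. The data in a MySQL table consists of rows and columns (attributes), which corresponds to a type consisting of documents and fields.

  4. Defining the table structure and column types in MySQL corresponds to the mapping in ES. For example, in a relational database the schema defines the tables, the columns of each table, and the relationships between tables and columns. Correspondingly, in ES the mapping defines how the fields of a type under an index are handled: how they are indexed, the index type, whether the original JSON document is stored, whether the original JSON document is compressed, whether the text is analyzed (tokenized), and how it is analyzed.

  5. The MySQL operations insert, delete, update, and select correspond in ES to PUT/POST, DELETE, _update, and GET. A conditional update corresponds to _update_by_query in ES, and a conditional delete corresponds to _delete_by_query.

  6. MySQL functions such as group by, avg, and sum resemble some of the features of ES aggregations.

  7. Deduplication with distinct in MySQL resembles the cardinality aggregation in ES.

  8. Data migration in MySQL corresponds to the reindex operation in ES.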
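
The correspondences in items 5 to 7 can be sketched as request bodies in the ES query DSL. The helpers below are illustrative only: the function names and the hypothetical `orders` index with `city`, `amount`, and `user_id` fields are assumptions, and each SQL statement in a docstring is a rough counterpart rather than an exact equivalent.

```python
import json


def group_by_avg(group_field, value_field):
    """Rough counterpart of: SELECT g, AVG(v) FROM t GROUP BY g."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


def count_distinct(field):
    """Rough counterpart of: SELECT COUNT(DISTINCT f) FROM t.

    The cardinality aggregation returns an approximate count.
    """
    return {
        "size": 0,
        "aggs": {"distinct_count": {"cardinality": {"field": field}}},
    }


def delete_where(field, value):
    """Rough counterpart of: DELETE FROM t WHERE f = value.

    Intended as the body of POST /<index>/_delete_by_query.
    """
    return {"query": {"term": {field: value}}}


# Hypothetical field names on a hypothetical "orders" index.
print(json.dumps(group_by_avg("city", "amount"), indent=2))
print(json.dumps(count_distinct("user_id")))
print(json.dumps(delete_where("city", "Berlin")))
```

The first two bodies would be sent to the index's `_search` endpoint. The names `by_group`, `avg_value`, and `distinct_count` are arbitrary labels chosen by the caller.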

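(1) ES can be regarded as a NoSQL database. Its biggest drawback is the lack of transaction support (although relational databases give up a good deal of performance to support transactions).

(2) For query-heavy workloads, it can replace part of what a relational database does.

(3) For small data volumes (below the millions-of-rows range), a relational database still has a clear advantage for both queries and updates.

(4) For storing and analyzing large volumes of logs, ES has a clear advantage over MySQL.

DB-Engines provides a system-by-system comparison of ElasticSearch and MySQL: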

Editorial information provided by DB-Engines

| Property | Elasticsearch | MySQL |
|---|---|---|
| Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene (lets you perform and combine many types of searches such as structured, unstructured, geo, and metric) | Widely used open source RDBMS |
| Primary database model | Search engine | Relational DBMS (key/value-like access via memcached API) |
| Secondary database models | Document store | Document store, Key-value store |
| DB-Engines Ranking: score | 145.25 | 1167.29 |
| DB-Engines Ranking: rank | #8 overall, #1 search engines | #2 overall, #2 relational DBMS |
| Website | www.elastic.co/products/elasticsearch | www.mysql.com |
| Technical documentation | www.elastic.co/guide/en/elasticsearch/reference/current/index.html | dev.mysql.com/doc |
| Developer | Elastic | Oracle (since 2010; originally MySQL AB, then Sun) |
| Initial release | 2010 | 1995 |
| Current release | 6.6.0, January 2019 | 8.0.15, February 2019 |
| License | Open Source (Apache Version 2; Elastic License) | Open Source (GPL version 2; commercial licenses with extended functionality are available) |
| Cloud-based only | no | no |
| Implementation language | Java | C and C++ |
| Server operating systems | All OS with a Java VM | FreeBSD, Linux, OS X, Solaris, Windows |
| Data scheme | schema-free (flexible type definitions; once a type is defined, it is persistent) | yes |
| Typing (predefined data types such as float or date) | yes | yes |
| XML support | no | yes |
| Secondary indexes | yes (all search fields are automatically indexed) | yes |
| SQL | SQL-like query language | yes (with proprietary extensions) |
| APIs and other access methods | Java API, RESTful HTTP/JSON API | Proprietary native API, ADO.NET, JDBC, ODBC |
| Supported programming languages | .Net, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, community-contributed clients | Ada, C, C#, C++, D, Delphi, Eiffel, Erlang, Haskell, Java, JavaScript (Node.js), Objective-C, OCaml, Perl, PHP, Python, Ruby, Scheme, Tcl |
| Server-side scripts (stored procedures) | yes | yes (proprietary syntax) |
| Triggers | yes (by using the 'percolation' feature) | yes |
| Partitioning methods | Sharding | horizontal partitioning, sharding with MySQL Cluster or MySQL Fabric |
| Replication methods | yes | Master-master replication, Master-slave replication |
| MapReduce | ES-Hadoop Connector | no |
| Consistency concepts | Eventual Consistency (synchronous doc-based replication; get by ID may show delays up to 1 sec; configurable write consistency: one, quorum, all) | Immediate Consistency |
| Foreign keys | no | yes (not for MyISAM storage engine) |
| Transaction concepts | no | ACID (not for MyISAM storage engine) |
| Concurrency | yes | yes (table locks or row locks depending on storage engine) |
| Durability | yes | yes |
| In-memory capabilities | Memcached and Redis integration | yes |
| User concepts (access control) | | Users with fine-grained authorization concept (no user groups or roles) |


References:

Full-text search engines: ElasticSearch or Solr?

System Properties Comparison Elasticsearch vs. MySQL
