Flink Table and SQL

1. Introduction to the Flink Table API and SQL

Apache Flink provides two relational APIs for unified stream and batch processing: the Table API and SQL.

The Table API is a query API for Scala and Java that lets you compose queries from relational operators such as select, filter, and join in a very intuitive way.

Flink's SQL support is based on Apache Calcite, which implements the SQL standard. A query specified in either API has the same semantics and produces the same result, whether the input is batch (DataSet) or streaming (DataStream).

The Table API and SQL interfaces are tightly integrated with each other, and with Flink's DataStream and DataSet APIs, so we can easily switch between all the APIs and the libraries built on top of them.

Note that as of the latest release, many Table API and SQL features are still under development. Not every combination of [Table API, SQL] and [stream, batch] input supports every operation.
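As a minimal sketch of how the two APIs are integrated, the same filter can be written with Table API operators or as a SQL string. The table name orders and its amount column are illustrative only (not from this course), and tableEnv stands for a TableEnvironment such as the ones created in the examples below:

// Table API: compose relational operators on a Table object
val apiResult: Table = tableEnv.scan("orders").filter("amount > 100")
// SQL: the same query expressed as a SQL string, with identical semantics
val sqlResult: Table = tableEnv.sqlQuery("select * from orders where amount > 100")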

2. Why SQL?

SQL is a language that almost everyone knows. If an engine offers SQL, it is much easier for people to adopt; this is already a common pattern in the industry.

The Table API is a relational, SQL-like API that lets users manipulate data as if they were operating on tables, which is very intuitive and convenient.

Table & SQL API 还有另一个职责,就是流处理和批处理统一的API层。

3. Programming with the Flink Table API & SQL

See the official documentation for an introduction.

Flink's Table API lets us develop both streaming and batch jobs in a SQL-like way. Since a DataStream or DataSet can be converted into a Table, we can conveniently pull data from anywhere, turn it into a Table, and then process it with the Table API or SQL.

Flink Table API and SQL programs can connect to external systems to read and write both batch and streaming tables. A TableSource provides access to data stored in external systems such as databases, key-value stores, message queues, or file systems. A TableSink emits a table to an external storage system.

1. Reading CSV file data with Flink SQL and querying it
Requirement: read a CSV file (see the flinksql.csv file in the course material), query for people older than 18, and write the result to another CSV file.
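The course file itself is not reproduced here; based on the schema registered below (id INT, name STRING, age INT, with the first line skipped as a header), its contents presumably look something like this (illustrative only):

id,name,age
101,zhangsan,18
102,lisi,20
103,wangwu,25
104,zhaoliu,8
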
Step 1: Import the jar dependencies

<dependency>
    <groupId>org.apache.flinkgroupId>
    <artifactId>flink-table-planner_2.11artifactId>
    <version>1.8.1version>
dependency>
<dependency>
    <groupId>org.apache.flinkgroupId>
    <artifactId>flink-table-api-scala-bridge_2.11artifactId>
    <version>1.8.1version>
dependency>
<dependency>
    <groupId>org.apache.flinkgroupId>
    <artifactId>flink-table-api-scala_2.11artifactId>
    <version>1.8.1version>
dependency>
<dependency>
    <groupId>org.apache.flinkgroupId>
    <artifactId>flink-table-commonartifactId>
    <version>1.8.1version>
dependency>

Step 2: Write the code to read the CSV file and run the query

import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{Table, Types}
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink
import org.apache.flink.table.sources.CsvTableSource

object FlinkStreamSQL {

  def main(args: Array[String]): Unit = {
    // streaming SQL: get the stream execution environment
    val streamEnvironment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // streaming table environment
    val tableEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(streamEnvironment)
    // build the CSV table source
    val source: CsvTableSource = CsvTableSource.builder()
      .field("id", Types.INT)
      .field("name", Types.STRING)
      .field("age", Types.INT)
      .fieldDelimiter(",")
      .ignoreFirstLine()
      .ignoreParseErrors()
      .lineDelimiter("\r\n")
      .path("D:\\开课吧课程资料\\Flink实时数仓\\datas\\flinksql.csv")
      .build()
    // register the table source as a table named "user"
    tableEnvironment.registerTableSource("user", source)
    // query for people older than 18
    val result: Table = tableEnvironment.scan("user").filter("age > 18")
    // result.printSchema() would print the table's metadata, i.e. its field information
    // save the query result to a CSV file, using "===" as the field delimiter
    val sink = new CsvTableSink("D:\\开课吧课程资料\\Flink实时数仓\\datas\\sink.csv", "===", 1, WriteMode.OVERWRITE)
    result.writeToSink(sink)
    streamEnvironment.execute()
  }
}
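As a usage note, the same query can also be expressed through sqlQuery instead of scan/filter. A minimal sketch against the tableEnvironment above (user is a reserved keyword in Flink SQL, so it has to be escaped with backticks, as the batch example below also points out):

// the SQL equivalent of scan("user").filter("age > 18")
val sqlResult: Table = tableEnvironment.sqlQuery("select id, name, age from `user` where age > 18")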

2. Converting between DataStream and Table
Converting a DataStream into a Table:

We can also turn a DataStream, i.e. streaming data, into a table and then query it with SQL statements: read data from a socket, compute statistics (count the people whose age is greater than 10), and save the result to a local file. The socket sends data in the following format:

101,zhangsan,18
102,lisi,20
103,wangwu,25
104,zhaoliu,8

To convert a DataStream into a Table, we need two objects: StreamExecutionEnvironment and StreamTableEnvironment.
Once we have a StreamTableEnvironment, calling fromDataStream or registerDataStream converts the DataStream into a Table, as sketched below.
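A minimal sketch of the two conversion calls, assuming the userStream: DataStream[User] and the streamSQLEnvironment created in the full example further down:

// option 1: obtain a Table object directly from the stream
val userTable: Table = streamSQLEnvironment.fromDataStream(userStream)
// option 2: register the stream under a name so that SQL queries can reference it
streamSQLEnvironment.registerDataStream("userTable", userStream)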

Converting a Table into a DataStream:
Conversely, we can convert a processed Table back into a DataStream. There are two modes for this conversion.
Mode 1: Append mode
The table is appended to the resulting stream. This mode can only be used when the dynamic table is modified exclusively by INSERT changes, i.e. it is append-only and previously emitted results are never updated; if the query performs update or delete operations, the conversion fails with an error.

Mode 2: Retract mode
This mode can always be used. Each record is paired with a Boolean flag that marks it as an insertion or a retraction: true means the record is inserted, false means it is retracted (see the sketch below).
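A minimal sketch of retract mode on an aggregating query, where retractions actually occur; it assumes the userTable registration and the imports from the full example below, and the Boolean flag is what distinguishes inserts from retractions:

// count users per name: each update retracts the previous count (false) and emits the new one (true)
val countTable: Table = streamSQLEnvironment.sqlQuery("select name, count(1) as cnt from userTable group by name")
val retracted: DataStream[(Boolean, (String, Long))] = streamSQLEnvironment.toRetractStream[(String, Long)](countTable)
retracted.filter(_._1).print() // keep only the inserts, i.e. the latest count per name
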
Step 1: Write the code
Note: Flink development requires importing the implicit conversion package:

import org.apache.flink.api.scala._
For Table API / SQL development, the corresponding implicit conversion package is needed as well:
import org.apache.flink.table.api._

import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api._
import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink

object FlinkStreamSQL {

  def main(args: Array[String]): Unit = {

    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    val streamSQLEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(environment)
    val socketStream: DataStream[String] = environment.socketTextStream("node01", 9000)
    // sample input lines:
    // 101,zhangsan,18
    // 102,lisi,20
    // 103,wangwu,25
    // 104,zhaoliu,8
    val userStream: DataStream[User] = socketStream.map(x => User(x.split(",")(0).toInt, x.split(",")(1), x.split(",")(2).toInt))
    // register the stream as a table
    streamSQLEnvironment.registerDataStream("userTable", userStream)

    // query with the Table API (expression style):
    // val table: Table = streamSQLEnvironment.scan("userTable").filter("age > 10")
    // query with SQL:
    val table: Table = streamSQLEnvironment.sqlQuery("select * from userTable")
    val sink3 = new CsvTableSink("D:\\开课吧课程资料\\Flink实时数仓\\datas\\sink3.csv", "===", 1, WriteMode.OVERWRITE)
    table.writeToSink(sink3)

    // append mode: only valid for insert-only results, not for sum, count, avg or other updating queries
    val appendStream: DataStream[User] = streamSQLEnvironment.toAppendStream[User](table)
    // retract mode: can always be used
    val retractStream: DataStream[(Boolean, User)] = streamSQLEnvironment.toRetractStream[User](table)
    environment.execute()
  }
}
case class User(id: Int, name: String, age: Int)

Step 2: Send data to the socket (for example, by running nc -lk 9000 on node01)

101,zhangsan,18
102,lisi,20
103,wangwu,25
104,zhaoliu,8

3. Converting between DataSet and Table
We can also register a DataSet as a Table and then query it; likewise, a Table can be converted back into a DataSet.

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.table.api.scala.BatchTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink

object FlinkBatchSQL {

  def main(args: Array[String]): Unit = {

    val environment: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val batchSQL: BatchTableEnvironment = BatchTableEnvironment.create(environment)

    val sourceSet: DataSet[String] = environment.readTextFile("D:\\开课吧课程资料\\Flink实时数仓\\datas\\dataSet.csv")

    val userSet: DataSet[User2] = sourceSet.map(x => {
      println(x)
      val line: Array[String] = x.split(",")
      User2(line(0).toInt, line(1), line(2).toInt)
    })

    import org.apache.flink.table.api._

    batchSQL.registerDataSet("user", userSet)
    // Table API version of the query:
    // val table: Table = batchSQL.scan("user").filter("age > 18")
    // note: user is a reserved keyword in Flink SQL; reserved keywords must be escaped with backticks
    val table: Table = batchSQL.sqlQuery("select id,name,age from `user`")
    val sink = new CsvTableSink("D:\\开课吧课程资料\\Flink实时数仓\\datas\\batchSink.csv", "===", 1, WriteMode.OVERWRITE)
    table.writeToSink(sink)

    // convert the Table back into a DataSet
    val tableSet: DataSet[User2] = batchSQL.toDataSet[User2](table)

    // DataSet.print() triggers job execution eagerly (including the CSV sink registered above),
    // so no extra environment.execute() call is needed afterwards
    tableSet.map(x => x.age).print()
  }
}
case class User2(id: Int, name: String, age: Int)
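When no case class matches the result schema, the Table can also be converted to Flink's generic Row type. A minimal sketch under that assumption, reusing batchSQL and table from the example above (Row fields are accessed by position):

import org.apache.flink.types.Row

// convert to DataSet[Row] and read the age column by its index (2)
val rowSet: DataSet[Row] = batchSQL.toDataSet[Row](table)
rowSet.map(row => row.getField(2).asInstanceOf[Int]).print()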

More reserved keywords defined by Flink SQL are listed at:
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sql.html

A, ABS, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE,
ALLOW, ALTER, ALWAYS, AND, ANY, ARE, ARRAY, AS, ASC, ASENSITIVE,
ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES,
AUTHORIZATION, AVG, BEFORE, BEGIN, BERNOULLI, BETWEEN, BIGINT, BINARY,
BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY,
CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING,
CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICS, CHARACTERS,
CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME,
CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASS_ORIGIN, CLOB, CLOSE,
COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG,
COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME,
COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION,
CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT,
CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA,
CONSTRUCTOR, CONTAINS, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT,
COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT,
CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP,
CURRENT_PATH, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME,
CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER,
CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE,
DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DAY, DEALLOCATE,
DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE,
DEFERRED, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF,
DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC,
DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOUBLE,
DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE,
EACH, ELEMENT, ELSE, END, END-EXEC, EPOCH, EQUALS, ESCAPE, EVERY,
EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP,
EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL,
FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORTRAN,
FOUND, FRAC_SECOND, FREE, FROM, FULL, FUNCTION, FUSION, G, GENERAL,
GENERATED, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING,
HAVING, HIERARCHY, HOLD, HOUR, IDENTITY, IMMEDIATE, IMPLEMENTATION,
IMPORT, IN, INCLUDING, INCREMENT, INDICATOR, INITIALLY, INNER, INOUT,
INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER,
INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISOLATION, JAVA,
JOIN, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LANGUAGE, LARGE, LAST,
LAST_VALUE, LATERAL, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE,
LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP,
MATCH, MATCHED, MAX, MAXVALUE, MEMBER, MERGE, MESSAGE_LENGTH,
MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM,
MIN, MINUTE, MINVALUE, MOD, MODIFIES, MODULE, MONTH, MORE, MULTISET,
MUMPS, NAME, NAMES, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW,
NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NULL, NULLABLE, NULLIF,
NULLS, NUMBER, NUMERIC, OBJECT, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD,
ON, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINALITY,
OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD,
PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION,
PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME,
PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSTHROUGH,
PATH, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PLACING, PLAN,
PLI, POSITION, POWER, PRECEDING, PRECISION, PREPARE, PRESERVE,
PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUARTER, RANGE, RANK,
READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX,
REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX,
REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, RESET, RESTART,
RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH,
RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNS, REVOKE, RIGHT,
ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME,
ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, SAVEPOINT, SCALE,
SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA,
SCROLL, SEARCH, SECOND, SECTION, SECURITY, SELECT, SELF, SENSITIVE,
SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER,
SET, SETS, SIMILAR, SIMPLE, SIZE, SMALLINT, SOME, SOURCE, SPACE,
SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE,
SQLWARNING, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR,
SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER,
SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQRT, START, STATE,
STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRUCTURE, STYLE,
SUBCLASS_ORIGIN, SUBMULTISET, SUBSTITUTE, SUBSTRING, SUM, SYMMETRIC,
SYSTEM, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN,
TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMEZONE_HOUR,
TIMEZONE_MINUTE, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION,
TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK,
TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATION, TREAT, TRIGGER,
TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRUE, TYPE,
UESCAPE, UNBOUNDED, UNCOMMITTED, UNDER, UNION, UNIQUE, UNKNOWN,
UNNAMED, UNNEST, UPDATE, UPPER, UPSERT, USAGE, USER,
USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE,
USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, VALUE,
VALUES, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VIEW,
WEEK, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN,
WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, ZONE

4. Processing JSON-formatted Kafka data with Flink SQL

Flink SQL can also read data directly from Kafka: we use Kafka as the data source, register the Kafka topic as a table, and then query it with SQL. If the Kafka messages are in JSON format, that is not a problem either; Flink integrates with JSON and can parse the JSON data directly.
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/connect.html
Step 1: Import the jar dependency

<dependency>
     <groupId>org.apache.flinkgroupId>
     <artifactId>flink-jsonartifactId>
     <version>1.8.1version>
 dependency>

 

Step 2: Create the Kafka topic
Run the following commands on node01 to create a topic:

cd /kkb/install/kafka_2.11-1.1.0
bin/kafka-topics.sh --create --topic kafka_source_table --partitions 3 --replication-factor 1 --zookeeper node01:2181,node02:2181,node03:2181

Step 3: Query the Kafka data with Flink

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.descriptors.{Json, Kafka, Schema}
import org.apache.flink.table.sinks.CsvTableSink

object KafkaJsonSource {

  def main(args: Array[String]): Unit = {

    val streamEnvironment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // checkpoint configuration (left commented out here)
    /* streamEnvironment.enableCheckpointing(100);
    streamEnvironment.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    streamEnvironment.getCheckpointConfig.setMinPauseBetweenCheckpoints(500);
    streamEnvironment.getCheckpointConfig.setCheckpointTimeout(60000);
    streamEnvironment.getCheckpointConfig.setMaxConcurrentCheckpoints(1);
    streamEnvironment.getCheckpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    */
    val tableEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(streamEnvironment)
    // Kafka connector descriptor
    val kafka: Kafka = new Kafka()
      .version("0.11")
      .topic("kafka_source_table")
      .startFromLatest()
      .property("group.id", "test_group")
      .property("bootstrap.servers", "node01:9092,node02:9092,node03:9092")

    // JSON format: derive the JSON schema from the table schema and tolerate missing fields
    val json: Json = new Json().failOnMissingField(false).deriveSchema()
    // sample message: {"userId":1119,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000,"data":[{"package":"com.browser","activetime":120000}]}
    val schema: Schema = new Schema()
      .field("userId", Types.INT)
      .field("day", Types.STRING)
      .field("begintime", Types.LONG)
      .field("endtime", Types.LONG)
    tableEnvironment
      .connect(kafka)
      .withFormat(json)
      .withSchema(schema)
      .inAppendMode()
      .registerTableSource("user_log")
    // query the data with SQL (day is a reserved keyword, hence the backticks)
    val table: Table = tableEnvironment.sqlQuery("select userId,`day`,begintime,endtime from user_log")
    table.printSchema()
    // define the sink: where the data is written to
    val sink = new CsvTableSink("D:\\开课吧课程资料\\Flink实时数仓\\datas\\flink_kafka.csv", "====", 1, WriteMode.OVERWRITE)
    // register the table sink
    tableEnvironment.registerTableSink("csvSink",
      Array[String]("f0", "f1", "f2", "f3"),
      Array[TypeInformation[_]](Types.INT, Types.STRING, Types.LONG, Types.LONG), sink)
    // insert the query result into the sink
    table.insertInto("csvSink")
    streamEnvironment.execute("kafkaSource")
  }
}

Step 4: Send data to Kafka
Use the Kafka console producer to send data:

cd /kkb/install/kafka_2.11-1.1.0
bin/kafka-console-producer.sh  --topic kafka_source_table --broker-list node01:9092,node02:9092,node03:9092 

Send data in the following format:

{"userId":1119,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1120,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1121,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1122,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1123,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
