Big Data Learning [09]: Presto 0.184 Cluster | Multiple Data Sources | Problems


Abstract: Download and install Presto 0.184, configure it in cluster mode, and test connectors to multiple data sources, including Hive and MySQL. Problems encountered during installation and configuration, and their solutions, are recorded.

Prerequisites

For Hadoop and Hive setup, see:
Big Data Learning [04]: Hive installation and configuration: http://blog.csdn.net/ld326/article/details/78023101
Big Data Learning [02]: Hadoop installation and configuration: http://blog.csdn.net/ld326/article/details/78004402

Official site

https://prestodb.io/

Download

https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.184/presto-server-0.184.tar.gz

Extract

tar -xzvf presto-server-0.184.tar.gz

Create a data directory

Create a data/log directory, presto-data, in the same parent directory as presto-server-0.184; keeping data outside the installation directory means it does not have to be moved when Presto is upgraded later.
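A minimal sketch of the directory steps, assuming everything lives under /home/hadoop (adjust paths to your layout):

cd /home/hadoop                       # assumption: the install lives here
mkdir -p presto-data                  # data/log directory next to presto-server-0.184
mkdir -p /home/hadoop/data/presto     # node.data-dir referenced in node.properties below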

Configuration

The final layout of the configuration files:

etc
├── catalog
│   └── hive.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties

Create an etc directory

Create an etc directory under the Presto home directory; it holds the following configuration files:
● Node Properties: environmental configuration specific to each node
● JVM Config: command line options for the Java Virtual Machine
● Config Properties: configuration for the Presto server
● Catalog Properties: configuration for Connectors (data sources)

Node properties

To configure node information, first create the etc/node.properties file. On a machine, a node is a single installed instance of Presto. A simple etc/node.properties:
node.environment=pcluster
node.id=111
node.data-dir=/home/hadoop/data/presto
Remember to create the /home/hadoop/data/presto directory. Once Presto is running, the structure will look like this:

data
└── presto
    ├── etc -> /home/hadoop/presto-server-0.184/etc
    ├── plugin -> /home/hadoop/presto-server-0.184/plugin
    └── var
        ├── log
        │   ├── http-request.log
        │   ├── launcher.log
        │   └── server.log
        └── run
            └── launcher.pid

● node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
● node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.
● node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here.
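Since node.id must be unique per node but stay stable across restarts, one hedged way to generate the file once per host (assumption: you assign each host a distinct numeric id, matching the 111/222 ids seen in the JMX output later):

# run once on each node, with a distinct NODE_ID per host (e.g. 111, 222, 333)
NODE_ID=111
cat > etc/node.properties <<EOF
node.environment=pcluster
node.id=${NODE_ID}
node.data-dir=/home/hadoop/data/presto
EOF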

JVM config

File: etc/jvm.config
-server
-Xmx1G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
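After the server is started (see the Start section below), a quick hedged check that the launcher picked up these flags:

ps aux | grep -v grep | grep PrestoServer   # the command line should include -Xmx1G and the G1 flags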

Configure the Presto server

File: etc/config.properties
A Presto server plays one of two roles: coordinator or worker.
Every Presto server can function as both a coordinator and a worker, but dedicating a single machine to only perform coordination work provides the best performance on larger clusters.
coordinator[192.168.137.101]:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=1GB
query.max-memory-per-node=512MB
discovery-server.enabled=true
discovery.uri=http://hadoop01:8080

workers[192.168.137.102,192.168.137.103]:

coordinator=false
http-server.http.port=8080
query.max-memory=1GB
discovery.uri=http://hadoop01:8080

Or, to give both roles to a single machine:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=1GB
query.max-memory-per-node=512MB
discovery-server.enabled=true
discovery.uri=http://hadoop01:8080

The only adjustment here is node-scheduler.include-coordinator, so the coordinator machine also acts as a worker node.
Note: hadoop01 here is a hostname; configuring an IP address instead causes errors (see Error 03 below).
Notes:
● coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).
● node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing and monitoring query execution.
● http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.
● query.max-memory: The maximum amount of distributed memory that a query may use.
● query.max-memory-per-node: The maximum amount of memory that a query may use on any one machine.
● discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.
● discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace example.net:8080 to match the host and port of the Presto coordinator. This URI must not end in a slash.
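Once the coordinator and workers are up, a hedged way to confirm that every node registered with the discovery service is to query the built-in system catalog (using the CLI described below):

./presto --server hadoop01:8080 --execute "SELECT * FROM system.runtime.nodes;"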

Log configuration

File: etc/log.properties, with this content:

com.facebook.presto=INFO
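The format is one logger (Java package) per line. For example, to debug only the Hive connector while keeping everything else at INFO (a sketch; adjust the package to the subsystem you are chasing):

com.facebook.presto=INFO
com.facebook.presto.hive=DEBUG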

Configure data sources (Catalog Properties)

Connectors available in Presto (list from the official docs):
● 5.1. Accumulo Connector
● 5.2. Black Hole Connector
● 5.3. Cassandra Connector
● 5.4. Hive Connector
● 5.5. Hive Security Configuration
● 5.6. JMX Connector
● 5.7. Kafka Connector
● 5.8. Kafka Connector Tutorial
● 5.9. Local File Connector
● 5.10. Memory Connector
● 5.11. MongoDB Connector
● 5.12. MySQL Connector
● 5.13. PostgreSQL Connector
● 5.14. Redis Connector
● 5.15. SQL Server Connector
● 5.16. System Connector
● 5.17. Thrift Connector
● 5.18. TPCDS Connector
● 5.19. TPCH Connector

Configure the Hive catalog

vim etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop01:9083
hive.config.resources=/home/hadoop/hadoop/etc/hadoop/core-site.xml, /home/hadoop/apache-hive-1.2.1-bin/conf/hdfs-site.xml
hive.allow-drop-table=true

Note that hive-hadoop2 is not an arbitrary name: the connector name must be exactly this, or Presto will not recognize it.
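After restarting the servers, a quick hedged check that the catalog loaded:

./presto --server hadoop01:8080 --execute "SHOW SCHEMAS FROM hive;"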
Presto's Hive connector configuration properties:
[Image: table of Presto Hive connector configuration properties]

Start

There are two ways to start Presto:
as a daemon: bin/launcher start
in the foreground: bin/launcher run
If running under a process supervisor (e.g. supervisord), use the foreground mode.
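The launcher also supports status and stop; a typical hedged session:

bin/launcher start                                      # start as a daemon
bin/launcher status                                     # is it running?
tail -f /home/hadoop/data/presto/var/log/server.log     # watch startup progress
bin/launcher stop                                       # stop the daemon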

Presto CLI (command-line interface)

Download:
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.184/presto-cli-0.184-executable.jar
Rename it to presto, make it executable with chmod +x, and run:

./presto --server localhost:8080 --catalog hive --schema default

Tip: Run the CLI with the --help option to see the available options.
Starting the CLI:

[hadoop@hadoop01 ~]$ ./presto.jar --server 192.168.137.101:8080 --catalog hive --schema default

--server must match the IP and port configured in Presto's config.properties above.
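The CLI can also run non-interactively, which is handy for scripting; a hedged example:

# run a single statement and emit CSV instead of the interactive table view
./presto --server hadoop01:8080 --catalog hive --schema default \
    --execute "SHOW TABLES;" --output-format CSV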

2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   PROPERTY                                           DEFAULT     RUNTIME                                                                                                DESCRIPTION
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.allow-corrupt-writes-for-testing              false       false                                                                                                  Allow Hive connector to write data even when data will likely be corrupt
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.assume-canonical-partition-keys               false       false
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.bucket-execution                              true        true                                                                                                   Enable bucket-aware execution: only use a single worker per bucket
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.bucket-writing                                true        true                                                                                                   Enable writing to bucketed tables
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.dfs.connect.max-retries                       5           5
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.dfs.connect.timeout                           500.00ms    500.00ms
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.dfs-timeout                                   60.00s      60.00s
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.domain-compaction-threshold                   100         100                                                                                                    Maximum ranges to allow in a tuple domain without compacting it
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.dfs.domain-socket-path                        null        null
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.fs.cache.max-size                             1000        1000                                                                                                   Hadoop FileSystem cache size
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.force-local-scheduling                        false       false
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.hdfs.authentication.type                      NONE        NONE                                                                                                   HDFS authentication type
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.hdfs.impersonation.enabled                    false       false                                                                                                  Should Presto user be impersonated when communicating with HDFS
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.compression-codec                             GZIP        GZIP
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore.authentication.type                 NONE        NONE                                                                                                   Hive Metastore authentication type
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.storage-format                                RCBINARY    RCBINARY
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.immutable-partitions                          false       false                                                                                                  Can new data be inserted into existing partitions or existing unpartitioned tables
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.dfs.ipc-ping-interval                         10.00s      10.00s
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-concurrent-file-renames                   20          20
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-initial-split-size                        32MB        32MB
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-initial-splits                            200         200
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore-refresh-max-threads                 100         100
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-outstanding-splits                        1000        1000
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore.partition-batch-size.max            100         100
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-partitions-per-scan                       100000      100000                                                                                                 Maximum allowed partitions for a single table scan
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-partitions-per-writers                    100         100                                                                                                    Maximum number of partitions per writer
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-split-iterator-threads                    1000        1000
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.max-split-size                                64MB        64MB
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore-cache-maximum-size                  10000       10000
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore-cache-ttl                           0.00s       0.00s
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore-refresh-interval                    0.00s       0.00s
2017-09-24T22:04:24.360+0800    INFO    main    Bootstrap   hive.metastore.thrift.client.socks-proxy           null        null
2017-09-24T22:04:24.361+0800    INFO    main    Bootstrap   hive.metastore-timeout                             10.00s      10.00s
2017-09-24T22:04:24.361+0800    INFO    main    Bootstrap   hive.metastore.partition-batch-size.min            10          10
2017-09-24T22:04:24.361+0800    INFO    main    Bootstrap   hive.orc.bloom-filters.enabled                     false       false
2017-09-24T22:04:24.361+0800    INFO    main    Bootstrap   hive.orc.default-bloom-filter-fpp                  0.05        0.05                                                                                                   ORC Bloom filter false positive probability
2017-09-24T22:04:24.361+0800    INFO    main    Bootstrap   hive.orc.max-buffer-size                           8MB         8MB
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.orc.max-merge-distance                        1MB         1MB
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.orc.max-read-block-size                       16MB        16MB
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.orc.optimized-writer.enabled                  false       false
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.orc.stream-buffer-size                        8MB         8MB
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.parquet-optimized-reader.enabled              false       false
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.parquet-predicate-pushdown.enabled            false       false
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.per-transaction-metastore-cache-maximum-size  1000        1000
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.rcfile-optimized-writer.enabled               true        true
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.rcfile.writer.validate                        false       false                                                                                                  Validate RCFile after write by re-reading the whole file
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.recursive-directories                         false       false
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.config.resources                              null        [/home/hadoop/hadoop/etc/hadoop/core-site.xml, /home/hadoop/apache-hive-1.2.1-bin/conf/hdfs-site.xml]
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.respect-table-format                          true        true                                                                                                   Should new partitions be written using the existing table format or the default Presto format
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.skip-deletion-for-alter                       false       false                                                                                                  Skip deletion of old partition data when a partition is deleted and then inserted in the same transaction
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.table-statistics-enabled                      true        true                                                                                                   Enable use of table statistics
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.time-zone                                     PRC         PRC
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.orc.use-column-names                          false       false                                                                                                  Access ORC columns using names from the file
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.parquet.use-column-names                      false       false                                                                                                  Access Parquet columns using names from the file
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.dfs.verify-checksum                           true        true
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.write-validation-threads                      16          16                                                                                                     Number of threads used for verifying data after a write
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.non-managed-table-writes-enabled              false       false                                                                                                  Enable writes to non-managed (external) tables
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.pin-client-to-current-region               false       false                                                                                                  Should the S3 client be pinned to the current EC2 region
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.aws-access-key                             null        null
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.aws-secret-key                             [REDACTED]  [REDACTED]
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.connect-timeout                            5.00s       5.00s
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.encryption-materials-provider              null        null                                                                                                   Use a custom encryption materials provider for S3 data encryption
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.endpoint                                   null        null
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.kms-key-id                                 null        null                                                                                                   Use an AWS KMS key for S3 data encryption
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.max-backoff-time                           10.00m      10.00m
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.max-client-retries                         5           5
2017-09-24T22:04:24.365+0800    INFO    main    Bootstrap   hive.s3.max-connections                            500         500
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.max-error-retries                          10          10
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.max-retry-time                             10.00m      10.00m
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.multipart.min-file-size                    16MB        16MB                                                                                                   Minimum file size for an S3 multipart upload
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.multipart.min-part-size                    5MB         5MB                                                                                                    Minimum part size for an S3 multipart upload
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.signer-type                                null        null
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.socket-timeout                             5.00s       5.00s
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.sse.enabled                                false       false                                                                                                  Enable S3 server side encryption
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.sse.kms-key-id                             null        null                                                                                                   KMS Key ID to use for S3 server-side encryption with KMS-managed key
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.sse.type                                   S3          S3                                                                                                     Key management type for S3 server-side encryption (S3 or KMS)
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.ssl.enabled                                true        true
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.staging-directory                          /tmp        /tmp                                                                                                   Temporary directory for staging files before uploading to S3
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.use-instance-credentials                   true        true
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.s3.user-agent-prefix                                                                                                                                             The user agent prefix to use for S3 calls
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.metastore.uri                                 null        [thrift://hadoop01:9083]                                                                               Hive metastore URIs (comma separated)
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.metastore                                     thrift      thrift
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.allow-add-column                              false       false                                                                                                  Allow Hive connector to add column
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.allow-drop-column                             false       false                                                                                                  Allow Hive connector to drop column
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.allow-drop-table                              false       true                                                                                                   Allow Hive connector to drop table
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.allow-rename-column                           false       false                                                                                                  Allow Hive connector to rename column
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.allow-rename-table                            false       false                                                                                                  Allow Hive connector to rename table
2017-09-24T22:04:24.366+0800    INFO    main    Bootstrap   hive.security                                      legacy      legacy
Queries:

presto:default> show tables;
   Table   
-----------
 dual      
 student   
 student01 
 student02 
 student03 
 student04 
 student05 
(7 rows)

Query 20170924_141223_00007_bg85h, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [7 rows, 175B] [11 rows/s, 290B/s]
presto:default> select * from student05;
 student_id | student_name |    born    | sex 
------------+--------------+------------+-----
 dlmu_01    | happyprince  | 2015-07-08 |   0 
(1 row)

Query 20170924_141229_00008_bg85h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:01 [1 rows, 358B] [1 rows/s, 448B/s]
presto:default> show tables;
   Table   
-----------
 dual      
 student   
 student01 
 student02 
 student03 
 student04 
 student05 
(7 rows)

Query 20170924_145732_00004_uaaik, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:01 [7 rows, 175B] [9 rows/s, 238B/s]
presto:default> select * from student;
 student_id | student_name 
------------+--------------
 dlmu_01    | happyprince  
(1 row)

Query 20170924_145746_00005_uaaik, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:05 [1 rows, 20B] [0 rows/s, 3B/s]

Configure other connectors

Inspect the system connector's data

(This connector is built into the system and requires no configuration.)
presto> show schemas from system;

       Schema       
--------------------
 information_schema 
 jdbc               
 metadata           
 runtime            
(4 rows)

Query 20170924_154001_00002_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:02 [4 rows, 57B] [2 rows/s, 35B/s]
presto> show tables from system.information_schema;
          Table          
-------------------------
 __internal_partitions__ 
 columns                 
 schemata                
 table_privileges        
 tables                  
 views                   
(6 rows)

Query 20170924_154021_00003_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:02 [6 rows, 233B] [3 rows/s, 136B/s]
presto> select * from system.information_schema.columns;

 table_catalog |    table_schema    |       table_name        |         column_name          | ordinal_position | column_default | is_nullable |   data_type    | comment | extra_info 
---------------+--------------------+-------------------------+------------------------------+------------------+----------------+-------------+----------------+---------+------------
 system        | runtime            | queries                 | node_id                      |                1 | NULL           | YES         | varchar        | NULL    | NULL       
 system        | runtime            | queries                 | query_id                     |                2 | NULL           | YES         | varchar        | NULL    | NULL       
 system        | runtime            | queries                 | state                        |                3 | NULL           | YES         | varchar        | NULL    | NULL       
 system        | runtime            | queries                 | user                         |                4 | NULL           | YES         | varchar        | NULL    | NULL       
 system        | runtime            | queries                 | source                       |                5 | NULL           | YES         | varchar        | NULL    | NULL       

Configure JMX

[hadoop@hadoop01 catalog]$ vim jmx.properties

connector.name=jmx
jmx.dump-tables=java.lang:type=Runtime,com.facebook.presto.execution.scheduler:name=NodeScheduler
jmx.dump-period=10s
jmx.max-entries=86400
presto> SHOW TABLES FROM jmx.current;
                                                                     Table                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------
 com.facebook.presto.execution.executor:name=multilevelsplitqueue                                                                                 
 com.facebook.presto.execution.executor:name=taskexecutor                                                                                         
 com.facebook.presto.execution.resourcegroups:name=internalresourcegroupmanager                                                                   
 com.facebook.presto.execution.scheduler:name=nodescheduler                                                                                       
 com.facebook.presto.execution.scheduler:name=splitschedulerstats                                                                                 
 com.facebook.presto.execution:name=queryexecution                                                                                                
 com.facebook.presto.execution:name=querymanager                                                                                                  
 com.facebook.presto.execution:name=remotetaskfactory                                                                                             
 com.facebook.presto.execution:name=taskmanager                                                                                                   
 com.facebook.presto.execution:type=queryqueue,name=global,expansion=global                                                                       
 com.facebook.presto.failuredetector:name=heartbeatfailuredetector                                                                                
 com.facebook.presto.hive.metastore:type=cachinghivemetastore,name=hive                                                                           
 com.facebook.presto.hive.metastore:type=thrifthivemetastore,name=hive                                                                            
 com.facebook.presto.hive:type=fileformatdatasourcestats,name=hive                                                                                
 com.facebook.presto.hive:type=namenodestats,name=hive                                                                                            
 com.facebook.presto.hive:type=prestos3filesystem,name=hive                                                                                       
 com.facebook.presto.memory:name=clustermemorymanager                                                                                             
 com.facebook.presto.memory:type=clustermemorypool,name=general                                                                                   
 com.facebook.presto.memory:type=clustermemorypool,name=reserved                                                                                  
 com.facebook.presto.memory:type=clustermemorypool,name=system                                                                                    
 com.facebook.presto.memory:type=memorypool,name=general                                                                                          
 com.facebook.presto.memory:type=memorypool,name=reserved                                                                                         
 com.facebook.presto.memory:type=memorypool,name=system                                                                                           
 com.facebook.presto.metadata:name=discoverynodemanager                                                                                           
 com.facebook.presto.operator.index:name=indexjoinlookupstats                                                                                     
 com.facebook.presto.security:name=accesscontrolmanager                                                                                           
 com.facebook.presto.server.remotetask:name=remotetaskstats                                                                                       
 com.facebook.presto.server:name=asynchttpexecutionmbean                                                                                          
 com.facebook.presto.server:name=exchangeexecutionmbean                                                                                           
 com.facebook.presto.server:name=taskresource                                                                                                     
 com.facebook.presto.spiller:name=spillerfactory                                                                                                  
 com.facebook.presto.sql.gen:name=expressioncompiler                                                                                              
 com.facebook.presto.sql.gen:name=joincompiler                                                                                                    
 com.facebook.presto.sql.gen:name=joinfilterfunctioncompiler                                                                                      
 com.facebook.presto.sql.gen:name=joinprobecompiler                                                                                               
 com.facebook.presto.sql.gen:name=orderingcompiler                                                                                                
 com.facebook.presto.sql.gen:name=pagefunctioncompiler                                                                                            
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=addintermediateaggregations                                               
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=aggregationexpressionrewrite                                              
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=applyexpressionrewrite                                                    
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=createpartialtopn                                                         
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=eliminatecrossjoins                                                       

Query 20170924_154511_00014_hnbrr, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:03 [177 rows, 15.7KB] [67 rows/s, 5.98KB/s]
presto> SELECT node, vmname, vmversion
     -> FROM jmx.current."java.lang:type=runtime";
 node  |              vmname               | vmversion  
-------+-----------------------------------+------------
 12345 | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01 
 111   | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01 
 222   | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01 
(3 rows)

Query 20170924_154519_00015_hnbrr, FINISHED, 3 nodes
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 140B] [11 rows/s, 523B/s]
presto> SELECT openfiledescriptorcount, maxfiledescriptorcount
     -> FROM jmx.current."java.lang:type=operatingsystem";
 openfiledescriptorcount | maxfiledescriptorcount 
-------------------------+------------------------
                     978 |                   4096 
                     919 |                   4096 
                     925 |                   4096 
(3 rows)

Query 20170924_154527_00016_hnbrr, FINISHED, 3 nodes
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 48B] [8 rows/s, 130B/s]

Configure the Local File Connector

The local file connector allows querying data stored on the local file system of each worker.
Create the file etc/catalog/localfile.properties:

connector.name=localfile

Configuration properties

● presto-logs.http-request-log.location: Directory or file where HTTP request logs are written
● presto-logs.http-request-log.pattern: If the log location is a directory, this glob is used to match file names in the directory
Local File Connector Schemas and Tables
The local file connector provides a single schema named logs. You can see all the available tables by running SHOW TABLES:
SHOW TABLES FROM localfile.logs;

http_request_log
This table contains the HTTP request logs from each node on the cluster.
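For example, to peek at the log rows themselves (a sketch; with only connector.name set, the connector reads the default var/log/http-request.log location):

./presto --server hadoop01:8080 --execute \
    "SELECT * FROM localfile.logs.http_request_log LIMIT 5;"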

Test results:

presto> show schemas from localfile;
       Schema       
--------------------
 information_schema 
 logs               
(2 rows)

Query 20170924_154211_00006_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 32B] [4 rows/s, 79B/s]
presto> show tables from localfile.logs;
     Table       
------------------
 http_request_log 
(1 row)

Query 20170924_154222_00007_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 30B] [2 rows/s, 89B/s]
presto> show tables from localfile.information_schema;
          Table          
-------------------------
 __internal_partitions__ 
 columns                 
 schemata                
 table_privileges        
 tables                  
 views                   
(6 rows)

Query 20170924_154232_00008_hnbrr, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [7 rows, 263B] [19 rows/s, 728B/s]
presto> select * from localfile.information_schema.tables;
 table_catalog |    table_schema    |       table_name        | table_type 
---------------+--------------------+-------------------------+------------
 localfile     | information_schema | columns                 | BASE TABLE 
 localfile     | information_schema | tables                  | BASE TABLE 
 localfile     | information_schema | views                   | BASE TABLE 
 localfile     | information_schema | schemata                | BASE TABLE 
 localfile     | information_schema | __internal_partitions__ | BASE TABLE 
 localfile     | information_schema | table_privileges        | BASE TABLE 
 localfile     | logs               | http_request_log        | BASE TABLE 
(7 rows)

Query 20170924_154251_00009_hnbrr, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [7 rows, 466B] [32 rows/s, 2.11KB/s]

Configure MySQL for a test

[hadoop@hadoop01 catalog]$ vim mysql.properties

with the following content:

connector.name=mysql
connection-url=jdbc:mysql://hadoop01:3306
connection-user=root
connection-password=AAAaaa111

Test output:

presto> SHOW SCHEMAS FROM mysql;
     Schema       
--------------------
 hive               
 information_schema 
 test               
(3 rows)

Query 20170924_152408_00007_r5qdw, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [3 rows, 41B] [10 rows/s, 142B/s]
presto> SHOW tables FROM mysql.hive;
        Table           
---------------------------
 bucketing_cols            
 cds                       
 columns_v2                
 database_params           
 dbs                       
 func_ru                   
 funcs                     
 global_privs              
 part_col_stats            
 partition_key_vals        
 partition_keys            
 partition_params          
 partitions                
 roles                     
 sd_params                 
 sds                       
 sequence_table            
 serde_params              
 serdes                    
 skewed_col_names          
 skewed_col_value_loc_map  
 skewed_string_list        
 skewed_string_list_values 
 skewed_values             
 sort_cols                 
 tab_col_stats             
 table_params              
 tbls                      
 version                   
(29 rows)

Query 20170924_152420_00008_r5qdw, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [29 rows, 737B] [66 rows/s, 1.66KB/s]
presto> desc mysql.hive.version;
     Column      |     Type     | Extra | Comment 
-----------------+--------------+-------+---------
 ver_id          | bigint       |       |         
 schema_version  | varchar(127) |       |         
 version_comment | varchar(255) |       |         
(3 rows)

Query 20170924_152526_00009_r5qdw, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [3 rows, 215B] [2 rows/s, 173B/s]
presto> select * from mysql.hive.version;
 ver_id | schema_version |             version_comment             
--------+----------------+-----------------------------------------
      1 | 1.2.0          | Set by MetaStore [email protected] 
(1 row)

Query 20170924_152559_00010_r5qdw, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:02 [1 rows, 0B] [0 rows/s, 0B/s]
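With several catalogs configured, a single query can join across sources; a hedged sketch combining the Hive table from earlier with this MySQL table:

./presto --server hadoop01:8080 --execute \
    "SELECT s.student_id, v.schema_version FROM hive.default.student s CROSS JOIN mysql.hive.version v;"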

Limitations of using MySQL through Presto:
[Image: table of MySQL connector limitations]

Problems

Error 01

java.lang.IllegalArgumentException: No factory for connector hive
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
    at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:170)
    at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:99)
    at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:77)
    at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:120)
    at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:67)

This happens because the Hive connector name was written incorrectly. Again, it must be hive-hadoop2; Presto does not recognize any other name.

Error 02

2017-09-24T22:25:57.586+0800    ERROR   main    com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:

1) Error: Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)

2) Configuration property 'ordinator' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)

2 errors
com.google.inject.CreationException: Unable to create injector, see the following errors:

1) Error: Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)

2) Configuration property 'ordinator' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)

2 errors
    at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:466)
    at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
    at com.google.inject.Guice.createInjector(Guice.java:96)
    at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:242)
    at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:116)
    at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:67)

Fix the node properties: the names were malformed. Use a plain alphanumeric string for node.environment and digits for node.id. Also check every configuration file for misspelled or truncated property names. In my case, I dropped letters while copying the configuration, turning coordinator into ordinator.
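A quick hedged way to spot this kind of typo is to grep the configs on each node:

grep -n "coordinator" etc/config.properties    # should print coordinator=..., not a truncated name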

Error 03

2017-09-24T22:40:08.513+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Cannot connect to discovery server for announce: Announcement failed for http://192.169.137.101:8080
2017-09-24T22:40:08.514+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 1.16s. Next request will happen within 0.00s
2017-09-24T22:40:09.278+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 744.71ms. Next request will happen within 1.00ms
2017-09-24T22:40:09.539+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 258.22ms. Next request will happen within 2.00ms
2017-09-24T22:40:10.282+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 739.25ms. Next request will happen within 4.00ms
2017-09-24T22:40:10.515+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 223.25ms. Next request will happen within 8.00ms
2017-09-24T22:40:10.562+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 30.86ms. Next request will happen within 16.00ms
2017-09-24T22:40:11.293+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 698.12ms. Next request will happen within 32.00ms
2017-09-24T22:40:11.534+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 175.65ms. Next request will happen within 64.00ms
2017-09-24T22:40:12.362+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 699.87ms. Next request will happen within 128.00ms
2017-09-24T22:40:12.665+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 45.73ms. Next request will happen within 256.00ms
2017-09-24T22:40:13.600+0800  ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 422.26ms. Next request will happen within 512.00ms
2017-09-24T22:40:14.667+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 66.66ms. Next request will happen within 1000.00ms
2017-09-24T22:40:16.182+0800  ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 514.05ms. Next request will happen within 1000.00ms
2017-09-24T22:40:17.604+0800  ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 421.29ms. Next request will happen within 1000.00ms

Changing the IP address to the domain or hostname fixes this; the discovery URI is not recognized when written as an IP.
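If the hostname does not resolve on the workers, map it on every node (a sketch, assuming you manage names via /etc/hosts rather than DNS):

echo "192.168.137.101 hadoop01" | sudo tee -a /etc/hosts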

Also, if the coordinator node is restarted, the other nodes first report that it cannot be found, and then recover:
[Image: worker node log output during a coordinator restart]
Once the Presto engine is configured, data in any of these sources can be queried through the same engine. For a Windows GUI similar to Oracle's SQL/PL tools, consider SQuirreL SQL Client; see:
Connecting to Presto with SQuirreL SQL Client (installation and configuration):
http://blog.csdn.net/ld326/article/details/77987507

[Author: happyprince, http://blog.csdn.net/ld326/article/details/78088863]
