Querying Data in Greenplum

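1 Querying Data
1.1 System Catalog
1) All system catalog objects are stored in the pg_catalog schema;
2) Standard PostgreSQL catalog objects are prefixed with pg_;
3) GP-specific catalog objects are prefixed with gp_;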

devdw=# select * from gp_configuration;
 content | definedprimary | dbid | isprimary | valid | hostname | port | datadir 
---------+----------------+------+-----------+-------+----------+------+---------
(0 rows)
devdw=# select * from gp_pgdatabase;
 dbid | isprimary | content | valid | definedprimary 
------+-----------+---------+-------+----------------
    1 | t         |      -1 | t     | t
    2 | t         |       0 | t     | t
    3 | t         |       1 | t     | t
(3 rows)

4) In psql, \dtS lists all system catalog tables and \dvS lists all system views:

devdw=# \dtS     (list all system catalog tables)
                                            List of relations
   Schema   |                              Name                               | Type  |  Owner  | Storage 
------------+-----------------------------------------------------------------+-------+---------+---------
 pg_catalog | gp_configuration                                                | table | gpadmin | heap
 pg_catalog | gp_configuration_history                                        | table | gpadmin | heap
 pg_catalog | gp_db_interfaces                                                | table | gpadmin | heap
 pg_catalog | gp_distribution_policy                                          | table | gpadmin | heap
 pg_catalog | gp_fastsequence                                                 | table | gpadmin | heap
 pg_catalog | gp_fault_strategy                                               | table | gpadmin | heap
 pg_catalog | gp_global_sequence                                              | table | gpadmin | heap
 pg_catalog | gp_id                                                           | table | gpadmin | heap
 pg_catalog | gp_interfaces                                                   | table | gpadmin | heap
 pg_catalog | gp_master_mirroring                                             | table | gpadmin | heap
 pg_catalog | gp_persistent_database_node                                     | table | gpadmin | heap
 pg_catalog | gp_persistent_filespace_node                                    | table | gpadmin | heap
 pg_catalog | gp_persistent_relation_node                                     | table | gpadmin | heap
 pg_catalog | gp_persistent_tablespace_node                                   | table | gpadmin | heap
 pg_catalog | gp_relation_node                                                | table | gpadmin | heap
 pg_catalog | gp_san_configuration                                            | table | gpadmin | heap
 pg_catalog | gp_segment_configuration                                        | table | gpadmin | heap
 pg_catalog | gp_verification_history                                         | table | gpadmin | heap
 pg_catalog | gp_version_at_initdb                                            | table | gpadmin | heap
 pg_catalog | pg_aggregate                                                    | table | gpadmin | heap
 pg_catalog | pg_am                                                           | table | gpadmin | heap
 pg_catalog | pg_amop                                                         | table | gpadmin | heap
 pg_catalog | pg_amproc                                                       | table | gpadmin | heap
 pg_catalog | pg_appendonly                                                   | table | gpadmin | heap
 pg_catalog | pg_appendonly_alter_column                                      | table | gpadmin | heap
 pg_catalog | pg_attrdef                                                      | table | gpadmin | heap
 pg_catalog | pg_attribute                                                    | table | gpadmin | heap
 pg_catalog | pg_attribute_encoding                                           | table | gpadmin | heap
 pg_catalog | pg_auth_members                                                 | table | gpadmin | heap
 pg_catalog | pg_auth_time_constraint                                         | table | gpadmin | heap
 pg_catalog | pg_authid                                                       | table | gpadmin | heap
 pg_catalog | pg_autovacuum                                                   | table | gpadmin | heap
 pg_catalog | pg_cast                                                         | table | gpadmin | heap
 pg_catalog | pg_class                                                        | table | gpadmin | heap
 pg_catalog | pg_compression                                                  | table | gpadmin | heap
 pg_catalog | pg_constraint                                                   | table | gpadmin | heap
 pg_catalog | pg_conversion                                                   | table | gpadmin | heap
 pg_catalog | pg_database                                                     | table | gpadmin | heap
 pg_catalog | pg_depend                                                       | table | gpadmin | heap
 pg_catalog | pg_description                                                  | table | gpadmin | heap
 pg_catalog | pg_extprotocol                                                  | table | gpadmin | heap
 pg_catalog | pg_exttable                                                     | table | gpadmin | heap
 pg_catalog | pg_filespace                                                    | table | gpadmin | heap
 pg_catalog | pg_filespace_entry                                              | table | gpadmin | heap
 pg_catalog | pg_foreign_data_wrapper                                         | table | gpadmin | heap
 pg_catalog | pg_foreign_server                                               | table | gpadmin | heap
 pg_catalog | pg_foreign_table                                                | table | gpadmin | heap
 pg_catalog | pg_index                                                        | table | gpadmin | heap
 pg_catalog | pg_inherits                                                     | table | gpadmin | heap
 pg_catalog | pg_language                                                     | table | gpadmin | heap
 pg_catalog | pg_largeobject                                                  | table | gpadmin | heap
 pg_catalog | pg_listener                                                     | table | gpadmin | heap
 pg_catalog | pg_namespace                                                    | table | gpadmin | heap
 pg_catalog | pg_opclass                                                      | table | gpadmin | heap
 pg_catalog | pg_operator                                                     | table | gpadmin | heap
 pg_catalog | pg_partition                                                    | table | gpadmin | heap
 pg_catalog | pg_partition_encoding                                           | table | gpadmin | heap
 pg_catalog | pg_partition_rule                                               | table | gpadmin | heap
 pg_catalog | pg_pltemplate                                                   | table | gpadmin | heap
 pg_catalog | pg_proc                                                         | table | gpadmin | heap
 pg_catalog | pg_proc_callback                                                | table | gpadmin | heap
 pg_catalog | pg_resourcetype                                                 | table | gpadmin | heap
 pg_catalog | pg_resqueue                                                     | table | gpadmin | heap
 pg_catalog | pg_resqueuecapability                                           | table | gpadmin | heap
 pg_catalog | pg_rewrite                                                      | table | gpadmin | heap
 pg_catalog | pg_shdepend                                                     | table | gpadmin | heap
 pg_catalog | pg_shdescription                                                | table | gpadmin | heap
 pg_catalog | pg_stat_last_operation                                          | table | gpadmin | heap
 pg_catalog | pg_stat_last_shoperation                                        | table | gpadmin | heap
 pg_catalog | pg_statistic                                                    | table | gpadmin | heap
 pg_catalog | pg_tablespace                                                   | table | gpadmin | heap
 pg_catalog | pg_trigger                                                      | table | gpadmin | heap
 pg_catalog | pg_type                                                         | table | gpadmin | heap
 pg_catalog | pg_type_encoding                                                | table | gpadmin | heap
 pg_catalog | pg_user_mapping                                                 | table | gpadmin | heap
 pg_catalog | pg_window                                                       | table | gpadmin | heap

devdw=# \dvS   (list all system views)
                          List of relations
   Schema   |             Name             | Type |  Owner  | Storage 
------------+------------------------------+------+---------+---------
 pg_catalog | gp_distributed_log           | view | gpadmin | none
 pg_catalog | gp_distributed_xacts         | view | gpadmin | none
 pg_catalog | gp_pgdatabase                | view | gpadmin | none
 pg_catalog | gp_transaction_log           | view | gpadmin | none
 pg_catalog | pg_cursors                   | view | gpadmin | none
 pg_catalog | pg_group                     | view | gpadmin | none
 pg_catalog | pg_indexes                   | view | gpadmin | none
 pg_catalog | pg_locks                     | view | gpadmin | none
 pg_catalog | pg_max_external_files        | view | gpadmin | none
 pg_catalog | pg_partition_columns         | view | gpadmin | none
 pg_catalog | pg_partition_templates       | view | gpadmin | none
 pg_catalog | pg_partitions                | view | gpadmin | none
 pg_catalog | pg_prepared_statements       | view | gpadmin | none
 pg_catalog | pg_prepared_xacts            | view | gpadmin | none
 pg_catalog | pg_resqueue_attributes       | view | gpadmin | none
 pg_catalog | pg_resqueue_status           | view | gpadmin | none
 pg_catalog | pg_roles                     | view | gpadmin | none
 pg_catalog | pg_rules                     | view | gpadmin | none
 pg_catalog | pg_settings                  | view | gpadmin | none
 pg_catalog | pg_shadow                    | view | gpadmin | none
 pg_catalog | pg_stat_activity             | view | gpadmin | none
 pg_catalog | pg_stat_all_indexes          | view | gpadmin | none
 pg_catalog | pg_stat_all_tables           | view | gpadmin | none
 pg_catalog | pg_stat_database             | view | gpadmin | none
 pg_catalog | pg_stat_operations           | view | gpadmin | none
 pg_catalog | pg_stat_partition_operations | view | gpadmin | none
 pg_catalog | pg_stat_resqueues            | view | gpadmin | none
 pg_catalog | pg_stat_sys_indexes          | view | gpadmin | none
 pg_catalog | pg_stat_sys_tables           | view | gpadmin | none
 pg_catalog | pg_stat_user_indexes         | view | gpadmin | none
 pg_catalog | pg_stat_user_tables          | view | gpadmin | none
 pg_catalog | pg_statio_all_indexes        | view | gpadmin | none
 pg_catalog | pg_statio_all_sequences      | view | gpadmin | none
 pg_catalog | pg_statio_all_tables         | view | gpadmin | none
 pg_catalog | pg_statio_sys_indexes        | view | gpadmin | none
 pg_catalog | pg_statio_sys_sequences      | view | gpadmin | none
 pg_catalog | pg_statio_sys_tables         | view | gpadmin | none
 pg_catalog | pg_statio_user_indexes       | view | gpadmin | none
 pg_catalog | pg_statio_user_sequences     | view | gpadmin | none
 pg_catalog | pg_statio_user_tables        | view | gpadmin | none
 pg_catalog | pg_stats                     | view | gpadmin | none
 pg_catalog | pg_tables                    | view | gpadmin | none
 pg_catalog | pg_timezone_abbrevs          | view | gpadmin | none
 pg_catalog | pg_timezone_names            | view | gpadmin | none
 pg_catalog | pg_user                      | view | gpadmin | none
 pg_catalog | pg_user_mappings             | view | gpadmin | none
 pg_catalog | pg_views                     | view | gpadmin | none
(47 rows)
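
The same listing can also be produced with an ordinary query instead of the psql shortcuts. The following is a minimal sketch, assuming only the standard pg_class and pg_namespace catalogs that appear in the \dtS output; the relkind codes 'r' (table) and 'v' (view) are standard PostgreSQL values, and the column aliases are illustrative.

```sql
-- List the GP-specific catalog tables and views (names starting with gp_).
SELECT n.nspname AS schema_name,
       c.relname AS object_name,
       CASE c.relkind WHEN 'r' THEN 'table' ELSE 'view' END AS object_type
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind IN ('r', 'v')     -- ordinary tables and views only
  AND c.relname ~ '^gp_'          -- regular expression match on the gp_ prefix
ORDER BY c.relname;
```

Changing the pattern to '^pg_' would list the standard PostgreSQL catalog objects instead.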

1.2 SQL Syntax
For details, see [GP SQL Language.zip]

1.3 Data Types
1.3.1 Common Data Types
1) Character types: char, varchar, text
2) Numeric types: smallint, integer, bigint, numeric, real, double precision
3) Date/time types: timestamp, date, time
4) Boolean type: boolean
5) Array types: for example integer[]
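
As an illustration, the types above can be combined in one table definition. This is only a sketch: the table name tb_types_demo and all column names are hypothetical, and the choice of id as the distribution key is an assumption.

```sql
-- Hypothetical table that uses each of the common data types once.
CREATE TABLE tb_types_demo (
    id          integer,
    code        char(4),
    name        varchar(50),
    note        text,
    small_n     smallint,
    big_n       bigint,
    price       numeric(10,2),
    ratio       real,
    score       double precision,
    created_ts  timestamp,
    birth_date  date,
    open_time   time,
    is_active   boolean,
    tags        integer[]
) DISTRIBUTED BY (id);

INSERT INTO tb_types_demo
VALUES (1, 'A001', 'demo', 'free text', 1, 10000000000, 12.50, 0.5, 0.25,
        timestamp '2013-01-01 10:00:00', date '2013-01-01', time '10:00:00',
        true, ARRAY[1, 2, 3]);
```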

1.3.2 Other Data Types
For details, see [Data Types.zip]

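1.4 SQL Value Expressions
1.4.1 Overview
1) Value expressions are used in many parts of a command, for example in query conditions;
2) The result of a value expression is called a scalar;
3) The expression syntax lets you compute result values from primitive values through arithmetic, logical, aggregate, and other operations.
1.4.2 Types of Value Expressions
1) Column reference: written as correlation.columnname, for example tb01.a
2) Positional parameter reference: identifies a parameter supplied to an SQL statement from outside; used in SQL function definitions. Format: $number, for example $1
3) Subscript expression: selects an element of an array value. Format: expression[subscript];
4) Field selection expression: if an expression yields a value of a composite type (row type), a specific field of the row can be extracted with expression.fieldname, for example $1.mycolumn;
5) Operator invocation, in three forms:
a) expression operator expression (binary infix operator);
b) operator expression (unary prefix operator);
c) expression operator (unary postfix operator);
6) Function call: the function name followed by its arguments. Format: function(expression [, expression ...]). For example, md5('abc') computes the MD5 hash of the string abc;
7) Aggregate expression: applies an aggregate function across the rows selected by a query. Format: aggregate_name(expression [, ...]) [FILTER (WHERE condition)], for example count(*);
8) Window expression
Lets you build complex online analytical processing (OLAP) queries with standard SQL commands;
A window expression applies a window function over a window frame defined by the OVER clause;
Syntax: window_function([expression [, ...]]) OVER (window_spec)
A window specification consists of:
a PARTITION BY clause, which determines how rows are grouped into partitions
an ORDER BY clause, which sets the ordering of rows within each partition
a ROWS/RANGE clause, which defines the window frame
For example: select distinct id,sum(year) over (partition by id) from tb_cp_02;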

devdw=# select distinct id,sum(year) over (partition by id) from tb_cp_02;
 id | sum 
----+-----
  1 |   6
  2 |   8
  5 |  14
(3 rows)
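
The window specification can also include ORDER BY and a ROWS frame. The sketch below computes a running total per id; it assumes, as in the example above, that tb_cp_02 has an id column and a numeric year column. The alias running_sum is illustrative.

```sql
-- Running sum of year within each id, in ascending order of year.
SELECT id,
       year,
       sum(year) OVER (PARTITION BY id
                       ORDER BY year
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM tb_cp_02
ORDER BY id, year;
```

Unlike the DISTINCT query above, this returns one output row per input row, because a window function does not collapse the rows of a partition.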

9) Type cast
Converts a value from one data type to another. There are two forms:

CAST(expression AS type)
expression::type

For example:

'2014-01-01'::date

10) Scalar subquery
An ordinary SELECT query in parentheses that returns exactly one row with one column.
For example:

select id, (select oid from pg_class where relname='tb_cp_02') as oid from tb_cp_02;

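11) Correlated subquery
A subquery that references values from the outer query to produce its result.
For example:

select * from tb01 where exists (select 1 from tb03 where tb01.a=tb03.a);

GP executes a correlated subquery in one of two ways:
1. The correlated subquery is unnested into a join operation;
2. The correlated subquery is executed once for every row of the outer query.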
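
To make the first strategy concrete, the EXISTS query above can be written by hand as a join. This is only a sketch of an equivalent formulation, not a claim about the plan GP actually generates; the alias t is illustrative.

```sql
-- Hand-written join equivalent of the EXISTS query:
-- de-duplicating tb03.a first keeps each tb01 row from being returned more than once.
SELECT tb01.*
FROM tb01
JOIN (SELECT DISTINCT a FROM tb03) t ON tb01.a = t.a;
```

Running EXPLAIN on the original EXISTS query shows which of the two strategies the planner chose.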
12) Array constructor
An expression that builds an array value from values for its member elements. For example:

CREATE TABLE tb_ar_01(a int[], b int[]);
INSERT INTO tb_ar_01 VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);
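
Elements of the arrays inserted above can be read back with subscript expressions. The sketch assumes tb_ar_01 contains only the single row inserted above; array subscripts start at 1 by default.

```sql
SELECT a[1][2]       AS a_1_2,   -- 2: first row, second column of a
       b[2][1]       AS b_2_1,   -- 7: second row, first column of b
       array_dims(a) AS dims     -- [1:2][1:2]
FROM tb_ar_01;
```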

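13) Row constructor
An expression that builds a row value (also called a composite value) from values for its member fields.
For example:

select row(1,2.5,'this is a test');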
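
Positional parameters and field selection, described earlier in this list, are easiest to see inside an SQL function. The following is a minimal sketch; the names add_one, point2d, and get_x are hypothetical and exist only for illustration.

```sql
-- Positional parameter: $1 refers to the first argument of the function.
CREATE FUNCTION add_one(integer) RETURNS integer AS $$
    SELECT $1 + 1;
$$ LANGUAGE sql IMMUTABLE;

-- Field selection: $1.x extracts field x from a composite-typed argument.
CREATE TYPE point2d AS (x integer, y integer);

CREATE FUNCTION get_x(point2d) RETURNS integer AS $$
    SELECT $1.x;
$$ LANGUAGE sql IMMUTABLE;

SELECT add_one(41);                -- 42
SELECT get_x(ROW(3, 4)::point2d);  -- 3
```

The second call also uses a row constructor and a type cast to build the composite argument.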

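1.5 Using Functions and Operators
1.5.1 Function Types
1) Immutable functions
GP fully supports all immutable functions;
An immutable function relies only on the arguments passed to it directly and always returns the same result for the same argument values.
For example:

mod(10,3)

2) Stable functions
A function whose result depends on database lookups or on configuration parameter values is stable: its result stays the same within a single table scan but may change between statements.
For example: to_char(), current_date
3) Volatile functions
The function value can change even within a single table scan.
For example: random(), setval()
Notes: 1. A volatile function that contains SQL statements or modifies the database cannot be executed on the segments;
2. Volatile and stable functions can be used safely on the master.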
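
The volatility category of a user-defined function is declared in CREATE FUNCTION. The sketch below declares one function of each kind; the names double_it, days_since, and dice are hypothetical. If no category is given, a function is treated as VOLATILE.

```sql
-- IMMUTABLE: depends only on its argument.
CREATE OR REPLACE FUNCTION double_it(integer) RETURNS integer AS $$
    SELECT $1 * 2;
$$ LANGUAGE sql IMMUTABLE;

-- STABLE: depends on current_date, which is fixed within a statement.
CREATE OR REPLACE FUNCTION days_since(date) RETURNS integer AS $$
    SELECT current_date - $1;
$$ LANGUAGE sql STABLE;

-- VOLATILE: random() can return a different value on every call.
CREATE OR REPLACE FUNCTION dice() RETURNS integer AS $$
    SELECT 1 + floor(random() * 6)::integer;
$$ LANGUAGE sql VOLATILE;

SELECT double_it(21), days_since(date '2013-01-01'), dice();
```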
4) Date functions

extract(day|month|year from date '2013-01-01');
date '2013-01-01' + interval '1 day'
date_part('day', timestamp '2013-01-16 20:38:40');
date_trunc('hour', timestamp '2013-02-16 20:38:40');
pg_sleep(seconds);

5) Current date and time

current_date, current_time, now(), timeofday()

6) Data type formatting functions

to_char(125, '999'), to_timestamp()

7) String functions

substr, trim, length, lpad, replace, upper, position, ||

8) Pattern matching
LIKE, SIMILAR TO (SQL regular expressions), ~ (POSIX regular expressions)
9) Conditional expressions

case, coalesce, nullif, greatest

10) Other expressions

exists, in, not in, all, any, sum, min

11) Window functions

ROW_NUMBER () OVER ( [PARTITION BY expr] ORDER BY expr )
RANK () OVER ( [PARTITION BY expr] ORDER BY expr ) 
FIRST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )

12) Advanced analytic functions
GP provides some advanced analytic functions that are not available in PostgreSQL. For example:

sum(array[])
matrix_transpose(array[[1,1,1],[2,2,2]])
mregr_tstats(y, array[1, x1, x2])
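
Several of the standard built-in functions listed in this section can be tried directly from psql, since none of them needs a table. The statements below are a small sketch using literal values only.

```sql
-- Date functions
SELECT extract(year FROM date '2013-01-01');                 -- 2013
SELECT date '2013-01-01' + interval '1 day';                 -- 2013-01-02 00:00:00
SELECT date_trunc('hour', timestamp '2013-02-16 20:38:40');  -- 2013-02-16 20:00:00

-- String functions
SELECT lpad('7', 3, '0');                  -- 007
SELECT upper('gp') || '-' || length('gp'); -- GP-2
SELECT position('plum' IN 'greenplum');    -- 6

-- Pattern matching
SELECT 'abc' LIKE 'a%', 'abc' ~ '^a.c$';   -- t, t

-- Conditional expressions
SELECT coalesce(NULL, 'x'), nullif(1, 1), greatest(1, 5, 3);  -- x, NULL, 5
```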

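1.5.2 User-Defined Functions
1) GP supports functions written in SQL, Python, Perl, and C. The following focuses on SQL stored procedures, written here in PL/pgSQL.
A stored procedure runs as a single transaction; calls to sub-procedures run inside the same transaction.
Structure of a stored procedure:

CREATE OR REPLACE FUNCTION increment(i integer) RETURNS integer AS $$
DECLARE
    j int := 100;      -- local variable with an initial value
BEGIN
    RETURN i * j;      -- returns the argument multiplied by j
END;
$$ LANGUAGE plpgsql;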
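
Once created, the function is called like any built-in function. A short usage sketch:

```sql
SELECT increment(5);   -- 500, because the body returns i * j with j = 100
```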

Assignment
To assign a value to a variable or to a row/record field, use: identifier := expression
Example:

user_id := 20;

Executing a query whose result is discarded:

PERFORM query;

Example:

PERFORM create_mv('cs_session_page_requests_mv', my_query);

2) Dynamic SQL

EXECUTE command-string [INTO [STRICT] target];
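
A minimal sketch of EXECUTE ... INTO inside a function follows. The function name count_rows and its parameter are hypothetical; quote_ident is the standard function for quoting an identifier safely before it is concatenated into the command string.

```sql
-- Count the rows of a table whose name is only known at run time.
CREATE OR REPLACE FUNCTION count_rows(tbl text) RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    -- Build the statement as a string and store its single result in n.
    EXECUTE 'SELECT count(*) FROM ' || quote_ident(tbl) INTO n;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

SELECT count_rows('tb_cp_02');
```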

3) SELECT INTO
Example:

SELECT ID INTO VAR_ID FROM TABLEA;

4) Getting the result status

GET DIAGNOSTICS variable = item [, ...];

Example:

GET DIAGNOSTICS integer_var = ROW_COUNT;
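
In context, GET DIAGNOSTICS usually follows a data-modifying statement. The sketch below assumes a hypothetical table tb_demo and a hypothetical function bump_val; neither comes from the original text.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE tb_demo (id integer, val integer) DISTRIBUTED BY (id);

CREATE OR REPLACE FUNCTION bump_val(p_id integer) RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    UPDATE tb_demo SET val = val + 1 WHERE id = p_id;
    GET DIAGNOSTICS n = ROW_COUNT;   -- number of rows changed by the UPDATE
    RETURN n;
END;
$$ LANGUAGE plpgsql;

SELECT bump_val(1);
```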

5) Control structures

IF ... THEN ... ELSEIF ... THEN ... ELSE
LOOP, EXIT, CONTINUE, WHILE, FOR 
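
A small sketch combining a conditional with a loop follows. The function name sum_to is hypothetical. It uses the spelling ELSIF, for which ELSEIF is an accepted alias.

```sql
CREATE OR REPLACE FUNCTION sum_to(n integer) RETURNS integer AS $$
DECLARE
    total integer := 0;
    i     integer := 0;
BEGIN
    IF n IS NULL THEN
        RETURN NULL;
    ELSIF n <= 0 THEN
        RETURN 0;
    END IF;

    -- Add 1, 2, ..., n to the running total.
    WHILE i < n LOOP
        i := i + 1;
        total := total + i;
    END LOOP;

    RETURN total;
END;
$$ LANGUAGE plpgsql;

SELECT sum_to(4);   -- 1 + 2 + 3 + 4 = 10
```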

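6) Returning from a function
Two commands return data from a function: RETURN and RETURN NEXT. RETURN ends the function and returns the value of the expression; RETURN NEXT adds a row to the result of a set-returning function without ending it.

Syntax: RETURN expression;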
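
RETURN NEXT is used in functions declared as RETURNS SETOF. A minimal sketch, with the hypothetical function name first_n:

```sql
CREATE OR REPLACE FUNCTION first_n(n integer) RETURNS SETOF integer AS $$
BEGIN
    FOR i IN 1..n LOOP
        RETURN NEXT i;   -- append i to the result set and keep going
    END LOOP;
    RETURN;              -- end the function; no more rows
END;
$$ LANGUAGE plpgsql;

SELECT * FROM first_n(3);   -- returns three rows: 1, 2, 3
```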

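7) Setting callbacks

EXEC SQL WHENEVER condition action;

Note that this statement is embedded SQL (ECPG) syntax for C programs, not PL/pgSQL.

condition can be one of:

SQLERROR, SQLWARNING, NOT FOUND

action can be one of:

CONTINUE, DO BREAK, STOP, GO TO label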

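8) Exception handling

BEGIN
    EXECUTE command_string;   -- command_string is a placeholder for the statement to run
EXCEPTION WHEN unique_violation THEN
    NULL;                     -- do nothing
END;

Catching any error (this handler logs a notice and then re-raises the error):

EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'an EXCEPTION is about to be raised';
    RAISE EXCEPTION 'NUM:%, DETAILS:%', SQLSTATE, SQLERRM;
END;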
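
A complete function shows where the EXCEPTION clause sits within a block. The sketch below uses the standard condition name division_by_zero; the function name safe_div is hypothetical.

```sql
CREATE OR REPLACE FUNCTION safe_div(a numeric, b numeric) RETURNS numeric AS $$
BEGIN
    RETURN a / b;
EXCEPTION WHEN division_by_zero THEN
    -- The error is caught here, so the calling statement does not fail.
    RAISE NOTICE 'division by zero, returning NULL';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

SELECT safe_div(10, 2);   -- normal result
SELECT safe_div(1, 0);    -- NULL, with a NOTICE instead of an error
```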

9) Errors and messages

RAISE level 'format' [, expression [, ...]];

Level:

INFO: informational output
NOTICE: a notice for the client
EXCEPTION: raises an error and aborts the stored procedure
Example: RAISE NOTICE 'Calling cs_create_job(%)', v_job_id;

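1.5.3 Query Performance
1) GP supports dynamic partition elimination and query memory optimization;
2) While optimizing a query, GP dynamically eliminates irrelevant partitions and allocates memory for the different operators;
3) Dynamic partition elimination
In GP, values that are available only when the query runs are used to prune partitions dynamically;
This is controlled by the server parameter gp_dynamic_partition_pruning, which defaults to on.
4) Memory optimization
GP allocates memory optimally for the different operators in a query;
It frees and reallocates memory quickly across the different stages of a single query.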
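
The current value of the parameter named above can be checked from psql. This is a sketch only; whether the parameter can also be changed at session level with SET is an assumption that may depend on the GP version.

```sql
-- Show the current setting (expected to be on by default).
SHOW gp_dynamic_partition_pruning;

-- Assumption: the parameter is settable per session in this GP version.
SET gp_dynamic_partition_pruning = on;
```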
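1.5.4 Query Analysis
1) GP generates a query plan for every query;
2) Make sure the query and the data structures lead to a suitable query plan;
3) Examining the query plan helps identify tuning opportunities;
4) The optimizer uses statistics to choose the plan with the lowest estimated cost;
5) Cost is an estimate of I/O and CPU consumption, measured in units of disk page fetches;
6) Use EXPLAIN to see the estimated query plan, for example:

EXPLAIN SELECT * FROM tb_cp_02 WHERE date='2013-01-01';

7) EXPLAIN ANALYZE actually executes the statement and reports the actual execution statistics. For example:

EXPLAIN ANALYZE SELECT * FROM tb_cp_02 WHERE date='2013-01-01';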
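
Because the optimizer relies on statistics, it is common to refresh them before reading a plan. A short sketch, assuming the same tb_cp_02 table and date column used above:

```sql
-- Collect fresh statistics for the table, then look at the estimated plan.
ANALYZE tb_cp_02;

EXPLAIN SELECT * FROM tb_cp_02 WHERE date = '2013-01-01';
```

Comparing the estimated row counts from EXPLAIN with the actual row counts from EXPLAIN ANALYZE is one way to spot stale statistics.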
