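1 Querying Data
1.1 System Catalog
1) All system catalog tables are stored in the pg_catalog schema;
2) Standard PostgreSQL catalog objects are prefixed with pg_;
3) Greenplum (GP) specific catalog objects are prefixed with gp_;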
devdw=# select * from gp_configuration;
content | definedprimary | dbid | isprimary | valid | hostname | port | datadir
---------+----------------+------+-----------+-------+----------+------+---------
(0 rows)
devdw=# select * from gp_pgdatabase;
dbid | isprimary | content | valid | definedprimary
------+-----------+---------+-------+----------------
1 | t | -1 | t | t
2 | t | 0 | t | t
3 | t | 1 | t | t
(3 rows)
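The gp_configuration query above returns no rows on this system. The \dtS listing in this section also shows a gp_segment_configuration table. As a minimal sketch, assuming that table carries the columns named below in your GP version (the column list is an assumption, not taken from this document), the segment layout could be inspected like this:

```sql
-- Assumption: gp_segment_configuration has these columns in your GP version.
-- content = -1 identifies the master; content >= 0 identifies the segments,
-- matching the content values shown in the gp_pgdatabase output above.
SELECT dbid, content, role, status, hostname, port
FROM gp_segment_configuration
ORDER BY content, dbid;
```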
4) In psql, \dtS lists all system catalog tables and \dvS lists all system views:
devdw=# \dtS     (lists all system catalog tables)
List of relations
Schema | Name | Type | Owner | Storage
------------+-----------------------------------------------------------------+-------+---------+---------
pg_catalog | gp_configuration | table | gpadmin | heap
pg_catalog | gp_configuration_history | table | gpadmin | heap
pg_catalog | gp_db_interfaces | table | gpadmin | heap
pg_catalog | gp_distribution_policy | table | gpadmin | heap
pg_catalog | gp_fastsequence | table | gpadmin | heap
pg_catalog | gp_fault_strategy | table | gpadmin | heap
pg_catalog | gp_global_sequence | table | gpadmin | heap
pg_catalog | gp_id | table | gpadmin | heap
pg_catalog | gp_interfaces | table | gpadmin | heap
pg_catalog | gp_master_mirroring | table | gpadmin | heap
pg_catalog | gp_persistent_database_node | table | gpadmin | heap
pg_catalog | gp_persistent_filespace_node | table | gpadmin | heap
pg_catalog | gp_persistent_relation_node | table | gpadmin | heap
pg_catalog | gp_persistent_tablespace_node | table | gpadmin | heap
pg_catalog | gp_relation_node | table | gpadmin | heap
pg_catalog | gp_san_configuration | table | gpadmin | heap
pg_catalog | gp_segment_configuration | table | gpadmin | heap
pg_catalog | gp_verification_history | table | gpadmin | heap
pg_catalog | gp_version_at_initdb | table | gpadmin | heap
pg_catalog | pg_aggregate | table | gpadmin | heap
pg_catalog | pg_am | table | gpadmin | heap
pg_catalog | pg_amop | table | gpadmin | heap
pg_catalog | pg_amproc | table | gpadmin | heap
pg_catalog | pg_appendonly | table | gpadmin | heap
pg_catalog | pg_appendonly_alter_column | table | gpadmin | heap
pg_catalog | pg_attrdef | table | gpadmin | heap
pg_catalog | pg_attribute | table | gpadmin | heap
pg_catalog | pg_attribute_encoding | table | gpadmin | heap
pg_catalog | pg_auth_members | table | gpadmin | heap
pg_catalog | pg_auth_time_constraint | table | gpadmin | heap
pg_catalog | pg_authid | table | gpadmin | heap
pg_catalog | pg_autovacuum | table | gpadmin | heap
pg_catalog | pg_cast | table | gpadmin | heap
pg_catalog | pg_class | table | gpadmin | heap
pg_catalog | pg_compression | table | gpadmin | heap
pg_catalog | pg_constraint | table | gpadmin | heap
pg_catalog | pg_conversion | table | gpadmin | heap
pg_catalog | pg_database | table | gpadmin | heap
pg_catalog | pg_depend | table | gpadmin | heap
pg_catalog | pg_description | table | gpadmin | heap
pg_catalog | pg_extprotocol | table | gpadmin | heap
pg_catalog | pg_exttable | table | gpadmin | heap
pg_catalog | pg_filespace | table | gpadmin | heap
pg_catalog | pg_filespace_entry | table | gpadmin | heap
pg_catalog | pg_foreign_data_wrapper | table | gpadmin | heap
pg_catalog | pg_foreign_server | table | gpadmin | heap
pg_catalog | pg_foreign_table | table | gpadmin | heap
pg_catalog | pg_index | table | gpadmin | heap
pg_catalog | pg_inherits | table | gpadmin | heap
pg_catalog | pg_language | table | gpadmin | heap
pg_catalog | pg_largeobject | table | gpadmin | heap
pg_catalog | pg_listener | table | gpadmin | heap
pg_catalog | pg_namespace | table | gpadmin | heap
pg_catalog | pg_opclass | table | gpadmin | heap
pg_catalog | pg_operator | table | gpadmin | heap
pg_catalog | pg_partition | table | gpadmin | heap
pg_catalog | pg_partition_encoding | table | gpadmin | heap
pg_catalog | pg_partition_rule | table | gpadmin | heap
pg_catalog | pg_pltemplate | table | gpadmin | heap
pg_catalog | pg_proc | table | gpadmin | heap
pg_catalog | pg_proc_callback | table | gpadmin | heap
pg_catalog | pg_resourcetype | table | gpadmin | heap
pg_catalog | pg_resqueue | table | gpadmin | heap
pg_catalog | pg_resqueuecapability | table | gpadmin | heap
pg_catalog | pg_rewrite | table | gpadmin | heap
pg_catalog | pg_shdepend | table | gpadmin | heap
pg_catalog | pg_shdescription | table | gpadmin | heap
pg_catalog | pg_stat_last_operation | table | gpadmin | heap
pg_catalog | pg_stat_last_shoperation | table | gpadmin | heap
pg_catalog | pg_statistic | table | gpadmin | heap
pg_catalog | pg_tablespace | table | gpadmin | heap
pg_catalog | pg_trigger | table | gpadmin | heap
pg_catalog | pg_type | table | gpadmin | heap
pg_catalog | pg_type_encoding | table | gpadmin | heap
pg_catalog | pg_user_mapping | table | gpadmin | heap
pg_catalog | pg_window | table | gpadmin | heap
devdw=# \dvS     (lists all system views)
List of relations
Schema | Name | Type | Owner | Storage
------------+------------------------------+------+---------+---------
pg_catalog | gp_distributed_log | view | gpadmin | none
pg_catalog | gp_distributed_xacts | view | gpadmin | none
pg_catalog | gp_pgdatabase | view | gpadmin | none
pg_catalog | gp_transaction_log | view | gpadmin | none
pg_catalog | pg_cursors | view | gpadmin | none
pg_catalog | pg_group | view | gpadmin | none
pg_catalog | pg_indexes | view | gpadmin | none
pg_catalog | pg_locks | view | gpadmin | none
pg_catalog | pg_max_external_files | view | gpadmin | none
pg_catalog | pg_partition_columns | view | gpadmin | none
pg_catalog | pg_partition_templates | view | gpadmin | none
pg_catalog | pg_partitions | view | gpadmin | none
pg_catalog | pg_prepared_statements | view | gpadmin | none
pg_catalog | pg_prepared_xacts | view | gpadmin | none
pg_catalog | pg_resqueue_attributes | view | gpadmin | none
pg_catalog | pg_resqueue_status | view | gpadmin | none
pg_catalog | pg_roles | view | gpadmin | none
pg_catalog | pg_rules | view | gpadmin | none
pg_catalog | pg_settings | view | gpadmin | none
pg_catalog | pg_shadow | view | gpadmin | none
pg_catalog | pg_stat_activity | view | gpadmin | none
pg_catalog | pg_stat_all_indexes | view | gpadmin | none
pg_catalog | pg_stat_all_tables | view | gpadmin | none
pg_catalog | pg_stat_database | view | gpadmin | none
pg_catalog | pg_stat_operations | view | gpadmin | none
pg_catalog | pg_stat_partition_operations | view | gpadmin | none
pg_catalog | pg_stat_resqueues | view | gpadmin | none
pg_catalog | pg_stat_sys_indexes | view | gpadmin | none
pg_catalog | pg_stat_sys_tables | view | gpadmin | none
pg_catalog | pg_stat_user_indexes | view | gpadmin | none
pg_catalog | pg_stat_user_tables | view | gpadmin | none
pg_catalog | pg_statio_all_indexes | view | gpadmin | none
pg_catalog | pg_statio_all_sequences | view | gpadmin | none
pg_catalog | pg_statio_all_tables | view | gpadmin | none
pg_catalog | pg_statio_sys_indexes | view | gpadmin | none
pg_catalog | pg_statio_sys_sequences | view | gpadmin | none
pg_catalog | pg_statio_sys_tables | view | gpadmin | none
pg_catalog | pg_statio_user_indexes | view | gpadmin | none
pg_catalog | pg_statio_user_sequences | view | gpadmin | none
pg_catalog | pg_statio_user_tables | view | gpadmin | none
pg_catalog | pg_stats | view | gpadmin | none
pg_catalog | pg_tables | view | gpadmin | none
pg_catalog | pg_timezone_abbrevs | view | gpadmin | none
pg_catalog | pg_timezone_names | view | gpadmin | none
pg_catalog | pg_user | view | gpadmin | none
pg_catalog | pg_user_mappings | view | gpadmin | none
pg_catalog | pg_views | view | gpadmin | none
(47 rows)
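\dtS and \dvS are psql shortcuts. Roughly the same information can be read directly from the catalog, which is handy in scripts. The query below is a sketch that joins pg_class and pg_namespace (both appear in the \dtS listing above); the output column aliases are made up for illustration.

```sql
-- relkind 'r' = ordinary table, 'v' = view
SELECT n.nspname AS schema_name,
       c.relname AS object_name,
       CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' END AS object_type
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind IN ('r', 'v')
ORDER BY object_type, object_name;
```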
1.2 SQL Syntax
For details, see [GP SQL Language.zip].
1.3 Data Types
1.3.1 Common Data Types
1) Character types: CHAR, VARCHAR, TEXT
2) Numeric types: SMALLINT, INTEGER, BIGINT,
NUMERIC, REAL, DOUBLE PRECISION
3) Date/time types: TIMESTAMP, DATE, TIME
4) Boolean type: BOOLEAN
5) Array types: for example, integer[]
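To make the list above concrete, here is a small sketch of a table that uses each of the common types. The table name tb_types_demo and all column names are hypothetical and exist only for illustration.

```sql
-- Hypothetical table used only to illustrate the common data types.
CREATE TABLE tb_types_demo (
    id         integer,
    code       char(4),
    name       varchar(50),
    note       text,
    small_qty  smallint,
    big_qty    bigint,
    price      numeric(10,2),
    ratio      real,
    score      double precision,
    created_at timestamp,
    birth_date date,
    open_time  time,
    is_active  boolean,
    tags       integer[]
) DISTRIBUTED BY (id);

-- One sample row; string literals are converted to the column types.
INSERT INTO tb_types_demo VALUES
    (1, 'A001', 'widget', 'free text', 5, 5000000000, 19.99, 0.5, 3.14159,
     '2013-01-01 08:30:00', '2013-01-01', '08:30:00', true, ARRAY[1,2,3]);
```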
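1.3.2 Other Data Types
For details, see [Data Types.zip].
1.4 SQL Value Expressions
1.4.1 Overview
1) Value expressions are used in many places in a query, such as the SELECT target list and search conditions;
2) The result of a value expression is called a scalar;
3) Expression syntax lets you compute result values from primitive values using arithmetic, logical, aggregate, and other operations;
1.4.2 Types of Value Expressions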
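1) Column reference: written as correlation.columnname, for example tb01.a
2) Positional parameter reference: refers to a value supplied externally to an SQL statement. Parameters are used in SQL function definitions. Format: $number, for example $1
3) Subscript expression: selects an element of an array value. Format: expression[subscript];
4) Field selection expression: if an expression yields a value of a composite type (row type), a specific field of the row can be extracted with expression.fieldname, for example $1.mycolumn;
5) Operator invocation: there are three forms:
a) expression operator expression (binary infix operator);
b) operator expression (unary prefix operator);
c) expression operator (unary postfix operator);
6) Function call: the function name followed by its arguments. Format: function(expression [, expression ...]). For example, md5('abc') computes the MD5 hash of the string abc (MD5 is a hash function, not encryption);
7) Aggregate expression: applies an aggregate function across the rows selected by a query. Format: aggregate_name(expression [, ...]) [FILTER (WHERE condition)], for example count(*);
8) Window expression
Window expressions let you build complex online analytical processing (OLAP) queries with standard SQL;
A window expression applies a window function over a window defined by the OVER clause;
Syntax: window_function([expression [, ...]]) OVER (window_spec)
A window specification consists of:
a PARTITION BY clause, which determines how rows are grouped into partitions;
an ORDER BY clause, which sets the row order within each partition;
a ROWS/RANGE clause, which defines the window frame.
For example, the following query sums year within each id partition: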
devdw=# select distinct id,sum(year) over (partition by id) from tb_cp_02;
id | sum
----+-----
1 | 6
2 | 8
5 | 14
(3 rows)
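9) Type cast
Converts a value from one data type to another. There are two equivalent forms:
CAST(expression AS type)
expression::type
For example:
'2014-01-01'::date
10) Scalar subquery
An ordinary SELECT query in parentheses that returns exactly one row with one column.
For example:
select id, (select oid from pg_class where relname='tb_cp_02') as oid from tb_cp_02;
11) Correlated subquery
A subquery that references values from the outer query to produce its result.
For example:
select * from tb01 where exists (select 1 from tb03 where tb01.a=tb03.a);
GP runs correlated subqueries in one of two ways:
1. The correlated subquery is unnested into a join (JOIN) operation;
2. The correlated subquery is executed once for each row of the outer query.
12) Array constructor
An expression that builds an array value from values for its member elements. For example:
CREATE TABLE tb_ar_01(a int[], b int[]);
INSERT INTO tb_ar_01 VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);
13) Row constructor
An expression that builds a row value (composite value) from values for its member fields.
For example:
select row(1,2.5,'this is a test');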
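Several of the expression types listed in 1.4.2 can be tried directly in psql. The statements below are a sketch: demo_pair is a hypothetical composite type created only for this illustration, and the id and year columns of tb_cp_02 are assumed from the window example earlier in this section.

```sql
-- Subscript expression: second element of an array (arrays are 1-based)
SELECT (ARRAY[10, 20, 30])[2];            -- 20

-- Operator invocations: binary infix and unary prefix
SELECT 2 + 3 AS infix, -(5) AS prefix;    -- 5, -5

-- Function call: MD5 hash of 'abc'
SELECT md5('abc');                        -- 900150983cd24fb0d6963f7d28e17f72

-- Type cast, both spellings
SELECT CAST('2014-01-01' AS date), '2014-01-01'::date;

-- Field selection from a composite value (demo_pair is hypothetical)
CREATE TYPE demo_pair AS (x integer, y text);
SELECT (ROW(1, 'one')::demo_pair).y;      -- one

-- Aggregate expression: one output row per id
SELECT id, count(*), sum(year) FROM tb_cp_02 GROUP BY id;

-- Window expression with PARTITION BY, ORDER BY and a ROWS frame:
-- a running total of year within each id
SELECT id, year,
       sum(year) OVER (PARTITION BY id ORDER BY year
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM tb_cp_02;
```

Unlike the aggregate query, the window query keeps every input row and attaches the computed value to it, which is why the earlier example needs DISTINCT to collapse duplicates.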
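1.5 Using Functions and Operators
1.5.1 Function Types
1) Immutable functions
GP fully supports all immutable functions;
An immutable function depends only on the arguments passed to it directly and always returns the same result for the same argument values.
For example:
mod(10,3)
2) Stable functions
Within a single table scan, a stable function returns the same result for the same argument values, but the result can change from one SQL statement to the next. Functions whose results depend on database lookups or on parameter settings (such as the current time zone) are stable.
For example: to_char(), current_date
3) Volatile functions
The function value can change even within a single table scan.
For example: random(), setval()
Notes: 1. A volatile function that contains SQL statements or modifies the database cannot be executed on the segments;
2. Volatile and stable functions are safe to use in statements that are evaluated and run on the master.
4) Date functions
extract(day from date '2013-01-01');    -- the field can also be month, year, and so on
date '2013-01-01' + interval '1 day'
date_part('day', timestamp '2013-01-16 20:38:40');
date_trunc('hour', timestamp '2013-02-16 20:38:40');
pg_sleep(seconds);
5) System date/time values
current_date, current_time, now(), timeofday()
6) Data type formatting functions
to_char(125, '999'), to_timestamp()
7) String functions
substr, trim, length, lpad, replace, upper, position, ||
8) Pattern matching
LIKE, SIMILAR TO (SQL regular expressions), ~ (POSIX regular expressions)
9) Conditional expressions
case, coalesce, nullif, greatest
10) Other expressions
exists, in, not in, all, any, sum, min
11) Window functions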
ROW_NUMBER () OVER ( [PARTITION BY expr] ORDER BY expr )
RANK () OVER ( [PARTITION BY expr] ORDER BY expr )
FIRST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )
12) Advanced analytic functions
GP provides some advanced analytic functions that are not available in PostgreSQL. For example:
sum(array[])
matrix_transpose(array[[1,1,1],[2,2,2]])
mregr_tstats(y, array[1, x1, x2])
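The built-in functions listed in 1.5.1 can be exercised with a few one-line queries. This is a sketch using only functions named above; the last query assumes tb_cp_02 has the id and year columns used in earlier examples.

```sql
-- Date/time functions
SELECT extract(month FROM date '2013-01-01');                       -- 1
SELECT date '2013-01-01' + interval '1 day';                        -- 2013-01-02 00:00:00
SELECT date_part('day', timestamp '2013-01-16 20:38:40');           -- 16
SELECT date_trunc('hour', timestamp '2013-02-16 20:38:40');         -- 2013-02-16 20:00:00

-- String functions and pattern matching
SELECT upper(substr('greenplum', 1, 5)) || lpad('7', 3, '0');       -- GREEN007
SELECT 'greenplum' LIKE 'green%', 'greenplum' ~ '^g.*m$';           -- t, t

-- Conditional expressions
SELECT coalesce(NULL, 'fallback'), nullif(1, 1), greatest(3, 9, 4); -- fallback, NULL, 9

-- Window functions: numbering and ranking rows within each id
SELECT id, year,
       row_number() OVER (PARTITION BY id ORDER BY year) AS rn,
       rank()       OVER (PARTITION BY id ORDER BY year) AS rnk
FROM tb_cp_02;
```

row_number() always yields distinct numbers within a partition, while rank() gives tied rows the same rank and leaves gaps after them.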
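1.5.2 User-Defined Functions
1) GP supports functions written in SQL, Python, Perl, and C. The following focuses on stored procedures (functions) written in PL/pgSQL.
A stored procedure runs as a single transaction; calls to sub-procedures run inside the same transaction.
Structure of a stored procedure: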
CREATE OR REPLACE FUNCTION increment(i integer) RETURNS integer AS $$
DECLARE
j int:=100;
BEGIN
RETURN i * j;
END;
$$ LANGUAGE plpgsql;
Assignment
To assign a value to a variable or to a row/record field, write: identifier := expression;
For example:
user_id := 20;
Executing a query and discarding its result:
PERFORM query;
For example:
PERFORM create_mv('cs_session_page_requests_mv', my_query);
2) Dynamic SQL
EXECUTE command-string [INTO [STRICT] target];
3) SELECT INTO
For example:
SELECT ID INTO VAR_ID FROM TABLEA;
4) Getting the result status
GET DIAGNOSTICS variable = item [, ...];
For example:
GET DIAGNOSTICS integer_var = ROW_COUNT;
5) Control structures
IF ... THEN ... ELSEIF ... THEN ... ELSE ... END IF;
LOOP, EXIT, CONTINUE, WHILE, FOR
6) Returning from a function
Two commands return data from a function: RETURN and RETURN NEXT.
Syntax: RETURN expression;
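7) Setting callbacks
EXEC SQL WHENEVER condition action;
Note that this is embedded SQL (ECPG) syntax for C programs, not a PL/pgSQL statement.
condition can be one of the following:
SQLERROR, SQLWARNING, NOT FOUND
action can be one of the following:
CONTINUE, DO BREAK, STOP, GO TO label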
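8) Exception handling
BEGIN
    EXECUTE command-string;
EXCEPTION WHEN unique_violation THEN
    NULL;  -- do nothing
END;
Catching any error, logging a notice, and re-raising it with details:
BEGIN
    -- statements that may fail
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'an EXCEPTION is about to be raised';
    RAISE EXCEPTION 'NUM:%, DETAILS:%', SQLSTATE, SQLERRM;
END;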
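9) Errors and messages
RAISE level 'format' [, expression [, ...]];
Levels:
INFO: informational message
NOTICE: notice message
EXCEPTION: raises an error and exits the stored procedure, unless the error is caught by an EXCEPTION block
Example: RAISE NOTICE 'Calling cs_create_job(%)', v_job_id;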
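The PL/pgSQL pieces above (assignment, dynamic SQL, control structures, RAISE, exception handling, RETURN and RETURN NEXT) can be combined as in the sketch below. Both functions, demo_count_rows and demo_squares, and their parameter names are hypothetical and written only for illustration.

```sql
-- Hypothetical function: counts the rows of the table whose name is passed in.
CREATE OR REPLACE FUNCTION demo_count_rows(p_table text) RETURNS bigint AS $$
DECLARE
    v_sql   text;
    v_count bigint := 0;
BEGIN
    -- Assignment; quote_ident() quotes the identifier safely
    v_sql := 'SELECT count(*) FROM ' || quote_ident(p_table);

    -- Dynamic SQL, storing the result in a variable
    EXECUTE v_sql INTO v_count;

    -- Control structure plus messages
    IF v_count = 0 THEN
        RAISE NOTICE 'table % is empty', p_table;
    ELSE
        RAISE NOTICE 'table % has % rows', p_table, v_count;
    END IF;

    RETURN v_count;
EXCEPTION
    -- Raised by EXECUTE when the table does not exist
    WHEN undefined_table THEN
        RAISE NOTICE 'table % does not exist', p_table;
        RETURN -1;
END;
$$ LANGUAGE plpgsql;

-- Hypothetical set-returning function: RETURN NEXT adds one row per call,
-- and the final RETURN ends the result set.
CREATE OR REPLACE FUNCTION demo_squares(p_n integer) RETURNS SETOF integer AS $$
BEGIN
    FOR i IN 1..p_n LOOP
        RETURN NEXT i * i;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT demo_count_rows('tb_cp_02');
SELECT * FROM demo_squares(3);   -- 1, 4, 9
```

The EXCEPTION block turns a missing table into a -1 return value instead of an aborted call; whether that is preferable to letting the error propagate depends on the caller.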
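1.5.3 Query Performance
1) GP supports dynamic partition elimination and query memory optimization;
2) While processing a query, GP dynamically eliminates irrelevant partitions and allocates memory among the query's operators;
3) Dynamic partition elimination
In GP, values that are known only when the query runs are used to prune partitions dynamically, so fewer partitions are scanned;
It is controlled by the server parameter gp_dynamic_partition_pruning, which defaults to on.
4) Memory optimization
GP allocates memory optimally among the different operators in a query;
Memory is freed and reallocated quickly between the stages of a single query.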
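As a sketch of where dynamic partition elimination can apply, consider a partitioned fact table joined to a small lookup table, so that the relevant partition keys are known only at run time. The tables demo_sales and demo_dates are hypothetical, and whether partitions are actually pruned depends on the GP version, the optimizer, and the table statistics.

```sql
-- Check the current setting and make sure it is on for this session
SHOW gp_dynamic_partition_pruning;
SET gp_dynamic_partition_pruning = on;

-- Hypothetical fact table with one partition per month of 2013
CREATE TABLE demo_sales (id int, sale_date date, amount numeric)
DISTRIBUTED BY (id)
PARTITION BY RANGE (sale_date)
( START (date '2013-01-01') INCLUSIVE
  END   (date '2014-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );

-- Hypothetical lookup table
CREATE TABLE demo_dates (d date, is_holiday boolean) DISTRIBUTED BY (d);

-- The dates that qualify are only known when the join runs,
-- so any partition pruning has to happen dynamically.
EXPLAIN
SELECT s.*
FROM demo_sales s
JOIN demo_dates dd ON s.sale_date = dd.d
WHERE dd.is_holiday;
```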
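1.5.4 Query Analysis
1) GP devises a query plan for every query;
2) Make sure the right query plan is chosen for the query and the data structure;
3) Examining the query plan helps identify tuning opportunities;
4) The optimizer uses statistics to choose the plan with the lowest estimated cost;
5) Cost is a measure of I/O and CPU consumption, expressed in units of disk page fetches;
6) Use EXPLAIN to see the estimated query plan, for example:
EXPLAIN SELECT * FROM tb_cp_02 WHERE date='2013-01-01';
7) EXPLAIN ANALYZE actually executes the statement and reports the actual execution statistics, for example:
EXPLAIN ANALYZE SELECT * FROM tb_cp_02 WHERE date='2013-01-01';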
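Because the optimizer relies on statistics and EXPLAIN ANALYZE really runs the statement, a typical analysis session might look like the sketch below. It assumes tb_cp_02 has the id, year, and date columns used in the examples in this document.

```sql
-- Refresh optimizer statistics so cost estimates reflect the current data
ANALYZE tb_cp_02;

-- Estimated plan only; the statement is not executed
EXPLAIN SELECT id, sum(year) FROM tb_cp_02 GROUP BY id;

-- EXPLAIN ANALYZE executes the statement, so wrap data-modifying
-- statements in a transaction and roll back to avoid side effects
BEGIN;
EXPLAIN ANALYZE DELETE FROM tb_cp_02 WHERE date = '2013-01-01';
ROLLBACK;
```

Comparing the estimated row counts from EXPLAIN with the actual counts from EXPLAIN ANALYZE is a quick way to spot stale statistics.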