Practical Greenplum Tips

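1. Checking data skew with gp_segment_id

gp_segment_id is a hidden column present in every table; it records which segment stores each row. Grouping by this column gives the row count per segment, which shows whether the table's data is evenly distributed or skewed.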

qb=# select gp_segment_id, count(*) from call_center group by 1 order by 1;
 gp_segment_id | count 
---------------+-------
             0 |     4
             1 |     2
             2 |     6
             3 |     4
             4 |     5
             5 |     3
(6 rows)
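
Raw counts get harder to read as the number of segments grows. The following is a minimal sketch, assuming the same call_center table, that reports each segment's share of the rows and then a single max-to-average ratio. The column aliases are illustrative only.

```sql
-- Rows per segment and each segment's share of the table.
select gp_segment_id,
       count(*) as row_count,
       round(count(*) * 100.0 / sum(count(*)) over (), 2) as pct_of_total
from call_center
group by gp_segment_id
order by gp_segment_id;

-- One-line summary: a max_over_avg close to 1 suggests an even distribution.
-- Segments holding zero rows do not appear in the subquery, so compare the
-- number of rows returned above with the number of primary segments.
select max(cnt) as max_rows,
       min(cnt) as min_rows,
       round(avg(cnt), 2) as avg_rows,
       round(max(cnt) / avg(cnt), 2) as max_over_avg
from (
    select gp_segment_id, count(*) as cnt
    from call_center
    group by gp_segment_id
) t;
```

For the sample counts above (24 rows over 6 segments), the average is 4 and the largest segment holds 6 rows, a ratio of 1.5. On a table this small that says little; the ratio matters on large tables.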

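2. Viewing segment configuration and status

gp_segment_configuration is a system catalog table that records every instance in the cluster, including the master and the standby. Querying it is the most direct way for a DBA to see the cluster's layout and health. In the output, role and preferred_role are p (primary) or m (mirror), and status is u (up) or d (down).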

qb=# select * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                datadir                
------+---------+------+----------------+------+--------+-------+----------+---------+---------------------------------------
    1 |      -1 | p    | p              | n    | u      |  5432 | n208     | n208    | /var/lib/qb-data/master/qbseg-1
    2 |       0 | p    | p              | n    | u      | 40000 | n208     | n208    | /var/lib/qb-data/primary/qbseg0
    3 |       1 | p    | p              | n    | u      | 40001 | n208     | n208    | /var/lib/qb-data/primary/qbseg1
    4 |       2 | p    | p              | n    | u      | 40000 | n209     | n209    | /var/lib/qb-data/primary/qbseg2
    5 |       3 | p    | p              | n    | u      | 40001 | n209     | n209    | /var/lib/qb-data/primary/qbseg3
    6 |       4 | p    | p              | n    | u      | 40000 | n210     | n210    | /var/lib/qb-data/primary/qbseg4
    7 |       5 | p    | p              | n    | u      | 40001 | n210     | n210    | /var/lib/qb-data/primary/qbseg5
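
On a larger cluster it is usually more practical to list only the instances that need attention. The following is a sketch, assuming the standard codes (status u = up, d = down; role p = primary, m = mirror). It returns instances that are down or are not running in their preferred role.

```sql
-- Instances that are down, or that have switched roles (for example a
-- mirror acting as primary after a failover). No rows means every
-- instance is up and in its preferred role.
select dbid, content, role, preferred_role, mode, status, hostname, port
from gp_segment_configuration
where status <> 'u'
   or role <> preferred_role
order by content, dbid;
```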

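3. Checking table sizes

There are two ways to check table sizes: the psql meta-command \d+, and size functions such as pg_relation_size, wrapped in pg_size_pretty for readable output.
Running \d+ with no argument lists every relation visible on the current search_path together with its size. A partitioned parent table reports 0 bytes because its data is stored in the child partitions. For example: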

qb=# \d+
                                             List of relations
 Schema |             Name             |       Type        |  Owner   |  Storage  |   Size   | Description 
--------+------------------------------+-------------------+----------+-----------+----------+-------------
 tpcds  | call_center                  | table             | qb| ao_column | 1039 kB  | 
 tpcds  | catalog_page                 | table             | qb| ao_column | 2575 kB  | 
 tpcds  | catalog_returns              | partitioned table | qb| ao_column | 0 bytes  | 
 tpcds  | catalog_returns_1_prt_10     | table             | qb| ao_column | 1092 kB  | 
 tpcds  | catalog_returns_1_prt_100    | table             | qb| ao_column | 2197 kB  | 
 tpcds  | catalog_returns_1_prt_101    | table             | qb| ao_column | 2216 kB  | 
 tpcds  | catalog_returns_1_prt_102    | table             | qb| ao_column | 2192 kB  | 
 tpcds  | catalog_returns_1_prt_103    | table             | qb| ao_column | 2190 kB  | 
 tpcds  | catalog_returns_1_prt_104    | table             | qb| ao_column | 2175 kB  | 
 tpcds  | catalog_returns_1_prt_105    | table             | qb| ao_column | 2174 kB  | 
 tpcds  | catalog_returns_1_prt_106    | table             | qb| ao_column | 2136 kB  | 
 tpcds  | catalog_returns_1_prt_107    | table             | qb| ao_column | 2119 kB  | 
 tpcds  | catalog_returns_1_prt_108    | table             | qb| ao_column | 2091 kB  | 
 tpcds  | catalog_returns_1_prt_109    | table             | qb| ao_column | 2072 kB  |

To check the size of a single table, pass it to pg_relation_size and format the result with pg_size_pretty:

qb=# select pg_size_pretty(pg_relation_size('catalog_returns_1_prt_158'));
 pg_size_pretty 
----------------
 929 kB
(1 row)

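In addition, the following functions report the disk space used by databases, tables, indexes, and tablespaces.

Function                                  Description
pg_database_size('znids_dc')              Total size of the database, including indexes
pg_indexes_size('alert_log_sm')           Total size of all indexes on the table
pg_relation_size('alert_log_sm')          Size of the table, excluding indexes
pg_total_relation_size('alert_log_sm')    Size of the table, including indexes and TOAST data
pg_tablespace_size('pg_default')          Size of the tablespace (takes a tablespace name, not a table name)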
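
These functions can be combined with the catalog to rank tables by size. The following is a sketch, assuming the tpcds schema from the \d+ listing above; the limit of 10 and the column aliases are arbitrary choices.

```sql
-- Ten largest tables in the tpcds schema, indexes included.
select n.nspname as schema_name,
       c.relname as table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'tpcds'
  and c.relkind = 'r'
order by pg_total_relation_size(c.oid) desc
limit 10;

-- Size of the database the session is connected to.
select pg_size_pretty(pg_database_size(current_database())) as database_size;
```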

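4. Using explain

To see only the execution plan, use explain select xxx; the statement is not executed.
To see the plan together with actual run times, use explain analyze select xxx; this executes the statement for real.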

qb=# explain analyze select cc_call_center_sk,count(*) from call_center group by 1;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..431.00 rows=24 width=12) (actual time=3.223..4.193 rows=24 loops=1)
   ->  GroupAggregate  (cost=0.00..431.00 rows=4 width=12) (actual time=0.519..0.528 rows=6 loops=1)
         Group Key: cc_call_center_sk
         ->  Sort  (cost=0.00..431.00 rows=4 width=4) (actual time=0.504..0.509 rows=6 loops=1)
               Sort Key: cc_call_center_sk
               Sort Method:  quicksort  Memory: 150kB
               Executor Memory: 152kB  Segments: 6  Max: 26kB (segment 2)
               ->  Seq Scan on call_center  (cost=0.00..431.00 rows=4 width=4) (actual time=0.459..0.470 rows=6 loops=1)
 Optimizer: ORCA Optimizer (QBORCA)
 Planning Time: 15.128 ms
   (slice0)    Executor memory: 27K bytes.
   (slice1)    Executor memory: 159K bytes avg x 6 workers, 159K bytes max (seg0).  Work_mem: 26K bytes max.
 Memory used:  128000kB
 Execution Time: 5.362 ms
(14 rows)
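
Because explain analyze really executes the statement, some care is needed with statements that modify data. The following is a sketch of the usual precaution, reusing the call_center table; the delete and its predicate value are purely illustrative.

```sql
-- Plan only: nothing is executed.
explain select cc_call_center_sk, count(*) from call_center group by 1;

-- explain analyze runs the statement, so wrap a data-modifying statement
-- in a transaction and roll it back to keep the table unchanged.
begin;
explain analyze delete from call_center where cc_call_center_sk = 1;
rollback;
```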

5. Showing a parameter's value with show

Run show xxx at the psql prompt to display the current value of a parameter. For example:

qb=# show max_connections ;
 max_connections 
-----------------
 250
(1 row)
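
Two related forms may also be useful. This is a short sketch using standard PostgreSQL features that Greenplum inherits: show all lists every parameter, and current_setting returns a value as text so it can be used inside a query.

```sql
-- List every parameter with its current value and description.
show all;

-- Same value as "show max_connections", but usable in an expression.
select current_setting('max_connections') as max_connections;
```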

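6. Changing parameters with gpconfig

gpconfig sets parameters in postgresql.conf on the master and all segments. A change takes effect only after the configuration is reloaded (gpstop -u) or, for parameters that require it, after the cluster is restarted.
Query: gpconfig -s
Change: gpconfig -c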

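For example:

gpconfig -c work_mem -v 120MB --masteronly
Set work_mem to 120MB on the master only.
gpconfig -c max_connections -v 100 -m 10
Set max_connections to 10 on the master and 100 on the segments.
gpconfig -r default_statistics_target
Remove the setting (it is commented out in postgresql.conf) so the default value applies.
gpconfig -l
List all parameters that gpconfig supports.
gpconfig -s max_connections
Show the value of one parameter on the master and the segments.
Maximum number of connections: show max_connections;
Maximum number of prepared transactions: show max_prepared_transactions;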
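
Whether a reload is enough after gpconfig -c depends on the parameter. One way to check from psql on the master is the context column of pg_settings. The following is a sketch, assuming the parameters named in this section.

```sql
-- context = 'postmaster' means a restart is required;
-- context = 'sighup' means a configuration reload is enough.
select name, setting, unit, context
from pg_settings
where name in ('max_connections',
               'max_prepared_transactions',
               'work_mem',
               'default_statistics_target')
order by name;
```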
