一条SQL引发系统out of memory PostgreSQL

错误描述

(1) Postgres执行的原SQL：

select COALESCE(m1.place_id, m2.place_id, m3.place_id) as place_id, concat_ws('``', m1.rich_attributes, m2.rich_attributes, m3.rich_attributes) as rich_attributes from

(SELECT place_external_points.place_id, string_agg(concat(place_external_points.metadata_dictionary_id, '@-@', place_external_points.value), '``'::text) AS rich_attributes

FROM place_external_points b WHERE metadata_dictionary_id = ANY (ARRAY[61, 62]) group by place_external_points.place_id) m1

full join

(SELECT place_geocoded.place_id, string_agg(concat(place_geocoded.metadata_dictionary_id, '@-@', place_geocoded.value), '``'::text) AS rich_attributes FROM place_geocoded g

WHERE metadata_dictionary_id = ANY (ARRAY[70, 71, 72, 73, 74, 75, 76, 77, 78]) group by place_geocoded.place_id) m2 on m1.place_id = m2.place_id

full join

(SELECT place_attributes.place_id, string_agg(concat(place_attributes.metadata_dictionary_id, '@-@', place_attributes.value), '``'::text) AS rich_attributes FROM

place_attributes a WHERE place_attributes.metadata_dictionary_id = ANY (ARRAY[5, 7, 9, 10, 11, 12, 13, 33, 35, 46, 51, 52, 53, 54, 55, 57, 58, 59]) group by a.place_id) m3 on m2.place_id = m3.place_id;

(2) 系统日志message:

Dec 27 10:39:13 shb-postgresql-01 kernel: Out of memory: Kill process 9116 (postgres) score 823 or sacrifice child

Dec 27 10:39:13 shb-postgresql-01 kernel: Killed process 9116, UID 501, (postgres) total-vm:40440476kB, anon-rss:28320224kB, file-rss:2154596kB

Dec 27 10:39:13 shb-postgresql-01 kernel: postgres: page allocation failure. order:0, mode:0x280da

Dec 27 10:39:13 shb-postgresql-01 kernel: Pid: 9116, comm: postgres Not tainted 2.6.32-431.el6.x86_64 #1

(3) 数据库日志：

less /usr/local/pgsql/data/pg_log/postgresql-2017-12-29_000000.log

WARNING: terminating connection because of crash of another server process

DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT: In a moment you should be able to reconnect to the database and repeat your command.

FATAL: the database system is in recovery mode

LOG: all server processes terminated; reinitializing

LOG: database system was interrupted; last known up at 2017-12-29 09:20:36 CST

FATAL: the database system is in recovery mode

LOG: database system was not properly shut down; automatic recovery in progress

LOG: invalid record length at 94A/478309C0

LOG: redo is not required

LOG: MultiXact member wraparound protections are now enabled

LOG: database system is ready to accept connections

LOG: autovacuum launcher started

(4) 脚本运行的时候内存变化：

# free -g

total used free shared buffers cached

Mem: 31 31 0 0 0 0

-/+ buffers/cache: 31 0

Swap: 7 3 4

原因：通过以上信息可以发现是postgres的查询SQL把内存消耗殆尽，并且开始使用swap分区，导致系统发生oom-kill,数据库进程被干掉，然后重启恢复。

分析过程

(1) 表place_external_points和place_geocoded经查看都为空，表place_attributes有1亿多条数据,所以问题发生在最后一条select xxx from place_attributes语句上，这个表的列metadata_dictionary_id有对应的索引

# \d place_attributes;

Table "denali_search_cn_17q3_20171226_epl.place_attributes"

Column | Type | Modifiers

------------------------+--------------------------------+---------------

place_id | integer |

metadata_dictionary_id | integer |

value | text |

lang | character varying(50) |

source | character varying(50) |

create_time | timestamp(6) without time zone | default now()

update_time | timestamp(6) without time zone |

Indexes:

"place_attributes_metadata_dictionary_id_idx" btree (metadata_dictionary_id)

"place_attributes_place_id_metadata_dictionary_id_idx" btree (place_id, metadata_dictionary_id)

"place_attributes_value_idx" btree (value)

(2) 通过执行计划explain发现使用了hash聚合：hashaggregate，SQL本身是group by ,没有使用group聚合，却使用了hash聚合，可初步判断数据库执行计划不是最优的：

# explain SELECT place_id, string_agg(concat(metadata_dictionary_id, '@-@', value), '``'::text) AS rich_attributes FROM place_attributes WHERE metadata_dictionary_id = ANY (ARRAY[5, 7, 9, 10, 11, 12, 13, 33, 35, 46, 51, 52, 53, 54, 55, 57, 58, 59]) group by place_attributes.place_id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------

HashAggregate (cost=7622781.80..7985216.03 rows=323575 width=62)

Group Key: place_id

-> Sort (cost=7622781.80..7712379.18 rows=35838955 width=62)

Sort Key: place_id

-> Bitmap Heap Scan on place_attributes (cost=450351.86..2452151.90 rows=35838955 width=62)

Recheck Cond: (metadata_dictionary_id = ANY ('{5,7,9,10,11,12,13,33,35,46,51,52,53,54,55,57,58,59}'::integer[]))

-> Bitmap Index Scan on place_attributes_metadata_dictionary_id_idx (cost=0.00..441392.12 rows=35838955 width=0)

Index Cond: (metadata_dictionary_id = ANY ('{5,7,9,10,11,12,13,33,35,46,51,52,53,54,55,57,58,59}'::integer[]))

解决方法

更新统计分析然后选择最优的执行计划

(Updates statistics used by the planner to determine the most efficient way to execute a query.)

postgres=# vacuum analyze place_attributes;

或者临时关闭enable_hashagg:

(Enables or disables the query planner's use of hashed aggregation plan types. The default is on)

# set enable_hashagg =0;

再次查看执行计划，发现执行计划不再走hashaggregate

yangzi=# explain SELECT place_id, string_agg(concat(metadata_dictionary_id, '@-@', value), '``'::text) AS rich_attributes FROM place_attributes WHERE metadata_dictionary_id = ANY (ARRAY[5, 7, 9, 10, 11, 12, 13, 33, 35, 46, 51, 52, 53, 54, 55, 57, 58, 59]) group by place_attributes.place_id;