pgfincore使用posix_fadvise缓存os cache

http://blog.163.com/digoal@126/blog/static/163877040201062944945126/

http://blog.163.com/digoal@126/blog/static/16387704020122198214650/

http://blog.163.com/digoal@126/blog/static/1638770402011630102117658/

http://francs3.blog.163.com/blog/static/4057672720107541611270/

http://blog.163.com/dazuiba_008/blog/static/36334981201293052730729/


1:下载地址:

http://git.postgresql.org/gitweb/


2:定义:

pgfincore:预先将  表/索引   刷入/刷出os cache的规则。


3:安装pgfincore

查看pg_config变量是否存在
[digoal@localhost pgfincore-0c89186]$ pg_config -V
pg_config: invalid argument: -V
Try "pg_config --help" for more information.

如果不存在,则需要修改环境变量:
查看pg的安装目录:
[digoal@localhost pgfincore-0c89186]$ which pg_config
/opt/pgsql_934/bin/pg_config

导入pg_config如环境变量:
[digoal@localhost ~]$ cd
[digoal@localhost ~]$ vi .bash_profile 
......
export PATH=/opt/pgsql_934/bin:$PATH
......
查看pg_config是否配置成功:
[digoal@localhost pgfincore-0c89186]$ pg_config
BINDIR = /opt/pgsql_934/bin
DOCDIR = /opt/pgsql_934/share/doc
HTMLDIR = /opt/pgsql_934/share/doc
INCLUDEDIR = /opt/pgsql_934/include
PKGINCLUDEDIR = /opt/pgsql_934/include
INCLUDEDIR-SERVER = /opt/pgsql_934/include/server
LIBDIR = /opt/pgsql_934/lib
PKGLIBDIR = /opt/pgsql_934/lib
LOCALEDIR = /opt/pgsql_934/share/locale
MANDIR = /opt/pgsql_934/share/man
SHAREDIR = /opt/pgsql_934/share
SYSCONFDIR = /opt/pgsql_934/etc
PGXS = /opt/pgsql_934/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pgsql934'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/pgsql934/lib',--enable-new-dtags
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = -lpgport -lpgcommon -lz -lreadline -lcrypt -ldl -lm 
VERSION = PostgreSQL 9.3.4

如果在root上安装则需要先执行 下面这句话
[root@localhost pgfincore-0c89186]# export PATH=/opt/pgsql_934/bin:$PATH

然后就可以进行安装了:
[digoal@localhost pgfincore-0c89186]$ cd /usr/local/rj/postgresql-9.3.4/contrib/pgfincore-0c89186
[digoal@localhost pgfincore-0c89186]$ cd /usr/local/rj/postgresql-9.3.4/contrib/pgfincore-0c89186
[digoal@localhost pgfincore-0c89186]$ gmake clean
root@localhost pgfincore-0c89186]# gmake
[root@localhost pgfincore-0c89186]# gmake install

然后查看安装是否成功,确定其时间是否和刚才安装的时间大体一致即可:
[root@localhost pgfincore-0c89186]# cd /opt/pgsql_934/lib
[root@localhost lib]# ll
total 7132
......
-rwxr-xr-x. 1 root root   70205 Jun 11 13:29 pgfincore.so
......


在数据库中创建此扩展中间件
digoal=# create extension pgfincore;
CREATE EXTENSION

查看pgfincore的函数都有哪些

postgres=# \dx+ pgfincore
                       Objects in extension "pgfincore"
                              Object Description                              
------------------------------------------------------------------------------
 function pgfadvise(regclass,text,integer)
 function pgfadvise_dontneed(regclass)
 function pgfadvise_loader(regclass,integer,boolean,boolean,bit varying)
 function pgfadvise_loader(regclass,text,integer,boolean,boolean,bit varying)
 function pgfadvise_normal(regclass)
 function pgfadvise_random(regclass)
 function pgfadvise_sequential(regclass)
 function pgfadvise_willneed(regclass)
 function pgfincore(regclass)
 function pgfincore(regclass,boolean)
 function pgfincore(regclass,text,boolean)
 function pgfincore_drawer(bit varying)
 function pgsysconf()
 function pgsysconf_pretty()
(14 rows)


查看系统的块大小。

digoal=# select * from pgsysconf();
 os_page_size | os_pages_free | os_total_pages 
--------------+---------------+----------------
         4096 |         16871 |         253087
(1 row)


如果没有配置pg_config的话,可能会安装失败,在使用select pgfincore('test')的时候会报错“ERROR:  fincore(7, 0, 8192, 0x1cac350): Function not implemented”。


4:操作pgfincore


1:pgfincore

查看数据表文件在硬盘的块大小和在内存中的块大小。

postgres=# create table test (id int, info text);
CREATE TABLE
postgres=# insert into test select generate_series(1,10000), now();
INSERT 0 10000
postgres=#  select * from pgfincore('test');    
     relpath      | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33248 |       0 |         4096 |          168 |       169 |         1 |         17982 |         |             |            
(1 row)


2:pgfadvise_willneed,pgfadvise_dontneed

告知系统,尽可能多的将表的数据文件刷入/刷出到内存中,但不是一下子就全部(不比pgfadvise_loader)。

注意page_mem这列:表示有多少系统文件会存放到os page cache中。

postgres=# create table pgtest(id bigint,firstname text,lastname text,corp text,post text,age int,crt_time timestamp without time zone,comment text);
CREATE TABLE
Time: 442.189 ms
postgres=# insert into pgtest(id,firstname,lastname,corp,post,age,crt_time,comment) select generate_series(1,10000000),'zhou','digoal'||generate_series(2,10000001),'sky-mobi','dba team leader',28,clock_timestamp(),'abcdefg'||generate_series(3,10000002);
INSERT 0 10000000
Time: 222696.545 ms

刚插入的表中会有一部分在shared buffer中:

postgres=# select * from pgfincore('pgtest'::regclass);
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |    100812 |     15218 |         16483 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |        67 |         4 |         16483 |         |             |            
(2 rows)

刷出:

postgres=# select * from pgfadvise_dontneed('pgtest'::regclass);
      relpath       | os_page_size | rel_os_pages | os_pages_free 
--------------------+--------------+--------------+---------------
 base/12896/33260   |         4096 |       262144 |        119922
 base/12896/33260.1 |         4096 |         4524 |        120019
(2 rows)


Time: 305.337 ms
postgres=# select * from pgfincore('pgtest'::regclass);
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |       412 |       142 |        119494 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |         0 |         0 |        119494 |         |             |            
(2 rows)


Time: 10.001 ms

注意在使用pgfadvise_dontneed再刷出之后此时的pages_mem还剩下142,并没有全部刷出(与pgfadvise_loader刷出有所不同)。

查看select count(*)所需时间:

postgres=# select count(*) from pgtest;
  count   
----------
 10000000
(1 row)

Time: 47356.265 ms

查看在运行select count(*)期间io使用情况,可以发现与磁盘有大量的io操作(bi, bo):

[pg93@localhost ~]$ vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 251072 223176   2592 650488    0   10    47    56   27   40  0  0 99  0  0
 0  1 251072 219688   2592 654032    0    0  3512     0  354  579  0  4 77 19  0
 0  1 251072 216216   2592 657356    0    0  3328     0  350  555  0  5 78 17  0
 2  0 251072 212372   2600 661148    0    0  3832    12  341  557  0  5 77 18  0
 2  0 251072 207784   2600 665548    0    0  4400     0  416  643  1  5 93  1  0
 0  0 251072 203444   2600 669944    0    0  4360     0  420  640  0  6 94  0  0
 0  0 251072 199476   2600 673744    0    0  3816     0  428  657  0  7 92  1  0
 0  0 251072 194888   2600 678352    0    0  4616     0  413  633  0  6 93  1  0
 1  1 251032 181992   2696 690800  140    0 12644    12  532  717  1 11 70 18  0
 0  2 251032 131524   2696 740304    0    0 49528     0  946  940 10 39  0 51  0
 0  2 251032 106724   2696 764272    0    0 23976  8408  607  450  4 21  0 75  0
 0  2 251032  86760   2696 783884    0    0 19608 12288  821  878  4 16  0 79  0
 0  2 251032  67664   2696 802600    0    0 18728 16384  768  377  2 19  0 79  0
 0  3 251032  66820   2704 803564    0    0 15168 13872  526  333  2 14  0 84  0
 0  3 251032  62976   2704 807428    0    0  3848 10000  351  325  1  2  0 97  0
 0  3 251032  70788   2704 799768    0    0  3840 10768  359  241  0  2  0 98  0
 0  3 251032  67068   2704 803604    0    0  3832 15672  457  303  1  2  0 97  0
 2  3 251032  63348   2704 807040    0    0  3448 14424  457  481  1  3  0 96  0
 0  3 251032  71532   2712 798576    0    0  3056 13216  664  702  1  6  0 93  0
 0  3 251032  67812   2712 802952    0    0  4376  3616  343  153  1  2  0 97  0
 0  3 251032  63720   2712 806788    0    0  3864  4480  385  224  2  1  0 97  0
 0  3 251032  71408   2712 798832    0    0  3568  4696  289  164  1  3  0 96  0
 0  3 251032  68184   2712 802472    0    0  3608  3600  421  523  0  3  0 97  0
 0  4 251032  64340   2720 806172   24    0  3720  3880  439  399  2  1  0 97  0
 0  4 251032  71152   2720 798244    0    0  3600  3012  407  443  2  3  0 95  0
 0  3 251032  68556   2720 801936    0    0  3688  5388  395  157  1  5  0 94  0
 0  3 251032  64712   2720 805620    0    0  3688  3824  423  540  1  1  0 98  0
 0  2 251032  61116   2720 809176    0    0  3560  2556  361  333  1  1  0 98  0
 1  0 251032  70912   2728 799576    0    0 13032    52  319  301  3 10  0 87  0
 1  1 251032  71532   2728 798508    0    0 30056     0  605  566  9 21  0 70  0
 0  2 251032  71408   2728 798756    0    0 22896     0  457  446  4 16  0 80  0
 0  2 251032  71408   2728 798408    0    0 20120     0  433  422  4 16  0 80  0
 0  2 251032  69052   2728 800824    0    0  5000  2900  350  237  2  1  0 97  0
 1  2 251032  68680   2736 800960    0    0 34416  1068  778  677  8 28  0 64  0
 0  2 251032  68928   2620 800692    0    0 14760   560  394  341  4 12  0 84  0
 1  1 251032  68928   2620 800672    0    0 20968   504  445  428  3 16  0 80  0
 0  1 251032  71284   2620 798044    0    0 30680  1064  743  708  5 29  0 66  0
 0  3 251032  68308   2620 800928    0    0  8232 22732  580  218  2  9  0 89  0
 0  4 251032  64092   2628 804992    0    0  4080 23476  623  175  1  6  0 93  0
 1  3 251032  68556   2628 799956    0    0  3676 10124  358  367  0  4  0 96  0
 0  3 251028  68680   2628 800312    0    0  3400 13280  456  700  1  2  0 97  0
 0  3 251028  64464   2628 804440    0    0  4136 16272  486  369  0  3  0 97  0
 1  3 251028  60744   2628 808124    0    0  3680 17160  452  197  1  2  0 97  0
 0  3 251028  69672   2632 799416    0    0  3064 13800  457  581  0  2  0 98  0
 1  2 251028  65084   2632 803776    0    0  4400 14224  492  290  1  3  0 96  0
 1  3 251028  61240   2632 807616    0    0  3832 17104  495  306  0  2  0 98  0
 0  3 251028  69300   2620 799500    0    0  3592 14464  449  555  1  3  0 96  0
 0  3 251028  65208   2620 803488    0    0  4016 15072  468  449  0  2  0 98  0
 0  3 251028  61612   2628 807088    0    0  3608 18444  478  146  1  2  0 97  0
 0  3 251028  69300   2628 799508    0    0  4200  7976  420  454  2  4  0 94  0
 3  1 251028  70912   2628 797856    0    0 19832    92  551  592  7 11  0 82  0
 0  2 251028  71408   2628 796976    0    0 32144     0  653  614 11 22  0 67  0
 0  2 251028  71408   2628 796664    0    0 26808     0  568  545  8 20  0 72  0
 1  0 251028  71532   2636 795980    0    0 24888    12  654  581 10 20  0 70  0
 0  3 251028  69292   2628 797288    0    0  7888  3348  682  546  4 13  0 83  0
 1  1 251028  61240   2628 806288    0    0  9040   476  652  512  6 11  0 83  0
 0  2 251028  71408   2628 796004    0    0 13256   600  641  536  6 17  0 77  0
 2  1 251028  68060   2616 799044    0    0 10168   528  506  452  6 13  0 81  0
 0  1 251028  71284    908 797132    0    0 21776  1316 1118  962  8 30  0 61  0
 0  3 251028  71160    908 796732    0    0 17016 15884 1097  758  6 23  0 71  0
 0  3 251028  68308    908 799460    0    0  2752 19464  649  674  0  4  0 96  0
 0  3 251028  65084    908 802508    0    0  3048 17640  591  464  1  2  0 97  0
 0  3 251028  63348    908 804236    0    0  1736 13632  469  569  0  2  0 98  0
 0  3 251028  61364    916 806152    0    0  1880 23564  654  421  1  2  0 97  0
 0  3 251028  70292    900 797148    0    0  2768 22424  751  390  1  6  0 93  0
 0  3 251028  67688    900 799740    0    0  2592 18464  537  519  1  1  0 98  0
 0  3 251028  65828    900 801624    0    0  1872 16928  573  838  0  3  0 97  0
 0  3 251028  63472    900 803868    0    0  2280 13288  466  434  0  1  0 99  0
 0  3 251028  61364    908 805900    0    0  2008 15612  474  532  0  2  0 98  0
 0  2 251028  71160    892 796224    0    0  2096 10908  577  973  1  2  0 97  0
 0  2 251028  71408    884 795048    0    0 18272     0  823  699  4 25  0 71  0
 0  1 251028  71532    884 794464    0    0 15808     0  679  657  2 24  0 74  0
 3  1 251028  69300    388 797480    0    0 10184     0  473  426  2 17  0 81  0
 2  0 251028  71408    176 795316    0    0  9392    12  459  444  2 16  0 82  0
 0  2 251028  71160    160 795504    0    0 13200     0  647  605  5 19  0 76  0
 0  2 251028  69920    160 796696    0    0 12680  4096  576  542  4 26  0 70  0
 1  1 251028  68680    148 797568    0    0 28032 24576  711  543  9 27  0 65  0
 2  0 251028  70168    148 796040    0    0 23296 24604  609  526 11 25  0 65  0
 0  1 251028  70416    156 795796    0    0 25856 24628  630  552 13 24  0 63  0
 0  2 251028  71532    156 793244    0    0 21248 41240  561  470 10 21  0 69  0
 0  1 251028  71408    148 794380    0    0 22656     0  495  476  9 19  0 72  0
 2  2 251028  71036    148 794924    0    0 23552 24576  571  490  9 19  0 72  0
 0  2 251028  70540    140 795352    0    0 23424 24576  551  483  9 20  0 71  0
 1  0 251024  66696    140 798888   24    0 12532 12288  763  366 48 15  0 37  0
 0  0 251016  69928    416 801204   40    0  2032    68  623  278 36  9 18 37  0
 0  0 251016  69060    416 802252    0    0     0     0   38   71  1  0 99  0  0
 0  0 251012  68068    416 803332    0    0     0     0   38   73  0  1 99  0  0
 0  0 251012  66208    420 804420    0    0     4 24184  123   86  0  4 93  3  0
 0  0 251012  65092    420 805500    0    0     0     0   38   72  0  0 100  0  0
 0  0 251012  64100    428 806580    0    0     0    68   44   90  0  1 99  0  0
 0  0 251012  62984    428 807660    0    0     0     0   39   74  0  1 99  0  0

刷入:

postgres=# select * from pgfadvise_willneed('pgtest'::regclass);
      relpath       | os_page_size | rel_os_pages | os_pages_free 
--------------------+--------------+--------------+---------------
 base/12896/33260   |         4096 |       262144 |         15306
 base/12896/33260.1 |         4096 |         4524 |         15801
(2 rows)


Time: 2344.132 ms
postgres=# select * from pgfincore('pgtest'::regclass);      
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |    157035 |      9064 |         17971 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |      4524 |         1 |         17971 |         |             |            
(2 rows)


Time: 55.911 ms

查看select count(*)所需时间:

postgres=# select count(*) from pgtest;
  count   
----------
 10000000
(1 row)

Time: 7500.932 ms

查看在运行select count(*)期间io使用情况,io使用很少(跟磁盘交互很少):
[pg93@localhost ~]$ vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 250848  72440    944 802808    0   10    49    57   27   40  0  0 99  0  0
 0  0 250844  72300    944 802908    0    0     0     0   51   94  1  0 99  0  0
 0  0 250844  71664    944 803040    0    0     0    24   88  152  0  1 99  0  0
 0  0 250844  71672    944 803228    0    0     0     0   34   94  0  0 100  0  0
 0  0 250844  71672    952 803364    0    0     0    56   54  126  0  1 99  0  0
 0  0 250844  71672    952 803396    0    0     0     0   22   84  0  0 100  0  0
 0  1 250844  68944    944 806676    0    0 14820     0  716  293 40 18 26 15  0
 1  0 250844  67340    724 810168    0    0 83072     0 1210 1286 17 40  0 43  0
 0  1 250844  67712    724 809872    0    0 91008     0 1299 1418 21 38  0 40  0
 0  1 250844  70192    732 807452    0    0 83328    16 1206 1255 15 39  0 46  0
 1  1 250844  70812    200 807372    0    0 91776     0 1351 1394 13 49  0 39  0
 1  1 250844  71556    200 806116    0    0 34416     0 2097 2948  8 54  0 38  0
 0  1 250844  71556    200 806232    0    0 17048     0 2131 3112 12 47  0 41  0
 0  0 250844  71308    200 806900    0    0   568     0  746  104 62 10 24  3  0
 0  0 250844  71060    208 807072    0    0     0    16   40   95  0  0 100  0  0
 0  0 250844  70936    208 807216    0    0     0     0   31   92  0  0 100  0  0
 0  0 250844  70424    208 807308    0    0     0    24   71  127  1  2 97  0  0


刷入之后几乎比刷入之前select count(*)快了2倍多。

3:pgfadvise_loader

做一个os cache快照。
restore还原快照到内存中。


查看当前pgtest表的page_mem:(163570+3

postgres=# select * from pgfincore('pgtest'::regclass);      
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |    163570 |        32 |         16855 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |         3 |         1 |         16855 |         |             |            
(2 rows)

Time: 37.738 ms

创建snapshot表:(此表内容为二进制格式)

postgres=#  create table pgfincore_pgtest_snapshot as select * from pgfincore('pgtest'::regclass,true);
SELECT 2
Time: 705.283 ms

刷出pgtest表的os cache:( 使用pgfadvise_loader刷出

postgres=# select pgfadvise_loader('pgtest'::regclass,segment,false,true,bitnot(databit)) from (select segment,databit from pgfincore('pgtest'::regclass,true)) as t;              pgfadvise_loader             
-----------------------------------------
 (base/12896/33260,4096,181633,0,163570)
 (base/12896/33260.1,4096,181636,0,3)
(2 rows)

Time: 250.228 ms
postgres=# select * from pgfincore('pgtest'::regclass);      
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |         0 |         0 |        181638 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |         0 |         0 |        181638 |         |             |            
(2 rows)

Time: 9.127 ms
restore还原snapshot的os cache:

postgres=# select pgfadvise_loader('pgtest'::regclass,segment,true,true,databit) from pgfincore_pgtest_snapshot;
              pgfadvise_loader              
--------------------------------------------
 (base/12896/33260,4096,17942,163570,98574)
 (base/12896/33260.1,4096,17973,3,4521)
(2 rows)

Time: 7236.190 ms
postgres=# select * from pgfincore('pgtest'::regclass);                                                         
      relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit | pages_dirty | group_dirty 
--------------------+---------+--------------+--------------+-----------+-----------+---------------+---------+-------------+-------------
 base/12896/33260   |       0 |         4096 |       262144 |    163570 |        32 |         18565 |         |             |            
 base/12896/33260.1 |       1 |         4096 |         4524 |         3 |         1 |         18565 |         |             |            
(2 rows)

Time: 38.081 ms






===============附录====================================================================

===========
 PgFincore
===========


--------------------------------------------------------------
 A set of functions to manage pages in memory from PostgreSQL
--------------------------------------------------------------


A set of functions to handle low-level management of relations using mincore to
explore cache memory.


DESCRIPTION
===========


With PostgreSQL, each Table or Index is splitted in segments of (usually) 1GB,
and each segment is splitted in pages in memory then in blocks for the
filesystem.


Those functions let you know which and how many disk block from a relation are
in the page cache of the operating system. It can provide the result as a VarBit
and can be stored in a table. Then using this table, it is possible to restore
the page cache state for each block of the relation, even in another server,
thanks to Streaming Replication.


Other functions are used to set a *POSIX_FADVISE* flag on the entire relation
(each segment). The more usefull are probably *WILLNEED* and *DONTNEED* which
push and pop blocks of each segments of a relation from page cache,
respectively.


Each functions are call with at least a table name or an index name (or oid)
as a parameter and walk each segment of the relation.


DOWNLOAD
========


You can grab the latest code with git:: 


   git clone git://git.postgresql.org/git/pgfincore.git
   or
   git://github.com/klando/pgfincore.git


And the project is on pgfoundry : http://pgfoundry.org/projects/pgfincore


INSTALL
=======


From source code:: 


  make clean
  make
  su
  make install


For PostgreSQL >= 9.1, log in your database and::


  mydb=# CREATE EXTENSION pgfincore;


For other release, create the functions from the sql script (it should be in
your contrib directory)::


  psql mydb -f pgfincore.sql


PgFincore is also shipped with Debian scripts to build your own package::


  aptitude install debhelper postgresql-server-dev-all postgresql-server-dev-9.1
  # or postgresql-server-dev-8.4|postgresql-server-dev-9.0
  make deb
  dpkg -i ../postgresql-9.1-pgfincore_1.1.1-1_amd64.deb


PgFincore is packaged for *RPM* at http://yum.postgresql.org/
PgFincore is packaged for *debian* at http://pgapt.debian.net/








EXAMPLES
========


Here are some examples of usage. If you want more details go to Documentation_


Get current state of a relation
-------------------------------


May be useful::


   cedric=# select * from pgfincore('pgbench_accounts');
         relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit 
   --------------------+---------+--------------+--------------+-----------+-----------+---------------+---------
    base/11874/16447   |       0 |         4096 |       262144 |    262144 |         1 |         81016 | 
    base/11874/16447.1 |       1 |         4096 |        65726 |     65726 |         1 |         81016 | 
   (2 rows)
   
   Time: 31.563 ms


Load a table or an index in OS Page Buffer
------------------------------------------


You may want to try to keep a table or an index into the OS Page Cache, or
preload a table before your well know big query is executed (reducing the query
time).


To do so, just execute the following query::


   cedric=# select * from pgfadvise_willneed('pgbench_accounts');
         relpath       | os_page_size | rel_os_pages | os_pages_free 
   --------------------+--------------+--------------+---------------
    base/11874/16447   |         4096 |       262144 |        169138
    base/11874/16447.1 |         4096 |        65726 |        103352
   (2 rows)
    
   Time: 4462,936 ms


  * The column *os_page_size* report that page size is 4KB.
  * The column *rel_os_pages* is the number of pages of the specified file.
  * The column *os_pages_free* is the number of free pages in memory (for caching).


Snapshot and Restore the OS Page Buffer state of a table or an index (or more)
------------------------------------------------------------------------------


You may want to restore a table or an index into the OS Page Cache as it was
while you did the snapshot. For example if you have to reboot your server, then
when PostgreSQL start up the first queries might be slower because neither
PostgreSQL or the OS have pages in their respective cache about the relations
involved in those first queries.


Executing a snapshot and a restore is very simple::


   -- Snapshot
   cedric=# create table pgfincore_snapshot as
   cedric-#   select 'pgbench_accounts'::text as relname,*,now() as date_snapshot
   cedric-#   from pgfincore('pgbench_accounts',true);
   
   -- Restore
   cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, true, true,
                          (select databit from  pgfincore_snapshot
                           where relname='pgbench_accounts' and segment = 0));
        relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded 
   ------------------+--------------+---------------+--------------+----------------
    base/11874/16447 |         4096 |         80867 |       262144 |              0
   (1 row)
   
   Time: 35.349 ms


 * The column *pages_loaded* report how many pages have been read to memory
   (they may have already been in memoy)
 * The column *pages_unloaded* report how many pages have been removed from
   memory (they may not have already been in memoy);


SYNOPSIS
========


::


   pgsysconf(OUT os_page_size bigint, OUT os_pages_free bigint,
             OUT os_total_pages bigint)
     RETURNS record
    
   pgsysconf_pretty(OUT os_page_size text, OUT os_pages_free text,
                    OUT os_total_pages text)
     RETURNS record


   pgfadvise(IN relname regclass, IN fork text, IN action int,
             OUT relpath text, OUT os_page_size bigint,
             OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_willneed(IN relname regclass,
                      OUT relpath text, OUT os_page_size bigint,
                      OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_dontneed(IN relname regclass,
                      OUT relpath text, OUT os_page_size bigint,
                      OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_normal(IN relname regclass,
                    OUT relpath text, OUT os_page_size bigint,
                    OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_sequential(IN relname regclass,
                        OUT relpath text, OUT os_page_size bigint,
                        OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_random(IN relname regclass,
                    OUT relpath text, OUT os_page_size bigint,
                    OUT rel_os_pages bigint, OUT os_pages_free bigint)
     RETURNS setof record


   pgfadvise_loader(IN relname regclass, IN fork text, IN segment int,
                    IN load bool, IN unload bool, IN databit varbit,
                    OUT relpath text, OUT os_page_size bigint,
                    OUT os_pages_free bigint, OUT pages_loaded bigint,
                    OUT pages_unloaded bigint)
     RETURNS setof record


   pgfadvise_loader(IN relname regclass, IN segment int,
                    IN load bool, IN unload bool, IN databit varbit,
                    OUT relpath text, OUT os_page_size bigint,
                    OUT os_pages_free bigint, OUT pages_loaded bigint,
                    OUT pages_unloaded bigint)
     RETURNS setof record


   pgfincore(IN relname regclass, IN fork text, IN getdatabit bool,
             OUT relpath text, OUT segment int, OUT os_page_size bigint,
             OUT rel_os_pages bigint, OUT pages_mem bigint,
             OUT group_mem bigint, OUT os_pages_free bigint,
             OUT databit      varbit)
     RETURNS setof record


   pgfincore(IN relname regclass, IN getdatabit bool,
             OUT relpath text, OUT segment int, OUT os_page_size bigint,
             OUT rel_os_pages bigint, OUT pages_mem bigint,
             OUT group_mem bigint, OUT os_pages_free bigint,
             OUT databit      varbit)
     RETURNS setof record


   pgfincore(IN relname regclass,
             OUT relpath text, OUT segment int, OUT os_page_size bigint,
             OUT rel_os_pages bigint, OUT pages_mem bigint,
             OUT group_mem bigint, OUT os_pages_free bigint,
             OUT databit      varbit)
     RETURNS setof record


DOCUMENTATION
=============


pgsysconf
---------


This function output size of OS blocks, number of free page in the OS Page Buffer.


::


   cedric=# select * from pgsysconf();
    os_page_size | os_pages_free | os_total_pages 
   --------------+---------------+----------------
            4096 |         80431 |        4094174


pgsysconf_pretty
----------------


The same as above, but with pretty output.


::


   cedric=# select * from pgsysconf_pretty();
    os_page_size | os_pages_free | os_total_pages 
   --------------+---------------+----------------
    4096 bytes   | 314 MB        | 16 GB


pgfadvise_WILLNEED
------------------


This function set *WILLNEED* flag on the current relation. It means that the
Operating Sytem will try to load as much pages as possible of the relation.
Main idea is to preload files on server startup, perhaps using cache hit/miss
ratio or most required relations/indexes.


::


   cedric=# select * from pgfadvise_willneed('pgbench_accounts');
         relpath       | os_page_size | rel_os_pages | os_pages_free 
   --------------------+--------------+--------------+---------------
    base/11874/16447   |         4096 |       262144 |         80650
    base/11874/16447.1 |         4096 |        65726 |         80650


pgfadvise_DONTNEED
------------------


This function set *DONTNEED* flag on the current relation. It means that the
Operating System will first unload pages of the file if it need to free some
memory. Main idea is to unload files when they are not usefull anymore (instead
of perhaps more interesting pages)


::


   cedric=# select * from pgfadvise_dontneed('pgbench_accounts');
         relpath       | os_page_size | rel_os_pages | os_pages_free
   --------------------+--------------+--------------+---------------
    base/11874/16447   |         4096 |       262144 |        342071
    base/11874/16447.1 |         4096 |        65726 |        408103




pgfadvise_NORMAL
----------------


This function set *NORMAL* flag on the current relation.


pgfadvise_SEQUENTIAL
--------------------


This function set *SEQUENTIAL* flag on the current relation.


pgfadvise_RANDOM
----------------


This function set *RANDOM* flag on the current relation.


pgfadvise_loader
----------------


This function allow to interact directly with the Page Cache.
It can be used to load and/or unload page from memory based on a varbit
representing the map of the pages to load/unload accordingly.


Work with relation pgbench_accounts, segment 0, arbitrary varbit map::


   -- Loading and Unloading
   cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, true, true, B'111000');
        relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded 
   ------------------+--------------+---------------+--------------+----------------
    base/11874/16447 |         4096 |        408376 |            3 |              3


   -- Loading
   cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, true, false, B'111000');
        relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded 
   ------------------+--------------+---------------+--------------+----------------
    base/11874/16447 |         4096 |        408370 |            3 |              0


   -- Unloading
   cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, false, true, B'111000');
        relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded 
   ------------------+--------------+---------------+--------------+----------------
    base/11874/16447 |         4096 |        408370 |            0 |              3


pgfincore
---------


This function provide information about the file system cache (page cache). 


::


   cedric=# select * from pgfincore('pgbench_accounts');
         relpath       | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit 
   --------------------+---------+--------------+--------------+-----------+-----------+---------------+---------
    base/11874/16447   |       0 |         4096 |       262144 |         3 |         1 |        408444 | 
    base/11874/16447.1 |       1 |         4096 |        65726 |         0 |         0 |        408444 | 


For the specified relation it returns:


  * relpath : the relation path 
  * segment : the segment number analyzed 
  * os_page_size : the size of one page
  * rel_os_pages : the total number of pages of the relation
  * pages_mem : the total number of relation's pages in page cache.
    (not the shared buffers from PostgreSQL but the OS cache)
  * group_mem : the number of groups of adjacent pages_mem
  * os_page_free : the number of free page in the OS page cache
  * databit : the varbit map of the file, because of its size it is useless to output
    Use pgfincore('pgbench_accounts',true) to activate it.


DEBUG
=====


You can debug the PgFincore with the following error level: *DEBUG1* and
*DEBUG5*.


For example::


   set client_min_messages TO debug1; -- debug5 is only usefull to trace each block


LIMITATIONS
===========


 * PgFincore needs mincore() and POSIX_FADVISE.


 * PgFincore has a limited mode when POSIX_FADVISE is not provided by the platform.


 * PgFincore needs PostgreSQL >= 8.3


 * PgFincore does not work on windows.


SEE ALSO
========


2ndQuadrant, PostgreSQL Expertise, developement, training and 24x7 support:


  http://2ndQuadrant.fr





你可能感兴趣的:(postgresql)