The configuration files and scripts used in this test are available at: https://github.com/lxgithub/repmgr_conf_scripts
IP               HOSTNAME  PG VERSION  DIR         OS
192.168.100.146  node1     9.3.4       /opt/pgsql  CentOS6.4_x64
192.168.100.150  node2     9.3.4       /opt/pgsql  CentOS6.4_x64
# cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m
# uname -a
Linux barman 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 node1
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 node1
192.168.100.146 node1
192.168.100.150 node2
PostgreSQL 9+ allows us to run replicated Hot Standby servers, which we can query and/or use for high availability.
While the core replication components ship with PostgreSQL, managing the high-availability side is left to the user.
repmgr allows you to monitor and manage your replicated PostgreSQL databases as a single cluster. repmgr includes two components:
repmgr: a command-line program that performs tasks and then exits
repmgrd: a management and monitoring daemon that watches the cluster and can automate remote actions
PostgreSQL version >= 9.0
UNIX-like OS
gcc/gmake
rsync/pg_config/pg_ctl in PATH
The PostgreSQL installation itself is omitted here.
Note: the database cluster is initialized on node1; node2 only has the database software installed.
# unzip repmgr-master.zip
# mv repmgr-master postgresql-9.3.4/contrib/repmgr
# cd postgresql-9.3.4/contrib/repmgr/
# make && make install
Alternatively, compile and install it as the PostgreSQL system user, e.g.:
[root@node1 ~]# su - postgres
[postgres@node1 ~]$ unzip repmgr-master.zip
[postgres@node1 ~]$ cd repmgr-master
[postgres@node1 repmgr-master]$ make USE_PGXS=1
[postgres@node1 repmgr-master]$ make USE_PGXS=1 install
Verify that the installation succeeded:
[postgres@node1 ~]$ repmgr --version
repmgr 2.1dev (PostgreSQL 9.3.4)
[postgres@node1 ~]$ repmgrd --version
repmgrd 2.1dev (PostgreSQL 9.3.4)
A successful installation copies the .so and .sql files into the corresponding installation directories:
$ ls /opt/pgsql/share/contrib/
repmgr_funcs.sql  repmgr.sql  uninstall_repmgr_funcs.sql  uninstall_repmgr.sql
$ ls /opt/pgsql/lib/repmgr*
/opt/pgsql/lib/repmgr_funcs.so
$ ls /opt/pgsql/bin/repmgr*
/opt/pgsql/bin/repmgr  /opt/pgsql/bin/repmgrd
postgres=# \df
                                             List of functions
   Schema    |               Name               |     Result data type     | Argument data types |  Type
-------------+----------------------------------+--------------------------+---------------------+--------
 repmgr_test | repmgr_get_last_standby_location | text                     |                     | normal
 repmgr_test | repmgr_get_last_updated          | timestamp with time zone |                     | normal
 repmgr_test | repmgr_update_last_updated       | timestamp with time zone |                     | normal
 repmgr_test | repmgr_update_standby_location   | boolean                  | text                | normal
(4 rows)
[postgres@node1 ~]$ ssh-keygen -t rsa
[postgres@node1 ~]$ ssh-copy-id -i .ssh/id_rsa.pub postgres@node2
[postgres@node1 ~]$ ssh node2 date
Tue Apr 15 01:17:20 CST 2014
[postgres@node2 ~]$ ssh-keygen -t rsa
[postgres@node2 ~]$ ssh-copy-id -i .ssh/id_rsa.pub postgres@node1
[postgres@node2 ~]$ ssh node1 date
Tue Apr 15 01:18:13 CST 2014
$ vi postgresql.conf
listen_addresses = '*'
port = 5432
archive_mode = on
archive_command = 'cd .'     # any no-op works here, e.g. 'exit 0'
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 5000     # set this generously
hot_standby = on

$ vi pg_hba.conf
host    all            all    192.168.100.0/24    trust
host    replication    all    192.168.100.0/24    trust
Start the database:
[postgres@node1 ~]$ pg_ctl start
server starting
Create the repmgr user on node1:
[postgres@node1 ~]$ createuser --login --superuser repmgr
Test from node2:
[postgres@node2 ~]$ psql -h node1 -U repmgr -d postgres -c "select version()"
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
[postgres@node2 ~]$ repmgr -D $PGDATA -d postgres -p 5432 -U repmgr -R postgres --verbose standby clone node1
-D: the directory the files are copied into
-U: the database user for the connection
-R: the system user that runs the rsync command
After the sync completes, a recovery.conf file is created automatically:
[postgres@node2 data]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'port=5432 host=node1 user=repmgr'
On node1:
[postgres@node1 ~]$ vi /opt/pgsql/repmgr/repmgr.conf
cluster=test
node=1
node_name=master
conninfo='host=node1 user=repmgr dbname=postgres'
pg_bindir=/opt/pgsql/bin
On node2:
[postgres@node2 ~]$ vi /opt/pgsql/repmgr/repmgr.conf
cluster=test
node=2
node_name=slave
conninfo='host=node2 user=repmgr dbname=postgres'
pg_bindir=/opt/pgsql/bin
Register the master on node1:
[postgres@node1 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose master register
Opening configuration file: /opt/pgsql/repmgr.conf
[2014-04-15 02:58:44] [INFO] repmgr connecting to master database
[2014-04-15 02:58:44] [INFO] repmgr connected to master, checking its state
[2014-04-15 02:58:44] [INFO] master register: creating database objects inside the repmgr_test schema
[2014-04-15 02:58:44] [NOTICE] Master node correctly registered for cluster test with id 1 (conninfo: host=node1 user=repmgr dbname=postgres)
postgres=# set search_path to repmgr_test;
SET
postgres=# \d
               List of relations
   Schema    |     Name     | Type  | Owner
-------------+--------------+-------+--------
 repmgr_test | repl_monitor | table | repmgr
 repmgr_test | repl_nodes   | table | repmgr
 repmgr_test | repl_status  | view  | repmgr
(3 rows)
repl_monitor: stores each round of monitoring data
repl_nodes:   stores node connection information
repl_status:  view of the current replication status
postgres=# select * from repmgr_test.repl_nodes ;
 id | cluster |  name  |                conninfo                | priority | witness
----+---------+--------+----------------------------------------+----------+---------
  1 | test    | master | host=node1 user=repmgr dbname=postgres |        0 | f
(1 row)
Start the standby database:
[postgres@node2 ~]$ pg_ctl start
server starting
Register the standby on node2:
[postgres@node2 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose standby register
Opening configuration file: /opt/pgsql/repmgr.conf
[2014-04-15 03:05:37] [INFO] repmgr connecting to standby database
[2014-04-15 03:05:37] [INFO] repmgr connected to standby, checking its state
[2014-04-15 03:05:37] [INFO] repmgr connecting to master database
[2014-04-15 03:05:37] [INFO] finding node list for cluster 'test'
[2014-04-15 03:05:37] [INFO] checking role of cluster node 'host=node1 user=repmgr dbname=postgres'
[2014-04-15 03:05:37] [INFO] repmgr connected to master, checking its state
[2014-04-15 03:05:37] [INFO] repmgr registering the standby
[2014-04-15 03:05:37] [INFO] repmgr registering the standby complete
[2014-04-15 03:05:37] [NOTICE] Standby node correctly registered for cluster test with id 2 (conninfo: host=node2 user=repmgr dbname=postgres)
postgres=# select * from repmgr_test.repl_nodes ;
 id | cluster |  name  |                conninfo                | priority | witness
----+---------+--------+----------------------------------------+----------+---------
  1 | test    | master | host=node1 user=repmgr dbname=postgres |        0 | f
  2 | test    | slave  | host=node2 user=repmgr dbname=postgres |        0 | f
(2 rows)
[postgres@node2 ~]$ repmgrd -f /opt/pgsql/repmgr/repmgr.conf --verbose --monitoring-history > /opt/pgsql/repmgr/repmgr.log 2>&1
[postgres@node2 ~]$ tail -f /opt/pgsql/repmgr/repmgr.log
[2014-04-15 05:24:42] [INFO] repmgrd Connecting to database 'host=node2 user=repmgr dbname=postgres'
[2014-04-15 05:24:42] [INFO] repmgrd Connected to database, checking its state
[2014-04-15 05:24:42] [INFO] repmgrd Connecting to primary for cluster 'test'
[2014-04-15 05:24:42] [INFO] finding node list for cluster 'test'
[2014-04-15 05:24:42] [INFO] checking role of cluster node 'host=node1 user=repmgr dbname=postgres'
[2014-04-15 05:24:42] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_test'
[2014-04-15 05:24:42] [INFO] repmgrd Checking node 2 in cluster 'test'
[2014-04-15 05:24:42] [INFO] Reloading configuration file and updating repmgr tables
[2014-04-15 05:24:42] [INFO] repmgrd Starting continuous standby node monitoring
Once the monitoring daemon is started on the standby, it opens a connection to the primary and inserts real-time monitoring data into repl_monitor on the primary. The corresponding backend can be seen on the primary server, e.g.:
postgres 5541 670 0 03:23 ? 00:00:00 postgres: repmgr repmgr 192.168.100.146(56388) idle
postgres=# \d repl_monitor
                 Table "repmgr_test.repl_monitor"
          Column           |           Type           | Modifiers
---------------------------+--------------------------+-----------
 primary_node              | integer                  | not null
 standby_node              | integer                  | not null
 last_monitor_time         | timestamp with time zone | not null
 last_apply_time           | timestamp with time zone |
 last_wal_primary_location | text                     | not null
 last_wal_standby_location | text                     |
 replication_lag           | bigint                   | not null
 apply_lag                 | bigint                   | not null
Indexes:
    "idx_repl_status_sort" btree (last_monitor_time, standby_node)

postgres=# \d repl_nodes
      Table "repmgr_test.repl_nodes"
  Column  |  Type   | Modifiers
----------+---------+------------------------
 id       | integer | not null
 cluster  | text    | not null
 name     | text    | not null
 conninfo | text    | not null
 priority | integer | not null
 witness  | boolean | not null default false
Indexes:
    "repl_nodes_pkey" PRIMARY KEY, btree (id)

postgres=# \d repl_status
            View "repmgr_test.repl_status"
          Column           |           Type           | Modifiers
---------------------------+--------------------------+-----------
 primary_node              | integer                  |
 standby_node              | integer                  |
 standby_name              | text                     |
 last_monitor_time         | timestamp with time zone |
 last_wal_primary_location | text                     |
 last_wal_standby_location | text                     |
 replication_lag           | text                     |
 replication_time_lag      | interval                 |
 apply_lag                 | text                     |
 communication_time_lag    | interval                 |
View the definition of the repl_status view:
postgres=# select definition from pg_views where viewname = 'repl_status';
 SELECT repl_monitor.primary_node,
        repl_monitor.standby_node,
        repl_nodes.name AS standby_name,
        repl_monitor.last_monitor_time,
        repl_monitor.last_wal_primary_location,
        repl_monitor.last_wal_standby_location,
        pg_size_pretty(repl_monitor.replication_lag) AS replication_lag,
        age(now(), repl_monitor.last_apply_time) AS replication_time_lag,
        pg_size_pretty(repl_monitor.apply_lag) AS apply_lag,
        age(now(),
            CASE
                WHEN pg_is_in_recovery() THEN repmgr_get_last_updated()
                ELSE repl_monitor.last_monitor_time
            END) AS communication_time_lag
   FROM (repl_monitor
   JOIN repl_nodes ON ((repl_monitor.standby_node = repl_nodes.id)))
  WHERE ((repl_monitor.standby_node, repl_monitor.last_monitor_time) IN (
        SELECT repl_monitor_1.standby_node,
               max(repl_monitor_1.last_monitor_time) AS max
          FROM repl_monitor repl_monitor_1
         GROUP BY repl_monitor_1.standby_node));
(1 row)
Check the current replication status through the view:
postgres=# select * from repl_status ;
-[ RECORD 1 ]-------------+-----------------------------
primary_node              | 1
standby_node              | 2
standby_name              | slave
last_monitor_time         | 2014-04-15 05:28:32.53065+08
last_wal_primary_location | 0/3052FF0
last_wal_standby_location | 0/3052FF0
replication_lag           | 0 bytes
replication_time_lag      | 00:00:03.27349
apply_lag                 | 0 bytes
communication_time_lag    | 00:00:00.013697
Insert data into the primary:
[postgres@node1 ~]$ pgbench -i -s 10 pgbench
Check the replication status while the load is running:
postgres=# select * from repmgr_test.repl_status ;
-[ RECORD 1 ]-------------+------------------------------
primary_node              | 1
standby_node              | 2
standby_name              | slave
last_monitor_time         | 2014-04-15 05:43:53.368038+08
last_wal_primary_location | 0/48CC000
last_wal_standby_location | 0/4000000
replication_lag           | 9008 kB
replication_time_lag      | 00:00:04.031926
apply_lag                 | 336 bytes
communication_time_lag    |
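The replication_lag shown above is simply the byte distance between the primary and standby WAL positions. As a sanity check, the same number can be derived by hand from the two pg_xlog locations; a minimal sketch (assuming the usual hi/lo hex encoding of WAL locations, where the high word counts 4 GiB units):

```shell
# Convert a WAL location like 0/48CC000 (hi/lo, hex) into an absolute byte offset.
lsn_to_bytes() {
    local hi=${1%/*} lo=${1#*/}
    echo $(( 16#$hi * 4294967296 + 16#$lo ))   # hi word = multiples of 4 GiB
}

primary=0/48CC000   # last_wal_primary_location from the repl_status output above
standby=0/4000000   # last_wal_standby_location

lag=$(( $(lsn_to_bytes "$primary") - $(lsn_to_bytes "$standby") ))
echo "$lag bytes"   # 9224192 bytes, i.e. the "9008 kB" repl_status reports
```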
[postgres@node1 ~]$ pg_ctl stop -m f
Check the replication status on node2:
postgres=# select * from repmgr_test.repl_status ;
-[ RECORD 1 ]-------------+------------------------------
primary_node              | 1
standby_node              | 2
standby_name              | slave
last_monitor_time         | 2014-04-15 05:50:26.687504+08
last_wal_primary_location | 0/ADE9668
last_wal_standby_location | 0/ADE9668
replication_lag           | 0 bytes
replication_time_lag      | 00:01:12.366403
apply_lag                 | 0 bytes
communication_time_lag    |
(the time lag keeps increasing)
Monitoring log output:
[2014-04-15 05:50:28] [WARNING] wait_connection_availability: could not receive data from connection.
[2014-04-15 05:50:28] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2014-04-15 05:50:38] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2014-04-15 05:50:48] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision
[2014-04-15 05:50:58] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 30 seconds before failover decision
[2014-04-15 05:51:08] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 20 seconds before failover decision
[2014-04-15 05:51:18] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 10 seconds before failover decision
[2014-04-15 05:51:28] [ERROR] repmgrd: We couldn't reconnect for long enough, exiting...
[2014-04-15 05:51:28] [ERROR] We couldn't reconnect to master. Now checking if another node has been promoted.
[2014-04-15 05:51:28] [INFO] finding node list for cluster 'test'
[2014-04-15 05:51:28] [INFO] checking role of cluster node 'host=node1 user=repmgr dbname=postgres'
[2014-04-15 05:51:28] [ERROR] Connection to database failed: could not connect to server: Connection refused
Is the server running on host "node1" (192.168.100.146) and accepting
TCP/IP connections on port 5432?
[2014-04-15 05:51:28] [INFO] checking role of cluster node 'host=node2 user=repmgr dbname=postgres'
[2014-04-15 05:51:28] [ERROR] We haven't found a new master, waiting before retry...
[postgres@node2 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose standby promote
[postgres@node1 ~]$ repmgr -D $PGDATA -d postgres -p 5432 -U repmgr -R postgres --verbose --force standby clone node2
Start it:
[postgres@node1 ~]$ pg_ctl start
server starting
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
Stop node2:
[postgres@node2 ~]$ pg_ctl stop -m f
Promote node1:
[postgres@node1 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose standby promote
Restore node2 as a standby:
[postgres@node2 ~]$ repmgr -D $PGDATA -d postgres -p 5432 -U repmgr -R postgres --verbose --force standby clone node1
Start the database on node2:
[postgres@node2 ~]$ pg_ctl start
server starting
postgres=# select * from repmgr_test.repl_status ;
-[ RECORD 1 ]-------------+------------------------------
primary_node              | 1
standby_node              | 2
standby_name              | slave
last_monitor_time         | 2014-04-15 06:19:58.531949+08
last_wal_primary_location | 0/10003070
last_wal_standby_location | 0/10003070
replication_lag           | 0 bytes
replication_time_lag      | 00:00:03.967245
apply_lag                 | 0 bytes
communication_time_lag    |
IP               HOSTNAME  PG VERSION  DIR         OS             ROLE
192.168.100.146  node1     9.3.4       /opt/pgsql  CentOS6.4_x64  master
192.168.100.150  node2     9.3.4       /opt/pgsql  CentOS6.4_x64  standby
192.168.100.190  witness   9.3.4       /opt/pgsql  CentOS6.4_x64  witness
# vi /etc/hosts
192.168.100.190 witness
192.168.100.146 node1
192.168.100.150 node2
Install PostgreSQL on all three nodes (steps omitted).
Install repmgr on all three nodes; see section 2.4.
Make sure the postgres user on all three nodes can SSH to the others without a password; see section 3.1 for the setup.
$ vi postgresql.conf
listen_addresses = '*'
port = 5432
archive_mode = on
archive_command = 'cd .'     # any no-op works here, e.g. 'exit 0'
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 5000     # must be >= 5000
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'

$ vi pg_hba.conf
host    all            all    192.168.100.0/24    trust
host    replication    all    192.168.100.0/24    trust
Start the database:
[postgres@node1 ~]$ pg_ctl start
server starting
Create the repmgr user on node1:
[postgres@node1 ~]$ createuser -s repmgr
Create the repmgr database on node1:
[postgres@node1 ~]$ createdb -O repmgr repmgr
Test from node2:
[postgres@node2 ~]$ psql -h node1 -U repmgr -d postgres -c "select version()"
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
[postgres@node2 ~]$ repmgr -D $PGDATA -d repmgr -p 5432 -U repmgr -R postgres --verbose standby clone node1
Start the database:
[postgres@node2 ~]$ pg_ctl start
server starting
node1:
[postgres@node1 ~]$ vi /opt/pgsql/repmgr/repmgr.conf
cluster=my_cluster
node=1
node_name=node1
conninfo='host=node1 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /opt/pgsql/repmgr/repmgr.conf'
pg_bindir=/opt/pgsql/bin
node2:
[postgres@node2 ~]$ vi /opt/pgsql/repmgr/repmgr.conf
cluster=my_cluster
node=2
node_name=node2
conninfo='host=node2 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /opt/pgsql/repmgr/repmgr.conf'
pg_bindir=/opt/pgsql/bin
witness:
[postgres@witness ~]$ vi /opt/pgsql/repmgr/repmgr.conf
cluster=my_cluster
node=3
node_name=witness
conninfo='host=witness dbname=postgres user=postgres port=5499'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /opt/pgsql/repmgr/repmgr.conf'
pg_bindir=/opt/pgsql/bin
[postgres@node1 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose master register
[postgres@node2 ~]$ repmgr -f /opt/pgsql/repmgr/repmgr.conf --verbose standby register
[postgres@witness ~]$ repmgr -d repmgr -U repmgr -h node1 -D $PGDATA -f /opt/pgsql/repmgr/repmgr.conf witness create
The witness database starts automatically after initialization.
Log in to the postgres database on the witness and check the node information:
postgres=# select * from repmgr_my_cluster.repl_nodes ;
 id |  cluster   | name  |               conninfo               | priority | witness
----+------------+-------+--------------------------------------+----------+---------
  1 | my_cluster | node1 | host=node1 dbname=repmgr user=repmgr |        0 | f
(1 row)
Log in to the repmgr database on node1 and check the node information:
repmgr=# select * from repmgr_my_cluster.repl_nodes ;
 id |  cluster   |  name   |                     conninfo                     | priority | witness
----+------------+---------+--------------------------------------------------+----------+---------
  1 | my_cluster | node1   | host=node1 dbname=repmgr user=repmgr             |        0 | f
  2 | my_cluster | node2   | host=node2 dbname=repmgr user=repmgr             |        0 | f
  3 | my_cluster | witness | host=witness dbname=repmgr user=repmgr port=5499 |        0 | t
(3 rows)
Start the monitoring daemon on node2:
repmgrd -f /opt/pgsql/repmgr/repmgr.conf --verbose --monitoring-history > /opt/pgsql/repmgr/repmgr.log 2>&1 &
Query on node2:
[postgres@node2 ~]$ psql -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 t
(1 row)
(the node is in recovery)
Stop the primary:
[postgres@node1 ~]$ pg_ctl stop -m f
The repmgr log on node2 now shows:
FATAL: terminating connection due to administrator command
[2014-04-15 08:57:07] [WARNING] Can't stop current query: PQcancel() -- connect() failed: Connection refused
[2014-04-15 08:57:07] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2014-04-15 08:57:17] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2014-04-15 08:57:27] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision
[2014-04-15 08:57:37] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 30 seconds before failover decision
[2014-04-15 08:57:47] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 20 seconds before failover decision
[2014-04-15 08:57:57] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 10 seconds before failover decision
[2014-04-15 08:58:07] [ERROR] repmgrd: We couldn't reconnect for long enough, exiting...
[2014-04-15 08:58:08] [ERROR] Connection to database failed: could not connect to server: Connection refused
Is the server running on host "node1" (192.168.100.146) and accepting
TCP/IP connections on port 5432?
[2014-04-15 08:58:13] [INFO] repmgrd: This node is the best candidate to be the new primary, promoting...
[2014-04-15 08:58:13] [ERROR] Connection to database failed: could not connect to server: Connection refused
Is the server running on host "node1" (192.168.100.146) and accepting
TCP/IP connections on port 5432?
[2014-04-15 08:58:13] [NOTICE] repmgr: Promoting standby
[2014-04-15 08:58:13] [NOTICE] repmgr: restarting server using /opt/pgsql/bin/pg_ctl
[2014-04-15 08:58:15] [ERROR] repmgr: STANDBY PROMOTE successful. You should REINDEX any hash indexes you have.
[2014-04-15 08:58:17] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_my_cluster'
[2014-04-15 08:58:17] [INFO] repmgrd Checking node 2 in cluster 'my_cluster'
[2014-04-15 08:58:17] [INFO] Reloading configuration file and updating repmgr tables
[2014-04-15 08:58:17] [INFO] repmgrd Starting continuous primary connection check
The log output above shows the failover has completed; confirm the state on node2:
[postgres@node2 repmgr]$ psql -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 f
(1 row)
Check the node information on node2:
[postgres@node2 ~]$ repmgr cluster show -f /opt/pgsql/repmgr/repmgr.conf
Role      | Connection String
[2014-04-15 09:01:59] [ERROR] Connection to database failed: could not connect to server: Connection refused
	Is the server running on host "node1" (192.168.100.146) and accepting
	TCP/IP connections on port 5432?
  FAILED  | host=node1 dbname=repmgr user=repmgr
  witness | host=witness dbname=postgres user=postgres port=5499
* master  | host=node2 dbname=repmgr user=repmgr
Kill the old monitoring daemon on node2:
[postgres@node2 repmgr]$ kill -9 `pidof repmgrd`
Restore node1 as a standby:
[postgres@node1 ~]$ repmgr -D $PGDATA -d repmgr -p 5432 -U repmgr -R postgres --verbose --force standby clone node2
Start the database on node1:
[postgres@node1 ~]$ pg_ctl start
server starting
[postgres@node1 ~]$ psql -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 t
(1 row)
Start the monitoring daemon on node1:
[postgres@node1 ~]$ repmgrd -f /opt/pgsql/repmgr/repmgr.conf --verbose --monitoring-history > /opt/pgsql/repmgr/repmgr.log 2>&1 &
Stop the database on node2:
[postgres@node2 ~]$ pg_ctl stop -m f
waiting for server to shut down.... done
server stopped
This triggers another failover, and the master role moves from node2 back to node1. Instead of waiting for the automatic failover, you can also run the promote manually at this point.
Restore node2 as a standby:
[postgres@node2 ~]$ repmgr -D $PGDATA -d repmgr -p 5432 -U repmgr -R postgres --verbose --force standby clone node1
Start the database on node2:
[postgres@node2 ~]$ pg_ctl start
server starting
[postgres@node2 ~]$ psql -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 t
(1 row)
Restart the monitoring daemon on node2:
[postgres@node2 ~]$ repmgrd -f /opt/pgsql/repmgr/repmgr.conf --verbose --monitoring-history > /opt/pgsql/repmgr/repmgr.log 2>&1 &
Check the replication status:
repmgr=# select * from repmgr_my_cluster.repl_status ;
-[ RECORD 1 ]-------------+------------------------------
primary_node              | 1
standby_node              | 2
standby_name              | node2
last_monitor_time         | 2014-04-15 09:22:36.838084+08
last_wal_primary_location | 0/C003088
last_wal_standby_location | 0/C003088
replication_lag           | 0 bytes
replication_time_lag      | 00:00:04.119423
apply_lag                 | 0 bytes
communication_time_lag    | 00:00:00.06189
-[ RECORD 2 ]-------------+------------------------------
primary_node              | 2
standby_node              | 1
standby_name              | node1
last_monitor_time         | 2014-04-15 09:15:39.482578+08
last_wal_primary_location | 0/9008848
last_wal_standby_location | 0/9008848
replication_lag           | 0 bytes
replication_time_lag      | 00:06:58.398443
apply_lag                 | 0 bytes
communication_time_lag    | 00:06:57.417396
Note: to clear out stale monitoring data, run a cleanup (repmgr cluster cleanup -f repmgr.conf). The cleanup can also be scheduled as a routine maintenance task so the monitoring data does not consume too much disk space; use the -k option to keep only the last N days of history.
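For instance, the cleanup could be scheduled from the postgres user's crontab (a sketch; the 7-day retention, schedule, and paths are assumptions, adjust to taste):

```
# crontab -e (postgres user): purge monitoring rows older than 7 days, nightly at 03:00
0 3 * * * /opt/pgsql/bin/repmgr cluster cleanup -k 7 -f /opt/pgsql/repmgr/repmgr.conf
```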
cluster: a name for the managed cluster, e.g. test.
node: the node ID.
node_name: the node's name, e.g. master, standby, node1.
conninfo: database connection info, including host, user, dbname, port, etc.
rsync_options: rsync options; default is --archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\"".
ssh_options: ssh connection options.
master_response_timeout: maximum time to wait for the master to respond; default 60s.
reconnect_attempts: number of reconnection attempts after the master stops responding; default 6.
reconnect_interval: interval between reconnection attempts; default 10s.
failover: manual or automatic failover; default manual.
priority: with multiple standbys, the priority for promotion to master. Default -1, meaning disabled; unnecessary with a single standby.
promote_command: the promote command run when a failover is triggered, e.g. 'repmgr standby promote -f /path/to/repmgr.conf'. Must return 0 on success; may also be a script to implement more complex actions.
follow_command: the follow command run when a failover is triggered, e.g. 'repmgr standby follow -f /path/to/repmgr.conf -W'. With multiple standbys, the standbys that were not promoted run this to reconnect to the new master; unnecessary with a single standby. Must return 0 on success; may also be a script to implement more complex actions.
loglevel: the log level, one of DEBUG, INFO, NOTICE, WARNING, ERR, ALERT, CRIT, EMERG; default NOTICE.
logfacility: the log destination; default STDERR.
pg_bindir: path to the PostgreSQL binaries (pg_ctl etc.).
pg_ctl_options: extra options for the pg_ctl command, e.g. '-s'.
logfile: the log file, e.g. '/var/log/repmgr.log'.
monitor_interval_secs: the monitoring interval; default 2s.
retry_promote_interval_secs: if a promote fails after the master node dies, wait this many seconds before retrying, for up to 6 attempts before giving up; default 300 (5 minutes).
1. repmgr makes creating standbys and recovering a failed master simple: each takes a single command. File synchronization is done via rsync, so rsync must be installed on the servers.
2. During a standby promote, repmgr first renames recovery.conf to recovery.done, then restarts the standby (pg_ctl -D $PGDATA -m fast restart); after the restart it is the new primary.
3. For hot-standby high availability, repmgr is simpler to install and configure than pgpool-II or linux-ha, though it requires an extra witness node.
4. After a failover, the remaining standbys automatically reconnect to the new primary, which is what lets repmgr support multiple standbys.
5. On promote_command: with PostgreSQL >= 9.1 the configuration the repmgr docs describe is no longer necessary, because 9.1 added pg_ctl promote, which promotes a standby to primary without a restart.
6. Detecting node failure and finding a new primary is handled entirely by repmgrd, so for automatic failover the repmgrd daemon must be started before the failure occurs.
7. On IP switching: with only the configuration above, failover is automatic but applications can no longer connect, because the database IP address has changed. Adding the pgbouncer connection pooler on top of this setup, and putting the pgbouncer reconfiguration into the promote_command script, lets applications keep their connection settings unchanged after a failover. (See chapter 8 for the implementation.)
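Following point 5, on 9.1+ the promotion could be issued directly through pg_ctl rather than via a restart; a hedged sketch of such a repmgr.conf fragment (the paths are taken from this setup and may need adjusting):

```
failover=automatic
promote_command='/opt/pgsql/bin/pg_ctl -D /opt/pgsql/data promote'
```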
The experiment below continues from chapter 5.
pgbouncer is installed on the witness node.
pgbouncer download: http://pgfoundry.org/frs/?group_id=1000258
libevent download: http://libevent.org/
[root@witness ~]# tar -zxvf libevent-2.0.21-stable.tar.gz
[root@witness ~]# cd libevent-2.0.21-stable
[root@witness libevent-2.0.21-stable]# ./configure --prefix=/opt/libevent
[root@witness libevent-2.0.21-stable]# make
[root@witness libevent-2.0.21-stable]# make install
[root@witness ~]# tar -zxvf pgbouncer-1.5.4.tar.gz
[root@witness ~]# cd pgbouncer-1.5.4
[root@witness pgbouncer-1.5.4]# ./configure --prefix=/opt/pgbouncer --with-libevent=/opt/libevent/
[root@witness pgbouncer-1.5.4]# make
[root@witness pgbouncer-1.5.4]# make install
Change ownership of the pgbouncer directory:
[root@witness ~]# chown -R postgres:postgres /opt/pgbouncer/
Edit the postgres user's environment variables, adding the pgbouncer path and the libevent lib path:
[root@witness ~]# su - postgres
[postgres@witness ~]$ vi .bash_profile
export PATH=/opt/pgbouncer/bin:/opt/pgsql/bin:$PATH:$HOME/bin
export PGDATA=/opt/pgsql/data
export PGUSER=postgres
export PGPORT=5499
export LD_LIBRARY_PATH=/opt/libevent/lib:/opt/pgsql/lib:$LD_LIBRARY_PATH
Apply the changes:
[postgres@witness ~]$ source .bash_profile
Test:
[postgres@witness ~]$ pgbouncer -V
pgbouncer version 1.5.4 (compiled by <root@witness> at 2014-04-16 06:50:09)
[postgres@witness ~]$ cp /opt/pgbouncer/share/doc/pgbouncer/pgbouncer.ini /opt/pgbouncer/
[postgres@witness ~]$ vi /opt/pgbouncer/pgbouncer.ini
[databases]
masterdb = host=192.168.100.146 port=5432 dbname=postgres user=postgres

[pgbouncer]
logfile = /opt/pgbouncer/pgbouncer.log
pidfile = /opt/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = trust
auth_file = /opt/pgbouncer/userlist.txt
admin_users = pgbouncer
pool_mode = session
Set up the userlist:
[postgres@witness ~]$ cp /opt/pgbouncer/share/doc/pgbouncer/userlist.txt /opt/pgbouncer/
[postgres@witness ~]$ vi /opt/pgbouncer/userlist.txt
"postgres" "123456"
"pgbouncer" "123456"
[postgres@witness ~]$ pgbouncer -d /opt/pgbouncer/pgbouncer.ini
2014-04-16 07:12:48.287 22662 LOG File descriptor limit: 1024 (H:4096), max_client_conn: 100, max fds possible: 130
Test connecting from another server:
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U postgres masterdb
psql (9.0.9, server 9.3.4)
WARNING: psql version 9.0, server version 9.3.
         Some psql features might not work.
Type "help" for help.

masterdb=# \l
                                   List of databases
   Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

masterdb=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)
Connect to pgbouncer's own admin database:
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U pgbouncer pgbouncer
psql (9.0.9, server 1.5.4/bouncer)
WARNING: psql version 9.0, server version 1.5.
         Some psql features might not work.
Type "help" for help.

pgbouncer=# show help;
NOTICE:  Console usage
DETAIL:
	SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|VERSION
	SHOW STATS|FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM
	SHOW DNS_HOSTS|DNS_ZONES
	SET key = arg
	RELOAD
	PAUSE [<db>]
	RESUME [<db>]
	KILL <db>
	SUSPEND
	SHUTDOWN
SHOW
pgbouncer=# show clients;
 type |   user    | database  | state  |      addr       | port  |   local_addr    | local_port |    connect_time     |    request_time     |    ptr    | link
------+-----------+-----------+--------+-----------------+-------+-----------------+------------+---------------------+---------------------+-----------+------
 C    | pgbouncer | pgbouncer | active | 192.168.100.108 | 63984 | 192.168.100.190 |       6432 | 2014-04-16 07:27:08 | 2014-04-16 07:27:52 | 0x15f4a30 |
(1 row)
The pgbouncer setup is now complete.
On all three nodes, change promote_command in repmgr.conf to:
promote_command='/opt/pgsql/repmgr/failover.sh'
Create the failover.sh script on node1 and node2 and make it executable.
The script body is attached at the end of this document. Note this setting, which differs between the two nodes:
node1: MASTER_IP=192.168.100.146
node2: MASTER_IP=192.168.100.150
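The core of failover.sh is the in-place rewrite of the masterdb host in pgbouncer.ini. That step can be tried in isolation before wiring it into the script; a minimal local sketch (simplified to a plain host swap; the /tmp path and addresses are placeholders):

```shell
# Stand-in copy of pgbouncer.ini still pointing at the old master (hypothetical path).
cat > /tmp/pgbouncer.ini <<'EOF'
[databases]
masterdb = host=192.168.100.146 port=5432 dbname=postgres user=postgres
EOF

# The same kind of substitution failover.sh performs over ssh:
# point masterdb at the newly promoted node.
sed -i 's/host=192\.168\.100\.[0-9]*/host=192.168.100.150/' /tmp/pgbouncer.ini

grep '^masterdb' /tmp/pgbouncer.ini
```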
From a client, first check that connections currently work:
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U postgres masterdb -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 f
(1 row)
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U pgbouncer pgbouncer -c "show databases"
   name    |      host       | port | database  | force_user | pool_size | reserve_pool
-----------+-----------------+------+-----------+------------+-----------+--------------
 masterdb  | 192.168.100.146 | 5432 | postgres  | postgres   |        20 |            0
 pgbouncer |                 | 6432 | pgbouncer | pgbouncer  |         2 |            0
(2 rows)
Stop the database on node1:
[postgres@node1 repmgr]$ pg_ctl stop -m f
Watching the log on node2, the following is output:
FATAL: terminating connection due to administrator command
[2014-04-17 05:01:06] [WARNING] wait_connection_availability: could not receive data from connection.
[2014-04-17 05:01:06] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 60 seconds before failover decision
[2014-04-17 05:01:16] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 50 seconds before failover decision
[2014-04-17 05:01:26] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 40 seconds before failover decision
[2014-04-17 05:01:36] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 30 seconds before failover decision
[2014-04-17 05:01:46] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 20 seconds before failover decision
[2014-04-17 05:01:56] [WARNING] repmgrd: Connection to master has been lost, trying to recover... 10 seconds before failover decision
[2014-04-17 05:02:06] [ERROR] repmgrd: We couldn't reconnect for long enough, exiting...
[2014-04-17 05:02:06] [ERROR] Connection to database failed: could not connect to server: Connection refused
Is the server running on host "node1" (192.168.100.146) and accepting
TCP/IP connections on port 5432?
[2014-04-17 05:02:11] [INFO] repmgrd: This node is the best candidate to be the new primary, promoting...
2014-04-17 05:02:11 server promoting
2014-04-17 05:02:11 FAILOVER-ERROR: the db is still in recovery!
2014-04-17 05:02:12 FAILOVER-INFO: promote successful!
2014-04-17 05:02:13 FAILOVER-INFO: change pgbouncer.ini successful!
2014-04-17 05:02:13.050 32668 LOG File descriptor limit: 1024 (H:4096), max_client_conn: 100, max fds possible: 130
2014-04-17 05:02:13.051 32668 LOG takeover_init: launching connection
2014-04-17 05:02:13.052 32668 LOG S-0xa9ba40: pgbouncer/pgbouncer@unix :6432 new connection to server
2014-04-17 05:02:13.052 32668 LOG S-0xa9ba40: pgbouncer/pgbouncer@unix :6432 Login OK, sending SUSPEND
2014-04-17 05:02:13.053 32668 LOG SUSPEND finished, sending SHOW FDS
2014-04-17 05:02:13.054 32668 LOG got pooler socket: 0.0.0.0@6432
2014-04-17 05:02:13.054 32668 LOG got pooler socket: unix@6432
2014-04-17 05:02:13.054 32668 LOG SHOW FDS finished
2014-04-17 05:02:13.055 32668 LOG disko over, going background
2014-04-17 05:02:13 FAILOVER-INFO: pgbouncer reload successful!
################################# The New Conn_info ####################################
name | host | port | database | force_user | pool_size | reserve_pool
-----------+-----------------+------+-----------+------------+-----------+--------------
masterdb | 192.168.100.150 | 5432 | postgres | postgres | 20 | 0
pgbouncer | | 6432 | pgbouncer | pgbouncer | 2 | 0
(2 rows)
########################################################################################
Check the connection from the client again:
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U postgres masterdb -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 f
(1 row)
[highgo@lx-pc ~]$ psql -h 192.168.100.190 -p 6432 -U pgbouncer pgbouncer -c "show databases"
   name    |      host       | port | database  | force_user | pool_size | reserve_pool
-----------+-----------------+------+-----------+------------+-----------+--------------
 masterdb  | 192.168.100.150 | 5432 | postgres  | postgres   |        20 |            0
 pgbouncer |                 | 6432 | pgbouncer | pgbouncer  |         2 |            0
(2 rows)
(connections have been switched to node2 automatically)
Official website: http://www.repmgr.org/?gclid=CLDFj57x3r0CFUUHvAod_38ANQ
Documentation: https://github.com/2ndQuadrant/repmgr
License: GPL v3
The failover.sh script:
# Created by [email protected] 2014/04/16
# Do
#   1. Promote the standby.
#   2. Change pgbouncer.ini on the pgbouncer server.
#   3. Restart pgbouncer on the pgbouncer server.

PGHOME=/opt/pgsql
PGBIN=$PGHOME/bin
PGDATA=$PGHOME/data
PGPORT=5432
PGUSER=postgres
LOG_FILE=/opt/pgsql/repmgr/failover.log
BOUN_SERVER=witness
BOUN_FILE=/opt/pgbouncer/pgbouncer.ini
BOUN_LISTEN_PORT=6432
BOUN_ADMIN_USER=pgbouncer
# STANDBY_IP = FAIL_NODE_IP
STANDBY_IP=192.168.100.*
# MASTER_IP = LOCAL_IP
MASTER_IP=192.168.100.146
CONN_INFO="user=postgres port=5432 dbname=postgres"

TIME=`date '+%Y-%m-%d %H:%M:%S'`
echo -n "$TIME " >> $LOG_FILE
$PGBIN/pg_ctl -D $PGDATA promote >> $LOG_FILE
if [ $? == 0 ];then
    IF_RECOVERY=" t"
    while [ "$IF_RECOVERY" = " t" ];do
        TIME=`date '+%Y-%m-%d %H:%M:%S'`
        IF_RECOVERY=`psql -c "select pg_is_in_recovery()" | sed -n '3,3p'`
        if [ "$IF_RECOVERY" = " f" ];then
            echo "$TIME FAILOVER-INFO: promote successful!" >> $LOG_FILE
            TIME=`date '+%Y-%m-%d %H:%M:%S'`
            echo "sed -i 's/host=$STANDBY_IP/host=$MASTER_IP $CONN_INFO/g' $BOUN_FILE" | ssh $PGUSER@$BOUN_SERVER bash
            if [ $? == 0 ];then
                echo "$TIME FAILOVER-INFO: change pgbouncer.ini successful!" >> $LOG_FILE
                # $PGBIN/psql -h $BOUN_SERVER -p $BOUN_LISTEN_PORT -U $BOUN_ADMIN_USER pgbouncer -c "reload" > /dev/null
                # TIME=`date '+%Y-%m-%d %H:%M:%S'`
                # echo "$TIME " >> $LOG_FILE
                ssh postgres@witness ". ~/.bash_profile;pgbouncer -R -d $BOUN_FILE" &>> $LOG_FILE
                if [ $? == 0 ];then
                    TIME=`date '+%Y-%m-%d %H:%M:%S'`
                    echo "$TIME FAILOVER-INFO: pgbouncer reload successful!" >> $LOG_FILE
                    echo "################################# The New Conn_info ####################################" >> $LOG_FILE
                    # Ensure the auth_type = trust
                    $PGBIN/psql -h $BOUN_SERVER -p $BOUN_LISTEN_PORT -U $BOUN_ADMIN_USER pgbouncer -c "show databases" >> $LOG_FILE
                    echo "########################################################################################" >> $LOG_FILE
                else
                    echo "$TIME FAILOVER-ERROR: pgbouncer reload failed!" >> $LOG_FILE
                fi
            else
                echo "$TIME FAILOVER-ERROR: change pgbouncer.ini failed!" >> $LOG_FILE
            fi
        else
            echo "$TIME FAILOVER-ERROR: the db is still in recovery! Sleep 1s and Retry..." >> $LOG_FILE
            sleep 1
        fi
    done
else
    echo "$TIME ERROR: promote failed!" >> $LOG_FILE
fi