谐云

Analysis of the PostgreSQL High-Availability Startup Flow Based on Patroni

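What Is Patroni
In many production environments, distributed databases are widely adopted for their high availability, data distribution, and load balancing. Patroni is a high-availability solution designed specifically for PostgreSQL: a template for HA architectures, implemented in Python. It relies on an external distributed store (Kubernetes, etcd, etcd3, ZooKeeper, Consul, and others) to give a PostgreSQL cluster automatic failure recovery, automatic failover, automatic backup, and similar capabilities.
Key features:
1.Automatic failure detection and recovery: Patroni monitors the health of the PostgreSQL cluster. As soon as it detects that the primary has failed, it runs the recovery procedure and promotes one of the replicas to primary.
2.Automatic failover: once Patroni has settled on a new primary, it coordinates all replicas and clients so that they switch over to it correctly, giving a fast and seamless failover.
3.Consistency and data integrity: Patroni pays close attention to data consistency and integrity. During a failover it makes sure that data is not lost or damaged before the new primary takes over.
4.External shared configuration store: Patroni keeps configuration and cluster state in an external key-value store (such as ZooKeeper, etcd, or Consul). This keeps the configuration consistent and accessible, and lets multiple Patroni instances cooperate.
5.Support for cloud environments and physical hardware: Patroni runs in cloud environments and can also be deployed on physical hardware, so it offers a wide range of deployment options.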
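Patroni Architecture

●DCS (Distributed Configuration Store): where the distributed configuration is stored. Supported backends include Kubernetes, etcd, etcd3, ZooKeeper, and Consul; Patroni reads and writes the distributed configuration there.
●Patroni core: writes the distributed configuration to the DCS, sets each PostgreSQL node's role and PostgreSQL configuration, and manages the PostgreSQL lifecycle.
●PostgreSQL nodes: following the PostgreSQL configuration set by Patroni, the nodes form a primary/replica chain and synchronize data through streaming replication, which together make up a PostgreSQL cluster.
Patroni High-Availability Source Code Analysis
Patroni High-Availability Startup Flow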

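Flow description:
●Load cluster information: fetch the cluster information through the API of the configured DCS. The main items are:
○config: the PostgreSQL cluster ID and configuration (PostgreSQL parameters, timeout settings, and so on), used for cluster validation, node rebuilds, and similar tasks;
○leader: the time the leader lock was acquired, the heartbeat time, the election period (TTL), the latest LSN, and so on, recorded once a node has won the leader race;
○sync: the primary and synchronous standby nodes, written by the primary and used to validate the synchronous nodes during switchover and failover;
○failover: failover information, such as when the last failover took place.
●Check cluster state: validate the cluster configuration, inspect the state of the cluster as a whole and of the local node, and decide how PostgreSQL should be started;
●Start PostgreSQL: initialize the PostgreSQL data directory, apply the PostgreSQL configuration derived from the cluster information, and start the server;
●Form the PostgreSQL cluster: assign primary and replica roles to the started PostgreSQL nodes and link nodes of different roles together into a complete cluster.
Analysis of the Patroni High-Availability Startup Flow
Loading Cluster Information
Loading the cluster information is the first step of the HA startup flow, and it supplies the information that matters most for forming the PostgreSQL cluster.

Step 1: load the cluster information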

…

try:
    self.load_cluster_from_dcs()
    self.state_handler.reset_cluster_info_state(self.cluster, self.patroni.nofailover)
except Exception:
    self.state_handler.reset_cluster_info_state(None, self.patroni.nofailover)
    raise

…

Loading the cluster information through the DCS interface

def load_cluster_from_dcs(self):
    cluster = self.dcs.get_cluster()

    # We want to keep the state of cluster when it was healthy
    if not cluster.is_unlocked() or not self.old_cluster:
        self.old_cluster = cluster
    self.cluster = cluster

    if not self.has_lock(False):
        self.set_is_leader(False)

    self._leader_timeline = None if cluster.is_unlocked() else cluster.leader.timeline
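
The rule "remember the last view of the cluster that still had a leader" can be tried out in isolation. The sketch below is only an illustration: `ClusterView` and `HaState` are hypothetical stand-ins, not Patroni classes, and only the condition itself is taken from the listing above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterView:
    """Hypothetical stand-in for the cluster object returned by the DCS."""
    leader: Optional[str]

    def is_unlocked(self) -> bool:
        # A cluster without a leader is "unlocked".
        return self.leader is None


class HaState:
    """Keeps the current view plus the last view that still had a leader."""

    def __init__(self) -> None:
        self.cluster: Optional[ClusterView] = None
        self.old_cluster: Optional[ClusterView] = None

    def load(self, cluster: ClusterView) -> None:
        # Store the view while it is healthy, or if nothing is stored yet.
        if not cluster.is_unlocked() or not self.old_cluster:
            self.old_cluster = cluster
        self.cluster = cluster


state = HaState()
state.load(ClusterView(leader="pg-0"))
state.load(ClusterView(leader=None))  # the leader has disappeared
print(state.cluster.leader, state.old_cluster.leader)
```

After the second call the current view has no leader, while `old_cluster` still points at the last healthy view, which is what later health checks can fall back on.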

The cluster interface

def get_cluster(self, force=False):
    if force:
        self._bypass_caches()
    try:
        cluster = self._load_cluster()
    except Exception:
        self.reset_cluster()
        raise

    self._last_seen = int(time.time())

    with self._cluster_thread_lock:
        self._cluster = cluster
        self._cluster_valid_till = time.time() + self.ttl
        return cluster

@abc.abstractmethod
def _load_cluster(self):
    """Internally this method should build  `Cluster` object which
       represents current state and topology of the cluster in DCS.
       this method supposed to be called only by `get_cluster` method.

       raise `~DCSError` in case of communication or other problems with DCS.
       If the current node was running as a master and exception raised,
       instance would be demoted."""
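
The split between a generic `get_cluster` and a backend-specific `_load_cluster` is a template-method pattern. A minimal sketch of that pattern follows; `AbstractStore` and `InMemoryStore` are hypothetical names invented for this example, and the cache handling is simplified compared with the listing above.

```python
import abc
import threading
import time


class AbstractStore(abc.ABC):
    """Hypothetical, simplified counterpart of the DCS base class."""

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self._cluster = None
        self._cluster_valid_till = 0.0
        self._lock = threading.Lock()

    @abc.abstractmethod
    def _load_cluster(self) -> dict:
        """Backend-specific read; may raise on communication problems."""

    def get_cluster(self) -> dict:
        try:
            cluster = self._load_cluster()
        except Exception:
            # Drop the cached view so that stale data is never served.
            with self._lock:
                self._cluster = None
                self._cluster_valid_till = 0.0
            raise
        with self._lock:
            self._cluster = cluster
            self._cluster_valid_till = time.time() + self.ttl
        return cluster


class InMemoryStore(AbstractStore):
    """Toy backend that reads from a dictionary instead of a real store."""

    def __init__(self, ttl: int, data: dict) -> None:
        super().__init__(ttl)
        self.data = data
        self.fail = False

    def _load_cluster(self) -> dict:
        if self.fail:
            raise RuntimeError("store is unreachable")
        return dict(self.data)


store = InMemoryStore(ttl=30, data={"leader": "pg-0"})
view = store.get_cluster()
```

Each backend only has to implement `_load_cluster`; caching, validity time, and error handling stay in one place.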

Taking Kubernetes as the DCS, for example

def _load_cluster(self):
    stop_time = time.time() + self._retry.deadline
    self._api.refresh_api_servers_cache()
    try:
        with self._condition:
            self._wait_caches(stop_time)

        members = [self.member(pod) for pod in self._pods.copy().values()]
        nodes = self._kinds.copy()

        config = nodes.get(self.config_path)
        metadata = config and config.metadata
        annotations = metadata and metadata.annotations or {}

        # get initialize flag
        initialize = annotations.get(self._INITIALIZE)

        # get global dynamic configuration
        config = ClusterConfig.from_node(metadata and metadata.resource_version,
                                         annotations.get(self._CONFIG) or '{}',
                                         metadata.resource_version if self._CONFIG in annotations else 0)

        # get timeline history
        history = TimelineHistory.from_node(metadata and metadata.resource_version,
                                            annotations.get(self._HISTORY) or '[]')

        leader = nodes.get(self.leader_path)
        metadata = leader and leader.metadata
        self._leader_resource_version = metadata.resource_version if metadata else None
        annotations = metadata and metadata.annotations or {}

        # get last known leader lsn
        last_lsn = annotations.get(self._OPTIME)
        try:
            last_lsn = 0 if last_lsn is None else int(last_lsn)
        except Exception:
            last_lsn = 0

        # get permanent slots state (confirmed_flush_lsn)
        slots = annotations.get('slots')
        try:
            slots = slots and json.loads(slots)
        except Exception:
            slots = None

        # get leader
        leader_record = {n: annotations.get(n) for n in (self._LEADER, 'acquireTime',
                         'ttl', 'renewTime', 'transitions') if n in annotations}
        if (leader_record or self._leader_observed_record) and leader_record != self._leader_observed_record:
            self._leader_observed_record = leader_record
            self._leader_observed_time = time.time()

        leader = leader_record.get(self._LEADER)
        try:
            ttl = int(leader_record.get('ttl')) or self._ttl
        except (TypeError, ValueError):
            ttl = self._ttl

        if not metadata or not self._leader_observed_time or self._leader_observed_time + ttl < time.time():
            leader = None

        if metadata:
            member = Member(-1, leader, None, {})
            member = ([m for m in members if m.name == leader] or [member])[0]
            leader = Leader(metadata.resource_version, None, member)

        # failover key
        failover = nodes.get(self.failover_path)
        metadata = failover and failover.metadata
        failover = Failover.from_node(metadata and metadata.resource_version,
                                      metadata and (metadata.annotations or {}).copy())

        # get synchronization state
        sync = nodes.get(self.sync_path)
        metadata = sync and sync.metadata
        sync = SyncState.from_node(metadata and metadata.resource_version,  metadata and metadata.annotations)

        return Cluster(initialize, config, leader, last_lsn, members, failover, sync, history, slots)
    except Exception:
        logger.exception('get_cluster')
        raise KubernetesError('Kubernetes API is not responding properly')
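
One detail of the listing above is worth isolating: the leader is treated as gone when the leader record has not changed for longer than `ttl`, measured from the moment this node last saw it change. A minimal sketch of that check, with the hypothetical helper `effective_leader` and an injectable `now` so that it can be tested:

```python
import time
from typing import Optional


def effective_leader(leader: Optional[str], observed_time: Optional[float],
                     ttl: int, now: Optional[float] = None) -> Optional[str]:
    """Return the leader name, or None when the record is missing or has expired.

    observed_time is the local time at which the record was last seen to change.
    """
    now = time.time() if now is None else now
    if not leader or not observed_time or observed_time + ttl < now:
        return None
    return leader


print(effective_leader("pg-0", observed_time=100.0, ttl=30, now=120.0))
print(effective_leader("pg-0", observed_time=100.0, ttl=30, now=131.0))
```

Because only locally observed times are compared, the check does not depend on the clocks of different nodes being synchronized.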

In the cluster information above, the configuration is held mainly in the xxx-config, xxx-leader, xxx-failover, and xxx-sync resources. Their contents are as follows:
●xxx-config

% kubectl get cm pg142-1013-postgresql-config -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    config: '{"loop_wait":10,"maximum_lag_on_failover":33554432,"postgresql":{"parameters":{"archive_command":"/bin/true","archive_mode":"on","archive_timeout":"1800s","autovacuum":"on","autovacuum_analyze_scale_factor":0.02,"autovacuum_max_workers":"3","autovacuum_naptime":"5min","autovacuum_vacuum_cost_delay":"2ms","autovacuum_vacuum_cost_limit":"-1","autovacuum_vacuum_scale_factor":0.05,"autovacuum_work_mem":"128MB","backend_flush_after":"0","bgwriter_delay":"200ms","bgwriter_flush_after":"256","bgwriter_lru_maxpages":"100","bgwriter_lru_multiplier":"2","checkpoint_completion_target":"0.9","checkpoint_flush_after":"256kB","checkpoint_timeout":"5min","commit_delay":"0","constraint_exclusion":"partition","datestyle":"iso,
      mdy","deadlock_timeout":"1s","default_text_search_config":"pg_catalog.english","dynamic_shared_memory_type":"posix","effective_cache_size":"32768","fsync":"on","full_page_writes":"on","hot_standby":"on","hot_standby_feedback":"off","huge_pages":"off","idle_in_transaction_session_timeout":"600000","lc_messages":"en_US.UTF-8","lc_monetary":"en_US.UTF-8","lc_numeric":"en_US.UTF-8","lc_time":"en_US.UTF-8","listen_addresses":"*","log_autovacuum_min_duration":"0","log_checkpoints":"on","log_connections":"off","log_disconnections":"off","log_error_verbosity":"default","log_line_prefix":"%t
      [%p]: [%l-1] %c %x %d %u %a %h","log_lock_waits":"on","log_min_duration_statement":"500","log_rotation_size":"0","log_statement":"none","log_temp_files":0,"log_timezone":"Asia/Shanghai","maintenance_work_mem":"32768","max_connections":"170","max_parallel_maintenance_workers":"2","max_parallel_workers":"2","max_parallel_workers_per_gather":"2","max_replication_slots":"10","max_standby_archive_delay":"30s","max_standby_streaming_delay":"30s","max_wal_senders":"10","max_wal_size":"2048","max_worker_processes":"8","old_snapshot_threshold":"-1","pg_stat_statements.max":"10000","pg_stat_statements.save":"on","pg_stat_statements.track":"all","pgaudit.log":"NONE","pgaudit.log_catalog":"on","pgaudit.log_client":"off","pgaudit.log_level":"log","pgaudit.log_parameter":"off","pgaudit.log_relation":"off","pgaudit.log_rows":"off","pgaudit.log_statement":"on","pgaudit.log_statement_once":"off","pgaudit.role":"","random_page_cost":"4","restart_after_crash":"on","synchronous_commit":"on","tcp_keepalives_count":"0","tcp_keepalives_idle":"900","tcp_keepalives_interval":"100","temp_buffers":"8MB","timezone":"Asia/Shanghai","track_activity_query_size":"1kB","track_functions":"all","track_io_timing":"off","unix_socket_directories":"/var/run/postgresql","vacuum_cost_delay":"0ms","vacuum_cost_limit":"200","wal_buffers":"2048","wal_compression":"on","wal_keep_segments":"128","wal_keep_size":"2048MB","wal_level":"replica","wal_log_hints":"on","wal_receiver_status_interval":"10s","wal_sender_timeout":"1min","wal_writer_delay":"200ms","wal_writer_flush_after":"1MB","work_mem":"4MB"},"use_pg_rewind":true,"use_slots":true},"retry_timeout":10,"synchronous_mode":true,"ttl":30}'
    initialize: "7289263672843878470"
  creationTimestamp: "2023-10-13T02:25:51Z"
  labels:
    application: spilo
    cluster-name: pg142-1013-postgresql
  name: pg142-1013-postgresql-config
  namespace: default
  resourceVersion: "22858249"
  uid: dfa64d28-e939-4bdd-8db1-a3485fa09637

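In the example above, annotations holds two parameters, config and initialize:
1.config defines the cluster-wide configuration, which covers the PostgreSQL parameters as well as cluster parameters (election wait time, maximum WAL lag allowed on failover, whether synchronous mode is enabled, and so on);
2.initialize defines the cluster ID. It matches the "Database system identifier" value reported by pg_controldata, so every PostgreSQL node in the cluster has the same sysid.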

root@pg142-1013-postgresql-1:/home/postgres# pg_controldata | grep "Database system identifier"
Database system identifier:           7289263672843878470
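
Since the config annotation is plain JSON, it can be inspected with a few lines of Python. The sketch below uses a shortened copy of the annotation shown above (most PostgreSQL parameters are left out for brevity) and separates the cluster-level settings from the PostgreSQL parameters.

```python
import json

# A shortened copy of the `config` annotation from the ConfigMap above.
config_annotation = (
    '{"loop_wait":10,"maximum_lag_on_failover":33554432,'
    '"postgresql":{"parameters":{"max_connections":"170","wal_level":"replica"},'
    '"use_pg_rewind":true,"use_slots":true},'
    '"retry_timeout":10,"synchronous_mode":true,"ttl":30}'
)

config = json.loads(config_annotation)

# Cluster-level settings sit at the top level of the document ...
cluster_settings = {k: v for k, v in config.items() if k != "postgresql"}
# ... while PostgreSQL parameters are nested under postgresql.parameters.
pg_parameters = config["postgresql"]["parameters"]

print(cluster_settings)
print(pg_parameters["max_connections"])
```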
●xxx-leader
% kubectl get cm pg142-1013-postgresql-leader -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    acquireTime: "2023-10-13T02:26:06.973552+00:00"
    leader: pg142-1013-postgresql-0
    optime: "67109192"
    renewTime: "2023-10-16T07:02:57.418940+00:00"
    transitions: "0"
    ttl: "30"
  creationTimestamp: "2023-10-13T02:26:07Z"
  labels:
    application: spilo
    cluster-name: pg142-1013-postgresql
  name: pg142-1013-postgresql-leader
  namespace: default
  resourceVersion: "23286847"
  uid: cb235c85-6a21-454d-8320-222205eaa77f

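The parameters under annotations above have the following meanings:
1.acquireTime: when the cluster leader lock was acquired;
2.leader: the holder of the cluster leader lock, here the name of a PostgreSQL node;
3.optime: the leader's latest LSN as a decimal number, here 67109192;
4.renewTime: the heartbeat time of the lock holder; the heartbeat period corresponds to loop_wait in xxx-config;
5.transitions: how many times the leader lock has changed hands; it is incremented on each switchover or failover;
6.ttl: the election timeout before failover. If renewTime is not updated within ttl seconds, an election is triggered and a new node takes the leader lock.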
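
Two of these fields are easier to read after a small conversion. The sketch below renders optime in PostgreSQL's usual hexadecimal LSN notation and checks whether a lock would count as expired from renewTime and ttl. Both helpers are hypothetical, and the expiry check is a simplification: the `_load_cluster` listing shown earlier measures the time since it last saw the record change instead of comparing wall clocks.

```python
from datetime import datetime, timedelta


def format_lsn(optime: int) -> str:
    """Render a decimal WAL position in PostgreSQL's X/Y hexadecimal notation."""
    return "{0:X}/{1:X}".format(optime >> 32, optime & 0xFFFFFFFF)


def lock_expired(renew_time: str, ttl: int, now: datetime) -> bool:
    """True when renewTime has not been refreshed within ttl seconds."""
    return now > datetime.fromisoformat(renew_time) + timedelta(seconds=ttl)


print(format_lsn(67109192))  # 0/4000148
```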
●xxx-sync

% kubectl get cm pg142-1013-postgresql-sync -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    leader: pg142-1013-postgresql-1
    sync_standby: pg142-1013-postgresql-0
  creationTimestamp: "2023-10-16T06:54:39Z"
  labels:
    application: spilo
    cluster-name: pg142-1013-postgresql
  name: pg142-1013-postgresql-sync
  namespace: default
  resourceVersion: "23288352"
  uid: 1c46e63b-8b90-4fc6-9596-8e2f71fba2ab

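The content above records two pieces of information:
1.leader: the name of the leader node;
2.sync_standby: the name of the synchronous standby; multiple synchronous standbys are separated by commas.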
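
Because sync_standby is a comma-separated string, a consumer has to split it before comparing member names. A small sketch, with the hypothetical helper `parse_sync_standby`:

```python
from typing import List, Optional


def parse_sync_standby(value: Optional[str]) -> List[str]:
    """Split the sync_standby annotation into a list of member names."""
    return [name.strip() for name in (value or "").split(",") if name.strip()]


print(parse_sync_standby("pg142-1013-postgresql-0"))
```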

●xxx-failover


% kubectl get cm pg142-1013-postgresql-failover -oyaml
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2023-10-16T07:16:03Z"
  labels:
    application: spilo
    cluster-name: pg142-1013-postgresql
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:application: {}
          f:cluster-name: {}
    manager: Patroni
    operation: Update
    time: "2023-10-16T07:36:56Z"
  name: pg142-1013-postgresql-failover
  namespace: default
  resourceVersion: "23290596"
  uid: 72d50c58-bc65-4b77-8870-93d0b8f8b7a2
The content above mainly records when the last failover took place.
Cluster State Checks
  if self.is_paused():
      self.watchdog.disable()
      self._was_paused = True
  else:
      if self._was_paused:
          self.state_handler.schedule_sanity_checks_after_pause()
      self._was_paused = False
  
  if not self.cluster.has_member(self.state_handler.name):
      self.touch_member()
  
  # cluster has leader key but not initialize key
  if not (self.cluster.is_unlocked() or self.sysid_valid(self.cluster.initialize)) and self.has_lock():
      self.dcs.initialize(create_new=(self.cluster.initialize is None), sysid=self.state_handler.sysid)
  
  if not (self.cluster.is_unlocked() or self.cluster.config and self.cluster.config.data) and self.has_lock():
      self.dcs.set_config_value(json.dumps(self.patroni.config.dynamic_configuration, separators=(',', ':')))
      self.cluster = self.dcs.get_cluster()
  
  if self._async_executor.busy:
      return self.handle_long_action_in_progress()
  
  msg = self.handle_starting_instance()
  if msg is not None:
      return msg
  
  # we've got here, so any async action has finished.
  if self.state_handler.bootstrapping:
      return self.post_bootstrap()
  
  if self.recovering:
      self.recovering = False

  if not self._rewind.is_needed:
      # Check if we tried to recover from postgres crash and failed
      msg = self.post_recover()
      if msg is not None:
          return msg

  # Reset some states after postgres successfully started up
  self._crash_recovery_executed = False
  if self._rewind.executed and not self._rewind.failed:
      self._rewind.reset_state()

  # The Raft cluster without a quorum takes a bit of time to stabilize.
  # Therefore we want to postpone the leader race if we just started up.
  if self.cluster.is_unlocked() and self.dcs.__class__.__name__ == 'Raft':
      return 'started as a secondary'

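Checking whether the cluster is paused
A paused cluster is one whose PostgreSQL nodes are not managed by Patroni: when the cluster misbehaves, failover and similar actions are no longer triggered.
Pausing is normally initiated by the user and is useful when maintaining a single PostgreSQL node. It is triggered as follows: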

root@pg142-1013-postgresql-0:/home/postgres# patronictl list
+ Cluster: pg142-1013-postgresql (7289263672843878470) ---+---------+----+-----------+
| Member                  | Host           | Role         | State   | TL | Lag in MB |
+-------------------------+----------------+--------------+---------+----+-----------+
| pg142-1013-postgresql-0 | 10.244.117.143 | Leader       | running |  3 |           |
| pg142-1013-postgresql-1 | 10.244.165.220 | Sync Standby | running |  3 |         0 |
+-------------------------+----------------+--------------+---------+----+-----------+
root@pg142-1013-postgresql-0:/home/postgres# patronictl pause
Success: cluster management is paused
root@pg142-1013-postgresql-0:/home/postgres# patronictl list
+ Cluster: pg142-1013-postgresql (7289263672843878470) ---+---------+----+-----------+
| Member                  | Host           | Role         | State   | TL | Lag in MB |
+-------------------------+----------------+--------------+---------+----+-----------+
| pg142-1013-postgresql-0 | 10.244.117.143 | Leader       | running |  3 |           |
| pg142-1013-postgresql-1 | 10.244.165.220 | Sync Standby | running |  3 |         0 |
+-------------------------+----------------+--------------+---------+----+-----------+
 Maintenance mode: on

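The "Maintenance mode: on" line shows that the cluster is now paused. The PostgreSQL processes stay alive, but if one of them fails the user has to start it manually.
To resume a paused cluster: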

root@pg142-1013-postgresql-0:/home/postgres# patronictl list
+ Cluster: pg142-1013-postgresql (7289263672843878470) ---+---------+----+-----------+
| Member                  | Host           | Role         | State   | TL | Lag in MB |
+-------------------------+----------------+--------------+---------+----+-----------+
| pg142-1013-postgresql-0 | 10.244.117.143 | Leader       | running |  3 |           |
| pg142-1013-postgresql-1 | 10.244.165.220 | Sync Standby | running |  3 |         0 |
+-------------------------+----------------+--------------+---------+----+-----------+
 Maintenance mode: on
root@pg142-1013-postgresql-0:/home/postgres# patronictl resume
Success: cluster management is resumed
root@pg142-1013-postgresql-0:/home/postgres# patronictl list
+ Cluster: pg142-1013-postgresql (7289263672843878470) ---+---------+----+-----------+
| Member                  | Host           | Role         | State   | TL | Lag in MB |
+-------------------------+----------------+--------------+---------+----+-----------+
| pg142-1013-postgresql-0 | 10.244.117.143 | Leader       | running |  3 |           |
| pg142-1013-postgresql-1 | 10.244.165.220 | Sync Standby | running |  3 |         0 |
+-------------------------+----------------+--------------+---------+----+-----------+

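Running patronictl resume brings the cluster back under Patroni's management.
After the cluster is resumed, the PostgreSQL nodes in it need some processing:
1.reapply the PostgreSQL parameters;
2.using the primary and synchronous standby names last recorded in xxx-sync, set up the synchronous replication slot information on the primary;
3.check whether the sysid of the resumed PostgreSQL node has changed, that is, whether it still matches the initialize value last stored in xxx-config; if it does not, the cluster cannot be resumed.
Cluster Initialization Checks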

# cluster has leader key but not initialize key
if not (self.cluster.is_unlocked() or self.sysid_valid(self.cluster.initialize)) and self.has_lock():
    self.dcs.initialize(create_new=(self.cluster.initialize is None), sysid=self.state_handler.sysid)

if not (self.cluster.is_unlocked() or self.cluster.config and self.cluster.config.data) and self.has_lock():
    self.dcs.set_config_value(json.dumps(self.patroni.config.dynamic_configuration, separators=(',', ':')))
    self.cluster = self.dcs.get_cluster()

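The cluster initialization check looks at two things:
●the cluster has a leader, but initialize is missing from xxx-config: the sysid of PostgreSQL on the leader must be written to xxx-config;
●the cluster has a leader, but the xxx-config information could not be read: the leader's configuration and sysid must both be written to xxx-config, and the cluster information is loaded again.
The purpose of this step is to guard against xxx-config being deleted, which would make replicas fail to load the cluster information.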
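
The two conditions are easy to misread because of the nested `not (... or ...)`. The sketch below restates them with plain booleans; `repair_actions` and its parameter names are hypothetical and only mirror the structure of the two `if` statements above.

```python
from typing import List


def repair_actions(unlocked: bool, has_lock: bool,
                   initialize_valid: bool, config_present: bool) -> List[str]:
    """List what the lock holder has to write back to the DCS."""
    actions = []
    # A leader exists, but there is no valid initialize key.
    if not (unlocked or initialize_valid) and has_lock:
        actions.append("write initialize")
    # A leader exists, but there is no dynamic configuration.
    if not (unlocked or config_present) and has_lock:
        actions.append("write config")
    return actions


print(repair_actions(unlocked=False, has_lock=True,
                     initialize_valid=False, config_present=False))
```

Only the node that holds the leader lock repairs the keys; replicas that see the same gap do nothing.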
Node State Checks

Check which stage the PostgreSQL startup has reached

if self._async_executor.busy:
    return self.handle_long_action_in_progress()

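Check whether the starting PostgreSQL has run into a problem

msg = self.handle_starting_instance()
if msg is not None:
    return msg
The node state check inspects the current running state of the PostgreSQL node to decide whether a specific action is needed. There are two ways of checking:
1.from the running state of PostgreSQL itself;
2.from the asynchronous executor (_async_executor), which tells which stage the node is currently in.

Basic operations after the node checks pass

# we've got here, so any async action has finished.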

if self.state_handler.bootstrapping:
    return self.post_bootstrap()

if self.recovering:
    self.recovering = False

if not self._rewind.is_needed:
    # Check if we tried to recover from postgres crash and failed
    msg = self.post_recover()
    if msg is not None:
        return msg

# Reset some states after postgres successfully started up
self._crash_recovery_executed = False
if self._rewind.executed and not self._rewind.failed:
    self._rewind.reset_state()

# The Raft cluster without a quorum takes a bit of time to stabilize.
# Therefore we want to postpone the leader race if we just started up.
if self.cluster.is_unlocked() and self.dcs.__class__.__name__ == 'Raft':
    return 'started as a secondary'

Once the node state checks pass, PostgreSQL needs the following handling:
1.Operations after PostgreSQL has been bootstrapped

def post_bootstrap(self):
    with self._async_response:
        result = self._async_response.result
    # bootstrap has failed if postgres is not running
    if not self.state_handler.is_running() or result is False:
        self.cancel_initialization()

    if result is None:
        if not self.state_handler.is_leader():
            return 'waiting for end of recovery after bootstrap'

        self.state_handler.set_role('master')
        ret = self._async_executor.try_run_async('post_bootstrap', self.state_handler.bootstrap.post_bootstrap,
                                                 args=(self.patroni.config['bootstrap'], self._async_response))
        return ret or 'running post_bootstrap'

    self.state_handler.bootstrapping = False
    if not self.watchdog.activate():
        logger.error('Cancelling bootstrap because watchdog activation failed')
        self.cancel_initialization()
    self._rewind.ensure_checkpoint_after_promote(self.wakeup)
    self.dcs.initialize(create_new=(self.cluster.initialize is None), sysid=self.state_handler.sysid)
    self.dcs.set_config_value(json.dumps(self.patroni.config.dynamic_configuration, separators=(',', ':')))
    self.dcs.take_leader()
    self.set_is_leader(True)
    self.state_handler.call_nowait(ACTION_ON_START)
    self.load_cluster_from_dcs()

    return 'initialized a new cluster'

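These operations include ensuring a checkpoint after promotion (which pg_rewind relies on later), initializing the xxx-config resource in the DCS, creating the xxx-leader resource, and loading the cluster information.
2.For a PostgreSQL in recovery, check whether pg_rewind needs to be run
if self.recovering:
    self.recovering = False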

if not self._rewind.is_needed:
    # Check if we tried to recover from postgres crash and failed
    msg = self.post_recover()
    if msg is not None:
        return msg

# Reset some states after postgres successfully started up
self._crash_recovery_executed = False
if self._rewind.executed and not self._rewind.failed:
    self._rewind.reset_state()

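pg_rewind brings a node's data directory back in line with the primary after their WAL histories have diverged, which typically happens when a node rejoins the cluster after a failure.
Starting PostgreSQL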

# is data directory empty?

if self.state_handler.data_directory_empty():
    self.state_handler.set_role('uninitialized')
    self.state_handler.stop('immediate', stop_timeout=self.patroni.config['retry_timeout'])
    # In case datadir went away while we were master.
    self.watchdog.disable()

    # is this instance the leader?
    if self.has_lock():
        self.release_leader_key_voluntarily()
        return 'released leader key voluntarily as data dir empty and currently leader'

    if self.is_paused():
        return 'running with empty data directory'
    return self.bootstrap()  # new node

else:
    # check if we are allowed to join
    data_sysid = self.state_handler.sysid
    if not self.sysid_valid(data_sysid):
        # data directory is not empty, but no valid sysid, cluster must be broken, suggest reinit
        return ("data dir for the cluster is not empty, "
                "but system ID is invalid; consider doing reinitialize")

    if self.sysid_valid(self.cluster.initialize):
        if self.cluster.initialize != data_sysid:
            if self.is_paused():
                logger.warning('system ID has changed while in paused mode. Patroni will exit when resuming'
                               ' unless system ID is reset: %s != %s', self.cluster.initialize, data_sysid)
                if self.has_lock():
                    self.release_leader_key_voluntarily()
                    return 'released leader key voluntarily due to the system ID mismatch'
            else:
                logger.fatal('system ID mismatch, node %s belongs to a different cluster: %s != %s',
                             self.state_handler.name, self.cluster.initialize, data_sysid)
                sys.exit(1)
    elif self.cluster.is_unlocked() and not self.is_paused():
        # "bootstrap", but data directory is not empty
        if not self.state_handler.cb_called and self.state_handler.is_running() \
                and not self.state_handler.is_leader():
            self._join_aborted = True
            logger.error('No initialize key in DCS and PostgreSQL is running as replica, aborting start')
            logger.error('Please first start Patroni on the node running as master')
            sys.exit(1)
        self.dcs.initialize(create_new=(self.cluster.initialize is None), sysid=data_sysid)

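Startup without a data directory
This path is taken when the data directory is empty, for example after a failed directory initialization, a failed node recovery, or a failed WAL realignment. The flow is:
1.set the role to uninitialized, so that the cluster can be initialized again later;
2.stop the current PostgreSQL process immediately;
3.if the current node holds the leader lock, release it voluntarily;
4.run the bootstrap.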

def bootstrap(self):
  if not self.cluster.is_unlocked():  # cluster already has leader
      clone_member = self.cluster.get_clone_member(self.state_handler.name)
      member_role = 'leader' if clone_member == self.cluster.leader else 'replica'
      msg = "from {0} '{1}'".format(member_role, clone_member.name)
      ret = self._async_executor.try_run_async('bootstrap {0}'.format(msg), self.clone, args=(clone_member, msg))
      return ret or 'trying to bootstrap {0}'.format(msg)
  
  # no initialize key and node is allowed to be master and has 'bootstrap' section in a configuration file
  elif self.cluster.initialize is None and not self.patroni.nofailover and 'bootstrap' in self.patroni.config:
      if self.dcs.initialize(create_new=True):  # race for initialization
          self.state_handler.bootstrapping = True
          with self._async_response:
              self._async_response.reset()
  
          if self.is_standby_cluster():
              ret = self._async_executor.try_run_async('bootstrap_standby_leader', self.bootstrap_standby_leader)
              return ret or 'trying to bootstrap a new standby leader'
          else:
              ret = self._async_executor.try_run_async('bootstrap', self.state_handler.bootstrap.bootstrap,
                                                       args=(self.patroni.config['bootstrap'],))
              return ret or 'trying to bootstrap a new cluster'
      else:
          return 'failed to acquire initialize lock'
  else:
      create_replica_methods = self.get_standby_cluster_config().get('create_replica_methods', []) \
                               if self.is_standby_cluster() else None
      if self.state_handler.can_create_replica_without_replication_connection(create_replica_methods):
          msg = 'bootstrap (without leader)'
          return self._async_executor.try_run_async(msg, self.clone) or 'trying to ' + msg
      return 'waiting for {0}leader to bootstrap'.format('standby_' if self.is_standby_cluster() else '')

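The code above covers several ways to start:
1.the cluster already has a leader: the current PostgreSQL starts as a replica and copies its data from an existing member;
2.the cluster has no leader and has not been initialized: the current PostgreSQL starts as the primary, or as the standby leader if this is a standby cluster;
3.otherwise, if a replica can be created without a replication connection (for a standby cluster, through its create_replica_methods), the node is cloned without a leader, typically by streaming data from the primary cluster; if not, the node waits for the leader to bootstrap.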
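
The branching in bootstrap() can be summarized as a small decision function. This is a sketch under simplifying assumptions: `choose_bootstrap` and its boolean parameters are hypothetical, and the race for the initialize key (which can still fail in the real code) is left out.

```python
def choose_bootstrap(has_leader: bool, initialize, nofailover: bool,
                     has_bootstrap_section: bool, standby_cluster: bool,
                     can_clone_without_leader: bool) -> str:
    """Return which bootstrap path a node with an empty data directory takes."""
    if has_leader:
        return "clone from an existing member"
    if initialize is None and not nofailover and has_bootstrap_section:
        if standby_cluster:
            return "bootstrap a standby leader"
        return "bootstrap a new cluster"
    if can_clone_without_leader:
        return "clone without a leader"
    return "wait for the leader to bootstrap"


print(choose_bootstrap(has_leader=False, initialize=None, nofailover=False,
                       has_bootstrap_section=True, standby_cluster=False,
                       can_clone_without_leader=False))
```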
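Startup with an existing data directory
This path mainly verifies that the cluster ID and the PostgreSQL node's sysid agree. The main steps are:
1.check that the node's sysid is valid; if it is not, the data directory is broken and the node should be reinitialized;
2.check that the cluster ID matches the node's sysid; on a mismatch the node cannot join the cluster, and if the cluster is paused the node releases the leader lock it holds;
3.if the cluster has no valid ID and no leader, the current node initializes the cluster again, registering its own sysid as the new cluster ID.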
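
The checks above can be condensed into one function that returns the outcome as a string. This is an illustration only: `check_join` is a hypothetical helper, and the validity rule inside `sysid_valid` (a string of at least ten digits) is an assumption of this sketch, not something the listings in this article show.

```python
def sysid_valid(sysid) -> bool:
    # Assumption of this sketch: a valid identifier is a string of at least ten digits.
    sysid = str(sysid)
    return len(sysid) >= 10 and sysid.isdigit()


def check_join(data_sysid, cluster_id, paused: bool, unlocked: bool) -> str:
    """Decide what a node with a non-empty data directory should do."""
    if not sysid_valid(data_sysid):
        return "invalid sysid: reinitialize"
    if sysid_valid(cluster_id):
        if cluster_id != data_sysid:
            return "mismatch: warn while paused" if paused else "mismatch: exit"
        return "match: continue"
    if unlocked and not paused:
        return "no cluster ID: register own sysid"
    return "continue"


print(check_join("7289263672843878470", "7289263672843878470",
                 paused=False, unlocked=False))
```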
Forming the PostgreSQL Cluster

try:
    if self.cluster.is_unlocked():
        ret = self.process_unhealthy_cluster()
    else:
        msg = self.process_healthy_cluster()
        ret = self.evaluate_scheduled_restart() or msg
finally:
    # we might not have a valid PostgreSQL connection here if another thread
    # stops PostgreSQL, therefore, we only reload replication slots if no
    # asynchronous processes are running (should be always the case for the master)
    if not self._async_executor.busy and not self.state_handler.is_starting():
        create_slots = self.state_handler.slots_handler.sync_replication_slots(self.cluster,
                                                                               self.patroni.nofailover)
        if not self.state_handler.cb_called:
            if not self.state_handler.is_leader():
                self._rewind.trigger_check_diverged_lsn()
            self.state_handler.call_nowait(ACTION_ON_START)
        if create_slots and self.cluster.leader:
            err = self._async_executor.try_run_async('copy_logical_slots',
                                                     self.state_handler.slots_handler.copy_logical_slots,
                                                     args=(self.cluster.leader, create_slots))
            if not err:
                ret = 'Copying logical slots {0} from the primary'.format(create_slots)

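Forming the PostgreSQL cluster depends mainly on whether the cluster currently has a leader: this decides whether the healthy-cluster flow or the unhealthy-cluster flow is taken.
Unhealthy-Cluster Flow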

def process_unhealthy_cluster(self):
    """Cluster has no leader key"""

    if self.is_healthiest_node():
        if self.acquire_lock():
            failover = self.cluster.failover
            if failover:
                if self.is_paused() and failover.leader and failover.candidate:
                    logger.info('Updating failover key after acquiring leader lock...')
                    self.dcs.manual_failover('', failover.candidate, failover.scheduled_at, failover.index)
                else:
                    logger.info('Cleaning up failover key after acquiring leader lock...')
                    self.dcs.manual_failover('', '')
            self.load_cluster_from_dcs()

            if self.is_standby_cluster():
                # standby leader disappeared, and this is the healthiest
                # replica, so it should become a new standby leader.
                # This implies we need to start following a remote master
                msg = 'promoted self to a standby leader by acquiring session lock'
                return self.enforce_follow_remote_master(msg)
            else:
                return self.enforce_master_role(
                    'acquired session lock as a leader',
                    'promoted self to leader by acquiring session lock'
                )
        else:
            return self.follow('demoted self after trying and failing to obtain lock',
                               'following new leader after trying and failing to obtain lock')

    else:
        # when we are doing manual failover there is no guaranty that new leader is ahead of any other node
        # node tagged as nofailover can be ahead of the new leader either, but it is always excluded from elections
        if bool(self.cluster.failover) or self.patroni.nofailover:
            self._rewind.trigger_check_diverged_lsn()
            time.sleep(2)  # Give a time to somebody to take the leader lock

        if self.patroni.nofailover:
            return self.follow('demoting self because I am not allowed to become master',
                               'following a different leader because I am not allowed to promote')
        return self.follow('demoting self because i am not the healthiest node',
                           'following a different leader because i am not the healthiest node')

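The unhealthy-cluster flow selects a candidate for leader. The first requirement is to find a healthy node, which is judged mainly by the following conditions:
1.the PostgreSQL cluster is not paused;
2.the PostgreSQL node is not still starting up;
3.the PostgreSQL node is allowed to fail over;
4.the lag between the node's WAL position and the value cached in the cluster information (the last LSN reported by the primary) is within the allowed range.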

def is_healthiest_node(self):
    if time.time() - self._released_leader_key_timestamp < self.dcs.ttl:
        logger.info('backoff: skip leader race after pre_promote script failure and releasing the lock voluntarily')
        return False

    if self.is_paused() and not self.patroni.nofailover and \
            self.cluster.failover and not self.cluster.failover.scheduled_at:
        ret = self.manual_failover_process_no_leader()
        if ret is not None:  # continue if we just deleted the stale failover key as a master
            return ret

    if self.state_handler.is_starting():  # postgresql still starting up is unhealthy
        return False

    if self.state_handler.is_leader():
        # in pause leader is the healthiest only when no initialize or sysid matches with initialize!
        return not self.is_paused() or not self.cluster.initialize\
                or self.state_handler.sysid == self.cluster.initialize

    if self.is_paused():
        return False

    if self.patroni.nofailover:  # nofailover tag makes node always unhealthy
        return False

    if self.cluster.failover:
        # When doing a switchover in synchronous mode only synchronous nodes and former leader are allowed to race
        if self.is_synchronous_mode() and self.cluster.failover.leader and \
                self.cluster.failover.candidate and not self.cluster.sync.matches(self.state_handler.name):
            return False
        return self.manual_failover_process_no_leader()

    if not self.watchdog.is_healthy:
        logger.warning('Watchdog device is not usable')
        return False

    # When in sync mode, only last known master and sync standby are allowed to promote automatically.
    all_known_members = self.cluster.members + self.old_cluster.members
    if self.is_synchronous_mode() and self.cluster.sync and self.cluster.sync.leader:
        if not self.cluster.sync.matches(self.state_handler.name):
            return False
        # pick between synchronous candidates so we minimize unnecessary failovers/demotions
        members = {m.name: m for m in all_known_members if self.cluster.sync.matches(m.name)}
    else:
        # run usual health check
        members = {m.name: m for m in all_known_members}

    return self._is_healthiest_node(members.values())

…

def _is_healthiest_node(self, members, check_replication_lag=True):
    """This method tries to determine whether I am healthy enough to became a new leader candidate or not."""

    my_wal_position = self.state_handler.last_operation()
    if check_replication_lag and self.is_lagging(my_wal_position):
        logger.info('My wal position exceeds maximum replication lag')
        return False  # Too far behind last reported wal position on master

    if not self.is_standby_cluster() and self.check_timeline():
        cluster_timeline = self.cluster.timeline
        my_timeline = self.state_handler.replica_cached_timeline(cluster_timeline)
        if my_timeline < cluster_timeline:
            logger.info('My timeline %s is behind last known cluster timeline %s', my_timeline, cluster_timeline)
            return False

    # Prepare list of nodes to run check against
    members = [m for m in members if m.name != self.state_handler.name and not m.nofailover and m.api_url]

    if members:
        for st in self.fetch_nodes_statuses(members):
            if st.failover_limitation() is None:
                if not st.in_recovery:
                    logger.warning('Master (%s) is still alive', st.member.name)
                    return False
                if my_wal_position < st.wal_position:
                    logger.info('Wal position of %s is ahead of my wal position', st.member.name)
                    # In synchronous mode the former leader might be still accessible and even be ahead of us.
                    # We should not disqualify himself from the leader race in such a situation.
                    if not self.is_synchronous_mode() or st.member.name != self.cluster.sync.leader:
                        return False
                    logger.info('Ignoring the former leader being ahead of us')
    return True
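
The core comparison in `_is_healthiest_node` can be reduced to a small standalone sketch. The `NodeStatus` class and the `is_healthiest` function below are hypothetical names introduced only for illustration; they are not part of Patroni. The sketch also leaves out the replication-lag check, the timeline check and `failover_limitation()`, so it shows only the "is a primary still alive" and "is anyone ahead of me" rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical snapshot of what a peer reports about itself."""
    name: str
    in_recovery: bool   # False means the peer is running as a primary
    wal_position: int   # last known WAL position of the peer


def is_healthiest(my_wal_position: int,
                  statuses: Iterable[NodeStatus],
                  sync_mode: bool = False,
                  former_leader: Optional[str] = None) -> bool:
    """Return True if no reachable peer disqualifies this node from the race."""
    for st in statuses:
        if not st.in_recovery:
            # A primary is still alive, so nobody should be promoted.
            return False
        if my_wal_position < st.wal_position:
            # In synchronous mode the former leader may be ahead of us;
            # that alone does not disqualify a synchronous standby.
            if not sync_mode or st.name != former_leader:
                return False
    return True


if __name__ == "__main__":
    peers = [NodeStatus("pg2", True, 100), NodeStatus("pg3", True, 90)]
    print(is_healthiest(100, peers))  # no peer is ahead of position 100
    print(is_healthiest(95, peers))   # pg2 is ahead of position 95
```

A node that ties with the most advanced peer still counts as healthy here, which matches the strict `<` comparison in the listing above; the leader lock then decides which of the tied nodes wins.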

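If the current node is healthy, it tries to acquire the leader lock, because the cluster currently has no leader. If it fails to acquire the lock, it joins the cluster as a replica.
If the current node is unhealthy, it keeps waiting until its PostgreSQL instance is back to normal and only then takes part in the leader election.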
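
The three outcomes described above (win the lock, lose the lock, stay out of the race) can be sketched with an in-memory stand-in for the DCS. `InMemoryDCS`, `create_if_absent` and `run_election` are hypothetical names used only for this illustration; a real DCS such as etcd provides the "create only if absent" step as a single atomic operation across the network, which this single-process sketch merely imitates.

```python
from typing import Dict, Optional


class InMemoryDCS:
    """Hypothetical stand-in for a DCS: keys can be created only if absent."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}

    def create_if_absent(self, key: str, value: str) -> bool:
        # A real DCS performs this check-and-set as one atomic operation.
        if key in self._keys:
            return False
        self._keys[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self._keys.get(key)


def run_election(dcs: InMemoryDCS, node_name: str, healthy: bool) -> str:
    """Decide the role of one node when the cluster has no leader."""
    if not healthy:
        return "waiting"   # stay out of the race until PostgreSQL recovers
    if dcs.create_if_absent("leader", node_name):
        return "leader"    # won the lock and may be promoted
    return "replica"       # lost the race, follow the winner


if __name__ == "__main__":
    dcs = InMemoryDCS()
    for name, healthy in [("pg3", False), ("pg1", True), ("pg2", True)]:
        print(name, run_election(dcs, name, healthy))
```

Only the first healthy node that reaches the DCS becomes leader; every later healthy node sees the existing key and falls back to the replica role.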
Healthy cluster flow

def process_healthy_cluster(self):
    if self.has_lock():
        if self.is_paused() and not self.state_handler.is_leader():
            if self.cluster.failover and self.cluster.failover.candidate == self.state_handler.name:
                return 'waiting to become master after promote...'

            if not self.is_standby_cluster():
                self._delete_leader()
                return 'removed leader lock because postgres is not running as master'

        if self.update_lock(True):
            msg = self.process_manual_failover_from_leader()
            if msg is not None:
                return msg

            # check if the node is ready to be used by pg_rewind
            self._rewind.ensure_checkpoint_after_promote(self.wakeup)

            if self.is_standby_cluster():
                # in case of standby cluster we don't really need to
                # enforce anything, since the leader is not a master.
                # So just remind the role.
                msg = 'no action. I am ({0}), the standby leader with the lock'.format(self.state_handler.name) \
                    if self.state_handler.role == 'standby_leader' else \
                    'promoted self to a standby leader because i had the session lock'
                return self.enforce_follow_remote_master(msg)
            else:
                return self.enforce_master_role(
                    'no action. I am ({0}), the leader with the lock'.format(self.state_handler.name),
                    'promoted self to leader because I had the session lock'
                )
        else:
            # Either there is no connection to DCS or someone else acquired the lock
            logger.error('failed to update leader lock')
            if self.state_handler.is_leader():
                if self.is_paused():
                    return 'continue to run as master after failing to update leader lock in DCS'
                self.demote('immediate-nolock')
                return 'demoted self because failed to update leader lock in DCS'
            else:
                return 'not promoting because failed to update leader lock in DCS'
    else:
        logger.debug('does not have lock')
    lock_owner = self.cluster.leader and self.cluster.leader.name
    if self.is_standby_cluster():
        return self.follow('cannot be a real primary in a standby cluster',
                           'no action. I am ({0}), a secondary, and following a standby leader ({1})'.format(
                               self.state_handler.name, lock_owner), refresh=False)
    return self.follow('demoting self because I do not have the lock and I was a leader',
                       'no action. I am ({0}), a secondary, and following a leader ({1})'.format(
                           self.state_handler.name, lock_owner), refresh=False)

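The healthy cluster flow applies when the cluster already has a leader node. It goes in one of two directions:
1. The current node holds the leader lock: it updates the lock to keep the leader heartbeat alive, so that replicas do not start competing for it. If the update fails, a node that is running as leader demotes itself immediately (`demote('immediate-nolock')`), which lets another replica take over the lock.
2. The current node does not hold the leader lock: it joins the cluster as a replica and follows the lock owner.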
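
The two directions can be sketched as one simplified HA cycle. `LeaderLock` and `healthy_cluster_step` are hypothetical names for this illustration only, the 30-second TTL is an arbitrary value chosen for the example, and a failed update is modelled as an expired lock, whereas in Patroni it can also mean that the DCS is unreachable.

```python
from dataclasses import dataclass


@dataclass
class LeaderLock:
    """Hypothetical leader key with a time-to-live, as a DCS would keep it."""
    owner: str
    expires_at: float

    def update(self, node: str, now: float, ttl: float) -> bool:
        # Renewal succeeds only for the current owner of an unexpired lock.
        if node != self.owner or now >= self.expires_at:
            return False
        self.expires_at = now + ttl
        return True


def healthy_cluster_step(lock: LeaderLock, node: str, running_as_primary: bool,
                         now: float, ttl: float = 30.0) -> str:
    """One simplified HA cycle for a cluster that already has a leader."""
    if node != lock.owner:
        return "follow " + lock.owner   # direction 2: act as a replica
    if lock.update(node, now, ttl):
        return "keep leader role"       # direction 1: heartbeat renewed
    # Renewal failed: a running primary must step down right away.
    return "demote" if running_as_primary else "do not promote"


if __name__ == "__main__":
    lock = LeaderLock(owner="pg1", expires_at=30.0)
    print(healthy_cluster_step(lock, "pg2", False, now=10.0))
    print(healthy_cluster_step(lock, "pg1", True, now=10.0))
    print(healthy_cluster_step(lock, "pg1", True, now=50.0))
```

The point of demoting on a failed renewal is to avoid two primaries: once the leader can no longer prove ownership of the lock, it has to assume that another node may acquire it.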

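Summary
Patroni is a high-availability (HA) management tool for PostgreSQL clusters. It is designed to keep the database continuously available through node failures and maintenance operations.
In many scenarios Patroni keeps a PostgreSQL cluster running smoothly. When the cluster becomes abnormal, features such as automatic failover, data synchronization and backup strategies preserve the stability and availability of the cluster, so applications can keep accessing their data during a node failure or maintenance window.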

References
Patroni configuration parameters: https://patroni.readthedocs.io/en/latest/patroni_configuration.html
Patroni source code, v2.1.5: https://github.com/zalando/patroni/tree/v2.1.5
