Depending on the object being protected and the usage scenario, PostgreSQL uses three kinds of locks: SpinLock, LWLock, and Lock.
SpinLock is the classic spin lock, a mechanism for protecting shared resources under concurrency (multiple processes/threads). It is the cheapest to implement and is usually based on a hardware test-and-set (TAS) operation. Its distinguishing feature is that a process requesting the lock keeps retrying until the current holder releases it; while waiting, the process does not switch into the kernel and sleep, but busy-waits (loops, spinning, until the lock becomes available again), so it keeps consuming CPU in user mode. The lock has only one mode: exclusive.
It is therefore suited to cases where the lock is held briefly, access to the shared resource is simple, and the critical section is short.
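As a rough illustration of the busy-wait idea (a minimal sketch in portable C11 atomics, not PostgreSQL code), a test-and-set spin lock looks like this:
#include <stdatomic.h>

typedef atomic_flag my_spinlock_t;      /* illustrative type, not PG's slock_t */
#define MY_SPINLOCK_INIT ATOMIC_FLAG_INIT

static inline void
my_spin_lock(my_spinlock_t *lock)
{
    /* test-and-set returns the previous value:
     * true  -> someone else holds the lock, keep spinning in user space
     * false -> we just acquired it */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;                               /* busy wait: no sleep, the CPU keeps spinning */
}

static inline void
my_spin_unlock(my_spinlock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}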
How SpinLock is used in PostgreSQL:
1. Using the CPU's test-and-set (TAS) instruction:
/*
* s_lock(lock) - platform-independent portion of waiting for a spinlock.
*/
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
SpinDelayStatus delayStatus;
// initialize the spin-delay bookkeeping for this lock
init_spin_delay(&delayStatus, file, line, func);
while (TAS_SPIN(lock)) // this is where TAS is called
{
// perform_spin_delay: issues a short CPU-level delay on each spin; once the
// spin count exceeds ~100 it sleeps for a random interval between 1ms and 1s
perform_spin_delay(&delayStatus);
}
// After the lock is acquired, adjust the spins-before-sleep threshold based on
// what happened: if we never had to sleep, raise it; if we did sleep, the lock
// is contended, so lower it to reduce CPU consumption.
finish_spin_delay(&delayStatus);
return delayStatus.delays;
}
The TAS implementation (x86_64):
#ifdef __x86_64__ /* AMD Opteron, Intel EM64T */
#define HAS_TEST_AND_SET
typedef unsigned char slock_t;
#define TAS(lock) tas(lock)
/*
* On Intel EM64T, it's a win to use a non-locking test before the xchg proper,
* but only when spinning.
*
* See also Implementing Scalable Atomic Locks for Multi-Core Intel(tm) EM64T
* and IA32, by Michael Chynoweth and Mary R. Lee. As of this writing, it is
* available at:
* http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures
*/
#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))
static __inline__ int
tas(volatile slock_t *lock)
{
register slock_t _res = 1;
__asm__ __volatile__(
" lock \n"
" xchgb %0,%1 \n"
: "+q"(_res), "+m"(*lock)
: /* no inputs */
: "memory", "cc");
return (int) _res;
}
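How this is wired into the callers: SpinLockAcquire() first tries the cheap TAS, and only falls back to the s_lock() spin/sleep loop above when that fails. Roughly (simplified from spin.h/s_lock.h; treat the exact macro text as a sketch, not verbatim source):
/* simplified sketch of how the pieces fit together */
#define SpinLockAcquire(lock)  S_LOCK(lock)

/* if the uncontended TAS fails, enter the spin/sleep loop in s_lock() */
#define S_LOCK(lock) \
    (TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)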
2. Semaphore-based implementation
If the platform the database runs on has no test-and-set instruction, SpinLock is implemented with PG semaphores instead. PostgreSQL reserves 128 semaphores for SpinLocks by default (NUM_SPINLOCK_SEMAPHORES = 128; the kernel's semaphore limits can be checked with cat /proc/sys/kernel/sem). The locking logic of the semaphore-based implementation is as follows:
int
tas_sema(volatile slock_t *lock)
{
int lockndx = *lock;
if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
elog(ERROR, "invalid spinlock number: %d", lockndx);
/* Note that TAS macros return 0 if *success* */
return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
}
/*
* PGSemaphoreTryLock
*
* Lock a semaphore only if able to do so without blocking
*/
bool
PGSemaphoreTryLock(PGSemaphore sema)
{
int errStatus;
/*
* Note: if errStatus is -1 and errno == EINTR then it means we returned
* from the operation prematurely because we were sent a signal. So we
* try and lock the semaphore again.
*/
do
{
errStatus = sem_trywait(PG_SEM_REF(sema));
} while (errStatus < 0 && errno == EINTR);
if (errStatus < 0)
{
if (errno == EAGAIN || errno == EDEADLK)
return false; /* failed to lock it */
/* Otherwise we got trouble */
elog(FATAL, "sem_trywait failed: %m");
}
return true;
}
Because a SpinLock must not be held for long, it is used in PostgreSQL mainly to control concurrent access to critical variables; the protected critical section is usually just a simple assignment or read.
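A typical usage pattern looks like the following minimal sketch (the structure and field names are hypothetical; SpinLockInit/SpinLockAcquire/SpinLockRelease are the real macros from storage/spin.h):
#include "postgres.h"
#include "storage/spin.h"

/* hypothetical shared-memory structure protected by a spinlock */
typedef struct MySharedCounter
{
    slock_t     mutex;          /* protects the field below; SpinLockInit() at shmem creation */
    uint64      value;
} MySharedCounter;

static void
bump_counter(MySharedCounter *cnt)
{
    SpinLockAcquire(&cnt->mutex);   /* busy-waits if another backend holds it */
    cnt->value++;                   /* keep the critical section trivially short */
    SpinLockRelease(&cnt->mutex);
}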
LWLock: Lightweight Lock, "lightweight" relative to the third kind of lock (Lock). It was originally built on top of SpinLock (the current implementation manipulates an atomic state word directly); besides the exclusive (mutual-exclusion) mode it also provides a shared mode and one special mode:
typedef enum LWLockMode
{
LW_EXCLUSIVE,
LW_SHARED,
LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwlockMode,
* when waiting for lock to become free. Not
* to be used as LWLockAcquire argument */
} LWLockMode;
LWLocks are mainly used to protect shared-memory data structures with mutually exclusive (or shared) access, for example the clog buffers (transaction commit status cache), shared buffers (data page cache), the WAL buffers, and so on.
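A typical call pattern is the following minimal sketch (the structure and function are hypothetical; LWLockAcquire/LWLockRelease are the real APIs):
#include "postgres.h"
#include "storage/lwlock.h"

/* hypothetical shared structure guarded by an LWLock */
typedef struct SharedEntry
{
    int         value;
} SharedEntry;

static void
update_shared_entry(LWLock *lock, SharedEntry *entry, int newval)
{
    /* readers take LW_SHARED; anyone modifying the entry takes LW_EXCLUSIVE */
    LWLockAcquire(lock, LW_EXCLUSIVE);
    entry->value = newval;
    LWLockRelease(lock);
}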
The LWLock data structure is defined as:
typedef struct LWLock
{
uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
proclist_head waiters; /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
#endif
} LWLock;
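For reference, the state word packs the shared-holder count, an exclusive bit and a few flags. Roughly (constants paraphrased from lwlock.c; shown only to illustrate the layout, not as authoritative values):
/* illustrative only -- see lwlock.c for the authoritative definitions */
#define LW_FLAG_HAS_WAITERS  ((uint32) 1 << 30)  /* someone is queued on the lock */
#define LW_FLAG_RELEASE_OK   ((uint32) 1 << 29)  /* waiters may be woken on release */
#define LW_FLAG_LOCKED       ((uint32) 1 << 28)  /* protects the wait list itself */
#define LW_VAL_EXCLUSIVE     ((uint32) 1 << 24)  /* added to state by an exclusive locker */
#define LW_VAL_SHARED        1                   /* added to state by each shared locker */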
Depending on where they are used, LWLocks in PostgreSQL are subdivided into a number of tranches:
/*
* Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
* we reserve additional tranche IDs for builtin tranches not included in
* the set of individual LWLocks. A call to LWLockNewTrancheId will never
* return a value less than LWTRANCHE_FIRST_USER_DEFINED.
*/
typedef enum BuiltinTrancheIds
{
LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
LWTRANCHE_COMMITTS_BUFFERS,
LWTRANCHE_SUBTRANS_BUFFERS,
LWTRANCHE_MXACTOFFSET_BUFFERS,
LWTRANCHE_MXACTMEMBER_BUFFERS,
LWTRANCHE_ASYNC_BUFFERS,
LWTRANCHE_OLDSERXID_BUFFERS,
LWTRANCHE_WAL_INSERT,
LWTRANCHE_BUFFER_CONTENT,
LWTRANCHE_BUFFER_IO_IN_PROGRESS,
LWTRANCHE_REPLICATION_ORIGIN,
LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
LWTRANCHE_PROC,
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_HASH_JOIN,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_SESSION_DSA,
LWTRANCHE_SESSION_RECORD_TABLE,
LWTRANCHE_SESSION_TYPMOD_TABLE,
LWTRANCHE_SHARED_TUPLESTORE,
LWTRANCHE_TBM,
LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
const char *const MainLWLockNames[] = {
"" ,
"ShmemIndexLock",
"OidGenLock",
"XidGenLock",
"ProcArrayLock",
"SInvalReadLock",
"SInvalWriteLock",
"WALBufMappingLock",
"WALWriteLock",
"ControlFileLock",
"CheckpointLock",
"CLogControlLock",
"SubtransControlLock",
"MultiXactGenLock",
"MultiXactOffsetControlLock",
"MultiXactMemberControlLock",
"RelCacheInitLock",
"CheckpointerCommLock",
"TwoPhaseStateLock",
"TablespaceCreateLock",
"BtreeVacuumLock",
"AddinShmemInitLock",
"AutovacuumLock",
"AutovacuumScheduleLock",
"SyncScanLock",
"RelationMappingLock",
"AsyncCtlLock",
"AsyncQueueLock",
"SerializableXactHashLock",
"SerializableFinishedListLock",
"SerializablePredicateLockListLock",
"OldSerXidLock",
"SyncRepLock",
"BackgroundWorkerLock",
"DynamicSharedMemoryControlLock",
"AutoFileLock",
"ReplicationSlotAllocationLock",
"ReplicationSlotControlLock",
"CommitTsControlLock",
"CommitTsLock",
"ReplicationOriginLock",
"MultiXactTruncationLock",
"OldSnapshotTimeMapLock",
"LogicalRepWorkerLock",
"CLogTruncationLock"
};
LWLock initialization:
When PostgreSQL initializes shared memory and semaphores, it creates the LWLock array (CreateLWLocks). In detail:
/*
* Initialize LWLocks that are fixed and those belonging to named tranches.
*/
static void
InitializeLWLocks(void)
{
int numNamedLocks = NumLWLocksByNamedTranches();
int id;
int i;
int j;
LWLockPadded *lock;
/* Initialize all individual LWLocks in main array */
/* initialize the individual LWLocks (the ones named in MainLWLockNames[]) */
for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
LWLockInitialize(&lock->lock, id);
/* Initialize buffer mapping LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
for (id = 0; id < NUM_BUFFER_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_BUFFER_MAPPING);
/* Initialize lmgrs' LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS + NUM_BUFFER_PARTITIONS;
for (id = 0; id < NUM_LOCK_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_LOCK_MANAGER);
/* Initialize predicate lmgrs' LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS;
for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);
/* Initialize named tranches. */
if (NamedLWLockTrancheRequests > 0)
{
char *trancheNames;
NamedLWLockTrancheArray = (NamedLWLockTranche *)
&MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];
trancheNames = (char *) NamedLWLockTrancheArray +
(NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];
for (i = 0; i < NamedLWLockTrancheRequests; i++)
{
NamedLWLockTrancheRequest *request;
NamedLWLockTranche *tranche;
char *name;
request = &NamedLWLockTrancheRequestArray[i];
tranche = &NamedLWLockTrancheArray[i];
name = trancheNames;
trancheNames += strlen(request->tranche_name) + 1;
strcpy(name, request->tranche_name);
tranche->trancheId = LWLockNewTrancheId();
tranche->trancheName = name;
for (j = 0; j < request->num_lwlocks; j++, lock++)
LWLockInitialize(&lock->lock, tranche->trancheId);
}
}
}
/*
* Register named tranches and tranches for fixed LWLocks.
*/
static void
RegisterLWLockTranches(void)
{
int i;
if (LWLockTrancheArray == NULL)
{
LWLockTranchesAllocated = 128;
LWLockTrancheArray = (const char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
}
for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; ++i)
/* register the names from the MainLWLockNames[] array */
LWLockRegisterTranche(i, MainLWLockNames[i]);
LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, "buffer_mapping");
LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, "lock_manager");
LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER,
"predicate_lock_manager");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_SESSION_DSA,
"session_dsa");
LWLockRegisterTranche(LWTRANCHE_SESSION_RECORD_TABLE,
"session_record_table");
LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
"session_typmod_table");
LWLockRegisterTranche(LWTRANCHE_SHARED_TUPLESTORE,
"shared_tuplestore");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
NamedLWLockTrancheArray[i].trancheName);
}
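Extensions can obtain their own named tranche as well. A minimal sketch of the usual pattern (the extension name "my_ext" and the function names are hypothetical; RequestNamedLWLockTranche and GetNamedLWLockTranche are the real APIs, normally called from an extension's shared-memory hooks):
#include "postgres.h"
#include "storage/lwlock.h"

static LWLock *my_ext_lock = NULL;

/* hypothetical: called while shared-memory requests are still allowed (_PG_init / shmem request hook) */
static void
my_ext_request(void)
{
    RequestNamedLWLockTranche("my_ext", 1);     /* reserve one LWLock in a named tranche */
}

/* hypothetical: called from the shmem_startup hook */
static void
my_ext_startup(void)
{
    my_ext_lock = &(GetNamedLWLockTranche("my_ext")[0].lock);
}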
Using LWLocks:
1. Acquiring the lock
Call LWLockAcquire(LWLock *lock, LWLockMode mode) to take the lock; mode can be LW_SHARED (shared) or LW_EXCLUSIVE (exclusive).
The acquire path first tries to grab the lock directly by checking the LWLock's state word and, if the lock can be granted, updating it with an atomic compare-and-swap. If that first attempt fails, the process adds itself to the lock's wait queue and tries once more; if this retry succeeds it removes itself from the queue, otherwise it sleeps until woken.
LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) can also be used; unlike LWLockAcquire, it returns immediately instead of sleeping when the lock cannot be acquired.
LWLockAcquireOrWait waits if the lock cannot be taken, but once the lock becomes free it returns without acquiring it. It is currently used for WALWriteLock: when a backend needs to flush WAL it takes WALWriteLock and, while holding it, also flushes the WAL produced by other backends, so the backends that were waiting for the lock in order to flush usually no longer need to flush anything themselves.
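The conditional variant is handy when falling back to other work beats sleeping. A minimal sketch (the helper and the shared counter are hypothetical):
#include "postgres.h"
#include "storage/lwlock.h"

/* hypothetical helper: update shared state only if the lock is free right now */
static bool
try_update(LWLock *lock, volatile int *shared_counter)
{
    if (!LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
        return false;               /* lock busy: caller can retry later or do other work */

    (*shared_counter)++;
    LWLockRelease(lock);
    return true;
}

Below is the acquire path itself: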
/*
* LWLockAcquire - acquire a lightweight lock in the specified mode
*
* If the lock is not available, sleep until it is. Returns true if the lock
* was available immediately, false if we had to sleep.
*
* Side effect: cancel/die interrupts are held off until lock release.
*/
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
PGPROC *proc = MyProc;
bool result = true;
int extraWaits = 0;
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
lwstats = get_lwlock_stats_entry(lock);
#endif
AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE);
PRINT_LWDEBUG("LWLockAcquire", lock, mode);
#ifdef LWLOCK_STATS
/* Count lock acquisition attempts */
if (mode == LW_EXCLUSIVE)
lwstats->ex_acquire_count++;
else
lwstats->sh_acquire_count++;
#endif /* LWLOCK_STATS */
/*
* We can't wait if we haven't got a PGPROC. This should only occur
* during bootstrap or shared memory initialization. Put an Assert here
* to catch unsafe coding practices.
*/
Assert(!(proc == NULL && IsUnderPostmaster));
/* Ensure we will have room to remember the lock */
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");
/*
* Lock out cancel/die interrupts until we exit the code section protected
* by the LWLock. This ensures that interrupts will not interfere with
* manipulations of data structures in shared memory.
*/
HOLD_INTERRUPTS();
/*
* Loop here to try to acquire lock after each time we are signaled by
* LWLockRelease.
*
* NOTE: it might seem better to have LWLockRelease actually grant us the
* lock, rather than retrying and possibly having to go back to sleep. But
* in practice that is no good because it means a process swap for every
* lock acquisition when two or more processes are contending for the same
* lock. Since LWLocks are normally used to protect not-very-long
* sections of computation, a process needs to be able to acquire and
* release the same lock many times during a single CPU time slice, even
* in the presence of contention. The efficiency of being able to do that
* outweighs the inefficiency of sometimes wasting a process dispatch
* cycle because the lock is not free when a released waiter finally gets
* to run. See pgsql-hackers archives for 29-Dec-01.
*/
/* main loop */
for (;;)
{
bool mustwait;
/*
* Try to grab the lock the first time, we're not in the waitqueue
* yet/anymore.
*/
/* First attempt to take the lock; on success LWLockAttemptLock returns false and we break out of the loop. */
mustwait = LWLockAttemptLock(lock, mode);
/* mustwait == false means we got the lock; leave the loop */
if (!mustwait)
{
LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
break; /* got the lock */
}
/*
* Ok, at this point we couldn't grab the lock on the first try. We
* cannot simply queue ourselves to the end of the list and wait to be
* woken up because by now the lock could long have been released.
* Instead add us to the queue and try to grab the lock again. If we
* succeed we need to revert the queuing and be happy, otherwise we
* recheck the lock. If we still couldn't grab it, we know that the
* other locker will see our queue entries when releasing since they
* existed before we checked for the lock.
*/
/* The first attempt failed, so add ourselves to the lock's wait queue */
/* add to the queue */
LWLockQueueSelf(lock, mode);
/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(lock, mode);
/* Second attempt succeeded; undo the queueing we just did */
/* ok, grabbed the lock the second time round, need to undo queueing */
if (!mustwait)
{
LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");
LWLockDequeueSelf(lock);
break;
}
/*
* Wait until awakened.
*
* Since we share the process wait semaphore with the regular lock
* manager and ProcWaitForSignal, and we may need to acquire an LWLock
* while one of those is pending, it is possible that we get awakened
* for a reason other than being signaled by LWLockRelease. If so,
* loop back and wait again. Once we've gotten the LWLock,
* re-increment the sema by the number of additional signals received,
* so that the lock manager or signal manager will see the received
* signal when it next waits.
*/
/* From here on is the actual waiting, implemented with PGSemaphoreLock */
LOG_LWDEBUG("LWLockAcquire", lock, "waiting");
#ifdef LWLOCK_STATS
lwstats->block_count++;
#endif
/* Report the wait event: type LWLock, with the specific event taken from lock->tranche (the tranche IDs were shown earlier in this article) */
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
/* wait for the lock */
for (;;)
{ /* block on the per-process semaphore */
PGSemaphoreLock(proc->sem);
if (!proc->lwWaiting)
break;
extraWaits++;
}
/* Retrying, allow LWLockRelease to release waiters again. */
pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
#ifdef LOCK_DEBUG
{
/* not waiting anymore */
uint32 nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);
Assert(nwaiters < MAX_BACKENDS);
}
#endif
TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
LWLockReportWaitEnd();
/* waiting is over */
LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
/* Now loop back and try to acquire lock again. */
result = false;
}
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
/*
* Fix the process wait semaphore's count for any absorbed wakeups.
*/
/* semaphore unlock */
while (extraWaits-- > 0)
PGSemaphoreUnlock(proc->sem);
return result;
}
The waiting itself is done by PGSemaphoreLock: when the lock cannot be acquired, the process waits on its own semaphore, proc->sem (it sleeps and does not consume CPU).
/*
* PGSemaphoreLock
*
* Lock a semaphore (decrement count), blocking if count would be < 0
*/
void
PGSemaphoreLock(PGSemaphore sema)
{
int errStatus;
/* See notes in sysv_sema.c's implementation of PGSemaphoreLock. */
do
{ /* sem_wait: if the semaphore's value is greater than 0, decrement it and */
/* return immediately; if the value is 0, the process blocks. */
/* This is the classic P operation; it returns 0 on success, -1 on failure. */
/* sema points to a semaphore initialized earlier with sem_init. */
errStatus = sem_wait(PG_SEM_REF(sema));
} while (errStatus < 0 && errno == EINTR);
if (errStatus < 0)
elog(FATAL, "sem_wait failed: %m");
}
Lock is PostgreSQL's heavyweight (regular) lock, used mainly for operations on database objects. The lock modes are:
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock 0
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE INDEX
* CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM FULL,
* and unqualified LOCK TABLE */
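Inside the server these modes are requested through the lmgr.c wrappers. A minimal sketch (the surrounding function is hypothetical; LockRelationOid/UnlockRelationOid are real helpers from storage/lmgr.h):
#include "postgres.h"
#include "storage/lmgr.h"

/* hypothetical: read a relation, taking the same lock a plain SELECT would */
static void
scan_my_relation(Oid relid)
{
    LockRelationOid(relid, AccessShareLock);    /* mode 1: what plain SELECT takes */

    /* ... open and scan the relation ... */

    UnlockRelationOid(relid, AccessShareLock);  /* normally released only at end of transaction */
}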
Lock is used in many places; let's use the call stack of an UPDATE statement to walk through how a lock is acquired and waited for.
[postgres@postgres_zabbix ~]$ pstack 21318
#0 0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000853571 in WaitEventSetWaitBlock (set=0x2164778, cur_timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1) at latch.c:1080
#2 0x000000000085344c in WaitEventSetWait (set=0x2164778, timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1, wait_event_info=50331652) at latch.c:1032
#3 0x0000000000852d38 in WaitLatchOrSocket (latch=0x7f17001cf5c4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
#4 0x0000000000852c03 in WaitLatch (latch=0x7f17001cf5c4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:347
#5 0x0000000000867ccb in ProcSleep (locallock=0x2072938, lockMethodTable=0xb8f5a0 <default_lockmethod>) at proc.c:1289
#6 0x0000000000861f98 in WaitOnLock (locallock=0x2072938, owner=0x2081d40) at lock.c:1768
#7 0x00000000008610be in LockAcquireExtended (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#8 0x0000000000860713 in LockAcquire (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713
#9 0x000000000085f592 in XactLockTableWait (xid=501, rel=0x7f1707c8fb10, ctid=0x7ffcac6f1b44, oper=XLTW_Update) at lmgr.c:658
#10 0x00000000004c99c9 in heap_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, newtup=0x2164708, cid=1, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c) at heapam.c:3228
#11 0x00000000004d411c in heapam_tuple_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at heapam_handler.c:332
#12 0x00000000006db007 in table_tuple_update (rel=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at ../../../src/include/access/tableam.h:1275
#13 0x00000000006dce83 in ExecUpdate (mtstate=0x2160b40, tupleid=0x7ffcac6f1e60, oldtuple=0x0, slot=0x2162108, planSlot=0x21613a0, epqstate=0x2160c38, estate=0x21607c0, canSetTag=true) at nodeModifyTable.c:1311
#14 0x00000000006de36c in ExecModifyTable (pstate=0x2160b40) at nodeModifyTable.c:2222
#15 0x00000000006b2b07 in ExecProcNodeFirst (node=0x2160b40) at execProcnode.c:445
#16 0x00000000006a8ce7 in ExecProcNode (node=0x2160b40) at ../../../src/include/executor/executor.h:239
#17 0x00000000006ab063 in ExecutePlan (estate=0x21607c0, planstate=0x2160b40, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2146860, execute_once=true) at execMain.c:1646
#18 0x00000000006a91c4 in standard_ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#19 0x00000000006a9069 in ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
#20 0x000000000088017a in ProcessQuery (plan=0x2146780, sourceText=0x204b040 "update test_tbl set id=4 where id=3;", params=0x0, queryEnv=0x0, dest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:161
#21 0x00000000008818c1 in PortalRunMulti (portal=0x20b6570, isTopLevel=true, setHoldSnapshot=false, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:1283
#22 0x0000000000880efb in PortalRun (portal=0x20b6570, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:796
#23 0x000000000087b28f in exec_simple_query (query_string=0x204b040 "update test_tbl set id=4 where id=3;") at postgres.c:1215
#24 0x000000000087f30f in PostgresMain (argc=1, argv=0x207a6e0, dbname=0x207a578 "postgres", username=0x207a558 "postgres") at postgres.c:4247
#25 0x00000000007e6a9e in BackendRun (port=0x20702a0) at postmaster.c:4437
#26 0x00000000007e629d in BackendStartup (port=0x20702a0) at postmaster.c:4128
#27 0x00000000007e293d in ServerLoop () at postmaster.c:1704
#28 0x00000000007e21fd in PostmasterMain (argc=1, argv=0x2045c00) at postmaster.c:1377
#29 0x000000000070f76d in main (argc=1, argv=0x2045c00) at main.c:228
[postgres@postgres_zabbix ~]$
This is a blocked UPDATE, currently waiting for a lock (waiting for the transaction that already holds the lock to commit).
postgres=# select pid,wait_event_type,wait_event,query from pg_stat_activity where pid=21318;
-[ RECORD 1 ]---+-------------------------------------
pid | 21318
wait_event_type | Lock
wait_event | transactionid
query | update test_tbl set id=4 where id=3;
Acquiring the lock:
LockAcquire(locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) is called; the requested lock mode is 5, i.e. ShareLock.
The body of LockAcquire is fairly long, so only an outline of its logic is given here (a small usage sketch follows the list):
1) Look up the hash table using the information about the object to be locked that is stored in locktag. Because the same lock may be held many times, these locks are cached in a hash table to speed up access.
For example:
when locking a table, the database OID, the relation OID and similar information are stored in locktag;
when locking a tuple, the database OID, the relation OID, the block number and the offset within the block are set in locktag. The SET_LOCKTAG_XXX macros fill in the corresponding LOCKTAG.
So the first step is to search the LOCALLOCK hash table and act on the result:
locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
(void *) &localtag,
HASH_ENTER, &found);
2) Check whether we already hold the requested lock on this object;
3) where the conditions require it, write WAL for the lock acquisition;
4) perform lock-conflict detection;
5) if there is no conflict, grant the lock and record how the resource is using it; if a conflict is found, wait for the lock with WaitOnLock (implemented on top of epoll underneath);
6) report the result of the lock request.
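The way callers typically build the tag and request the lock can be sketched as follows (a minimal illustration; the OIDs are made up, but SET_LOCKTAG_RELATION and LockAcquire are the real interfaces from storage/lock.h):
#include "postgres.h"
#include "storage/lock.h"

/* hypothetical: lock relation 16384 in database 13580 in RowExclusiveLock mode,
 * roughly what an INSERT/UPDATE/DELETE on that table would request */
static void
lock_my_table(void)
{
    LOCKTAG     tag;

    SET_LOCKTAG_RELATION(tag, 13580 /* database OID */, 16384 /* relation OID */);

    /* sessionLock=false: the lock belongs to the current transaction;
     * dontWait=false: sleep (via WaitOnLock) if the request conflicts */
    (void) LockAcquire(&tag, RowExclusiveLock, false, false);
}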
Waiting for the lock:
WaitOnLock(locallock=0x2072938, owner=0x2081d40) is called.
The wait is ultimately implemented with epoll (through the latch/WaitEventSet machinery), which is why the top of the stack sits in epoll_wait:
#0 0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
Releasing the lock:
LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock) is called.
We will not analyze it in detail here; the locks are released when the transaction commits or rolls back.
SpinLock
Key traits: the lightest lock; exclusive mode only; no wait queue; waiting is a busy wait that keeps the CPU spinning.
Use cases: the lock is held briefly, access to the shared resource is simple, and the critical section is short, typically just simple assignments or reads.
LWLock
Key traits: lightweight lock; a shared mode in addition to the exclusive mode; has a wait queue; waiting is implemented with sem_wait.
Use cases: longer critical sections with more complex logic and more involved operations on the shared resource, e.g. the clog buffers (transaction commit status cache), shared buffers (data page cache), and the WAL buffers.
Lock
Key traits: heavyweight lock; can be held for a long time; waiting is implemented with epoll.
Use cases: operations on database objects of all kinds, e.g. INSERT/DELETE/UPDATE/SELECT on tables.
References:
https://zhuanlan.zhihu.com/p/73517810
http://www.postgres.cn/news/viewone/1/241