Design and Implementation of a Distributed Lock

Preface

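Recently, while using etcd as the storage system for a distributed service, I needed locking to keep different processes from racing when they read and write the same file. Checking the documentation, I found that the official lock module API had been deprecated as far back as version 0.4, and for reasons I don't know it was never added back in later versions. After some searching I found python-etcd on GitHub, a client that wraps the etcd API and additionally implements a lock module. I read through it briefly. Its distributed lock design is modeled on ZooKeeper, and I found it quite instructive. I made a few small modifications on top of it, and I'm writing up the idea behind the implementation here as study notes.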

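How acquiring the lock works:

  1. Create a dedicated lock directory. The source code creates a hidden directory, /_locks, that holds every lock and serves as the root directory of all locks.
  2. Different locks are distinguished by the key passed in, which makes it possible to lock different objects separately. For example, two lock calls with different keys, lock(key1) and lock(key2), create two directories, /_locks/key1 and /_locks/key2. Each lock object also generates a unique uuid as its own identifier and writes it into the lock directory in sequence order (note that in the source's acquire function the write is done with append=True).
  3. A process that wants the lock reads the lock directory, lists all of its child nodes, and finds both the smallest node and the node closest to its own (the smallest difference in sequence number; in the source code this is the node immediately before its own).
  4. If the smallest node found in step 3 is the process's own node, its sequence number is the smallest, so it acquires the lock. Otherwise it does not acquire the lock.
  5. If the lock was not acquired in step 4, the process watches the closest node found in step 3 for changes.
  6. When the watched node changes, go back to step 3 and query again.
  7. To release the lock, a process only has to delete the node it created.
  8. Remember to set a time to live on the lock (that is, the lifetime of the lock file).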
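
The steps above can be tried out without an etcd server. The following is a minimal sketch, not the python-etcd code: `InMemoryLockDir` and `owner_and_predecessor` are hypothetical names, the dictionary stands in for one lock directory, and the nine-digit key format is simply borrowed from the example in this article.

```python
import itertools


class InMemoryLockDir:
    """Stand-in for one etcd lock directory such as /_locks/file1 (no network)."""

    def __init__(self, path):
        self.path = path
        self.entries = {}                  # key -> owner id
        self._counter = itertools.count(1)

    def append(self, owner_id):
        # Mimics an append write: the store, not the caller, picks the next key.
        key = "{}/{:09d}".format(self.path, next(self._counter))
        self.entries[key] = owner_id
        return key

    def delete(self, key):
        # Step 7: releasing a lock is deleting our own node.
        self.entries.pop(key, None)


def owner_and_predecessor(lock_dir, my_key):
    """Steps 3 and 4: return (lock holder, key to watch); the latter is None if we hold it."""
    keys = sorted(lock_dir.entries)
    if my_key not in keys:
        raise LookupError("our entry is gone, most likely its TTL expired")
    i = keys.index(my_key)
    return keys[0], (keys[i - 1] if i > 0 else None)


lock_dir = InMemoryLockDir("/_locks/file1")
a = lock_dir.append("proc-a")
b = lock_dir.append("proc-b")
c = lock_dir.append("proc-c")

# proc-c is last in line: proc-a holds the lock and proc-c watches proc-b's node.
print(owner_and_predecessor(lock_dir, c) == (a, b))

lock_dir.delete(a)   # proc-a releases
lock_dir.delete(b)   # proc-b releases
# Step 6: after the watched node changed, proc-c checks again and now owns the lock.
print(owner_and_predecessor(lock_dir, c) == (c, None))
```

Watching only the node just ahead, rather than the whole directory, means a release wakes up a single waiter instead of all of them.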

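An example: etcd is a key/value store. Suppose it holds the following key/value pair:
key: /_locks/file1/000000002 value: 123456
This means that file1 is currently locked, that the id (uuid) of the lock is 123456, and that 000000002 is the increasing sequence number generated automatically by etcd's POST method. When a new lock request arrives, say from a lock object whose uuid is 123455, a new record is inserted:
key: /_locks/file1/000000003 value: 123455
The new requester then lists all child nodes under /_locks/file1/ and finds the smallest child and the child closest to its own. In this example both are 000000002. Since the smallest child is not its own sequence number, it watches the closest child for changes until the smallest child equals its own sequence number, at which point it holds the lock. To release the lock it simply deletes the node for its own sequence number. The sequence number has to be generated by etcd at insert time. A lock object that does not know its sequence number must, when releasing, recursively read every node under the lock directory, collect the uuids, and use its own uuid to find the key to delete.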
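
The lookup by uuid described at the end of the example can be sketched as follows. This is only an illustration under simple assumptions: a plain dictionary stands in for the contents of the lock directory, and `find_key_by_uuid` is a hypothetical helper name, not a python-etcd function.

```python
def find_key_by_uuid(entries, lock_uuid):
    """Return the key whose value equals lock_uuid, or None when nothing matches."""
    for key, value in entries.items():
        if value == lock_uuid:
            return key
    return None


# The two records from the example, held in a dict instead of etcd.
entries = {
    "/_locks/file1/000000002": "123456",
    "/_locks/file1/000000003": "123455",
}

key = find_key_by_uuid(entries, "123455")
if key is not None:
    del entries[key]   # releasing the lock is deleting exactly this key
```

The scan costs one read of the whole directory, which is why a lock object that remembers its sequence number can release more cheaply.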
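  The key to the implementation is that the keys written by processes requesting the lock are generated in the order the requests arrive. The etcd API provides a POST method that creates, under a given directory, a key that is larger than every key created there before, which guarantees that the lock is granted in the same order in which it was requested. etcd also provides a watch method for monitoring changes to a given key. These two features are what make this distributed lock design work.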
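
For readers who want to see what those two calls look like on the wire, here is a small sketch that only builds the HTTP requests and does not send them. It assumes the etcd v2 keys API (POST on a directory for in-order keys, `wait=true` for a watch) and a server at 127.0.0.1:2379. The function names are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

ETCD_URL = "http://127.0.0.1:2379"   # assumed address of a local etcd v2 server


def enqueue_request(lock_dir, owner_id, ttl=None):
    """POST to a directory: etcd picks an increasing key name for the new entry."""
    fields = {"value": owner_id}
    if ttl is not None:
        fields["ttl"] = ttl
    body = urlencode(fields).encode("ascii")
    return Request(ETCD_URL + "/v2/keys" + lock_dir, data=body, method="POST")


def watch_request(key):
    """GET with wait=true: the call returns once the key changes."""
    return Request(ETCD_URL + "/v2/keys" + key + "?wait=true", method="GET")


enqueue = enqueue_request("/_locks/file1", "123455", ttl=60)
watch = watch_request("/_locks/file1/000000002")
print(enqueue.get_method(), enqueue.full_url, enqueue.data)
print(watch.get_method(), watch.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running server. The python-etcd client wraps the same two operations as `client.write(..., append=True)` and `client.watch(...)`, as seen in the code further down.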
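  One more difference between a distributed lock and an operating system lock deserves attention: the distributed lock is written to a file (a key in etcd), so it is not released automatically when the program crashes. Be sure to set a sensible lifetime (TTL) on the lock file, so that locks left behind by a failed program do not linger in the system and cause trouble.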
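
The effect of the TTL can be illustrated with a toy store and an explicit clock. This is a sketch under stated assumptions, not etcd's expiry logic: `TtlStore` is a hypothetical class, and times are plain numbers passed in by the caller so the example stays deterministic.

```python
class TtlStore:
    """Toy key store whose entries vanish once their TTL has elapsed."""

    def __init__(self):
        self._expires_at = {}   # key -> absolute expiry time in seconds

    def put(self, key, ttl, now):
        # Writing an existing key again refreshes its TTL (a lock renewal).
        self._expires_at[key] = now + ttl

    def live_keys(self, now):
        return sorted(k for k, t in self._expires_at.items() if t > now)


store = TtlStore()
store.put("/_locks/file1/000000002", ttl=30, now=0)   # holder, which then crashes
store.put("/_locks/file1/000000003", ttl=30, now=0)   # waiter

# The waiter is still alive and renews at t=20; the crashed holder cannot.
store.put("/_locks/file1/000000003", ttl=30, now=20)

holder_at_10 = store.live_keys(now=10)[0]   # the crashed holder's entry still blocks
holder_at_40 = store.live_keys(now=40)[0]   # it has expired, the waiter is first in line
```

A long TTL keeps a dead holder's lock around for longer, while a short TTL forces live holders to renew more often, so the value is a trade-off that depends on the workload.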
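  In a real project, identifying a lock by its uuid turned out not to be very convenient, because it forces locking and unlocking to be done by the same object. In addition, the values produced by uuid4 are random and can in principle collide, however unlikely that is, so they are not completely reliable. In our application I replaced the uuid with the node name and reworked the code accordingly: seen from outside, locking and unlocking operate with each node of the distributed system as the basic unit, and inside each node the requests are represented as a queue. This addressed the shortcomings of the original implementation fairly well.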
The relevant code from the original implementation is as follows:

import logging
import etcd
import uuid

_log = logging.getLogger(__name__)

class Lock(object):
    """
    Locking recipe for etcd, inspired by the kazoo recipe for zookeeper
    """

    def __init__(self, client, lock_name):
        self.client = client
        self.name = lock_name
        # props to Netflix Curator for this trick. It is possible for our
        # create request to succeed on the server, but for a failure to
        # prevent us from getting back the full path name. We prefix our
        # lock name with a uuid and can check for its presence on retry.
        self._uuid = uuid.uuid4().hex
        self.path = "/_locks/{}".format(lock_name)
        self.is_taken = False
        self._sequence = None
        _log.debug("Initiating lock for %s with uuid %s", self.path, self._uuid)

    @property
    def uuid(self):
        """
        The unique id of the lock
        """
        return self._uuid

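    @uuid.setter
    def uuid(self, value):
        # The setter must share the property's name ("uuid"); with a
        # different name, assigning to lock.uuid would fail.
        old_uuid = self._uuid
        self._uuid = value
        if not self._find_lock():
            _log.warning("The hand-set uuid was not found, refusing")
            self._uuid = old_uuid
            raise ValueError("Inexistent UUID")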

    @property
    def is_acquired(self):
        """
        tells us if the lock is acquired
        """
        if not self.is_taken:
            _log.debug("Lock not taken")
            return False
        try:
            self.client.read(self.lock_key)
            return True
        except etcd.EtcdKeyNotFound:
            _log.warning("Lock was supposedly taken, but we cannot find it")
            self.is_taken = False
            return False

    def acquire(self, blocking=True, lock_ttl=3600, timeout=0):
        """
        Acquire the lock.
        :param blocking Block until the lock is obtained, or timeout is reached
        :param lock_ttl The duration of the lock we acquired, set to None for eternal locks
        :param timeout The time to wait before giving up on getting a lock
        """
        # First of all try to write, if our lock is not present.
        if not self._find_lock():
            _log.debug("Lock not found, writing it to %s", self.path)
            res = self.client.write(self.path, self.uuid, ttl=lock_ttl, append=True)
            self._set_sequence(res.key)
            _log.debug("Lock key %s written, sequence is %s", res.key, self._sequence)
        elif lock_ttl:
            # Renew our lock if already here!
            self.client.write(self.lock_key, self.uuid, ttl=lock_ttl)

        # now get the owner of the lock, and the next lowest sequence
        return self._acquired(blocking=blocking, timeout=timeout)

    def release(self):
        """
        Release the lock
        """
        if not self._sequence:
            self._find_lock()
        try:
            _log.debug("Releasing existing lock %s", self.lock_key)
            self.client.delete(self.lock_key)
        except etcd.EtcdKeyNotFound:
            _log.info("Lock %s not found, nothing to release", self.lock_key)
            pass
        finally:
            self.is_taken = False

    def __enter__(self):
        """
        You can use the lock as a contextmanager
        """
        self.acquire(blocking=True, lock_ttl=0)
        # Return the lock so that "with Lock(...) as lock" binds the lock object.
        return self

    def __exit__(self, type, value, traceback):
        self.release()

    def _acquired(self, blocking=True, timeout=0):
        locker, nearest = self._get_locker()
        self.is_taken = False
        if self.lock_key == locker:
            _log.debug("Lock acquired!")
            # We own the lock, yay!
            self.is_taken = True
            return True
        else:
            self.is_taken = False
            if not blocking:
                return False
            # Let's look for the lock
            watch_key = nearest
            _log.debug("Lock not acquired, now watching %s", watch_key)
            t = max(0, timeout)
            while True:
                try:
                    r = self.client.watch(watch_key, timeout=t)
                    _log.debug("Detected variation for %s: %s", r.key, r.action)
                    return self._acquired(blocking=True, timeout=timeout)
                except etcd.EtcdKeyNotFound:
                    _log.debug("Key %s not present anymore, moving on", watch_key)
                    return self._acquired(blocking=True, timeout=timeout)
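                except etcd.EtcdException:
                    # Log the error and retry the watch instead of
                    # swallowing it silently.
                    _log.exception("Unexpected error while watching %s", watch_key)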

    @property
    def lock_key(self):
        if not self._sequence:
            raise ValueError("No sequence present.")
        return self.path + '/' + str(self._sequence)

    def _set_sequence(self, key):
        self._sequence = key.replace(self.path, '').lstrip('/')

    def _find_lock(self):
        if self._sequence:
            try:
                res = self.client.read(self.lock_key)
                self._uuid = res.value
                return True
            except etcd.EtcdKeyNotFound:
                return False
        elif self._uuid:
            try:
                for r in self.client.read(self.path, recursive=True).leaves:
                    if r.value == self._uuid:
                        self._set_sequence(r.key)
                        return True
            except etcd.EtcdKeyNotFound:
                pass
        return False

    def _get_locker(self):
        results = [res for res in
                   self.client.read(self.path, recursive=True).leaves]
        if not self._sequence:
            self._find_lock()
        l = sorted([r.key for r in results])
        _log.debug("Lock keys found: %s", l)
        try:
            i = l.index(self.lock_key)
            if i == 0:
                _log.debug("No key before our one, we are the locker")
                return (l[0], None)
            else:
                _log.debug("Locker: %s, key to watch: %s", l[0], l[i-1])
                return (l[0], l[i-1])
        except ValueError:
            # Something very wrong is going on, most probably
            # our lock has expired
            raise etcd.EtcdLockExpired(u"Lock not found")
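
When calling the class above, it is easy to forget the release on an error path. The helper below is a small sketch of one way to guard against that. `run_locked` and `RecordingLock` are hypothetical names introduced here. The stub only mimics the `acquire`/`release` signature of `Lock`, so the example runs without an etcd server.

```python
def run_locked(lock, work, lock_ttl=60):
    """Acquire `lock`, run `work()`, and release the lock even if `work` raises."""
    if not lock.acquire(blocking=True, lock_ttl=lock_ttl):
        raise RuntimeError("could not acquire the lock")
    try:
        return work()
    finally:
        lock.release()


class RecordingLock:
    """Hypothetical stub with the same acquire/release signature as Lock above."""

    def __init__(self):
        self.calls = []

    def acquire(self, blocking=True, lock_ttl=3600, timeout=0):
        self.calls.append(("acquire", lock_ttl))
        return True

    def release(self):
        self.calls.append(("release", None))


stub = RecordingLock()
result = run_locked(stub, lambda: "wrote file1", lock_ttl=30)
print(result, stub.calls)
```

With a real server, the first argument would instead be something like `Lock(etcd.Client(), "file1")`, assuming an etcd instance is reachable at the client's default address.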

GitHub repository of the python-etcd author: python-etcd
The relevant source code is in lock.py
