感谢朋友支持本博客,欢迎共同探讨交流,由于能力和时间有限,错误之处在所难免,欢迎指正!
如果转载,请保留作者信息。
博客地址:http://blog.csdn.net/gaoxingnengjisuan
邮箱地址:[email protected]
ring是swift中的核心组件,它描述和决定了数据如何在集群系统中分布。其中的一致性哈希算法更是核心内容之一。
swift-ring-builder中包含了对ring的各种操作方法,包括create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等方法,我将会逐个解析这写方法的实现过程;
首先来看swift-ring-builder中的main方法:
if __name__ == '__main__': if len(argv) < 2: print "swift-ring-builder %(MAJOR_VERSION)s.%(MINOR_VERSION)s\n" % globals() print Commands.default.__doc__.strip() print cmds = [c for c, f in Commands.__dict__.iteritems() if f.__doc__ and c[0] != '_' and c != 'default'] cmds.sort() for cmd in cmds: print Commands.__dict__[cmd].__doc__.strip() print print RingBuilder.search_devs.__doc__.strip() print for line in wrap(' '.join(cmds), 79, initial_indent='Quick list: ', subsequent_indent=' '): print line print ('Exit codes: 0 = operation successful\n' ' 1 = operation completed with warnings\n' ' 2 = error') exit(EXIT_SUCCESS) # 验证argv[1]的路径是否存在; if exists(argv[1]): # builder从argv[1]指定文件获取初始化数据; # 再对RingBuilder类进行实例化; # 根据具体情况决定是否改变builder的初始化数据; builder = RingBuilder.load(argv[1]) # 如果argv[1]不存在,而且调用的还不是'create'命令,则退出; elif len(argv) < 3 or argv[2] != 'create': print 'Ring Builder file does not exist: %s' % argv[1] exit(EXIT_ERROR) # 获取文件夹'backups'的整体路径; backup_dir = pathjoin(dirname(argv[1]), 'backups') # 新建文件夹'backups'; try: mkdir(backup_dir) except OSError, err: if err.errno != EEXIST: raise # 获取argv[1],作为ring_file; ring_file = argv[1] if ring_file.endswith('.builder'): ring_file = ring_file[:-len('.builder')] ring_file += '.ring.gz' if len(argv) == 2: command = "default" else: command = argv[2] if argv[0].endswith('-safe'): try: with lock_parent_directory(abspath(argv[1]), 15): Commands.__dict__.get(command, Commands.unknown.im_func)() except exceptions.LockTimeout: print "Ring/builder dir currently locked." exit(2) else: Commands.__dict__.get(command, Commands.unknown.im_func)()这个main方法主要做了以下几件事:
(1)检测命令行的正确性;
(2)如果argv[1]文件存在,则调用类RingBuilder中的方法load加载argv[1]文件,并返回类RingBuilder的实例化对象;
(3)建立backups文件夹;
(4)初始化ring_file文件;
(5)调用运行command中指定的处理ring的方法;
在类Commands中有若干处理ring的方法,如create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等等。下面我们来逐一解析这些方法:
1.来看方法create:
def create(): if len(argv) < 6: print Commands.create.__doc__.strip() exit(EXIT_ERROR) builder = RingBuilder(int(argv[3]), float(argv[4]), int(argv[5])) backup_dir = pathjoin(dirname(argv[1]), 'backups') try: mkdir(backup_dir) except OSError, err: if err.errno != EEXIST: raise # Python中可以使用 pickle 模块将对象转化为文件保存在磁盘上,在需要的时候再读取并还原。 # 这里即是把转化为字典格式的builder写入到文件pathjoin(backup_dir,'%d.' % time() + basename(argv[1]))文件中; # builder.to_dict():以字典的形式返回初始化的RingBuilder类的对象; pickle.dump(builder.to_dict(), open(pathjoin(backup_dir,'%d.' % time() + basename(argv[1])), 'wb'), protocol=2) # 再把转化为字典格式的builder写入到argv[1]指明的文件中; pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2) exit(EXIT_SUCCESS)这个方法主要实现的功能是:
(1)根据命令初始化类RingBuilder,获取类RingBuilder的实例化对象;
(2)创建用于备份的文件夹'backups';
(3)使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;这里具体是把转化为字典格式的builder保存两份,一份写入到建立的文件夹'backups'中的指定文件中,一份写入到argv[1]指明的文件中;
2.来看方法default:
def default(): print '%s, build version %d' % (argv[1], builder.version) regions = 0 zones = 0 balance = 0 dev_count = 0 if builder.devs: regions = len(set(d['region'] for d in builder.devs if d is not None)) zones = len(set((d['region'], d['zone']) for d in builder.devs if d is not None)) dev_count = len([d for d in builder.devs if d is not None]) balance = builder.get_balance() print '%d partitions, %.6f replicas, %d regions, %d zones, ' \ '%d devices, %.02f balance' % (builder.parts, builder.replicas, regions, zones, dev_count, balance) print 'The minimum number of hours before a partition can be ' \ 'reassigned is %s' % builder.min_part_hours if builder.devs: print 'Devices: id region zone ip address port' \ ' name weight partitions balance meta' weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None) for dev in builder.devs: if dev is None: continue if not dev['weight']: if dev['parts']: balance = 999.99 else: balance = 0 else: balance = 100.0 * dev['parts'] / (dev['weight'] * weighted_parts) - 100.0 print(' %5d %5d %5d %15s %5d %9s %6.02f %10s' '%7.02f %s' % (dev['id'], dev['region'], dev['zone'], dev['ip'], dev['port'], dev['device'], dev['weight'], dev['parts'], balance, dev['meta'])) exit(EXIT_SUCCESS)
这个方法主要实现的功能是显示ring和设备内部的信息;
3.来看方法search:
def search(): if len(argv) < 4: print Commands.search.__doc__.strip() print print builder.search_devs.__doc__.strip() exit(EXIT_ERROR) # search_devs:<search-value>可以通过<device_id>/<region>/<zone>-<ip>:<port>/<device_name>_<meta>几种格式来进行查询; devs = builder.search_devs(argv[3]) if not devs: print 'No matching devices found' exit(EXIT_ERROR) print 'Devices: id region zone ip address port name ' \ 'weight partitions balance meta' weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None) for dev in devs: if not dev['weight']: if dev['parts']: balance = 999.99 else: balance = 0 else: balance = 100.0 * dev['parts'] / \ (dev['weight'] * weighted_parts) - 100.0 print(' %5d %5d %5d %15s %5d %9s %6.02f %10s %7.02f %s' % (dev['id'], dev['region'], dev['zone'], dev['ip'], dev['port'], dev['device'], dev['weight'], dev['parts'], balance, dev['meta'])) exit(EXIT_SUCCESS)这个方法主要实现了显示匹配的设备信息的功能;
主要做了以下几件事:
(1)验证命令行正确性;
(2)调用方法search_devs实现对设备信息的搜索功能;
(3)遍历得到的匹配设备信息,组成输出信息并进行打印输出;
我们来看看方法search_devs:
def search_devs(self, search_value): """ <search-value>可以通过<device_id>/<region>/<zone>-<ip>:<port>/<device_name>_<meta>几种格式来进行查询; Examples:: d74 Matches the device id 74 r4 Matches devices in region 4 z1 Matches devices in zone 1 z1-1.2.3.4 Matches devices in zone 1 with the ip 1.2.3.4 1.2.3.4 Matches devices in any zone with the ip 1.2.3.4 z1:5678 Matches devices in zone 1 using port 5678 :5678 Matches devices that use port 5678 /sdb1 Matches devices with the device name sdb1 _shiny Matches devices with shiny in the meta data _"snet: 5.6.7.8" Matches devices with snet: 5.6.7.8 in the meta data [::1] Matches devices in any zone with the ip ::1 z1-[::1]:5678 Matches devices in zone 1 with ip ::1 and port 5678 Most specific example:: d74r4z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8" """ orig_search_value = search_value match = [] if search_value.startswith('d'): ...... if search_value.startswith('r'): ...... if search_value.startswith('z'): ...... if search_value.startswith('-'): ...... if len(search_value) and search_value[0].isdigit(): ...... elif len(search_value) and search_value[0] == '[': ...... if search_value.startswith(':'): ...... if search_value.startswith('/'): ...... if search_value.startswith('_'): ...... if search_value: ...... matched_devs = [] for dev in self.devs: if not dev: continue matched = True for key, value in match: if key == 'meta': if value not in dev.get(key): matched = False elif dev.get(key) != value: matched = False if matched: matched_devs.append(dev) return matched_devs
在这个方法中我们要注意的地方就是搜索指令的格式;
4.来看方法add:
def add(): """ swift-ring-builder <builder_file> add [r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight> [[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ... 使用给定的信息添加新的设备到ring上; add操作不会分配partitions到新的设备上,只有运行了'rebalance'命令后,才会进行分区的分配; 因此,这种机制可以允许我们一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配; 使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原; 这里具体是把转化为字典格式的builder写入到argv[1]指定文件中; """ if len(argv) < 5 or len(argv) % 2 != 1: print Commands.add.__doc__.strip() exit(EXIT_ERROR) # itertools.islice(iterable, start, stop[, step]) # islice('ABCDEFG', 2) --> A B # islice('ABCDEFG', 2, 4) --> C D # islice('ABCDEFG', 2, None) --> C D E F G # islice('ABCDEFG', 0, None, 2) --> A C E G # 从命令行获取匹配的devstr和weightstr集合; devs_and_weights = izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2)) for devstr, weightstr in devs_and_weights: region = 1 rest = devstr if devstr.startswith('r'): i = 1 while i < len(devstr) and devstr[i].isdigit(): i += 1 region = int(devstr[1:i]) rest = devstr[i:] else: stderr.write("WARNING: No region specified for %s. " "Defaulting to region 1.\n" % devstr) if not rest.startswith('z'): print 'Invalid add value: %s' % devstr exit(EXIT_ERROR) i = 1 while i < len(rest) and rest[i].isdigit(): i += 1 zone = int(rest[1:i]) rest = rest[i:] if not rest.startswith('-'): print 'Invalid add value: %s' % devstr print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) i = 1 if rest[i] == '[': i += 1 while i < len(rest) and rest[i] != ']': i += 1 i += 1 ip = rest[1:i].lstrip('[').rstrip(']') rest = rest[i:] else: while i < len(rest) and rest[i] in '0123456789.': i += 1 ip = rest[1:i] rest = rest[i:] if not rest.startswith(':'): print 'Invalid add value: %s' % devstr print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) i = 1 while i < len(rest) and rest[i].isdigit(): i += 1 port = int(rest[1:i]) rest = rest[i:] if not rest.startswith('/'): print 'Invalid add value: %s' % devstr print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) i = 1 while i < len(rest) and rest[i] != '_': i += 1 device_name = rest[1:i] rest = rest[i:] meta = '' if rest.startswith('_'): meta = rest[1:] try: weight = float(weightstr) except ValueError: print 'Invalid weight value: %s' % weightstr print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) if weight < 0: print 'Invalid weight value (must be positive): %s' % weightstr print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) for dev in builder.devs: if dev is None: continue if dev['ip'] == ip and dev['port'] == port and \ dev['device'] == device_name: print 'Device %d already uses %s:%d/%s.' % \ (dev['id'], dev['ip'], dev['port'], dev['device']) print "The on-disk ring builder is unchanged.\n" exit(EXIT_ERROR) # 增加一个device到ring; # 这个方法不会马上执行ring的重新平衡操作,因为我们可能需要在重新平衡操作之前进行多次改变; # 这样做也是为了提高效率的; builder.add_dev({'region': region, 'zone': zone, 'ip': ip, 'port': port, 'device': device_name, 'weight': weight, 'meta': meta}) new_dev = builder.search_devs('r%dz%d-%s:%s/%s' % (region, zone, ip, port, device_name))[0]['id'] if ':' in ip: print( 'Device r%dz%d-[%s]:%s/%s_"%s" with %s weight got id %s' % (region, zone, ip, port, device_name, meta, weight, new_dev)) else: print('Device r%dz%d-%s:%s/%s_"%s" with %s weight got id %s' % (region, zone, ip, port, device_name, meta, weight, new_dev)) # 使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原; # 这里具体是把转化为字典格式的builder写入到argv[1]指定文件中; pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2) exit(EXIT_SUCCESS)
可以知道命令行格式:
swift-ring-builder <builder_file> add
[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
[[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>]
这个方法主要做了以下几件事:
(1)验证命令行的正确性;
(2)从命令行获取匹配的devstr和weightstr集合;
调试示例:
argv = ['/usr/bin/swift-ring-builder', 'account.builder', 'add', 'z1-127.0.0.1:6012/sda3', '1']
izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2))) = ('z1-127.0.0.1:6012/sda3', '1')
(3)遍历每一对匹配的devstr和weightstr,从devstr和weightstr中获取region、zone、ip、port、device_name、weight和meta的值。调用add_dev方法实现增加这个device信息到ring,这个方法不会马上执行ring的重新平衡操作,因为可能需要在重新平衡操作之前进行多次改变(正如这里是个遍历循环操作,如果命令行中一次增加多个设备信息,就需要执行多次add_dev方法)。这样做也是为了提高效率的。
(4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。
5.来看方法set_weight:
def set_weight(): """ swift-ring-builder <builder_file> set_weight <search-value> <weight> [<search-value> <weight] ... """ if len(argv) < 5 or len(argv) % 2 != 1: print Commands.set_weight.__doc__.strip() print print builder.search_devs.__doc__.strip() exit(EXIT_ERROR) devs_and_weights = izip(islice(argv, 3, len(argv), 2), islice(argv, 4, len(argv), 2)) for devstr, weightstr in devs_and_weights: devs = builder.search_devs(devstr) weight = float(weightstr) if not devs: print("Search value \"%s\" matched 0 devices.\n" "The on-disk ring builder is unchanged.\n" % devstr) exit(EXIT_ERROR) if len(devs) > 1: print 'Matched more than one device:' for dev in devs: print ' d%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \ '"%(meta)s"' % dev if raw_input('Are you sure you want to update the weight for ' 'these %s devices? (y/N) ' % len(devs)) != 'y': print 'Aborting device modifications' exit(EXIT_ERROR) for dev in devs: # 设置device的weight值; # 这个方法不是仅仅直接在device字典中变更weight值; # 还有builder将会需要重新设置一些内部状态来反应weight值的改变; builder.set_dev_weight(dev['id'], weight) print 'd%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \ '"%(meta)s" weight set to %(weight)s' % dev pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2) exit(EXIT_SUCCESS)这个方法实现了重新设置设备的weight。set_weight操作后,设备上的partition不会重新分配,只有运行了'rebalance'命令后才会进行分区的分配。因此,这种机制可以允许你一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配。
这个方法做了以下几件事:
(1)验证命令行正确性;
(2)从命令行获取匹配的devstr和weightstr集合;
(3)遍历每一对匹配的devstr和weightstr。调用search_devs方法根据devstr查询得到对应的devs,也就是要修改weight值的devs。遍历得到的devs,调用set_dev_weight方法为每一个dev重新设置weight值为float(weightstr)。所以每一组devs的weight的值都是相同的。
(4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。
这里再来看看方法set_dev_weight,方法比较容易理解,具体解析如注释所示;
def set_dev_weight(self, dev_id, weight): """ 设置device的weight值; 这个方法不是仅仅直接在device字典中变更weight值; 还有builder将会需要重新设置一些内部状态来反应weight值的改变; """ self.devs[dev_id]['weight'] = weight # _set_parts_wanted:方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目; self._set_parts_wanted() # 设置devs_changed为TRUE,说明已经改变; self.devs_changed = True self.version += 1
这里再来看看方法_set_parts_wanted,这个方法会在很多地方被调用,这里先来解析一下,具体解析如注释所示:
def _set_parts_wanted(self): """ 方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目; 计算方法是: 1.首先计算每个partition的weight,即将partition数目乘以副本数得到总的partition数目; 然后除以现有dev的weight总和,得到每个partition的权重; 实际上得到的是单位权重对应的partition数目; 2.dev根据上述结构计算自己应该获取的partition数目; 计算方法:dev weight * weigh_of_one_part – 已经分配的partition数目; 具体解释就是单位权重对应的partition数目乘以权重值,得到权重dev['weight']对应的总的partition数目, 然后减去已经分配的partition数目,就得到了还要分配的partition数目; 由此可见,每次add设备操作都会引起partition分配的变化,但是真正的partition搬迁操作在rebalance执行时; """ # weight_of_one_part:从所有设备的总权重(weight)中,返回计算出来的每一个分区的权重(weight); # 计算方法就是将partition数目乘以副本数得到总的partition数目,然后除以现有dev的weight总和,得到每个partition的权重; # 也就是说具有同样副本数的分区具有同样的权重值(weight); weight_of_one_part = self.weight_of_one_part() for dev in self._iter_devs(): if not dev['weight']: dev['parts_wanted'] = -self.parts * self.replicas else: dev['parts_wanted'] = int(weight_of_one_part * dev['weight']) - dev['parts']此篇博文解析到这里吧,下一篇博文将会继续解析swift-ring-builder文件。
博文中不免有不正确的地方,欢迎朋友们不吝批评指正,谢谢大家了!