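While writing a reinforcement learning algorithm, the plan was: keep each exp's info in a dict; keep the success/failure record of every exp in a second dict whose values are lists; and finally put the exp info dicts into a list sorted by each exp's lp value.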
Exp_info = {'name': xxx,
            'lp': xxx
            }
Exp_dic = {xxx:[1,0],www:[0,1]}
Order_list = [Exp_info,Exp_info,......]
I then reproduced the code as it was at the time:
import numpy as np
test_dict = {
    'a': True,
    'b': False,
    'c': True,
    'd': False,
    'e': False
}
hp = 0
old_lp = 0
p = 0.005
exp_tool_info = {'name': '',
                 'lp': 0}
dict_exp = {}
ORDER_dict_list = []
for i in range(3):
    for k, v in test_dict.items():
        exp_tool_info['name'] = k
        dict_exp.setdefault(k, [])
        if v:
            dict_exp[k].append(1)
        else:
            dict_exp[k].append(0)
        hp = np.mean(dict_exp[k])  # historical success rate of exp k
        old_lp = exp_tool_info['lp']
        print(old_lp)
        exp_tool_info['lp'] = 0.095 * old_lp + 0.005 * hp
        temp_list = [item['name'] for item in ORDER_dict_list]
        print(temp_list)
        if exp_tool_info['name'] not in temp_list:
            ORDER_dict_list.append(exp_tool_info)
            print('========', ORDER_dict_list)
        ORDER_dict_list.sort(key=lambda x: x['lp'], reverse=True)  # descending
        print('++++++++++++++')
print(ORDER_dict_list)
print(dict_exp)
Output:
0
[]
======== [{'name': 'a', 'lp': 0.005}]
++++++++++++++
0.005
['b']
++++++++++++++
0.000475
['c']
++++++++++++++
0.005045125
['d']
++++++++++++++
0.000479286875
['e']
++++++++++++++
4.5532253125e-05
['a']
++++++++++++++
0.005004325564046875
['b']
++++++++++++++
0.0004754109285844531
['c']
++++++++++++++
0.005045164038215523
['d']
++++++++++++++
0.00047929058363047473
['e']
++++++++++++++
4.55326054448951e-05
['a']
++++++++++++++
0.005004325597517265
['b']
++++++++++++++
0.0004754109317641402
['c']
++++++++++++++
0.005045164038517593
['d']
++++++++++++++
0.00047929058365917134
['e']
++++++++++++++
[{'name': 'e', 'lp': 4.5532605447621275e-05}]
{'a': [1, 1, 1], 'b': [0, 0, 0], 'c': [1, 1, 1], 'd': [0, 0, 0], 'e': [0, 0, 0]}
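The output shows that the lp of every exp keeps increasing, and that Order_list ends up with only one entry. The intent had been to update name and lp on each iteration and so build an Order_list of Exp_info entries.
The result was unexpected: the probability of every exp changes on each pass, yet even an exp that failed on its very first attempt has a nonzero probability. That was puzzling.
Analysis showed that the exp info dict (exp_tool_info) is the same object, at the same address, on every iteration. Each exp therefore picks up the lp left behind by the previous one, which is why the value changes every time instead of starting from 0 for each exp. It is also why the name check never finds a new name to add to the list.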
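The effect can be reproduced in a few lines. The sketch below is a minimal illustration rather than the original code; the names `shared`, `aliased`, and `independent` are made up for this example. It appends one dict object to a list three times while mutating it, then repeats the exercise with a fresh dict per iteration.

```python
# One dict object, mutated and appended repeatedly: every list slot
# refers to the same object, so all slots show the last values written.
shared = {'name': '', 'lp': 0}
aliased = []
for name in ['a', 'b', 'c']:
    shared['name'] = name
    shared['lp'] += 1
    aliased.append(shared)

print(aliased)
# Every entry is the very same object as `shared`.
print(all(entry is shared for entry in aliased))

# A new dict per iteration: each entry keeps its own name and lp.
independent = []
for name in ['a', 'b', 'c']:
    independent.append({'name': name, 'lp': 0})

print(independent)
```

In the first loop the list holds three references to one dict, so a name lookup only ever sees the most recent name. In the second loop each entry is a separate object.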
Solution: create an Exp class so that each exp object carries its own name and lp, bound together. That fixes the problem.
class Exp(object):
    def __init__(self, name: str):
        self.name = name
        self.lp = 0
        self.exp_info = {'name': self.name,
                         'lp': self.lp,
                         'exp_success_fail_list': []
                         }
    def set_lp(self, lp):
        self.lp = lp
        self.exp_info['lp'] = self.lp
    def get_exp_tool_info(self):
        return self.exp_info
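A short usage sketch of the class follows, with the class repeated so the snippet runs on its own. The object names `a` and `b` and the value passed to `set_lp` are illustrative only. It shows that two Exp objects keep separate state, and that `get_exp_tool_info()` reflects an update made through `set_lp`.

```python
class Exp(object):
    def __init__(self, name: str):
        self.name = name
        self.lp = 0
        self.exp_info = {'name': self.name,
                         'lp': self.lp,
                         'exp_success_fail_list': []}

    def set_lp(self, lp):
        # Keep the attribute and the info dict in sync.
        self.lp = lp
        self.exp_info['lp'] = self.lp

    def get_exp_tool_info(self):
        return self.exp_info


a = Exp('a')
b = Exp('b')

# Updating one object leaves the other untouched.
a.set_lp(0.005)
a.exp_info['exp_success_fail_list'].append(1)

print(a.get_exp_tool_info())
print(b.get_exp_tool_info())
```

Because each `Exp` builds its own `exp_info` dict in `__init__`, no two objects share the same dict.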
A method from another class, the attack class (partial code):
def dic_lst(self, exp):
    # Look for an Exp already stored under the same name and reuse it.
    for known in self.EXP_ORDER:
        if known.exp_info['name'] == exp.exp_info['name']:
            exp = known
            break
    else:
        # No match (or the list is still empty): store the new object once.
        self.EXP_ORDER.append(exp)
    print('&&&&&&&&&&', self.EXP_ORDER)
    return exp
for i, exp in enumerate(exps):
    print('++++++++++++exp++++++++++++', exp)
    # exp_tool_info['name'] = exp
    exp_obj = Exp(exp)
    exp_obj = self.dic_lst(exp_obj)
    if succeeded:  # placeholder name: the original success check is not shown in this excerpt
        exp_obj.exp_info['exp_success_fail_list'].append(1)
    else:
        # dict_exp[exp].append(0)
        exp_obj.exp_info['exp_success_fail_list'].append(0)
    # Keep only the 10 most recent outcomes.
    if len(exp_obj.exp_info['exp_success_fail_list']) > 10:
        exp_obj.exp_info['exp_success_fail_list'].pop(0)
    # hp: historical probability
    # hp = np.mean(dict_exp[exp])
    hp = np.mean(exp_obj.exp_info['exp_success_fail_list'])
    old_lp = exp_obj.exp_info['lp']
    print('-----------old_lp-------------', old_lp)
    # old_lp = old_exp['lp']
    exp_obj.set_lp((1 - p) * old_lp + p * hp)
    print('-----------------lp---------------', exp_obj.get_exp_tool_info()['lp'])
    # if EXP_ORDER
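The pieces above can be put together in a self-contained sketch. This is one possible arrangement under stated assumptions, not the original attack class: the helper `update`, the `registry` dict keyed by name (used here in place of scanning a list), the `record` method, and the use of `collections.deque` for the 10-item window are all choices made for this example. The mean is computed with plain `sum`/`len` instead of `np.mean` so that the snippet needs no third-party package.

```python
from collections import deque

P = 0.005      # smoothing factor, same value as p in the post
WINDOW = 10    # keep only the 10 most recent outcomes, as in the excerpt


class Exp:
    """Per-exp state: name, smoothed probability lp, recent outcomes."""

    def __init__(self, name):
        self.name = name
        self.lp = 0.0
        # A deque with maxlen drops its oldest entry automatically when full.
        self.history = deque(maxlen=WINDOW)

    def record(self, success):
        """Store one outcome and update lp (hypothetical helper)."""
        self.history.append(1 if success else 0)
        hp = sum(self.history) / len(self.history)  # historical success rate
        self.lp = (1 - P) * self.lp + P * hp


def update(registry, outcomes):
    """Apply one round of outcomes; registry maps name -> Exp (hypothetical helper)."""
    for name, success in outcomes.items():
        exp = registry.get(name)
        if exp is None:
            # First time this name is seen: create its own object.
            exp = registry[name] = Exp(name)
        exp.record(success)


outcomes = {'a': True, 'b': False, 'c': True, 'd': False, 'e': False}
registry = {}
for _ in range(3):
    update(registry, outcomes)

# Sort by lp in descending order, as Order_list was meant to be.
order = sorted(registry.values(), key=lambda e: e.lp, reverse=True)
for exp in order:
    print(exp.name, exp.lp)
```

With one object per name, an exp that has only ever failed keeps an lp of 0, and the ordered list contains all five entries instead of one.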
The key takeaway of this post is the idea behind the fix.