Thoughts on using variables and classes in Python


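While writing a reinforcement learning algorithm, the plan was: keep each exp's information in a dict; keep each exp's success/failure record in a second dict whose values are lists; and finally put the exp info dicts into a list sorted by each exp's lp value.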

Exp_info = {'name': xxx,
            'lp': xxx}

Exp_dic = {xxx: [1, 0], www: [0, 1]}

Order_list = [Exp_info, Exp_info, ......]

I then reproduced the code from that time:

import numpy as np

test_dict = {
    'a': True,
    'b': False,
    'c': True,
    'd': False,
    'e': False
}
hp = 0
old_lp = 0
p = 0.005
exp_tool_info = {'name': '',
                 'lp': 0}
dict_exp = {}
ORDER_dict_list = []
for i in range(3):
    for k, v in test_dict.items():

        exp_tool_info['name'] = k
        dict_exp.setdefault(k, [])

        if v:
            dict_exp[k].append(1)
        else:
            dict_exp[k].append(0)
        hp = np.mean(dict_exp[k])
        old_lp = exp_tool_info['lp']
        print(old_lp)
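        # 0.095 is the weight that was actually run (the output below matches it);
        # given p = 0.005, the intended weight was probably 1 - p = 0.995.
        exp_tool_info['lp'] = 0.095 * old_lp + 0.005 * hp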
        temp_list = [i['name'] for i in ORDER_dict_list]
        print(temp_list)
        if exp_tool_info['name'] not in temp_list:
            ORDER_dict_list.append(exp_tool_info)
            print('========', ORDER_dict_list)
        ORDER_dict_list.sort(key=lambda x: x['lp'], reverse=True)  # descending order
        print('++++++++++++++')

print(ORDER_dict_list)
print(dict_exp)

Output:

 

0
[]
======== [{'name': 'a', 'lp': 0.005}]
++++++++++++++
0.005
['b']
++++++++++++++
0.000475
['c']
++++++++++++++
0.005045125
['d']
++++++++++++++
0.000479286875
['e']
++++++++++++++
4.5532253125e-05
['a']
++++++++++++++
0.005004325564046875
['b']
++++++++++++++
0.0004754109285844531
['c']
++++++++++++++
0.005045164038215523
['d']
++++++++++++++
0.00047929058363047473
['e']
++++++++++++++
4.55326054448951e-05
['a']
++++++++++++++
0.005004325597517265
['b']
++++++++++++++
0.0004754109317641402
['c']
++++++++++++++
0.005045164038517593
['d']
++++++++++++++
0.00047929058365917134
['e']
++++++++++++++
[{'name': 'e', 'lp': 4.5532605447621275e-05}]
{'a': [1, 1, 1], 'b': [0, 0, 0], 'c': [1, 1, 1], 'd': [0, 0, 0], 'e': [0, 0, 0]}

 

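The output shows that the lp printed for each exp keeps changing from step to step, and that Order_list ends up with only one element. The intent had been to update name and then lp on each iteration, and so build up an Order_list of Exp_info entries.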

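This was an unexpected result: the probability changes for every exp on every pass, yet even an exp that fails on its very first attempt ends up with a nonzero probability. That was confusing.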

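After some analysis, the cause turned out to be that exp_info (exp_tool_info in the code) is the same object every time; its address never changes. Each step therefore reads the lp left behind by the previous exp instead of starting from 0 for a new exp, which is why the value changes on every step. For the same reason, the name check cannot tell the exps apart.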
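
The aliasing can be reproduced in a few lines. This is a minimal sketch rather than the original code: the names `shared`, `aliased`, and `copied` are made up for illustration. Appending the same dict twice stores two references to one object, while appending `dict(shared)` stores an independent snapshot.

```python
# Hypothetical names; illustrates shared references vs. copies.
shared = {'name': '', 'lp': 0}

aliased = []   # holds references to the one shared dict
copied = []    # holds an independent shallow copy per step

for name, lp in [('a', 0.5), ('b', 0.1)]:
    shared['name'] = name
    shared['lp'] = lp
    aliased.append(shared)        # same object every time
    copied.append(dict(shared))   # new dict holding the current values

print(aliased)                    # both entries show the last values written
print(copied)                     # each entry keeps the values it was appended with
print(aliased[0] is aliased[1])   # True: one object, two references
```

In the original loop the list held that single shared dict, so after the first append the name check always matched whatever name had just been written into it.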

Solution: create an Exp class so that each exp object carries its own name and lp together. That fixes the problem.

 

class Exp(object):

    def __init__(self, name: str):
        self.name = name
        self.lp = 0
        self.exp_info = {'name': self.name,
                         'lp': self.lp,
                         'exp_success_fail_list': []
                         }

    def set_lp(self, lp):
        self.lp = lp
        self.exp_info['lp'] = self.lp

    def get_exp_tool_info(self):
        return self.exp_info
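
As a quick check of the fix, here is a hedged usage sketch. It repeats a compact copy of the class so that it runs on its own; the variable names `a`, `b`, and `order` are illustrative and not from the original code.

```python
class Exp:
    """Compact copy of the class above so this snippet is self-contained."""

    def __init__(self, name: str):
        self.name = name
        self.lp = 0
        self.exp_info = {'name': self.name, 'lp': self.lp,
                         'exp_success_fail_list': []}

    def set_lp(self, lp):
        self.lp = lp
        self.exp_info['lp'] = self.lp


a = Exp('a')
b = Exp('b')
a.set_lp(0.005)   # only a's state changes; b still has lp == 0

# Sort the objects themselves by lp, highest first.
order = sorted([b, a], key=lambda e: e.lp, reverse=True)
print([(e.name, e.lp) for e in order])
print(a.exp_info is b.exp_info)   # False: each object owns its own dict
```

Because every instance builds its own `exp_info` in `__init__`, updating one exp can no longer leak into another.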

 

A method from another class (the attack class), partial code:

def dic_lst(self, new_exp):
    # Return the Exp already stored under this name; if there is none,
    # register new_exp and return it.
    for stored in self.EXP_ORDER:
        if stored.exp_info['name'] == new_exp.exp_info['name']:
            new_exp = stored  # reuse the existing object so its history is kept
            break
    else:
        # the loop finished without a break: this name is not registered yet
        self.EXP_ORDER.append(new_exp)

    print('&&&&&&&&&&', self.EXP_ORDER)
    return new_exp

 

for i, exp in enumerate(exps):
    print('++++++++++++exp++++++++++++', exp)
    # exp_tool_info['name'] = exp
    exp_obj = Exp(exp)
    exp_obj = self.dic_lst(exp_obj)  # reuse the stored object for this name, if any

    if ...:  # the success check is not shown in this excerpt
        exp_obj.exp_info['exp_success_fail_list'].append(1)
    else:
        # dict_exp[exp].append(0)
        exp_obj.exp_info['exp_success_fail_list'].append(0)

    # keep only the 10 most recent outcomes
    if len(exp_obj.exp_info['exp_success_fail_list']) > 10:
        exp_obj.exp_info['exp_success_fail_list'].pop(0)

    # hp: historical probability (mean of the recorded outcomes)
    # hp = np.mean(dict_exp[exp])
    hp = np.mean(exp_obj.exp_info['exp_success_fail_list'])

    old_lp = exp_obj.exp_info['lp']
    print('-----------old_lp-------------', old_lp)
    # old_lp = old_exp['lp']
    exp_obj.set_lp((1 - p) * old_lp + p * hp)
    print('-----------------lp---------------', exp_obj.get_exp_tool_info()['lp'])
    # if EXP_ORDER
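
Putting the pieces together, one way to rerun the original `test_dict` experiment with one object per exp is sketched below. This is not the author's attack class: `registry`, `update`, and the trimmed-down `Exp` are hypothetical names, a dict keyed by name stands in for the `EXP_ORDER` lookup, and a plain `sum(...) / len(...)` replaces `np.mean` so that the snippet has no dependencies.

```python
# Sketch only: `registry`, `update`, and this small `Exp` are illustrative names.
P = 0.005      # smoothing weight, same value as p in the original code
WINDOW = 10    # keep at most the 10 most recent outcomes


class Exp:
    def __init__(self, name):
        self.name = name
        self.lp = 0.0
        self.outcomes = []   # 1 = success, 0 = failure


def update(registry, name, success):
    """Record one outcome for `name` and refresh that exp's lp."""
    exp = registry.get(name)
    if exp is None:              # first time this name is seen
        exp = registry[name] = Exp(name)
    exp.outcomes.append(1 if success else 0)
    if len(exp.outcomes) > WINDOW:
        exp.outcomes.pop(0)
    hp = sum(exp.outcomes) / len(exp.outcomes)   # historical success rate
    exp.lp = (1 - P) * exp.lp + P * hp
    return exp


test_dict = {'a': True, 'b': False, 'c': True, 'd': False, 'e': False}
registry = {}
for _ in range(3):
    for name, ok in test_dict.items():
        update(registry, name, ok)

# One entry per exp, highest lp first.
order = sorted(registry.values(), key=lambda e: e.lp, reverse=True)
print([(e.name, e.lp) for e in order])
```

With separate objects, the exps that always fail ('b', 'd', 'e') keep an lp of 0, and the sorted list contains one entry per exp instead of a single shared dict.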

 

What matters in this post is the idea.
