汀、

[Part 2] MADDPG multi-agent algorithm implementation (PARL): reproducing the chase game


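1. Full paper title: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Original paper: https://download.csdn.net/download/sinat_39620217/16203960

Chinese translation of the paper: https://blog.csdn.net/qiusuoxiaozi/article/details/79066612

For the underlying theory, see [Part 1] MADDPG: single-agent and multi-agent summary (theory and algorithms).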

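1.1 OpenAI's hide-and-seek environment

OpenAI's hide-and-seek environment is a fun one: two teams of agents play hide-and-seek, and over the course of training they gradually learn a variety of strategies.

Video: https://www.bilibili.com/video/bv1ZA411N7jg (worth watching; the results are quite entertaining)

That environment is built on MuJoCo, which required a paid license when this post was written. OpenAI also provides a simplified environment in the same spirit, described next.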

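1.2 OpenAI's particle "chase game" environment

Source code: https://gitee.com/dingding962285595/gitee_work_python

It contains nine multi-agent environments:

simple, simple_adversary, simple_crypto, simple_push, simple_reference, simple_speaker_listener, simple_spread, simple_tag, simple_world_comm

This post uses simple_world_comm as the example:

The environment has six agents. Two small green balls are fast and earn reward by reaching the blue balls (water sources; the official description below calls them food). The other four red balls are slower and earn reward by chasing the green balls.

The two large green balls are forests: while a green agent is inside a forest, the red agents cannot observe its position.
The black balls are obstacles that no ball can pass through.
The two blue balls are the water sources; green agents collect reward by staying close to them.

Only the agents can move, and the layout is re-randomized after every episode.

The environment is both cooperative and competitive: green and red agents must each learn to cooperate with their teammates, while the two teams compete against each other.

The official descriptions of each file are quoted below in the original English.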

1.2.1 代码架构

make_env.py: contains code for importing a multiagent environment as an OpenAI Gym-like object.

./multiagent/environment.py: contains code for environment simulation (interaction physics, _step() function, etc.)

./multiagent/core.py: contains classes for various objects (Entities, Landmarks, Agents, etc.) that are used throughout the code.

./multiagent/rendering.py: used for displaying agent behaviors on the screen.

./multiagent/policy.py: contains code for interactive policy based on keyboard input.

./multiagent/scenario.py: contains base scenario object that is extended for all scenarios.

./multiagent/scenarios/: folder where various scenarios/ environments are stored. scenario code consists of several functions:

make_world(): creates all of the entities that inhabit the world (landmarks, agents, etc.), assigns their capabilities (whether they can communicate, or move, or both). called once at the beginning of each training session

reset_world(): resets the world by assigning properties (position, color, etc.) to all entities in the world called before every episode (including after make_world() before the first episode)

reward(): defines the reward function for a given agent

observation(): defines the observation space of a given agent

(optional) benchmark_data(): provides diagnostic data for policies trained on the environment (e.g. evaluation metrics)
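
The four scenario functions listed above can be illustrated with a toy scenario. This is only a sketch: Entity, World and ToyScenario are hypothetical stand-ins written for this example, not the classes from multiagent/core.py or multiagent/scenario.py, and the reward (negative squared distance to a single landmark) is an assumption chosen for simplicity.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Stand-in for an agent or landmark (hypothetical, not multiagent.core)."""
    name: str
    pos: list = field(default_factory=lambda: [0.0, 0.0])


@dataclass
class World:
    """Stand-in container for everything that lives in the world."""
    agents: list = field(default_factory=list)
    landmarks: list = field(default_factory=list)


class ToyScenario:
    """One agent is rewarded for getting close to one landmark."""

    def make_world(self):
        # Called once: create the entities, then give them initial positions.
        world = World(agents=[Entity("agent 0")],
                      landmarks=[Entity("landmark 0")])
        self.reset_world(world)
        return world

    def reset_world(self, world, rng=random):
        # Called before every episode: draw new random positions.
        for entity in world.agents + world.landmarks:
            entity.pos = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]

    def reward(self, agent, world):
        # Negative squared distance to the landmark: closer is better.
        target = world.landmarks[0]
        return -sum((a - t) ** 2 for a, t in zip(agent.pos, target.pos))

    def observation(self, agent, world):
        # What the agent sees: the landmark position relative to itself.
        target = world.landmarks[0]
        return [t - a for a, t in zip(agent.pos, target.pos)]


scenario = ToyScenario()
world = scenario.make_world()
agent = world.agents[0]
print(scenario.observation(agent, world), scenario.reward(agent, world))
```

The real scenarios follow the same division of labor, with more entities and team-dependent rewards.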

1.2.2 环境列表

Env name in code (name in paper)	Communication?	Competitive?	Notes
`simple.py`	N	N	Single agent sees landmark position, rewarded based on how close it gets to landmark. Not a multiagent environment -- used for debugging policies.
`simple_adversary.py` (Physical deception)	N	Y	1 adversary (red), N good agents (green), N landmarks (usually N=2). All agents observe position of landmarks and other agents. One landmark is the ‘target landmark’ (colored green). Good agents rewarded based on how close one of them is to the target landmark, but negatively rewarded if the adversary is close to target landmark. Adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary.
`simple_crypto.py` (Covert communication)	Y	Y	Two good agents (alice and bob), one adversary (eve). Alice must send a private message to bob over a public channel. Alice and bob are rewarded based on how well bob reconstructs the message, but negatively rewarded if eve can reconstruct the message. Alice and bob have a private key (randomly generated at beginning of each episode), which they must learn to use to encrypt the message.
`simple_push.py` (Keep-away)	N	Y	1 agent, 1 adversary, 1 landmark. Agent is rewarded based on distance to landmark. Adversary is rewarded if it is close to the landmark, and if the agent is far from the landmark. So the adversary learns to push agent away from the landmark.
`simple_reference.py`	Y	N	2 agents, 3 landmarks of different colors. Each agent wants to get to their target landmark, which is known only by other agent. Reward is collective. So agents have to learn to communicate the goal of the other agent, and navigate to their landmark. This is the same as the simple_speaker_listener scenario where both agents are simultaneous speakers and listeners.
`simple_speaker_listener.py` (Cooperative communication)	Y	N	Same as simple_reference, except one agent is the ‘speaker’ (gray) that does not move (observes goal of other agent), and other agent is the listener (cannot speak, but must navigate to correct landmark).
`simple_spread.py` (Cooperative navigation)	N	N	N agents, N landmarks. Agents are rewarded based on how far any agent is from each landmark. Agents are penalized if they collide with other agents. So, agents have to learn to cover all the landmarks while avoiding collisions.
`simple_tag.py` (Predator-prey)	N	Y	Predator-prey environment. Good agents (green) are faster and want to avoid being hit by adversaries (red). Adversaries are slower and want to hit good agents. Obstacles (large black circles) block the way.
`simple_world_comm.py`	Y	Y	Environment seen in the video accompanying the paper. Same as simple_tag, except (1) there is food (small blue balls) that the good agents are rewarded for being near, (2) we now have ‘forests’ that hide agents inside from being seen from outside; (3) there is a ‘leader adversary’ that can see the agents at all times, and can communicate with the other adversaries to help coordinate the chase.

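1.3 MADDPG source code

Environment source code: https://github.com/dingidng/multiagent-particle-envs

Full program source: https://gitee.com/dingding962285595/myenv/tree/master/gym/multiagent (both links contain the complete program)

1.3.1 Agent (agent.py)

Each Actor interacts with the environment on its own, so sampling is independent per agent.
Each Actor has a Critic that observes global information and guides the Actor's choice of actions.

Note that train.py imports this class with from simple_agent import MAAgent, so save the file as simple_agent.py or adjust the import.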

import numpy as np
import parl
from parl import layers
from paddle import fluid
from parl.utils import ReplayMemory
from parl.utils import machine_info, get_gpu_count


class MAAgent(parl.Agent):
    def __init__(self,
                algorithm,
                agent_index=None,
                obs_dim_n=None,
                act_dim_n=None,
                batch_size=None,
                speedup=False):
        assert isinstance(agent_index, int)
        assert isinstance(obs_dim_n, list)
        assert isinstance(act_dim_n, list)
        assert isinstance(batch_size, int)
        assert isinstance(speedup, bool)
        self.agent_index = agent_index
        self.obs_dim_n = obs_dim_n
        self.act_dim_n = act_dim_n
        self.batch_size = batch_size
        self.speedup = speedup
        self.n = len(act_dim_n)

        self.memory_size = int(1e6)
        self.min_memory_size = batch_size * 25  # batch_size * args.max_episode_len
        self.rpm = ReplayMemory(
            max_size=self.memory_size,
            obs_dim=self.obs_dim_n[agent_index],
            act_dim=self.act_dim_n[agent_index])
        self.global_train_step = 0

        if machine_info.is_gpu_available():
            assert get_gpu_count() == 1, 'Only support training in single GPU,\
                    Please set environment variable: `export CUDA_VISIBLE_DEVICES=[GPU_ID_TO_USE]` .'

        super(MAAgent, self).__init__(algorithm)

        # Attention: In the beginning, sync target model totally.
        self.alg.sync_target(decay=0)

    def build_program(self):
        self.pred_program = fluid.Program()
        self.learn_program = fluid.Program()
        self.next_q_program = fluid.Program()
        self.next_a_program = fluid.Program()

        with fluid.program_guard(self.pred_program):
            obs = layers.data(
                name='obs',
                shape=[self.obs_dim_n[self.agent_index]],
                dtype='float32')
            self.pred_act = self.alg.predict(obs)

        with fluid.program_guard(self.learn_program):
            obs_n = [
                layers.data(
                    name='obs' + str(i),
                    shape=[self.obs_dim_n[i]],
                    dtype='float32') for i in range(self.n)
            ]
            act_n = [
                layers.data(
                    name='act' + str(i),
                    shape=[self.act_dim_n[i]],
                    dtype='float32') for i in range(self.n)
            ]
            target_q = layers.data(name='target_q', shape=[], dtype='float32')
            self.critic_cost = self.alg.learn(obs_n, act_n, target_q)

        with fluid.program_guard(self.next_q_program):
            obs_n = [
                layers.data(
                    name='obs' + str(i),
                    shape=[self.obs_dim_n[i]],
                    dtype='float32') for i in range(self.n)
            ]
            act_n = [
                layers.data(
                    name='act' + str(i),
                    shape=[self.act_dim_n[i]],
                    dtype='float32') for i in range(self.n)
            ]
            self.next_Q = self.alg.Q_next(obs_n, act_n)

        with fluid.program_guard(self.next_a_program):
            obs = layers.data(
                name='obs',
                shape=[self.obs_dim_n[self.agent_index]],
                dtype='float32')
            self.next_action = self.alg.predict_next(obs)

        if self.speedup:
            self.pred_program = parl.compile(self.pred_program)
            self.learn_program = parl.compile(self.learn_program,
                                            self.critic_cost)
            self.next_q_program = parl.compile(self.next_q_program)
            self.next_a_program = parl.compile(self.next_a_program)

    def predict(self, obs):
        obs = np.expand_dims(obs, axis=0)
        obs = obs.astype('float32')
        act = self.fluid_executor.run(
            self.pred_program, feed={'obs': obs},
            fetch_list=[self.pred_act])[0]
        return act[0]

    def learn(self, agents):
        self.global_train_step += 1

        # only update parameter every 100 steps
        if self.global_train_step % 100 != 0:
            return 0.0

        if self.rpm.size() <= self.min_memory_size:
            return 0.0

        batch_obs_n = []
        batch_act_n = []
        batch_obs_new_n = []

        rpm_sample_index = self.rpm.make_index(self.batch_size)
        for i in range(self.n):
            batch_obs, batch_act, _, batch_obs_new, _ \
                = agents[i].rpm.sample_batch_by_index(rpm_sample_index)
            batch_obs_n.append(batch_obs)
            batch_act_n.append(batch_act)
            batch_obs_new_n.append(batch_obs_new)
        _, _, batch_rew, _, batch_isOver \
                = self.rpm.sample_batch_by_index(rpm_sample_index)

        # compute target q
        target_q = 0.0
        target_act_next_n = []
        for i in range(self.n):
            feed = {'obs': batch_obs_new_n[i]}
            target_act_next = agents[i].fluid_executor.run(
                agents[i].next_a_program,
                feed=feed,
                fetch_list=[agents[i].next_action])[0]
            target_act_next_n.append(target_act_next)
        feed_obs = {'obs' + str(i): batch_obs_new_n[i] for i in range(self.n)}
        feed_act = {
            'act' + str(i): target_act_next_n[i]
            for i in range(self.n)
        }
        feed = feed_obs.copy()
        feed.update(feed_act)  # merge two dict
        target_q_next = self.fluid_executor.run(
            self.next_q_program, feed=feed, fetch_list=[self.next_Q])[0]
        target_q += (
            batch_rew + self.alg.gamma * (1.0 - batch_isOver) * target_q_next)

        feed_obs = {'obs' + str(i): batch_obs_n[i] for i in range(self.n)}
        feed_act = {'act' + str(i): batch_act_n[i] for i in range(self.n)}
        target_q = target_q.astype('float32')
        feed = feed_obs.copy()
        feed.update(feed_act)
        feed['target_q'] = target_q
        critic_cost = self.fluid_executor.run(
            self.learn_program, feed=feed, fetch_list=[self.critic_cost])[0]

        self.alg.sync_target()
        return critic_cost

    def add_experience(self, obs, act, reward, next_obs, terminal):
        self.rpm.append(obs, act, reward, next_obs, terminal)
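
Two details of learn() above are easy to miss: every agent's replay buffer is sampled with the same index list, so row k of each batch comes from the same environment step, and the critic target is r + gamma * (1 - done) * Q_next. The sketch below reproduces both with plain Python lists. The buffers, the make_index and td_target helpers, and the constant next-Q values are hypothetical stand-ins for ReplayMemory and the target critic, not PARL code.

```python
import random

GAMMA = 0.95  # discount factor, same value as the --gamma default in train.py


def make_index(buffer_size, batch_size, rng):
    """Draw one list of positions that every agent's buffer will reuse."""
    return [rng.randrange(buffer_size) for _ in range(batch_size)]


def td_target(rewards, dones, next_q, gamma=GAMMA):
    """Element-wise r + gamma * (1 - done) * Q_next."""
    return [r + gamma * (1.0 - d) * q
            for r, d, q in zip(rewards, dones, next_q)]


# Two toy per-agent buffers; entry t of each buffer was stored at env step t.
buffers = [
    {"reward": [1.0, 0.0, -1.0, 2.0], "done": [0.0, 0.0, 1.0, 0.0]},
    {"reward": [0.5, 0.5, 0.5, 0.5], "done": [0.0, 0.0, 1.0, 0.0]},
]

rng = random.Random(0)
index = make_index(buffer_size=4, batch_size=3, rng=rng)

# Sampling every buffer with the same index keeps the batches aligned in time.
batches = [
    {key: [values[i] for i in index] for key, values in buf.items()}
    for buf in buffers
]

# Placeholder for the target critic output (next_q_program in the real agent).
next_q = [10.0 for _ in index]
targets_agent0 = td_target(batches[0]["reward"], batches[0]["done"], next_q)
print(index, targets_agent0)
```

When done is 1 the bootstrap term vanishes and the target reduces to the immediate reward.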

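1.3.2 Network (model.py)

The Actor and the Critic are both neural networks, each built from three fully connected layers. train.py imports this file with from simple_model import MAModel, so save it as simple_model.py or adjust the import.

This part can be left unchanged; readers who feel confident can try tuning it.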

import paddle.fluid as fluid
import parl
from parl import layers


class MAModel(parl.Model):
    def __init__(self, act_dim):
        self.actor_model = ActorModel(act_dim)
        self.critic_model = CriticModel()

    def policy(self, obs):
        return self.actor_model.policy(obs)

    def value(self, obs, act):
        return self.critic_model.value(obs, act)

    def get_actor_params(self):
        return self.actor_model.parameters()

    def get_critic_params(self):
        return self.critic_model.parameters()


class ActorModel(parl.Model):
    def __init__(self, act_dim):
        hid1_size = 64
        hid2_size = 64

        self.fc1 = layers.fc(
            size=hid1_size,
            act='relu',
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))
        self.fc2 = layers.fc(
            size=hid2_size,
            act='relu',
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))
        self.fc3 = layers.fc(
            size=act_dim,
            act=None,
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))

    def policy(self, obs):
        hid1 = self.fc1(obs)
        hid2 = self.fc2(hid1)
        means = self.fc3(hid2)
        return means


class CriticModel(parl.Model):
    def __init__(self):
        hid1_size = 64
        hid2_size = 64

        self.fc1 = layers.fc(
            size=hid1_size,
            act='relu',
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))
        self.fc2 = layers.fc(
            size=hid2_size,
            act='relu',
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))
        self.fc3 = layers.fc(
            size=1,
            act=None,
            param_attr=fluid.initializer.Normal(loc=0.0, scale=0.1))

    def value(self, obs_n, act_n):
        inputs = layers.concat(obs_n + act_n, axis=1)
        hid1 = self.fc1(inputs)
        hid2 = self.fc2(hid1)
        Q = self.fc3(hid2)
        Q = layers.squeeze(Q, axes=[1])
        return Q
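
CriticModel.value above concatenates the observations and actions of all agents before the first layer, which is what makes the critic "global" while each actor only sees its own observation. The sketch below works out the resulting input width and shows the concatenation for a single sample. The dimensions are hypothetical example values; in the real program they come from env.obs_shape_n and env.act_shape_n.

```python
def critic_input_dim(obs_dim_n, act_dim_n):
    """Width of the vector formed from every agent's observation and action."""
    return sum(obs_dim_n) + sum(act_dim_n)


def concat_inputs(obs_n, act_n):
    """Single-sample analogue of layers.concat(obs_n + act_n, axis=1)."""
    joined = []
    for vector in obs_n + act_n:
        joined.extend(vector)
    return joined


# Hypothetical sizes for a three-agent task.
obs_dim_n = [18, 18, 18]
act_dim_n = [5, 5, 5]

# Each actor's input is only its own observation ...
actor_input_dims = list(obs_dim_n)
# ... while every critic receives all observations and all actions.
print(actor_input_dims, critic_input_dim(obs_dim_n, act_dim_n))
```

Because the list order is fixed (all observations first, then all actions), every agent's critic sees the same layout regardless of which agent it belongs to.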

1.3.3 Training code (train.py)

#import sys
#print(sys.path)
#sys.path.append("H:/Anaconda3-2020.02/envs/parl/Lib/site-packages/parl/env")
#sys.path.append("H:\Anaconda3-202002\envs\parl\Lib\site-packages\gym\envs\multiagent")


import os
import time
import argparse
import numpy as np
from simple_model import MAModel
from simple_agent import MAAgent
import parl
from gym.envs.multiagent.multiagent_simple_env import MAenv
from parl.utils import logger, summary


def run_episode(env, agents):
    obs_n = env.reset()
    total_reward = 0
    agents_reward = [0 for _ in range(env.n)]
    steps = 0
    while True:
        steps += 1
        action_n = [agent.predict(obs) for agent, obs in zip(agents, obs_n)]
        next_obs_n, reward_n, done_n, _ = env.step(action_n)
        done = all(done_n)
        terminal = (steps >= args.max_step_per_episode)

        # store experience
        for i, agent in enumerate(agents):
            agent.add_experience(obs_n[i], action_n[i], reward_n[i],
                                next_obs_n[i], done_n[i])

        # compute reward of every agent
        obs_n = next_obs_n
        for i, reward in enumerate(reward_n):
            total_reward += reward
            agents_reward[i] += reward

        # check the end of an episode
        if done or terminal:
            break

        # show animation
        if args.show:
            time.sleep(0.1)
            env.render()

        # show model effect without training
        if args.restore and args.show:
            continue

        # learn policy
        for i, agent in enumerate(agents):
            critic_loss = agent.learn(agents)
            summary.add_scalar('critic_loss_%d' % i, critic_loss,
                               agent.global_train_step)

    return total_reward, agents_reward, steps


def train_agent():
    env = MAenv(args.env)
    logger.info('agent num: {}'.format(env.n))
    logger.info('observation_space: {}'.format(env.observation_space))
    logger.info('action_space: {}'.format(env.action_space))
    logger.info('obs_shape_n: {}'.format(env.obs_shape_n))
    logger.info('act_shape_n: {}'.format(env.act_shape_n))
    for i in range(env.n):
        logger.info('agent {} obs_low:{} obs_high:{}'.format(
            i, env.observation_space[i].low, env.observation_space[i].high))
        logger.info('agent {} act_n:{}'.format(i, env.act_shape_n[i]))
        if ('low' in dir(env.action_space[i])):
            logger.info('agent {} act_low:{} act_high:{} act_shape:{}'.format(
                i, env.action_space[i].low, env.action_space[i].high,
                env.action_space[i].shape))
            logger.info('num_discrete_space:{}'.format(
                env.action_space[i].num_discrete_space))

    from gym import spaces
    from gym.envs.multiagent.multi_discrete import MultiDiscrete
    for space in env.action_space:
        assert (isinstance(space, spaces.Discrete)
                or isinstance(space, MultiDiscrete))

    agents = []
    for i in range(env.n):
        model = MAModel(env.act_shape_n[i])
        algorithm = parl.algorithms.MADDPG(
            model,
            agent_index=i,
            act_space=env.action_space,
            gamma=args.gamma,
            tau=args.tau,
            critic_lr=args.critic_lr,
            actor_lr=args.actor_lr)
        agent = MAAgent(
            algorithm,
            agent_index=i,
            obs_dim_n=env.obs_shape_n,
            act_dim_n=env.act_shape_n,
            batch_size=args.batch_size,
            speedup=(not args.restore))
        agents.append(agent)
    total_steps = 0
    total_episodes = 0

    episode_rewards = []  # sum of rewards for all agents
    agent_rewards = [[] for _ in range(env.n)]  # individual agent reward
    final_ep_rewards = []  # sum of rewards for training curve
    final_ep_ag_rewards = []  # agent rewards for training curve

    if args.restore:
        # restore model
        for i in range(len(agents)):
            model_file = args.model_dir + '/agent_' + str(i) + '.ckpt'
            if not os.path.exists(model_file):
                logger.info('model file {} does not exist'.format(model_file))
                raise Exception
            agents[i].restore(model_file)
            
    t_start = time.time()
    logger.info('Starting...')
    while total_episodes <= args.max_episodes:
        # run an episode
        ep_reward, ep_agent_rewards, steps = run_episode(env, agents)
        if args.show:
            print('episode {}, reward {}, steps {}'.format(
                total_episodes, ep_reward, steps))

        # Record reward
        total_steps += steps
        total_episodes += 1
        episode_rewards.append(ep_reward)
        for i in range(env.n):
            agent_rewards[i].append(ep_agent_rewards[i])

        # Keep track of final episode reward
        if total_episodes % args.stat_rate == 0:
            mean_episode_reward = np.mean(episode_rewards[-args.stat_rate:])
            final_ep_rewards.append(mean_episode_reward)
            for rew in agent_rewards:
                final_ep_ag_rewards.append(np.mean(rew[-args.stat_rate:]))
            use_time = round(time.time() - t_start, 3)
            logger.info(
                'Steps: {}, Episodes: {}, Mean episode reward: {}, Time: {}'.
                format(total_steps, total_episodes, mean_episode_reward,
                       use_time))
            t_start = time.time()
            summary.add_scalar('mean_episode_reward/episode',
                               mean_episode_reward, total_episodes)
            summary.add_scalar('mean_episode_reward/steps',
                               mean_episode_reward, total_steps)
            summary.add_scalar('use_time/1000episode', use_time,
                               total_episodes)

            # save model
            if not args.restore:
                os.makedirs(os.path.dirname(args.model_dir), exist_ok=True)
                for i in range(len(agents)):
                    model_name = '/agent_' + str(i)
                    agents[i].save(args.model_dir + model_name)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Environment
    parser.add_argument(
        '--env',
        type=str,
        default='simple_spread',
        help='scenario of MultiAgentEnv')
    parser.add_argument(
        '--max_step_per_episode',
        type=int,
        default=50,
        help='maximum step per episode')
    parser.add_argument(
        '--max_episodes',
        type=int,
        default=25000,
        help='stop condition:number of episodes')
    parser.add_argument(
        '--stat_rate',
        type=int,
        default=500,  # save the model and log the mean reward every 500 episodes
        help='statistical interval of save model or count reward')
    # Core training parameters
    parser.add_argument(
        '--critic_lr',
        type=float,
        default=1e-3,
        help='learning rate for the critic model')
    parser.add_argument(
        '--actor_lr',
        type=float,
        default=1e-3,  # change this default to adjust the learning rate
        help='learning rate of the actor model')
    parser.add_argument(
        '--gamma', type=float, default=0.95, help='discount factor')
    parser.add_argument(
        '--batch_size',
        type=int,
        default=1024,
        help='number of episodes to optimize at the same time')
    parser.add_argument('--tau', type=float, default=0.01, help='soft update')
    # auto save model, optional restore model
    parser.add_argument(
        '--show', action='store_true', default=False, help='display or not')  # True enables rendering
    parser.add_argument(
        '--restore',
        action='store_true',
        default=False,  
        help='restore or not, must have model_dir')
    parser.add_argument(
        '--model_dir',
        type=str,
        
        default='./model',
        help='directory for saving model')

    args = parser.parse_args()

    train_agent()
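
The statistics in train_agent() are averages over the most recent stat_rate episodes, computed each time the episode counter reaches a multiple of stat_rate. A minimal sketch of that bookkeeping, using a hypothetical windowed_means helper and made-up reward values:

```python
def windowed_means(episode_rewards, stat_rate):
    """Mean reward of each completed block of `stat_rate` episodes."""
    means = []
    for end in range(stat_rate, len(episode_rewards) + 1, stat_rate):
        window = episode_rewards[end - stat_rate:end]
        means.append(sum(window) / len(window))
    return means


# Made-up episode rewards; the fifth episode is not reported yet because
# its block of two episodes is still incomplete.
rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
print(windowed_means(rewards, stat_rate=2))
```

A larger stat_rate gives a smoother training curve but fewer points and less frequent checkpoints, since the model is saved on the same schedule.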

1.3.4 MADDPG algorithm

PARL ships a ready-made MADDPG algorithm that can be called directly. In train.py it is used as follows:

from parl.algorithms import MADDPG

algorithm = parl.algorithms.MADDPG( )

The MADDPG.py algorithm source:

import warnings
warnings.simplefilter('default')

from parl.core.fluid import layers
from copy import deepcopy
from paddle import fluid
from parl.core.fluid.algorithm import Algorithm

__all__ = ['MADDPG']

from parl.core.fluid.policy_distribution import SoftCategoricalDistribution
from parl.core.fluid.policy_distribution import SoftMultiCategoricalDistribution


def SoftPDistribution(logits, act_space):
    """Args:
            logits: the output of policy model
            act_space: action space, must be gym.spaces.Discrete or multiagent.multi_discrete.MultiDiscrete

        Return:
            instance of SoftCategoricalDistribution or SoftMultiCategoricalDistribution
    """
    # is instance of gym.spaces.Discrete
    if (hasattr(act_space, 'n')):
        return SoftCategoricalDistribution(logits)
    # is instance of multiagent.multi_discrete.MultiDiscrete
    elif (hasattr(act_space, 'num_discrete_space')):
        return SoftMultiCategoricalDistribution(logits, act_space.low,
                                                act_space.high)
    else:
        raise AssertionError("act_space must be instance of \
            gym.spaces.Discrete or multiagent.multi_discrete.MultiDiscrete")


class MADDPG(Algorithm):
    def __init__(self,
                 model,
                 agent_index=None,
                 act_space=None,
                 gamma=None,
                 tau=None,
                 lr=None,
                 actor_lr=None,
                 critic_lr=None):
        """  MADDPG algorithm
        
        Args:
            model (parl.Model): forward network of actor and critic.
                                The function get_actor_params() of model should be implemented.
            agent_index: index of agent, in multiagent env
            act_space: action_space, gym space
            gamma (float): discounted factor for reward computation.
            tau (float): decay coefficient when updating the weights of self.target_model with self.model
            lr (float): learning rate, lr will be assigned to both critic_lr and actor_lr
            critic_lr (float): learning rate of the critic model
            actor_lr (float): learning rate of the actor model
        """

        assert isinstance(agent_index, int)
        assert isinstance(act_space, list)
        assert isinstance(gamma, float)
        assert isinstance(tau, float)
        # compatible upgrade of lr
        if lr is None:
            assert isinstance(actor_lr, float)
            assert isinstance(critic_lr, float)
        else:
            assert isinstance(lr, float)
            assert actor_lr is None, 'no need to set `actor_lr` if `lr` is not None'
            assert critic_lr is None, 'no need to set `critic_lr` if `lr` is not None'
            critic_lr = lr
            actor_lr = lr
            warnings.warn(
                "the `lr` argument of `__init__` function in `parl.Algorithms.MADDPG` is deprecated \
                    since version 1.4 and will be removed in version 2.0. \
                    Recommend to use `actor_lr` and `critic_lr`. ",
                DeprecationWarning,
                stacklevel=2)
        self.agent_index = agent_index
        self.act_space = act_space
        self.gamma = gamma
        self.tau = tau
        self.lr = lr
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr

        self.model = model
        self.target_model = deepcopy(model)

    def predict(self, obs):
        """ input:  
                obs: observation, shape([B] + shape of obs_n[agent_index])
            output: 
                act: action, shape([B] + shape of act_n[agent_index])
        """
        this_policy = self.model.policy(obs)
        this_action = SoftPDistribution(
            logits=this_policy,
            act_space=self.act_space[self.agent_index]).sample()
        return this_action

    def predict_next(self, obs):
        """ input:  observation, shape([B] + shape of obs_n[agent_index])
            output: action, shape([B] + shape of act_n[agent_index])
        """
        next_policy = self.target_model.policy(obs)
        next_action = SoftPDistribution(
            logits=next_policy,
            act_space=self.act_space[self.agent_index]).sample()
        return next_action

    def Q(self, obs_n, act_n):
        """ input:  
                obs_n: all agents' observation, shape([B] + shape of obs_n)
                act_n: all agents' action, shape([B] + shape of act_n)
            output: 
                Q value from the critic of self.model, shape([B])
        """
        return self.model.value(obs_n, act_n)

    def Q_next(self, obs_n, act_n):
        """ input:  
                obs_n: all agents' observation, shape([B] + shape of obs_n)
                act_n: all agents' action, shape([B] + shape of act_n)
            output: 
                Q value from the critic of self.target_model, shape([B])
        """
        return self.target_model.value(obs_n, act_n)

    def learn(self, obs_n, act_n, target_q):
        """ update actor and critic model with MADDPG algorithm
        """
        actor_cost = self._actor_learn(obs_n, act_n)
        critic_cost = self._critic_learn(obs_n, act_n, target_q)
        return critic_cost

    def _actor_learn(self, obs_n, act_n):
        i = self.agent_index
        this_policy = self.model.policy(obs_n[i])
        sample_this_action = SoftPDistribution(
            logits=this_policy,
            act_space=self.act_space[self.agent_index]).sample()

        action_input_n = act_n + []
        action_input_n[i] = sample_this_action
        eval_q = self.Q(obs_n, action_input_n)
        act_cost = layers.reduce_mean(-1.0 * eval_q)

        act_reg = layers.reduce_mean(layers.square(this_policy))

        cost = act_cost + act_reg * 1e-3

        fluid.clip.set_gradient_clip(
            clip=fluid.clip.GradientClipByNorm(clip_norm=0.5),
            param_list=self.model.get_actor_params())

        optimizer = fluid.optimizer.AdamOptimizer(self.actor_lr)
        optimizer.minimize(cost, parameter_list=self.model.get_actor_params())
        return cost

    def _critic_learn(self, obs_n, act_n, target_q):
        pred_q = self.Q(obs_n, act_n)
        cost = layers.reduce_mean(layers.square_error_cost(pred_q, target_q))

        fluid.clip.set_gradient_clip(
            clip=fluid.clip.GradientClipByNorm(clip_norm=0.5),
            param_list=self.model.get_critic_params())

        optimizer = fluid.optimizer.AdamOptimizer(self.critic_lr)
        optimizer.minimize(cost, parameter_list=self.model.get_critic_params())
        return cost

    def sync_target(self, decay=None):
        if decay is None:
            decay = 1.0 - self.tau
        self.model.sync_weights_to(self.target_model, decay=decay)
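
sync_target() above passes decay = 1 - tau to sync_weights_to, while MAAgent.__init__ calls it once with decay=0 to copy the weights outright. The sketch below shows the arithmetic on plain lists, assuming the update rule target = decay * target + (1 - decay) * source. That rule is my reading of how the decay argument is used here, and the sync_weights helper is hypothetical, not PARL's implementation.

```python
def sync_weights(source, target, decay):
    """Return decay * target + (1 - decay) * source, element-wise."""
    return [decay * t + (1.0 - decay) * s for s, t in zip(source, target)]


tau = 0.01  # same value as the --tau default in train.py
model_weights = [1.0, -2.0]
target_weights = [0.0, 0.0]

# decay=0 discards the old target values: a full copy of the online weights.
hard = sync_weights(model_weights, target_weights, decay=0.0)

# decay = 1 - tau moves the target only a small step toward the online weights.
soft = sync_weights(model_weights, target_weights, decay=1.0 - tau)
print(hard, soft)
```

A small tau keeps the target networks changing slowly, which stabilizes the bootstrapped Q targets.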

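That covers the core code of the multi-agent deep reinforcement learning implementation. The next part explains how to get it running.

2. Running it locally

2.1 Install parl and gym and create the environment

First install parl; for the steps see: https://blog.csdn.net/sinat_39620217/article/details/114537725
If Anaconda is not installed yet, see: https://blog.csdn.net/sinat_39620217/article/details/115861876
Install gym: https://blog.csdn.net/sinat_39620217/article/details/115510483

If you do not know how to set up the environment and create your own gym game, read the articles above: the program only runs once the MADDPG environment files are in the right place. If anything in my setup below is unclear, go back and reread them.

2.2 Place the downloaded files

First confirm the downloaded files:

Check the environment files:

Check the main program files:

Then put the files in place

Put the environment files into the parl virtual environment created above (the one with PaddlePaddle installed). My path is: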
H:\Anaconda3-2020.02\envs\parl\Lib\site-packages\gym\envs

2.3 Modify the init files to declare the environment

First, modify the init file (__init__.py) under this path:

H:\Anaconda3-2020.02\envs\parl\Lib\site-packages\gym\envs

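This probably follows from how the official environment files are organized: the scenario is selected inside the program, so no single environment is registered by name, and adding an entry here or leaving it out makes no difference.

Next, modify the init file under this path: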

H:\Anaconda3-2020.02\envs\parl\Lib\site-packages\gym\envs\multiagent

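This completes declaring the environment path in gym.

2.4 Fix the import paths in the files

Because file locations and gym install paths differ from machine to machine, many imports may fail to resolve and have to be fixed one by one.

If an error occurs at run time, check carefully which line it is reported on, then apply the changes below.

In the train file: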

import os
import time
import argparse
import numpy as np
from simple_model import MAModel
from simple_agent import MAAgent
import parl
from gym.envs.multiagent.multiagent_simple_env import MAenv
from parl.utils import logger, summary



from gym import spaces
from gym.envs.multiagent.multi_discrete import MultiDiscrete

The gym.envs.multiagent. prefix is the modified part: the files now live under the gym path.

Pay attention to from gym.envs.multiagent.multiagent_simple_env import MAenv.

That file is originally located at:

H:\Anaconda3-2020.02\envs\parl\Lib\site-packages\parl\env

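It is PARL's own simple multi-agent environment wrapper.

Copy that file into the gym path set up above:

Then change the import paths as follows.

In the environment file: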

import gym
from gym import spaces
from gym.envs.registration import EnvSpec
import numpy as np
from gym.envs.multiagent.multi_discrete import MultiDiscrete

In the multi_discrete file:

import numpy as np

import gym
from gym.spaces import prng

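gym removed prng in versions after 0.11, so an earlier version has to be installed. If you hit this error, see https://blog.csdn.net/sinat_39620217/article/details/115818626 for the fix.

ModuleNotFoundError means that the named module cannot be found on the search path of the .py file. There are two cases: either the package was never installed, or it was installed in the wrong location.

Alternatively, add the path yourself with import sys followed by sys.path.append(...); the programs I uploaded to Gitee already do this.

The error looks like this: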

ModuleNotFoundError: No module named 'multiagent'

from parl.env.multiagent_simple_env import MAenv
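
As an alternative to editing every import, the folder that contains the multiagent package can be appended to the module search path before the failing import runs. A minimal sketch: ENV_ROOT is a placeholder path and add_search_path is a hypothetical helper, so substitute the directory where the environment files actually live.

```python
import sys
from pathlib import Path


def add_search_path(directory):
    """Append `directory` to sys.path once; return True if it was added."""
    path = str(Path(directory))
    if path in sys.path:
        return False
    sys.path.append(path)
    return True


# Placeholder: the folder that CONTAINS the `multiagent` package directory.
ENV_ROOT = Path("path/to/multiagent-particle-envs")
add_search_path(ENV_ROOT)

# After this, `import multiagent` is resolved by also looking in ENV_ROOT.
print(sys.path[-1])
```

The call has to run before the first import that needs the package, so it belongs at the very top of train.py.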

Next, fix the import of the rendering library used by the environment's render code:

from gym.envs.multiagent import rendering

2.5 Fix the imports in the scenarios folder

Change every scenario file as follows: simple, simple_adversary, simple_crypto, simple_push, simple_reference, simple_speaker_listener, simple_spread, simple_tag, simple_world_comm

import numpy as np
from gym.envs.multiagent.core import World, Agent, Landmark
from gym.envs.multiagent.scenario import BaseScenario

All modifications are now complete.
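
The import changes in sections 2.4 and 2.5 all follow one pattern: an import that starts at multiagent must start at gym.envs.multiagent instead. A sketch of doing this mechanically on a file's text follows. It assumes the untouched files import from a top-level multiagent package (consistent with the "No module named 'multiagent'" error above), and rewrite_imports is a hypothetical helper, so review its output before overwriting any file.

```python
import re

# Matches "from multiagent..." / "import multiagent..." at the start of a line.
_PATTERN = re.compile(
    r"^(\s*(?:from|import)\s+)multiagent(?=[.\s]|$)", re.MULTILINE)


def rewrite_imports(source, prefix="gym.envs."):
    """Prefix top-level `multiagent` imports; already-fixed lines are untouched."""
    return _PATTERN.sub(lambda m: m.group(1) + prefix + "multiagent", source)


original = (
    "import numpy as np\n"
    "from multiagent.core import World, Agent, Landmark\n"
    "from multiagent.scenario import BaseScenario\n"
)
rewritten = rewrite_imports(original)
print(rewritten)
```

Running the function a second time changes nothing, so it is safe to apply to files that were already edited by hand.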

3. Main parameters to tune

Change the default values to suit your needs.
Below are the official parameters with my own modifications.

    parser.add_argument(
        '--env',
        type=str,
        default='simple_world_comm',  # scenario to run
        help='scenario of MultiAgentEnv')
    parser.add_argument(
        '--max_step_per_episode',
        type=int,
        default=50,  # maximum number of steps in each episode
        help='maximum step per episode')
    parser.add_argument(
        '--max_episodes',
        type=int,
        default=50000,  # total number of episodes to train
        help='stop condition:number of episodes')
    parser.add_argument(
        '--stat_rate',
        type=int,
        default=1000,  # every 1000 episodes, save the model and print the reward
        help='statistical interval of save model or count reward')
    # Core training parameters
    parser.add_argument(
        '--critic_lr',
        type=float,
        default=1e-3,
        help='learning rate for the critic model')
    parser.add_argument(
        '--actor_lr',
        type=float,
        default=1e-3,  # change the default to adjust the learning rate
        help='learning rate of the actor model')
    parser.add_argument(
        '--gamma', type=float, default=0.95, help='discount factor')
    parser.add_argument(
        '--batch_size',
        type=int,
        default=1024,
        help='number of episodes to optimize at the same time')
    parser.add_argument('--tau', type=float, default=0.01, help='soft update')
    # auto save model, optional restore model
    parser.add_argument(
        '--show', action='store_true', default=True, help='display or not')    # True renders the environment
    parser.add_argument(
        '--restore',
        action='store_true',
        default=False,  
        help='restore or not, must have model_dir')
    parser.add_argument(
        '--model_dir',
        type=str,
        default='./model',
        help='directory for saving model')
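
Instead of editing the defaults in the file, the same parameters can be overridden on the command line. The following is a minimal standalone sketch with only three of the arguments; build_parser is a hypothetical helper name, and two things are assumptions that differ from the listing above: tau is declared as a float so that a value such as 0.05 can be parsed, and --show defaults to False so that the flag actually toggles rendering.

```python
import argparse


def build_parser():
    """Build a reduced parser with three of the training arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, default='simple_world_comm',
                        help='scenario of MultiAgentEnv')
    # The soft-update rate is fractional, so it is parsed as a float.
    parser.add_argument('--tau', type=float, default=0.01, help='soft update')
    # Assumption: default False, so passing --show is what enables rendering.
    parser.add_argument('--show', action='store_true', default=False,
                        help='display or not')
    return parser


# Equivalent to running: python train.py --env simple_tag --tau 0.05
args = build_parser().parse_args(['--env', 'simple_tag', '--tau', '0.05'])
print(args.env, args.tau, args.show)
```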

If running a chosen environment fails with an error saying the reshape format is wrong, change the model file handling at line 138 of the train file to the following (the cause is probably a missing .ckpt extension):

    if args.restore:
        # restore model
        for i in range(len(agents)):
            model_file = args.model_dir + '/agent_' + str(i) + '.ckpt'
            if not os.path.exists(model_file):
                logger.info('model file {} does not exist'.format(model_file))
                raise Exception
            agents[i].restore(model_file)

4. Run results

4.1 simple_speaker_listener results:

The output is as follows:

[04-23 14:09:53 MainThread @tensorboard.py:34] WRN [tensorboard] logdir is None, will save tensorboard files to train_log\train
View the data using: tensorboard --logdir=./train_log\train --host=10.22.151.209
[04-23 14:10:31 MainThread @train.py:166] Steps: 25000, Episodes: 1000, Mean episode reward: -146.71197663766637, Time: 38.256
[04-23 14:10:32 MainThread @machine_info.py:91] Cannot find available GPU devices, using CPU or other devices now.
[04-23 14:10:32 MainThread @machine_info.py:91] Cannot find available GPU devices, using CPU or other devices now.
[04-23 14:11:22 MainThread @train.py:166] Steps: 50000, Episodes: 2000, Mean episode reward: -177.59173856982906, Time: 50.769
[04-23 14:12:15 MainThread @train.py:166] Steps: 75000, Episodes: 3000, Mean episode reward: -65.93734078140551, Time: 53.699
[04-23 14:13:07 MainThread @train.py:166] Steps: 100000, Episodes: 4000, Mean episode reward: -60.95650945973305, Time: 51.837
[04-23 14:13:58 MainThread @train.py:166] Steps: 125000, Episodes: 5000, Mean episode reward: -60.4786219660665, Time: 50.83
[04-23 14:14:47 MainThread @train.py:166] Steps: 150000, Episodes: 6000, Mean episode reward: -61.97418693302028, Time: 48.797
[04-23 14:15:36 MainThread @train.py:166] Steps: 175000, Episodes: 7000, Mean episode reward: -61.27743577282738, Time: 49.405
[04-23 14:16:26 MainThread @train.py:166] Steps: 200000, Episodes: 8000, Mean episode reward: -55.795305675851054, Time: 49.48
[04-23 14:17:15 MainThread @train.py:166] Steps: 225000, Episodes: 9000, Mean episode reward: -52.170408578073314, Time: 49.602
[04-23 14:18:05 MainThread @train.py:166] Steps: 250000, Episodes: 10000, Mean episode reward: -45.48956962382595, Time: 49.977
[04-23 14:18:57 MainThread @train.py:166] Steps: 275000, Episodes: 11000, Mean episode reward: -37.54661975584198, Time: 51.9
[04-23 14:19:51 MainThread @train.py:166] Steps: 300000, Episodes: 12000, Mean episode reward: -35.94095515700111, Time: 53.781
[04-23 14:20:45 MainThread @train.py:166] Steps: 325000, Episodes: 13000, Mean episode reward: -33.22250130999288, Time: 53.623
[04-23 14:21:38 MainThread @train.py:166] Steps: 350000, Episodes: 14000, Mean episode reward: -33.88889589767084, Time: 53.842
[04-23 14:22:32 MainThread @train.py:166] Steps: 375000, Episodes: 15000, Mean episode reward: -32.222499746838956, Time: 53.521
[04-23 14:23:21 MainThread @train.py:166] Steps: 400000, Episodes: 16000, Mean episode reward: -32.56661045688181, Time: 49.577
[04-23 14:24:11 MainThread @train.py:166] Steps: 425000, Episodes: 17000, Mean episode reward: -33.26917140412647, Time: 49.626
[04-23 14:25:01 MainThread @train.py:166] Steps: 450000, Episodes: 18000, Mean episode reward: -35.43697273278178, Time: 49.528
[04-23 14:25:50 MainThread @train.py:166] Steps: 475000, Episodes: 19000, Mean episode reward: -32.72183170780931, Time: 49.623
[04-23 14:26:40 MainThread @train.py:166] Steps: 500000, Episodes: 20000, Mean episode reward: -29.851138059307747, Time: 49.549
[04-23 14:27:30 MainThread @train.py:166] Steps: 525000, Episodes: 21000, Mean episode reward: -30.199245070908457, Time: 49.909
[04-23 14:28:19 MainThread @train.py:166] Steps: 550000, Episodes: 22000, Mean episode reward: -30.753366241189703, Time: 49.638
[04-23 14:29:10 MainThread @train.py:166] Steps: 575000, Episodes: 23000, Mean episode reward: -29.245936484505624, Time: 50.944
[04-23 14:30:00 MainThread @train.py:166] Steps: 600000, Episodes: 24000, Mean episode reward: -29.90573991291673, Time: 49.776
[04-23 14:30:50 MainThread @train.py:166] Steps: 625000, Episodes: 25000, Mean episode reward: -28.012067336375498, Time: 49.603
[04-23 14:31:41 MainThread @train.py:166] Steps: 650000, Episodes: 26000, Mean episode reward: -27.606981177395067, Time: 51.432
[04-23 14:32:33 MainThread @train.py:166] Steps: 675000, Episodes: 27000, Mean episode reward: -28.298744008978385, Time: 51.444
[04-23 14:33:25 MainThread @train.py:166] Steps: 700000, Episodes: 28000, Mean episode reward: -28.153396104027372, Time: 52.03
[04-23 14:34:17 MainThread @train.py:166] Steps: 725000, Episodes: 29000, Mean episode reward: -29.419025882229768, Time: 52.388
[04-23 14:35:09 MainThread @train.py:166] Steps: 750000, Episodes: 30000, Mean episode reward: -29.029263843079026, Time: 52.416
[04-23 14:36:03 MainThread @train.py:166] Steps: 775000, Episodes: 31000, Mean episode reward: -29.873391889162605, Time: 53.696
[04-23 14:36:55 MainThread @train.py:166] Steps: 800000, Episodes: 32000, Mean episode reward: -29.46000530751644, Time: 51.57
[04-23 14:37:49 MainThread @train.py:166] Steps: 825000, Episodes: 33000, Mean episode reward: -30.474405124370563, Time: 54.476
[04-23 14:38:43 MainThread @train.py:166] Steps: 850000, Episodes: 34000, Mean episode reward: -29.484400820070196, Time: 53.409
[04-23 14:39:35 MainThread @train.py:166] Steps: 875000, Episodes: 35000, Mean episode reward: -28.966424317648737, Time: 52.674

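In the end the reward keeps fluctuating between roughly -29 and -28. I won't post the other scenarios one by one; run them yourself if you are interested. My parameter settings may not be very good, so the results are sometimes poor and need tuning.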
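
To check the fluctuation range from a saved log rather than by eye, the reward values can be pulled out with a regular expression. This is a rough sketch that assumes the log lines keep the "Episodes: N, Mean episode reward: X" layout shown above; parse_rewards and tail_range are hypothetical helper names, and the two sample lines are copied from the end of the log.

```python
import re

# Matches the episode count and the mean reward in one training log line.
LOG_PATTERN = re.compile(
    r'Episodes: (\d+), Mean episode reward: (-?\d+(?:\.\d+)?)')


def parse_rewards(lines):
    """Return (episode, mean_reward) pairs for the lines that match."""
    rewards = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            rewards.append((int(match.group(1)), float(match.group(2))))
    return rewards


def tail_range(rewards, last_n):
    """Return (min, max) of the mean reward over the last last_n entries."""
    tail = [reward for _, reward in rewards[-last_n:]]
    return min(tail), max(tail)


sample = [
    'Steps: 850000, Episodes: 34000, Mean episode reward: -29.484400820070196, Time: 53.409',
    'Steps: 875000, Episodes: 35000, Mean episode reward: -28.966424317648737, Time: 52.674',
]
print(tail_range(parse_rewards(sample), 2))
```

tail_range raises ValueError when no line matched, which is a quick signal that the log format differs from the one assumed here.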

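A suggestion: add evaluation runs to the train file to improve model accuracy, or increase the max step setting in the training parameters and train for more time steps, then see how the results change.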

4.2 Results from the official program

As can be seen, the simple_speaker_listener run has converged and is basically the same as Figure 5.

MADDPG_simple MADDPG_simple_adversary MADDPG_simple_push

MADDPG_simple_reference MADDPG_simple_speaker_listener MADDPG_simple_spread

MADDPG_simple_tag MADDPG_simple_world_comm
