Robert--cao

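Reinforcement Learning (2): Getting Comfortable with Generative Adversarial Networks (GAN) and Generative Adversarial Imitation Learning (GAIL)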

Basic structure of a GAN

A GAN consists mainly of a generator G (Generator) and a discriminator D (Discriminator).

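A GAN uses an adversarial process to train two neural networks that play against each other until they reach an ideal equilibrium; the police officer and the criminal in our example correspond to these two networks. One network is the generator G(z): it takes random noise as input and produces data that closely resembles the existing dataset, so what it learns is the data distribution. The other is the discriminator D(x): it receives real and generated data and tries to tell which samples are generated and which are real. At its core the discriminator is a binary classifier whose output is the probability that its input comes from the real dataset (as opposed to synthetic, or fake, data).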

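Formally, the objective of the whole process can be written as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$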

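The ideal equilibrium mentioned above means that the generator reproduces the real data distribution and the discriminator outputs a probability of 0.5, i.e. generated data is indistinguishable from real data. In other words, the discriminator cannot tell whether a new sample from the generator is real or fake; both are equally likely (which is where the entropy is maximal).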
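
As a quick numerical check of this equilibrium, the sketch below evaluates the per-sample value log D(x) + log(1 - D(G(z))) for a discriminator that outputs 0.5 everywhere and for a more confident one. The helper name `gan_value` is an assumption made for illustration; it does not appear in the original code.

```python
import math

def gan_value(d_real, d_fake):
    """Per-sample GAN value: log D(x) + log(1 - D(G(z)))."""
    return math.log(d_real) + math.log(1.0 - d_fake)

equilibrium = gan_value(0.5, 0.5)  # D cannot tell real from fake
confident = gan_value(0.9, 0.1)    # D separates real and fake well

print(round(equilibrium, 3))  # -1.386
print(round(confident, 3))    # -0.211
```

At equilibrium the value is 2 log 0.5, about -1.386, which is the "-1.38" shown in the plot annotation of the training script.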

Here we use a GAN to generate a sine signal. The code is given below:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt




# torch.manual_seed(1)       # reproducible
# np.random.seed(1)

# Hyper Parameters
BATCH_SIZE = 64
LR_G = 0.0001  # learning rate for generator
LR_D = 0.0001  # learning rate for discriminator
N_IDEAS = 8  # size of the random noise vector fed to the Generator (its "ideas" for an art work)
ART_COMPONENTS = 15  # number of points G draws on the canvas
PAINT_POINTS = np.vstack([np.linspace(-1, 1, ART_COMPONENTS) for _ in range(BATCH_SIZE)])



def artist_works():  # painting from the famous artist (real target)
    # a = np.random.uniform(1, 2, size=BATCH_SIZE)[:, np.newaxis]
    r = 0.02 * np.random.randn(1, ART_COMPONENTS)
    paintings = np.sin(PAINT_POINTS * np.pi) + r
    paintings = torch.from_numpy(paintings).float()
    return paintings


# G = nn.Sequential(  # Generator
#     nn.Linear(N_IDEAS, 128),  # random ideas (could from normal distribution)
#     nn.ReLU(),
#     nn.Linear(128, ART_COMPONENTS),  # making a painting from these random ideas
# )
#
# D = nn.Sequential(  # Discriminator
#     nn.Linear(ART_COMPONENTS, 128),  # receive art work either from the famous artist or a newbie like G
#     nn.ReLU(),
#     nn.Linear(128, 1),
#     nn.Sigmoid(),  # tell the probability that the art work is made by artist
# )

class Ge(nn.Module):
    def __init__(self):
        super(Ge,self).__init__()
        self.fc1=nn.Linear(N_IDEAS,128)
        self.fc2=nn.Linear(128,ART_COMPONENTS)

    def forward(self, x):
        x=F.relu(self.fc1(x))
        x=self.fc2(x)
        return x


class De(nn.Module):
    def __init__(self):
        super(De,self).__init__()
        self.fc1=nn.Linear(ART_COMPONENTS,128)
        self.fc2=nn.Linear(128,1)

    def forward(self,x):
        x=F.relu(self.fc1(x))
        x=torch.sigmoid(self.fc2(x))  # F.sigmoid is deprecated; torch.sigmoid is equivalent
        return x


G=Ge()
D=De()


opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)
opt_G = torch.optim.Adam(G.parameters(), lr=LR_G)

plt.ion()  # interactive mode, so the figure can be redrawn during training

D_loss_history = []
G_loss_history = []
for step in range(10000):
    artist_paintings = artist_works()  # real paintings from the artist
    G_ideas = torch.randn(BATCH_SIZE, N_IDEAS)  # random ideas
    G_paintings = G(G_ideas)  # fake paintings from G (random ideas)

    # Update D first; detach the fakes so this step does not backpropagate into G
    prob_artist0 = D(artist_paintings)  # D tries to increase this prob
    prob_artist1 = D(G_paintings.detach())  # D tries to reduce this prob
    D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))

    opt_D.zero_grad()
    D_loss.backward()
    opt_D.step()

    # Update G against the refreshed D; re-running D avoids backpropagating
    # through weights that opt_D.step() has just modified in place
    prob_artist1 = D(G_paintings)
    G_loss = torch.mean(torch.log(1. - prob_artist1))

    opt_G.zero_grad()
    G_loss.backward()
    opt_G.step()

    D_loss_history.append(D_loss.item())  # store plain floats, not tensors holding the graph
    G_loss_history.append(G_loss.item())


    if step % 1000 == 0:  # plotting
        plt.cla()
        plt.plot(PAINT_POINTS[0], G_paintings.data.numpy()[0], c='r', lw=3, label='Generated painting', )
        plt.plot(PAINT_POINTS[0], np.sin(PAINT_POINTS[0] * np.pi), c='b', lw=3, label='upper bound')
        plt.text(-1, 0.75, 'D accuracy=%.2f (0.5 for D to converge)' % prob_artist0.data.numpy().mean(),
                 fontdict={'size': 13})
        plt.text(-1, 0.5, 'D score= %.2f (-1.38 for G to converge)' % -D_loss.data.numpy(), fontdict={'size': 13})
        plt.ylim((-1, 1));
        plt.legend(loc='upper right', fontsize=10);
        plt.draw();
        plt.pause(0.01)

# plt.ioff()
# plt.show()

In the code above, the artist_works() function produces the target sine signal:

def artist_works():  # painting from the famous artist (real target)
    # a = np.random.uniform(1, 2, size=BATCH_SIZE)[:, np.newaxis]
    r = 0.02 * np.random.randn(1, ART_COMPONENTS)
    paintings = np.sin(PAINT_POINTS * np.pi) + r
    paintings = torch.from_numpy(paintings).float()
    return paintings
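
One detail of artist_works() is easy to miss: the noise r has shape (1, ART_COMPONENTS), so NumPy broadcasting adds the same noise vector to every row, and each call returns a batch of BATCH_SIZE identical noisy sine curves. The dependency-free sketch below mimics that behaviour with plain lists; the function name `artist_works_plain` and the use of `random.Random` are assumptions made for illustration.

```python
import math
import random

BATCH_SIZE = 64
ART_COMPONENTS = 15

def artist_works_plain(rng):
    """List-based imitation of artist_works(): one noise vector shared by all rows."""
    # same grid as np.linspace(-1, 1, ART_COMPONENTS)
    xs = [-1 + 2 * i / (ART_COMPONENTS - 1) for i in range(ART_COMPONENTS)]
    # a single noise vector, like r with shape (1, ART_COMPONENTS)
    noise = [0.02 * rng.gauss(0, 1) for _ in xs]
    row = [math.sin(x * math.pi) + n for x, n in zip(xs, noise)]
    # broadcasting repeats the same noisy row for the whole batch
    return [list(row) for _ in range(BATCH_SIZE)]

batch = artist_works_plain(random.Random(0))
print(len(batch), len(batch[0]))                  # 64 15
print(all(r == batch[0] for r in batch))          # True
```

If per-sample variation were wanted, the noise would need shape (BATCH_SIZE, ART_COMPONENTS) instead; the original code does not do this.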

The following code builds the generator and discriminator networks, implemented in PyTorch.

class Ge(nn.Module):
    def __init__(self):
        super(Ge,self).__init__()
        self.fc1=nn.Linear(N_IDEAS,128)
        self.fc2=nn.Linear(128,ART_COMPONENTS)

    def forward(self, x):
        x=F.relu(self.fc1(x))
        x=self.fc2(x)
        return x


class De(nn.Module):
    def __init__(self):
        super(De,self).__init__()
        self.fc1=nn.Linear(ART_COMPONENTS,128)
        self.fc2=nn.Linear(128,1)

    def forward(self,x):
        x=F.relu(self.fc1(x))
        x=torch.sigmoid(self.fc2(x))  # F.sigmoid is deprecated; torch.sigmoid is equivalent
        return x

The following code defines the loss functions of the generator and the discriminator:

D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))
G_loss = torch.mean(torch.log(1. - prob_artist1))

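Results: the first figure shows the curve generated from random input at the start of training, and the second shows the curve once the discriminator's output probability is about 0.5. The result is quite good:

With this GAN experience in hand, we now turn to generative adversarial imitation learning (GAIL):

The project consists of two files: env_OppositeV4.py, which builds the environment, and GAIL_OppositeV4.py, which runs the program.

We start with env_OppositeV4.py, which builds the environment. First, a picture of the resulting environment:

In the picture the red cell is the starting point and the green cell is the goal. The code of env_OppositeV4.py is given below: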

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import random
import cv2

class EnvOppositeV4(object):
    def __init__(self, size):
        self.map_size = size
        self.raw_occupancy = np.zeros((self.map_size, self.map_size))
        for i in range(self.map_size):
            self.raw_occupancy[0][i] = 1
            self.raw_occupancy[self.map_size - 1][i] = 1
            self.raw_occupancy[i][0] = 1
            self.raw_occupancy[i][self.map_size - 1] = 1
            self.raw_occupancy[i][int((self.map_size - 1) / 2)] = 1
        self.raw_occupancy[1][int((self.map_size - 1) / 2)] = 0
        self.raw_occupancy[self.map_size - 2][int((self.map_size - 1) / 2)] = 0

        self.occupancy = self.raw_occupancy.copy()

        self.agt1_pos = [int((self.map_size - 1) / 2), 1]
        self.goal1_pos = [int((self.map_size - 1) / 2), self.map_size - 2]
        self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1

    def reset(self):
        self.occupancy = self.raw_occupancy.copy()

        self.agt1_pos = [int((self.map_size - 1) / 2), 1]
        self.goal1_pos = [int((self.map_size - 1) / 2), self.map_size - 2]
        self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1

    def get_state(self):
        state = np.zeros((1, 2))
        state[0, 0] = self.agt1_pos[0] / self.map_size
        state[0, 1] = self.agt1_pos[1] / self.map_size
        return state

    def step(self, action_list):
        reward = 0
        # agent1 move
        if action_list[0] == 0:  # move up
            if self.occupancy[self.agt1_pos[0] - 1][self.agt1_pos[1]] != 1:  # if can move
                self.agt1_pos[0] = self.agt1_pos[0] - 1
                self.occupancy[self.agt1_pos[0] + 1][self.agt1_pos[1]] = 0
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
        elif action_list[0] == 1:  # move down
            if self.occupancy[self.agt1_pos[0] + 1][self.agt1_pos[1]] != 1:  # if can move
                self.agt1_pos[0] = self.agt1_pos[0] + 1
                self.occupancy[self.agt1_pos[0] - 1][self.agt1_pos[1]] = 0
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
        elif action_list[0] == 2:  # move left
            if self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] - 1] != 1:  # if can move
                self.agt1_pos[1] = self.agt1_pos[1] - 1
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] + 1] = 0
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
        elif action_list[0] == 3:  # move right
            if self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] + 1] != 1:  # if can move
                self.agt1_pos[1] = self.agt1_pos[1] + 1
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] - 1] = 0
                self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1

        if self.agt1_pos == self.goal1_pos:
            reward = reward + 5

        done = False
        if reward == 5:
            done = True
        return reward, done

    def get_global_obs(self):
        obs = np.zeros((self.map_size, self.map_size, 3))
        for i in range(self.map_size):
            for j in range(self.map_size):
                if self.occupancy[i][j] == 0:
                    obs[i, j, 0] = 1.0
                    obs[i, j, 1] = 1.0
                    obs[i, j, 2] = 1.0
        obs[self.agt1_pos[0], self.agt1_pos[1], 0] = 1.0
        obs[self.agt1_pos[0], self.agt1_pos[1], 1] = 0.0
        obs[self.agt1_pos[0], self.agt1_pos[1], 2] = 0.0
        return obs

    def render(self):
        obs = self.get_global_obs()
        enlarge = 30
        new_obs = np.ones((self.map_size*enlarge, self.map_size*enlarge, 3))
        for i in range(self.map_size):
            for j in range(self.map_size):

                if obs[i][j][0] == 0.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 0.0:
                    cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 0, 0), -1)
                if obs[i][j][0] == 1.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 0.0:
                    cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 0, 255), -1)
                if obs[i][j][0] == 0.0 and obs[i][j][1] == 1.0 and obs[i][j][2] == 0.0:
                    cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 255, 0), -1)
                if obs[i][j][0] == 0.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 1.0:
                    cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (255, 0, 0), -1)
        cv2.imshow('image', new_obs)
        cv2.waitKey(100)

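In the code above, the following part produces the picture below. It builds the rectangular frame of the environment: cells with value 1 are later drawn in black and cells with value 0 in white, which gives the picture above. It also sets the agent's start position and goal position.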

def __init__(self, size):
    self.map_size = size
    self.raw_occupancy = np.zeros((self.map_size, self.map_size))
    for i in range(self.map_size):
        self.raw_occupancy[0][i] = 1
        self.raw_occupancy[self.map_size - 1][i] = 1
        self.raw_occupancy[i][0] = 1
        self.raw_occupancy[i][self.map_size - 1] = 1
        self.raw_occupancy[i][int((self.map_size - 1) / 2)] = 1
    self.raw_occupancy[1][int((self.map_size - 1) / 2)] = 0
    self.raw_occupancy[self.map_size - 2][int((self.map_size - 1) / 2)] = 0

    self.occupancy = self.raw_occupancy.copy()

    self.agt1_pos = [int((self.map_size - 1) / 2), 1]
    self.goal1_pos = [int((self.map_size - 1) / 2), self.map_size - 2]
    self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
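
To see the layout this constructor produces, the sketch below rebuilds raw_occupancy with plain lists and prints it as text. The function name `build_raw_occupancy` is an assumption made for illustration; the wall logic follows the constructor line by line.

```python
def build_raw_occupancy(size):
    """Return the wall layout as a list of lists (1 = wall, 0 = free)."""
    mid = (size - 1) // 2
    grid = [[0] * size for _ in range(size)]
    for i in range(size):
        grid[0][i] = grid[size - 1][i] = 1   # top and bottom walls
        grid[i][0] = grid[i][size - 1] = 1   # left and right walls
        grid[i][mid] = 1                     # dividing wall in the middle column
    grid[1][mid] = 0                         # upper gap in the dividing wall
    grid[size - 2][mid] = 0                  # lower gap in the dividing wall
    return grid

grid = build_raw_occupancy(9)
for row in grid:
    print(''.join('#' if cell else '.' for cell in row))
```

For size 9 the start cell (4, 1) and the goal cell (4, 7) lie on opposite sides of the dividing wall, so the agent has to pass through one of the two gaps at rows 1 and 7.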

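The following code colours the cells with value 1 black and the cells with value 0 white, then marks the agent's cell in red; the result is shown in the figure below.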
def get_global_obs(self):
    obs = np.zeros((self.map_size, self.map_size, 3))
    for i in range(self.map_size):
        for j in range(self.map_size):
            if self.occupancy[i][j] == 0:
                obs[i, j, 0] = 1.0
                obs[i, j, 1] = 1.0
                obs[i, j, 2] = 1.0
    obs[self.agt1_pos[0], self.agt1_pos[1], 0] = 1.0
    obs[self.agt1_pos[0], self.agt1_pos[1], 1] = 0.0
    obs[self.agt1_pos[0], self.agt1_pos[1], 2] = 0.0
    return obs

The following code enlarges the grid for display.

def render(self):
    obs = self.get_global_obs()
    enlarge = 30
    new_obs = np.ones((self.map_size*enlarge, self.map_size*enlarge, 3))
    for i in range(self.map_size):
        for j in range(self.map_size):

            if obs[i][j][0] == 0.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 0.0:
                cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 0, 0), -1)
            if obs[i][j][0] == 1.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 0.0:
                cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 0, 255), -1)
            if obs[i][j][0] == 0.0 and obs[i][j][1] == 1.0 and obs[i][j][2] == 0.0:
                cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (0, 255, 0), -1)
            if obs[i][j][0] == 0.0 and obs[i][j][1] == 0.0 and obs[i][j][2] == 1.0:
                cv2.rectangle(new_obs, (j * enlarge, i * enlarge), (j * enlarge + enlarge, i * enlarge + enlarge), (255, 0, 0), -1)
    cv2.imshow('image',new_obs)
    cv2.waitKey(100)

The following code describes the agent's actions and the reward.

def step(self, action_list):
    reward = 0
    # agent1 move
    if action_list[0] == 0:  # move up
        if self.occupancy[self.agt1_pos[0] - 1][self.agt1_pos[1]] != 1:  # if can move
            self.agt1_pos[0] = self.agt1_pos[0] - 1
            self.occupancy[self.agt1_pos[0] + 1][self.agt1_pos[1]] = 0
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
    elif action_list[0] == 1:  # move down
        if self.occupancy[self.agt1_pos[0] + 1][self.agt1_pos[1]] != 1:  # if can move
            self.agt1_pos[0] = self.agt1_pos[0] + 1
            self.occupancy[self.agt1_pos[0] - 1][self.agt1_pos[1]] = 0
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
    elif action_list[0] == 2:  # move left
        if self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] - 1] != 1:  # if can move
            self.agt1_pos[1] = self.agt1_pos[1] - 1
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] + 1] = 0
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1
    elif action_list[0] == 3:  # move right
        if self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] + 1] != 1:  # if can move
            self.agt1_pos[1] = self.agt1_pos[1] + 1
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1] - 1] = 0
            self.occupancy[self.agt1_pos[0]][self.agt1_pos[1]] = 1

    if self.agt1_pos == self.goal1_pos:
        reward = reward + 5

    done = False
    if reward == 5:
        done = True
    return reward, done
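
The four branches of step() share one pattern: look at the neighbouring cell, and move only if it is not occupied. The sketch below expresses the same rule with a lookup table. The names `MOVES` and `apply_action` are assumptions made for illustration. Note that step() has no branch for action 4, which the training script can sample because it uses N_action=5, so that action leaves the agent where it is.

```python
# action -> (row offset, column offset): up, down, left, right
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

def apply_action(occupancy, pos, action):
    """Return the new [row, col]; blocked moves and unknown actions keep pos."""
    if action not in MOVES:            # e.g. action 4: the agent stays in place
        return pos
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if occupancy[r][c] == 1:           # wall or occupied cell
        return pos
    occupancy[pos[0]][pos[1]] = 0      # free the old cell
    occupancy[r][c] = 1                # occupy the new cell
    return [r, c]

# 3x4 toy map: walls on the border, agent starts at (1, 1)
occ = [[1, 1, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 1]]
pos_right = apply_action(occ, [1, 1], 3)      # free cell: the agent moves
pos_up = apply_action(occ, pos_right, 0)      # wall above: the agent stays
pos_stay = apply_action(occ, pos_up, 4)       # no-op action
print(pos_right, pos_up, pos_stay)            # [1, 2] [1, 2] [1, 2]
```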

This completes the description of the agent's environment.

The code of GAIL_OppositeV4.py is given below:

from torch.distributions.categorical import Categorical
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
from env_OppositeV4 import EnvOppositeV4
import numpy as np
import csv
from collections import deque
import os



class Actor(nn.Module):
    def __init__(self, N_action):
        super(Actor, self).__init__()
        self.N_action = N_action
        self.fc1 = nn.Linear(2, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, self.N_action)

    def get_action(self, h):
        h = F.relu(self.fc1(h))
        h = F.relu(self.fc2(h))
        h = F.softmax(self.fc3(h), dim=1)
        m = Categorical(h.squeeze(0))
        a = m.sample()
        log_prob = m.log_prob(a)
        return a.item(), h, log_prob

class Discriminator(nn.Module):
    def __init__(self, s_dim, N_action):
        super(Discriminator, self).__init__()
        self.s_dim = s_dim
        self.N_action = N_action
        self.fc1 = nn.Linear(self.s_dim + self.N_action, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 1)

    def forward(self, state, action):
        state_action = torch.cat([state, action], 1)
        x = torch.relu(self.fc1(state_action))
        x = torch.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

class GAIL(object):
    def __init__(self, s_dim, N_action):
        self.s_dim = s_dim
        self.N_action = N_action
        self.actor1 = Actor(self.N_action)
        self.disc1 = Discriminator(self.s_dim, self.N_action)
        self.d1_optimizer = torch.optim.Adam(self.disc1.parameters(), lr=1e-3)
        self.a1_optimizer = torch.optim.Adam(self.actor1.parameters(), lr=1e-3)
        self.loss_fn = torch.nn.MSELoss()
        self.adv_loss_fn = torch.nn.BCELoss()
        self.gamma = 0.9

    def get_action(self, obs1):
        action1, pi_a1, log_prob1 = self.actor1.get_action(torch.from_numpy(obs1).float())
        return action1, pi_a1, log_prob1

    def int_to_tensor(self, action):
        temp = torch.zeros(1, self.N_action)
        temp[0, action] = 1
        return temp

    def train_D(self, s1_list, a1_list, e_s1_list, e_a1_list):
        p_s1 = torch.from_numpy(s1_list[0]).float()
        p_a1 = self.int_to_tensor(a1_list[0])
        for i in range(1, len(s1_list)):
            temp_p_s1 = torch.from_numpy(s1_list[i]).float()
            p_s1 = torch.cat([p_s1, temp_p_s1], dim=0)
            temp_p_a1 = self.int_to_tensor(a1_list[i])
            p_a1 = torch.cat([p_a1, temp_p_a1], dim=0)

        e_s1 = torch.from_numpy(e_s1_list[0]).float()
        e_a1 = self.int_to_tensor(e_a1_list[0])
        for i in range(1, len(e_s1_list)):
            temp_e_s1 = torch.from_numpy(e_s1_list[i]).float()
            e_s1 = torch.cat([e_s1, temp_e_s1], dim=0)
            temp_e_a1 = self.int_to_tensor(e_a1_list[i])
            e_a1 = torch.cat([e_a1, temp_e_a1], dim=0)

        p1_label = torch.zeros(len(s1_list), 1)
        e1_label = torch.ones(len(e_s1_list), 1)

        e1_pred = self.disc1(e_s1, e_a1)
        # print('e1_pred', e1_pred)
        loss = self.adv_loss_fn(e1_pred, e1_label)
        p1_pred = self.disc1(p_s1, p_a1)
        # print('p1_pred', p1_pred)
        loss = loss + self.adv_loss_fn(p1_pred, p1_label)
        self.d1_optimizer.zero_grad()
        loss.backward()
        self.d1_optimizer.step()

    def train_G(self, s1_list, a1_list, log_pi_a1_list, r1_list, e_s1_list, e_a1_list):
        T = len(s1_list)
        p_s1 = torch.from_numpy(s1_list[0]).float()
        p_a1 = self.int_to_tensor(a1_list[0])
        for i in range(1, len(s1_list)):
            temp_p_s1 = torch.from_numpy(s1_list[i]).float()
            p_s1 = torch.cat([p_s1, temp_p_s1], dim=0)
            temp_p_a1 = self.int_to_tensor(a1_list[i])
            p_a1 = torch.cat([p_a1, temp_p_a1], dim=0)

        e_s1 = torch.from_numpy(e_s1_list[0]).float()
        e_a1 = self.int_to_tensor(e_a1_list[0])
        for i in range(1, len(e_s1_list)):
            temp_e_s1 = torch.from_numpy(e_s1_list[i]).float()
            e_s1 = torch.cat([e_s1, temp_e_s1], dim=0)
            temp_e_a1 = self.int_to_tensor(e_a1_list[i])
            e_a1 = torch.cat([e_a1, temp_e_a1], dim=0)

        p1_pred = self.disc1(p_s1, p_a1)
        fake_reward = p1_pred.mean()

        a1_loss = torch.FloatTensor([0.0])
        for t in range(T):
            a1_loss = a1_loss + fake_reward * log_pi_a1_list[t]
        a1_loss = -a1_loss / T

        # print(a1_loss)
        self.a1_optimizer.zero_grad()
        a1_loss.backward()
        self.a1_optimizer.step()

class REINFORCE(object):
    def __init__(self, N_action):
        self.N_action = N_action
        self.actor1 = Actor(self.N_action)

    def get_action(self, obs):
        action1, pi_a1, log_prob1 = self.actor1.get_action(torch.from_numpy(obs).float())
        return action1, pi_a1, log_prob1

    def train(self, a1_list, pi_a1_list, r_list):
        a1_optimizer = torch.optim.Adam(self.actor1.parameters(), lr=1e-3)
        T = len(r_list)
        G_list = torch.zeros(1, T)
        G_list[0, T - 1] = torch.FloatTensor([r_list[T - 1]])
        for k in range(T - 2, -1, -1):
            G_list[0, k] = r_list[k] + 0.95 * G_list[0, k + 1]

        a1_loss = torch.FloatTensor([0.0])
        for t in range(T):
            a1_loss = a1_loss + G_list[0, t] * torch.log(pi_a1_list[t][0, a1_list[t]])
        a1_loss = -a1_loss / T
        a1_optimizer.zero_grad()
        a1_loss.backward()
        a1_optimizer.step()

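    def save_model(self):
        # save only the parameters (state_dict) rather than pickling the whole module
        torch.save(self.actor1.state_dict(), 'V4_actor.pkl')

    def load_model(self):
        self.actor1.load_state_dict(torch.load('V4_actor.pkl'))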

if __name__ == '__main__':
    torch.set_num_threads(1)
    env = EnvOppositeV4(9)
    max_epi_iter = 100
    max_MC_iter = 100

    # train expert policy by REINFORCE algorithm
    agent = REINFORCE(N_action=5)
    if os.path.exists('./V4_actor.pkl'):
        agent.load_model()
    else:
        print('No saved model found; training from scratch.')

    for epi_iter in range(max_epi_iter):
        env.reset()
        a1_list = []
        pi_a1_list = []
        r_list = []
        acc_r = 0
        for MC_iter in range(max_MC_iter):
            env.render()
            state = env.get_state()
            action1, pi_a1, log_prob1 = agent.get_action(state)
            a1_list.append(action1)
            pi_a1_list.append(pi_a1)
            reward, done = env.step([action1, 0])
            acc_r = acc_r + reward
            r_list.append(reward)
            if done:
                break
        print('Train expert, Episode', epi_iter, 'average reward', acc_r / MC_iter)
        if done:
            agent.train(a1_list, pi_a1_list, r_list)

    # record expert policy
    agent.save_model()
    exp_s_list = []
    exp_a_list = []
    env.reset()
    for MC_iter in range(max_MC_iter):
        env.render()
        state = env.get_state()
        action1, pi_a1, log_prob1 = agent.get_action(state)
        exp_s_list.append(state)
        exp_a_list.append(action1)
        reward, done = env.step([action1, 0])
        print('step', MC_iter, 'agent 1 at', exp_s_list[MC_iter], 'agent 1 action', exp_a_list[MC_iter], 'reward', reward, 'done', done)
        if done:
            break

    # generative adversarial imitation learning from [exp_s_list, exp_a_list]
    agent = GAIL(s_dim=2, N_action=5)
    for epi_iter in range(max_epi_iter):
        env.reset()
        s1_list = []
        a1_list = []
        r1_list = []
        log_pi_a1_list = []
        acc_r = 0
        for MC_iter in range(max_MC_iter):
            # env.render()
            state = env.get_state()
            action1, pi_a1, log_prob1 = agent.get_action(state)
            s1_list.append(state)
            a1_list.append(action1)
            log_pi_a1_list.append(log_prob1)
            reward, done = env.step([action1, 0])
            acc_r = acc_r + reward
            r1_list.append(reward)
            if done:
                break
        print('Imitate by GAIL, Episode', epi_iter, 'average reward', acc_r/MC_iter)
        # train Discriminator
        agent.train_D(s1_list, a1_list, exp_s_list, exp_a_list)

        # train Generator
        agent.train_G(s1_list, a1_list, log_pi_a1_list, r1_list, exp_s_list, exp_a_list)

    # learnt policy
    print('expert trajectory')
    for i in range(len(exp_a_list)):
        print('step', i, 'agent 1 at', exp_s_list[i], 'agent 1 action', exp_a_list[i])

    print('learnt trajectory')
    env.reset()
    for MC_iter in range(max_MC_iter):
        # env.render()
        state = env.get_state()
        action1, pi_a1, log_prob1 = agent.get_action(state)
        reward, done = env.step([action1, 0])
        # print the learnt policy's own state and action, not the stored expert lists
        print('step', MC_iter, 'agent 1 at', state, 'agent 1 action', action1)
        if done:
            break

The output of a run is:

expert trajectory
step 0 agent 1 at [[0.44444444 0.11111111]] agent 1 action 1
step 1 agent 1 at [[0.55555556 0.11111111]] agent 1 action 4
step 2 agent 1 at [[0.55555556 0.11111111]] agent 1 action 3
step 3 agent 1 at [[0.55555556 0.22222222]] agent 1 action 1
step 4 agent 1 at [[0.66666667 0.22222222]] agent 1 action 0
step 5 agent 1 at [[0.55555556 0.22222222]] agent 1 action 0
step 6 agent 1 at [[0.44444444 0.22222222]] agent 1 action 3
step 7 agent 1 at [[0.44444444 0.33333333]] agent 1 action 4
step 8 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 9 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 10 agent 1 at [[0.33333333 0.33333333]] agent 1 action 4
step 11 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 12 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 13 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 14 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 15 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 16 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 17 agent 1 at [[0.55555556 0.33333333]] agent 1 action 2
step 18 agent 1 at [[0.55555556 0.22222222]] agent 1 action 3
step 19 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 20 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 21 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 22 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 23 agent 1 at [[0.55555556 0.33333333]] agent 1 action 0
step 24 agent 1 at [[0.44444444 0.33333333]] agent 1 action 4
step 25 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 26 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 27 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 28 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 29 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 30 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 31 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 32 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 33 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 34 agent 1 at [[0.22222222 0.33333333]] agent 1 action 2
step 35 agent 1 at [[0.22222222 0.22222222]] agent 1 action 3
step 36 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 37 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 38 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 39 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 40 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 41 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 42 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 43 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 44 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 45 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 46 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 47 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 48 agent 1 at [[0.55555556 0.33333333]] agent 1 action 1
step 49 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 50 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 51 agent 1 at [[0.66666667 0.33333333]] agent 1 action 0
step 52 agent 1 at [[0.55555556 0.33333333]] agent 1 action 0
step 53 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 54 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 55 agent 1 at [[0.55555556 0.33333333]] agent 1 action 1
step 56 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 57 agent 1 at [[0.66666667 0.33333333]] agent 1 action 4
step 58 agent 1 at [[0.66666667 0.33333333]] agent 1 action 1
step 59 agent 1 at [[0.77777778 0.33333333]] agent 1 action 1
step 60 agent 1 at [[0.77777778 0.33333333]] agent 1 action 4
step 61 agent 1 at [[0.77777778 0.33333333]] agent 1 action 0
step 62 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 63 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 64 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 65 agent 1 at [[0.66666667 0.33333333]] agent 1 action 1
step 66 agent 1 at [[0.77777778 0.33333333]] agent 1 action 0
step 67 agent 1 at [[0.66666667 0.33333333]] agent 1 action 1
step 68 agent 1 at [[0.77777778 0.33333333]] agent 1 action 3
step 69 agent 1 at [[0.77777778 0.44444444]] agent 1 action 3
step 70 agent 1 at [[0.77777778 0.55555556]] agent 1 action 0
step 71 agent 1 at [[0.66666667 0.55555556]] agent 1 action 0
step 72 agent 1 at [[0.55555556 0.55555556]] agent 1 action 0
step 73 agent 1 at [[0.44444444 0.55555556]] agent 1 action 0
step 74 agent 1 at [[0.33333333 0.55555556]] agent 1 action 1
step 75 agent 1 at [[0.44444444 0.55555556]] agent 1 action 4
step 76 agent 1 at [[0.44444444 0.55555556]] agent 1 action 0
step 77 agent 1 at [[0.33333333 0.55555556]] agent 1 action 1
step 78 agent 1 at [[0.44444444 0.55555556]] agent 1 action 3
step 79 agent 1 at [[0.44444444 0.66666667]] agent 1 action 0
step 80 agent 1 at [[0.33333333 0.66666667]] agent 1 action 3
step 81 agent 1 at [[0.33333333 0.77777778]] agent 1 action 1
learnt trajectory
step 0 agent 1 at [[0.44444444 0.11111111]] agent 1 action 1
step 1 agent 1 at [[0.55555556 0.11111111]] agent 1 action 4
step 2 agent 1 at [[0.55555556 0.11111111]] agent 1 action 3
step 3 agent 1 at [[0.55555556 0.22222222]] agent 1 action 1
step 4 agent 1 at [[0.66666667 0.22222222]] agent 1 action 0
step 5 agent 1 at [[0.55555556 0.22222222]] agent 1 action 0
step 6 agent 1 at [[0.44444444 0.22222222]] agent 1 action 3
step 7 agent 1 at [[0.44444444 0.33333333]] agent 1 action 4
step 8 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 9 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 10 agent 1 at [[0.33333333 0.33333333]] agent 1 action 4
step 11 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 12 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 13 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 14 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 15 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 16 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 17 agent 1 at [[0.55555556 0.33333333]] agent 1 action 2
step 18 agent 1 at [[0.55555556 0.22222222]] agent 1 action 3
step 19 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 20 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 21 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 22 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 23 agent 1 at [[0.55555556 0.33333333]] agent 1 action 0
step 24 agent 1 at [[0.44444444 0.33333333]] agent 1 action 4
step 25 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 26 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 27 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 28 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 29 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 30 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 31 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 32 agent 1 at [[0.44444444 0.33333333]] agent 1 action 0
step 33 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 34 agent 1 at [[0.22222222 0.33333333]] agent 1 action 2
step 35 agent 1 at [[0.22222222 0.22222222]] agent 1 action 3
step 36 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 37 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 38 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 39 agent 1 at [[0.33333333 0.33333333]] agent 1 action 0
step 40 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 41 agent 1 at [[0.22222222 0.33333333]] agent 1 action 3
step 42 agent 1 at [[0.22222222 0.33333333]] agent 1 action 1
step 43 agent 1 at [[0.33333333 0.33333333]] agent 1 action 3
step 44 agent 1 at [[0.33333333 0.33333333]] agent 1 action 1
step 45 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 46 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 47 agent 1 at [[0.55555556 0.33333333]] agent 1 action 3
step 48 agent 1 at [[0.55555556 0.33333333]] agent 1 action 1
step 49 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 50 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 51 agent 1 at [[0.66666667 0.33333333]] agent 1 action 0
step 52 agent 1 at [[0.55555556 0.33333333]] agent 1 action 0
step 53 agent 1 at [[0.44444444 0.33333333]] agent 1 action 3
step 54 agent 1 at [[0.44444444 0.33333333]] agent 1 action 1
step 55 agent 1 at [[0.55555556 0.33333333]] agent 1 action 1
step 56 agent 1 at [[0.66666667 0.33333333]] agent 1 action 3
step 57 agent 1 at [[0.66666667 0.33333333]] agent 1 action 4
step 58 agent 1 at [[0.66666667 0.33333333]] agent 1 action 1
step 59 agent 1 at [[0.77777778 0.33333333]] agent 1 action 1
step 60 agent 1 at [[0.77777778 0.33333333]] agent 1 action 4
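As the output shows, the learnt trajectory matches the expert trajectory.

Now let's go through the details.

The environment here is one we built ourselves, so no expert trajectories come with it. What can we do? We generate the expert trajectories ourselves.

The following code collects the samples: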

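for epi_iter in range(max_epi_iter):
    env.reset()        # start every episode from the initial state
    a1_list = []       # actions taken in this episode
    pi_a1_list = []    # policy outputs returned together with each action
    r_list = []        # per-step rewards
    acc_r = 0          # accumulated reward of the episode
    for MC_iter in range(max_MC_iter):
        env.render()
        state = env.get_state()
        action1, pi_a1, log_prob1 = agent.get_action(state)
        a1_list.append(action1)
        pi_a1_list.append(pi_a1)
        # step() takes a list of actions; the second entry is fixed to 0 here
        reward, done = env.step([action1, 0])
        acc_r = acc_r + reward
        r_list.append(reward)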

The following code trains the network and updates its parameters only when the agent reaches the green target point.

if done:
    agent.train(a1_list, pi_a1_list, r_list)
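The control flow of the two snippets above (roll out an episode, record actions and rewards, and train only when the episode ends at the target) can be sketched in a self-contained toy form. Everything in this sketch is an assumption made for illustration: `ToyEnv` is a hypothetical 1-D corridor rather than the grid environment of this article, `ToyAgent` acts at random and only counts its `train()` calls, and the `break` after a successful episode is a choice made here, not something the original code shows.

```python
import random


class ToyEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at cell `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0

    def get_state(self):
        return self.pos

    def step(self, action):
        # action 1 moves right, any other action moves left (clamped to [0, goal])
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.goal, self.pos + delta))
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return reward, done


class ToyAgent:
    """Hypothetical stand-in: acts uniformly at random, counts train() calls."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.train_calls = 0

    def get_action(self, state):
        return self.rng.choice([0, 1])

    def train(self, a_list, r_list):
        # A real agent would update its policy network here.
        self.train_calls += 1


def collect(env, agent, max_epi_iter=200, max_MC_iter=20):
    """Roll out episodes; train only on those that reach the goal."""
    successes = []
    for _ in range(max_epi_iter):
        env.reset()
        a_list, r_list = [], []
        for _ in range(max_MC_iter):
            state = env.get_state()
            action = agent.get_action(state)
            reward, done = env.step(action)
            a_list.append(action)
            r_list.append(reward)
            if done:
                # Only successful episodes are used for the update.
                agent.train(a_list, r_list)
                successes.append((list(a_list), list(r_list)))
                break  # assumption: stop the episode once the goal is reached
    return successes


env, agent = ToyEnv(), ToyAgent(seed=0)
successes = collect(env, agent)
print(len(successes), "successful episodes out of 200")
```

Filtering on `done` means that episodes which run out of steps contribute nothing to the update, so the agent learns only from rollouts that actually reach the target. Once such an agent is trained, its rollouts can be recorded and used as the expert trajectories.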
