The SummaryWriter class is found under torch.utils.tensorboard; we instantiate it to log the data we want to track. On every run, the object first creates an "event file" (the data store for that run) in the given directory log_dir; during training we can then use its high-level APIs to asynchronously append data to that file, so changes can be tracked in real time. The class is defined as
torch.utils.tensorboard.writer.SummaryWriter(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')
with the following arguments:
log_dir (str) – directory in which the event files are saved. Defaults to runs/CURRENT_DATETIME_HOSTNAME, which changes after every run. A hierarchical folder structure is recommended, e.g. pass "runs/exp1", "runs/exp2", etc. for each new experiment so that runs can be compared.
comment (str) – comment appended as a suffix to the default log_dir. Has no effect if log_dir is set explicitly.
purge_step (int) – if a run crashes at step T+X for some reason (e.g. out of memory) and logging is restarted from step T, events whose global_step is greater than or equal to purge_step (i.e. T) are purged, and that stretch of data is rewritten. Note that the crashed and resumed experiments should use the same log_dir.
max_queue (int) – size of the in-memory queue for pending events and summaries; once the queue is full, the data is flushed to the event file on disk.
flush_secs (int) – how often, in seconds, pending events and summaries are flushed to disk; the default is every two minutes.
filename_suffix (str) – suffix added to all event file names in the log_dir directory. See tensorboard.summary.writer.event_file_writer.EventFileWriter for more details on filename construction.
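As a quick illustration, here is a minimal sketch of constructing a writer with a few of these arguments (the directory and tag names are placeholders chosen for this example):

from torch.utils.tensorboard import SummaryWriter

# a hierarchical run folder makes different experiments easy to compare
writer = SummaryWriter(
    log_dir="runs/exp1",   # placeholder run directory
    max_queue=50,          # buffer up to 50 pending events in memory
    flush_secs=30,         # flush pending events to disk every 30 seconds
)
writer.add_scalar("loss/train", 0.5, global_step=0)
writer.close()             # flush remaining events and close the event file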
After event files have been generated, start the web service with the command
tensorboard --logdir <data folder>
where <data folder> is the log_dir that was set in the experiment. If you see output such as
TensorBoard 2.8.0 at http://localhost:6006/ (Press CTRL+C to quit)
the server started successfully; open that URL in a browser to reach the TensorBoard UI and see the logged data. Note that the default port is 6006; to serve on a specific port, use
tensorboard --logdir=<data folder> --port=<port>
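If you would rather start the service from Python than from the shell, the tensorboard package also exposes a programmatic entry point; the following is a minimal sketch, assuming the tensorboard.program module that ships with TensorBoard (the log folder and port are placeholders):

from tensorboard import program

tb = program.TensorBoard()
# argv mirrors the command line; the first slot stands in for the program name
tb.configure(argv=[None, '--logdir', 'logs', '--port', '6006'])
url = tb.launch()  # starts the server in a background thread and returns its URL
print(f'TensorBoard listening on {url}')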
We record data through the APIs provided by the SummaryWriter class. The commonly used ones are demonstrated below: add_scalar, add_scalars, add_text, add_graph, add_image and add_images (these two require the pillow package), add_figure (requires the matplotlib package), add_video (requires the moviepy package), add_pr_curve, add_embedding, add_hparams, and add_custom_scalars, which combines existing scalar tags to create special charts (a sketch of it follows the demo code below). Note that add_custom_scalars can only be called once for each SummaryWriter object; because it only provides metadata to TensorBoard, it can be called before or after the training loop. The full test code can be found here, and a sample of the resulting visualizations here:
# Import SummaryWriter and the other dependencies used in the demos
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
import torchvision
from PIL import Image
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
##### 1. add_scalar demo
def add_scalar_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_scalar")
    # Plot y = 2x
    x = range(100)
    for i in x:
        writer.add_scalar('add_scalar demo: y=2x', i * 2, i)
    # Close the writer (flushes pending events)
    writer.close()
add_scalar_demo()
##### 2. add_scalars demo
def add_scalars_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_scalars")
    r = 5
    for i in range(100):
        writer.add_scalars('add_scalars demo', {'xsinx': i * np.sin(i / r),
                                                'xcosx': i * np.cos(i / r),
                                                'tanx': np.tan(i / r)}, i)
    # Close the writer
    writer.close()
add_scalars_demo()
##### 3. add_text demo
def add_text_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_text")
    writer.add_text('lstm', 'This is an lstm', 0)
    writer.add_text('rnn', 'This is an rnn', 10)
    # Close the writer
    writer.close()
add_text_demo()
##### 4. add_graph demo
def add_graph_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_graph")
    img = torch.rand([1, 3, 64, 64], dtype=torch.float32)
    model = torchvision.models.AlexNet(num_classes=10)
    # print(model)
    writer.add_graph(model, input_to_model=img)
    # Close the writer
    writer.close()
add_graph_demo()
##### 5. add_image demo
def add_image_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_image")
    img1 = np.random.randn(1, 100, 100)
    writer.add_image('add_image demo/img1', img1)
    img2 = np.random.randn(100, 100, 3)
    writer.add_image('add_image demo/img2', img2, dataformats='HWC')
    img = Image.open('../imgs/1.png')
    img_array = np.array(img)
    writer.add_image('add_image demo/cartoon', img_array, dataformats='HWC')
    # Close the writer
    writer.close()
add_image_demo()
##### 6. add_images demo
def add_images_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_images")
    imgs1 = np.random.randn(8, 100, 100, 1)
    writer.add_images('add_images demo/imgs1', imgs1, dataformats='NHWC')
    imgs2 = np.zeros((16, 3, 100, 100))
    for i in range(16):
        imgs2[i, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 / 16 * i
        imgs2[i, 1] = (1 - np.arange(0, 10000).reshape(100, 100) / 10000) / 16 * i
    writer.add_images('add_images demo/imgs2', imgs2)  # default layout is (N, 3, H, W)
    img = Image.open('../imgs/1.jpg')
    img3 = np.array(img)
    imgs4 = np.zeros((5, img3.shape[0], img3.shape[1], img3.shape[2]))
    for i in range(5):
        imgs4[i] = img3 // (i + 1)
    writer.add_images('add_images demo/imgs4', imgs4, dataformats='NHWC')  # default layout is (N, 3, H, W)
    # Close the writer
    writer.close()
add_images_demo()
##### 7. add_figure demo
def add_figure_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_figure")
    # First create some toy data
    x = np.linspace(0, 2 * np.pi, 400)
    y = np.sin(x ** 2)
    # Create just a figure and only one subplot
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title('Simple plot')
    writer.add_figure("add_figure demo/figure", fig)
    # Close the writer
    writer.close()
add_figure_demo()
##### 8. add_pr_curve demo
def add_pr_curve_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_pr_curve")
    labels = np.random.randint(2, size=100)  # binary labels
    predictions = np.random.rand(100)
    writer.add_pr_curve('add_pr_curve demo/pr_curve', labels, predictions, 0)
    # Close the writer
    writer.close()
add_pr_curve_demo()
##### 9. add_embedding demo
def add_embedding_demo():
    # Write events to the logs folder so TensorBoard can consume and visualize them
    writer = SummaryWriter("logs/add_embedding")
    # Workaround for a known incompatibility between the embedding projector and an installed TensorFlow
    import tensorflow as tf
    import tensorboard as tb
    tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
    import keyword
    import torch
    meta = []
    while len(meta) < 100:
        meta = meta + keyword.kwlist  # get some strings
    meta = meta[:100]
    for i, v in enumerate(meta):
        meta[i] = v + str(i)
    label_img = torch.rand(100, 3, 10, 32)
    for i in range(100):
        label_img[i] *= i / 100.0
    writer.add_embedding(torch.randn(100, 5), metadata=meta, label_img=label_img)
    # Close the writer
    writer.close()
add_embedding_demo()
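As mentioned above, add_custom_scalars only supplies layout metadata, so it is called at most once per SummaryWriter object. Below is a minimal sketch (not part of the original test script) that groups two scalar tags into one combined chart; the category, chart, and tag names are placeholders:

##### 10. add_custom_scalars sketch
def add_custom_scalars_demo():
    writer = SummaryWriter("logs/add_custom_scalars")
    # declare a "losses" chart that overlays the two scalar tags as multilines
    layout = {"training": {"losses": ["Multiline", ["loss/train", "loss/val"]]}}
    writer.add_custom_scalars(layout)
    for step in range(100):
        writer.add_scalar("loss/train", 1.0 / (step + 1), step)
        writer.add_scalar("loss/val", 1.5 / (step + 1), step)
    writer.close()
add_custom_scalars_demo()

The next, more end-to-end example trains a PPO agent on CartPole while logging the return curve and the hyperparameters to TensorBoard: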
import gym
import torch
import random
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from gym.utils.env_checker import check_env
from gym.wrappers import TimeLimit
from datetime import datetime
from torch.utils.tensorboard import SummaryWriter
class PolicyNet(torch.nn.Module):
    ''' The policy network is a two-layer MLP '''
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(PolicyNet, self).__init__()
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        x = F.relu(self.fc1(x))            # (1, hidden_dim)
        x = F.softmax(self.fc2(x), dim=1)  # (1, output_dim)
        return x
class VNet(torch.nn.Module):
    ''' The value network is a two-layer MLP '''
    def __init__(self, input_dim, hidden_dim):
        super(VNet, self).__init__()
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, 1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
class PPO(torch.nn.Module):
    def __init__(self, state_dim, hidden_dim, action_range, actor_lr, critic_lr, lmbda, epochs, eps, gamma, device):
        super().__init__()
        self.actor = PolicyNet(state_dim, hidden_dim, action_range).to(device)
        self.critic = VNet(state_dim, hidden_dim).to(device)
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters(), lr=actor_lr)
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=critic_lr)
        self.device = device
        self.gamma = gamma
        self.lmbda = lmbda    # GAE parameter
        self.epochs = epochs  # number of training epochs over each collected trajectory
        self.eps = eps        # clipping range parameter of PPO
def take_action(self, state):
state = torch.tensor(state, dtype=torch.float).to(self.device)
state = state.unsqueeze(0)
probs = self.actor(state)
action_dist = torch.distributions.Categorical(probs)
action = action_dist.sample()
return action.item()
def compute_advantage(self, gamma, lmbda, td_delta):
        ''' Generalized Advantage Estimation (GAE) '''
td_delta = td_delta.detach().numpy()
advantage_list = []
advantage = 0.0
for delta in td_delta[::-1]:
advantage = gamma * lmbda * advantage + delta
advantage_list.append(advantage)
advantage_list.reverse()
return torch.tensor(np.array(advantage_list), dtype=torch.float)
def update(self, transition_dict):
states = torch.tensor(np.array(transition_dict['states']), dtype=torch.float).to(self.device)
actions = torch.tensor(transition_dict['actions']).view(-1, 1).to(self.device)
rewards = torch.tensor(transition_dict['rewards'], dtype=torch.float).view(-1, 1).to(self.device)
next_states = torch.tensor(np.array(transition_dict['next_states']), dtype=torch.float).to(self.device)
dones = torch.tensor(transition_dict['dones'], dtype=torch.float).view(-1, 1).to(self.device)
td_target = rewards + self.gamma * self.critic(next_states) * (1-dones)
td_delta = td_target - self.critic(states)
advantage = self.compute_advantage(self.gamma, self.lmbda, td_delta.cpu()).to(self.device)
old_log_probs = torch.log(self.actor(states).gather(1, actions)).detach()
        # train on the freshly collected trajectory for self.epochs rounds
for _ in range(self.epochs):
log_probs = torch.log(self.actor(states).gather(1, actions))
ratio = torch.exp(log_probs - old_log_probs)
surr1 = ratio * advantage
            surr2 = torch.clamp(ratio, 1 - self.eps, 1 + self.eps) * advantage  # clipping
            actor_loss = torch.mean(-torch.min(surr1, surr2))  # PPO clipped surrogate loss
            critic_loss = torch.mean(F.mse_loss(self.critic(states), td_target.detach()))
            # update the network parameters
self.actor_optimizer.zero_grad()
self.critic_optimizer.zero_grad()
actor_loss.backward()
critic_loss.backward()
self.actor_optimizer.step()
self.critic_optimizer.step()
if __name__ == "__main__":
def moving_average(a, window_size):
        ''' Moving average of sequence a with the given window size '''
cumulative_sum = np.cumsum(np.insert(a, 0, 0))
middle = (cumulative_sum[window_size:] - cumulative_sum[:-window_size]) / window_size
r = np.arange(1, window_size-1, 2)
begin = np.cumsum(a[:window_size-1])[::2] / r
end = (np.cumsum(a[:-window_size:-1])[::2] / r)[::-1]
return np.concatenate((begin, middle, end))
def set_seed(env, seed=42):
        ''' Set random seeds for reproducibility '''
env.action_space.seed(seed)
env.reset(seed=seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
    state_dim = 4     # dimension of the environment observation
    action_range = 2  # size of the discrete action space
actor_lr = 1e-3
critic_lr = 1e-2
num_episodes = 200
hidden_dim = 64
gamma = 0.98
lmbda = 0.95
epochs = 10
eps = 0.2
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
# build environment
env_name = 'CartPole-v0'
env = gym.make(env_name, render_mode='rgb_array')
    check_env(env.unwrapped)  # check that the environment follows the gym interface
set_seed(env, 0)
# build agent
agent = PPO(state_dim, hidden_dim, action_range, actor_lr, critic_lr, lmbda, epochs, eps, gamma, device)
# TensorBoard writer
TIMESTAMP = "{0:%Y-%m-%dT%H-%M-%S/}".format(datetime.now())
writer = SummaryWriter(f"logs/PPO")
#writer = SummaryWriter(f"logs/PPO/{TIMESTAMP}")
# start training
return_list = []
for i in range(10):
with tqdm(total=int(num_episodes / 10), desc='Iteration %d' % i) as pbar:
for i_episode in range(int(num_episodes / 10)):
episode_return = 0
transition_dict = {
'states': [],
'actions': [],
'next_states': [],
'next_actions': [],
'rewards': [],
'dones': []
}
state, _ = env.reset()
                # roll out one trajectory with the current policy
while True:
action = agent.take_action(state)
next_state, reward, terminated, truncated, _ = env.step(action)
next_action = agent.take_action(next_state)
transition_dict['states'].append(state)
transition_dict['actions'].append(action)
transition_dict['next_states'].append(next_state)
transition_dict['next_actions'].append(next_action)
transition_dict['rewards'].append(reward)
transition_dict['dones'].append(terminated or truncated)
state = next_state
episode_return += reward
if terminated or truncated:
break
#env.render()
                # on-policy update using the data collected with the current policy
                agent.update(transition_dict)
                # update the progress bar
return_list.append(episode_return)
pbar.set_postfix({
'episode':
'%d' % (num_episodes / 10 * i + i_episode + 1),
'return':
'%.3f' % episode_return,
'ave return':
'%.3f' % np.mean(return_list[-10:])
})
pbar.update(1)
writer.add_scalars(
main_tag='return',
tag_scalar_dict={f'hidden{hidden_dim}':episode_return},
global_step=i*int(num_episodes / 10) + i_episode
)
writer.add_hparams(
hparam_dict={
'actor_lr': actor_lr,
'critic_lr': critic_lr,
'hidden_dim': hidden_dim,
'gamma': gamma,
'lmbda': lmbda,
'eps': eps,
'num_episodes': num_episodes
},
metric_dict={
'hparam/ave return': np.mean(return_list),
}
)
writer.close()
    # show policy performance
mv_return_list = moving_average(return_list, 29)
episodes_list = list(range(len(return_list)))
plt.figure(figsize=(12,8))
plt.plot(episodes_list, return_list, label='raw', alpha=0.5)
plt.plot(episodes_list, mv_return_list, label='moving ave')
plt.xlabel('Episodes')
plt.ylabel('Returns')
    plt.title(f'{agent._get_name()} on {env_name}')
plt.legend()
plt.savefig(f'./result/{agent._get_name()}.png')
plt.show()
Two APIs are used here: add_hparams records the experiment's hyperparameters together with the resulting metric, and add_scalars records the convergence process (add_scalars is used because it makes it easy to draw several curves in a single chart). The results are as follows