easyRL Study Notes: Reinforcement Learning Basics

https://datawhalechina.github.io/easy-rl/#/chapter1/chapter1

pip install gym

Set up the development environment

https://book.douban.com/subject/35043939/
https://zhuanlan.zhihu.com/reinforce

See Project 2

python train.py
visualdl --logdir=train_log/train --host=172.30.159.168

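[Figure 1]

What do these three peaks mean?
Occasional sudden jumps.
[Figure 2]
Training finished in about 6 minutes; let's look at the result.
[Figure 3]
For some reason, the agent seems to get worse the longer it trains. I'll debug this later.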
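
Occasional spikes make it hard to judge from the raw curve whether training is really degrading. One common way to check is to smooth the per-episode rewards with a moving average and compare early and late values. The sketch below is only an illustration: the function name `moving_average` and the sample rewards are hypothetical and do not come from the training script used here.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over fewer samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical episode rewards with one spike, for illustration only.
episode_rewards = [10.0, 12.0, 80.0, 11.0, 9.0, 8.0]
print(moving_average(episode_rewards, window=3))
```

If the smoothed curve also trends downward near the end, the decline is probably not just noise from isolated spikes.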

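Notes:
Sarsa, covered earlier, is on-policy: it always follows and improves the same policy π. Q-learning is off-policy: every update uses max Q. The deep Q-network (DQN) in Chapter 6 is a deep algorithm that belongs only to the off-policy family.
Chapter 6 opens with value function approximation, but only for the Q-function. Does that mean that in policy iteration the max Q lookup in the Q-table is replaced by an approximating function, while in value iteration the V-function does not need to be approximated, and the approximated Q and the unapproximated V are then trained with a deep network?
DQN also has a target network. Are Chapters 6 through 9 all done in the off-policy setting?
Reference: https://datawhalechina.github.io/easy-rl/#/chapter1/chapter1
For the Actor-Critic algorithms, you could say so. (PPO is sometimes described as off-policy as well, because it reuses samples through importance sampling, although it is usually classified as on-policy.)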
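
The on-policy versus off-policy distinction between Sarsa and Q-learning shows up in a single line: the target used for the temporal-difference update. The sketch below uses a plain nested list as a Q-table; the function names and the numbers are hypothetical, and terminal-state handling is left out for brevity.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the current policy actually chose.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action, whatever was executed.
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha):
    # Move Q(state, action) a step of size alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical Q-table with 2 states and 2 actions.
q_table = [[0.0, 0.0], [1.0, 3.0]]
print(sarsa_target(q_table, 1.0, next_state=1, next_action=0, gamma=0.5))
print(q_learning_target(q_table, 1.0, next_state=1, gamma=0.5))
```

The two targets differ whenever the behavior policy picks a non-greedy action in the next state, which is exactly when exploration happens.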

At this point, we can refer to https://github.com/sfujim/TD3 and implement TD3 ourselves.
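
Before writing a full implementation, it helps to isolate the two pieces of TD3 that differ most from DDPG: target policy smoothing and the clipped double-Q target. The sketch below works on scalars with hypothetical function names and values; it is a simplification for illustration, not the code from the repository above, which operates on batched tensors.

```python
def clip(value, low, high):
    return max(low, min(high, value))


def smoothed_target_action(target_action, noise, noise_clip, max_action):
    # Target policy smoothing: add clipped noise to the target actor's
    # action, then clip the result back into the valid action range.
    bounded_noise = clip(noise, -noise_clip, noise_clip)
    return clip(target_action + bounded_noise, -max_action, max_action)


def td3_target(reward, not_done, gamma, target_q1, target_q2):
    # Clipped double-Q: use the smaller of the two target critics
    # to reduce overestimation of the value.
    return reward + not_done * gamma * min(target_q1, target_q2)


# Hypothetical numbers, for illustration only.
next_action = smoothed_target_action(0.9, noise=0.8, noise_clip=0.5, max_action=1.0)
print(next_action)
print(td3_target(1.0, not_done=1.0, gamma=0.5, target_q1=4.0, target_q2=2.0))
```

The third TD3 ingredient, delayed policy updates, is a matter of scheduling (updating the actor and target networks less often than the critics) and is not shown here.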
You can specify the port number when starting Jupyter Notebook:

jupyter notebook --no-browser --port 8889 --ip=192.168.1.103

References
Mushroom Book (easy-rl) code: https://github.com/datawhalechina/easy-rl/tree/master/projects
Personal development version of the code: https://github.com/johnjim0816/rl-tutorials

But I ran into a problem:
Failed to build mujoco_py

Fix: install an older version of mujoco_py:
pip install mujoco_py==2.0.2.8
MuJoCo itself also has to be installed:
You appear to be missing MuJoCo. We expected to find the file here: /home/kewei/.mujoco/mujoco200

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo

https://github.com/openai/mujoco-py#install-mujoco

Which can be downloaded from the website

https://www.roboti.us/index.html

So let's download the Linux version.
[Figure 4]

mkdir ~/.mujoco
mv mujoco200_linux ~/.mujoco/mujoco200

[Figure 5]
I set the environment variables as the official instructions say, but it still failed:
Exception:
Missing path to your environment variable.
Current values LD_LIBRARY_PATH=/usr/local/openmpi-4.0.3/lib:/usr/local/cuda-11.1/lib64:
Please add following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kewei/.mujoco/mujoco200/bin

My configuration:

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} 
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
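
To confirm that the shell actually picked up the new setting, you can check from Python whether the MuJoCo `bin` directory appears in `LD_LIBRARY_PATH`. This is a minimal sketch: the helper names are hypothetical, and the expected directory is taken from the error message above rather than from any mujoco_py API.

```python
import os
import posixpath


def library_path_entries(value, home):
    """Split a colon-separated path list, expanding a leading '~'."""
    entries = []
    for raw in value.split(":"):
        if not raw:
            continue  # skip empty entries such as a trailing ':'
        if raw == "~" or raw.startswith("~/"):
            raw = home + raw[1:]
        entries.append(posixpath.normpath(raw))
    return entries


def has_mujoco_bin(env, home, version_dir="mujoco200"):
    """Return True if <home>/.mujoco/<version_dir>/bin is on LD_LIBRARY_PATH."""
    expected = posixpath.normpath(
        posixpath.join(home, ".mujoco", version_dir, "bin")
    )
    return expected in library_path_entries(env.get("LD_LIBRARY_PATH", ""), home)


if __name__ == "__main__":
    print(has_mujoco_bin(os.environ, os.path.expanduser("~")))
```

If this prints False in a shell where `.bashrc` was already edited, the shell was probably started before the edit and needs to be reloaded or restarted.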

When in doubt, restart: I shut WSL down and tried again.
Sure enough, it started working after WSL was restarted.
[Figure 6]

/home/kewei/miniconda3/lib/python3.9/site-packages/mujoco_py/gl/osmesashim.c:1:10: fatal error: GL/osmesa.h: No such file or directory
#include <GL/osmesa.h>
^~~~~~~~~~~~~
compilation terminated.

Fix:

sudo apt-get install mesa-common-dev
sudo apt-get install libgl1-mesa-dev libglu1-mesa-dev

But apt told me these packages were already installed.
[Figure 7]

So I tried another approach:

sudo apt-get install libglew-dev
sudo gedit ~/.bashrc
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
source ~/.bashrc

That still did not work, so I tried yet another approach:

sudo apt-get install libosmesa6-dev

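That fixed it, but then I hit another problem.
The import failed with:
PermissionError: [Errno 13] Permission denied: 'patchelf'
Fix: this is reportedly caused by the build lock and can be worked around as follows. The first path is the macOS location from the answer I found; the second is the corresponding directory on my machine:
cd /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mujoco_py
cd ~/miniconda3/lib/python3.9/site-packages/mujoco_py
sudo chmod -R 777 ./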

This approach failed as well, so I decided to install the latest version instead.
That blew up too:
Building wheels for collected packages: mujoco-py
Building wheel for mujoco-py (pyproject.toml) … error
error: subprocess-exited-with-error

× Building wheel for mujoco-py (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
running bdist_wheel
running build
Removing old mujoco_py cext /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/cymj_2.0.2.13_39_linuxcpuextensionbuilder_39.so
Compiling /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/cymj.pyx because it depends on /tmp/pip-build-env-t5spnhlg/overlay/lib/python3.9/site-packages/Cython/Includes/libc/string.pxd.
[1/1] Cythonizing /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/cymj.pyx
running build_ext
building 'mujoco_py.cymj' extension
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/gl
gcc -pthread -B /home/kewei/miniconda3/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/kewei/miniconda3/include -I/home/kewei/miniconda3/include -fPIC -O2 -isystem /home/kewei/miniconda3/include -fPIC -Imujoco_py -I/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py -I/home/kewei/.mujoco/mujoco200/include -I/tmp/pip-build-env-t5spnhlg/overlay/lib/python3.9/site-packages/numpy/core/include -I/home/kewei/miniconda3/include/python3.9 -c /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/cymj.c -o /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/cymj.o -fopenmp -w
gcc -pthread -B /home/kewei/miniconda3/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/kewei/miniconda3/include -I/home/kewei/miniconda3/include -fPIC -O2 -isystem /home/kewei/miniconda3/include -fPIC -Imujoco_py -I/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py -I/home/kewei/.mujoco/mujoco200/include -I/tmp/pip-build-env-t5spnhlg/overlay/lib/python3.9/site-packages/numpy/core/include -I/home/kewei/miniconda3/include/python3.9 -c /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/gl/osmesashim.c -o /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/gl/osmesashim.o -fopenmp -w
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/lib.linux-x86_64-cpython-39
creating /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/lib.linux-x86_64-cpython-39/mujoco_py
gcc -pthread -B /home/kewei/miniconda3/compiler_compat -shared -Wl,-rpath,/home/kewei/miniconda3/lib -Wl,-rpath-link,/home/kewei/miniconda3/lib -L/home/kewei/miniconda3/lib -L/home/kewei/miniconda3/lib -Wl,-rpath,/home/kewei/miniconda3/lib -Wl,-rpath-link,/home/kewei/miniconda3/lib -L/home/kewei/miniconda3/lib /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/cymj.o /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/temp.linux-x86_64-cpython-39/tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/gl/osmesashim.o -L/home/kewei/.mujoco/mujoco200/bin -Wl,–enable-new-dtags,-R/home/kewei/.mujoco/mujoco200/bin -lmujoco200 -lglewosmesa -lOSMesa -lGL -o /tmp/pip-install-p35bq4mq/mujoco-py_5714e77de65c4408a54ec041c0e44487/mujoco_py/generated/_pyxbld_2.0.2.13_39_linuxcpuextensionbuilder/lib.linux-x86_64-cpython-39/mujoco_py/cymj.cpython-39-x86_64-linux-gnu.so -fopenmp
error: [Errno 13] Permission denied: 'patchelf'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mujoco-py
Failed to build mujoco-py
ERROR: Could not build wheels for mujoco-py, which is required to install pyproject.toml-based projects

Fix:
sudo apt-get install patchelf
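
Since the build only fails at the very last step when `patchelf` is unavailable, it can save time to check for the required command-line tools before running `pip install`. The sketch below uses the standard-library `shutil.which`; the helper name `missing_tools` and the tool list are assumptions for illustration, not an official list of mujoco_py requirements.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Assumed tool list based on the build log above (gcc and patchelf).
    missing = missing_tools(["gcc", "patchelf"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; normal use relies on the default.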

After that, everything went smoothly.
[Figure 8]

Then importing the package raised an error:
ImportError: cannot import name 'MISSING_KEY_MESSAGE' from 'mujoco_py.utils' (/home/kewei/miniconda3/lib/python3.9/site-packages/mujoco_py/utils.py)

First, test the MuJoCo installation itself:
cd ~/.mujoco/mujoco200/bin
./simulate …/model/humanoid.xml

[Figure 9]

[Figure 10]
I downloaded the GLFW zip file and unpacked it.
Configuring the build with CMake produced a small error:
Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Looking for IceConnectionNumber in ICE
-- Looking for IceConnectionNumber in ICE - found
CMake Error at CMakeLists.txt:218 (message):
RandR headers not found; install libxrandr development package

Fix:

sudo apt-get install libxrandr-dev

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Fix:

pip install --upgrade numpy

[Figure 11]
The installation finally succeeded.
[Figure 12]
Now we can properly try out TD3.
Training took a little over 2 hours.
[Figure 13]
However, 9966 appears to be the limit it converged to.

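[Figure 14]
After the 1,000,000 iterations finished, there was a phase of rapid updates. On closer inspection, though, it looked like everything was being trained all over again, which was painful, although this run seemed faster than the previous one.
[Figure 15]
The one-million-iteration run with a single environment finished, and the result is not bad.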
