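Preface:
I originally planned to set up a local environment, but the installation requires CUDA 10.0 and I have CUDA 10.1 installed, so the versions do not match. I therefore installed Docker and did a containerized install instead.
See the tutorial on the official website.
To be added.
Check whether the user's groups include docker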
li@li-System-Product-Name:~$ groups
li adm cdrom sudo dip plugdev lpadmin sambashare docker
// The last entry is docker, so docker can be run directly without sudo
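The same group check can be scripted. This is a minimal sketch, assuming a Linux host; the helper names `current_user_groups` and `in_group` are made up for illustration and are not part of Docker.

```python
import grp
import os


def current_user_groups():
    """Return the names of the groups the current process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; skip it.
            continue
    return names


def in_group(group_names, wanted="docker"):
    """True if `wanted` appears in the given list of group names."""
    return wanted in group_names


if __name__ == "__main__":
    if in_group(current_user_groups()):
        print("docker group found: sudo should not be needed")
    else:
        print("docker group missing: run docker with sudo or add the user to the group")
```

Note that a newly added group membership only shows up after logging in again.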
Verify Docker
li@li-System-Product-Name:~$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
// Installation succeeded
li@li-System-Product-Name:~$ docker --version
Docker version 18.09.5, build e8ff056
Verify nvidia-docker
li@li-System-Product-Name:~$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
Unable to find image 'nvidia/cuda:10.1-base' locally
10.1-base: Pulling from nvidia/cuda
898c46f3b1a1: Already exists
63366dfa0a50: Already exists
041d4cd74a92: Already exists
6e1bee0f8701: Already exists
131dbe7c254d: Pull complete
5bca6b05dcd6: Pull complete
0d286a7b6e12: Pull complete
Digest: sha256:6ddf907e77f4b53ac8b0b8ce9fa9cd43ffb6882f1ad0f2d41ca996f154f17c7b
Status: Downloaded newer image for nvidia/cuda:10.1-base
Mon Apr 22 13:21:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:65:00.0 On | N/A |
| 31% 30C P8 22W / 260W | 84MiB / 10986MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
li@li-System-Product-Name:~$ docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
Unable to find image 'nvidia/cuda:10.0-base' locally
10.0-base: Pulling from nvidia/cuda
898c46f3b1a1: Already exists
63366dfa0a50: Already exists
041d4cd74a92: Already exists
6e1bee0f8701: Already exists
112097260ef3: Pull complete
30a67c795176: Pull complete
0d286a7b6e12: Pull complete
Digest: sha256:faac85a7d28e086173915df6456784778c4dacb429ff067def0c4a12671240e8
Status: Downloaded newer image for nvidia/cuda:10.0-base
Mon Apr 22 13:22:09 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:65:00.0 On | N/A |
| 31% 30C P8 22W / 260W | 84MiB / 10986MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
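Both runs above report "CUDA Version: 10.1", even with the 10.0-base image. The nvidia-smi header shows the highest CUDA version the host driver supports, not the toolkit version inside the image, so this is expected. A small sketch for pulling the two version numbers out of that header line follows; the function name `parse_smi_header` is hypothetical.

```python
import re

# Matches the header line printed by nvidia-smi, tolerating variable spacing.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return (driver_version, cuda_version) or None if no header is found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


line = "| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |"
print(parse_smi_header(line))  # prints ('418.56', '10.1')
```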
Install TensorFlow
docker pull tensorflow/tensorflow:2.0.0a0-gpu-py3 // pull command
li@li-System-Product-Name:~$ docker pull tensorflow/tensorflow:2.0.0a0-gpu-py3
2.0.0a0-gpu-py3: Pulling from tensorflow/tensorflow
7b722c1070cd: Pull complete
5fbf74db61f1: Pull complete
ed41cb72e5c9: Pull complete
7ea47a67709e: Pull complete
53d00018d593: Pull complete
d452561571e2: Pull complete
741421562e36: Pull complete
cf5a5f77591f: Pull complete
8e44471d34e9: Pull complete
95409a313744: Pull complete
3ca5dc868f92: Pull complete
a1c783d09ef0: Pull complete
eed91d5a4f29: Pull complete
b36de521e979: Pull complete
Digest: sha256:f43f2ea436eebc7b9fe3c80205e6649f4d1a66cfda8626ba010f8d8dfd7985ab
Status: Downloaded newer image for tensorflow/tensorflow:2.0.0a0-gpu-py3
Run TensorFlow
docker run -it -p 8888:8888 tensorflow/tensorflow:2.0.0a0-gpu-py3 // run command
li@li-System-Product-Name:~$ docker run -it -p 8888:8888 tensorflow/tensorflow:2.0.0a0-gpu-py3
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
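The nvidia-smi checks earlier passed `--runtime=nvidia`, while the run command above does not, so the container is probably started without GPU access. Below is a minimal sketch that assembles the command with that flag and, optionally, the `-u` option suggested by the warning. The helper `tf_run_command` and its parameters are hypothetical; newer Docker releases (19.03 and later) use `--gpus all` instead of `--runtime=nvidia`.

```python
import os
import shlex


def tf_run_command(image, port=8888, use_gpu=True, as_current_user=False):
    """Build a `docker run` argument list for the TensorFlow image."""
    cmd = ["docker", "run", "-it"]
    if use_gpu:
        # Same flag as in the nvidia-smi checks; assumes nvidia-docker2 is installed.
        cmd.append("--runtime=nvidia")
    if as_current_user:
        # Avoids root-owned files in mounted volumes, as the banner warns.
        cmd += ["-u", "{}:{}".format(os.getuid(), os.getgid())]
    cmd += ["-p", "{0}:{0}".format(port), image]
    return cmd


cmd = tf_run_command("tensorflow/tensorflow:2.0.0a0-gpu-py3")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to `subprocess.run(cmd)` on a host where Docker is available; printing it first makes it easy to compare with the command typed by hand.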
Test
root@cd51a60a7f4f:/# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import absolute_import, division, print_function, unicode_literals
>>> !pip install -q tensorflow==2.0.0-alpha0
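File "<stdin>", line 1
!pip install -q tensorflow==2.0.0-alpha0 // Fails because the leading "!" is IPython/Jupyter shell-escape syntax, which the plain Python REPL does not accept. The image already ships TensorFlow, so this step can be skipped.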
^
SyntaxError: invalid syntax
>>> import tensorflow as tf
>>>
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 6s 1us/step
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
... tf.keras.layers.Flatten(input_shape=(28, 28)),
... tf.keras.layers.Dense(128, activation='relu'),
... tf.keras.layers.Dropout(0.2),
... tf.keras.layers.Dense(10, activation='softmax')
... ])
2019-04-22 14:03:51.302251: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-22 14:03:51.316205: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:
2019-04-22 14:03:51.316261: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-04-22 14:03:51.316319: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2019-04-22 14:03:51.337157: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-04-22 14:03:51.338981: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x4147790 executing computations on platform Host. Devices:
2019-04-22 14:03:51.339038: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
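// TensorFlow appears to fall back to the CPU here: libcuda.so.1 and /dev/nvidia0 are missing inside the container, most likely because it was started without --runtime=nvidia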
>>> model.compile(optimizer='adam',
... loss='sparse_categorical_crossentropy',
... metrics=['accuracy'])
>>> model.fit(x_train, y_train, epochs=5)
Epoch 1/5
60000/60000 [==============================] - 7s 109us/sample - loss: 0.2981 - accuracy: 0.9136
Epoch 2/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.1438 - accuracy: 0.9565
Epoch 3/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.1094 - accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.0904 - accuracy: 0.9715
Epoch 5/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.0752 - accuracy: 0.9764
>>> model.evaluate(x_test, y_test)
10000/10000 [==============================] - 1s 55us/sample - loss: 0.0759 - accuracy: 0.9760
[0.07590396217172965, 0.976]
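To confirm whether a session like the one above is using the GPU, two quick checks can be run inside the container. This is a sketch under assumptions: the helper names are made up, the device-node check mirrors the "/dev/nvidia0 does not exist" log message, and `tf.test.is_gpu_available()` is the TensorFlow call available in this release (later versions deprecate it).

```python
import glob


def nvidia_device_nodes(dev_dir="/dev"):
    """List GPU device nodes such as /dev/nvidia0 that are visible here."""
    return sorted(glob.glob(dev_dir + "/nvidia[0-9]*"))


def tensorflow_sees_gpu():
    """Return True/False from TensorFlow, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.test.is_gpu_available()


if __name__ == "__main__":
    print("device nodes:", nvidia_device_nodes())
    print("TensorFlow sees a GPU:", tensorflow_sees_gpu())
```

An empty device list together with `False` from TensorFlow would match the log messages in the session above.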