とらりもん - tensorflow

!Integrating with the nvidia driver, CUDA, and cuDNN (Ubuntu 16.04)

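To use tensorflow, a deep learning library, conveniently and comfortably, it needs to work together with the GPU, its driver (nvidia driver), the libraries (CUDA, cuDNN), and the front end (keras). Installing all of these so that they cooperate correctly is fairly difficult.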

!GPU
* How to check the GPU model:
$ lspci | grep -i nvidia
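
When there are several cards in one machine, it can help to pull just the model names out of the lspci output. The following is a minimal sketch, assuming the output format shown in the success examples on this page; the function name nvidia_gpu_models is made up for illustration.

```python
import re

def nvidia_gpu_models(lspci_output):
    """Return the bracketed model names of NVIDIA display controllers."""
    models = []
    for line in lspci_output.splitlines():
        # Keep display controllers only; this skips the HDMI audio functions.
        if "NVIDIA" not in line:
            continue
        if "VGA" not in line and "3D controller" not in line:
            continue
        match = re.search(r"\[([^\]]+)\]", line)
        if match:
            models.append(match.group(1))
    return models

# Sample text copied from the nozomi success example on this page.
sample = """\
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
"""
print(nvidia_gpu_models(sample))  # ['GeForce GT 710', 'GeForce GTX 1080']
```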

!nvidia driver
* Compatibility check between the GPU and the nvidia driver version: [[https://www.nvidia.co.jp/Download/index.aspx?lang=jp]]
** For a GTX 1080:
*** Linux short-lived branch: 435.21 [[https://www.nvidia.com/download/driverResults.aspx/150803/en-us]]
*** Linux long-lived branch: 440.44 [[https://www.nvidia.com/download/driverResults.aspx/156086/en-us]]
** The following command may also show compatible drivers (unconfirmed): $ ubuntu-drivers devices

* How to check the nvidia driver version:
$ cat /proc/driver/nvidia/version
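
To compare the driver version against a required minimum, the version number has to be extracted from that text. A minimal sketch, assuming the "Kernel Module  <version>" layout shown in the logs on this page; the helper names are made up for illustration.

```python
import re

def parse_driver_version(text):
    """Extract the driver version string from /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None

def version_tuple(version):
    """Turn '418.87.00' into (418, 87, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Sample line copied from the nozomi success example on this page.
sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.87.00  Thu Aug  8 15:35:46 CDT 2019"
version = parse_driver_version(sample)
print(version)                 # 418.87.00
print(version_tuple(version))  # (418, 87, 0)
```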

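* Installing the nvidia driver:
$ sudo apt install nvidia-driver-418  ... replace 418 with the appropriate version number confirmed above! (The failure log on this page used package names of the form nvidia-418, so check which name your apt sources provide.)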

* Checking that the nvidia driver works:
$ watch "nvidia-smi"

* It is probably best to reboot the computer at this point.

!CUDA
* [[Official site|https://developer.nvidia.com/cuda-zone]]
* Compatibility check between the nvidia driver and CUDA: [[https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html]]
* Download and install:
$ sudo dpkg -i cuda-repo-ubuntu*.deb
$ sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
$ sudo apt install cuda
* Installing from the CUDA Toolkit 10.1 update2 Archive apparently installs the nvidia driver automatically as well.
** CUDA 10.1 requires nvidia driver >= 418.39
* How to check the CUDA version:
$ nvcc -V
* CUDA versions such as v10.0 and v10.1 are installed into directories like /usr/local/cuda-10.0 and /usr/local/cuda-10.1. Create a symbolic link named /usr/local/cuda that points to the version you mainly use, like this:
$ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
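
The driver requirement noted above (CUDA 10.1 needs driver >= 418.39) can be checked mechanically. This is a minimal sketch: the 10.1 entry comes from this page, while the 10.0 and 10.2 entries are my recollection of NVIDIA's release notes and should be rechecked against the link above. All names here are made up for illustration.

```python
import re

# Minimum Linux driver version per CUDA release.
# 10.1 is stated on this page; 10.0 and 10.2 are assumptions to verify
# against the CUDA Toolkit release notes.
MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def version_tuple(version):
    """Turn '418.87.00' into (418, 87, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def parse_cuda_release(nvcc_output):
    """Extract the release number (e.g. '10.0') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def driver_supports_cuda(driver_version, cuda_release):
    """True if the driver meets the minimum listed for that CUDA release."""
    return version_tuple(driver_version) >= MIN_DRIVER[cuda_release]

release = parse_cuda_release("Cuda compilation tools, release 10.0, V10.0.130")
print(release)                                     # 10.0
print(driver_supports_cuda("418.87.00", release))  # True
print(driver_supports_cuda("410.48", "10.1"))      # False
```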

!cuDNN
* Compatibility check between CUDA, cuDNN, and tensorflow: [[https://www.tensorflow.org/install/source#linux]]
** ... For now, tensorflow_gpu = 1.13.1, cuDNN = 7.4, cuda = 10.0 seems to be a reasonably standard combination.
* How to check the cuDNN version:
$ dpkg -l | grep "cudnn"
* Installing cuDNN:
** Get cuDNN from here: [[https://developer.nvidia.com/rdp/cudnn-archive]]
$ sudo dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb libcudnn7-doc_7.6.4.38-1+cuda10.1_amd64.deb libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
* cuDNN documentation:
$ xpdf /usr/share/doc/libcudnn7-doc/cuDNN-Developer-Guide.pdf
* Testing cuDNN:
$ cp -pr /usr/src/cudnn_samples_v7 ./
$ cd cudnn_samples_v7/conv_sample/
$ make
$ ./conv_sample
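
The cuDNN package version string also records which CUDA release the package was built for, which is worth comparing with the output of nvcc -V. A minimal sketch, assuming the "7.6.4.38-1+cuda10.0" version format seen in the dpkg output on this page; the function name is made up for illustration.

```python
import re

def parse_cudnn_packages(dpkg_output):
    """Map package name -> (cuDNN version, CUDA release it was built for)."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l rows are: status, name, version, architecture, description...
        if len(fields) < 3:
            continue
        name = fields[1].split(":")[0]  # drop an optional ":amd64" suffix
        if not name.startswith("libcudnn"):
            continue
        match = re.match(r"([\d.]+)-\d+\+cuda([\d.]+)$", fields[2])
        if match:
            packages[name] = (match.group(1), match.group(2))
    return packages

# Sample rows copied from the nozomi success example on this page.
sample = """\
ii  libcudnn7      7.6.4.38-1+cuda10.0   amd64 cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.0   amd64 cuDNN development libraries and headers
"""
for name, (cudnn, cuda) in sorted(parse_cudnn_packages(sample).items()):
    print(name, "cuDNN", cudnn, "built for CUDA", cuda)
```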

! tensorflow
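* Install with pip3.
* Do not install with sudo, so that each user's preferences (tensorflow version, whether to use the GPU, and so on) are respected.
* When running pip3 without sudo, add the --user option.
* A version can be pinned with ==.
* Install tensorflow-gpu rather than tensorflow.
* pip3 automatically installs required packages such as tensorboard.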

!! Installing tensorflow 1 on Ubuntu 16.04 as a non-root user
$ pip3 install tensorboard==1.15.0 tensorflow-gpu==1.15.0 --user
$ pip3 install keras --user

!! Installing tensorflow 2 on Ubuntu 16.04 as a non-root user
$ pip3 install tensorflow-gpu --user

!! Testing whether tensorflow-gpu was installed correctly
[[https://thr3a.hatenablog.com/entry/20180113/1515820265]]
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
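
The device list is long, and what matters is whether it contains a GPU entry. A minimal sketch that filters on the device_type field of the returned entries; the helper names and the stand-in objects are made up for illustration, and report() only works where tensorflow is installed.

```python
from types import SimpleNamespace

def gpu_devices(devices):
    """Keep only the entries whose device_type is "GPU"."""
    return [d for d in devices if d.device_type == "GPU"]

def report():
    """Print the GPUs tensorflow can see (requires tensorflow to be installed)."""
    from tensorflow.python.client import device_lib
    gpus = gpu_devices(device_lib.list_local_devices())
    if not gpus:
        print("No GPU visible to tensorflow")
    for gpu in gpus:
        print(gpu.name, gpu.physical_device_desc)

# Stand-in objects (hypothetical values) so the filter can be tried without a GPU.
fake = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]
print([d.name for d in gpu_devices(fake)])  # ['/device:GPU:0']
```

On a machine with tensorflow-gpu installed, calling report() should list the GPU if the whole stack is connected.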

! keras
$ pip3 install keras --user

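!Testing with real deep learning whether keras, tensorflow, cuDNN, CUDA, the nvidia driver, and the GPU all work together
* Download this file: {{attach_anchor(mnist_keras_CNN.py)}}
* Then run $ python3 mnist_keras_CNN.py
* At the same time, run $ watch nvidia-smi in another terminal and check the percentage shown toward the right in the middle of the table (the GPU-Util column).
* If this stays at 0%, the GPU is probably not being used (a common problem).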
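
Instead of reading the table by eye, the utilization can be queried in CSV form. A minimal sketch using nvidia-smi's --query-gpu and --format options; the function names are made up, and the sample text (including the exact placeholder printed for a card that does not report utilization) is hypothetical.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_utilization(csv_text):
    """Return {gpu index: utilization percent, or None if not reported}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = [field.strip() for field in line.split(",", 1)]
        # Cards such as the GT 710 print a placeholder instead of a number.
        result[int(index)] = int(util) if util.isdigit() else None
    return result

def read_utilization():
    """Run nvidia-smi once and parse its output (needs a working driver)."""
    out = subprocess.run(QUERY, stdout=subprocess.PIPE, check=True,
                         universal_newlines=True).stdout
    return parse_utilization(out)

# Hypothetical sample: GPU 0 does not report utilization, GPU 1 is busy.
print(parse_utilization("0, [Not Supported]\n1, 37\n"))  # {0: None, 1: 37}
```

Calling read_utilization() a few times while the training script runs should show a nonzero value for the GPU that tensorflow is using.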

!When things don't work...
* Try rebooting the computer!
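
One likely reason a reboot helps: after a driver package is installed or replaced, the kernel module that is still loaded can differ from the newly installed files. That is consistent with the "Driver/library version mismatch" message in the failure log on this page, which went away after a reboot. A minimal sketch of a check follows. The path /sys/module/nvidia/version and the module name passed to modinfo are assumptions about the system, and module names can differ between driver packagings.

```python
import subprocess

def reboot_needed(loaded_version, installed_version):
    """True when the loaded kernel module differs from the installed one."""
    return loaded_version.strip() != installed_version.strip()

def read_loaded_version(path="/sys/module/nvidia/version"):
    """Version of the module currently loaded in the kernel (assumed path)."""
    with open(path) as f:
        return f.read()

def read_installed_version(module="nvidia"):
    """Version of the module file on disk (assumed module name)."""
    return subprocess.run(["modinfo", "-F", "version", module],
                          stdout=subprocess.PIPE, check=True,
                          universal_newlines=True).stdout

# Hypothetical values: 418.87.00 still loaded, 430.64 freshly installed.
print(reboot_needed("418.87.00\n", "430.64\n"))  # True
print(reboot_needed("430.64\n", "430.64\n"))     # False
```

On a real machine, reboot_needed(read_loaded_version(), read_installed_version()) would perform the comparison.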

!Success example (2020/02/12; Ubuntu 16.04; nozomi server)
nishida@nozomi:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)

nishida@nozomi:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.87.00  Thu Aug  8 15:35:46 CDT 2019
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

nishida@nozomi:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

nishida@nozomi:~$ dpkg -l | grep "cudnn"
ii  libcudnn7      7.6.4.38-1+cuda10.0   amd64 cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.0   amd64 cuDNN development libraries and headers
ii  libcudnn7-doc  7.6.4.38-1+cuda10.0   amd64 cuDNN documents and samples

nishida@nozomi:~$ pip3 list | grep tensor
tensorboard            1.13.1
tensorflow-estimator   1.13.0
tensorflow-gpu         1.13.1

nishida@nozomi:~$ pip3 list | grep Keras
Keras                         2.3.1
Keras-Applications            1.0.6
Keras-Preprocessing           1.0.5

nishida@nozomi:~$ nvidia-smi
Wed Feb 12 15:38:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   49C    P8    N/A /  N/A |     28MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   43C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

nishida@nozomi:~$ nvidia-smi -L
GPU 0: GeForce GT 710 (UUID: GPU-6129c662-ef32-9b95-57d0-ac801f63e550)
GPU 1: GeForce GTX 1080 (UUID: GPU-407f2c9e-3be0-7ca0-abd2-0fbf1371c012)


!Success example (2020/02/12; Ubuntu 16.04; Nasahara home PC)

nishida@nasahome:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)

nishida@nasahome:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.33.01  Wed Nov 13 00:00:22 UTC 2019
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

nishida@nasahome:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

nishida@nasahome:~$ dpkg -l | grep "cudnn"
ii  libcudnn7      7.6.4.38-1+cuda10.1  amd64  cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.1  amd64  cuDNN development libraries and headers
ii  libcudnn7-doc  7.6.4.38-1+cuda10.1  amd64  cuDNN documents and samples

nishida@nasahome:~$ pip3 list | grep tensor
tensorboard                   1.15.0                
tensorflow-estimator          1.15.1                
tensorflow-gpu                1.15.0  

nishida@nasahome:~$ pip3 list | grep Keras
Keras                         2.3.1                
Keras-Applications            1.0.8                
Keras-Preprocessing           1.1.0        

nishida@nasahome:~$ nvidia-smi
Wed Feb 12 00:29:46 2020      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   21C    P8    N/A /  75W |    356MiB /  4032MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      Off  | 00000000:06:00.0 N/A |                  N/A |
| 40%   28C    P8    N/A /  N/A |      1MiB /   981MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1281      G   /usr/lib/xorg/Xorg                           291MiB |
|    0      3603      G   compiz                                        62MiB |
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

nishida@nasahome:~$ nvidia-smi -L
GPU 0: GeForce GTX 1050 Ti (UUID: GPU-950dd629-b863-9da2-1276-8549f80bbc3a)
GPU 1: GeForce GT 710 (UUID: GPU-6316c0ca-5a86-adf4-a526-a889a7894bc1)



!Failure example (2020/01/19 nozomi)
The GTX-1080 should use nvidia-440.44, but that version cannot be installed via apt. Instead:
$ sudo service lightdm stop
$ sudo service gdm stop
$ sudo apt install nvidia-418
After installing cuda, nvidia-smi gave the following error:
Failed to initialize NVML: Driver/library version mismatch
Removed cuda and the nvidia driver completely and reinstalled.
$ sudo apt purge nvidia*
$ sudo apt install nvidia-430
$ sudo reboot
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.64  Sun Oct 27 11:26:12 UTC 2019
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$ sudo reboot
$ nvidia-smi
Sun Jan 19 16:38:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   50C    P8    N/A /  N/A |     28MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   41C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

After that, installing cuda once again led to
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
so I started over. First, apt purge cuda* and nvidia*.

# https://developer.nvidia.com/cuda-10.1-download-archive-update2?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1604-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$ sudo reboot
$ nvidia-smi
Sun Jan 19 17:12:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   50C    P8    N/A /  N/A |     32MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   42C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Installing cuda-10-0 automatically installs nvidia-410.48.

Installing cuda-10-1 automatically installs nvidia-418.87. cuda-10-1 does not seem to work with tensorflow 1.13.1.

Force-installing the driver that matches the GTX-1080:
$ sudo ./NVIDIA-Linux-x86_64-440.44.run
$ nvidia-smi
Sun Jan 19 19:25:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   51C    P0    N/A /  N/A |      0MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   41C    P0    40W / 180W |      0MiB /  8119MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
After installing cuda, it stopped working...