tensorflow
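Integrating with the nvidia driver, cuda, and cuDNN (Ubuntu 16.04)
To use tensorflow, a deep learning library, conveniently and comfortably, it is important to make it work together with the GPU, its driver (nvidia driver), its libraries (cuda, cuDNN), and a front end (keras). Installing all of these so that they cooperate properly is quite a chore.
Using anaconda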
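This is the easiest way. conda installs cuda and cuDNN automatically, and it does not need administrator rights. (The nvidia kernel driver itself is not a conda package, so it has to be present on the machine already.)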
Install anaconda:
- Download Anaconda*.sh: https://www.anaconda.com/
- Run the following command (as a normal user, not as administrator):
$ bash Anaconda3-2023.07-1-Linux-x86_64.sh
- Note: By default, the anaconda base environment is activated at login. You can suppress this with:
$ conda config --set auto_activate_base false
- You can then activate anaconda manually with:
$ conda activate
- Update everything:
$ conda activate
$ conda update --all
$ conda clean --packages
Install tensorflow.
- You may need to look into which version to use so that it does not conflict with your python version.
- Install it with the following command (run it inside anaconda):
$ conda install "tensorflow=2.12.*=gpu_*"
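The python-version caveat above can be checked before installing. The following is a minimal sketch, not an official tool: the supported range for tensorflow 2.12 (Python 3.8 to 3.11) is an assumption copied from the tested-configurations table on tensorflow.org, and the names SUPPORTED_PYTHON and python_ok are made up for this example.

```python
import sys

# Assumption: range taken from the tested build configurations table on
# tensorflow.org for tensorflow 2.12. Extend the dict for other releases.
SUPPORTED_PYTHON = {"2.12": ((3, 8), (3, 11))}


def python_ok(tf_series, version=None):
    """Return True if the (major, minor) Python version is inside the range."""
    major_minor = tuple((version or sys.version_info)[:2])
    low, high = SUPPORTED_PYTHON[tf_series]
    return low <= major_minor <= high


if __name__ == "__main__":
    current = sys.version.split()[0]
    print("python", current, "usable with tensorflow 2.12:", python_ok("2.12"))
```

Run it inside the activated conda environment, so that the interpreter being tested is the one tensorflow will be installed into.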
Check that tensorflow works properly.
$ python <<EOF
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
EOF
nvidia driver
- Compatibility check: which nvidia driver versions suit the GPU: https://www.nvidia.co.jp/Download/index.aspx?lang=jp
- For the GPU (GTX 1080):
- Linux short lived: 435.21 https://www.nvidia.com/download/driverResults.aspx/150803/en-us
- Linux long lived: 440.44 https://www.nvidia.com/download/driverResults.aspx/156086/en-us
- The following command may also show which drivers are compatible (?):
$ ubuntu-drivers devices
- For the GPU (GTX 1080):
- How to check the nvidia driver version:
$ cat /proc/driver/nvidia/version
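If the driver version is needed inside a script, the NVRM line of that file can be parsed. This is only a sketch: it assumes the line has the same shape as in the logs further down this page ("... Kernel Module 418.87.00 ..."), and the function names are hypothetical.

```python
import re


def driver_version(text):
    """Extract the kernel-module version, e.g. '418.87.00', from the NVRM line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_driver_version(path="/proc/driver/nvidia/version"):
    """Read the file if present; return None when no nvidia module is loaded."""
    try:
        with open(path) as handle:
            return driver_version(handle.read())
    except OSError:
        return None


sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019"
print(driver_version(sample))  # 418.87.00
```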
- Installing the nvidia driver (replace 418 with the appropriate number found above!):
$ sudo apt install nvidia-driver-418
- Checking that nvidia works:
$ watch "nvidia-smi"
- It is probably best to reboot the computer at this point.
CUDA
- Official site
- Compatibility check: nvidia driver vs. CUDA: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- Download and install:
$ sudo dpkg -i cuda-repo-ubuntu*.deb
$ sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
$ sudo apt install cuda
- Installing from the CUDA Toolkit 10.1 update2 Archive apparently installs the nvidia driver automatically as well.
- CUDA 10.1 requires nvidia driver >= 418.39.
- How to check the CUDA version:
$ nvcc -V
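The release reported by nvcc can be compared against the minimum driver version mentioned above. A rough sketch under stated assumptions: the 10.1 entry is the figure quoted on this page, the 10.0 entry is an assumption taken from NVIDIA's release notes, and MIN_DRIVER and the function names are invented for this example.

```python
import re

# Minimum Linux driver per CUDA release. "10.1" is the value quoted on this
# page; "10.0" is an assumption taken from NVIDIA's release notes.
MIN_DRIVER = {"10.0": "410.48", "10.1": "418.39"}


def cuda_release(nvcc_output):
    """Return the release string, e.g. '10.0', from `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def as_tuple(version):
    """'418.87.00' -> (418, 87, 0), so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(driver, release):
    """True if the driver meets the minimum listed for this CUDA release."""
    return as_tuple(driver) >= as_tuple(MIN_DRIVER[release])


release = cuda_release("Cuda compilation tools, release 10.0, V10.0.130")
print(release, driver_is_new_enough("418.87.00", release))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as "418.9" sorting after "418.39".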
- CUDA versions such as v10.0 and v10.1 are installed side by side in directories like /usr/local/cuda-10.0 and /usr/local/cuda-10.1. Create a symbolic link named /usr/local/cuda that points to the version you mainly use, like this:
$ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
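To see which version the link currently selects, the link can be resolved from Python. This is a small sketch only; active_cuda_dir is a hypothetical helper, and the demonstration uses a temporary directory instead of the real /usr/local so that it runs without root.

```python
import os
import tempfile


def active_cuda_dir(link="/usr/local/cuda"):
    """Return the directory the link resolves to, or None if the link is absent."""
    if not os.path.lexists(link):
        return None
    return os.path.realpath(link)


# Demonstration in a scratch directory that mimics the /usr/local layout.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "cuda-10.0")
    os.mkdir(target)
    demo_link = os.path.join(root, "cuda")
    os.symlink(target, demo_link)
    print(os.path.basename(active_cuda_dir(demo_link)))  # cuda-10.0
```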
cuDNN
- Compatibility check: CUDA vs. cuDNN vs. tensorflow: https://www.tensorflow.org/install/source#linux
- ... For now, tensorflow_gpu = 1.13.1, cuDNN = 7.4, cuda = 10.0 seems to be a fairly standard combination.
- How to check the cuDNN version:
$ dpkg -l | grep "cudnn"
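The package version string carries both the cuDNN version and the CUDA release it was built for, so the output of the command above can be split mechanically. A minimal sketch, assuming lines shaped like the ones in the logs further down this page; cudnn_packages is a made-up name.

```python
import re

# Matches e.g. "ii  libcudnn7  7.6.4.38-1+cuda10.0  amd64  ..."
PATTERN = re.compile(r"^ii\s+(libcudnn\S*)\s+([\d.]+)-\d+\+cuda([\d.]+)")


def cudnn_packages(dpkg_output):
    """Map package name -> (cuDNN version, CUDA release it was built for)."""
    found = {}
    for line in dpkg_output.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            name, cudnn, cuda = match.groups()
            found[name] = (cudnn, cuda)
    return found


sample = """\
ii  libcudnn7      7.6.4.38-1+cuda10.0  amd64  cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.0  amd64  cuDNN development libraries and headers
"""
for name, (cudnn, cuda) in cudnn_packages(sample).items():
    print(name, "cuDNN", cudnn, "built for CUDA", cuda)
```

The CUDA part can then be compared by eye (or in code) with the release printed by nvcc -V.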
- Installing cuDNN:
- get cuDNN from here: https://developer.nvidia.com/rdp/cudnn-archive
$ sudo dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb libcudnn7-doc_7.6.4.38-1+cuda10.1_amd64.deb libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
- cuDNN documentation:
$ xpdf /usr/share/doc/libcudnn7-doc/cuDNN-Developer-Guide.pdf
- Testing cuDNN:
$ cp -pr /usr/src/cudnn_samples_v7 ./
$ cd cudnn_samples_v7/conv_sample/
$ make
$ ./conv_sample
tensorflow
- The latest version can be checked here: https://www.tensorflow.org/versions?hl=ja
- Install it with pip3.
- It is better not to use sudo, so that each user can choose their own tensorflow version, whether to use the GPU, and so on.
- When running pip3 without sudo, add the --user option.
- You can pin a version with ==.
- Install tensorflow-gpu rather than tensorflow.
- pip3 automatically pulls in required packages such as tensorboard.
Install tensorflow 1 on Ubuntu 16.04, non-root
$ pip3 install tensorboard==1.15.0 tensorflow-gpu==1.15.0 --user
$ pip3 install keras --user
Install tensorflow 2 on Ubuntu 16.04, non-root
$ pip3 install tensorflow-gpu --user
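After a --user install it can be useful to confirm which versions pip actually placed on the path of the current interpreter. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); installed_version is a hypothetical helper, and the package list is just the set of names used on this page.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("tensorflow", "tensorflow-gpu", "tensorboard", "keras"):
    print(name, installed_version(name))
```

This reports what the running python3 can see, which may differ from what a system-wide pip3 list shows for another user.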
Testing whether tensorflow-gpu was installed correctly
https://thr3a.hatenablog.com/entry/20180113/1515820265
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
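On tensorflow 2.1 or later, the same check can also be written with the public tf.config API instead of the internal device_lib module. This is a sketch under that version assumption; visible_gpus is a made-up name, and it returns None when tensorflow cannot be imported at all, so that the two failure modes (no tensorflow, no GPU) stay distinguishable.

```python
def visible_gpus():
    """Return GPU device names seen by tensorflow, or None if it is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumption: tensorflow >= 2.1, where this function is part of the public API.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed for this interpreter")
    elif not gpus:
        print("tensorflow works, but no GPU is visible")
    else:
        print("GPUs:", gpus)
```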
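Testing with actual deep learning that keras, tensorflow, cudnn, cuda, nvidia, and the GPU all work together
- Download this file: mnist_keras_CNN.py
- Then run: $ python3 mnist_keras_CNN.py
- At the same time, in another terminal, run $ watch nvidia-smi and watch the percentage (GPU-Util) toward the right in the middle of the table.
- If it stays at 0%, the GPU is probably not being used (a common problem).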
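Instead of watching the table by eye, the utilization figure can be read with nvidia-smi's query mode. The following is a sketch, not a monitoring tool: it relies on the --query-gpu and --format options of nvidia-smi, treats any non-numeric value (such as the N/A shown for the GT 710 in the logs on this page) as None, and the function names are hypothetical.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(text):
    """One entry per GPU: an int percentage, or None where the value is not numeric."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            values.append(int(line) if line.isdigit() else None)
    return values


def gpu_utilization():
    """Run nvidia-smi once; return None if the command is missing or fails."""
    try:
        done = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_utilization(done.stdout)


if __name__ == "__main__":
    print(gpu_utilization())
```

Calling gpu_utilization() a few times while the training script runs should show a value above 0 for the GPU that tensorflow is using.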
Successful example (2022/06/19 home PC)
# Installing cuDNN
# Download from the NVIDIA page: https://developer.nvidia.com/rdp/cudnn-archive
sudo dpkg -i /home/nishida/Downloads/cudnn-local-repo-ubuntu2004-8.4.0.27_1.0-1_amd64.deb
sudo dpkg -i /var/cudnn-local-repo-ubuntu2004-8.4.0.27/*deb
pip3 uninstall tensorflow-gpu tensorboard tensorboard-data-server tensorboard-plugin-wit tensorflow-estimator tensorflow-io-gcs-filesystem
pip3 install tensorflow-gpu
ipython3
In [1]: from tensorflow.python.client import device_lib
In [2]: !pip3 install keras
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: keras in /home/nishida/.local/lib/python3.8/site-packages (2.9.0)
WARNING: You are using pip version 21.1.2; however, version 22.1.2 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
In [3]: device_lib.list_local_devices()
2022-06-19 20:11:07.268020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-19 20:11:07.305552: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.313729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.314246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.625043: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.625366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.625645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-19 20:11:07.625915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /device:GPU:0 with 9662 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Out[3]:
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 1591752916153806298
 xla_global_id: -1,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 10131406848
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 12728540286538525868
 physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6"
 xla_global_id: 416903419]
$ nvidia-smi
Sun Jun 19 20:16:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   45C    P8    17W / 170W |    596MiB / 12051MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1395      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2341      G   /usr/lib/xorg/Xorg                182MiB |
|    0   N/A  N/A      2481      G   /usr/bin/gnome-shell               26MiB |
|    0   N/A  N/A      2817      G   /usr/lib/firefox/firefox          267MiB |
|    0   N/A  N/A      2979      G   ...AAAAAAAAA= --shared-files       13MiB |
|    0   N/A  N/A      3057      G   ...AAAAAAAAA= --shared-files       19MiB |
|    0   N/A  N/A      8948      G   ...veSuggestionsOnlyOnDemand       38MiB |
+-----------------------------------------------------------------------------+
nishida@nasahome2:~/Dropbox/2022_jitsuyo1/cudnn_samples_v8/conv_sample$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-c052c8c1-7db1-cd1c-e398-2ce9659b28db)
Successful example (2020/02/12; Ubuntu 16.04; nozomi server)
nishida@nozomi:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
nishida@nozomi:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
nishida@nozomi:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
nishida@nozomi:~$ dpkg -l | grep "cudnn"
ii  libcudnn7      7.6.4.38-1+cuda10.0  amd64  cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.0  amd64  cuDNN development libraries and headers
ii  libcudnn7-doc  7.6.4.38-1+cuda10.0  amd64  cuDNN documents and samples
nishida@nozomi:~$ pip3 list | grep tensor
tensorboard           1.13.1
tensorflow-estimator  1.13.0
tensorflow-gpu        1.13.1
nishida@nozomi:~$ pip3 list | grep Keras
Keras                1.0.6
Keras-Applications   1.0.6
Keras-Preprocessing  1.0.5
nishida@nozomi:~$ nvidia-smi
Wed Feb 12 15:38:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   49C    P8    N/A /  N/A |     28MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   43C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
nishida@nozomi:~$ nvidia-smi -L
GPU 0: GeForce GT 710 (UUID: GPU-6129c662-ef32-9b95-57d0-ac801f63e550)
GPU 1: GeForce GTX 1080 (UUID: GPU-407f2c9e-3be0-7ca0-abd2-0fbf1371c012)
Successful example (2020/02/12; Ubuntu 16.04; Nasahara home PC)
nishida@nasahome:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
nishida@nasahome:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
nishida@nasahome:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
nishida@nasahome:~$ dpkg -l | grep "cudnn"
ii  libcudnn7      7.6.4.38-1+cuda10.1  amd64  cuDNN runtime libraries
ii  libcudnn7-dev  7.6.4.38-1+cuda10.1  amd64  cuDNN development libraries and headers
ii  libcudnn7-doc  7.6.4.38-1+cuda10.1  amd64  cuDNN documents and samples
nishida@nasahome:~$ pip3 list | grep tensor
tensorboard           1.15.0
tensorflow-estimator  1.15.1
tensorflow-gpu        1.15.0
nishida@nasahome:~$ pip3 list | grep Keras
Keras                2.3.1
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.0
nishida@nasahome:~$ nvidia-smi
Wed Feb 12 00:29:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   21C    P8    N/A /  75W |    356MiB /  4032MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      Off  | 00000000:06:00.0 N/A |                  N/A |
| 40%   28C    P8    N/A /  N/A |      1MiB /   981MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1281      G   /usr/lib/xorg/Xorg                           291MiB |
|    0      3603      G   compiz                                        62MiB |
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+
nishida@nasahome:~$ nvidia-smi -L
GPU 0: GeForce GTX 1050 Ti (UUID: GPU-950dd629-b863-9da2-1276-8549f80bbc3a)
GPU 1: GeForce GT 710 (UUID: GPU-6316c0ca-5a86-adf4-a526-a889a7894bc1)
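On machines with two cards, as in the examples above, it helps to know which index belongs to which GPU before pointing tensorflow at one of them. The sketch below parses the nvidia-smi -L lines shown in these logs; the regular expression assumes exactly that line shape, and list_gpus is a made-up name.

```python
import re

# Matches e.g. "GPU 1: GeForce GTX 1080 (UUID: GPU-407f2c9e-...)"
LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)$")


def list_gpus(text):
    """Return [(index, name, uuid), ...] from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


sample = """\
GPU 0: GeForce GT 710 (UUID: GPU-6129c662-ef32-9b95-57d0-ac801f63e550)
GPU 1: GeForce GTX 1080 (UUID: GPU-407f2c9e-3be0-7ca0-abd2-0fbf1371c012)
"""
for index, name, _uuid in list_gpus(sample):
    print(index, name)
```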
Failed example (2020/01/19 nozomi)
The GTX-1080 should use nvidia-440.44, but that version cannot be installed with apt. Instead:
$ sudo service lightdm stop
$ sudo service gdm stop
$ sudo apt install nvidia-418
After installing cuda, nvidia-smi gave the following error:
Failed to initialize NVML: Driver/library version mismatch
Removed cuda and the nvidia driver completely and reinstalled them.
$ sudo apt purge nvidia*
$ sudo apt install nvidia-430
$ sudo reboot
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 430.64 Sun Oct 27 11:26:12 UTC 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$ sudo reboot
$ nvidia-smi
Sun Jan 19 16:38:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   50C    P8    N/A /  N/A |     28MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   41C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
After that, installing cuda brought back
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
so I started over. First, apt purge cuda* and nvidia*.
# https://developer.nvidia.com/cuda-10.1-download-archive-update2?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1604-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$ sudo reboot
$ nvidia-smi
Sun Jan 19 17:12:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   50C    P8    N/A /  N/A |     32MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   42C    P8     6W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Installing cuda-10-0 automatically pulls in nvidia-410.48.
Installing cuda-10-1 automatically pulls in nvidia-418.87. cuda-10-1 does not seem to work with tensorflow 1.13.1.
Force-install the driver that matches the GTX-1080:
$ sudo ./NVIDIA-Linux-x86_64-440.44.run
$ nvidia-smi
Sun Jan 19 19:25:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   51C    P0    N/A /  N/A |      0MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   41C    P0    40W / 180W |      0MiB /  8119MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
After installing cuda, it stopped working again...
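One common cause of the "Driver/library version mismatch" message is that the kernel module still loaded in memory and the newly installed user-space NVML library have different versions, which would be consistent with nvidia-smi working again after a reboot in the logs above. The sketch below compares the two; it assumes the library files follow the libnvidia-ml.so.<version> naming convention, the directory holding them differs between driver packages and is not filled in here, and the function names are hypothetical.

```python
import re


def library_versions(filenames):
    """Pull versions out of names such as 'libnvidia-ml.so.418.87.00'."""
    versions = set()
    for name in filenames:
        # The bare soname link 'libnvidia-ml.so.1' carries no version and is skipped.
        match = re.search(r"libnvidia-ml\.so\.(\d+\.\d+(?:\.\d+)?)$", name)
        if match:
            versions.add(match.group(1))
    return versions


def mismatch(loaded_module, filenames):
    """True when no installed NVML library matches the loaded kernel module."""
    return loaded_module not in library_versions(filenames)


# Example: module 430.64 is loaded, but only the 418.87.00 library is on disk.
files = ["libnvidia-ml.so.1", "libnvidia-ml.so.418.87.00"]
print(mismatch("430.64", files))     # True
print(mismatch("418.87.00", files))  # False
```

The loaded module version is the one printed by cat /proc/driver/nvidia/version; the file names would come from listing the driver's library directory.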