Installing the NVIDIA driver, CUDA, cuDNN, and tensorflow-gpu on Ubuntu 16.04

lawlite19 2019-09-18 22:02:59
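  • I previously set this up on an Alibaba Cloud GPU server (covered in an earlier post); this guide starts from installing the NVIDIA driver.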

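1. Install the NVIDIA driver

1.1 Find the driver version to install

1.1.1 Official website

Look up the driver version that matches your GPU on the official NVIDIA website.

1.1.2 Check the recommended driver from the command line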
  • List the available drivers with ubuntu-drivers devices; the output looks like this:

    ubuntu@ubuntu-System-Product-Name:~$ ubuntu-drivers devices
    == cpu-microcode.py ==
    driver   : intel-microcode - distro free

    == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
    vendor   : NVIDIA Corporation
    modalias : pci:v000010DEd00001B06sv00001458sd0000374Dbc03sc00i00
    driver   : nvidia-410 - third-party free recommended
    driver   : nvidia-384 - distro non-free
    driver   : xserver-xorg-video-nouveau - distro free builtin
    driver   : nvidia-390 - third-party free
    driver   : nvidia-396 - third-party free

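- Note that the `ppa` has already been added here. Without it, the newest driver offered may be only `nvidia-384`, but `cuda-9.0` needs driver version `384.81` or later; with an older driver, `tensorflow-gpu` will report errors once it is installed
  - The figure below comes from: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

![CUDA version and the NVIDIA driver version it requires][4]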
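The version requirement above is easy to get wrong when compared by eye or as a decimal number. The following is a minimal sketch of the comparison, assuming the driver version is available as a dotted string such as the one `nvidia-smi` prints. The helper names `parse_version` and `driver_satisfies` are hypothetical, and `384.59` is only an illustrative older version.

```python
def parse_version(text):
    """Turn a dotted version string such as '410.66' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(installed, minimum):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum driver for cuda-9.0, as stated in the note above.
MIN_DRIVER_FOR_CUDA_9_0 = "384.81"

print(driver_satisfies("410.66", MIN_DRIVER_FOR_CUDA_9_0))  # True
print(driver_satisfies("384.59", MIN_DRIVER_FOR_CUDA_9_0))  # False
```

Comparing tuples of integers rather than floats matters here: a version such as `384.130` is newer than `384.81`, which a float comparison would get backwards.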

- Add the `ppa`:
  - `sudo add-apt-repository ppa:graphics-drivers/ppa`   (requires network access; disable any proxy)
  - `sudo apt update`
- Running `ubuntu-drivers devices` now shows the result above
- Install:
  - Dependencies that may be needed: `sudo apt install dkms build-essential linux-headers-generic`
  - Some machines may need the `nouveau` module disabled; see: https://blog.csdn.net/u012235003/article/details/54575758
  - `sudo apt-get install linux-headers-$(uname -r)`
  - `sudo apt install nvidia-410`
  - Reboot the machine
- Verify:
  - `nvidia-smi`
  - The output looks like this:

``` bash
(wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:/usr/local/cuda-10.0/bin$ nvidia-smi
Thu Oct 25 15:49:46 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.66       Driver Version: 410.66       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8    20W / 250W |     42MiB / 11174MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   50C    P8    20W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       949      G   /usr/lib/xorg/Xorg                            39MiB |
+-----------------------------------------------------------------------------+
```

  • GPU usage while a program is running:

``` bash
ubuntu@ubuntu-System-Product-Name:~$ nvidia-smi
Thu Oct 25 21:20:00 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.66       Driver Version: 410.66       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   53C    P2   128W / 250W |  10776MiB / 11174MiB |     44%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   52C    P8    21W / 250W |  10631MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       949      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      3009      C   python                                     10725MiB |
|    1      3009      C   python                                     10619MiB |
+-----------------------------------------------------------------------------+
```

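The table printed by `nvidia-smi` is meant for reading, not for scripts. For monitoring, `nvidia-smi` also has a CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below shows one way to read the same memory and utilization numbers from it; the function names and dictionary keys are hypothetical, and it assumes every queried field is reported as a plain integer on your GPU.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,name,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text):
    """Parse the output of nvidia-smi's CSV query mode (noheader, nounits)."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, used, total, util = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs a working driver)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_csv(output)
```

Calling `query_gpus()` on the machine above should yield two entries, one per GTX 1080 Ti. The parser is kept separate from the subprocess call so that it can be exercised on saved output without a GPU.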
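2. Install CUDA

  • Official site: https://developer.nvidia.com/cuda-toolkit-archive
  • Choose the version you want and download it; cuda-9.0 is used here
  • Install
    • chmod +x cuda_9.0.176_384.81_linux-run
    • sudo ./cuda_9.0.176_384.81_linux-run
    • Follow the prompts; since a newer driver is already installed, decline the bundled 384.81 driver when asked
    • Add the environment variables
    • vim ~/.bashrc
    • Append the following lines, then reload with source ~/.bashrc
# cuda9.0
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH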
  • Test 1
    • nvcc -V
    • The output is shown below; the version is V9.0.176
      (wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:~/wangyongzhi/software$ nvcc -V
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2017 NVIDIA Corporation
      Built on Fri_Sep__1_21:08:03_CDT_2017
      Cuda compilation tools, release 9.0, V9.0.176
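  • Test 2
    • If you chose to install the Examples during installation, an NVIDIA_CUDA-9.0_Samples directory is created under ~
    • Enter it: cd NVIDIA_CUDA-9.0_Samples
    • make
    • Go to the NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release directory
    • Run ./deviceQuery; it prints information similar to the following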
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          10.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11174 MBytes (11717181440 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1683 MHz (1.68 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
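When `nvcc -V` fails or reports an unexpected version, the usual cause is that the two environment variables from the install step are missing or point somewhere else. The following is a small sketch for checking both, assuming the `/usr/local/cuda-9.0` layout used in this post; `missing_cuda_dirs` and `nvcc_release` are hypothetical helper names.

```python
import re


def missing_cuda_dirs(env, cuda_home="/usr/local/cuda-9.0"):
    """Return the CUDA directories that are absent from PATH / LD_LIBRARY_PATH."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for variable, directory in expected.items():
        # Compare entries with trailing slashes removed, since the exports
        # may be written as /usr/local/cuda-9.0/bin/ or without the slash.
        entries = [entry.rstrip("/") for entry in env.get(variable, "").split(":")]
        if directory not in entries:
            missing.append((variable, directory))
    return missing


def nvcc_release(nvcc_output):
    """Extract the release string (e.g. '9.0') from the text printed by nvcc -V."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None
```

In a real shell session you would pass `os.environ` to `missing_cuda_dirs` and the captured output of `nvcc -V` to `nvcc_release`; an empty list and `'9.0'` indicate that the setup above took effect.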

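3. Install cuDNN

Download the cuDNN release that matches your CUDA version (the build for CUDA 9.0 is used here).

  • Install
    • tar -zxvf cudnn-9.0-linux-x64-v7.3.1.20.tgz
    • Copy the contents of the extracted cuda folder into the matching subfolders of /usr/local/cuda-9.0 (cuda/include into include, cuda/lib64 into lib64)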
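The copy step can be scripted. Below is a minimal sketch, assuming the archive unpacks to a `cuda` folder containing `include/cudnn*.h` and `lib64/libcudnn*` (the layout of the cuDNN 7 tarballs); `copy_cudnn` is a hypothetical helper name, and writing into `/usr/local/cuda-9.0` requires root.

```python
import glob
import os
import shutil


def copy_cudnn(extracted_cuda_dir, cuda_home):
    """Copy cuDNN headers and libraries into an existing CUDA install."""
    copied = []
    patterns = [("include", "cudnn*.h"), ("lib64", "libcudnn*")]
    for subdir, pattern in patterns:
        target_dir = os.path.join(cuda_home, subdir)
        sources = sorted(glob.glob(os.path.join(extracted_cuda_dir, subdir, pattern)))
        for source in sources:
            target = os.path.join(target_dir, os.path.basename(source))
            if os.path.lexists(target):
                os.remove(target)  # replace files left by an older cuDNN
            # follow_symlinks=False keeps links such as libcudnn.so.7 as
            # links instead of duplicating the library they point to.
            shutil.copy2(source, target, follow_symlinks=False)
            copied.append(target)
    return copied
```

For the paths in this post the call would be `copy_cudnn("cuda", "/usr/local/cuda-9.0")`, run from the directory where the archive was extracted. Preserving the symlinks mirrors what `cp -P` does on the command line.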

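4. Install Anaconda and tensorflow-gpu

  • After installing Anaconda, add it to PATH in ~/.bashrc

# anaconda3
export PATH=/home/ubuntu/anaconda3/bin:$PATH
  • Create a virtual environment so that other users' environments are not polluted

    • conda create -n xxx python=3.6
    • conda activate xxx (source activate xxx on older conda releases)
    • conda install tensorflow-gpu
  • Test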

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
  • The following is printed:
2018-10-25 16:25:35.683507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.72GiB
2018-10-25 16:25:35.783459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-25 16:25:35.783843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-10-25 16:25:35.784321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1
2018-10-25 16:25:36.069610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-25 16:25:36.069634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1
2018-10-25 16:25:36.069637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y
2018-10-25 16:25:36.069639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N
2018-10-25 16:25:36.069852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10367 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-25 16:25:36.101498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10409 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
2018-10-25 16:25:36.134430: I tensorflow/core/common_runtime/direct_session.cc:288] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
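Reading the log by eye works, but a script can confirm the same thing. The following is a rough sketch for TensorFlow 1.x, assuming the `device_lib.list_local_devices()` helper from `tensorflow.python.client` is available in your build (it is not part of the documented public API); `gpu_device_names` and `visible_gpus` are hypothetical names.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose type is GPU."""
    return [device.name for device in devices if device.device_type == "GPU"]


def visible_gpus():
    """List the GPUs TensorFlow 1.x can see (TensorFlow is imported lazily)."""
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

On the machine in this post `visible_gpus()` would be expected to return two entries; an empty list usually means that the CPU-only package was installed or that the CUDA and cuDNN libraries were not found.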

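5. Switching between multiple CUDA versions

  • cuda-9.0 is installed under /usr/local/

    • As shown below, the installer also creates a symlink, /usr/local/cuda, that points to /usr/local/cuda-9.0/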
      (wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:/usr/local$ ll
      总用量 48
      drwxr-xr-x 12 root root 4096 10月 25 14:51 ./
      drwxr-xr-x 13 root root 4096 10月 25 09:39 ../
      drwxr-xr-x  2 root root 4096 4月  21  2016 bin/
      lrwxrwxrwx  1 root root   19 10月 25 00:41 cuda -> /usr/local/cuda-9.0/
      drwxr-xr-x 19 root root 4096 10月 25 14:52 cuda-10.0/
      drwxr-xr-x 18 root root 4096 10月 25 00:41 cuda-9.0/
      drwxr-xr-x  2 root root 4096 4月  21  2016 etc/
      drwxr-xr-x  2 root root 4096 4月  21  2016 games/
      drwxr-xr-x  2 root root 4096 4月  21  2016 include/
      drwxr-xr-x  4 root root 4096 4月  21  2016 lib/
      lrwxrwxrwx  1 root root    9 10月 24 14:52 man -> share/man/
      drwxr-xr-x  2 root root 4096 4月  21  2016 sbin/
      drwxr-xr-x  8 root root 4096 4月  21  2016 share/
      drwxr-xr-x  2 root root 4096 4月  21  2016 src/
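  • So install any other CUDA version as usual, then repoint the symlink at the version you want. If ~/.bashrc exports the /usr/local/cuda-9.0 paths directly, change them to /usr/local/cuda so that the switch takes effect.

    sudo rm /usr/local/cuda
    sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda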
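If versions are switched often, the two commands can be wrapped in a small helper. This is a sketch only, assuming the `/usr/local/cuda-<version>` naming shown in the listing above; `switch_cuda` is a hypothetical name, and the script needs root to modify `/usr/local`.

```python
import os


def switch_cuda(version, prefix="/usr/local"):
    """Point <prefix>/cuda at <prefix>/cuda-<version> and return the target."""
    target = os.path.join(prefix, "cuda-" + version)
    link = os.path.join(prefix, "cuda")
    if not os.path.isdir(target):
        raise FileNotFoundError("CUDA " + version + " is not installed at " + target)
    if os.path.lexists(link) and not os.path.islink(link):
        raise RuntimeError(link + " exists and is not a symlink; refusing to replace it")
    # Build the new link under a temporary name, then rename it over the old
    # one so that the cuda link never disappears in between.
    temporary = link + ".tmp"
    if os.path.lexists(temporary):
        os.remove(temporary)
    os.symlink(target, temporary)
    os.replace(temporary, link)
    return target
```

`switch_cuda("10.0")` has the same effect as the two shell commands. The checks in front guard against the two easy mistakes: pointing at a version that is not installed, and deleting a real directory that happens to be named `cuda`.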
