Installing the RTX 2080 GPU Driver

I recently bought a Dell server for running TensorFlow. The graphics cards were added separately, so the GPU driver has to be installed by hand.

Server configuration

  • Server model: DELL PowerEdge R730
  • CPU: 2 × Intel(R) Xeon(R) E5-2650 v4
  • Memory: 8 × 32 GB
  • Disks: 2 × 1.2 TB, RAID 0
  • GPU: 2 × Nvidia RTX 2080
  • OS: Ubuntu 18.04

Automated installation from the standard Ubuntu repositories

First, detect the graphics card model and the recommended driver. Run the following command:

root@rohn-PowerEdge-R730:/home/rohn#  ubuntu-drivers devices
== /sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0 ==
modalias : pci:v000010DEd00001E82sv00001043sd00008674bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-415 - third-party free
driver : nvidia-driver-430 - third-party free recommended
driver : nvidia-driver-418 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin

The output shows that the system has detected the Nvidia RTX 2080 card. CUDA 10.0 requires driver version 410.x or later, and the recommended driver is nvidia-driver-430.
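
For scripting, the recommended package name can be pulled out of that output. This is a minimal sketch, assuming the output format shown above (a `driver :` line that ends in `recommended`); the function name `recommended_driver` is hypothetical.

```shell
# Print the package name from the line that ubuntu-drivers marks "recommended".
# Reads `ubuntu-drivers devices` output on stdin; prints nothing if no line matches.
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Usage on a real machine (assumes ubuntu-drivers is installed):
#   ubuntu-drivers devices | recommended_driver
```

Installing that one package by name is an alternative to letting `ubuntu-drivers autoinstall` pick it.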

Install the driver:

sudo ubuntu-drivers autoinstall

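Dell's thermal estimate for uncertified PCIe devices is inaccurate, so the server raises its fan speed by default. The ipmitool command below disables this response for third-party PCIe cards.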

sudo apt install ipmitool
sudo ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00
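
It can be useful to keep the related raw commands in one place so the change can be inspected or reverted. The sketch below only prints the commands and does not run them. The `disable` bytes are the ones used above; the `enable` and `status` bytes are assumptions based on community reports for 13th-generation PowerEdge servers and should be verified before use. The helper name `pcie_fan_cmd` is hypothetical.

```shell
# Build (but do not run) the Dell raw IPMI command for the third-party
# PCIe card fan response. "disable" matches the command used above;
# "enable" and "status" are ASSUMPTIONS from community reports, so
# verify them against Dell documentation before executing.
pcie_fan_cmd() {
    case "$1" in
        disable) echo "ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00" ;;
        enable)  echo "ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00" ;;
        status)  echo "ipmitool raw 0x30 0xce 0x01 0x16 0x05 0x00 0x00 0x00" ;;
        *)       echo "usage: pcie_fan_cmd disable|enable|status" >&2; return 1 ;;
    esac
}

# Usage: print the command, review it, then run it with root privileges.
#   pcie_fan_cmd status
```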

After the installation finishes, reboot the system:

reboot

Verify that the driver is loaded:

root@rohn-PowerEdge-R730:~# nvidia-smi
Mon Jun 3 09:56:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   28C    P8    17W / 215W |      0MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    Off  | 00000000:82:00.0 Off |                  N/A |
| 22%   29C    P8    20W / 215W |      0MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
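
The same check can be scripted. A minimal sketch, assuming only the major version matters for the 410.x requirement mentioned above; `driver_at_least` is a hypothetical helper name.

```shell
# Return success if a driver version string such as "430.14" has a major
# version of at least the given minimum (410 for CUDA 10.0, per the text above).
driver_at_least() {
    major=${1%%.*}            # strip everything from the first dot onward
    [ "$major" -ge "$2" ]
}

# Usage on a machine with the driver loaded:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$ver" 410 && echo "driver $ver is new enough for CUDA 10.0"
```

Querying a single field with `--query-gpu` avoids parsing the table layout, which changes between driver releases.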

Install CUDA

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

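# Install NVIDIA driver (skip this step if a newer driver, such as the
# nvidia-driver-430 installed above, is already present)
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi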

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-10-0 \
libcudnn7=7.4.1.5-1+cuda10.0 \
libcudnn7-dev=7.4.1.5-1+cuda10.0


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
&& sudo apt-get update \
&& sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
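
Once the packages are installed, it may help to confirm the toolkit version. This is a sketch, assuming the `cuda-10-0` package placed the toolkit under `/usr/local/cuda-10.0` (the default location for NVIDIA's packages) and that `nvcc --version` prints a line containing `release X.Y`; `cuda_release` is a hypothetical helper name.

```shell
# Extract the release number (e.g. "10.0") from `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (the install path is an assumption, adjust if yours differs):
#   export PATH=/usr/local/cuda-10.0/bin:$PATH
#   nvcc --version | cuda_release
```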