This document describes the steps needed to launch an OpenStack instance that can use one or more GPU resources.
It assumes that you have an OpenStack account and that your project environment is ready to use.
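The steps, in order, are: choose a flavor, choose an image, create the instance, and test the GPU from inside it.
First, choose a flavor (template) that supports PCI passthrough: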
$ openstack flavor show m3.gpu_xxxlarge_3 -c properties -c name
+------------+----------------------------------------------+
| Field      | Value                                        |
+------------+----------------------------------------------+
| name       | m3.gpu_xxxlarge_3                            |
| properties | pci_passthrough:alias='geforcertx-2080-ti:3' |
+------------+----------------------------------------------+
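The number after the colon in the pci_passthrough alias is the number of GPU devices requested by the flavor. As a minimal sketch, the count can be extracted from the property value with plain shell string handling. The function name gpu_count_from_alias is hypothetical and not part of the OpenStack client; the example runs on the value shown above and makes no OpenStack call.

```shell
# Hypothetical helper: print the device count encoded in a
# pci_passthrough:alias value such as 'geforcertx-2080-ti:3'.
gpu_count_from_alias() {
    # Drop the single quotes, then keep only the text after the last colon.
    alias_value=$(printf '%s' "$1" | tr -d "'")
    printf '%s\n' "${alias_value##*:}"
}

# Example with the property value shown above.
gpu_count_from_alias "pci_passthrough:alias='geforcertx-2080-ti:3'"
```

The helper only looks at the text after the last colon, so it accepts either the full property string or the bare alias value.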
Keep in mind the number of GPU devices. There are flavors for one or more cards.
$ openstack flavor list | grep 'm3.'
| 10 | m3.gpu_xlarge_2 | 61440 | 320 | 0 | 16 | True |
| 7 | m3.gpu_large_1 | 8192 | 80 | 0 | 4 | True |
| 8 | m3.gpu_large_2 | 8192 | 80 | 0 | 4 | True |
| 9 | m3.gpu_xlarge_1 | 30720 | 320 | 0 | 8 | True |
Choosing "m3.gpu_large_2" selects a template that builds an instance with a 4-core CPU, 8 GB of RAM, an 80 GB disk and two (2) GPU cards.
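Because grep removes the header line, the meaning of each column in the listing is easy to lose. The sketch below assumes the usual column order of openstack flavor list (ID, Name, RAM in MB, Disk in GB, Ephemeral, VCPUs, Is Public) and turns one row into a readable summary. The function name describe_flavor_row is hypothetical.

```shell
# Hypothetical helper: summarise one row of `openstack flavor list` output.
# Assumed columns: ID | Name | RAM (MB) | Disk (GB) | Ephemeral | VCPUs | Is Public
describe_flavor_row() {
    printf '%s\n' "$1" | awk -F'|' '{
        gsub(/ /, "", $3)   # flavor name without padding
        printf "%s: %d vCPUs, %d GB RAM, %d GB disk\n", $3, $7, $4 / 1024, $5
    }'
}

# Example with a row from the listing above.
describe_flavor_row "| 8 | m3.gpu_large_2 | 8192 | 80 | 0 | 4 | True |"
```

The number of GPU cards does not appear in this listing; it comes from the pci_passthrough property of the flavor.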
Next, list the active images prepared for GPU use:
$ openstack image list --status active -c Name | grep -i gpu
| C7_gpu_Anaconda3 |
| Rocky8_gpu_Anaconda3 |
The image must have the property 'img_hide_hypervisor_id' set to "true" in order to satisfy the constraints of the NVIDIA driver.
$ openstack image show -c properties --format json C7_gpu_Anaconda3
{
"properties": {
"os_hidden": false,
"os_hash_algo": "sha512",
"os_hash_value": "351860779f2940d312d94735a89707f064869c0794aaa0db11d1f115a49f08f9e4e8b1f094ddad85d66e9dfa05511fd1588f83eddc008244165c0aecd18e5a8d",
"image_state": "available",
"boot_roles": "user",
"user_id": "d3ab4b5a369f4e60a02c6e1f0552eab2",
"image_type": "snapshot",
"base_image_ref": "00891b31-1d54-4d2c-bb1a-90ff7068d446",
"owner_project_name": "proj01",
"img_hide_hypervisor_id": "true",
"image_location": "snapshot",
"owner_user_name": "ccarranza",
"instance_uuid": "10ca670a-7395-4bfc-bfbf-96e06acb6e59"
}
}
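The property check can be automated before launching anything. The following is a minimal sketch that does a plain text match on the JSON output rather than a real JSON parse, which is enough for the flat structure shown above. The function name image_hides_hypervisor is hypothetical; the example uses a canned fragment so that it runs without an OpenStack session.

```shell
# Hypothetical helper: succeed if the JSON on stdin contains
# "img_hide_hypervisor_id": "true" (text match only, not a JSON parser).
image_hides_hypervisor() {
    grep -Eq '"img_hide_hypervisor_id"[[:space:]]*:[[:space:]]*"true"'
}

# Example on a canned fragment of the output shown above.
if printf '%s\n' '{"properties": {"img_hide_hypervisor_id": "true"}}' | image_hides_hypervisor; then
    echo "image is GPU-ready"
else
    echo "image is missing img_hide_hypervisor_id"
fi
```

With a working session, the same check would be fed from the real command: openstack image show -c properties --format json C7_gpu_Anaconda3 | image_hides_hypervisor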
$ openstack server create --flavor m3.gpu_xlarge_1 \
    --image C7_gpu_Anaconda3 \
    --nic net-id="${OS_NET}" \
    --security-group default \
    --key-name your-key \
    --description " instance created for .... in .... context" \
    --availability-zone nova \
    yourname_gpu1
$ openstack server add floating ip yourname_gpu1 134.158.21.14
$ ping 134.158.21.14
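A freshly created instance may take some time to answer, so a single ping can fail even when everything is configured correctly. One possible approach, sketched below, is a small retry loop around the reachability test. The function name retry and the attempt and delay values are assumptions chosen for illustration.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0            # the command succeeded, stop retrying
        fi
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"      # wait only when another attempt remains
        fi
        n=$((n + 1))
    done
    return 1                    # every attempt failed
}

# Against the instance above, one might call (not executed here):
#   retry 30 10 ping -c 1 134.158.21.14
```

The same wrapper could be used around the ssh command if the SSH service starts later than the network.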
After logging in to the instance, the prompt shows that the Anaconda3 "base" environment is preconfigured.
$ ssh -i /home/guest/.ssh/your_priv_key.pem centos@134.158.21.14
(base) [centos@jc-gpu1 ~]$
(base) [centos@jc-gpu1 ~]$ conda env list
# conda environments:
#
base * /opt/anaconda3
We can now test the NVIDIA driver. The chosen flavor, m3.gpu_xlarge_1, attaches one (1) GPU card.
(base) [centos@jc-gpu1 ~]$ nvidia-smi
Fri Jul 31 14:07:54 2020
+---------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA 10.2 |
|-------------------------------+-------------------+-----------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr.|
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute|
|===============================+===================+=================|
| 0 GeForce RTX 208... Off |00000000:00:05.0Off| N/A |
| 10% 33C P0 1W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+-------------------+-----------------+
+---------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|===============================================================|
| No running processes found |
+---------------------------------------------------------------+
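To verify that the instance sees as many GPUs as the flavor promises, the machine-readable output of nvidia-smi is easier to check than the table above. The sketch below counts the lines produced by nvidia-smi --query-gpu=name --format=csv,noheader, which prints one line per GPU. The function names count_gpus and check_gpu_count are hypothetical, and the example uses a placeholder device name so that it runs on a machine without a GPU.

```shell
# Hypothetical helper: count non-empty lines on stdin (one line per GPU when
# fed the output of `nvidia-smi --query-gpu=name --format=csv,noheader`).
count_gpus() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: compare the count on stdin with the expected number.
check_gpu_count() {
    expected=$1
    actual=$(count_gpus)
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual GPU(s) visible"
    else
        echo "MISMATCH: expected $expected, found $actual"
        return 1
    fi
}

# Example with one canned line; "example-gpu" is a placeholder name.
printf 'example-gpu\n' | check_gpu_count 1
```

On the instance itself, the input would come from the real command: nvidia-smi --query-gpu=name --format=csv,noheader | check_gpu_count 1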
Work in progress.