Using GPU resources

Building with the OpenStack CLI

This document describes the steps needed to launch an OpenStack instance that can use one or more GPUs.

It assumes that you have an OpenStack account and that your project environment is already set up.
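Before running the commands, it can help to check that the environment really is ready. The following is a minimal sketch. It assumes that credentials come either from a project RC file (which sets OS_AUTH_URL) or from a clouds.yaml entry selected with OS_CLOUD, and that OS_NET holds the ID of your project network, as used by the server creation command. The function name check_os_env is hypothetical.

```shell
# Hypothetical pre-flight check: returns non-zero if something is missing.
# Assumes credentials from a project RC file (OS_AUTH_URL) or clouds.yaml (OS_CLOUD).
check_os_env() {
  local missing=0
  if [ -z "${OS_CLOUD:-}" ] && [ -z "${OS_AUTH_URL:-}" ]; then
    echo "no credentials: source your project RC file or set OS_CLOUD" >&2
    missing=1
  fi
  # OS_NET is this document's convention for the network ID passed to --nic.
  if [ -z "${OS_NET:-}" ]; then
    echo "OS_NET is not set (ID of the network passed to --nic)" >&2
    missing=1
  fi
  if ! command -v openstack >/dev/null 2>&1; then
    echo "openstack client not found in PATH" >&2
    missing=1
  fi
  return "$missing"
}
```

This only checks that the variables are set. Running `openstack token issue` is a stronger test, since it actually authenticates against the identity service.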

Steps to follow

In order, we must:

Choose a flavor for the instance
Choose the right image (one prepared for GPU resources)
Build the instance
Make the instance accessible

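Choosing a flavor

You must choose a flavor (template) that supports PCI passthrough of the GPU cards. Its properties include a pci_passthrough:alias entry giving the device alias and the number of cards: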

$ openstack flavor show m3.gpu_xxxlarge_3 -c properties -c name
+------------+----------------------------------------------+
| Field      | Value                                        |
+------------+----------------------------------------------+
| name       | m3.gpu_xxxlarge_3                            |
| properties | pci_passthrough:alias='geforcertx-2080-ti:3' |
+------------+----------------------------------------------+
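The GPU count can also be read from the flavor properties in a script. A minimal sketch, assuming the alias has the form <alias-name>:<count> as in the output above. Depending on the client version, `openstack flavor show FLAVOR -f value -c properties` may print the property either as pci_passthrough:alias='...' (as above) or as a Python-style dict; the pattern accepts both. The function name gpu_count_from_properties is hypothetical, and an alias listing several devices (comma-separated) is not handled.

```shell
# Hypothetical helper: print the GPU count found in a flavor's properties string.
# Accepts  pci_passthrough:alias='geforcertx-2080-ti:3'
# and      {'pci_passthrough:alias': 'geforcertx-2080-ti:3'}
# Prints nothing if no single-device alias is present.
gpu_count_from_properties() {
  printf '%s\n' "$1" |
    sed -n "s/.*pci_passthrough:alias'\{0,1\}[=:] *'[^':]*:\([0-9][0-9]*\)'.*/\1/p"
}
```

For example, `gpu_count_from_properties "$(openstack flavor show m3.gpu_xxxlarge_3 -f value -c properties)"` should print 3 for the flavor shown above.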

Keep the number of GPU devices in mind: there are flavors for one or more cards.

$ openstack flavor list | grep -E 'Name|m3\.'
| ID | Name            |   RAM | Disk | Ephemeral | VCPUs | Is Public |
| 10 | m3.gpu_xlarge_2 | 61440 |  320 |         0 |    16 | True      |
| 7  | m3.gpu_large_1  |  8192 |   80 |         0 |     4 | True      |
| 8  | m3.gpu_large_2  |  8192 |   80 |         0 |     4 | True      |
| 9  | m3.gpu_xlarge_1 | 30720 |  320 |         0 |     8 | True      |

Choosing "m3.gpu_large_2" means choosing a template that builds an instance with 4 vCPUs, 8 GB of RAM (8192 MB), an 80 GB disk and two (2) GPU cards.
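The same reading of the flavor table can be done with awk. A minimal sketch: it reads the table output of `openstack flavor list` on standard input (the columns are ID, Name, RAM in MB, Disk in GB, Ephemeral, VCPUs, Is Public) and converts RAM to GB. As an assumption, it takes the numeric suffix of each m3.gpu_* name as its GPU count; this matches m3.gpu_xxxlarge_3, m3.gpu_large_2 and m3.gpu_xlarge_1 in this document, but confirm it with `openstack flavor show`. The function name summarize_gpu_flavors is hypothetical.

```shell
# Hypothetical helper: summarise m3.gpu_* rows of "openstack flavor list" (stdin).
# Field layout with -F'|': $3 Name, $4 RAM (MB), $5 Disk (GB), $7 VCPUs.
# ASSUMPTION: the last "_"-separated part of the name is the GPU count.
summarize_gpu_flavors() {
  awk -F'|' '
    $3 ~ /m3\.gpu_/ {
      name = $3; gsub(/ /, "", name)
      ram = $4 + 0; disk = $5 + 0; vcpus = $7 + 0
      n = split(name, parts, "_")
      printf "%s: %d vCPUs, %d GB RAM, %d GB disk, %s GPU(s)\n",
             name, vcpus, ram / 1024, disk, parts[n]
    }'
}
```

Running `openstack flavor list | summarize_gpu_flavors` prints, for the m3.gpu_large_2 row, `m3.gpu_large_2: 4 vCPUs, 8 GB RAM, 80 GB disk, 2 GPU(s)`; the header row is skipped because its Name field does not match.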

Choosing an image

$ openstack image list --status active -c Name | grep -i gpu
| C7_gpu_Anaconda3     |
| Rocky8_gpu_Anaconda3 |

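The image must have the property 'img_hide_hypervisor_id': "true" to satisfy the constraints of the NVIDIA driver. This property hides the hypervisor signature from the guest, so the driver does not detect that it is running in a virtual machine.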

$ openstack image show -c properties --format json C7_gpu_Anaconda3
{
  "properties": {
    "os_hidden": false,
    "os_hash_algo": "sha512",
    "os_hash_value": "351860779f2940d312d94735a89707f064869c0794aaa0db11d1f115a49f08f9e4e8b1f094ddad85d66e9dfa05511fd1588f83eddc008244165c0aecd18e5a8d",
    "image_state": "available",
    "boot_roles": "user",
    "user_id": "d3ab4b5a369f4e60a02c6e1f0552eab2",
    "image_type": "snapshot",
    "base_image_ref": "00891b31-1d54-4d2c-bb1a-90ff7068d446",
    "owner_project_name": "proj01",
    "img_hide_hypervisor_id": "true",
    "image_location": "snapshot",
    "owner_user_name": "ccarranza",
    "instance_uuid": "10ca670a-7395-4bfc-bfbf-96e06acb6e59"
  }
}
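To check an image before using it, the JSON above can be tested for the property. A minimal sketch: it uses a plain grep rather than a JSON parser, so it assumes the one-key-per-line layout printed by the client. If jq is installed, `jq -e '.properties.img_hide_hypervisor_id == "true"'` is a sturdier test. The function name image_hides_hypervisor is hypothetical.

```shell
# Hypothetical helper: succeed if the image-properties JSON on stdin contains
# "img_hide_hypervisor_id": "true". Plain text match, no JSON parsing.
image_hides_hypervisor() {
  grep -q '"img_hide_hypervisor_id": *"true"'
}
```

Usage: `openstack image show -c properties --format json C7_gpu_Anaconda3 | image_hides_hypervisor && echo ok`. For an image you own that lacks the property, `openstack image set --property img_hide_hypervisor_id=true IMAGE` adds it.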

Building the instance

$ openstack server create --flavor m3.gpu_xlarge_1 \
        --image C7_gpu_Anaconda3 \
        --nic net-id=${OS_NET} \
        --security-group default \
        --key-name your-key \
        --description "instance created for .... in .... context" \
        --availability-zone nova \
        yourname_gpu1
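The creation step can be wrapped so that the command is printed before it is run. A minimal sketch with a hypothetical function gpu_server_create: it prints the command by default and runs it only when RUN=1. It adds --wait, an option of `openstack server create` that blocks until the build finishes; --description is left out for brevity.

```shell
# Hypothetical wrapper: gpu_server_create NAME FLAVOR IMAGE KEYNAME
# Dry run by default (prints the command); set RUN=1 to execute it.
gpu_server_create() {
  local name=$1 flavor=$2 image=$3 key=$4
  if [ -z "${OS_NET:-}" ]; then
    echo "OS_NET is not set (network ID for --nic)" >&2
    return 1
  fi
  # Rebuild the positional parameters as the full command line.
  set -- openstack server create \
    --flavor "$flavor" \
    --image "$image" \
    --nic "net-id=$OS_NET" \
    --security-group default \
    --key-name "$key" \
    --availability-zone nova \
    --wait \
    "$name"
  if [ "${RUN:-0}" = 1 ]; then
    "$@"                  # run the command for real
  else
    printf '%s\n' "$*"    # dry run: print the command only
  fi
}
```

Usage: `gpu_server_create yourname_gpu1 m3.gpu_xlarge_1 C7_gpu_Anaconda3 your-key` to review the command, then `RUN=1 gpu_server_create ...` with the same arguments to build the instance.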

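Add a floating IP

The floating IP must already be allocated to your project. Attach it to the instance and check that it answers:

$ openstack server add floating ip yourname_gpu1 134.158.21.14

$ ping 134.158.21.14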
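Allocation and attachment can also be scripted. A minimal sketch, assuming a hypothetical external network named public (use your site's name) and Linux iputils ping, where -W is a timeout in seconds. The function names attach_floating_ip and wait_for_ping are hypothetical. Ping and SSH only get through if the security group allows them; if the default group does not, rules such as `openstack security group rule create --protocol icmp default` and `openstack security group rule create --protocol tcp --dst-port 22 default` open ICMP and SSH.

```shell
# Hypothetical helper: allocate a floating IP on EXT_NET, attach it to SERVER,
# and print the address. Usage: attach_floating_ip SERVER EXT_NET
attach_floating_ip() {
  local server=$1 ext_net=$2 ip
  ip=$(openstack floating ip create -f value -c floating_ip_address "$ext_net") || return 1
  openstack server add floating ip "$server" "$ip" || return 1
  printf '%s\n' "$ip"
}

# Hypothetical helper: ping IP once per try (default 30 tries, 5 s apart)
# until it answers. Returns 0 on the first reply, 1 if none arrives.
wait_for_ping() {
  local ip=$1 tries=${2:-30} i=0
  while [ "$i" -lt "$tries" ]; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}
```

Usage: `ip=$(attach_floating_ip yourname_gpu1 public) && wait_for_ping "$ip"`.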

Check

Log in over SSH. Inside the instance, the prompt shows that the Anaconda3 "base" environment is preconfigured.

$ ssh -i /home/guest/.ssh/your_priv_key.pem centos@134.158.21.14

    (base) [centos@jc-gpu1 ~]$
    (base) [centos@jc-gpu1 ~]$ conda env list
    # conda environments:
    #
    base                  *  /opt/anaconda3

We can now test the NVIDIA driver with nvidia-smi. The chosen flavor, m3.gpu_xlarge_1, attaches one (1) GPU card, so a single device is listed:

(base) [centos@jc-gpu1 ~]$ nvidia-smi
Fri Jul 31 14:07:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:00:05.0 Off |                  N/A |
| 10%   33C    P0     1W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
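The same check can be scripted, which is handy on flavors with several cards. A minimal sketch: pipe the output of `nvidia-smi -L`, which prints one line per GPU starting with "GPU <index>:", into a hypothetical function check_gpu_count together with the number of cards the flavor should attach (1 for m3.gpu_xlarge_1).

```shell
# Hypothetical helper: compare the GPUs listed on stdin (output of
# "nvidia-smi -L") with the expected count. Usage: nvidia-smi -L | check_gpu_count 1
check_gpu_count() {
  local expected=$1 found
  # grep -c prints 0 when nothing matches, e.g. if nvidia-smi failed.
  found=$(grep -c '^GPU [0-9]')
  if [ "$found" -eq "$expected" ]; then
    echo "OK: $found GPU(s) visible"
  else
    echo "expected $expected GPU(s), found $found" >&2
    return 1
  fi
}
```

On the instance above, `nvidia-smi -L | check_gpu_count 1` should report one visible GPU.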

Building with the dashboard

This section is a work in progress.

Resources

NVIDIA CUDA Support: "GPU Support (NVIDIA CUDA & AMD ROCm)", Apptainer User Guide 1.0 documentation