reseau:cloud:proxmox:lxcnvidia
Differences
Below are the differences between two revisions of the page.
| reseau:cloud:proxmox:lxcnvidia [2026/03/30 09:33] – [Installing the drivers on the Proxmox server] techer.charles_educ-valadon-limoges.fr | reseau:cloud:proxmox:lxcnvidia [2026/03/30 15:08] (Current version) – [Nvidia in the LXC Container] techer.charles_educ-valadon-limoges.fr | ||
|---|---|---|---|
| Line 144: | Line 144: | ||
| * Check that Proxmox sees both GPUs at the PCIe level | * Check that Proxmox sees both GPUs at the PCIe level | ||
| - | < | + | < |
| + | # lspci | grep -i nvidia | ||
| AF:00.0 NVIDIA Corporation TU104GL [Tesla T4] | AF:00.0 NVIDIA Corporation TU104GL [Tesla T4] | ||
| B0:00.0 NVIDIA Corporation TU104GL [Tesla T4] | B0:00.0 NVIDIA Corporation TU104GL [Tesla T4] | ||
| + | </ | ||
| + | |||
| + | * Check that the NVIDIA driver (nvidia-smi) lists both cards | ||
| + | |||
| + | < | ||
| + | # nvidia-smi -L | ||
| + | GPU 0: Tesla T4 (UUID: GPU-e5bc6842-5aa8-b29e-aa13-922b15c893f9) | ||
| + | GPU 1: Tesla T4 (UUID: GPU-6ac33a99-2cb8-eb7d-6097-f1c29e4d1e51) | ||
| + | </ | ||
| + | |||
| + | * Check that the driver loads both GPUs correctly: there must be no errors | ||
| + | |||
| + | < | ||
| + | # dmesg | grep -i nvidia | ||
| + | Possible errors: | ||
| + | GPU has fallen off the bus | ||
| + | PCIe error | ||
| + | failed to initialize gpu | ||
| + | RUNTIME_PM: error | ||
| + | Unknown chipset | ||
| + | NVRM: RmInitAdapter failed | ||
| + | </ | ||
| + | |||
| + | * Check that the UVM module detects both GPUs | ||
| + | |||
| + | < | ||
| + | # cat / | ||
| + | There should be two directories (0 and 1): | ||
| + | # nvidia-smi -q | grep -i " | ||
| + | Compute Mode : Default | ||
| + | root@siohyp2: | ||
| + | Model: | ||
| + | IRQ: 44 | ||
| + | GPU UUID: GPU-e5bc6842-5aa8-b29e-aa13-922b15c893f9 | ||
| + | Video BIOS: 90.04.b4.00.04 | ||
| + | Bus Type: PCIe | ||
| + | DMA Size: 47 bits | ||
| + | DMA Mask: 0x7fffffffffff | ||
| + | Bus Location: | ||
| + | Device Minor: | ||
| + | GPU Firmware: | ||
| + | GPU Excluded: | ||
| + | Model: | ||
| + | IRQ: 46 | ||
| + | GPU UUID: GPU-6ac33a99-2cb8-eb7d-6097-f1c29e4d1e51 | ||
| + | Video BIOS: 90.04.b4.00.04 | ||
| + | Bus Type: PCIe | ||
| + | DMA Size: 47 bits | ||
| + | DMA Mask: 0x7fffffffffff | ||
| + | Bus Location: | ||
| + | Device Minor: | ||
| + | GPU Firmware: | ||
| + | GPU Excluded: | ||
| + | </ | ||
| + | |||
| + | There are two cards with different PCI addresses: | ||
| + | * GPU 0 → 0000: | ||
| + | * GPU 1 → 0000: | ||
| + | |||
| + | * Inspect the PCIe topology and the memory status of the GPUs | ||
| + | |||
| + | < | ||
| + | # nvidia-smi topo -m | ||
| + | GPU0 GPU1 CPU Affinity | ||
| + | GPU0 | ||
| + | GPU1 NODE | ||
| + | |||
| + | Legend: | ||
| + | |||
| + | X = Self | ||
| + | SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI) | ||
| + | NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node | ||
| + | PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU) | ||
| + | PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge) | ||
| + | PIX = Connection traversing at most a single PCIe bridge | ||
| + | NV# = Connection traversing a bonded set of # NVLinks | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | # nvidia-smi -i 0 | ||
| + | Mon Mar 30 14:49:26 2026 | ||
| + | +-----------------------------------------------------------------------------------------+ | ||
| + | | NVIDIA-SMI 595.58.03 | ||
| + | +-----------------------------------------+------------------------+----------------------+ | ||
| + | | GPU Name | ||
| + | | Fan Temp | ||
| + | | | ||
| + | |=========================================+========================+======================| | ||
| + | | | ||
| + | | N/A | ||
| + | | | ||
| + | +-----------------------------------------+------------------------+----------------------+ | ||
| + | |||
| + | +-----------------------------------------------------------------------------------------+ | ||
| + | | Processes: | ||
| + | | GPU | ||
| + | | ID | ||
| + | |=========================================================================================| | ||
| + | | No running processes found | | ||
| + | +-----------------------------------------------------------------------------------------+ | ||
| + | </ | ||
| + | |||
| + | * Monitor the load on GPU 0 (utilization refreshed every second) | ||
| + | |||
| + | < | ||
| + | # nvidia-smi --query-gpu=utilization.gpu --format=csv --loop=1 -i 0 | ||
| + | </ | ||
| + | |||
| + | * Monitor the load on GPU 1 (utilization refreshed every second) | ||
| + | |||
| + | < | ||
| + | # nvidia-smi --query-gpu=utilization.gpu --format=csv --loop=1 -i 1 | ||
| </ | </ | ||
| ===== Nvidia in the LXC Container ===== | ===== Nvidia in the LXC Container ===== | ||
| Line 170: | Line 283: | ||
| {{ : | {{ : | ||
| {{ : | {{ : | ||
| + | |||
| + | <WRAP center round info > | ||
| + | No longer install the device **/ | ||
| + | </ | ||
| * Install the nvidia drivers and the **cuda** software suite in the LXC container (procedure similar to the one used on the Proxmox host). | * Install the nvidia drivers and the **cuda** software suite in the LXC container (procedure similar to the one used on the Proxmox host). | ||
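
The `nvidia-smi -L` check on this page can be turned into a reusable test instead of being read by eye. The sketch below is a hypothetical Bash helper (the names `count_gpus` and `check_gpu_count` are illustrative, not part of any NVIDIA tool). It assumes each GPU appears on its own line starting with `GPU <index>:`, as in the output shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the GPUs listed by `nvidia-smi -L`.
# Assumption: one line per GPU, starting with "GPU <index>:".

count_gpus() {
  # stdin: output of `nvidia-smi -L`; prints the number of GPU lines.
  # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps status 0.
  grep -cE '^GPU [0-9]+:' || true
}

check_gpu_count() {
  # $1: expected number of GPUs; stdin: output of `nvidia-smi -L`.
  local expected="$1" found
  found=$(count_gpus)
  if [ "$found" -eq "$expected" ]; then
    echo "OK: $found GPU(s) detected"
  else
    echo "ERROR: expected $expected GPU(s), found $found" >&2
    return 1
  fi
}
```

On the Proxmox host, `nvidia-smi -L | check_gpu_count 2` would then report success only if both Tesla T4 cards are listed.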
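
Scanning `dmesg | grep -i nvidia` by eye is error-prone on a long kernel log. The following is a minimal sketch (hypothetical function `scan_nvidia_errors`, Bash) that searches the log for the exact error strings listed on this page, case-insensitively, and fails if any of them is present. The pattern list is only the one given here and is not meant to be exhaustive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for the NVIDIA error strings listed on this page.
# The list is copied from the page and can be extended as needed.

NVIDIA_ERROR_PATTERNS=(
  "GPU has fallen off the bus"
  "PCIe error"
  "failed to initialize gpu"
  "RUNTIME_PM: error"
  "Unknown chipset"
  "NVRM: RmInitAdapter failed"
)

scan_nvidia_errors() {
  # stdin: kernel log (e.g. `dmesg` output).
  # Prints every matching line; returns 1 if a known error is found, 0 otherwise.
  local args=() p
  for p in "${NVIDIA_ERROR_PATTERNS[@]}"; do
    args+=(-e "$p")
  done
  # -i: ignore case, -F: fixed strings (no regex interpretation)
  if grep -iF "${args[@]}"; then
    return 1
  fi
  return 0
}
```

Run as root on the Proxmox host, for example `dmesg | scan_nvidia_errors && echo "No known NVIDIA errors"`.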
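
The page confirms that the two cards have different PCI addresses by reading the `nvidia-smi -q` and `/proc` output. A possible alternative, sketched here under assumptions, is to ask `nvidia-smi` for a CSV list with `--query-gpu=index,pci.bus_id --format=csv,noheader` (assumed to print lines such as `0, 00000000:AF:00.0`) and let `awk` flag duplicates. The function name `check_distinct_bus_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every GPU reports a distinct PCI bus ID.
# stdin: lines "index, pci.bus_id", e.g. from
#   nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader

check_distinct_bus_ids() {
  awk -F', *' '
    NF >= 2 {
      n++
      printf "GPU %s -> %s\n", $1, $2
      if (seen[$2]++) dup = 1          # same bus ID already seen
    }
    END {
      if (n == 0) { print "ERROR: no GPU listed" > "/dev/stderr"; exit 1 }
      if (dup)    { print "ERROR: duplicate PCI bus ID" > "/dev/stderr"; exit 1 }
      printf "OK: %d GPU(s) with distinct PCI addresses\n", n
    }'
}
```

Usage on the host: `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader | check_distinct_bus_ids`. Comparing the printed addresses with the `lspci` output (AF:00.0 and B0:00.0 on this server) ties each GPU index to its physical slot.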
reseau/cloud/proxmox/lxcnvidia.1774856026.txt.gz · Last modified: 2026/03/30 09:33 by techer.charles_educ-valadon-limoges.fr
