
UPDATED TOPIC! Survival guide for the upcoming GPU upgrades (more total power, but fewer GPUs)

Presenter: Sergey Mashchenko, SHARCNET

In the coming months, national systems will undergo significant upgrades. In particular, older GPUs (P100, V100) will be replaced with the newest H100 GPUs from NVIDIA. The total GPU computing power of the upgraded systems will grow by a factor of 3.5, but the number of GPUs will go down significantly (from 3200 to 2100). This will present a significant challenge for our users, as "business as usual" (using a whole GPU for each process or MPI rank) will no longer be feasible in most cases. Fortunately, NVIDIA provides two powerful technologies that can be used to mitigate this situation: MPS (Multi-Process Service) and MIG (Multi-Instance GPU). This presentation will walk you through both technologies and discuss the ways they can be used on our clusters. We will also discuss how to figure out which approach will work best for your code. A live demonstration will be given at the end.
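
The figures above imply how much more powerful each new GPU is on average, which is why one process per GPU becomes wasteful. The following is only a back-of-the-envelope sketch: it assumes the factor of 3.5 refers to aggregate computing power and compares GPUs on average, and the variable names are illustrative, not taken from the presentation.

```python
# Figures quoted in the abstract.
old_gpu_count = 3200
new_gpu_count = 2100
total_power_growth = 3.5  # upgraded total power / current total power (assumed meaning)

# Average power per GPU grows by the total growth divided by the
# relative change in the number of GPUs.
per_gpu_growth = total_power_growth * old_gpu_count / new_gpu_count

print(f"GPU count ratio (new/old): {new_gpu_count / old_gpu_count:.2f}")
print(f"Average per-GPU power growth: {per_gpu_growth:.1f}x")
```

Under these assumptions, an average new GPU is roughly 5.3 times as powerful as an average old one, so a job that previously kept a whole GPU busy may use only a fraction of an H100 unless the GPU is shared.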
