Weekly Colloquia Series

Sponsored by Compute Ontario and presented by CAC, SciNet and SHARCNET

The Compute Ontario Colloquia is a weekly informational series hosted via Zoom. These presentations cover a wide range of Digital Research Infrastructure (DRI) topics, such as advanced research computing (ARC), research data management (RDM), and research software (RS). They are delivered by Compute Ontario and consortium staff, as well as featured speakers. Each colloquium is one hour long and includes time for questions. No registration is required. Click on the button below to access the Zoom link for each webinar. Past events and links to recordings can be found at the bottom of this page. Presentations are also uploaded to the hosting consortium's video channel.


Revisiting Cython: Is it still effective?

Presenter: Tyler Collins, SHARCNET

Python is often praised for its speed of development but criticized for its execution speed. However, this has changed significantly in recent years thanks to major improvements in both Python itself and its most popular libraries. Libraries such as TensorFlow, OpenCV, NumPy, and Pandas all leverage a tool called Cython. Cython is an extension of Python that allows functions to be translated into C (or C++) and compiled, mitigating Python's performance limitations. Since the last webinar on this package in 2020, several major versions have been released. This talk will explore what has changed, whether performance has improved, and whether development has become easier. During the webinar, a few demo problems will be explored live. Experience with Python is expected, while familiarity with C/C++ and Jupyter notebooks will be helpful.

View Event →


PAST: Preview of the "Job Scheduling and Monitoring" self-paced course


Presenter: James Desjardins, SHARCNET

The growing library of self-paced courses offered by SHARCNET provides materials that directly relate to research productivity on Alliance compute clusters. This colloquium presentation is a preview of a soon-to-be-released self-paced course on job scheduling and monitoring on Alliance clusters. The course material includes introductory Bash that is prerequisite to working with the scheduler, an introduction to the role of the scheduler on a compute cluster, the basics of using the Slurm scheduler, scheduling properties that are specific to the Alliance environment, and methods for monitoring jobs on the clusters, both from the terminal on the cluster and from portals being developed for users. This colloquium is intended not only as an opportunity to discuss the content of the self-paced course, but also as an opportunity to gather input from users about important issues to add to the course in its final revisions.

View Event →

PAST: Reduction of errors, or the pursuit of correctness


Presenter: Baolai Ge, SHARCNET

In this talk, we address the impact of rounding errors that are encountered, but often ignored, in scientific and high-performance computing. We begin with examples from day-to-day research computing and then advance to an example of summation in parallel computing with MPI and OpenMP reductions. We wish to share our explorations and hope this talk will lead to future discussions.
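As a rough illustration of why the order of a floating-point summation matters (the same effect is one reason MPI and OpenMP reductions, which may combine partial sums in different orders, can give slightly different answers), the sketch below compares naive left-to-right summation with Kahan compensated summation, using Python's math.fsum as a correctly rounded reference. The function names are illustrative only and are not taken from the talk.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation; rounding errors accumulate."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: track the low-order bits lost at each step."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp            # correct the next term by the previous error
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


if __name__ == "__main__":
    data = [0.1] * 10
    print(naive_sum(data))   # 0.9999999999999999
    print(kahan_sum(data))   # 1.0
    print(math.fsum(data))   # 1.0

    # Same three numbers, different order, different naive result:
    print(naive_sum([1e16, 1.0, -1e16]))  # 0.0
    print(naive_sum([1e16, -1e16, 1.0]))  # 1.0
```

Compensation is not a cure-all: for [1e16, 1.0, -1e16] the Kahan loop above still returns 0.0, while math.fsum, which tracks exact partial sums, returns 1.0.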

View Event →

PAST: High-Performance Data Science with Modern C++: Xeus-Cling and G3P


Presenter: Armin Sobhani

This is a series of talks about using modern C++ for high-performance data science. In the first talk of the series, we discuss the pros and cons of using C++ for data science projects, and then cover Xeus-Cling (https://github.com/jupyter-xeus/xeus-cling) and G3P (https://github.com/arminms/g3p) for rapid prototyping of C++ code and for embedding plots and charts in a Jupyter notebook, respectively. The whole series will be available as an executable book (https://executablebooks.org/) at https://arminms.github.io/high-performance-data-science-with-modern-cpp.

View Event →

PAST: Converting Python code with NumPy to run on the GPU


Presenter: Pawel Pomorski, SHARCNET

Python's NumPy library is one of the standard ways for researchers to perform mathematical computations. With the wider availability and power of GPU resources, the need arises to convert NumPy programs to run on the GPU to improve their performance, ideally with as little modification as possible. This seminar will discuss CuPy, a library highly compatible with NumPy, which offers drop-in replacements for most NumPy (and SciPy) functions. The seminar will discuss the basic techniques used to convert a NumPy program to CuPy, emphasizing the good practices required to obtain efficient code. A more recently developed approach is NVIDIA's cuPyNumeric library, which allows NumPy code to run on the GPU with no code changes. The seminar will discuss how to install and run it on HPC clusters.

View Event →

PAST: The International High-Performance Computing Summer School

Presenter: Ramses van Zon

Computational research requires specialized skills not always taught as part of academic curricula. The training provided by Advanced Research Computing centers and organizations in Canada and around the world fills this gap. In this training landscape, the International High Performance Computing Summer School (IHPCSS) holds a special place. It is an expenses-paid, in-person training event that brings together leading experts in High Performance Computing and Big Data Analytics and the students who need this knowledge. Uniquely, it has a strong mentoring component that advises students not only on technical issues but also on many other aspects of their studies and careers. This year, the Digital Research Alliance of Canada will sponsor about 10 students to attend the IHPCSS in Lisbon, Portugal, July 6-11, 2025. In this talk, we will explain more about the school and how you might apply.

View Event →

PAST: Unlocking the Power of Comet: Streamlining Machine Learning Experimentation


Presenter: Nast Shahparian, SHARCNET

Comet is an easy-to-use platform for tracking and optimizing machine learning experiments. It integrates with popular frameworks like TensorFlow and PyTorch, allowing users to log metrics, hyperparameters, and model results. Comet helps visualize experiment progress in real time, tune hyperparameters, and ensure reproducibility. A live demo will show how Comet improves collaboration and model performance, and how it tracks the environmental impact of machine learning projects.

View Event →

PAST: Interactive computing with Open OnDemand


Presenter: James Willis, SciNet

In this talk, we will introduce Open OnDemand, a web-based interface designed to provide easy access to High-Performance Computing (HPC) resources. Terminal-based interfaces can be daunting for new users with little to no experience, creating a steep learning curve. Open OnDemand aims to make HPC more accessible by offering an intuitive graphical interface that simplifies the process of submitting, monitoring, and managing jobs. We will explore the key features of Open OnDemand, including web-based access, job management, file management and support for interactive applications like Jupyter Notebooks, RStudio, and VS Code. Additionally, we will demonstrate the SciNet Open OnDemand portal and discuss its deployment and use cases.

View Event →

PAST: Data Wrangling with Tidyverse (part 3)


Presenter: Tyson Whitehead, SHARCNET

Tidyverse is a cohesive set of packages for doing data science in R. In earlier talks, we began reviewing the data-munging portions of tidyverse (dplyr, forcats, tibble, readr, stringr, tidyr, and purrr) by using them to reconstruct the data hierarchy in a 500-page reference PDF, given only the words on each page and their bounding boxes. This talk will complete that reconstruction.

If you have not seen the first two parts, or wish to review them, you can find them here: https://www.youtube.com/watch?v=8_Q-WwqY_Og


We also covered the graphical portion of tidyverse (ggplot2) here: https://www.youtube.com/watch?v=PR2Rs0W4zYg

View Event →

PAST: Causal Inference using Probabilistic Variational Causal Effect in Observational Studies


Presenter: Usef Faghihi, UQTR, SHARCNET

In this presentation, I introduce a novel causal analysis methodology called Probabilistic Variational Causal Effect (PACE), designed to evaluate the impact of both rare and common events in observational studies. PACE quantifies the direct causal effects by integrating total variation, which captures the purely causal component, with interventions on varying treatment levels. This integration also incorporates the likelihood of transitions between different treatment states. A key feature of PACE is the parameter d, which allows the metric to emphasize less frequent treatment scenarios when d is low, and more common treatments when d is high, providing a causal effect function dependent on d.

View Event →

PAST: Survival guide for the upcoming GPU upgrades (more total power, but fewer GPUs)


Presenter: Sergey Mashchenko, SHARCNET

In the coming months, national systems will be undergoing significant upgrades. In particular, older GPUs (P100, V100) will be replaced with the newest H100 GPUs from NVIDIA. The total GPU computing power of the upgraded systems will grow by a factor of 3.5, but the number of GPUs will go down significantly (from 3200 to 2100). This will present a significant challenge for our users, as "business as usual" (using a whole GPU for each process or MPI rank) will no longer be feasible in most cases. Fortunately, NVIDIA provides two powerful technologies that can be used to mitigate this situation: MPS (Multi-Process Service) and MIG (Multi-Instance GPU). This presentation will walk you through both technologies and discuss the ways they can be used on our clusters. We will discuss how to figure out which approach will work best for your code. At the end, a live demonstration will be given.

View Event →

PAST: Setting Up Compute Infrastructure for Sensitive Data


Presenters: Yohai Meiron & Shawn Winnington-Ball, SciNet

We introduce a secure computing enclave at the SciNet High-Performance Computing Consortium. Codenamed S4H, this environment is already available to groups at the University of Toronto as a pilot project. S4H aims to meet researchers’ needs for hosting and working with sensitive data, which SciNet’s main cluster, Niagara, does not accommodate. In the first part (Yohai), we’ll delve into the technical details. We’ll explain how S4H differs from Niagara in that the data are encrypted at rest and access is hardened, and what that means in practice. We will talk about the difficulties of providing isolation for different research groups on a shared system, and explore the different components that make it possible, such as key management and containerization mechanisms. The second part (Shawn) will focus on adopting the Cybersecurity Maturity Model Certification (CMMC) framework. We’ll describe our journey deciphering the control set’s complexities, developing metadata for organizing remediation efforts, and crafting Plans of Action and Milestones for compliance gaps. Future steps include internal and potentially external assessments to verify compliance, along with initiatives such as a Privacy Impact Assessment and penetration testing, with the eventual goal of being certified for Level 4 data.

View Event →

PAST: Git Part 3: Managing Workflows


Presenter: Ed Armstrong, SHARCNET

This session explores strategies and tools that enhance collaboration, improve workflow efficiency, and streamline code management. Building on foundational Git concepts presented in parts 1 & 2, we'll explore branching strategies, focusing on how they shape team collaboration and project stability. We'll cover essential commands for rebasing, merging, and resolving conflicts, providing insights into when and why each method works best. Additionally, we'll look at interactive rebase for rewriting commit history, leveraging Git tags for versioning, and working with stashes to manage changes across different contexts. This deeper dive will enable users to maintain cleaner commit histories, manage complex workflows, and handle more advanced project requirements.

View Event →

PAST: Delivering Code-Heavy Presentations with Markdown


Presenter: Ramses van Zon, SciNet

There is no shortage of tools available for producing presentations in various formats, each with its own strengths and limitations. However, technical presentations containing substantial amounts of code present additional challenges and demand specific features. Drawing from extensive experience in delivering technical presentations, this session will share key lessons learned in creating polished, clear, and maintainable slide decks. Markdown, a simple but versatile text-based format, will be shown to be an effective input format for achieving these goals. This presentation will also provide a curated overview of Markdown-based tools that can turn Markdown input files into PDF or HTML presentations.

View Event →

PAST: Parallel Programming: MPI I/O Basics


Presenter: Jemmy Hu, SHARCNET

MPI-IO is a set of extensions to the MPI library that enable parallel, high-performance I/O operations. It provides a parallel file access interface that allows multiple processes to read from and write to the same file simultaneously. MPI-IO allows for efficient data transfer between processes and enables high-performance I/O operations on large datasets. It also provides additional features such as collective I/O, non-contiguous access, and file locking. This seminar will cover some basic features of MPI I/O.

View Event →

PAST: Introduction to Data Preparation

Video not available.

Presenter: Shadi Khalifa, CAC

This presentation provides you with essential knowledge to effectively prepare data for analysis. Starting with an overview of the Data Analytics pipeline and processes, you will explore various statistical and visualization techniques used in Exploratory and Descriptive Analytics to understand historical data. You will then delve into the art of Data Preparation, gaining expertise in data cleaning, handling missing values, detecting and handling outliers, and transforming and engineering features. By the end of the presentation, you will have a general understanding of the tools needed to ensure data quality and integrity, enabling you to make informed decisions and derive valuable insights from your data.
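As a small sketch of two of the steps mentioned above (handling missing values and detecting outliers), the code below fills missing entries with the median of the observed values and flags outliers with the common 1.5 * IQR rule (Tukey's fences). These are standard techniques chosen for illustration; the presentation may use different methods or tools, and the sample data are made up.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)  # the median is robust to outliers
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0]  # made-up sample data
    cleaned = impute_median(raw)
    print(cleaned)                # [12.0, 15.0, 14.5, 14.0, 13.0, 95.0, 16.0]
    print(iqr_outliers(cleaned))  # [95.0]
```

Using the median rather than the mean for imputation keeps the single extreme value (95.0) from pulling the filled-in value upward.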

View Event →

PAST: Introspection for Jobs: in-job monitoring of performance


Presenter: Mark Hahn, SHARCNET

Several types of performance data are collected while a job is running; ultimately, much of this ends up in portals, where it can be examined. This data is also available to the job while it runs and can provide additional insight. Here, we'll demonstrate scripts that can execute within the job context to, for instance, provide summaries of performance at the job's end or within sections of the job.

View Event →

PAST: Multi-dimensional arrays in C++


Presenter: Paul Preney, SHARCNET

The C++23 standard includes std::mdspan, which provides a lightweight, non-owning multidimensional view of a contiguous one-dimensional array. This enables an underlying contiguous array to be reinterpreted as a multidimensional array, with support for different memory layouts (e.g., C and Fortran) and different ways of accessing elements (e.g., directly, using atomics, etc.). As in other programming languages, subset views ("slices") are also possible. One does not need the latest compiler tools or C++ standard: mdspan can be used through a reference implementation that has been backported to C++17 and C++14, and that also works with NVIDIA's CUDA and NVHPC tools. This talk will discuss how to use mdspan in C++ programs.
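To make the idea of a non-owning view with different layouts concrete, here is a minimal Python analogue (written in Python for consistency with the other examples on this page, not C++). It mimics only the index mapping of std::mdspan's layout_right (C, row-major) and layout_left (Fortran, column-major) policies; the class name View2D and its methods are hypothetical and are not part of any C++ library.

```python
class View2D:
    """Non-owning 2-D view over a flat, contiguous sequence.

    A Python analogue of the mdspan idea, not the C++ API: the view keeps a
    reference to the buffer (no copy) and maps (i, j) to a flat offset.
    """

    def __init__(self, data, rows, cols, layout="right"):
        if rows * cols > len(data):
            raise ValueError("buffer too small for the requested extents")
        if layout not in ("right", "left"):
            raise ValueError("layout must be 'right' or 'left'")
        self.data, self.rows, self.cols, self.layout = data, rows, cols, layout

    def offset(self, i, j):
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError((i, j))
        if self.layout == "right":  # row-major (C order), cf. std::layout_right
            return i * self.cols + j
        return j * self.rows + i    # column-major (Fortran order), cf. std::layout_left

    def __getitem__(self, idx):
        i, j = idx
        return self.data[self.offset(i, j)]

    def __setitem__(self, idx, value):
        i, j = idx
        self.data[self.offset(i, j)] = value


if __name__ == "__main__":
    buf = list(range(6))            # one contiguous 1-D buffer
    c = View2D(buf, 2, 3, "right")
    f = View2D(buf, 2, 3, "left")
    print(c[1, 0], f[1, 0])         # 3 1
    c[1, 2] = 99                    # writes through to buf[5]
    print(buf[5], f[1, 2])          # 99 99
```

Both views share the same buffer, so a write through one is visible through the other; only the mapping from (i, j) to a flat offset differs.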

View Event →

PAST: Debugging and Optimization of PyTorch Models


Presenter: Colin Wilson, SHARCNET

Deep learning models are often viewed as uninterpretable "black boxes". As researchers, we often extend this thinking to the memory and compute utilization of such models. Using PyTorch Profiler, we can identify model bugs and bottlenecks to understand how to improve model performance from an efficiency perspective. This can improve training scaling and allow large hyperparameter optimizations to be completed more efficiently. Here we will discuss the usage of PyTorch Profiler, including some case studies of real training examples, and discuss possible optimizations based on profiler results.

View Event →

PAST: Using machine learning to predict rare events


Presenter: Weiguang Guan, SHARCNET

In some binary classification problems, the underlying distribution of positive and negative samples is highly imbalanced. For example, fraudulent credit card transactions are rare compared to the volume of legitimate transactions. Training a classification model in such a case needs to take the skewed distribution into account. In this seminar, we will develop a fraud detector that can be used to screen credit card transactions. We will describe the methods used to handle training on imbalanced data.
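A minimal sketch of why skewed data need special handling, using made-up labels: plain accuracy rewards a model that never predicts the rare class, and one common remedy is to weight each class inversely to its frequency, using the n_samples / (n_classes * count) heuristic that scikit-learn's class_weight="balanced" option computes. This is one standard approach chosen for illustration, not necessarily the method used in the seminar, and the function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Overall accuracy and recall (true-positive rate) for the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


if __name__ == "__main__":
    # Made-up labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
    y = [0] * 990 + [1] * 10
    # A "model" that always predicts legitimate looks accurate but catches no fraud.
    print(accuracy_and_recall(y, [0] * len(y)))  # (0.99, 0.0)
    # Balanced weights make each class contribute equally to a weighted loss.
    print(balanced_class_weights(y))
```

With these weights, the 10 positive samples (weight 50.0 each) carry the same total weight as the 990 negative ones, so a weighted loss can no longer be minimized by ignoring fraud.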

View Event →

PAST: Diagnosing Wasted Resources from User Facing Portals on the National Clusters


Presenter: Tyler Collins, SHARCNET

Researchers often leave resources on the table when specifying their job requirements on the national systems. This talk builds on previous sessions and uses the Digital Research Alliance of Canada's User Facing Portals to explore what different types of jobs look like when they waste resources. Demonstrations will include interactive jobs, parallel jobs, GPU workflows, and more. With more accurate job specifications, researchers can expect shorter wait times and more throughput on any general-purpose system.
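As a rough sketch of how "wasted" resources are usually quantified: CPU efficiency compares the CPU time a job actually used with the core-time it reserved (cores times wall time), and memory efficiency compares peak memory use with the request. This is the same kind of ratio reported by Slurm's seff utility; the function names and job numbers below are hypothetical, and the portals may compute or present these metrics differently.

```python
def cpu_efficiency(cpu_seconds, ncpus, elapsed_seconds):
    """Fraction of the allocated core-time that was actually used."""
    allocated = ncpus * elapsed_seconds
    return cpu_seconds / allocated if allocated else 0.0


def memory_efficiency(peak_mem_gb, requested_mem_gb):
    """Fraction of the requested memory that the job actually peaked at."""
    return peak_mem_gb / requested_mem_gb if requested_mem_gb else 0.0


if __name__ == "__main__":
    # Hypothetical job: 16 cores for 2 hours, but only 4 core-hours of CPU time used,
    # and 64 GB requested with a peak of 8 GB.
    print(cpu_efficiency(4 * 3600, 16, 2 * 3600))  # 0.125
    print(memory_efficiency(8, 64))                # 0.125
```

A job like this one reserves eight times more cores and memory than it uses, which both lengthens its own queue wait and blocks resources other jobs could have used.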

View Event →

PAST: The Emergence of WebAssembly (Wasm) in Scientific Computing


Presenter: Armin Sobhani, SHARCNET

Developed collaboratively by major browser vendors, including Mozilla, Google, Microsoft, and Apple, WebAssembly (Wasm) addresses the limitations of traditional web programming languages like JavaScript. But what makes it so compelling for scientists? First, Wasm allows code written in languages like C/C++, Fortran, or Rust to be compiled into its instruction format and run directly in the browser, making it accessible to anyone without installation hassles and eliminating the need for external servers. Second, with Wasm, developers can reuse existing code with near-native performance, without having to rewrite it in JavaScript. Join us as we explore how Wasm is reshaping scientific workflows and empowering researchers worldwide.

View Event →

PAST: Exploring Compute Usage from User Facing Portals on the National Clusters


Presenter: James Desjardins, SHARCNET

Previous seminars in this series have described using Python tools to explore job properties and usage characteristics on the Digital Research Alliance of Canada general purpose compute clusters. The end goal of exploring job properties and usage characteristics is to get the most out of the resources available to research accounts and to minimize wait times in the job queue. This seminar reviews important properties of the scheduling configuration that may impact research throughput, then demonstrates how portals can help explore the relevant properties of research account compute usage on the clusters.

View Event →