Weekly Colloquia Series
Sponsored by Compute Ontario and presented by CAC, SciNet and SHARCNET
The Compute Ontario Colloquia is a weekly informational series hosted via Zoom. These presentations cover a wide range of Digital Research Infrastructure (DRI) topics, such as advanced research computing (ARC), research data management (RDM), and research software (RS). The presentations are delivered by Compute Ontario and consortium staff as well as featured speakers. Each colloquium is one hour in length and includes time for questions. No registration is required. Click on the button below to access the Zoom link for each webinar. Past events and links to recordings can be found at the bottom of this page. Presentations are also uploaded to the hosting consortium's video channel.
UPDATED TOPIC! Survival guide for the upcoming GPU upgrades (more total power, but fewer GPUs)
Presenter: Sergey Mashchenko, SHARCNET
In the coming months, national systems will be undergoing significant upgrades. In particular, older GPUs (P100, V100) will be replaced with the newest H100 GPUs from NVIDIA. The total GPU computing power of the upgraded systems will grow by a factor of 3.5, but the number of GPUs will drop significantly (from 3200 to 2100). This will present a significant challenge for our users, as "business as usual" (using a whole GPU for each process or MPI rank) will no longer be feasible in most cases. Fortunately, NVIDIA provides two powerful technologies that can mitigate this situation: MPS (Multi-Process Service) and MIG (Multi-Instance GPU). This presentation will walk you through both technologies and discuss the ways they can be used on our clusters. We will also discuss how to determine which approach will work best for your code. The presentation will conclude with a live demonstration.
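As a rough sketch of the MPS approach, a Slurm batch script might start the MPS control daemon so several processes can share a single GPU. This is an assumption-laden illustration, not cluster documentation: directory paths, resource requests, and the application name (`./my_app`) are hypothetical and cluster-specific.

```shell
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
# Hypothetical sketch: point MPS at job-local directories and start
# the control daemon so multiple processes can share one GPU.
export CUDA_MPS_PIPE_DIRECTORY=$SLURM_TMPDIR/mps-pipe
export CUDA_MPS_LOG_DIRECTORY=$SLURM_TMPDIR/mps-log
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"
nvidia-cuda-mps-control -d

# Launch four copies of the (hypothetical) application on the same GPU.
for i in 1 2 3 4; do
    ./my_app --task "$i" &
done
wait

# Shut the MPS daemon down at the end of the job.
echo quit | nvidia-cuda-mps-control
```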
Causal Inference using Probabilistic Variational Causal Effect in Observational Studies
Presenter: Usef Faghihi, UQTR, SHARCNET
In this presentation, I introduce a novel causal analysis methodology called Probabilistic Variational Causal Effect (PACE), designed to evaluate the impact of both rare and common events in observational studies. PACE quantifies direct causal effects by integrating total variation, which captures the purely causal component, with interventions on varying treatment levels. This integration also incorporates the likelihood of transitions between different treatment states. A key feature of PACE is the parameter d, which allows the metric to emphasize less frequent treatment scenarios when d is low and more common treatments when d is high, providing a causal effect function dependent on d.
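To build intuition for that last point, here is a toy sketch (explicitly not the actual PACE formula, whose definition is not given in this abstract) of how an exponent-style parameter d can shift emphasis between rare and common treatment transitions:

```python
# Toy illustration only: weight each treatment transition by its
# probability raised to the power d. Small d makes rare transitions
# count almost as much as common ones; large d lets common
# transitions dominate.

def transition_weights(probs, d):
    """Return normalized weights p**d for transition probabilities."""
    raw = [p ** d for p in probs]
    total = sum(raw)
    return [w / total for w in raw]

probs = [0.01, 0.99]  # a rare and a common treatment transition

low_d = transition_weights(probs, d=0.1)   # rare transition keeps real weight
high_d = transition_weights(probs, d=2.0)  # common transition dominates
```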
Setting Up Compute Infrastructure for Sensitive Data
Presenter: Yohai Meiron & Shawn Winnington-Ball, SciNet
We introduce a secure computing enclave at the SciNet High-Performance Computing Consortium. Codenamed S4H, this environment is already available to groups at the University of Toronto as a pilot project. S4H aims to meet researchers’ needs for hosting and working with sensitive data, which SciNet’s main cluster, Niagara, does not accommodate. In the first part (Yohai), we’ll delve into the technical details. We’ll explain how S4H differs from Niagara in that the data are encrypted at rest and access is hardened, and what that means in practice. We will talk about the difficulties of providing isolation for different research groups on a shared system, and explore the different components that make it possible, such as key management and containerization mechanisms. The second part (Shawn) will focus on adopting the Cybersecurity Maturity Model Certification (CMMC) framework. We’ll describe our journey deciphering the control set’s complexities, developing metadata for organizing remediation efforts, and crafting Plans of Action and Milestones for compliance gaps. Future steps include internal and potentially external assessments to verify compliance, along with initiatives like a Privacy Impact Assessment and penetration testing, with the eventual goal of being certified for Level 4 data.
PAST: Git Part 3: Managing Workflows
Presenter: Ed Armstrong, SHARCNET
This session explores strategies and tools that enhance collaboration, improve workflow efficiency, and streamline code management. Building on foundational Git concepts presented in parts 1 & 2, we'll explore branching strategies, focusing on how they shape team collaboration and project stability. We'll cover essential commands for rebasing, merging, and resolving conflicts, providing insights into when and why each method works best. Additionally, we'll look at interactive rebase for rewriting commit history, leveraging Git tags for versioning, and working with stashes to manage changes across different contexts. This deeper dive will enable users to maintain cleaner commit histories, manage complex workflows, and handle more advanced project requirements.
PAST: Delivering Code-Heavy Presentations with Markdown
Video not available.
Presenter: Ramses van Zon, SciNet
There is no shortage of tools available for producing presentations in various formats, each with its own strengths and limitations. However, technical presentations containing substantial amounts of code present additional challenges and demand specific features. Drawing from extensive experience in delivering technical presentations, this session will share key lessons learned in creating polished, clear, and maintainable slide decks. Markdown, a simple but versatile text-based format, will be demonstrated to be an effective input format to achieve these goals. This presentation will also provide a curated overview of markdown-based tools that can turn markdown input files into PDF or HTML presentations.
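As a small sketch of the idea (file names and title are hypothetical), a markdown slide source with a fenced code block can look like this:

````markdown
% Code-Heavy Slides in Markdown
% A. Presenter

# A slide with code

```python
def greet(name):
    return f"Hello, {name}!"
```

Text after the code renders as regular slide content.
````

With pandoc and a LaTeX installation, `pandoc -t beamer slides.md -o slides.pdf` turns a file like this into a PDF deck, while `pandoc -t revealjs` produces an HTML presentation instead; these are two of the markdown-based tools of the kind the talk surveys.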
PAST: Parallel Programming: MPI I/O Basics
Presenter: Jemmy Hu, SHARCNET
MPI-IO is a set of extensions to the MPI library that enables parallel, high-performance I/O operations. It provides a parallel file-access interface that allows multiple processes to read from and write to the same file simultaneously. MPI-IO allows for efficient data transfer between processes and enables high-performance I/O operations on large datasets. It also provides additional features such as collective I/O, non-contiguous access, and file locking. This seminar will cover some basic features of MPI I/O.
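The core idea behind calls such as `MPI_File_write_at` is that each rank owns a disjoint byte range of one shared file, computed from its rank, so all ranks can write concurrently without coordination. The serial sketch below (plain Python, no MPI; the "ranks" run in a loop purely to illustrate the offset arithmetic) shows that layout:

```python
# Serial analogy of rank-offset writes to a single shared file.
import os
import tempfile

RECORD = 16   # bytes each "rank" writes
nranks = 4

path = os.path.join(tempfile.mkdtemp(), "shared.dat")

for rank in range(nranks):
    data = f"rank {rank:02d} data".ljust(RECORD).encode()
    # rank 0 creates the file; the others open it for in-place update
    with open(path, "r+b" if rank else "w+b") as f:
        f.seek(rank * RECORD)   # offset = rank * record size
        f.write(data)

with open(path, "rb") as f:
    contents = f.read()

assert len(contents) == nranks * RECORD
```

In real MPI-IO the ranks perform these writes simultaneously, and "file views" generalize this fixed-offset scheme to non-contiguous access patterns.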
PAST: Introduction to Data Preparation
Video not available.
Presenter: Shadi Khalifa, CAC
This presentation provides you with the essential knowledge to effectively prepare data for analysis. Starting with an overview of the Data Analytics pipeline and its processes, you will explore various statistical and visualization techniques used in Exploratory and Descriptive Analytics to understand historical data. You will then delve into the art of Data Preparation, gaining expertise in cleaning data, handling missing values, detecting and handling outliers, and transforming and engineering features. By the end of the presentation, you will have a general understanding of the tools necessary to ensure data quality and integrity, enabling you to make informed decisions and derive valuable insights from your data.
PAST: Introspection for Jobs: in-job monitoring of performance
Presenter: Mark Hahn, SHARCNET
Several types of performance data are collected while a job is running; ultimately, much of this winds up in portals where it can be examined. The same data is also available to the job itself, while it runs, and can provide additional insight. Here, we'll demonstrate scripts that can execute within the job context to, for instance, provide summaries of performance at the job's end or within sections of the job.
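One minimal sketch of this kind of introspection (not the presenter's actual scripts) is to have the job report its own CPU time and peak memory at the end of a section, using Python's Unix-only `resource` module:

```python
# Sketch: a process summarizes its own resource usage from within the
# job, the same idea as calling a summary script at the end of a
# batch job section. resource is Unix-only.
import resource

def job_summary():
    """Return CPU time and peak memory for the current process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_cpu_s": ru.ru_utime,
        "system_cpu_s": ru.ru_stime,
        "peak_rss_kb": ru.ru_maxrss,  # kilobytes on Linux
    }

# Do a little work, then report on it.
_ = sum(i * i for i in range(100_000))
stats = job_summary()
```

The same idea scales up to reading per-job counters from the scheduler or from /proc inside the job's context.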
PAST: Multi-dimensional arrays in C++
Presenter: Paul Preney, SHARCNET
The C++ 2023 standard has std::mdspan, which provides a lightweight, non-owning multidimensional view of a contiguous single-dimensional array. This enables the reinterpretation of an underlying contiguous array as a multidimensional array, with support for different memory layouts (e.g., C and Fortran) and different ways of accessing elements (e.g., directly, using atomics, etc.). As in other programming languages, subset views ("slices") are also possible. One does not need the latest compiler tools or C++ standard: a reference implementation of mdspan has been backported to C++17 and C++14 and also works with NVIDIA's CUDA and NVHPC tools. This talk will discuss how to use mdspan in C++ programs.
PAST: Debugging and Optimization of PyTorch Models
Presenter: Colin Wilson, SHARCNET
Deep learning models are often viewed as uninterpretable "black boxes". As researchers, we often extend this thinking to the memory and compute utilization of such models. Using PyTorch Profiler, we can identify model bugs and bottlenecks and understand how to improve model performance from an efficiency perspective. This improves training scaling and allows large hyperparameter optimizations to complete more efficiently. Here we will discuss the usage of PyTorch Profiler, including case studies of real training examples, and possible optimizations based on profiler results.
PAST: Using machine learning to predict rare events
Presenter: Weiguang Guan, SHARCNET
In some binary classification problems, the underlying distribution of positive and negative samples is highly imbalanced. For example, fraudulent credit card transactions are rare compared to the volume of legitimate transactions. Training a classification model in such a case needs to take the skewed distribution into account. In this seminar, we will develop a fraud detector that can be used to screen credit card transactions, and we will describe the methods used to handle training on imbalanced data.
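One common remedy of the kind the seminar may cover (the abstract does not name the specific methods) is inverse-frequency class weighting, so that errors on the rare fraud class cost more than errors on the common class. A minimal sketch:

```python
# Sketch of inverse-frequency class weights: weight[c] is large when
# class c is rare, so the loss penalizes mistakes on rare classes more.
from collections import Counter

def class_weights(labels):
    """weight[c] = n_total / (n_classes * n_c) (the 'balanced' rule)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# 2 fraudulent transactions among 100 total
labels = [1, 1] + [0] * 98
weights = class_weights(labels)  # fraud class gets a much larger weight
```

These weights can then be passed to most training losses (e.g. a weighted cross-entropy); oversampling the rare class or undersampling the common one are alternative remedies.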
PAST: Diagnosing Wasted Resources from User Facing Portals on the National Clusters
Presenter: Tyler Collins, SHARCNET
Researchers often leave resources on the table when specifying their job requirements on the national systems. This talk builds on previous sessions and uses the Digital Research Alliance of Canada's User Facing Portals to explore what different types of jobs look like when they waste resources. Demonstrations will include interactive jobs, parallel jobs, GPU workflows, and more. With more accurate job specifications, researchers can expect shorter wait times and higher throughput on any general-purpose system.
PAST: The Emergence of WebAssembly (Wasm) in Scientific Computing
Presenter: Armin Sobhani, SHARCNET
Developed collaboratively by major browser vendors, including Mozilla, Google, Microsoft, and Apple, WebAssembly (Wasm) addresses the limitations of traditional web programming languages like JavaScript. But what makes it so compelling for scientists? First, Wasm allows code written in languages like C/C++, Fortran or Rust to be compiled into its instruction format and run directly in the browser, making it accessible to anyone without installation hassles and eliminating the need for external servers. Second, with Wasm, developers can recycle existing code with near-native performance but without the hassle of rewriting it in JavaScript. Join us as we explore how Wasm is reshaping scientific workflows and empowering researchers worldwide.
PAST: Exploring Compute Usage from User Facing Portals on the National Clusters
Presenter: James Desjardins, SHARCNET
Previous seminars in this series have described using Python tools to explore job properties and usage characteristics on the Digital Research Alliance of Canada general purpose compute clusters. The end goal of exploring job properties and usage characteristics is to get the most out of the resources available to research accounts and to minimize wait times in the job queue. This seminar reviews important properties of the scheduling configuration that may impact research throughput, then demonstrates how portals can help explore the relevant properties of research account compute usage on the clusters.
PAST: Bioinformatics: Advancements and challenges in the era of big data analysis
Presenter: Sridhar Ravichandran
Advancements in sequencing technologies have revolutionized the biological sciences. Next Generation Sequencing (NGS) approaches are now a routine part of biological research, generating staggering amounts of data. This rapid growth poses significant challenges in data acquisition, storage, and distribution. In addition, large-scale data analysis within national and international collaborations requires portable, scalable, and reproducible computational analysis. This webinar will cover best practices, including workflow management for pipeline development and handling software installations and databases, to run on different compute platforms and enable workflow portability and sharing.
PAST: CO Summer School 2024
Presenter: Pawel Pomorski, SHARCNET
In this colloquium, we will present the curriculum of the 2024 Compute Ontario Summer School, to be held from the 3rd to the 21st of June. Jointly organized by the Centre for Advanced Computing, SciNet, and SHARCNET, in collaboration with the Research Data Management Network of Experts, the school offers around 40 free courses, from introductory to advanced levels, delivered by experts in the field and covering a wide range of topics.
PAST: C++ Modules
Presenter: Ramses van Zon, SciNet
Modules are source code components with a well-defined interface, such that they can be reused in other code without requiring knowledge of the implementation or exposing its internals. In this talk, we will discuss how modules can be supported in C++. We will see that for most of its existence, C++ had to use the "header file plus object files" paradigm inherited from C to support modules, until proper C++-based modules were introduced in the C++20 standard. Unfortunately, few C++ compilers fully support C++20 modules, and their implementations and usage vary considerably. We will give an overview not only of the issues but also of what is possible with current compilers.
PAST: Data Wrangling with Tidyverse (part 2)
Presenter: Tyson Whitehead, SHARCNET
Tidyverse is a cohesive set of packages for doing data science in R. In an earlier talk, we began reviewing the data munging portions of tidyverse (dplyr, forcats, tibble, readr, stringr, tidyr, and purrr) by using it to reconstruct the data hierarchy in a 500-page reference PDF given only the words on each page and their bounding boxes. This talk will complete that work. If you have not seen the first part, or wish to review it, you can find it here: https://www.youtube.com/watch?v=8_Q-WwqY_Og For completeness, we also covered the graphical portion of tidyverse (ggplot) here: https://www.youtube.com/watch?v=PR2Rs0W4zYg
PAST: How to Buy a Supercomputer for Scientific Computing
Presenter: James Willis, SciNet
We will discuss how to set criteria for selecting a replacement for an existing advanced research computing cluster, and in particular, which criteria were chosen for Niagara, Canada's large parallel computing cluster.
PAST: Accelerating data analytics with RAPIDS cuDF
Presenter: Nastaran Shahparian, SHARCNET
Pandas is renowned as the go-to library for data manipulation and analysis in Python and is widely adopted in machine learning. However, Pandas can be slow. With the introduction of NVIDIA cuDF.pandas, the accelerated power of GPUs is integrated into Pandas, enabling faster processing without the need for any code changes. A live demo will showcase this enhancement on the clusters.
PAST: Accelerating graph analysis on GPUs
Presenter: Jinhui Qin, SHARCNET
Graph analysis plays a critical role in many applications across various domains, ranging from social network analysis and bioinformatics to fraud detection, cybersecurity, and recommendation systems. NetworkX is the go-to library for graph analysis in Python. However, as dataset and graph sizes grow, the performance of NetworkX becomes a significant concern. This webinar introduces NVIDIA cuGraph for accelerating graph analysis on GPUs. Moreover, a recent integration of NetworkX with cuGraph, named nx-cugraph, allows accelerating NetworkX workflows on GPUs with zero code changes. A live demo will be given on the clusters.
PAST: Make: obsolete or elegant?
Presenter: Mark Hahn, SHARCNET
Make is a classic Unix development tool, which may seem archaic and narrow-purpose. But if you think of it as a declarative, parallelized workflow automation tool, it sounds more relevant. We'll consider stereotypical use of make, then its general properties, and show some interesting examples of make applied to unusual uses.
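The "declarative, parallelized workflow" view can be sketched with a small Makefile (file and tool names here are hypothetical): two independent analysis steps that `make -j2` will run in parallel, plus a combining step that waits for both.

```makefile
# Hypothetical workflow: make infers that stats_a.txt and stats_b.txt
# are independent, so `make -j2` runs them in parallel; report.txt is
# rebuilt only when an input changed.
all: report.txt

stats_a.txt: data_a.csv
	./analyze data_a.csv > stats_a.txt

stats_b.txt: data_b.csv
	./analyze data_b.csv > stats_b.txt

report.txt: stats_a.txt stats_b.txt
	cat stats_a.txt stats_b.txt > report.txt
```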