Sharcnet HPC
  • Videos: 245
  • Views: 905,986
Using machine learning to predict rare events
In some binary classification problems, the underlying distribution of positive and negative samples is highly unbalanced. For example, fraudulent credit card transactions are rare compared to the volume of legitimate transactions. Training a classification model in such a case needs to take the skewed nature of the distribution into account. In this seminar, we will develop a fraud detector that can be used to screen credit card transactions. We will describe the methods used to handle training on unbalanced data.
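The abstract does not say which methods the seminar covers. As one common illustration, the sketch below assumes inverse-frequency class weighting (the same heuristic scikit-learn applies for `class_weight="balanced"`) and shows why plain accuracy is misleading on skewed data. The toy labels and the helper names `balanced_class_weights` and `precision_recall` are hypothetical, not taken from the talk.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10

# The rare class receives a much larger weight than the common class.
weights = balanced_class_weights(y_true)

# A useless detector that labels every transaction as legitimate still
# scores high accuracy, but its recall on fraud is zero.
always_legit = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_legit)) / len(y_true)
precision, recall = precision_recall(y_true, always_legit)

print(weights)
print(accuracy, precision, recall)
```

Weights like these are typically passed to the training loss so that a misclassified fraud case costs more than a misclassified legitimate one. Precision and recall on the rare class are more informative than accuracy when judging the result.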
This webinar was presented by Weiguang Guan (SHARCNET) on August 28th, 2024, as a part of a series of weekly Compute Ontario Colloq...
Views: 72

Videos

Diagnosing Wasted Resources from User Facing Portals on the National Clusters
35 views · 1 day ago
Researchers often leave resources on the table when specifying their job requirements on the national systems. This talk builds on previous sessions and uses the Digital Research Alliance of Canada's User Facing Portals to explore what different types of jobs look like when they waste resources. Demonstrations will include interactive jobs, parallel jobs, GPU workflows, and more. With more accu...
The Emergence of WebAssembly (Wasm) in Scientific Computing
136 views · 1 day ago
Developed collaboratively by major browser vendors, including Mozilla, Google, Microsoft, and Apple, WebAssembly (Wasm) addresses the limitations of traditional web programming languages like JavaScript. But what makes it so compelling for scientists? First, Wasm allows code written in languages like C/C++, Fortran, or Rust to be compiled into its instruction format and run directly in the browse...
Exploring Compute Usage from User Facing Portals on the National Clusters
100 views · 1 month ago
Previous seminars in this series have described using Python tools to explore job properties and usage characteristics on the Digital Research Alliance of Canada general purpose compute clusters. The end goal of exploring job properties and usage characteristics is to get the most out of the resources available to research accounts and to minimize wait times in the job queue. This seminar revie...
Compute Ontario Summer School 2024
596 views · 4 months ago
UPDATE: registration is now open for Compute Ontario Summer School (June 3-21, 2024): training.computeontario.ca/coss2024.php In this colloquium, we will present the curriculum of the 2024 Compute Ontario Summer School, to be held from the 3rd to the 21st of June. Jointly organized by the Centre for Advanced Computing, SciNet, SHARCNET, and in collaboration with the Research Data Management Net...
Data Wrangling with Tidyverse (part 2)
95 views · 4 months ago
Tidyverse is a cohesive set of packages for doing data science in R. In an earlier talk, we began reviewing the data munging portions of tidyverse (dplyr, forcats, tibble, readr, stringr, tidyr, and purrr) by using them to reconstruct the data hierarchy in a 500-page reference PDF given only the words on each page and their bounding boxes. This talk will complete that work. If you have not seen the fi...
Accelerating data analytics with RAPIDS cuDF
151 views · 4 months ago
Pandas is renowned as the go-to library for data manipulation and analysis in Python and is widely adopted in machine learning. However, Pandas is slow. With the introduction of NVIDIA cuDF.pandas, the accelerated power of GPUs is integrated into Pandas, enabling faster processing without the need for any code changes. A live demo will showcase this enhancement on clusters. This webinar was presen...
Accelerating Graph Analysis on GPUs
357 views · 5 months ago
Graph analysis plays a critical role in many applications across various domains, ranging from social network analysis to bioinformatics, to fraud detection, to cybersecurity, to recommendation systems, etc. NetworkX is the go-to library for graph analysis in Python. However, when dataset and graph sizes grow, the performance of using NetworkX becomes a significant concern. This webinar introdu...
Make: a declarative, lazy, parallel workload manager. Elegant or obsolete?
86 views · 5 months ago
Make is a classic Unix development tool, which may seem archaic and narrow-purpose. But if you think of it as a declarative, parallelized workflow automation tool, it sounds more relevant. We'll consider stereotypical use of make, then its general properties, and show some interesting examples of make applied to unusual uses. This webinar was presented by Mark Hahn (SHARCNET) on March 13th, 2...
Debugging your code with DDT
176 views · 6 months ago
One of the important steps of developing or maintaining a code is debugging: checking the code for errors. Simple toy codes can be debugged using print statements, but realistic codes need specialized debugging tools. We have a powerful debugger, "DDT", installed on the Graham and Niagara clusters. This presentation will walk you through the steps required to start debugging your codes using DDT, and...
MySQL Part 3: Constraints and Joins
96 views · 6 months ago
In MySQL, constraints and joins are fundamental concepts used to ensure data integrity in a database and to query data from multiple tables. Constraints are rules enforced on the data columns of a table; they ensure the accuracy and reliability of the data within a database. Joins in MySQL combine rows from two or more tables based on a related column. Previous parts in the series: * Part ...
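A minimal sketch of the two ideas in this description: constraints that reject invalid rows, and a join on a related column. It uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, which is a substitution and not the setup used in the webinar. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL CHECK (amount > 0)
    )
    """
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0), (1, 40.0)")

# The foreign key constraint rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# An inner join combines rows from both tables on the related column.
rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
    """
).fetchall()

print(rejected)
print(rows)
conn.close()
```

The same `NOT NULL`, `UNIQUE`, `CHECK`, and foreign key clauses exist in MySQL, although the exact syntax and enforcement details depend on the MySQL version and storage engine.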
Introduction to GPU programming with OpenMP
384 views · 7 months ago
OpenMP is a popular, portable and widely supported shared-memory parallel programming model in HPC. The OpenMP API supports multi-platform parallel programming in C/C++ and Fortran. As computer hardware has grown to include GPUs and other specialized accelerators, OpenMP has grown as well to add device support for parallel programming on GPUs and accelerators. This seminar will give an introduction...
False Sharing and Contention in Parallel Codes
137 views · 7 months ago
Sequential programs can repeatedly read from and write to memory locations seemingly without issues. On the other hand, parallel programs can easily fall prey to weird behaviours resulting in small to very significant issues and/or performance loss that are not always easily attributable to specific pieces of code one has written. Such behaviours can be seen in multithreaded C, C++, Fortran, Ope...
Skorch: Training PyTorch models with scikit-learn
477 views · 9 months ago
PyTorch is an enormously popular framework for developing deep learning models in Python. However, scikit-learn is one of the most popular libraries for general machine learning. Skorch is a wrapper for PyTorch that allows one to use models written with PyTorch with the scikit-learn library. In this talk, we will explore how skorch allows for PyTorch models to be easily incorporated into scikit...
Squeeze more juice out of a single GPU in deep learning
211 views · 9 months ago
It’s well known that GPUs can significantly accelerate neural network training. However, not everyone knows that a single GPU is sufficient to train most neural networks except for a few large ones (such as LLMs). In fact, a GPU is under-utilized in most cases. In this talk, we are addressing the under-utilization issue and proposing a way to make full use of the GPU capacity. The goal is to increa...
Generalized End to End Python and Neuroscience Workflows on a Compute Cluster
122 views · 10 months ago
p2rng - A C++ Parallel Random Number Generator Library for the Masses
128 views · 10 months ago
Exploring job wait times on Alliance compute clusters: a holistic view
96 views · 11 months ago
Data Wrangling with Tidyverse
202 views · 11 months ago
Automating scientific workflows with AiiDA
292 views · 1 year ago
DIY job monitoring, from cache misses to CO2 footprint
105 views · 1 year ago
Leveraging the power of Linux on Windows with WSL
304 views · 1 year ago
Contrastive learning
843 views · 1 year ago
Modern Approaches to Profiling in Python with Scalene
3.9K views · 1 year ago
CUDA, ROCm, oneAPI - All for One or One for All?
3.3K views · 1 year ago
Running MATLAB on Alliance's Clusters
526 views · 1 year ago
Before and after submitting Octave/Matlab jobs on the clusters
151 views · 1 year ago
Plotnine: R's Grammar of Graphics in Python
761 views · 1 year ago
Accelerated DataFrame with Dask-cuDF on multiple GPUs
912 views · 1 year ago
An introduction to MPLAPACK, a multi-precision linear algebra library
362 views · 1 year ago