Handling Massive Machine Learning Models // Simon Karasik // MLOps podcast
- Published: Oct 4, 2024
- Join us at our first in-person conference on June 25 all about AI Quality: www.aiqualityc...
Huge thank you to @nebiusofficial for sponsoring this episode. Nebius AI - nebius.ai/
MLOps podcast #228 with Simon Karasik, Machine Learning Engineer at Nebius AI, Handling Multi-Terabyte LLM Checkpoints.
// Abstract
The talk provides a gentle introduction to LLM checkpointing: why it is hard and how big the checkpoints are. It covers tips and tricks for saving and loading multi-terabyte checkpoints, as well as how to choose a cloud storage option for checkpointing.
// Bio
Full-stack Machine Learning Engineer, currently working on infrastructure for LLM training, with previous experience in ML for Ads, Speech, and Tax.
// MLOps Jobs board
mlops.pallet.x...
// MLOps Swag/Merch
mlops-communit...
// Related Links
-------------- ✌️Connect With Us ✌️ ------------
Join our slack community: go.mlops.commu...
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: go.mlops.commu...
Catch all episodes, blogs, newsletters, and more: mlops.community/
Connect with Demetrios on LinkedIn: / dpbrinkm
Connect with Simon on LinkedIn: / simon-karasik