We just talked about this in my machine learning course this week!! Great timing! This video is very helpful.
Thank you so much for all you do.
Great video, this practical content is gold. Thank you :)
Well presented!
ritvikmath coming with a video of one of my favorite topics - instant like!
Hi! Great video. Would you consider creating a full in-depth CatBoost tutorial on some random data? It would be super useful.
Okay, but with oversampling, how do you use cross-validation? Because if you use it on the oversampled dataset, you'll have a data leak.
I think you'd want to define the folds on the original data and then oversample while holding the validation fold fixed. Example: 3-fold CV.
- split the original data into 3 folds (A, B, C)
- treat (A, B) as training data -> oversample that data -> validate using C
- repeat, using A and then B as the validation fold
- note that there is no data leak in this case
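A minimal sketch of that fold-wise setup, assuming scikit-learn and using simple random oversampling as a stand-in for SMOTE (the dataset here is synthetic, purely for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic imbalanced dataset: roughly 5% positives.
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)
X[y == 1] += 1.5  # give the minority class some separation

def oversample(X_tr, y_tr, rng):
    # Random oversampling of the minority class (a simple stand-in for SMOTE).
    minority = np.flatnonzero(y_tr == 1)
    majority = np.flatnonzero(y_tr == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X_tr[idx], y_tr[idx]

scores = []
for tr_idx, va_idx in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    # Oversample ONLY the training folds; the validation fold keeps its
    # original class distribution, so nothing leaks across the split.
    X_tr, y_tr = oversample(X[tr_idx], y[tr_idx], rng)
    model = LogisticRegression().fit(X_tr, y_tr)
    scores.append(f1_score(y[va_idx], model.predict(X[va_idx])))

print(np.mean(scores))
```

The key point is that oversampling happens inside the loop, after the split, so no synthetic copy of a validation point can ever land in the training set.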
Very interesting. AdTech modeling of conversions as caused by advertising always suffers from imbalance (conversion rates are usually in the low-to-mid single digits).
It should be "imbalanced data" instead of "unbalanced data"
Lol 😂
you are seriously so underrated
Excellent video!
One question though: are certain classification models immune from class imbalance? Thanks!
To my knowledge, no classifier is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy on an imbalanced dataset when inter-class separability is very high; for example, detection of water bodies (often a minority class) over a large area is often quite accurate.
Great video!
But don’t you think that with such an unbalanced dataset it would be better to go for an anomaly detection algorithm instead of a classification algorithm?
Great video. For other ML algorithms like logistic regression, SVM, KNN, etc., can we implement the first method (upweighting the minority class)? Or is this only applicable to decision trees?
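In scikit-learn, yes for most of these: logistic regression, SVM, and tree models accept a class_weight argument (KNN doesn't, though distance weighting gives a related effect). A quick sketch on made-up data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Synthetic imbalanced data: ~5% minority class.
X, y = make_classification(n_samples=500, weights=[0.95], random_state=0)

# class_weight="balanced" upweights each class inversely to its frequency.
lr = LogisticRegression(class_weight="balanced").fit(X, y)
svm = SVC(class_weight="balanced").fit(X, y)

print(recall_score(y, lr.predict(X)))  # minority-class recall on the training data
```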
Can we customise the loss function? For example, more weight for misclassifying the true minority class and less weight for the other kind of error?
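You can get that effect in scikit-learn without redefining the loss itself, by passing per-example weights to fit via sample_weight (the 10x cost below is an arbitrary illustration, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: ~5% minority class.
X, y = make_classification(n_samples=500, weights=[0.95], random_state=0)

# Asymmetric costs: misclassifying a minority example counts 10x more.
weights = np.where(y == 1, 10.0, 1.0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=weights)
```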
Great demo!
Just one thought: why didn't you talk about downsampling the majority class and what its impact would be?
This is something I am wondering about too!
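For what it's worth, random undersampling is easy to try by hand; here's a sketch on synthetic data (plain NumPy, no special library assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.05).astype(int)  # ~5% minority class

minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
# Keep only as many (randomly chosen) majority examples as minority ones.
keep = rng.choice(majority, size=len(minority), replace=False)
idx = np.concatenate([minority, keep])
X_down, y_down = X[idx], y[idx]
```

The obvious trade-off: you throw away majority-class information, which can hurt when the dataset is small to begin with.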
Hi, just wondering: is SMOTE applicable to image data? I've only seen one article on it online, so I'm not sure it even works, since generating synthetic images is likely much harder.
That's where image augmentation comes into play. You can create different variations of an image by rotating, flipping, and applying various other transformations.
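A tiny sketch of what those transformations look like with plain NumPy (the image here is just random pixels, standing in for a real minority-class example):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # stand-in for a minority-class image

augmented = [
    np.fliplr(img),  # horizontal flip
    np.flipud(img),  # vertical flip
    np.rot90(img),   # 90-degree rotation
]
```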
You could predict that aircraft engines NEVER fail and almost always be right.
Are you familiar with Latent vectors in network analysis?
s/o from South Africa
Hi,
when people have problems with unbalanced data, it's proof they don't understand what they are doing.
When I was young (a long time ago), our teachers made us do things step by step so we were (nearly) sure we knew what we were calculating.
That's not the case anymore; yes, people don't get the methodology and the maths, yet they practice data science, which is sad.
Oops, Nuance wrote 'yes'!! Trusting the LSTM, I did not check my post, sorry! ;-)