Appreciate your sharing, sir! It's 2024, but your video is still extremely helpful. Thanks, sir.
Hi Krish,
Thank you so much for everything that you are offering us at free of cost!
Thank you, Krish. I am working on a project with an imbalanced target variable, and this video really helped me out.
We are greatly indebted to you.
Thank you sir, always feel motivated after seeing your enthusiasm for data science. Learning a lot from you ❤️
Most welcome bhai
Good video ♥♥! As a current YouTuber, I am on the lookout for creative ideas! Nice job!
Thanks for the informative video
Best machine learning tutorials sir
Thank you very much!
Your new videos always motivate and encourage me.
Hello Krish,
Could you please share your video on handling data imbalance in deep learning models? It would indeed be a great help.
Very nice video. The greatest thing is we get to know what is currently used in industry, not what is bookish.
finished practicing code
Due to an update in the imblearn version, fit_sample throws an error. So I used X_train_ns, Y_train_ns = ns.fit_resample(X_train, Y_train) and it works fine for me.
Hey, is SMOTETomek taking a long time for anyone else?
Really helpful video... Sir, I just want to know how you selected arange(-2, 3)?
Always a pleasure to watch your videos sir 👍
finished watching
Sir, can you upload a video about how to predict earthquakes using Naive Bayes?
Learned a lot from this!! Thanks man
godlike!
Thank you so much Krish. I have two teachers on YouTube: Krish and Harshit!
Hi Krish, would it be possible to make a video about how class weights are used to perform a node split in a weighted decision tree?
The precision is gone #UnderSampling 🤣 That laugh
Nicely explained, sir.
Thank you so much for the session.
Great lecture thanks as always Krish
Thank u :)
Don't we have the GPay facility for joining the membership? It is asking for a card number.
After applying 'ns', is resplitting the undersampled dataset necessary or not? Here, after balancing, the model was used on the original train and test datasets.
Hi Krish, can you please make a video on a multivariate time series forecasting model?
Sir, is there any video on handling an imbalanced image dataset in a CNN? Or can we use the basics of this live tutorial for image classification purposes?
Did you get good accuracy (precision and recall) on the imbalanced image dataset?
Actually, I'm also looking for the answer to the same question.
Hello Krish,
I have been following your data analytics videos throughout. I completed the Live EDA and Feature Engineering playlists, then started following this playlist. I am quite overwhelmed by the sudden introduction of ML and other models which I have no clue about.
Can you please tell me the playlist I should follow first to get the basic understanding of what you are teaching here?
Thanks for your effort,
If the data are imbalanced in a regression problem, then how do we handle it?
random-forest class weights example: 40:00
undersampling : 43:25
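For anyone jumping to the class-weights timestamp, here is a minimal sketch of that idea; the weights and toy data below are illustrative, not the exact ones used in the video:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy imbalanced data: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# A heavier weight on the minority class makes its mistakes cost more,
# pushing the forest away from always predicting the majority class
clf = RandomForestClassifier(class_weight={0: 1, 1: 9}, random_state=0)
clf.fit(X, y)
```

The weight dictionary can also be replaced with `class_weight="balanced"`, which lets scikit-learn derive weights inversely proportional to class frequencies.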
Even after installing imblearn, it's giving me a "Module not found" error. Can someone please help me? I am stuck due to this error.
Hi Sir, I am getting a similar kind of error while using the oversampling method:
from imblearn.over_sampling import RandomOverSampler
os = RandomOverSampler(0.5)
x_train_ns, y_train_ns = os.fit_sample(x_train, y_train)
print("The number of classes before fit{}".format(Counter(y_train)))
print("The number of classes after fit{}".format(Counter(y_train_ns)))
error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
2
3 os = RandomOverSampler(0.5)
----> 4 x_train_ns, y_train_ns = os.fit_sample(x_train, y_train)
5 print("The number of classes before fit{}".format(Counter(y_train)))
6 print("The number of classes after fit{}".format(Counter(y_train_ns)))
c:\users\asing053\appdata\local\programs\python\python38-32\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
75 check_classification_targets(y)
76 arrays_transformer = ArraysTransformer(X, y)
---> 77 X, y, binarize_y = self._check_X_y(X, y)
78
79 self.sampling_strategy_ = check_sampling_strategy(
c:\users\asing053\appdata\local\programs\python\python38-32\lib\site-packages\imblearn\over_sampling\_random_over_sampler.py in _check_X_y(self, X, y)
77 def _check_X_y(self, X, y):
78 y, binarize_y = check_target_type(y, indicate_one_vs_all=True)
---> 79 X, y = self._validate_data(
80 X, y, reset=True, accept_sparse=["csr", "csc"], dtype=None,
81 force_all_finite=False,
AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'
best sir
Hey sir, about the question you asked in your virtual interview: a case study which has a 0 or 1 dependent variable, further divided into subcategories. Can you please answer that question, like how we can do it?
Hello sir, I am new to machine learning and I don't know where I should start. Can you please give any list or syllabus? I have seen all your classes; they are great, sir. Thank you so much.
Start by learning statistics and probability first.
Hi Krish,
Thank you for your videos. Can you please do a session on handling an imbalanced image dataset (a medical dataset, if possible)?
In fraud classification, false negatives should be more important, right? That means we should focus on our recall score. Am I correct?
Hi Krish, I want to know how to handle the imbalance problem when there are more than 2 classes.
I really appreciate what you are trying to do. But it would have been much better if you actually answer the questions raised and also if you could explain why and how something is happening. You ask if it's clear, and I see people asking questions but sadly you avoid all the questions. And sometimes I actually have the same questions raised by others but we have to go to some other tutor and learn about it. But nevertheless you give us content to follow through, thankyou :)
What are the default parameters used by a Random Forest Classifier (tree depth, number of trees, number of variables used at each split) in Python?
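One way to check the defaults yourself; the exact values vary by scikit-learn version (for example, max_features changed from "auto" to "sqrt" in 1.1):

```python
from sklearn.ensemble import RandomForestClassifier

# get_params() returns every constructor default for your installed version
params = RandomForestClassifier().get_params()

print(params["n_estimators"])   # number of trees (100 in modern versions)
print(params["max_depth"])      # None -> trees grow until leaves are pure
print(params["max_features"])   # variables considered at each split
```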
Sir, please make videos on the mathematics behind SVM regression, AdaBoost regression, and gradient boosting classification.
When exactly in the pipeline should the imbalanced data be balanced?
Is it before we begin any feature analysis, feature selection, and other pre-processing techniques? Many times, outlier analysis and removal methods will rule out some good data points in a variable, counting them as outliers when in fact they are just unbalanced data points associated with the minority class.
OR, should we balance the data just before predictive modelling for the sake of getting unbiased models and results? Let me know, please.
By “handling” an imbalanced dataset, you are not really “transforming” the dataset as you would in a feature engineering pipeline; it is part of model training/tuning. You do not “clean” an imbalanced dataset, and it is perfectly natural for datasets to be imbalanced.
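A minimal sketch of that point, using plain random oversampling with numpy only (all names and data here are illustrative): the resampling touches the training split alone, and the test split is left as-is so the evaluation stays honest.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy imbalanced binary data
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class in the TRAINING split only
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])

model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
print(Counter(y_tr[idx]))       # classes balanced in the training data
print(model.score(X_te, y_te))  # test split was never resampled
```

Balancing before the train/test split would leak copies of test-set minority rows into training, which inflates the measured scores.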
Hi sir, my dataset has 13 lakh records for class A and 2k records for class B. I tried SMOTE, RandomUnderSampler, grid search, random forest, decision tree, everything, but the columns have low correlation with the target and I'm getting a very low score for class B.
Hi sir ,please do data processing using CLI in machine learning
Krish bhai, please upload this random forest model to Kaggle; this model would rank very high on the Kaggle leaderboard. Please do it and show us, it's a request.
Sure, I'll do it and let you know :)
If the training data has only the “negative” class, whereas the testing data has both classes, “negative” and “positive”, what kind of algorithm should we apply?
(y) Great
How do we handle imbalanced data for a multi-class classification problem which has only a text column as the feature?
ValueError: Logistic Regression supports only penalties in ['l1', 'l2', 'elasticnet', 'none'], got 11.
You must have mistyped l1 (lowercase L, one) as 11 (eleven).
Hi Sir, why have you not used SVM? I have read that it's a very popular algorithm.
Why are you looking left every other minute?
AttributeError: 'NearMiss' object has no attribute '_validate_data' ... Sir, the error comes due to the version difference.
This error generally comes when you have freshly installed the library; shut everything down and start the Jupyter notebook once again, and that can solve the error.
What is the difference between SMOTE sampling and ADASYN sampling?
Sir, can you explain how to create our own model?
My model is predicting high false alerts and low true alerts. I am new to ML and my data is imbalanced; can you please suggest which models are good for my data? When I check the model live, the positive class has a low count, like 15, and the negative class is high, like 3665.
I have 1244 observations in class A and 244 in class B. My algorithm is classifying everything into one class. How should I rectify it? I tried logistic regression, SVM, and random forest; same problem.
I have a different number of rows across my 17 years of data: one year has 800 rows and another has 300. How can I make the rows similar for my time series prediction?
I have 1244 observations in class A and 244 in class B. My algorithm is classifying everything into one class. How should I rectify it?
Try using k-fold cross-validation.
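For a class split like the 1244/244 one above, stratified k-fold is the usual variant, since it keeps the minority class represented in every fold. A hedged sketch with toy data mirroring those counts:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data mirroring the 1244 / 244 split from the comment above
X = np.random.default_rng(0).normal(size=(1488, 4))
y = np.array([0] * 1244 + [1] * 244)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
minority_per_fold = [
    int((y[test_idx] == 1).sum()) for _, test_idx in skf.split(X, y)
]
print(minority_per_fold)  # each fold keeps roughly 244 / 5 minority samples
```

Stratification by itself only makes the evaluation honest; to stop the model from predicting one class, combine it with class_weight="balanced" or resampling on the training folds.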
Krish sir, if our targets are regression values instead of a classification problem, then how do we examine whether the data is imbalanced?
Why would a regression problem have imbalance? Give me an example. Regression problems do not have imbalance! They are real values.
@krish: what if there is imbalance in a single attribute rather than in the class attribute?
Don’t go to the USA on H1 or F1.
Reasons:
1. Forget about the green card.
2. Consultancy fees and tax will take 50% of your salary.
3. Very low savings, and it's expensive.
4. I have seen people working in software and being homeless for years.
5. Big problem: you cannot bring your parents, for sure.
6. Parents' health insurance is a big problem.
7. Do any small business in India; don't come to the USA.
Sorry for the hard words, but it's the truth.
Hello sir, where can I find the videos for the spam detection project?
Hi Sir, please upload a video with a detailed explanation of how to crack Google Summer of Code. Please, sir... Thank you.
I don't understand why this video is in the Feature Engineering playlist.
How can we save the data after undersampling or oversampling? If possible, kindly give me the code, sir 👩💻
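One way to persist a resampled training set, assuming pandas is available; the x_train_ns / y_train_ns arrays below are tiny stand-ins for the resampled output from the video:

```python
import numpy as np
import pandas as pd

# Stand-in resampled features and labels (illustrative values only)
x_train_ns = np.array([[1.0, 2.0], [3.0, 4.0]])
y_train_ns = np.array([0, 1])

# Combine features and target into one frame and write it to CSV
df = pd.DataFrame(x_train_ns, columns=["f1", "f2"])
df["target"] = y_train_ns
df.to_csv("resampled_train.csv", index=False)
```

Reloading later is just `pd.read_csv("resampled_train.csv")`; for larger datasets, `to_parquet` is a more compact alternative.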
LOGIC FOR A 1000 MILES/HOUR RAILWAY ENGINE
I got this error; how can I solve it? ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.
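For the lbfgs error above: that solver only supports the 'l2' (or no) penalty, so for 'l1' you have to pick a compatible solver. A minimal sketch on toy data, assuming a reasonably recent scikit-learn ('saga' also supports l1):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data; the fix is the solver choice, not the data
X, y = make_classification(random_state=0)

# liblinear supports the l1 penalty, unlike the default lbfgs solver
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
print(clf.score(X, y))
```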
Why have you uploaded this video in the Feature Engineering playlist?
The concepts could have been explained better, rather than just focusing on a hands-on session with the technique.