Take my courses at mlnow.ai/!
Thanks Greg. This made me realise how non-standard my code is.
I learnt:
- Use copy or deepcopy and not assignment.
- Always perform preprocessing on the train and test separately.
- sklearn pipelines have nothing to do with ETL pipelines from Data Engineering.
- sklearn transformers have nothing to do with NLP Transformers.
- sklearn estimators have nothing to do with Statistics estimators.
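The first two points above can be sketched in a few lines. This is a minimal illustration with made-up arrays: the scaler is fitted on the training data only, and the same fitted parameters are then applied to both sets.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test arrays, just for illustration.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0], [5.0]])

# Fit the scaler on the training data only...
scaler = StandardScaler().fit(X_train)

# ...then apply the *same* fitted statistics to both sets,
# so no test-set information leaks into preprocessing.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```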
Super glad you got some useful pointers!!
Keep posting Greg, I am a Data Analyst by profession and your videos certainly help a lot
That's awesome! Thank you 😄
Great stuff! I’m curious why you used FunctionTransformer instead of ColumnTransformer, which could run the two scalers in parallel? Also, since FunctionTransformer is stateless, the documentation says that fit just checks the input rather than actually fitting the scaling parameters. Doesn’t that lead to data leakage since applying transform to test data won’t use parameters learned from fitting on the training data?
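For anyone following this question: a ColumnTransformer can indeed apply two different scalers to different columns, and because the scalers are stateful, transforming test data reuses the statistics learned from the training data. A minimal sketch with toy data:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy data: column 0 gets standardized, column 1 gets min-max scaled.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

ct = ColumnTransformer([
    ("std", StandardScaler(), [0]),
    ("minmax", MinMaxScaler(), [1]),
])

ct.fit(X_train)                       # learns mean/std and min/max from train only
X_test_scaled = ct.transform(X_test)  # reuses the fitted parameters on test data
```

By contrast, FunctionTransformer just wraps a stateless function, so its `fit` learns nothing; that is fine for fixed operations like `np.log1p`, but it cannot stand in for a fitted scaler.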
A very practical video, that I came across on Pipelines. Thank you for this video!
Awesome that's great to hear. You're very welcome ☺️☺️
Thanks, Greg. Really good explanation and structured example. This makes it easy to create a template for easy reuse!
When you do the StandardScaler().fit on the dataframe, what is the meaning of this operation? What is happening?
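In short: `fit` computes and stores per-column statistics (mean and standard deviation) without changing the data; `transform` is what actually applies them. A tiny sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])
scaler = StandardScaler().fit(X)

# fit() only computes and stores the statistics; X is unchanged.
print(scaler.mean_)   # per-column mean learned from X
print(scaler.scale_)  # per-column standard deviation learned from X
```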
Just out of curiosity, is there a reason you don't use train_test_split to get X and y values?
Yes, and why does he use X_train for train_predictions instead of a separate validation set X_valid?
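For reference, `train_test_split` does exactly this in one call. A minimal sketch with a hypothetical feature matrix and target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data, just for illustration.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% for testing; fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```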
I understand what you are doing here, but I have two questions that I think would be helpful and would make it easier to follow along and replicate your steps.
1) Where did you get the data? I can't find a california_housing dataset that already comes in train/test form.
2) Why not use scikit-learn tooling rather than doing it yourself? You could have used train_test_split or pipelines (or ColumnTransformer, or similar). That just has me confused.
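On question 1: scikit-learn ships the California housing data as a fetcher (downloaded and cached on first use), and it does not come pre-split, so the train/test split is done by the user. A sketch:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Downloads and caches the dataset on first call.
housing = fetch_california_housing(as_frame=True)
X, y = housing.data, housing.target

# The dataset is not pre-split; make the split yourself.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```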
How do you transform the y variable and then fit the model? And afterwards, how do you reverse the transform for the scatter plot?
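One way to do this is scikit-learn's TransformedTargetRegressor, which scales y before fitting and automatically inverse-transforms the predictions back to the original scale. A minimal sketch on made-up linear data:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical linear data for illustration.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0

# The transformer scales y for fitting; predictions are
# inverse-transformed automatically, ready for plotting.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=StandardScaler(),
)
model.fit(X, y)
preds = model.predict(X)  # already back on the original y scale
```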
Great tutorial! I use the MinMaxScaler with the option to scale from -1 to 1 instead of 0 to 1 when I am dealing with values that can be positive and negative. Seems to be fine, but I may need to reconsider going forward. I have never noticed any issues though.
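For anyone curious, that option is the `feature_range` parameter, which maps each column's min to the lower bound and max to the upper bound. A tiny sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column with negative and positive values.
X = np.array([[-2.0], [0.0], [2.0]])

# feature_range=(-1, 1) maps the column min to -1 and the max to +1.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X)
X_scaled = scaler.transform(X)
```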
This was very helpful, thank you :)
I would love to see a tutorial that covers using pipelines with multilayer perceptron models (MLPs), CNNs and LSTMS.
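For the MLP case at least, scikit-learn's own MLPRegressor drops into a Pipeline like any other estimator (CNNs and LSTMs would need a Keras/PyTorch wrapper instead). A minimal sketch on made-up data:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Hypothetical data for illustration.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

# An MLP chained after a scaler, like any other estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(16,),
                         max_iter=2000, random_state=0)),
])
pipe.fit(X, y)
preds = pipe.predict(X)
```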
Thank you Greg! It's a great video!
Glad to hear it!
Did you say pipelines doesn't function for classifications problems? Min: 1:07
Does, not doesn't
@GregHogg thanks 🙏🏼
Thanks for the great tutorial. Can you make a video on how to combine multiple feature selection methods and feature extraction using python?
Thanks for this amazing video! Would that work also with a statsmodels model?
Thanks so much!! And I'm not sure, haven't tried :)
Thanks for the great tutorial! what do I need to change to create a pipeline for an image classification model? like the cifar10 model?
Well, everything. You probably won't be using scikit for that. And you're very welcome!
@GregHogg I didn't explain myself clearly... I want to create a pipeline that receives a trained cifar10 model and also does preprocessing on the dataset. So I can't use your approach?
Great Video!
Thank you Adrian!
Perfect explanation. Thanks a lot
Very welcome 😁
nice video Greg
Thanks so much!!
Thank you for the video!
Bro, can you show how to make a YouTube (or any video) downloader in Python?
TYSM bro really appreciate this
Very welcome!!
Awesome !
Thank you!
Can you share this notebook?
dang i think i lost it, sorry
awesome ty
Thank you!
Bro you literally just copied this out of a textbook lmao but I respect the grind.
you are ❤
❤️
Too confusing. Too many tangents, doesn't cover the main idea clearly. Downvoted.
Well I upvoted it to counter you
What tangents? This video was not only to the point from the start, but it also went into depth with useful examples. If you thought those were tangents, I recommend watching again, maybe with more care this time.