Join this channel membership to get access to materials and connect with me:
ruclips.net/channel/UCNU_lfiiWBdtULKOw6X0Digjoin
Hi sir, I have a problem: how do I get the student data CSV into VS Code?
@krish naik, could you please help with step-by-step instructions on how to commit the project's large files to GitHub?
I was lost after finishing an IBM course in data science. Nobody gives me a job because I don't have experience. I think that with your videos I will get one. Thanks for your excellent work; it has really helped me a lot. Greetings from Colombia!
Thanks Krish, you spotted a gap in the market where very few people explain end to end this clearly!
I really wanted to do an ML project where I can use all the ML algorithms on a single dataset. I find this playlist best for my project. Thank you so much, Krish sir, for making this informative and instructive tutorial!!!
After this series I regained my interest in machine learning; thanks for the timely series. 👍
Krish - You are awesome as always. I was out of the market for more than a year and now getting back to Data Science and your videos are helping me refresh my skills. Thanks
Thanks a lot, Sir, for extending the series with the CI/CD pipeline and MLOps. Very much looking forward to it.
I think there is one improvement required here: we should split the data first, then do fit_transform on the training data and only transform on the test set.
I wonder why most of those who watch and learn something don't press the like button. It's the least you can do, so please support those who teach world-class material for free.
You are doing an amazing job presenting how to get this done in an industrial environment. Thanks for your effort!
Best playlist and channel I've ever seen, THANK YOU SO MUCH KRISH
I love the parts where you hit an error but don't stop recording; it teaches us that everyone runs into these problems.
Thank you very much; you are one of the very few people who show their errors. It helps a lot, keep going!!
How should this entire project be described when mentioning it in my resume as a fresher?
Any help will be appreciated. :)
I have gone through your Python & ML playlist and it was a great learning experience. Thanks for this End to End ML Project playlist, and thanks for your note that you will extend it to deep learning & NLP. I am eagerly waiting for the deep learning & NLP implementation sessions in this playlist.
I enjoy your style of teaching. Thank you for all your hard work.
Amazing playlist, learning a lot. One point on transformation: the StandardScaler should be fit only on the training data, and that fitted scaler then used to transform the test data. This way there won't be any data leakage into the test set.
Krish you are really redefining the tech educational system, you are awesome!!
Excellent video . I retired recently & just thought to keep myself engaged by learning new things , saw your video & found it very useful. Keep it up & best wishes for all the hard work you are doing in spreading knowledge. -- Sudeep Mathur
Thank you for this amazing playlist Krish!
God bless you.
This series is absolute gold
Very well explained and a good job for learners like me. Thanks. God bless you.
Hi Krish, thanks for extending the project. Please include data and model versioning and MLOps practices.
Yes @krishnaik, this is much needed.
Nothing could be better. He just poured in everything that data science needs. We owe a lot to him.
For the sns.countplot() function we have to pass the value for x, i.e. sns.countplot(x=data) will work; otherwise sns.countplot(data) will give an error.
Thank you @sabin Adhikari
@@shwetakumari__2085 What is the error?
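For anyone hitting this, a minimal sketch of the keyword form that newer seaborn versions expect (assuming df is the already-loaded students DataFrame and 'gender' is the column being counted; both names are just illustrative):
import seaborn as sns
# df is assumed to be the students DataFrame loaded earlier in the notebook
sns.countplot(x='gender', data=df)  # pass the column by keyword instead of positionally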
There is one issue - we have done the standard scaling on the whole X here, whereas we should have used only X_train -> fit_transform and X_test -> transform.
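For anyone who wants a concrete sketch of the split-then-scale order, a minimal example assuming scikit-learn and that X (features) and y (target) are already prepared; the variable names are illustrative:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# split first so the test set stays completely unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test data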
Hi Sensei, I am following your projects in every detail, and I am very thankful for your valuable content, but I think I found a code mistake in the EDA part.
With this new code:
it works well .... regards
Thank you for the solution. I am facing a similar issue with this statement: df.groupby('parental_level_of_education').agg('mean').plot(kind='barh',figsize=(10,10))
@@yashpisat9267 Here is the solution.
df.groupby('parental_level_of_education')[['math_score','reading_score','writing_score','average']].agg('mean').plot(kind='barh',figsize=(10,10))
Thanks man!!
How did you solve that error?
Please tell immediately @@shivankvishwakarma2994
@@yashpisat9267 Hey, I have questions... can you explain this specific code please?
Loving the playlist, sir, thank you for it.
You are the reason why I love machine learning.
your teaching is like ❤❤
Thank you for the valuable tutorials.
Excellent Video.. Thanks for sharing it.
Enjoying the videos
Thank you so much, sir, for your video. I am learning a lot of new concepts with better explanations, all because of you.
Very nice explanation, sir... thank you very much.
We are thankful for your wonderful knowledge of ML, and I have a wish: if you could make a Deep Learning project playlist from scratch, we would be very grateful.
# Remove duplicates
df_no_duplicates = df.drop_duplicates()
# Keep the last occurrence of each duplicate
df_no_duplicates = df.drop_duplicates(keep='last')
I don't understand how you created the notebook folder and the EDA and model training notebooks inside it.
Looking forward to the deep learning end-to-end series.
Sir amazing video btw
Boys have performed really well in MATHSSS
where's the code video about this session?
I am getting 'packages not found' in VS Code, though I followed all the steps from the start, and all packages are installed in my global env.
You shouldn't do the preprocessing on the whole X. You should fit_transform on X_train and only transform X_test.
Great ! Thanks Krish
Thanks!
That's super great work, it really helped me.
But i'm surprised that you've done columns transformation "Standard Scaler" before splitting the train/test sets, most articles said it will result in data leakage, can you please elaborate
Thanks for this series
Thank you for the series
thanks for this video
10-07-2023
Shouldn't we split the data first and then apply transformation? Won't this lead to data leakage?
Thank you sir
How can we export the data and the EDA code? Could you please explain that part?
Sorry, but I really don't understand how to get the data and jupyter files into my Visual Studio Code, can you help me?
Thank you for the problem statement, sir; we are eagerly waiting for that.
Please make a PySpark end-to-end project like a real-world one.
thanks sir it will help us a lot🙏
Hello Krish, I was hoping to ask for your opinion on a particular aspect of data preprocessing. Shouldn't we perform data splitting first to prevent data leakage, as standard scaling considers the mean and variance of the entire dataset? This may include the test set, leading to potential data leakage. Would you kindly share your thoughts on this topic? Thank you very much.
Yes, I will take care of it while writing it in a modular way...
Thank you for the series.
Thank you so much sir
Hi, how are we importing the data from the CSV file into VS Code?
Adjusted R² could be used instead of R² as the evaluation metric.
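A minimal sketch of computing adjusted R² on top of scikit-learn's r2_score, assuming y_test/y_pred come from the regression in the notebook and taking the number of features from X_test (all names are illustrative):
from sklearn.metrics import r2_score
def adjusted_r2(y_true, y_pred, n_features):
    # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), penalizing extra predictors
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
print(adjusted_r2(y_test, y_pred, n_features=X_test.shape[1]))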
Thank You Sir..
Thank You So much sir 💗
Thank you so much !
Hi @krishnaik06, can you kindly show how you worked with Jupyter in VS Code? I mean, did you do the EDA in a Jupyter notebook and then move it into VS Code?
So much happy sir ❤❤❤
Sir, I am not able to open the Jupyter notebook in VS Code; I think there is an error in the file.
Please help me resolve this...
Thank you, Krish. It refreshed a lot of information and skills. I'm looking forward to seeing the automation and deployment part of it. Will you integrate the MLOps part in the future?
Hey Krish, I have all my libraries such as numpy installed, but when I try to run it through the ipykernel it shows 'numpy not found'.
Bro, were you able to resolve this?
@@prianshmadan Kindly let me know the solution for the same please
Install ipykernel again; it may upgrade to the latest Python package available, not the 3.8 used in this video: conda install -p <environment path> ipykernel --update-deps --force-reinstall. Then select the correct Jupyter kernel in the interpreter picker; it should work.
Hi Krish, when I try to do from src.logger import logging, it gives the error 'no module named src', but if I do from logger import logging then it works. Any idea?
Because both .py files are present in the same folder, we can import it directly; if exception.py were outside of src, then src.logger would work.
@@karishmamehar4081 Yep, I was able to understand that, but why did it work for Krish in the video while I'm getting an error?
@@rahulsharma5693 Yes, same doubt; I think it has something to do with some magic.
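For anyone stuck on the 'No module named src' error: one likely explanation (an assumption, not confirmed from the video) is that the import only resolves when Python can see the project root and src is a proper package. A sketch of such a layout and two ways to run it that usually make from src.logger import logging work:
mlproject/
    setup.py          # if the project has one
    src/
        __init__.py   # makes src importable as a package
        logger.py
        exception.py
# from the project root, run the file as a module so the root is on sys.path
python -m src.exception
# or, if a setup.py/pyproject.toml exists, install the project in editable mode
pip install -e .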
Need more videos like this.
@Krish Naik
Hello Sir, why didn't you use Cross Validation instead of Train-Test-Split ?
Could have used CV or another data-splitting technique, but I guess the main aim of this tutorial was to build a framework for an ML project. One can improvise later with new ideas or more in-depth exploration.
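For reference, a minimal cross-validation sketch with scikit-learn, assuming the features X and target y from the notebook are already prepared (the estimator choice here is just illustrative):
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
model = Ridge()
# 5-fold cross-validation, scored with R^2
scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print(scores.mean(), scores.std())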
How and when did he add the notebook folder? He didn't mention it at the start. Please help, I am stuck.
Here we did not discuss the catboost_info folder that is present; why is it there and what is its use?
please explain Krish sir.
It appears automatically once you train with CatBoost, and IDK why; it stores the training logs.
I'm very interested, Krish, in your teaching techniques and in this end-to-end project. Can I expect automation of the project with code?
Thanks again
How did you get the dataset and import it into VS Code?
I don't understand.
Hello Krish, are there any projects that solve the Direction of Arrival problem in audio signal processing? Can you do a tutorial on it?
Sir, is there any upcoming data analyst batch? I missed the 50% off offer 😔
Thanks 😊
Where do I get the data files, I mean the contents of the notebook folder? I am coding along with the series.
Bro, do you know how to get the contents of the notebook folder? Please say!
Hi everyone,
I'm getting the below error when I'm trying to run the "exception.py" file.
(c:\Users\pavva\OneDrive\Documents\AI Project\venv) C:\Users\pavva\OneDrive\Documents\AI Project>python src/exception.py
Traceback (most recent call last):
File "src/exception.py", line 2, in
from src.logger import logging
ModuleNotFoundError: No module named 'src'
I did import this line "from src.logger import logging" in exception.py.
All the files name are correct and it's in proper order.
Can someone help me?
Thank you.
Hi, remove the src. prefix from the import.
@@ramin.nourizade If I only use "import logging" then it's not writing to the log file.
I am also getting the same problem: invalid syntax at line 9.
Refresh your VS Code.
import os
import sys
sys.path.append(os.path.abspath('C:/Users/xxxx/MLProject/src'))  # put the src folder on the module search path
# Now you can import 'logging' from the 'logger' module
from logger import logging
Try adding this code; it should resolve the issue.
df.drop_duplicates(inplace=True)
How can we export the EDA, model training, and data files to Visual Studio Code?
Yes, I have the same question.
Which kinds of projects should we choose when preparing for interviews?
Guys please help me. My computer restarted on its own and now even after reactivating the "venv", I'm unable to run the exception.py file. I'm getting an error saying that there is "ModuleNotFoundError: No module named 'src' ".
great
How can I add the notebook folder? You didn't tell us about the notebook and CSV.
Add it from VS Code.
Or go directly to the ML project folder in your file explorer, add a notebook folder, then inside the notebook folder add a data folder and paste the CSV file into it; VS Code will pick it up and the file will be added successfully.
Hi Sir, I have a doubt, as you have created "total marks" and "average marks" as two separate independent features, and you are doing EDA for both the features, suggested to create 2 separate models for each of them as well. But my doubt is why do we need to do the same things for both of them separately as average marks is directly correlated with total marks(total marks/3). Am I missing something? Please clarify. Love your videos 😊
Hey Shivam,
It kind of depends on the problem that we are trying to solve. Suppose, what we are doing here is all self learning but there must be a target decided by the stakeholders/clients.
If you are trying to predict a student's eligibility for a scholarship, the total marks might be more important than the average marks since scholarships may be based on total marks.
If the model indicates that the total marks are a strong predictor of a student's performance, it may be harder to understand how much the average marks contributed to that prediction if they are not considered separately.
Also, elimination of noise and variability in the data is a factor here!
If boosting and bagging methods are so powerful, then why does a simple Ridge regression have a higher R² score?
thanks
Hi Krish, just a small doubt: I am facing an issue with installing everything through requirements.txt; instead, I had to install everything separately. What could be the issue here?
Why is it that when I try to import the library it says "No module named 'numpy'"?
same here
I have gone mad setting up the environment again and again, since even after installing all of the libraries, running the code says the library is not there.
anyone getting an error related to installing catboost??
or is it just me?
Same here. Did you find the solution?
Me too - Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects
Please make an end-to-end project with PySpark.
What is the correct scikit-learn version for Python 3.8?
I'm facing a 'module not found' error even after creating the environment. Can anyone help me fix this?
I have a doubt: while running the code I face a syntax error in the exception handling.
Hello Sir, thank you so much for this playlist. I am getting an error: 'ValueError: A given column is not a column of the dataframe' while executing X = preprocessor.fit_transform(X), even though I have done X = df.drop('reading_score', axis=1). Please help.
use inplace=True
Stunning.