Tutorial 4-End To End ML Project With Deployment- Data Ingestion Implementation Line By Line
- Published: Mar 8, 2023
- In this video we will implement data ingestion, performing the necessary tasks such as reading the dataset, splitting it into train and test sets, and saving them in the artifacts folder.
Project Code: github.com/krishnaik06/mlproject
Join iNeuron's Data Science Masters Course with Job Guaranteed Starting From April 3rd 2023
ineuron.ai/course/Full-Stack-...
Join this channel membership to get access to materials:
/ @krishnaik06
check out the end to end project playlist
• End To End Data Scienc...
Check Out My Other Playlist
Python Playlist:
Python In English: • Complete Road Map To B...
Python In Hindi: • Tutorial 1- Python Ove...
Stats Playlist:
English 7 Days Statistics Live Session : • Live Day 1- Introducti...
Hindi: Stats Playlist: • Starter Roadmap For Le...
Complete ML Playlist: • Complete Road Map To B...
Hindi: ML Playlist: • Introduction To Machin...
5 Days Live Deep Learning Playlist: • 5 Days Live Deep Learn...
Complete Deep Learning Playlist:
• Why Deep Learning Is B...
Check out the entire playlist of end to end project implementation
ruclips.net/p/PLZoTAELRMXVPS-dOaVbAux22vzqdgoGhG
Hi Krish.. what is this concept in python programming? train_path : str=os.path..
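On the question above: the `name: type = default` form is a type-annotated class attribute with a default value, and `@dataclass` uses those annotations to discover its fields. Here is a minimal sketch, assuming a config class with path fields similar to the one in this tutorial. The class name `IngestionPaths` and the `note` attribute are made up for illustration.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class IngestionPaths:  # hypothetical name, for illustration only
    # "name: type = default" declares a dataclass field with a default value.
    train_data_path: str = os.path.join("artifacts", "train.csv")
    test_data_path: str = os.path.join("artifacts", "test.csv")
    # No annotation: this is a plain class attribute, NOT a dataclass field.
    note = "not a field"


paths = IngestionPaths()
field_names = [f.name for f in fields(IngestionPaths)]
print(field_names)

# The generated __init__ accepts the annotated fields as keyword arguments.
custom = IngestionPaths(train_data_path=os.path.join("out", "train.csv"))
print(custom.train_data_path)
```

Without the annotation the attribute still exists on the class, but it cannot be overridden through the constructor.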
The artifacts folder was not created when I ran the program. Could you please help?
@ krishnaik06
To the point, effective, made complicated setup simple & optimized the course for beginners. A lot of hard work seen. Thank you!
Absolutely love the content . You are revolutionizing the education in data science field.
Love your content Krish. Watching your tutorials, It's a good start of my day. Thanks a lot for all your efforts.
Beautiful, Thank you!
I love the video! It's very intuitive and informative as you explain the line-by-line coding! Keep it coming please!
You are providing lots of knowledge, sir, but it is going over my head 😇😇😇
Amazing thank you!
Very informative session. Please keep it up.
Really Incredible project, just amazing 🤩
i love end to end ML series ^^
Thank you Krish.
Hi Krish, This is an amazing tutorial with very relevant contents in easy to understand format. One of the best. Thanks
If you are facing "src module not found" or "log module not found", please replace the imports at the top of your code with:
import os
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
from exception import CustomException
from logger import logging
import pandas as pd
from sklearn.model_selection import train_test_split
from dataclasses import dataclass
Then run it with this command: python -m src.components.data_ingestion
Brother it really helped me. I can't express my happiness in words. Thanks a lot.😃
Bro, it really works, thank you. Also, please explain this line: sys.path.append(str(Path(__file__).parent.parent)).
Code that works for me:
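To unpack that line: `Path(__file__)` is the path of the running script, and each `.parent` steps up one directory. The sketch below assumes the script lives at `src/components/data_ingestion.py` inside a project folder; the absolute path used here is hypothetical and only serves to show which directory each expression points to.

```python
from pathlib import PurePosixPath

# Hypothetical location of the script inside the project.
script = PurePosixPath("/home/user/mlproject/src/components/data_ingestion.py")

components_dir = script.parent               # .../mlproject/src/components
src_dir = script.parent.parent               # .../mlproject/src
project_root = script.parent.parent.parent   # .../mlproject

print(components_dir)
print(src_dir)
print(project_root)
```

Under this layout, appending `src_dir` to `sys.path` makes `from exception import CustomException` resolvable, while appending `project_root` is what makes `from src.exception import CustomException` resolvable.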
import os
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
from exception import CustomException
from logger import logging
import pandas as pd
from sklearn.model_selection import train_test_split
print('it is here')
It still doesn't work, can you share your code ?
Thanks a lot, it took me more than 4hrs :)
Excellent
Just keep going, Krish sir.
Krish you're a blessing to many. Thanks so much for sharing your knowledge with us. Am glad to be associated with ineuron family . Blessings 🙌
thanks
Hi sir, use artifacts/ in .gitignore to ignore the folder.
Hi all, I am just struggling to understand that what we need to retrain in production again. Because once we train the model, we export it and write the prediction or inferencing code and that code needs to be running in production to run predictions or inferencing on production data. Kindly help me to understand this or am I missing something.
Is it recommended to have multiple functions for different database ingestion, or it should be done in the single function?
thank you
@Krish Naik
Good tutorial. But How to work when we use Cross validation instead of train test split ??
thanks for this video series , sir
10-07-2023
Thank u so much sir for for this project sessions... u r taking so much efforts for us. best of best project explanations in every video. From Nashik(Maharashtra)
Hi bro,
On which topic this project is ??
Hi sir, do we need any paid version software for this project?
You are great, sir. I am working on a project and have one question: I have a column of car models that contains both numbers and letters. I was wondering whether the numbers could be lost if I apply an encoding method, so I want to know if there is another way.
Please sir any solution let me know
how can the number be lost
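On the encoding question in this thread: when a column is encoded as a category, each distinct string is mapped to a code as a whole, so the digits inside a value are not dropped. Below is a hand-rolled sketch of that idea in plain Python. It is not a specific library's encoder, and the car model values are invented for illustration.

```python
# Hypothetical car model values mixing letters and digits.
models = ["A4", "X5", "320i", "A4", "C200", "X5"]

# Each distinct string becomes one category with one integer code.
categories = sorted(set(models))
code_of = {name: code for code, name in enumerate(categories)}

encoded = [code_of[m] for m in models]
# The mapping is reversible, so the original strings can be recovered.
decoded = [categories[c] for c in encoded]

print(categories)
print(encoded)
print(decoded == models)
```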
Hey Krish, I have been following your series from Day 1 and it is amazing. I had a small doubt: where can I get this data? I tried to use the raw version from your git but kept getting an error. Or can you give the name of the dataset so it would be easier to search for it?
I have same doubt please anyone know please help me
@@Pravin33unique95 tell me
@@Pravin33unique95 data set link is provided in EDA file
My logs are displayed on the terminal screen but not being added to the log file in the explorer. Any fix for this?
same did u find a solution ?
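One possible cause, offered as a guess since the logger module is not shown in this thread: `logging.basicConfig` does nothing if the root logger already has handlers, so a `filename=` passed to it can be silently ignored. A minimal sketch that attaches a file handler explicitly is below. The logger name, log location, and format string are assumptions, not the tutorial's own settings.

```python
import logging
import os
import tempfile

# Hypothetical log location; the tutorial's own logger module may differ.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "run.log")

logger = logging.getLogger("ingestion_demo")
logger.setLevel(logging.INFO)

formatter = logging.Formatter("[%(asctime)s] %(levelname)s %(name)s - %(message)s")

# One handler writes to the file, the other echoes to the terminal.
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(formatter)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.addHandler(stream_handler)

logger.info("Entered the data ingestion method or component")

# Flush so the message is on disk before reading it back.
file_handler.flush()
with open(log_path) as fh:
    log_text = fh.read()
print(log_text)
```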
What should be my approach to create multi-class image classification and captioning without using Tensorflow, Keras, or sklearn?
Any suggestions?
A great video, but I had an issue with module imports. When I run: python src/components/data_ingestion.py the process raises ModuleNotFoundError: No module named 'src'. But if I run this command: python -m src.components.data_ingestion the process works. So why doesn't the import from src.exception import CustomException work in the first case? I'll wait for any feedback.
Same issue
same issue
Use this in terminal
python -m src.components.data_ingestion
Hi krish, I am facing issue while installing requirements.txt
Hi all, when I restart the project from VS Code I am unable to use the venv in the terminal. Do we have to activate the environment again each time?
what all are the things to activate when starting up
Yes I'm facing the same issue @krish sir plz resolve about it
launch vs code from anaconda command prompt and then you can use conda to activate your venv environment, as it needs to use conda environment
Getting error by running data_ingestion.py
Error : module not found from src.component.data_transformation
Python searches for modules in the same directory as the script, i.e. src/components/.
To avoid this, I run the following line in the terminal:
python -m src.components.data_ingestion
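The difference between the two ways of launching can be reproduced in isolation. The sketch below builds a throwaway project with an assumed layout (`src/exception.py` and `src/components/data_ingestion.py`, both with `__init__.py` files) and runs it both ways from the project root. The file contents are stand-ins, not the tutorial's code.

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway project that mimics the assumed layout.
root = tempfile.mkdtemp()
components = os.path.join(root, "src", "components")
os.makedirs(components)

files = {
    os.path.join(root, "src", "__init__.py"): "",
    os.path.join(components, "__init__.py"): "",
    os.path.join(root, "src", "exception.py"):
        "class CustomException(Exception):\n    pass\n",
    os.path.join(components, "data_ingestion.py"):
        "from src.exception import CustomException\nprint('import ok')\n",
}
for path, content in files.items():
    with open(path, "w") as fh:
        fh.write(content)

# Drop PYTHONPATH so it cannot hide the difference between the two runs.
env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}

# Run as a script: sys.path starts with src/components, so "src" is not found.
as_script = subprocess.run(
    [sys.executable, os.path.join("src", "components", "data_ingestion.py")],
    cwd=root, env=env, capture_output=True, text=True,
)

# Run as a module: sys.path starts with the current directory (project root).
as_module = subprocess.run(
    [sys.executable, "-m", "src.components.data_ingestion"],
    cwd=root, env=env, capture_output=True, text=True,
)

print("script exit code:", as_script.returncode)
print(as_script.stderr)
print("module exit code:", as_module.returncode)
print(as_module.stdout)
```

Note that the `-m` form only works when the command is run from the project root, the folder that contains `src`.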
@@jmcloudpro I had the same issue. Thanks to you, I was able to resolve the issue.
@@jmcloudpro thanks bro
I can not save train_set and test_set to path, I can save raw data but can not save after splitting. does anyone know the solution?
What is the difference between 'notebook/data/stud.csv' and 'notebook\data\stud.csv'?
In my PC I have the relative path how: 'notebook/data/stud.csv'
How you create this path and how you import data set please help i don't get it
It differs by operating system; you are probably using a Mac.
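To make the difference concrete, here is a small sketch using `pathlib`'s pure path classes, which apply Windows or POSIX rules regardless of the machine they run on. The file name is the one from the question; nothing is read from disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Raw string so the backslashes are taken literally (no escape processing).
backslash_form = r"notebook\data\stud.csv"
slash_form = "notebook/data/stud.csv"

# Windows path rules accept both separators.
win_from_backslash = PureWindowsPath(backslash_form).parts
win_from_slash = PureWindowsPath(slash_form).parts

# POSIX (Linux/macOS) rules treat a backslash as an ordinary character.
posix_from_backslash = PurePosixPath(backslash_form).parts
posix_from_slash = PurePosixPath(slash_form).parts

print(win_from_backslash, win_from_slash)
print(posix_from_backslash, posix_from_slash)
```

So the forward-slash form is understood on both families of systems, while the backslash form is only split into folders under Windows rules. Building the path with `os.path.join('notebook', 'data', 'stud.csv')` sidesteps the question.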
@Krish Naik, you added .artifacts in .gitignore instead of artifacts, that's why the folder was pushed to Git.
No worries but i am glad u understood why
@@krishnaik06 the artifacts folder was not created and I cant continue the tutorial because of this, could please help?
d:\Projects\mlproject\venv\python.exe: can't find '__main__' module in 'src/components' Can anyone help me fix this error? I am not able to get it fixed to generate the files.
same problem
How to do data ingestion for image data?
hello guys,
I have been trying to execute data ingestion but keep getting an error: "from src.exception import CustomException
ModuleNotFoundError: No module named 'src'". I have verified __init__.py in the src folder but still no luck. Any leads on how to resolve this issue?
did u get it
Go to your root folder of your project through command line and type: export PYTHONPATH=$PYTHONPATH:`pwd`
Instead, you can run: python -m src.components.data_ingestion
@@arangembu974 thanks bro it's working
@@rockgangadharan8531 it is showing no module name logger, any idea ?
I am getting module error , no module named src.exception why I am getting this , still I have src.exception module can any one clarify my doubt
I am getting the same error, did you find the solution?
os.makedirs(os.path.dirname(self.ingestion_config.train_data_path),exist_ok=True) sir in this code why you have set .train_data_path ?
i think he is creating folder for artifacts
but why only for train_data_path , and not for test_data_path or raw_data_path@@Veerbasantreddy111
For me it is displaying an error saying the 'src' module is not recognized, even though I have __init__.py in the src folder. Please give a suggestion.
same for me
@@arjitsharma2503 same problem guys did u find any solution?
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
this will fix the issue of no src module found
I cant keep up, Like in the previous tutorial it was on ipynb file and all the things was sorted out there but in this tutorial we are creating some new stuffs...like which is important and which i should follow
(C:\Deependra\generic-ML-project\venv) C:\Deependra\generic-ML-project>python src/components/data_ingestion.py
Traceback (most recent call last):
File "src/components/data_ingestion.py", line 3, in <module>
from src.exception import CustomException
ModuleNotFoundError: No module named 'src'
Even when I have the src folder and everything is in order, I still receive this issue. Could someone please explain why?
import os
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
from exception import CustomException
from logger import logging
import pandas as pd
got solution bro?
Hi krish is this series only for experienced people or for freshers also?
For everyone
Is mlops good for freshers also
sir why did we wrote the make directory code for only the train data path and not for test and raw
did you get the answer for that?
this is used just to create artifacts folder
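To expand on that answer: `os.path.dirname` strips the file name, and all three configured paths sit in the same folder, so creating the parent of any one of them is enough for the others. A minimal sketch follows, using a temporary base folder (an assumption, added so the example does not touch a real project) and the file names from the tutorial's config.

```python
import os
import tempfile

# Temporary base folder so the example does not touch a real project.
base = tempfile.mkdtemp()
train_data_path = os.path.join(base, "artifacts", "train.csv")
test_data_path = os.path.join(base, "artifacts", "test.csv")
raw_data_path = os.path.join(base, "artifacts", "data.csv")
all_paths = (train_data_path, test_data_path, raw_data_path)

# All three files share one parent directory.
parent_dirs = {os.path.dirname(p) for p in all_paths}
print(parent_dirs)

# Creating that directory once is enough; exist_ok avoids an error on reruns.
os.makedirs(os.path.dirname(train_data_path), exist_ok=True)
all_ready = all(os.path.isdir(os.path.dirname(p)) for p in all_paths)
print(all_ready)
```

If the three files were ever moved to different folders, each parent directory would need its own `os.makedirs` call.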
facing an issue "ModuleNotFoundError: No module named 'src' " someone please tell the solution to resolve it
import os
import sys
from pathlib import Path
# Make the src/ folder importable so "exception" and "logger" resolve.
sys.path.append(str(Path(__file__).parent.parent))
from exception import CustomException
from logger import logging
import pandas as pd
from sklearn.model_selection import train_test_split
from dataclasses import dataclass

@dataclass
class DataIngestionConfig:
    # The type annotations are what make these dataclass fields.
    train_data_path: str = os.path.join('artifacts', "train.csv")
    test_data_path: str = os.path.join('artifacts', "test.csv")
    raw_data_path: str = os.path.join('artifacts', "data.csv")

class DataIngestion:
    def __init__(self):
        self.ingestion_config = DataIngestionConfig()

    def initiate_data_ingestion(self):
        logging.info("Entered the data ingestion method or component.")
        try:
            df = pd.read_csv('Notebook/data/stud.csv')
            logging.info("Read the dataset as dataframe.")
            # All three output files live in the same folder, so one makedirs call is enough.
            os.makedirs(os.path.dirname(self.ingestion_config.train_data_path), exist_ok=True)
            df.to_csv(self.ingestion_config.raw_data_path, index=False, header=True)
            logging.info("Train test split initiated.")
            train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
            train_set.to_csv(self.ingestion_config.train_data_path, index=False, header=True)
            test_set.to_csv(self.ingestion_config.test_data_path, index=False, header=True)
            logging.info("Ingestion of the data is completed.")
            return (
                self.ingestion_config.train_data_path,
                self.ingestion_config.test_data_path,
            )
        except Exception as e:
            raise CustomException(e, sys)
Type in Command Prompt / PowerShell: python src/components/data_ingestion.py or python -m src.components.data_ingestion
ModuleNotFoundError: No module named 'src' Getting this error even with python 3.9
Kindly guide
Run it with this command: python -m src.components.data_ingestion
python -m src.components.data_ingestion
excellent i, it worked, I was struggling with this for days! you're a life saver@@swatijadhav6304
@@MR_Struggler-J thanks for this. however, when I run that, i get No module named logger.
I don't think this is for freshers, you need very thorough knowledge to understand it. Totally going above the head. 🤕
Don't worry. Go step by step. Even I used to struggle with it but after practice I am able to understand it
Just complete Python, stats and EDA, and learn the algorithm used in this project. You will definitely understand it.
Can anybody help me? I am getting a "module not found" error for src.
import os
import sys
# Add the project's root directory to sys.path
# sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
from src.exception import CustomException
from src.logger import logging
import pandas as pd
from sklearn.model_selection import train_test_split
from dataclasses import dataclass

@dataclass
class DataIngestionConfig:
    train_data_path: str=os.path.join('artifacts',"train.csv")
    test_data_path: str=os.path.join('artifacts',"test.csv")
    raw_data_path: str=os.path.join('artifacts',"data.csv")

class DataIngestion:
    def __init__(self):
        self.ingestion_config=DataIngestionConfig()

    def initiate_data_ingestion(self):
        logging.info("Entered the data ingestion method or component")
        try:
            # os.path.join avoids backslash escape issues and works on every OS.
            df=pd.read_csv(os.path.join('notebook','data','stud.csv'))
            logging.info('Read the dataset as dataframe')
            # Creates the artifacts folder if it does not exist yet.
            os.makedirs(os.path.dirname(self.ingestion_config.train_data_path),exist_ok=True)
            df.to_csv(self.ingestion_config.raw_data_path,index=False,header=True)
            logging.info("Train test split initiated")
            train_set,test_set = train_test_split(df,test_size=0.2,random_state=42)
            train_set.to_csv(self.ingestion_config.train_data_path,index=False,header=True)
            test_set.to_csv(self.ingestion_config.test_data_path,index=False,header=True)
            logging.info("Ingestion of the data is completed")
            return(self.ingestion_config.train_data_path,
                   self.ingestion_config.test_data_path)
        except Exception as e:
            raise CustomException(e,sys)

if __name__=="__main__":
    obj=DataIngestion()
    obj.initiate_data_ingestion()
this will solve your problem
hi sir why this error i am getting : ModuleNotFoundError: No module named 'src'
same for me
plz reply @krishnaik
same here, kindly reply someone
try to use this:
python3 -m src.components.data_ingestion
@@akashkapil2184 it doesnt work for me . been hours working to solve this .damn
@krishnaik06 I can't move forward :( with tutorials as I have this error: ModuleNotFoundError: No module named 'src'
did u resolve it pls ?
@@hadilbrahem3193 some are using this command: python -m src.components.data_ingestion and it works!
try-> python -m src.components.data_ingestion
same
@@GaganaMD it is not working no files are created
Getting an error saying " name s is not defined"in the __init__.py file
Getting error by running data_ingestion.py
Error : module not found from src.component.data_transformation
Were you able to solve it? I am getting the same error. @@mohdsameer9214
File "src/components/data_ingestion.py", line 3, in <module>
from src.exception import CustomException
ModuleNotFoundError: No module named 'src'
got this error any solutions?
i got the same error
same here bro
Did you resolve it??
Did you resolve it?
@@hadilbrahem3193 Yes. Please check the __init__.py file; it is required so the folder can be imported as a module from outside.
i wrote .artifacts in .gitignore file but still its not ignoring that file and its showing me that its exceeding 100mb
22:01 src module is not found error occurred. Will anyone help me out ??
I am facing the same issue. If you got the solution, please tell me too
sir artifacts are not being created and even logs. help
update: solved
@@Veerbasantreddy111 How did you solved this issue? I am facing the same
@@TheSuddi123 Have you solved the issue? I found a mistake in my code: files named like dataclasses and datetime were overriding the standard-library modules.
@@nickchern392 yeah silly mistake in code
@@TheSuddi123 hello how did u solve it ?
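For anyone checking whether a local file is overriding a standard-library module, as mentioned in this thread, one way is to ask Python where it would load the module from. This is a diagnostic sketch only. The helper name `resolved_origin` is made up, and the two module names are simply the ones mentioned above.

```python
import importlib.util
import os
import sysconfig

# Directory where this interpreter keeps its standard library.
stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])


def resolved_origin(module_name):
    """Return the real file path Python would load module_name from."""
    spec = importlib.util.find_spec(module_name)
    return os.path.realpath(spec.origin)


for name in ("dataclasses", "datetime"):
    origin = resolved_origin(name)
    from_stdlib = origin.startswith(stdlib_dir)
    print(name, "->", origin, "(stdlib)" if from_stdlib else "(possibly shadowed)")
```

If a path inside your own project shows up, renaming that file (for example a local dataclasses.py) removes the clash.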
Hi krish,
I am getting this error:
(D:\Vybhav\end to end ML project\venv) D:\Vybhav\end to end ML project>python src/components/data_ingestion.py
Traceback (most recent call last):
File "D:\Vybhav\end to end ML project\src\components\data_ingestion.py", line 3, in <module>
from src.exception import CustomException
ModuleNotFoundError: No module named 'src'
But I have the src folder, and the exception and logger files are in it along with the components folder. Please help me with this.
same issue i am facing
I am facing the same issue
use this command : python -m src.components.data_ingestion
Heyy! Were you able to solve it? I am facing the same issue.
Were you able to solve it@@faheemkhan9786
where is data set sir ??
You added the artifacts folder in .gitignore. Why did it still get committed?
got error while run data_ingestion ----ModuleNotFoundError: No module named 'dataclasses'
did you find solution?
Got error no module named src
did you solve it?
@@swL1941 yes
@@akj3344 How?
@@swL1941 This worked out for me:
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent.parent))
@@eduardoquintanilla9904 Thanks, it actually solved my error.
error : ModuleNotFoundError: No module named 'src'
code fix :
import os
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
from exception import CustomException
from logger import logging
import pandas as pd
from sklearn.model_selection import train_test_split
print('it is here')
This solved my problem. Thanks!
15:47 No it's not easy. My brain is out!
It's going over my head, sir 😟😕☹
No part 5 ?🥲
released now
ImportError: cannot import name 'ModelTrainerConfig' from 'src.components.model_trainer'
Solution: Comment out these lines of code:
#from src.components.data_transformation import DataTransformation
#from src.components.data_transformation import DataTransformationConfig
#from src.components.model_trainer import ModelTrainerConfig
#from src.components.model_trainer import ModelTrainer
The artifacts/ directory was already added to the Git repository before you added it to .gitignore, so Git is still tracking it. To stop tracking it, run:
git rm -r --cached artifacts/
git commit -m "Remove artifacts/ directory from Git repository"
git status
git push