Perform Exploratory Data Analysis In Minutes- Data Science| Machine Learning

  • Published: Jan 3, 2025

Comments • 150

  • @krishnaik06
    @krishnaik06  4 years ago +35

    Please note that pandas-profiling will not work properly when you have many features in your dataset. I tried some and ran out of memory.

    • @shashwatdev2371
      @shashwatdev2371 4 years ago +1

      Make a video on Blockchain vs Data Science

    • @souvikghosh6509
      @souvikghosh6509 4 years ago +3

      In Anaconda, the pandas-profiling version is 1.4.1, and ProfileReport() does not work with it.

    • @madhan-adavani
      @madhan-adavani 4 years ago

      The D-Tale library works in a similar fashion and has more features than pandas-profiling, but I am not sure how long it takes to run.

    • @rajeshbc5405
      @rajeshbc5405 4 years ago

      Is there a way to speed up the profiling by excluding certain columns and some features like correlation? I tried running this on the credit card default dataset from Kaggle and it hung.

    • @_I_N_D_R_A__
      @_I_N_D_R_A__ 4 years ago +1

      @souvikghosh6509 Same issue here. What should we do? Please help me.
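
For the memory and speed questions in this thread, one possible approach is to shrink the DataFrame before profiling and to ask the library for a lighter report. The sketch below is an assumption, not something shown in the video: `reduce_frame` and `build_report` are hypothetical helper names, and the `minimal=True` option is assumed to be available (it exists in pandas-profiling 2.4 and later, and in its successor package ydata-profiling, where it turns off the expensive computations such as correlations).

```python
import pandas as pd


def reduce_frame(df, columns=None, max_rows=10_000, seed=0):
    """Return a smaller frame: an optional column subset plus a random row sample."""
    if columns is not None:
        df = df[list(columns)]
    if len(df) > max_rows:
        # A fixed seed keeps the sample reproducible between runs.
        df = df.sample(n=max_rows, random_state=seed)
    return df


def build_report(df, title="Profiling Report"):
    """Build a report in minimal mode (assumes pandas-profiling >= 2.4 or ydata-profiling)."""
    try:
        from ydata_profiling import ProfileReport  # current package name
    except ImportError:
        from pandas_profiling import ProfileReport  # older package name
    return ProfileReport(df, title=title, minimal=True)
```

A call such as `build_report(reduce_frame(df, columns=["age", "limit"], max_rows=5_000))` would then profile only two (hypothetical) columns and at most 5,000 rows.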

  • @kailashmahadar3257
    @kailashmahadar3257 4 years ago +17

    Despite knowing so little about DS and not knowing how huge it is, I must say you are the person who made me realize that I still have many things to learn. Thank you, sir, for your unconditional effort. I would have had to pay thousands or lakhs for the knowledge you are providing free of cost. (I came to know DS exists because of you.)

  • @sreshtasharon1002
    @sreshtasharon1002 3 years ago +3

    I haven't seen any DS YouTube channel better than this. Hats off, Krish!!

  • @sagarkawade2541
    @sagarkawade2541 2 years ago +1

    Wow, how amazing that the exploratory data analysis was done in only a few minutes.
    Great, sir, thank you.

  • @VikashKumar-ty6uy
    @VikashKumar-ty6uy 4 years ago +4

    Wow... I really enjoyed seeing this concept.

  • @sachinjoshi187
    @sachinjoshi187 4 years ago +4

    Thank you so much. This will help me a lot at work. I really like your energy level and the easy way you explain the concept. Thank you.

  • @KalyanGk0
    @KalyanGk0 3 years ago

    Every time I get stuck, your videos clear all my doubts. Thank you, Krish 😊

  • @ukuleleguy16cs36
    @ukuleleguy16cs36 4 years ago +3

    This is an amazing library for EDA.

  • @midhunskani
    @midhunskani 4 years ago +2

    Thanks for introducing me to this amazing library. I'm good at data preprocessing and cleaning, but I wasn't good at data visualization. This library will be very useful to people like me. Also, the quality of your videos has vastly improved with a good microphone.

    • @gayatrikvr1111
      @gayatrikvr1111 4 years ago +1

      Hi Midhun,
      I have just started with EDA.
      Can you please share the sources that helped you master data processing and data cleaning?
      Thanks 😊

    • @midhunskani
      @midhunskani 4 years ago +1

      @gayatrikvr1111 When did you start programming? If you are completely new to programming, I would recommend taking a Udemy course, which is pretty cheap, to get started with the basics. If you need more help after that, reply here.

  • @abhijeetwaghchaure6504
    @abhijeetwaghchaure6504 3 years ago

    It actually makes coding so easy. Thanks, Krish. Please make more videos like this that tell us about the newest libraries.

  • @dheerajkumark2268
    @dheerajkumark2268 4 years ago +11

    Kindly make a video on the pros and cons of ML algorithms and the reasons behind the evolution of these algorithms.

  • @chinmayjape4981
    @chinmayjape4981 4 years ago

    Great Library and Awesome Video... Thanks for sharing this....

  • @koruvappaspandu4180
    @koruvappaspandu4180 4 years ago +1

    This library is awesome #Thank u Krish Bro

  • @aadityajaiswal2225
    @aadityajaiswal2225 4 years ago +1

    Thanks sir it helped me a lot 😁😁
    Excellent video 👌👌

  • @jagrutisheth1996
    @jagrutisheth1996 3 years ago

    I learned something amazing today. Thank you so much.

  • @sankhabanerjee3533
    @sankhabanerjee3533 4 years ago +2

    Krish,this is amazing ⭐
    Thanks for sharing, it will help a lot

  • @mdyousufuddin
    @mdyousufuddin 3 years ago

    Amazing. Saved a lot of time. Thank you very much.

  • @vvkwadhwa
    @vvkwadhwa 4 years ago

    Boss it's really awesome and thanks a lot for sharing your videos with upcoming features...

  • @rakeshreddy3750
    @rakeshreddy3750 4 years ago +1

    Most amazing ds video I've ever seen

  • @datasciencewithraghav
    @datasciencewithraghav 4 years ago +2

    i learned something new today. Thank you

  • @darshitsolanki7352
    @darshitsolanki7352 4 years ago +2

    Amazing, Krish, my friend. No code, nothing, man, so, so easy 🙏🙏🙏🙏😂😂😂😂😂😂😂😂😂😂😂😂😂😂

  • @albertopenalver1435
    @albertopenalver1435 2 years ago

    Very amazing. Thanks for sharing

  • @Drkalaamarab
    @Drkalaamarab 3 years ago

    ❤️ Brilliant demonstration

  • @akhilrapalli4118
    @akhilrapalli4118 4 years ago

    Great video krish 👏👏👏👍🏻👍🏻👌

  • @madhukumaryadav7194
    @madhukumaryadav7194 4 years ago

    Thank You Krish . This seems to be a wonderful library

  • @yogenderkushwaha5523
    @yogenderkushwaha5523 4 years ago +1

    This is amazing sir. really worked

  • @shashisharde8016
    @shashisharde8016 4 years ago

    Thank you, Krish. You have been amazing.
    Thanks for all the help.

  • @prasunprakash2297
    @prasunprakash2297 4 years ago +6

    I tried installing pandas-profiling, but it throws an error: no attribute called to_widget.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Great explanation. I still need to practice this in a Jupyter notebook. Thanks.

  • @anuragmishra6262
    @anuragmishra6262 4 years ago +2

    This is Fantastic !!!!

  • @mohanmike8780
    @mohanmike8780 4 years ago

    Learnt new skill 👌 thank u Krish bro

  • @gauravsharma-bl2sw
    @gauravsharma-bl2sw 4 years ago

    Hi Krish, I am a grad student who completed my Master's at the University of Arizona and am now going for a Ph.D. in this field. I have one doubt here. This is the procedure I usually follow: I always compare and find missing values first and then move on to scaling.
    Variable identification: X, y => number of categorical and numerical variables
    Describe the data (info, head(), etc.) => read the data properly and get an insight into it.
    Outliers, feature-wise
    Missing values: if the count is really large, fill them separately first and separate them from the outliers.
    Otherwise, check for outliers.
    Missing value treatment: dropna, fillna, Imputer (mean, median, mode), predictive modeling (may give data that is too well behaved), K-NN (k = ? The dataset should not be too large).
    For outliers:
    Visualization - box plot, scatter plot, histogram
    Detect - box plot; flag values outside Q1 - 1.5 IQR to Q3 + 1.5 IQR
    SD* - (2-3) x SD
    5th to 95th percentile
    Mahalanobis distance
    Remove the outliers - drop them, take the natural log, reduce the weights, or treat them separately
    Entire-dataset outliers
    Imbalanced data - final class
    Univariate analysis - categorical: count and percentage; continuous: distribution curve, skewness, kurtosis
    Treat skewness and kurtosis
    Feature selection / PCA
    Feature correlation / bivariate correlation - if any two are the same, drop either; this also helps in visualizing the relationship between them.
    Cat - Cat - cat1 table, cat2 table, table (cat1 vs. cat2), chi-squared test; see if you need to treat them separately
    Cont - Cont - scatter plot, correlation = cov(X,Y)/(varX VarY)^0.5
    Cat - Cont - see if you can treat them separately; separate them by tables and test p-values (z-test, t-test, ANOVA, Wilcoxon, Friedman)
    Feature engineering -
    Variable transformation - sqrt or cube root [cube root works for -, +, 0; sqrt for +, 0] or log_e [not for 0 or -]; all of these reduce right skew, while left-skewed data calls for a square or cube instead
    Done to convert a non-linear relationship to a linear one
    Can use binning
    Variable creation
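
Two of the steps in the checklist above, the box-plot (IQR) rule for detecting outliers and the correlation formula cov(X,Y)/(varX VarY)^0.5, can be sketched with the standard library alone. This is only an illustration: the function names are hypothetical, and the quartile method ("inclusive") is one of several conventions, so the fences may differ slightly from what a plotting library draws.

```python
from statistics import mean, quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / sqrt(var(X) * var(Y))."""
    mx, my = mean(xs), mean(ys)
    # The 1/n factors of the covariance and variances cancel, so raw sums suffice.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
print(pearson([1, 2, 3], [2, 4, 6]))
```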

  • @skyrayzor3693
    @skyrayzor3693 3 years ago

    Thank you for sharing this thing this was very helpful

  • @niharmhaske9778
    @niharmhaske9778 4 years ago +1

    Great work bro

  • @krishnasarathmaddula194
    @krishnasarathmaddula194 4 years ago

    This is too good. thanks for educating us.

  • @debarshichanda8588
    @debarshichanda8588 4 years ago +2

    This is amazing 😍

  • @prasadvlogs5510
    @prasadvlogs5510 4 years ago

    Amazing Krishna

  • @ravibhat2849
    @ravibhat2849 4 years ago +5

    Is it okay to prefer this method over the manual one?

  • @AyushGupta-kp9xf
    @AyushGupta-kp9xf 4 years ago +1

    Sir, will you teach in the iNeuron course?

  • @niranjanjamkhande3773
    @niranjanjamkhande3773 4 years ago +1

    Fantastic...!

  • @niyazahmad9133
    @niyazahmad9133 4 years ago

    Awesome, I had never heard about this. Just amazing.

  • @AshitDebdas
    @AshitDebdas 4 years ago

    With one shot we can get a 98% idea of the dataset.

  • @arjyabasu1311
    @arjyabasu1311 4 years ago

    Amazing video sir !!!

  • @rohandas1478
    @rohandas1478 3 years ago

    This is really cool.!!!!

  • @umabansod
    @umabansod 4 years ago

    It's amazing 👍

  • @leutrimhajrullahu3521
    @leutrimhajrullahu3521 3 years ago

    He is the chosen one...

  • @adilgondal7568
    @adilgondal7568 4 years ago +1

    bro, thank you sooooo much for this. you are awesome.

  • @GodModeGamingGG
    @GodModeGamingGG 3 years ago +1

    I'm getting an error at this line: profile = ProfileReport(df, title = 'Profiling Report', explorative=True)
    It says: TypeError: _plot_histogram() got an unexpected keyword argument 'title'
    I've installed pandas-profiling. Am I missing anything?

  • @evan7306
    @evan7306 4 years ago +1

    Thanks! Can you teach us about transformers and attention models?

  • @raiyyantaukueer2944
    @raiyyantaukueer2944 4 years ago +1

    How do I check which version of pandas-profiling is installed?
    I tried pandas_profiling.__version__ but it didn't work.
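
One way to answer the version question without relying on a `__version__` attribute is to ask the packaging metadata directly. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical, and both distribution names are checked because the project was later renamed to ydata-profiling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check both the old and the new distribution name.
for name in ("pandas-profiling", "ydata-profiling"):
    print(name, "->", installed_version(name))
```

From a terminal, `pip show pandas-profiling` reports the same information.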

  • @vibhusw9794
    @vibhusw9794 4 years ago +1

    Does the data need to be normalized before using this library? Also, if we use this library, is manual EDA still required?

  • @vervitkhandelwal
    @vervitkhandelwal 4 years ago +2

    Sir, why do I need the iNeuron course if I can study from the material you have provided?

    • @krishnaik06
      @krishnaik06  4 years ago

      Real-world projects and internships are the main thing, and the syllabus is definitely advanced.

    • @basavarajpatil9821
      @basavarajpatil9821 4 years ago +1

      @krishnaik06 Where can we find internships, sir?

  • @allieubisse470
    @allieubisse470 4 years ago +6

    I've been using this library, and it gives me errors in Google Colab.

    • @RealSlimShady7
      @RealSlimShady7 4 years ago +1

      I was stuck on the same problem with Colab. I have to pip install it every time, and then it works for me. Try it out.

    • @aryapulkit8886
      @aryapulkit8886 4 years ago +1

      here is the solution !! ruclips.net/video/pLxgt20kKWU/видео.html

    • @viveksingh881
      @viveksingh881 4 years ago

      Is this library reliable or sufficient for EDA?

  • @saiprakash5224
    @saiprakash5224 4 years ago

    Great Video!!

  • @rrrprogram8667
    @rrrprogram8667 4 years ago +1

    Is there any similar library in R?

  • @Uma_maheswar_rao
    @Uma_maheswar_rao 1 year ago

    Sir, can you make a video on reverse engineering? It will be very helpful for beginners. Please, sir.

  • @nikhilgaikwad9954
    @nikhilgaikwad9954 4 years ago +2

    Code With Harry talked about this yesterday.

    • @krishnaik06
      @krishnaik06  4 years ago +2

      Wow, let me check

    • @vish8877
      @vish8877 4 years ago +1

      I also saw this on Medium this morning, but you made it interesting.

  • @chintanchitroda5511
    @chintanchitroda5511 4 years ago +1

    You should try the eazeml package.

  • @NeeRaja_Sweet_Home
    @NeeRaja_Sweet_Home 4 years ago

    Thank You !!! for the great information

  • @kosanaraghusai6106
    @kosanaraghusai6106 2 years ago

    I have a doubt regarding EDA. Suppose I have a dataset that contains 3 folders, and those folders hold files in .jpeg format. How do I perform EDA on this dataset, and can I convert the data into a CSV file?
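
For the image question above, one possible first step (an assumption, not something covered in the video) is to build a CSV manifest with one row per image, using the folder name as a label. That table can then be explored like any other tabular dataset, for example to check class balance. The function name `build_manifest` and the column names are hypothetical; only the standard library is used.

```python
import csv
from pathlib import Path


def build_manifest(root, csv_path, suffixes=(".jpeg", ".jpg")):
    """Write one CSV row per image under root: folder label, relative path, size in bytes."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            rows.append(
                {
                    "label": path.parent.name,  # folder name used as the class label
                    "file": str(path.relative_to(root)),
                    "bytes": path.stat().st_size,
                }
            )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["label", "file", "bytes"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Pixel-level statistics (image dimensions, color histograms) would need an imaging library and are outside this sketch.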

  • @tusharrewatkar3805
    @tusharrewatkar3805 4 years ago +2

    Just by executing one line of code we can do the whole EDA. This is good, but if everyone can do the whole EDA, what will be the differentiator?

    • @joe8ones
      @joe8ones 4 years ago +2

      Exactly. I feel that very soon data scientists may be out of a job; everything is becoming automated.

    • @MuhammadAhmed-jm1bs
      @MuhammadAhmed-jm1bs 3 years ago

      That's where the beauty of DS comes in. You cannot analyze the data just by running a tool; you need to understand various DS concepts to make sense of all this information. The information alone is not enough, and this library only works on limited datasets. So no worries: a library cannot replace a data scientist.

  • @shashanks2024
    @shashanks2024 4 years ago +1

    Hey Krish, thanks for this video. I have a doubt: what if the dataset is huge and this ends up taking a lot of time? Is there a way to reduce the time by profiling only a part of the dataset instead of the whole thing?

    • @kunalgoyal8529
      @kunalgoyal8529 4 years ago

      I think you should drop a few columns and then pass it for profiling.

  • @shashwatdev2371
    @shashwatdev2371 4 years ago

    Make a video on Blockchain vs Data Science

  • @pradeepkumar-wh5oq
    @pradeepkumar-wh5oq 4 years ago

    Sir, I get an error while installing pandas-profiling. It does not install astropy and reports that the setup for astropy could not be built.

  • @shankybatra5487
    @shankybatra5487 2 years ago +1

    Hi Krish, this is awesome, and it really reduced the time I spend on EDA. I have just one question. When I try the command "profile.to_widgets()", it gives the error "TraitError: n_rows and n_columns must be positive integer".
    I tried profile.to_notebook_() and it works, but I am not sure what happened with the widget command.
    Please advise.
    Thanks
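
When the widget view fails as described above, one workaround is to fall back to a standalone HTML file. The sketch below assumes the report object exposes `to_widgets()` and `to_file(path)`, which pandas-profiling 2.x and later provide; `render_report` is a hypothetical helper, and it does not fix the underlying widget error, it only avoids it.

```python
def render_report(profile, html_path="report.html"):
    """Try the notebook widget view first; on any failure, write an HTML file instead."""
    try:
        profile.to_widgets()
        return "widgets"
    except Exception as exc:  # broad on purpose: widget failures vary by version
        print(f"to_widgets() failed ({exc!r}); writing {html_path} instead")
        profile.to_file(html_path)
        return "file"
```

The resulting HTML file can be opened in any browser, which also sidesteps notebook extension problems.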

  • @FunnyGamer1
    @FunnyGamer1 4 years ago

    Can you make a video on the whole skill set a B.Tech undergraduate needs for data analysis?

  • @jeevankumar3527
    @jeevankumar3527 4 years ago +1

    So this removes the need for Tableau.

  • @mugisharonald6197
    @mugisharonald6197 4 years ago

    Okay, this is so good, but I have a question: is using this library enough to perform EDA on every dataset, or are there cases where EDA should be done from scratch? I mean, if pandas-profiling exists, why do some people not use it for exploratory data analysis?

  • @ashifaappu6750
    @ashifaappu6750 4 years ago +1

    Sir, when I run this code:
    profile = ProfileReport(df,title='pandas report on iris',explorative=True)
    I get:
    TypeError: _plot_histogram() got an unexpected keyword argument 'title'

  • @Taranggpt6
    @Taranggpt6 4 years ago +1

    Why am I getting a different view? Also, parameters like title and explorative give me errors; only ProfileReport(df) works for me and produces output, but it does not include everything.

    • @krishnaik06
      @krishnaik06  4 years ago +3

      Yes it will work too

    • @Taranggpt6
      @Taranggpt6 4 years ago +2

      @krishnaik06 Hi Krish, but that way it is not giving me all the information you got.

    • @darshitsolanki7352
      @darshitsolanki7352 4 years ago

      Hey Krish, my explorative and .to_widget() are also not working. Please help, someone 🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏

    • @Taranggpt6
      @Taranggpt6 4 years ago

      @darshitsolanki7352 Same for me, bro, but the output is still produced without .to_widget(), as Krish said it will work too. The thing is, we are getting less information than Krish got in his video.

    • @darshitsolanki7352
      @darshitsolanki7352 4 years ago

      @krishnaik06 Hey Krish, please help. My explorative as well as to_widgets are not working 🙏🙏🙏🙏🙏🙏

  • @vish8877
    @vish8877 4 years ago +1

    Awesome

  • @manishshinde4954
    @manishshinde4954 4 years ago

    Hi Krishna, can you make videos on Streamlit? This was very informative.

  • @satyendrasingh3805
    @satyendrasingh3805 4 years ago

    _plot_histogram() got an unexpected keyword argument 'title' . How to resolve this error ?

  • @AshishKashyap-zq8ws
    @AshishKashyap-zq8ws 4 years ago +2

    Sir, is it necessary to be an expert in Python for data science?

  • @karamikoalexanderkrisna3285
    @karamikoalexanderkrisna3285 4 years ago

    Can you do unsupervised learning for GIS/Remote sensing projects?

  • @nizamudeenk8744
    @nizamudeenk8744 4 years ago +1

    It's not working in Google Colab.

  • @raghubandi9076
    @raghubandi9076 4 years ago +1

    How can we deploy this to a larger audience?

  • @nilavasen8631
    @nilavasen8631 4 years ago

    Dear Krish, can you please tell me the key topics needed to learn the data analytics area (not the actual ML algorithm part) using Python?
    Do we need in-depth knowledge of statistics and math to become a data analyst?
    Thanks in advance.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    finished watching

  • @asagaming7
    @asagaming7 4 years ago

    Great info!!

  • @pawanpu3202
    @pawanpu3202 4 years ago

    If we are using a small dataset with limited variables, are there any limitations in using this library?

    • @krishnaik06
      @krishnaik06  4 years ago +1

      No limitations as such

    • @pawanpu3202
      @pawanpu3202 4 years ago

      Thanks Krish!! You are great, I love to watch your tutorial. Keep providing us tutorials 🙏🙏

  • @nothing8919
    @nothing8919 4 years ago

    Thank you a lot, sir.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Finished practicing, but pandas-profiling is not working.

  • @darshitsolanki7352
    @darshitsolanki7352 4 years ago

    Hey Krish, I tried this, but it gives an error saying explorative has no attribute.

  • @pavankulkarni841
    @pavankulkarni841 4 years ago

    Bro, what about competitive programming? Has it stopped?

  • @midhileshmomidi2434
    @midhileshmomidi2434 4 years ago

    I saw this on Medium yesterday.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    I have finished practicing in a Jupyter notebook, but I am facing an error, which is

  • @raghavaritti8973
    @raghavaritti8973 4 years ago +7

    there goes 50% of a data analyst's job in 18 mins

  • @snehadixit4222
    @snehadixit4222 4 years ago

    I tried installing the pandas-profiling library and got: concat() got an unexpected keyword argument 'join_axes'. Can you please make a proper video on this?

  • @himadrijoshi3745
    @himadrijoshi3745 4 years ago

    I really wanted to use this library, but it takes eons to deliver output.

  • @mansiupadhyay5387
    @mansiupadhyay5387 9 months ago

    I'm getting a JavaScript error under the Variables section of the report. Any fixes, anyone?

  • @nibinjoseph2136
    @nibinjoseph2136 4 years ago

    ModuleNotFoundError: No module named 'pandas_profiling'
    I tried installing it many times, but it didn't work.
    Can somebody help?

    • @krishnaik06
      @krishnaik06  4 years ago +1

      pip install pandas-profiling is the command.

    • @nibinjoseph2136
      @nibinjoseph2136 4 years ago

      @krishnaik06 It's still not working. I tried as administrator as well.

    • @nibinjoseph2136
      @nibinjoseph2136 4 years ago

      It says "Requirement already satisfied", so why do I get the error again in Jupyter Notebook?
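
A common cause of "Requirement already satisfied" followed by ModuleNotFoundError is that pip installed the package into a different Python environment than the one the notebook kernel runs. This is a guess about the situation above, not a confirmed diagnosis. The standard-library sketch below, with hypothetical helper names, shows which interpreter the kernel uses and builds a pip command bound to that same interpreter.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


def pip_command(package):
    """Build a pip install command tied to the interpreter that is running now."""
    return [sys.executable, "-m", "pip", "install", package]


print("Kernel interpreter:", sys.executable)
print("pandas_profiling importable:", is_importable("pandas_profiling"))
print("Install with:", " ".join(pip_command("pandas-profiling")))
```

Running the printed command from a terminal, then restarting the kernel, installs the package where the notebook can see it.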

  • @siddharthchauhan3404
    @siddharthchauhan3404 4 years ago

    When I try this on my dataset, I get KeyError: 'Requested level (var1) does not match index name (None)'. Can anyone help me with this error?

  • @jsverma143
    @jsverma143 4 years ago +3

    Hi Krish, I am getting an error while installing pandas-profiling:
    ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

    • @mithunsarkar7495
      @mithunsarkar7495 4 years ago

      Same for me as well. It also uninstalled all my important existing libraries.

  • @AshitDebdas
    @AshitDebdas 4 years ago

    Would you mind recommending some other really good channels related to data science?

  • @ranjeethp2818
    @ranjeethp2818 4 years ago

    I'm getting TypeError: concat() got an unexpected keyword argument 'join_axes'.
    I've updated to the latest version, but it's still not working.
    Can anyone please help me with this?

    • @arjundev4908
      @arjundev4908 4 years ago

      same here too..

    • @karanbhuva7733
      @karanbhuva7733 4 years ago

      i am also getting this error

    • @karanbhuva7733
      @karanbhuva7733 4 years ago

      Try installing with
      pip install pandas-profiling[notebook,html]
      It will work properly.

  • @kirankumarj8229
    @kirankumarj8229 4 years ago

    Hi Krish, I want to join the YouTube channel, and I am an iNeuron student. How do I join the channel? Please let me know.

  • @kilarunagarjun9277
    @kilarunagarjun9277 3 years ago

    TraitError: n_rows and n_columns must be positive integer (Can anybody help me)

  • @001Debjeet
    @001Debjeet 4 years ago +1

    profile = tableau

  • @khushpatelmd
    @khushpatelmd 4 years ago +1

    I am getting
    TypeError: _plot_histogram() got an unexpected keyword argument 'title'
    Update: Install it directly from GitHub, as there is some issue with the latest version of pandas:
    pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip