First hour with a Kaggle Challenge

  • Published: Mar 20, 2020
  • Neural Networks from Scratch: nnfs.io
    Channel membership: / @sentdex
    Discord: / discord
    Support the content: pythonprogramming.net/support...
    Twitter: / sentdex
    Instagram: / sentdex
    Facebook: / pythonprogramming.net
    Twitch: / sentdex

Comments • 323

  • @SexySnorlax
    @SexySnorlax 4 years ago +442

    "keep social distance"
    sir know your audience we already are

  • @suzikang9283
    @suzikang9283 4 years ago +79

    0:00 intro to dataset
    3:18 browsing through files
    5:58 loading files into python program
    8:43 (cleaning & structuring) getting keys from text and storing into variables
    15:07 thinking about “extracting meaning from text” --> NLP. i.e. what are we looking for? --> keywords in papers that are consistent
    19:03 looking for “incubation” in text
    32:18 using regular expressions
    42:08 plotting
    44:20 adding rest of files to script
    45:21 looking at other kernels on kaggle

  • @alexdavis9324
    @alexdavis9324 4 years ago +121

    Thank you for not editing out the mistake at the 30 minute mark. That makes me feel a lot better about my own silly mistakes.

    • @sentdex
      @sentdex 4 years ago +21

      Heh, happy to keep it realistic :D

    • @wongright
      @wongright 4 years ago +1

      @@sentdex Improved approachability was the big benefit when watching you debug in real time. Thank you.

  • @hammerofheaven1313
    @hammerofheaven1313 4 years ago +142

    Not to be confused with kegel challenges.

  • @caseymeehan5901
    @caseymeehan5901 4 years ago +5

    So glad you left in that error (printing full text ~30 min mark). It makes me feel so much better :) Thanks for doing this, it is so rad! I was looking at the kaggle competition but I am too much of a noob to know where to start.

  • @sebbecht
    @sebbecht 4 years ago +1

    Just got started browsing kaggle for future challenges two days ago, great to see a series on this :) excited to see how far you take this!

  • @thesitcomaddict
    @thesitcomaddict 4 years ago +1

    Your thought process is so clear! Thanks for showing this was really enlightening to watch :)!

  • @ralphlagos4210
    @ralphlagos4210 4 years ago

    Love this channel! So glad I found it, thanks for uploading :).

  • @fuba44
    @fuba44 4 years ago +102

    I liked this "come as you are" format, could have easily been longer..

    • @pw7225
      @pw7225 4 years ago +17

      In fact, I think this is way better for learning. Since you see the actual process. Like George Hotz' coding sessions.

    • @non_complete
      @non_complete 4 years ago +5

      @@pw7225 I love george's sessions. You might like Jon Gjengset too he has a similar style, mostly does rust development.

    • @parkerdinkins5541
      @parkerdinkins5541 4 years ago +4

      @@pw7225 geohot is an absolute mad lad! this format is definitely much better than structured sessions. it really captures the trial and error process of programming

    • @danielcolomer7815
      @danielcolomer7815 4 years ago

      @@parkerdinkins5541 same here for quantum computing ruclips.net/channel/UC-2knDbf4kzT3uzWo7iTJyw (disclosure I own the channel xD) and that's exactly the reason i chose the style as well! I find showing the real process is the best way to help ppl learn

  • @clearthinking5441
    @clearthinking5441 4 years ago

    Great video Harrison! I really enjoy seeing how you think, it gives the viewer a more accurate picture as to what coding is really like. Keep these videos coming please!

  • @attentiondeficitdisorder
    @attentiondeficitdisorder 4 years ago

    These are so awesome to watch. It really helps to see your logic and thought process. As someone new to trying to process datasets like this, it's great to have confirmation that I'm not doing it some weird, crazy way.

  • @gabrielk3733
    @gabrielk3733 4 years ago

    I've been watching your channel for a few months now...the way you are talking, the way you are thinking, your python knowledge and experience is absolutely amazing for me, you're a GENIUS!

  • @Evilleoleo
    @Evilleoleo 4 years ago

    dude you have so many good videos, was going through your data analysis playlist yesterday, so today this was perfect thanks dude!

    • @sentdex
      @sentdex 4 years ago +1

      Glad you like them!

  • @ankushbisht-0055
    @ankushbisht-0055 1 year ago

    Love how you keep the debugging part and small mistakes. Keep doing such great live coding, was looking for such content for a long time.

  • @ramzykaram296
    @ramzykaram296 4 years ago +1

    I can keep watching you programming the whole quarantine time, seriously your videos are so interesting so please do more videos

  • @TiboLatte
    @TiboLatte 4 years ago

    This was really useful please continue ! You're doing awesome work thanks

  • @KylePapili
    @KylePapili 4 years ago

    Very interesting seeing your thought process working through a new dataset like in this vid. Loved it!

  • @BrentBrewington
    @BrentBrewington 3 years ago +1

    you got my like & subscribe, my dude. wow, this was super useful to watch - i'm a Sr Data Analyst looking to go Data Scientist, so looking to learn from more people like you. also kind of interesting to watch this 1 yr later

  • @qaispalekar
    @qaispalekar 4 years ago

    Thanks for making this video. It would be great if you make more videos like this. Will get a rough idea of how to tackle such big data.

  • @hewypy9015
    @hewypy9015 4 years ago +8

    loving these live coding videos where you explore real world datasets. thank you!

  • @TheSaintsVEVO
    @TheSaintsVEVO 4 years ago +10

    😂 “that went so fast” - yeah, you do remember you have a supercomputer right?

  • @Luckylesss
    @Luckylesss 4 years ago

    I LOVE these types of videos. Please keep them coming! Maybe even show us your googling to find answers to problems like needing a regex refresher.

    • @sentdex
      @sentdex 4 years ago +3

      Given more time, I would have included that. I've included some of my internet searching in the past, and it seems like people really enjoyed this format of video, so maybe more to come :)

  • @cyruscuenca
    @cyruscuenca 4 years ago +1

    I'm learning to analyze image data right now, and even though you're analyzing text, I found this really helpful. Thanks!

    • @merth17
      @merth17 4 years ago

      he's just that inspiring

  • @TheAcolossus
    @TheAcolossus 4 years ago +69

    Everyone: What are you working on?
    Me: A Covid-19 machine machine learning model
    Everyone: How does it help us with Covid-19?
    Me: It doesn't

    • @sentdex
      @sentdex 4 years ago +58

      If you're helping to parse through the insanely dense amount of information and research to answer the questions that are being asked, you *are* helping.

    • @owoled282
      @owoled282 4 years ago +2

      Hey, is machine learning squared deeper than deep learning ?

    • @cesarp6761
      @cesarp6761 4 years ago +19

      if it keeps you home for hours doing this.. it does help! :p

  • @RepiGameplays
    @RepiGameplays 4 years ago

    To get the names of folder I usually just use F2 instead of right clicking and rename. I rename stuff a lot and it surely has been helping. Great video!

  • @balthazaromeyer4334
    @balthazaromeyer4334 3 years ago +1

    Sentdex, you are one of the best teachers I have ever encountered. Keep Strong, Keep Teaching Us! People like you should be Glorified! Leaving your errors in the videos is humble and reminds us mortals where we come from.

  • @ramil17998
    @ramil17998 4 years ago

    Really enjoyed the video. Thanks for making all the mistakes and raising my confidence bar :P

    • @sentdex
      @sentdex 4 years ago +1

      Heh, happy to help

  • @nmertsch8725
    @nmertsch8725 4 years ago +2

    About the cite-fields in the JSON data: When you write a scholarly article and use findings of other articles (e.g. to compare your results with them or to build your study on earlier findings), you cite the original articles. "Because others already found out that rotten fish smells (see 1-4), ..." would contain a cite range from 1 to 4, because the cited articles 1 to 4 have shown that rotten fish smells and you build up on that without investigating it yourself.
    In the bottom of the article there is a list of references, where each number is associated with another scholarly article.

  • @MistaT44
    @MistaT44 4 years ago

    This is an excellent series! kudos

  • @shivamshukla438
    @shivamshukla438 4 years ago

    this is really nice i think we can apply more re's and logic to get more information as you suggested like cleaning too

  • @clumsydnkey29
    @clumsydnkey29 4 years ago

    Such a helpful video! Thank you!

  • @erosennin950
    @erosennin950 4 years ago +5

    That's what im talking about a kaggle challenge MAN! big thanks :)) I would like to learn from a pro, how to approach problems and solve them the fastest way possible.

  • @kar-s6716
    @kar-s6716 3 years ago +1

    That print(t) made my day .. 😂😂

  • @jackbillimack7159
    @jackbillimack7159 4 years ago

    You are the man sentdex! It's hard to express how much you have inspired me while introducing great concepts. Keep up the great work.
    Does anyone know if the scientific community is making strides to standardize raw data and move from PDF-type papers that need more cleanup to interactive IPython-type papers that could store all findings? A move like this seems like it would open the flood gates of open-source hypothesis testing and review. Hosting and publicizing poor analysis could be a problem, but I would appreciate any information and opinions folks have.

  • @FlorianLinscheid
    @FlorianLinscheid 4 years ago +19

    Just to answer that very basic question. Make the decimal part optional by grouping it and then make the whole thing a group and you're good to go.
    re.findall(r'( \d{1,2}(\.\d{1,2})? day[s]?)', sentence)

    • @sentdex
      @sentdex 4 years ago +1

      Heh, thanks!

    • @floxire7042
      @floxire7042 4 years ago

      Could you please explain why his technique with the parentheses didn't work ?

    • @sentdex
      @sentdex 4 years ago +3

      @@floxire7042 if you just have 1 set of parentheses, you'll find only examples that match the full string u searched for. But only return the part of the match inside the parentheses

    • @FlorianLinscheid
      @FlorianLinscheid 4 years ago +1

      The main catch here was that he made that one group by using parentheses. Regex will only output what's inside the parentheses then. So to get the whole number again, you need to make the whole expression another group then.
      Putting the first or the second half in parenthesis doesn't matter in this case. I just found it more logical to have always the first two digits, followed by an optional decimal. Other way round works just as well.
      Hope that was clear.

    • @floxire7042
      @floxire7042 4 years ago

      @@sentdex Oh ok thanks I didn't know that
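
A minimal sketch of the group behavior discussed in this thread. The sample `sentence` is made up for illustration and is not text from the dataset; the patterns follow the one posted above, with `days?` in place of `day[s]?`.

```python
import re

# Hypothetical sample sentence, not taken from the papers.
sentence = "The incubation period was 5.2 days, ranging from 2 days to 14 days."

# One capturing group: findall returns only what that group captured,
# so the whole number is lost and only the optional decimal part comes back.
only_group = re.findall(r" \d{1,2}(\.\d{1,2})? days?", sentence)

# Outer group around the whole pattern: findall returns one tuple per match,
# holding (whole match, optional decimal part).
with_outer = re.findall(r"( \d{1,2}(\.\d{1,2})? days?)", sentence)

print(only_group)   # ['.2', '', '']
print(with_outer)   # [(' 5.2 days', '.2'), (' 2 days', ''), (' 14 days', '')]
```

Taking the first element of each tuple gives the full match, which is what the outer group is for.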

  • @vaibhavkhobragade9773
    @vaibhavkhobragade9773 2 years ago

    You are so swift. It seems you are invincible in coding. I love your walkthrough for the kaggle challenge.

  • @HellTriX
    @HellTriX 4 years ago +8

    I think the most impressive part of this challenge, is a 50 minute challenge of not mentioning that which shall be demonetized :)

  • @DanipBlog
    @DanipBlog 2 years ago

    I'm glad you decided to leave the 'print(t)' blooper in the video 😂😂

  • @Mahmoud_Gabr
    @Mahmoud_Gabr 3 years ago +1

    I’m sure I’m not the first to ask, but please do more videos like this!! The lack of editing is also very helpful. Thank you 👍

  • @junaidmahmud2894
    @junaidmahmud2894 3 years ago +1

    Can you please do some more competitions like this?
    This is amazing?

  • @EranM
    @EranM 4 years ago

    Harrison! Well put video! I very much enjoyed it! You are hilarious!

  • @puneetsingh5219
    @puneetsingh5219 4 years ago +3

    Yo, this video was long due. Thank you.

  • @mdougf
    @mdougf 4 years ago

    Thank you so much! I’ve been so intimidated by even approaching a Kaggle problem!!!!

  • @Mr3zoozee
    @Mr3zoozee 4 years ago

    what a Coincidence
    i was looking for videos like this
    thx sentdex

  • @Pythonenthusiast
    @Pythonenthusiast 4 years ago

    I don't know if others mentioned it before, but you got some cool mugs! I guess you can make a video on that as well!

  • @leonshamsschaal
    @leonshamsschaal 4 years ago

    Thank you so much! I have always wanted to do Kaggle competitions but never really known how to approach them.

    • @sentdex
      @sentdex 4 years ago +1

      Happy to help!

  • @connor4440
    @connor4440 4 years ago

    SO happy you left the error in lol, shows that every programmer, no matter the skill level can have stupid little errors like that

  • @ambarishkapil8004
    @ambarishkapil8004 4 years ago

    Nice and Insightful tutorial.

  • @borispapic9510
    @borispapic9510 4 years ago

    Wow just took up this challenge a few days ago but hit a wall and didnt know how to proceed. This is a godsend!

  • @thedemonlord9232
    @thedemonlord9232 4 years ago

    i want more! what i am supposed to be watching while in lockdown?

  • @nickybu_
    @nickybu_ 4 years ago

    For the incubation day regex you can also do: re.findall(r"(\d{1,2}(?:\.\d{1,2})?) day", sentence)
    This will find any integer/decimal number followed by 'day' but only output the number, avoiding the need to split.

  • @shkronjax
    @shkronjax 4 years ago

    very nice. Im glad this knowledge is open source.

  • @fredrickpwol8639
    @fredrickpwol8639 4 years ago +1

    Nice one, way to go 👍.

  • @alexr7530
    @alexr7530 4 years ago

    Thanks for the video. Hope you'll continue the rubric

  • @fuat7775
    @fuat7775 4 years ago

    Hey thanks for the video. Abstract is a list because each paragraph is a text item. Look for the schema file included in the directory, that might help you. Cheers
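
A minimal sketch of what a list-of-paragraphs abstract implies for loading code. The record below is a made-up, heavily trimmed stand-in, and the `"text"` key on each paragraph entry is an assumption; the schema file mentioned above is the place to confirm the real field names.

```python
import json

# Hypothetical trimmed paper record; real files carry many more fields.
raw = '{"abstract": [{"text": "First paragraph."}, {"text": "Second paragraph."}]}'
paper = json.loads(raw)

# Join the paragraph entries into one string; .get() tolerates papers
# that have no abstract at all.
abstract_text = " ".join(p["text"] for p in paper.get("abstract", []))

print(abstract_text)
```

Joining with a space keeps sentence-level regex searches working across paragraph boundaries.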

  • @glennlaciapag7074
    @glennlaciapag7074 4 years ago

    Awesome job sir!

  • @selcukmisir2399
    @selcukmisir2399 4 years ago

    You are the best sentdex!!!

  • @nassehk
    @nassehk 4 years ago

    Hello. Great to see your workflow. I think median is a better measure of finding the average rather than mean in your case because you are looking at a population of incubation times.
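
A small sketch of why the median can be the safer summary here: regex-extracted values tend to include a few stray numbers, and the median is less sensitive to them than the mean. The `incubation_days` list is invented for illustration and is not output from the video's script.

```python
import statistics

# Hypothetical extracted values in days; the 90 stands in for a
# mis-extracted number (for example a study duration, not an incubation time).
incubation_days = [2, 3, 5, 5, 6, 7, 14, 90]

mean_days = statistics.mean(incubation_days)      # pulled upward by the outlier
median_days = statistics.median(incubation_days)  # stays near the bulk of the data

print(mean_days, median_days)  # 16.5 5.5
```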

  • @SomebodyOutTh3re
    @SomebodyOutTh3re 4 years ago

    22:31 hahaha. Great video thank you!

  • @leosdeoilha
    @leosdeoilha 4 years ago

    Always great videos! Why don’t you use spacy for nlp? It takes a lot of the re out of the way!

  • @pinakeekaushik7803
    @pinakeekaushik7803 4 years ago +1

    print( " Really loving this bro, can you please continue it like for other kaggle challenges too" )

    • @sentdex
      @sentdex 4 years ago +1

      I could try some others like this, sure

  • @sameerzahid3544
    @sameerzahid3544 4 years ago

    I really like how your operating system looks and the text editor 😍👌

  • @classicrockman90
    @classicrockman90 4 years ago

    Definitely look into glob from the standard library. Much easier than nested for loops to pick up files recursively in a folder structure with a pattern like *.json
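
A minimal sketch of the `glob` suggestion, assuming a throwaway folder tree built in a temporary directory so it runs anywhere. The folder and file names are invented and do not mirror the dataset layout.

```python
import glob
import os
import tempfile

# Build a tiny hypothetical folder tree to search.
with tempfile.TemporaryDirectory() as root:
    for sub in ("set_a", os.path.join("set_b", "nested")):
        os.makedirs(os.path.join(root, sub))
    for rel in ("set_a/one.json", "set_b/nested/two.json", "set_b/readme.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as f:
            f.write("{}")

    # "**" matches any depth of nested folders when recursive=True,
    # replacing the nested os.listdir loops.
    json_paths = sorted(glob.glob(os.path.join(root, "**", "*.json"), recursive=True))
    json_names = [os.path.basename(p) for p in json_paths]

print(json_names)  # ['one.json', 'two.json']
```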

  • @DP-dc2vv
    @DP-dc2vv 4 years ago

    Super informative, thanks for posting.
    Two thoughts:
    (1) Get VS code or Spyder/Anaconda; I use Spyder for general purpose Python stuff--the iPython integration is the best I've found. VS Code is potentially better still (depending on preferences), as it provides access to a terminal as well (though the iPython implementation is run through Jupyter and pretty janky).
    (2) Re the regex stuff, no shame in googling. There isn't a SINGLE programmer of any sort that doesn't need to google at times. I've been a professional for over a decade and regularly need to google syntax on basic methods, etc. Regex is a much more involved beast, and unless you use it daily I'd be amazed if you remembered syntax needed for specific applications.

  • @mohammedimran9353
    @mohammedimran9353 4 years ago

    Man... Huge fan of u bro

  • @teresitaeyzaguirre4741
    @teresitaeyzaguirre4741 1 year ago

    new Fave channel

  • @sdmit2000
    @sdmit2000 4 years ago

    you are simply the best!

  • @nighteagle9961
    @nighteagle9961 4 years ago

    38:13
    I think this works:
    single_day = re.findall(r" (?:\d{1,2}[.])?\d{1,2} [Dd]ay", sentence)
    Thanks for the videos and keep this good work. Learning a lot from you.

  • @ayhamkanhoush2912
    @ayhamkanhoush2912 4 years ago

    Hey man , thanks for the video ,great as usual , if you have some time to apply some NLP on the same data would be perfect :D

  • @SybotLV5
    @SybotLV5 4 years ago +4

    in university theses you can have abstracts in multiple languages, hence the list structure. in papers? probably not

  • @not-lain
    @not-lain 3 years ago

    24:59 beautiful cup noises

  • @hectoralarcon4888
    @hectoralarcon4888 4 years ago +10

    I envy the fluency of your python programming. :( I always get stuck during preprocess for a while.

  • @mbappekawani9716
    @mbappekawani9716 3 years ago

    nice data cleanup buddy

  • @cwbh10
    @cwbh10 4 years ago

    Thank you sentdex, very cool.

  • @tarsala1995
    @tarsala1995 4 years ago

    Your machine took over the camera view. Who knows where is this going

  • @sourabhk2373
    @sourabhk2373 4 years ago

    How long will we have to wait for your nnfs book ? Do you plan on having location based discount ie based on Purchasing Power Parity (PPP) ?

  • @Tony-mt4pi
    @Tony-mt4pi 3 years ago

    When I saw that he did not notice the "print(t)" line, I wanted to shout "into" the screen to let him know that.

  • @abcdxx1059
    @abcdxx1059 4 years ago

    this is gold

  • @philipm6652
    @philipm6652 4 years ago

    Great Content you've got there.

    • @sentdex
      @sentdex 4 years ago

      Glad you enjoy it!

  • @Gamegankk
    @Gamegankk 4 years ago

    the forward slash works and it always works

  • @vinayyadav2036
    @vinayyadav2036 4 years ago

    Hii @sentdex
    I really love your content and learned a lot from your videos. Please make a video on Django sessions and cookies, I'm really stuck.

  • @adeeb12321
    @adeeb12321 4 years ago +1

    thank you

  • @mihaisabadac9631
    @mihaisabadac9631 4 years ago

    Great tutorial and good theme also :) For me worked re.findall(r" \d{1,2}\.*\d{1,2} day", sentence). I don't know if someone else wrote some other solution, too many comments :D Thanks sentdex

  • @noctreik
    @noctreik 4 years ago

    Also, with your style of programming, I recommend you to run things in ipython shell and copy/paste fragments of working code in sublime text.

  • @PositronQ
    @PositronQ 4 years ago

    Formula:
    Pf = the probability of infection on the virus
    C = the consequences of the situation
    Dn+1 = C*Pf
    Dn = another_day
    Dn+1 = next_day or actual day
    So Dn+1/Dn = "the percentage of the increase of days". Example: 22/11 = 2, so 2 is the percentage of increase over those days. If you want to predict the next day, multiply the actual_day (Dn+1) by "the percentage of increase", in this case 22*2 = 44. This is a formula if you want to predict all days for your country or the world.

  • @anupamchakrawarti1803
    @anupamchakrawarti1803 4 years ago

    Soldier, GOOD JOB !

  • @mohitnagarkoti4086
    @mohitnagarkoti4086 4 years ago

    Hello Sentdex,
    i need some advice, i know Python, Pandas, Seaborn and basic of ML like fitting and Modeling.
    should i go with a project now or should i explore sklearn library ??
    Thank you

  • @ilyasmax4778
    @ilyasmax4778 4 years ago

    keep this type bro

  • @alexr7530
    @alexr7530 4 years ago +8

    36:56
    I guess in the regular expression you wanted to make a non-capturing group: '(?: ...)'
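
A minimal sketch of the non-capturing group suggested here, assuming a made-up sample sentence. With `(?:...)` the optional decimal part is grouped but not captured, so `re.findall` returns whole matches as plain strings; one capturing group around the number returns just the number.

```python
import re

# Hypothetical sample sentence, not taken from the papers.
sentence = "Median incubation was 4 days; another cohort reported 6.4 days."

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"\d{1,2}(?:\.\d{1,2})? days?", sentence)

# One capturing group around the number: findall returns only the number,
# so no split on " " is needed before converting.
numbers = re.findall(r"(\d{1,2}(?:\.\d{1,2})?) days?", sentence)
values = [float(n) for n in numbers]

print(whole)   # ['4 days', '6.4 days']
print(values)  # [4.0, 6.4]
```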

  • @smyaknti
    @smyaknti 4 years ago +1

    Forward Slashes do work fine on windows.

  • @techystuffs371
    @techystuffs371 2 years ago +1

    It was the coffee mug for me :)

  • @preethamrakshithp3522
    @preethamrakshithp3522 4 years ago

    I started with tf coz of you ! , Can you make a tutorial on tf data it will help a lot ;) , thanks !

  • @kuldeepsingh2983
    @kuldeepsingh2983 3 years ago

    i am in love with shark-coffee

  • @XQzmeeMusic
    @XQzmeeMusic 4 years ago

    That f'ing mug 🤣🤣

  • @thepunisher9270
    @thepunisher9270 4 years ago

    you have some cool mugs.

  • @JustSomeAussie1
    @JustSomeAussie1 4 years ago

    The forward slash in os.listdir(f"{}/{}") definitely works on Windows, i just tested it. (tested with Python 3.6.4)
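
A minimal sketch of the forward-slash point, assuming a throwaway folder created in a temporary directory (the `data/papers` names are invented). It lists the same folder once through an f-string path with `/` and once through `pathlib`, which uses the platform's separator for you.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/data/papers/a.json
    os.makedirs(os.path.join(root, "data", "papers"))
    Path(root, "data", "papers", "a.json").write_text("{}")

    # Forward slashes inside an f-string path.
    slash_listing = os.listdir(f"{root}/data/papers")

    # pathlib joins path parts with "/" regardless of platform.
    pathlib_listing = [p.name for p in (Path(root) / "data" / "papers").iterdir()]

print(slash_listing, pathlib_listing)
```

Either form avoids hand-written backslashes, which is where Windows paths usually go wrong in Python string literals.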

  • @mayukh_
    @mayukh_ 4 years ago

    I am starting to like your mugs

  • @CodeAbstract
    @CodeAbstract 4 years ago

    @sentdex, What do you think about static languages?

  • @ronit8067
    @ronit8067 4 years ago

    I have a dumb question. can you use like a list of similar words to the target word and bruteforce search the text ? of course it would need some optimization for larger texts.

    • @sentdex
      @sentdex 4 years ago +1

      Try it and find out!

  • @bartekdusza63
    @bartekdusza63 4 years ago

    Hey Sentdex have you been thinking about making a video about MineRL in python?