First hour with a Kaggle Challenge
- Published: Mar 20, 2020
- Neural Networks from Scratch: nnfs.io
Channel membership: / @sentdex
Discord: / discord
Support the content: pythonprogramming.net/support...
Twitter: / sentdex
Instagram: / sentdex
Facebook: / pythonprogramming.net
Twitch: / sentdex
"keep social distance"
sir know your audience we already are
nope you were at distance when everyone was outside the house
@abcdxx1059 mjhjnn bjm
@abcdxx1059 khdw
0:00 intro to dataset
3:18 browsing through files
5:58 loading files into python program
8:43 (cleaning & structuring) getting keys from text and storing into variables
15:07 thinking about “extracting meaning from text” --> NLP. i.e. what are we looking for? --> keywords in papers that are consistent
19:03 looking for “incubation” in text
32:18 using regular expressions
42:08 plotting
44:20 adding rest of files to script
45:21 looking at other kernels on kaggle
Thank you for not editing out the mistake at the 30 minute mark. That makes me feel a lot better about my own silly mistakes.
Heh, happy to keep it realistic :D
@sentdex Improved approachability was the big benefit when watching you debug in real time. Thank you.
Not to be confused with kegel challenges.
... that's the word ... She whispered to me ...
hah!
So glad you left in that error (printing full text ~30 min mark). It makes me feel so much better :) Thanks for doing this, it is so rad! I was looking at the kaggle competition but I am too much of a noob to know where to start.
Just got started browsing kaggle for future challenges two days ago, great to see a series on this :) excited to see how far you take this!
Your thought process is so clear! Thanks for showing this was really enlightening to watch :)!
Love this channel! So glad I found it, thanks for uploading :).
I liked this "come as you are" format, could have easily been longer..
In fact, I think this is way better for learning. Since you see the actual process. Like George Hotz' coding sessions.
@pw7225 I love george's sessions. You might like Jon Gjengset too he has a similar style, mostly does rust development.
@pw7225 geohot is an absolute mad lad! this format is definitely much better than structured sessions. it really captures the trial and error process of programming
@parkerdinkins5541 same here for quantum computing ruclips.net/channel/UC-2knDbf4kzT3uzWo7iTJyw (disclosure I own the channel xD) and that's exactly the reason i chose the style as well! I find showing the real process is the best way to help ppl learn
Great video Harrison! I really enjoy seeing how you think, it gives the viewer a more accurate picture as to what coding is really like. Keep these videos coming please!
These are so awesome to watch. It really helps to see your logic and thought process. As someone new to trying to process datasets like this, it's great to have confirmation that I'm not doing it some weird, crazy way.
I've been watching your channel for a few months now... the way you are talking, the way you are thinking, your python knowledge and experience are absolutely amazing to me, you're a GENIUS!
dude you have so many good videos, was going through your data analysis playlist yesterday, so today this was perfect thanks dude!
Glad you like them!
Love how you keep the debugging part and small mistakes. Keep doing such great live coding, was looking for such content for a long time.
I can keep watching you programming the whole quarantine time, seriously your videos are so interesting so please do more videos
This was really useful please continue ! You're doing awesome work thanks
Very interesting seeing your thought process working through a new dataset like in this vid. Loved it!
you got my like & subscribe, my dude. wow, this was super useful to watch - i'm a Sr Data Analyst looking to go Data Scientist, so looking to learn from more people like you. also kind of interesting to watch this 1 yr later
Thanks for making this video. It would be great if you made more videos like this. It gives a rough idea of how to tackle such big data.
loving these live coding videos where you explore real world datasets. thank you!
😂 “that went so fast” - yeah, you do remember you have a supercomputer right?
I LOVE these types of videos. Please keep them coming! Maybe even show us your googling to find answers to problems like needing a regex refresher.
Given more time, I would have included that. I've included some of my internet searching in the past. Seems like people really enjoyed this format of video, so maybe more to come :)
I'm learning to analyze image data right now, and even though you're analyzing text, I found this really helpful. Thanks!
he's just that inspiring
Everyone: What are you working on?
Me: A Covid-19 machine machine learning model
Everyone: How does it help us with Covid-19?
Me: It doesn't
If you're helping to parse through the insanely dense amount of information and research to answer the questions that are being asked, you *are* helping.
Hey, is machine learning squared deeper than deep learning ?
if it keeps you home for hours doing this.. it does help! :p
To get the names of folder I usually just use F2 instead of right clicking and rename. I rename stuff a lot and it surely has been helping. Great video!
Sentdex you are one of the best teachers I have ever encountered. Keep Strong, Keep Teaching Us! People like you should be Glorified! Leaving your errors in the videos is humble and reminds us mortals where we come from.
Really enjoyed the video. Thanks for making all the mistakes and raising my confidence bar :P
Heh, happy to help
About the cite-fields in the JSON data: When you write a scholarly article and use findings of other articles (e.g. to compare your results with them or to build your study on earlier findings), you cite the original articles. "Because others already found out that rotten fish smells (see 1-4), ..." would contain a cite range from 1 to 4, because the cited articles 1 to 4 have shown that rotten fish smells and you build up on that without investigating it yourself.
In the bottom of the article there is a list of references, where each number is associated with another scholarly article.
This is an excellent series! kudos
this is really nice i think we can apply more re's and logic to get more information as you suggested like cleaning too
Such a helpful video! Thank you!
That's what im talking about a kaggle challenge MAN! big thanks :)) I would like to learn from a pro, how to approach problems and solve them the fastest way possible.
That print(t) made my day .. 😂😂
You are the man sentdex! It's hard to express how much you have inspired me while introducing great concepts. Keep up the great work.
Does anyone know if the scientific community is making strides to standardize raw data and move from PDF-type papers that need more cleanup to interactive IPython-type papers that could store all findings? A move like this seems like it would open the flood gates of open-source hypothesis testing and review. Hosting and publicizing poor analysis could be a problem, but I would appreciate any information and opinions folks have.
Just to answer that very basic question. Make the decimal part optional by grouping it and then make the whole thing a group and you're good to go.
re.findall(r'( \d{1,2}(\.\d{1,2})? day[s]?)', sentence)
Heh, thanks!
Could you please explain why his technique with the parentheses didn't work ?
@floxire7042 if you have just 1 set of parentheses, you'll still only find examples that match the full string you searched for, but only the part of the match inside the parentheses gets returned
The main catch here was that he made that one group by using parentheses. Regex will only output what's inside the parentheses then. So to get the whole number again, you need to make the whole expression another group then.
Putting the first or the second half in parenthesis doesn't matter in this case. I just found it more logical to have always the first two digits, followed by an optional decimal. Other way round works just as well.
Hope that was clear.
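A minimal sketch of the group behaviour described in this thread, using Python's `re.findall`. The sentence is made up for illustration and is not from the dataset. It shows the pattern with one group, the same pattern wrapped in an outer group, and a non-capturing `(?:...)` variant that is not in the original comment.

```python
import re

# Hypothetical sentence, invented for illustration only.
sentence = "The incubation period was 5.2 days, with a range of 2 days to 14 days."

# One capturing group: findall returns only what that group captured
# (an empty string when the optional group did not take part in the match).
only_decimal = re.findall(r" \d{1,2}(\.\d{1,2})? day[s]?", sentence)

# Outer group around the whole pattern plus the inner group:
# findall returns one (outer, inner) tuple per match.
nested = re.findall(r"( \d{1,2}(\.\d{1,2})? day[s]?)", sentence)
whole_matches = [outer.strip() for outer, _decimal in nested]

# Non-capturing inner group: no capturing groups remain,
# so findall returns the whole match as a plain string.
non_capturing = re.findall(r" \d{1,2}(?:\.\d{1,2})? day[s]?", sentence)

print(only_decimal)
print(whole_matches)
print(non_capturing)
```

With two capturing groups the result is a list of tuples, so the first element of each tuple is the piece you want. The non-capturing form avoids that extra unpacking step.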
@sentdex Oh ok thanks I didn't know that
You are so swift. It seems you are invincible in coding. I love your walkthrough for the kaggle challenge.
I think the most impressive part of this challenge, is a 50 minute challenge of not mentioning that which shall be demonetized :)
I'm glad you decided to leave the 'print(t)' blooper in the video 😂😂
I’m sure I’m not the first to ask, but please do more videos like this!! The lack of editing is also very helpful. Thank you 👍
Can you please do some more competitions like this?
This is amazing?
Harrison! Well put video! I very much enjoyed it! You are hilarious!
Yo, this video was long due. Thank you.
Thank you so much! I’ve been so intimidated by even approaching a Kaggle problem!!!!
what a Coincidence
i was looking for videos like this
thx sentdex
I don't know if others mentioned it before, but you got some cool mugs! I guess you can make a video on that as well!
Thank you so much! I have always wanted to do Kaggle competitions but never really known how to approach them.
Happy to help!
SO happy you left the error in lol, shows that every programmer, no matter the skill level can have stupid little errors like that
Nice and Insightful tutorial.
Wow just took up this challenge a few days ago but hit a wall and didnt know how to proceed. This is a godsend!
i want more! what i am supposed to be watching while in lockdown?
For the incubation day regex you can also do: re.findall(r"(\d{1,2}\.?\d{1,2}) day", sentence)
This will find any integer/decimal number followed by 'day' but only output the number, avoiding the need to split.
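One caveat worth checking: as written, the pattern needs at least two digits in total, so a single-digit value such as "5 days" appears to be skipped. A small sketch under that reading, with a made-up sentence and a hypothetical variant that makes the decimal part an optional non-capturing group:

```python
import re

# Hypothetical sentence, invented for illustration only.
sentence = "Incubation was 5 days in one study, 14 days in another, and 5.2 days on average."

# Pattern as posted: \d{1,2}\.?\d{1,2} requires a digit on both sides of the optional dot.
as_posted = re.findall(r"(\d{1,2}\.?\d{1,2}) day", sentence)

# Variant (an assumption, not the commenter's code): digits, then an optional ".digits" part.
variant = re.findall(r"(\d{1,2}(?:\.\d{1,2})?) day", sentence)

print(as_posted)
print(variant)
```

Both versions return only the number, so no splitting is needed afterwards. The variant also picks up the single-digit case.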
very nice. Im glad this knowledge is open source.
Nice one, way to go 👍.
Thanks for the video. Hope you'll continue the rubric
Hey thanks for the video. Abstract is a list because each paragraph is a text item. Look for the schema file included in the directory, that might help you. Cheers
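If the abstract really is a list of paragraph items, joining them back into one string could look like the sketch below. The dictionary shape (an "abstract" key holding items with a "text" field) is an assumption based on this comment, and the sample text is invented, so check it against the schema file before relying on it.

```python
# Hypothetical paper record; the shape is assumed, not copied from the dataset.
paper = {
    "abstract": [
        {"text": "First paragraph of the abstract."},
        {"text": "Second paragraph of the abstract."},
    ]
}


def abstract_text(paper_json):
    """Join every abstract paragraph into a single string (empty if there is no abstract)."""
    return " ".join(item["text"] for item in paper_json.get("abstract", []))


print(abstract_text(paper))
```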
Awesome job sir!
You are the best sentdex!!!
Hello. Great to see your workflow. I think median is a better measure of finding the average rather than mean in your case because you are looking at a population of incubation times.
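A quick sketch of why the median can be the safer summary here. The numbers are invented for illustration and are not results from the video; the point is only that a couple of large outliers pull the mean up while the median stays near the middle of the values.

```python
from statistics import mean, median

# Hypothetical incubation values in days, invented for illustration (not from the dataset).
incubation_days = [4.0, 5.0, 5.2, 6.0, 6.4, 14.0, 24.0]

average = mean(incubation_days)    # pulled upward by the 14.0 and 24.0 values
middle = median(incubation_days)   # the middle value of the sorted list

print(f"mean: {average:.2f}, median: {middle:.2f}")
```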
22:31 hahaha. Great video thank you!
Always great videos! Why don’t you use spacy for nlp? It takes a lot of the re out of the way!
print( " Really loving this bro, can you please continue it like for other kaggle challenges too" )
I could try some others like this, sure
I really like how your operating system looks and the text editor 😍👌
Definitely look into glob from the standard library. Much easier than nested for loops to pick up files recursively in a folder structure with a pattern like *.json
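A minimal sketch of the glob approach, assuming the JSON files sit somewhere under a single root folder. The function name `find_json_files` is hypothetical.

```python
import glob
import os


def find_json_files(root):
    """Return every .json path under root, at any depth, in sorted order."""
    # "**" together with recursive=True matches zero or more nested folders.
    pattern = os.path.join(root, "**", "*.json")
    return sorted(glob.glob(pattern, recursive=True))
```

Calling `find_json_files` on the dataset folder would replace the nested `os.listdir` loops with a single call that returns full paths.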
Super informative, thanks for posting.
Two thoughts:
(1) Get VS code or Spyder/Anaconda; I use Spyder for general purpose Python stuff--the iPython integration is the best I've found. VS Code is potentially better still (depending on preferences), as it provides access to a terminal as well (though the iPython implementation is run through Jupyter and pretty janky).
(2) Re the regex stuff, no shame in googling. There isn't a SINGLE programmer of any sort that doesn't need to google at times. I've been a professional for over a decade and regularly need to google syntax on basic methods, etc. Regex is a much more involved beast, and unless you use it daily I'd be amazed if you remembered syntax needed for specific applications.
Man... Huge fan of u bro
new Fave channel
you are simply the best!
38:13
I think this works:
single_day = re.findall(r" (?:\d{1,2}[.])?\d{1,2} [D,d]ay", sentence)
Thanks for the videos and keep this good work. Learning a lot from you.
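A small sketch of the pattern above on a made-up sentence. One detail to be aware of: inside a character class the comma is a literal, so `[D,d]` also accepts a comma, while `[Dd]` (a hypothetical tightening, not from the comment) accepts only the two letters.

```python
import re

pattern_as_posted = r" (?:\d{1,2}[.])?\d{1,2} [D,d]ay"
pattern_tightened = r" (?:\d{1,2}[.])?\d{1,2} [Dd]ay"  # assumption: same intent, without the comma

# Hypothetical sentence, invented for illustration only.
sentence = "Symptoms appeared after 5.2 days. Some cases took 14 Days."

posted_hits = re.findall(pattern_as_posted, sentence)
tightened_hits = re.findall(pattern_tightened, sentence)

# The character class [D,d] matches a literal comma as well as D and d.
comma_accepted = re.fullmatch(r"[D,d]", ",") is not None
comma_rejected = re.fullmatch(r"[Dd]", ",") is None

print(posted_hits, tightened_hits, comma_accepted, comma_rejected)
```

Because the only group is non-capturing, `findall` returns the whole matched text here, leading space included.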
Hey man , thanks for the video ,great as usual , if you have some time to apply some NLP on the same data would be perfect :D
in university theses you can have abstracts in multiple languages, hence the list structure. in papers? probably not
24:59 beautiful cup noises
I envy the fluency of your python programming. :( I always get stuck during preprocess for a while.
nice data cleanup buddy
Thank you sentdex, very cool.
Your machine took over the camera view. Who knows where is this going
How long will we have to wait for your nnfs book ? Do you plan on having location based discount ie based on Purchasing Power Parity (PPP) ?
When I saw that he did not notice the "print(t)" line, I wanted to shout "into" the screen to let him know that.
this is gold
Great Content you've got there.
Glad you enjoy it!
the forward slash works and it always works
Hii @sentdex
I really love your content and learned a lot from your videos. Please make a video on Django sessions and cookies, I'm really stuck.
thank you
Great tutorial and good theme also :) For me worked re.findall(r" \d{1,2}\.*\d{1,2} day", sentence). I don't know if someone else wrote some other solution, too many comments :D Thanks sentdex
Also, with your style of programming, I recommend you to run things in ipython shell and copy/paste fragments of working code in sublime text.
Formula:
Pf = the probability of infection on the virus
C = the consequences of the situation
Dn+1 = C*Pf
Dn = another_day
Dn+1 = next_day or actual day
So Dn+1/Dn = "the percentage of the increase of days". Example: 22/11 = 2, so 2 is the percentage increase for those days. If you want to predict the next day, multiply the actual day (Dn+1) by "the percentage of increase", in this case 22*2 = 44. This is a formula if you want to predict all days for your country or the world.
Soldier, GOOD JOB !
Hello Sentdex,
i need some advice, i know Python, Pandas, Seaborn and basic of ML like fitting and Modeling.
should i go with a project now or should i explore sklearn library ??
Thankyou
keep this type bro
36:56
I guess in the regular expression you wanted to make a non-capturing group: '(?: ...)'
Forward Slashes do work fine on windows.
It was the coffee mug for me :)
I started with tf coz of you ! , Can you make a tutorial on tf data it will help a lot ;) , thanks !
i am in love with shark-coffee
That f'ing mug 🤣🤣
you have some cool mugs.
The forward slash in os.listdir(f"{}/{}") definitely works on Windows, i just tested it. (tested with Python 3.6.4)
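A sketch of the two ways to build that path. The function and parameter names are hypothetical. The forward-slash version mirrors what this comment reports working on Windows, and `os.path.join` is the variant that picks the separator for whatever platform the script runs on.

```python
import os


def list_subfolder(base_dir, subfolder):
    """List a subfolder using a forward slash inside an f-string."""
    return sorted(os.listdir(f"{base_dir}/{subfolder}"))


def list_subfolder_portable(base_dir, subfolder):
    """Same listing, letting os.path.join choose the separator."""
    return sorted(os.listdir(os.path.join(base_dir, subfolder)))
```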
I am starting to like your mugs
@sentdex, What do you think about static languages?
I have a dumb question. can you use like a list of similar words to the target word and bruteforce search the text ? of course it would need some optimization for larger texts.
Try it and find out!
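For anyone who wants a starting point, here is one way to sketch it: build a single regex alternation from the word list and count whole-word hits in one pass. The word list, sample text, and function name are all made up for illustration.

```python
import re


def find_keyword_hits(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword in text."""
    alternation = "|".join(re.escape(word) for word in keywords)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    counts = {}
    for match in pattern.finditer(text):
        key = match.group(0).lower()
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical related words and sample text, invented for illustration only.
related_words = ["incubation", "incubate", "latency", "onset"]
sample = "Incubation varied. The incubation period preceded symptom onset."
hits = find_keyword_hits(sample, related_words)
print(hits)
```

Compiling the alternation once means each document is scanned a single time, which helps when the same word list is applied to many papers.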
Hey Sentdex have you been thinking about making a video about MineRL in python?