R Tutorial | Regular Expressions in R
- Published: Jan 25, 2025
- Today we talked about regular expressions: what they are and how they are useful. We used the stringr package to do this, but the ideas are the same in base R and are similar in other languages including python, javascript, etc. Leave your questions in the comments!
All code available here: github.com/col...
This is absolutely the best intro to Regular Expressions in R video on YouTube
Thanks for the kind words, glad it was helpful!
I can only say you are the best. Even an old man like myself (61) can now understand the regex basics. I am an old engineer trying to catch up with the new generation.
Thanks for the kind comment, glad you enjoyed the video!
I have a midterm today and you explained the * symbol really well and understandably. Thank you!!
Glad to help!
where regular expressions really shine ✨😂 bright like a 💎
Joke aside, great video 🙌
Thank you - that was outstanding. Best explanation of regex I have seen. The way you "built it up" from basically nothing to the phone extractor was a great explanation.
Glad it was so helpful!
This is the first video I came across which actually explained the regular expressions the way I wanted to learn, thanks a ton buddy! lifesaver video ^_^
Glad to hear! Thanks for the nice comment
This is a wonderfully explained tutorial. well paced and well explained. Thank you!
Very well done, Colin. Comprehensive and fun tutorial
Thanks Colin. It's great to see the regex gradually being built up. Really helpful
Fantastic video, Colin! I literally had no clue about regex one hour ago, now I sort of have a basic understanding of how to use these tools. It would be awesome to do a follow-up video applying these tools in a more complex setting. Thanks for your work!
Thanks for watching! Hoping to start these back up again soon
Thanks! This is really useful and well explained.
Glad it helped!
Nice video, Colin. You explain things very clearly. Keep up the great work
Thanks GREAT tutorial for Regular Expressions.
Thank you!!!!! I used this vid as a supplement while reading the stringr chapter in the "R for Data Science" (tidyverse) book, really helped me
It's a great book! Highly recommend it for anyone learning R.
I think we need more of this! Absolutely distilled it down to the basics. Thank you so much! Again, we need more of these tutorials haha :)
I would love to do more when I have a bit more time. Thanks for the support!
This is fantastic. The only question I have after this is taking it further to the sentence level of strings. So, applying str_match_all() on a sentence to extract strings that contain part of a string, and limit the extraction to the word level in a text mining approach. A demonstration of this would be useful. I plan on using tokens() to make this simpler for me and the data I'm currently working with, but I'd enjoy a follow up. Great video, following your channel for more.
Thanks, not a bad idea. Hopefully when I have some more time I could do a scraping demo, that would be very interesting!
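For the word-level extraction asked about above, here is a minimal sketch, assuming stringr is installed. The sentence and the fragment "tion" are made up for illustration. `\\b` marks a word boundary and `\\w*` allows any word characters on either side of the fragment.

```r
library(stringr)

# Hypothetical sentence and fragment, not taken from the video.
sentence <- "The regression function returned an exception"

# \\b anchors the match to word boundaries, so whole words come back
# rather than just the fragment "tion".
words_with_tion <- str_extract_all(sentence, "\\b\\w*tion\\w*\\b")[[1]]
print(words_with_tion)
```

str_extract_all() returns a list with one element per input string, so on a column of sentences each row would get its own vector of matching words.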
Thank you for existing
Many thanks, Colin. Excellent tutorial
Thanks for this great tutorial. Please keep doing this. First 20 minutes in, and I really like that you talked about a potential error. Also, one question: how are you jumping to numbers in the strings in RStudio (looking for the keyboard shortcut you're using)?
Thanks! I wish I was that good at keyboard shortcuts, for some reason OBS doesn't want to capture my cursor. It's pretty confusing, so I'll try to fix it for next stream.
Great tutorial, thanks a lot!
Great tutorial. You are gifted 😄
Thank you for a great tutorial
Thanks! what about if you had characters instead of numbers?
I highly recommend going through the tutorials here if you still have some confusion:
regexone.com/
@colinquirkDS Hey Colin, this is a shot in the dark. I have been trying to extract the following pattern into a separate column for the attribute major. There are some repeating major strings, and I can't seem to figure out how to set up the regex to also extract the characters inside the curly braces { }. For some reason my pattern also pulls minor... Any enlightenment would be much appreciated.
test %>%
mutate(major = str_extract_all(test$lith, "[major].*[{](\\D[a-z]*)[}]") %>%
map_chr(toString))
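One likely reason the pattern above also pulls minor: `[major]` is a character class, so it matches any single one of the letters m, a, j, o, r, and "minor" starts with an "m". A minimal sketch of that effect and one possible fix follows. The format of the lith column is not shown in the comment, so the sample string here is purely an assumption.

```r
library(stringr)

# Hypothetical value for the lith column; the real format is not shown.
lith <- "major{sand} minor{clay} major{silt}"

# The original pattern matches text starting at "minor", because [major]
# accepts the single character "m".
original_hit <- str_extract("minor{clay}", "[major].*[{](\\D[a-z]*)[}]")

# Spelling out the literal word and escaping the braces keeps only the
# major entries.
major_hits <- str_extract_all(lith, "\\bmajor\\{[a-z]+\\}")

# Collapse each element's matches into one comma-separated string
# (a base R stand-in for map_chr(toString)).
major <- vapply(major_hits, toString, character(1))
print(major)
```

If the real data separates the word and the braces differently, the literal part of the pattern would need adjusting, but the idea of replacing `[major]` with `major` should carry over.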
This has been so helpful, thanks a lot
Thanks for watching!
Amazing!
Amazing. Get this man some more views.
Thank you very much Colin, great tutorial.
Just one question: after successfully matching some strings with a regular expression, how could I extend the expression to include the following 3 or 4 words?
I've got the expression LEY\\sN°\\s(\\d{3,4})\\sDE\\s(\\d[1,2])\\sDE\\s(\\w{4,11})\\sde\\s(\\d{4}), which matches LEY N° 2371 DE 22 DE MAYO DE 2002, but a name (consisting of 3 or 4 words) follows it.
Thanks in advance for your time, keep helping people
If you check out regex101.com or any other similar site, you can play around more deeply, but something like this might work for you?
^(\w+\s){2,3}\w+$
Read as "find at least one word character followed by a space 2 or 3 times, and then find at least one more word character"
You will have to work this into your full regex of course but that is the first thing that comes to mind. Good luck!
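As a rough sketch of working that idea into a longer pattern, assuming stringr: the inputs below reuse only the tail of the example from the question, and the names are made up. The `^` and `$` anchors are dropped because the name is only part of the string, and R string literals need the backslashes doubled.

```r
library(stringr)

# Hypothetical inputs: the tail of the example plus invented names.
x <- c("DE MAYO DE 2002 Juan Carlos Perez Gomez",
       "DE MAYO DE 2002 Ana Maria Lopez")

# Year, a space, then 2 or 3 "word + space" repeats and one final word.
# (?: ) groups without capturing, so the name is capture group 1.
pattern <- "DE\\s\\d{4}\\s((?:\\w+\\s){2,3}\\w+)"

# Column 1 of the str_match() result is the full match, column 2 is group 1.
name <- str_match(x, pattern)[, 2]
print(name)
```

Because the quantifier is greedy, this takes up to 4 words even when more text follows the name, so a delimiter after the name would be needed to stop earlier. Separately, in the pattern from the question, `\\d[1,2]` matches a digit followed by one of `1`, `,` or `2`; a one-or-two digit day would be written `\\d{1,2}`.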
@colinquirkDS Thanks a lot! I will try it later, but so far seems like what I need!
Really helpful, thanks mate!
Question: Great video! I need to extract "Math & Science" from a column. When I try, I get an "unused argument" error. There are words in front of and behind "Math & Science". I tried ",*Math & Sciene.*" but I received an error for that too.
Can you put your entire line of code in a comment?
@colinquirkDS str_match(kw_06$testdiv, ".*Mathematical & Physical.*") I was able to extract the values; now I need to add a separator after this pattern to split the column.
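A small sketch of what the call above returns and one way to add a separator, assuming stringr. The sample value and the ";" separator are assumptions, since the contents of kw_06$testdiv are not shown.

```r
library(stringr)

# Hypothetical value for the testdiv column.
testdiv <- "Division of Mathematical & Physical Sciences"

# With .* on both sides the match is the entire string.
whole <- str_match(testdiv, ".*Mathematical & Physical.*")[, 1]

# Without the wildcards only the phrase itself is returned.
phrase <- str_extract(testdiv, "Mathematical & Physical")

# \\1 re-inserts the captured phrase, followed by a ";" to split on later.
with_sep <- str_replace(testdiv, "(Mathematical & Physical)", "\\1;")
print(with_sep)
```

The result could then be split on ";" with str_split() or a similar column-splitting function.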
Thanks Colin. How would one go about splitting the following after 2 decimal places? For example, 18.00-1.10 split to 18.00 and -1.10. Another example: 400.000.00 split to 400.00 and 00.00
Something like this should work for you:
(.*?\..{2})(.*)
Play around with it in a regex tester, but you can read it as "for the first group, match anything up until the first decimal, then get the next two characters. For the second group, get everything else."
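Here is a short sketch of that pattern in R, assuming stringr, using the two strings from the question. Note the doubled backslash inside the R string.

```r
library(stringr)

x <- c("18.00-1.10", "400.000.00")

# Group 1: lazily match up to the first ".", then take two more characters.
# Group 2: everything that remains.
m <- str_match(x, "(.*?\\..{2})(.*)")

first_part  <- m[, 2]
second_part <- m[, 3]
print(first_part)
print(second_part)
```

For the second string this gives "400.00" and "0.00": the pattern takes exactly two characters after the first decimal point, and the rest is whatever is left.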
very helpful. thank you so much
Can you please share the code on GitHub and put the link in the description?
Done!
github.com/colinquirk/LivestreamCode/blob/master/2020-08-12/stringr.Rmd
Thank you so much!!
Thanks for this
Excellent tutorial, you should be a teacher
awesome!
new sub
I LOVE YOU BRO
Thank you
Hi Colin, if you don't mind, please share your email address, I need to contact you. My email is aassenga@ihi.or.tz. Thanks
You have to be careful naming your variables. letters already exists as lower case letters of the English alphabet.
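A quick illustration of the point about `letters`, using only base R. The replacement values are arbitrary.

```r
# letters is a built-in constant holding the lowercase alphabet.
builtin <- letters[1:3]

# Assigning to the same name creates a new variable that masks the built-in.
letters <- c("x", "y", "z")
shadowed <- letters

# The original is still reachable through the base namespace.
original <- base::letters[1:3]

# Removing the new variable makes the built-in visible again.
rm(letters)
restored <- letters[1:3]
print(restored)
```

Nothing is overwritten permanently, but code that expects the alphabet would quietly get the new values while the masking variable exists.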
Terrific content. Hope you don't mind if I add your channel to my Awesome R Learning Resources list on GitHub? If you'd like to contribute any resources of your own, please open a pull request! We would love to have your input.
github.com/iamericfletcher/r-learning-resources
Glad you like it! Please do share it around!