How to write a simple regular expression in R using sub and str_replace (CC183)

Riffomonas Project

6K views

Published: Jan 10, 2025
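The video covers replacing the part of a string that matches a regular expression. R's `sub()` and stringr's `str_replace()` change only the first match in each string, while `gsub()` and `str_replace_all()` change every match. A rough Python analogue uses the standard `re` module, where `re.sub()` with `count=1` behaves like the first-match functions. The sample names below are made up for illustration.

```python
import re

# Hypothetical sample names, used only to illustrate the replacement.
samples = ["F3D0", "M12D142"]

# Like R's sub() / str_replace(): replace only the first run of digits.
first_only = [re.sub(r"\d+", "#", s, count=1) for s in samples]

# Like R's gsub() / str_replace_all(): replace every run of digits.
every_match = [re.sub(r"\d+", "#", s) for s in samples]

print(first_only)   # ['F#D0', 'M#D142']
print(every_match)  # ['F#D#', 'M#D#']
```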

Comments • 37

@tstratton1 2 years ago ⁺²
I think these are my favorite YouTube videos of all time, even when I already know what he's doing.
@Riffomonas 2 years ago
Lol thanks! I’ll do another regular expression video later in the week. Let me know if you have ideas for other things you’d like to see.
@dariushghasemi6476 2 years ago
@@Riffomonas I'm really keen on dynamic programming, e.g. running multiple linear regressions for a bunch of features, or even running mediation analysis! Thanks Patrick :)
@jameswhitaker4357 1 year ago
This is easily the best explanation of regex. Idk how I’ve made it this far without really using them, but some new projects look like they're going to need them. You’re a godsend.
@sven9r 2 years ago ⁺³
I struggled with regex today, and now this video. This channel is so underrated. Thank you so much, Pat.
@Riffomonas 2 years ago ⁺¹
Wonderful! I’ll have another regex episode on Thursday. Let me know if you have any regex-related questions and we can keep them going 😊
@sven9r 2 years ago
@@Riffomonas You already helped me so much with the matrix problem the other day! Today I nested 350 matrices in under 4 hours with the help you provided. Such a good feeling.
@alexw5126 3 months ago
Great teacher! This is one of the things I could never get my head around, similar to the taxonomy levels :D Thank you!
@Riffomonas 3 months ago
Wonderful - thanks for watching! 🤓
2 years ago ⁺²
Excellent! Regular expressions were always a difficult topic to implement in R. Thanks for the video!
@Riffomonas 2 years ago
My pleasure! There will be another regular expression video out later this week. Let me know if there’s anything else you’d like to learn about regular expressions.
@jyotikataria129 1 year ago
@@Riffomonas Where is this data table? I'm not able to download it.
@haraldurkarlsson1147 2 years ago ⁺²
Speaking of naming samples, NASA has a great system for naming meteorite samples, for instance the sample "ALH84001". This sample was collected in Antarctica, in the Allan Hills region, during a collecting mission in 1984 (hence the 84). It was the first sample collected (hence the 001). It is a pretty famous meteorite, since it is of Martian origin and NASA scientists thought at one point that they had discovered fossil bacteria in it.
@Riffomonas 2 years ago
Very cool! I remember reading the “bacterial fossil” paper in a journal club.
@haraldurkarlsson1147 2 years ago
@@Riffomonas I worked on water in Martian meteorites as an NRC fellow at NASA for a couple of years. The "big" discovery came the year after I left, and my sponsor at NASA was one of the authors. Talk about timing...
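A naming scheme like this maps neatly onto regex capture groups. The sketch below assumes, based only on the description above, that a name is a run of capital letters for the collection area, then a two-digit year, then a three-digit sequence number; `parse_meteorite_name` is a hypothetical helper, not an existing tool.

```python
import re

# Assumed layout: area letters, 2-digit year, 3-digit sequence (e.g. ALH84001).
NAME_RE = re.compile(r"([A-Z]+)(\d{2})(\d{3})")

def parse_meteorite_name(name):
    """Hypothetical helper: split a sample name into (area, year, sequence)."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        return None  # the name does not follow the assumed layout
    area, year, seq = m.groups()
    return area, year, int(seq)

print(parse_meteorite_name("ALH84001"))  # ('ALH', '84', 1)
```

Using `fullmatch` rather than `search` means a name with extra or missing digits is rejected instead of partially matched.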
@mitchdobbs6296 2 years ago ⁺¹
Pat, this is awesome! I'm just getting to work on regular expressions, and this video was the next puzzle piece for me. You rock!
@Riffomonas 2 years ago ⁺¹
Fantastic! Thanks Mitch. Thursday will have another regex episode with some more advanced concepts. Let me know if there’s anything you’re wondering about and maybe we could keep it going 😊
@mitchdobbs6296 2 years ago
@@Riffomonas Heck yeah - can’t wait!
@ErionMaxhari 2 years ago ⁺¹
Excellent job. In fact, you can use substr to extract fixed-length characters. It's especially useful for extracting female or male, since that's always the first character.
@Riffomonas 2 years ago
Thanks!
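In R the suggestion above is `substr(x, 1, 1)`; in Python the same thing is plain slicing. The sketch assumes hypothetical sample names such as `F3D0` whose first character is `F` or `M`; the label mapping is an assumption about the naming scheme.

```python
# Hypothetical sample names whose first character encodes sex.
samples = ["F3D0", "M1D142", "F12D7"]

# Equivalent of R's substr(x, 1, 1): take the first character by slicing.
first_char = [s[:1] for s in samples]

# Map the code to a label (assumed scheme: F = Female, M = Male).
labels = {"F": "Female", "M": "Male"}
sex = [labels.get(c, "Unknown") for c in first_char]

print(sex)  # ['Female', 'Male', 'Female']
```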
@roymccormick5328 2 years ago ⁺¹
sooooo super helpful for what I was stuck on today, thx 😎
@Riffomonas 2 years ago
Wonderful! Glad it was helpful 🤓
@tlange5091 2 years ago ⁺¹
This is really, really helpful! Thank you
@Riffomonas 2 years ago
Thanks for watching!
@ahmed007Jaber 2 years ago ⁺¹
Great topic. Any tips/resources for grabbing a certain piece of text, e.g. xxxx-xxx, from a string? Just grab this pattern and ignore everything else?
@Riffomonas 2 years ago
Wrap the text you want in parentheses, put .* on both sides, and then use \\1 as the replacement value.
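In R that recipe reads `sub(".*(\\d{4}-\\d{3}).*", "\\1", x)`. Here is a sketch of the same idea with Python's `re.sub()`, which uses the same `\1` back-reference. It assumes `xxxx-xxx` means four digits, a hyphen, and three digits; `grab_code` is a hypothetical helper name.

```python
import re

# Assumed target: 4 digits, a hyphen, 3 digits (e.g. "1234-567").
pattern = r".*(\d{4}-\d{3}).*"

def grab_code(text):
    # Replace the whole string with capture group 1.
    return re.sub(pattern, r"\1", text)

print(grab_code("Order ID: 1234-567 shipped"))  # 1234-567
print(grab_code("no code here"))                # no code here
print(grab_code("12345-678"))                   # 2345-678
```

Two caveats apply equally to R's `sub()`: a string with no match comes back unchanged, so non-matching rows may need filtering first, and because the leading `.*` is greedy, a longer digit run such as `12345-678` yields `2345-678`. Adding word boundaries (`\b`) around the group avoids the second problem.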
@CoachPegasus 2 years ago ⁺¹
In a date column, I need to change ' 04-04-2020' to ' 04/04/2020' and then convert it to datetime. I did it with stringr, but after printing, it shows all NaN.
@Riffomonas 2 years ago
With the 04-04-2020 format, try the mdy or dmy functions from lubridate, depending on whether it's month-day or day-month.
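The point of the answer is to parse the string directly instead of rewriting the separators first. A Python sketch of the same idea uses the standard library's `datetime.strptime` (not lubridate): the format strings `%m-%d-%Y` and `%d-%m-%Y` play the roles of `mdy()` and `dmy()`. The value `03-04-2020` is illustrative, chosen so the two orders give different dates, and the leading space from the question is stripped first because `strptime` would otherwise reject it.

```python
from datetime import datetime

raw = " 03-04-2020"  # illustrative value, with a leading space as in the question

# Month-day-year, analogous to lubridate::mdy()
as_mdy = datetime.strptime(raw.strip(), "%m-%d-%Y")
# Day-month-year, analogous to lubridate::dmy()
as_dmy = datetime.strptime(raw.strip(), "%d-%m-%Y")

print(as_mdy.date())  # 2020-03-04
print(as_dmy.date())  # 2020-04-03
```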
@jamesleleji6984 2 years ago
How do you find and replace a string in different column names?
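This question went unanswered on the page. In R, one common approach is `names(df) <- sub("old", "new", names(df))`, or dplyr's `rename_with(df, ~ str_replace(.x, "old", "new"))`. The Python sketch below applies the same find-and-replace to a plain list of column names; the names and the pattern are hypothetical.

```python
import re

# Hypothetical column names, one of which has a unit suffix to rename.
columns = ["weight_g", "length_mm", "sample_id"]

# Apply the same substitution to every name (first match only, like sub()).
renamed = [re.sub(r"_g$", "_grams", c, count=1) for c in columns]

print(renamed)  # ['weight_grams', 'length_mm', 'sample_id']
```

With a pandas DataFrame, the same substitution can be passed as a function, e.g. `df.rename(columns=lambda c: re.sub(r"_g$", "_grams", c))`.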
@elcheff 2 years ago ⁺¹
thank you very much
@Riffomonas 2 years ago
My pleasure - thanks for watching!
@spencermartin4846 2 years ago ⁺¹
And suddenly it makes sense
@Riffomonas 2 years ago
🤓
@russtin1 2 years ago ⁺³
Regex is great, but you can really pull your hair out trying to figure it out
@Riffomonas 2 years ago
Totally! 😂
@AndreaDalseno 2 years ago ⁺¹
Hi, I'm actually much more comfortable with Python, but I'd like to improve my skills with R (the other side of the moon), and your videos are simply awesome.
This is how I would solve the task in Python (where df is a DataFrame, after importing pandas as pd and re):
import re
import pandas as pd

# assumes df is a DataFrame with a "samples" column of names like "F3D142"
(pd.DataFrame(df["samples"]
              .map(lambda y: re.match(r'(\w)(\d+)(\w)(\d+)', y).group(1, 2, 4))
              .to_list(),
              columns=['gender', 'sample_n', 'day'],
              index=df["samples"])
   .assign(gender=lambda x: x["gender"].map({'F': 'Female', 'M': 'Male'}),
           sample_n=lambda x: x["sample_n"].astype(int),
           day=lambda x: x["day"].astype(int)))
In R, I would "translate" it into something like this:
library(dplyr)
library(stringr)

dist_tbl %>%
  select(samples) %>%
  mutate(as_tibble(str_match(unlist(samples),
                             "(?<gender>\\w)(?<sample>\\d*)(\\w)(?<day>\\d*)")[, c(2, 3, 5)])) %>%
  mutate(day = as.integer(day),
         sample = as.integer(sample),
         gender = ifelse(gender == "F", "Female", "Male"))
It should work fine. The good part of regular expressions is that you can solve complex tasks in a single command.
PS: In both languages the whole match comes first, followed by the capture groups. Python's group() numbers the whole match as 0, whereas R's str_match() puts it in column 1 (R indexes from 1), so the numbers differ: groups 1, 2 and 4 in Python correspond to columns 2, 3 and 5 in R. In R I gave each group a name, while in Python I found it easier to name the DataFrame columns directly.
@Riffomonas 2 years ago ⁺¹
Thanks for watching - I’ll talk about groups in the next episode!
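The R version in the comment above names its groups. Python's `re` module supports named groups too, written `(?P<name>...)`, and `match.groupdict()` returns them keyed by name (pandas' `Series.str.extract()` likewise turns group names into column names). A stdlib-only sketch, assuming sample names shaped like `F3D142`; `parse_sample` is a hypothetical helper.

```python
import re

# Named groups: sex letter, sample number, a separator letter, and day.
SAMPLE_RE = re.compile(r"(?P<gender>[A-Z])(?P<sample>\d+)[A-Z](?P<day>\d+)")

def parse_sample(name):
    m = SAMPLE_RE.fullmatch(name)
    if m is None:
        return None  # name does not follow the assumed layout
    d = m.groupdict()  # {'gender': ..., 'sample': ..., 'day': ...}
    return {
        "gender": {"F": "Female", "M": "Male"}.get(d["gender"], d["gender"]),
        "sample": int(d["sample"]),
        "day": int(d["day"]),
    }

print(parse_sample("F3D142"))  # {'gender': 'Female', 'sample': 3, 'day': 142}
```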
