Fuzzy Matching in R (Example) | Approximate String, Name & Text Search | adist(), agrep() & amatch()

Statistics Globe

9K views

Published: Jan 25, 2025

Comments • 28

@1453angela 7 months ago ⁺¹
Hello! If I want to do an exact match and a fuzzy match at the same time, how can I do it? 🥺
@StatisticsGlobe 7 months ago
Hey, I'm not sure if I understand your question. How would this work theoretically?
@jennykim5795 2 years ago ⁺²
Hi, excellent video!!!! What is the default method for measuring distance in the function "stringdist" here? Since you didn't set the method, I was curious.
@StatisticsGlobe 2 years ago
Hey Jenny, thank you very much for the kind feedback, glad you like the video! The default method of stringdist is "osa" (optimal string alignment). You can find more info on this here: www.rdocumentation.org/packages/stringdist/versions/0.9.8/topics/stringdist Regards, Joachim
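A minimal sketch of the point in this reply, assuming the stringdist package is installed. The two input strings are made up for illustration. It compares the result of calling stringdist() without a method against the explicit "osa" (optimal string alignment) and "lv" (Levenshtein) methods, using a pair of strings that differ by one swap of adjacent characters, which is where the two methods disagree.

```r
library(stringdist)  # assumption: the stringdist package is installed

# Illustrative strings: "ba" is "ab" with its two adjacent characters swapped
a <- "ab"
b <- "ba"

d_default <- stringdist(a, b)                  # no method argument given
d_osa     <- stringdist(a, b, method = "osa")  # optimal string alignment
d_lv      <- stringdist(a, b, method = "lv")   # plain Levenshtein distance

# osa counts the swap as one edit; Levenshtein needs two edits for it
print(c(default = d_default, osa = d_osa, lv = d_lv))
```

If the default and "osa" values agree while "lv" is larger, the call without a method is using optimal string alignment.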
@loancet1878 2 years ago ⁺¹
Thank you so much! What you explain is exactly what I was looking for to deal with my data!
@StatisticsGlobe 2 years ago
This is great to hear Loancet!
@tildawilson1198 2 years ago ⁺¹
How are you viewing the actual values ([1] "Bill Clintion" "Barack Obama") rather than just the numbers ([1] 5 3) in this? I see you switch back and forth a bunch of times but I'm not sure how you're doing that.
@cansustatisticsglobe 2 years ago
Hello Tilda,
You can set the argument value = TRUE in agrep() to return the matching values instead of their index positions. Alternatively, put the amatch() call in square brackets to use the index positions it returns to subset the pres_df data frame. The script is given below the video; click on "show more" to see it.
Regards,
Cansu
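The difference between index positions and values can be sketched as follows. The vector pres_names is a hypothetical stand-in for the name column of pres_df, which is not shown on this page, and the max.distance and maxDist settings are illustrative choices. The amatch() part runs only if the stringdist package is installed.

```r
# Hypothetical stand-in for the names stored in pres_df (assumption)
pres_names <- c("George Bush", "Bill Clintion", "Barak Obama", "Donald Trump")

# Default: agrep() returns the index positions of the approximate matches
idx <- agrep("Barack Obama", pres_names, max.distance = 2)

# value = TRUE: agrep() returns the matching strings themselves
val <- agrep("Barack Obama", pres_names, max.distance = 2, value = TRUE)

# Indexing with the positions in square brackets gives the same strings
val_by_index <- pres_names[idx]

print(idx)
print(val)

# amatch() from stringdist returns a position that can be used the same way
if (requireNamespace("stringdist", quietly = TRUE)) {
  pos <- stringdist::amatch("Barack Obama", pres_names, maxDist = 2)
  print(pres_names[pos])
}
```

Switching between the two views is only a matter of toggling value = TRUE or wrapping the index call in square brackets.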
@haraldurkarlsson1147 2 years ago ⁺¹
Use case.
I may have an actual use case for this. In my courses I give so-called fill-in-the-blank(s) questions. Students frequently misspell words in the most inventive ways possible (not by design, of course) and I am pretty flexible in terms of giving full credit for "near misses". However, sometimes I wonder "How close is that answer actually?" This lesson gives me some ideas of how that may be accomplished by calculating the Levenshtein distance (LD). The course management program I use (Blackboard) is not nearly good enough to do this by itself, however. I would have to generate the versions I accept based on LD and feed those versions to Blackboard myself. I thought I would share these thoughts of mine.
Thanks for the wonderful videos.
@StatisticsGlobe 2 years ago ⁺¹
Thank you very much for sharing this use case Haraldur! Indeed, this should be a good example where fuzzy matching is useful.
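The grading idea in this use case can be sketched with base R's adist(), which computes a generalized Levenshtein distance. The correct answer, the student answers, and the cutoff of 2 edits are all made-up assumptions for illustration, not values from the video or from Blackboard.

```r
# Hypothetical answer key and student responses (assumptions for illustration)
correct <- "photosynthesis"
answers <- c("photosynthesis", "photosinthesis", "fotosynthesis", "respiration")

# adist() returns a matrix of edit distances; flatten it to a plain vector
ld <- as.vector(adist(answers, correct))

# Hypothetical grading rule: give credit when at most 2 edits are needed
max_edits <- 2
accepted <- ld <= max_edits

# Summarise how close each answer is and whether it earns credit
result <- data.frame(answer = answers, distance = ld, accepted = accepted)
print(result)
```

The accepted spellings could then be exported as the list of variants to enter into the course management system by hand.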
@haraldurkarlsson1147 3 years ago ⁺¹
Excellent video. Very interesting stuff!
I do have a request or suggestion. Kirby, could you do a video or a series of videos on NLP (Natural Language Processing)? It seems to be a field that is gaining steam. My son is a lawyer and a data scientist who studies NLP for legal docs and I would love to know what he does for a living.
@StatisticsGlobe 3 years ago
Thanks for the kind words and the great suggestion Haraldur! I'll forward it to Kirby. Regards, Joachim
@manny1manito2 2 years ago ⁺²
This is great. Would fuzzy_join work with dates?
@StatisticsGlobe 2 years ago ⁺¹
Thank you! I have never done this myself, but this Stack Overflow thread seems to discuss your question: stackoverflow.com/questions/58718287/fuzzyjoin-with-dates-in-r
@robertjl5619 2 years ago ⁺³
Awesome tutorial. Levenshtein distance still doesn't beat the speed of fuzzyLookup in Excel, which is a shame. Neither does the fuzzyjoin package. Frustrating bottleneck for automation, but the performance is unquestionable. Tokenized Jaccard in fuzzyLookup in Excel is still the king.
@StatisticsGlobe 2 years ago ⁺¹
Hey Robert, thanks a lot for the kind words and the additional info!
@robertjl5619 2 years ago ⁺¹
@StatisticsGlobe love your vids bud and your no bullshit approach. Keep it up!
@StatisticsGlobe 2 years ago ⁺¹
Thanks mate! :)
@paulboutros6093 2 years ago ⁺¹
What do you suggest for large data? (About 600,000)
@StatisticsGlobe 2 years ago
Hey Paul, have you tried the code of this video? Did you get any error messages?
@andrea-mj9ce 2 years ago ⁺¹
So amatch is the most general function here for fuzzy matching
@StatisticsGlobe 2 years ago
Hey Andrea, sorry for the delayed response, I was on vacation and couldn't reply earlier. Could you please explain your comment in some more detail? I'm afraid I don't get it :) Regards, Joachim
@jelly3388 1 year ago ⁺¹
Amazing!
@matthias.statisticsglobe 1 year ago
Hey Jelly, thanks for the positive feedback! Glad you like the video!
@michaelmartin2367 1 month ago ⁺¹
Please put audio on both sides ^^ It's so annoying to listen to on headphones
@StatisticsGlobe 22 days ago
Thanks for the feedback! Actually, it should be on both sides. Not sure why you cannot hear it.
@Tommygun0110 1 year ago ⁺¹
Nice
@matthias.statisticsglobe 1 year ago
Hi Olphy, thanks for the comment! Glad you like it!