Hi, I'm trying to do this on reddit data but the files I have are too large (100gb+) for only 3 months of data. That's in .zst. Do you have any suggestions on how to deal with this and apply these techniques on this data set in R?
If your file is too large to keep in memory, the only option is to work through it in batches or as a stream. So the first thing to look into would be whether there is an R package for importing ZST files that allows you to stream it or select specific rows/items (so that you can read it in batches). But perhaps the bigger issue here is that with this much data you really need to focus on fast preprocessing, so that you'll be able to finish your work in the current decade. So first make a plan for what type of analysis you want to do, and then figure out which techniques you definitely need for it. Also, consider whether it's possible to run the analysis in multiple steps. Maybe you could first just process the data to filter it on some keywords, or to store it in a searchable database. Then you could do the heavier NLP lifting only for the documents that require it.
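To make that concrete, here is a minimal sketch of the batch-wise keyword filtering, assuming the Reddit dump is NDJSON (one JSON object per line) compressed with zstd, and assuming the `archive` package (libarchive bindings, which include a zstd filter) and `jsonlite` are available. The file names, the keyword, and the batch size are placeholders, not part of any particular workflow.

```r
library(archive)   # file_read() gives a decompressing connection (zstd via libarchive)
library(jsonlite)  # fromJSON() to parse each JSON line

infile  <- "RC_2023-01.zst"          # hypothetical input dump
outfile <- "filtered_comments.jsonl" # hypothetical output with matching lines
keyword <- "climate"                 # hypothetical filter term

con <- archive::file_read(infile)
open(con, "r")                       # keep the connection open across batches
out <- file(outfile, open = "w")     # raw JSON lines of the matches go here

batch_size <- 10000
repeat {
  lines <- readLines(con, n = batch_size, warn = FALSE)
  if (length(lines) == 0) break

  # parse each JSON line; Reddit comment dumps store the text in a "body" field
  docs   <- lapply(lines, function(x) tryCatch(fromJSON(x), error = function(e) NULL))
  bodies <- vapply(docs, function(d) if (is.null(d$body)) NA_character_ else d$body,
                   character(1))

  # cheap keyword filter before any heavy NLP; keep the raw JSON of the hits
  keep <- !is.na(bodies) & grepl(keyword, bodies, ignore.case = TRUE)
  if (any(keep)) writeLines(lines[keep], out)
}

close(con)
close(out)
```

This way the full 100 GB never has to be decompressed to disk or loaded into memory at once, and the heavier NLP steps can later run on the (much smaller) filtered file.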
After I stem, I get a lot of single letter nonwords. Any advice on how to deal with those?
Hi @Kate, that depends. If the words are non-informative you could just delete all single-letter words. If the problem is that the words (before stemming) were informative, then perhaps stemming just doesn't work that well for your data (which can depend on the language you're working with). For most languages (especially non-English ones) I would generally recommend using lemmatization if a good model is available for your language.
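For completeness, a minimal sketch of both options in R, using quanteda for the token filter and udpipe for lemmatization. The example sentence and the choice of the English model are just assumptions for illustration; neither package is required, they are simply common choices here.

```r
library(quanteda)  # tokenization, stemming, token selection
library(udpipe)    # lemmatization with a pretrained UDPipe model

txt <- c(d1 = "The organizers were organizing very organized meetings.")  # toy example

# Option 1: stem, then drop the leftover single-character tokens
toks <- tokens(txt, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(pattern = "^.$", valuetype = "regex")  # "^.$" matches one-character tokens

# Option 2: lemmatize instead of stemming (downloads a UDPipe model on first use)
m      <- udpipe_download_model(language = "english")
ud     <- udpipe_load_model(m$file_model)
ann    <- as.data.frame(udpipe_annotate(ud, x = txt))
lemmas <- ann$lemma   # dictionary forms instead of truncated stems
```

The lemma column gives you proper dictionary forms, which usually avoids the single-letter artifacts that aggressive stemming can produce.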