#5 Read and process multiple text files in Python

Data Skills for Everyone

24K views

Published: Dec 17, 2024

Comments

@hopelopez83 3 years ago ⁺²
How do you generate a comparison report of multiple files and read each line of all the text files?
@dataskillsforeveryone8205 3 years ago
I'm not exactly sure what you mean by a comparison report. It could be as simple as comparing the length of each file's content, or something more sophisticated, in which case you may need some more advanced Python packages. For the simple case, say computing some statistics on the content of each file and reporting them, you could use a list comprehension combined with a few functions.
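A minimal sketch of the "list comprehension plus a few functions" idea for the simple case. The statistics chosen (line, word and character counts), the `*.txt` pattern, UTF-8 encoding, and the function names `file_stats`, `comparison_report` and `print_report` are all hypothetical choices, not code from the video.

```python
from pathlib import Path


def file_stats(path: Path) -> dict:
    """Compute a few simple statistics for one text file (assumes UTF-8)."""
    text = path.read_text(encoding="utf-8")
    return {
        "file": path.name,
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "chars": len(text),
    }


def comparison_report(folder: str) -> list[dict]:
    """Apply file_stats to every .txt file in the folder via a list comprehension."""
    return [file_stats(p) for p in sorted(Path(folder).glob("*.txt"))]


def print_report(rows: list[dict]) -> None:
    """Print one row per file so the files can be compared side by side."""
    for row in rows:
        print(f"{row['file']:<20} lines={row['lines']:>5} "
              f"words={row['words']:>6} chars={row['chars']:>7}")
```

Usage would look like `print_report(comparison_report("my_folder"))`, where `my_folder` is a placeholder path. More elaborate comparisons (for example, line-by-line diffs) would replace `file_stats` with a richer function while keeping the same structure.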
@pradnyakasar614 2 years ago ⁺¹
How do I find the unique word count of multiple text files at the same time?
@1UniverseGames 2 years ago ⁺²
Can you add one more thing here? We read the .txt files from a directory; now I want to process each .txt file and save the output to a RESULT_Dir folder. How can I do that? That is: an input directory of .txt files -> our code reads each .txt file one after another for 2 minutes, then saves the results of scanning or parsing each .txt file to a different directory. Each resulting .txt file will be different, since each input file contains different information. So far I've learned how to get the input files and read them, but I want to learn the next part. Can you help, please?
@dataskillsforeveryone8205 2 years ago
Yes, that works the same way. Start by setting the output folder, just as we did for the input directory called input_folder in the code. Then, inside the same loop where you read the individual files, do your processing on the data and write the result out to the destination folder. Alternatively, you can carry out a bulk operation on all the files first and write everything out to disk in bulk afterwards. There are many ways to go about it.
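A minimal sketch of the "process and write inside the same loop" option. Only `input_folder` comes from the video's code. `output_folder`, `process_text` (a placeholder that just uppercases the text), `process_folder`, the `*.txt` pattern and UTF-8 encoding are hypothetical stand-ins for your own parsing logic and paths.

```python
from pathlib import Path


def process_text(text: str) -> str:
    """Hypothetical placeholder for your own scanning/parsing logic."""
    return text.upper()


def process_folder(input_folder: str, output_folder: str) -> list[Path]:
    """Read each .txt file in input_folder, process it, and write the result
    under the same file name in output_folder. Returns the written paths."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    written = []
    for in_path in sorted(Path(input_folder).glob("*.txt")):
        result = process_text(in_path.read_text(encoding="utf-8"))
        out_path = out_dir / in_path.name  # one output file per input file
        out_path.write_text(result, encoding="utf-8")
        written.append(out_path)
    return written
```

Writing inside the loop keeps memory use low, because only one file is held at a time. The bulk alternative mentioned in the reply would instead collect all results in a list first and write them in a second loop.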
@saud5133 2 years ago ⁺¹
Hello...
How do I read a binary file, like Metastock files, using Python?
@smurfk7678 7 months ago
How do I interpolate multiple .nc files in a Python Jupyter notebook and get a single file in .nc format? The current resolution of my gridded data is 0.25° × 0.25°, and I want to interpolate it to 0.01° × 0.01°. Can you help me with the code?
@32334694 2 years ago ⁺¹
Great video, sir, thanks!
I have 2 folders of IMDb reviews, negative and positive,
labelled 0 and 1 respectively.
I stored each in a separate list of tuples, added the 2 lists together, and used random.shuffle to mix them up.
Then I split it into train and test sets at a 75/25 ratio.
My question is: how do I read the train set .txt files and create a vocab using word_tokenize, and then compute the TF-IDF of that vocab? Thanks in advance.
@dataskillsforeveryone8205 2 years ago ⁺¹
It would be great if you could share your code so that what you are trying to do is more concrete. It seems you have already read these files into memory. I'd like to see more clearly what you have done up to the point where you use word_tokenize.
@olumidebenjami4 2 years ago
You are a life saver
@rohitjangra6939 2 years ago
It returns an empty list when reading the JSON files from a folder.
@dataskillsforeveryone8205 2 years ago
Hi Rohit, it's not very clear what you mean. If you could provide more information, that would be great.
@syedrehman5818 2 years ago
How can I preprocess (NLTK, stopwords, tokenization) all the text files in a folder (database)?
@dataskillsforeveryone8205 2 years ago
Dear Syed, thanks for reaching out. I'd appreciate a bit more clarification, but depending on what you are trying to achieve, you could either read in your files and process them one by one, joining the results at the end, or read and join all the files first and then process them together for a single output. It depends.
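A minimal sketch of the two options from the reply, using only the standard library. To keep it self-contained, it swaps in a simple regex tokenizer and a tiny hypothetical stopword set in place of NLTK. With NLTK you would typically use `nltk.word_tokenize` and `nltk.corpus.stopwords.words("english")` instead, which require downloading the corresponding NLTK data. The function names `preprocess`, `per_file` and `combined` are illustrative.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical tiny stopword list; NLTK's English list is much larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split into letter runs (regex stand-in for a real tokenizer),
    and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def per_file(folder: str) -> dict[str, list[str]]:
    """Option 1: process each file on its own; join the results later if needed."""
    return {p.name: preprocess(p.read_text(encoding="utf-8"))
            for p in sorted(Path(folder).glob("*.txt"))}


def combined(folder: str) -> Counter:
    """Option 2: join all files first, then process them together into one output
    (here, a single word-frequency count)."""
    text = "\n".join(p.read_text(encoding="utf-8")
                     for p in sorted(Path(folder).glob("*.txt")))
    return Counter(preprocess(text))
```

The files are joined with a newline so that the last word of one file never merges with the first word of the next.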
@SHASHANKRUSTAGII 2 years ago
How do I get around this error?
'utf-8' codec can't decode byte 0xc9 in position 9: invalid continuation byte
@dataskillsforeveryone8205 2 years ago ⁺¹
The error suggests that you are using the utf-8 encoding to open a file that has a different encoding. This may call for trial and error with different encodings in the open function. But if you know the document's encoding, use it straight away in the call to open.
The other option is to open the file in binary mode and read it in. If you are reading the file with pandas, you could also ignore the decoding errors and read in the rest of the data.
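A minimal sketch of the trial-and-error approach and the binary-mode option. The candidate list (`utf-8`, then `cp1252`, then `latin-1`) is an assumption; order it by the encodings you actually expect. Byte `0xc9` is `É` in both cp1252 and Latin-1, which is a common source of this exact error. The function names are illustrative. For pandas, `read_csv` accepts an `encoding` argument, and recent versions (1.3+) also accept `encoding_errors` (for example `"ignore"` or `"replace"`).

```python
def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in turn and return (text, encoding_used).
    latin-1 maps every byte to some character, so as the last entry it never fails."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
    # Reached only if no candidate worked (e.g. latin-1 left out of the list):
    # decode as UTF-8 and replace undecodable bytes with U+FFFD.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read(), "utf-8 (errors replaced)"


def read_bytes(path):
    """Binary mode: no decoding happens, so no UnicodeDecodeError can occur."""
    with open(path, "rb") as f:
        return f.read()
```

Note that a successful decode does not prove the guess was right: cp1252 and Latin-1 accept most byte sequences, so check that the text looks correct.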
@SHASHANKRUSTAGII 2 years ago
@@dataskillsforeveryone8205 Thank you. It helped me.
@gopikishan1028 2 years ago
Thank you, it worked for me...
@dataskillsforeveryone8205 2 years ago
You're welcome!
@ganeshsrivatsakalahasti6813 3 years ago ⁺¹
Hi sir. I have a few doubts regarding some concepts and would kindly request your help. How may I reach you, please?
@dataskillsforeveryone8205 3 years ago
Hi Sir, thank you for reaching out. I'd be glad to hear your question and provide any help I can. I'm not sure yet how I can connect with you privately on RUclips. If you know how, I'll be glad to learn :)
@ajaxx627 3 years ago
Please, I have a problem with some work.
I was given a list of words, say about 200 different words, and I'm meant to write code that generates random groups of 3 words.
E.g. wordlist = [a, b, c, d, e, ..., z]
Output should be:
a, d, z
c, o, x
and so on.
How do I do it?
@dataskillsforeveryone8205 3 years ago ⁺²
If you want to select 3 random words at a time until all words are used up,
here is one approach using a Python list:
1. Use random.sample to randomly pick 3 words (the picked words still remain in the original list after picking).
2. Remove the picked words from the original list.
3. Add the picked words to another list so you won't select them again.
4. Repeat the procedure until you have exhausted the original list.
Something like this:
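import random

words = []  # add your words here
num_to_pick = 3
sampled = []  # every word picked so far, in pick order
indices = list(range(len(words)))  # indices of the words not picked yet

# Keep picking until every word has been used (this also covers a final, smaller group)
while indices:
    if num_to_pick < len(indices):
        random_indices = random.sample(indices, k=num_to_pick)
    else:
        # num_to_pick or fewer left: take them all; copy so the removal below is safe
        random_indices = list(indices)
    selected_words = [words[j] for j in random_indices]
    sampled.extend(selected_words)
    # Remove the picked indices so they can never be selected again
    for r in random_indices:
        indices.remove(r)
    print(selected_words)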
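A shorter alternative sketch that uses a different technique from the reply above: shuffle a copy of the list once with `random.shuffle`, then slice it into consecutive groups. Every word is still used exactly once, and the last group is shorter when the word count is not a multiple of 3. The function name `random_triples` and the optional `seed` parameter are hypothetical additions for illustration and reproducibility.

```python
import random


def random_triples(words, group_size=3, seed=None):
    """Shuffle a copy of the list once, then slice it into consecutive groups.
    Every word appears in exactly one group; the last group may be shorter."""
    rng = random.Random(seed)  # pass a seed for reproducible groupings
    shuffled = list(words)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]
```

With 200 words this yields 66 groups of 3 plus one final group of 2, for example via `for group in random_triples(words): print(", ".join(group))`.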
@ajaxx627 3 years ago ⁺¹
@@dataskillsforeveryone8205 Tysm 😊 I already did it tho but I appreciate. God bless 💚
@DoUKnowMee 2 years ago
For me, it only reads one random file in the folder.