Timestamps:
00:00 Intro
04:36 Creating our own Datasets
08:29 Creating JSONL for Hugging Face
15:15 Uploading Datasets for Git
19:10 LFS for Large Files
21:56 Closing Notes
I sincerely appreciate this series of videos. There are a lot of tutorials for Hugging Face, but none of them explain how to use our own datasets; they all use the benchmarks, which makes it hard to apply the models to our own data. I would appreciate it if you could explain how we can upload our private data files to Hugging Face, rather than the public version that you showed in the video. Because it requires authentication, it is really worth explaining. Thank you so much!
That's a really good idea. I will see if I can include it in an upcoming video, or just add another quick one on authentication - thank you!
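For anyone who wants to try before that video lands, here is a minimal sketch of one way to push a private dataset with the `datasets` and `huggingface_hub` libraries. The token, the local `train.jsonl` file, and the repo name are placeholders; the access token comes from Settings -> Access Tokens on the Hub:

```python
from datasets import load_dataset
from huggingface_hub import login

# Authenticate with a Hugging Face access token (placeholder value).
login(token="hf_...")

# Load a local JSONL file as a dataset.
dataset = load_dataset("json", data_files="train.jsonl")

# private=True keeps the repo hidden; downloading it later
# also requires being logged in with a valid token.
dataset.push_to_hub("your-username/my-private-dataset", private=True)
```

Loading it back afterwards is the usual `load_dataset("your-username/my-private-dataset")`, as long as you are logged in.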
Thank you very much, you massively helped me upload my custom dataset for a Fill-Mask task :D
Thanks... jumping to the next video :)
I am trying to upload my dataset to Hugging Face right now, you are so helpful!
haha great timing
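For reference, the video covers the git + git-lfs route; a minimal sketch of doing the same upload from Python with the `huggingface_hub` client (repo name, token, and filename are placeholders) could look like this:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # placeholder access token

# Create the dataset repo on the Hub (no-op if it already exists).
api.create_repo("your-username/my-dataset", repo_type="dataset", exist_ok=True)

# Upload a local file; large files go through LFS automatically,
# so there is no separate `git lfs track` step to remember.
api.upload_file(
    path_or_fileobj="train.jsonl",
    path_in_repo="train.jsonl",
    repo_id="your-username/my-dataset",
    repo_type="dataset",
)
```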
The video is useful. Keep it up, brother.
thanks, will do
Can we load a dataset from our private cloud? (This is data I don't want to upload to Hugging Face.) I can't find any examples.
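One approach that should work, as a sketch: the `datasets` library resolves fsspec URLs in `data_files`, so with the `gcsfs` package installed you can read straight from a private GCS bucket without the Hub (bucket path and credentials file are placeholders):

```python
from datasets import load_dataset

# Requires `pip install gcsfs` so fsspec can resolve gs:// URLs.
dataset = load_dataset(
    "json",
    data_files="gs://my-private-bucket/data/train.jsonl",
    # Credentials for the private bucket (service-account JSON path).
    storage_options={"token": "path/to/service-account.json"},
)
```

The same pattern works for s3:// URLs with `s3fs` installed, passing AWS credentials through `storage_options`.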
Hey James! I don't have enough time to watch all of these incredible videos! Can you please advise on the importance of learning Hugging Face?
Also, that browser is awesome! A friend suggested something similar back in the day, but I never had much time to look into it. You seem to use it well for your projects! Can you put together some more shorts on your setup, what tools you use day to day for your job, and maybe snippets of advice on how to stay focused? I have a bit of shiny object syndrome right now with all the AI stuff coming out.
Please create a video on how to build a text-to-image generator model with Hugging Face.
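That would be a full video of its own, but as a rough sketch of just the inference side, running an existing text-to-image model with the `diffusers` library looks something like this (the model id and prompt are only examples; training your own model is a much bigger topic):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image pipeline from the Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # needs a GPU at this precision

# Generate and save one image from a text prompt.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```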
Please, I have a dataset which contains audio files and a metadata file in CSV format. How do I upload all of this in the Hugging Face datasets format? One column for input_id, one for audio, another for transcription or text, and one for normalized text?
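A minimal sketch of how this case is commonly handled: the `datasets` AudioFolder loader picks up a folder of audio files plus a `metadata.csv` whose required `file_name` column points at each clip, and every other CSV column (ids, transcriptions, normalized text) becomes a dataset column. The folder layout and repo name below are placeholders:

```python
from datasets import load_dataset

# Expected layout (install audio extras: pip install "datasets[audio]"):
#   my_audio_data/
#     metadata.csv   # columns: file_name,input_id,transcription,normalized_text
#     clip_0001.wav
#     clip_0002.wav
dataset = load_dataset("audiofolder", data_dir="my_audio_data")

# Each row has a decoded "audio" column plus the metadata columns.
print(dataset["train"][0])

# Push to the Hub once the rows look right.
dataset.push_to_hub("your-username/my-audio-dataset")
```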
Hey James, thank you for your big efforts! Can you tell me about jobs on online platforms like Hugging Face, LangChain, or Together AI?
Thanks, that is great!
Do you know of best practices for hosting one's own dataset (in a GCP bucket, for instance) in the HF-compatible Apache Arrow format? i.e. private data that can still easily be ingested by HF models without storing it on the Hub.
Yes, I will go through some of this in the next video and in video 3.
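Until then, a sketch of one way this can work: `datasets` can write and read its Arrow format directly against a GCS bucket through `gcsfs`, so the data never touches the Hub (bucket path and credentials are placeholders):

```python
from datasets import load_dataset, load_from_disk

storage_options = {"token": "path/to/service-account.json"}  # GCS credentials

# Save a dataset to the bucket in Arrow format (requires gcsfs).
dataset = load_dataset("json", data_files="train.jsonl")
dataset.save_to_disk(
    "gs://my-private-bucket/my_dataset", storage_options=storage_options
)

# Later, load it straight back from the bucket.
dataset = load_from_disk(
    "gs://my-private-bucket/my_dataset", storage_options=storage_options
)
```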