DataLoader actually returns a LIST of x tensors and y tensors, not a tensor of tensors (this would be impossible unless the x and y dimensions were the same, of course).
Thanks for the correction!
Actually, it is not a list; it is a DataLoader object. It's more like a generator.
Though, if you meant the training examples, then you are right, those are lists.
This is why you’re amazing. You don’t just teach the framework. You teach the logic, the intuition, the best practices, and the philosophy behind deep learning. Thanks.
Your 30 min videos feel like 5 min while my 1 hr lecture feels like 3 hrs of pure boredom.
"You came back!"
Yeah, no shit, you beautiful human
I am gonna use deep learning and PyTorch for an NLP-related project that I am part of. I have been looking for tutorials, and there are thousands of them available out there! After I got bored with two of them, I gave yours a try and liked it very much! I didn't get bored; you explained the details very well with fine wording. I even *finally* started using Jupyter thanks to this tutorial. Now, I am on to the third episode!
Loving this series. And please do a neural network from scratch series after this
It’s like you can tell the future
Instead of manually typing out the dictionary for counter_dict you could also do:
counter_dict = {x:0 for x in range(10)}
or even more sexy, use a defaultdict
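For anyone curious, a minimal sketch of the defaultdict route (the label list here is made up, standing in for the dataset's y values):

```python
from collections import defaultdict

# defaultdict(int) starts every missing key at 0, so no pre-filled
# {0: 0, 1: 0, ...} dictionary is needed.
counter_dict = defaultdict(int)
for label in [3, 3, 7, 0, 3]:  # hypothetical stand-in for the dataset labels
    counter_dict[label] += 1

print(dict(counter_dict))  # {3: 3, 7: 1, 0: 1}
```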
Thanks a lot for posting the video series in an easy to understand manner with lots of explanation! Many thanks.
I wanted to express my gratitude for sharing that incredibly insightful and valuable deep learning tutorial on neural networks. It has been a game-changer in my understanding of the topic, and I truly appreciate your informal guidance. Thank you!
The 'Base 8' as you say is because a byte has 8 bits of data. When working directly with hardware (particularly microcontrollers), the most efficient way to execute or parse data is dependent on the processor itself, though it will always be in multiples of 8. If that makes sense.... p.s. love the channel!!!
I was taking the fastai deep learning course in which Jeremy Howard uses fastai library which is built on pytorch. Your tutorials are really helping me to get a better understanding of the core pytorch stuff that was used in that course. Thank you😄
Doing the same here! :-)
One video a day? This is madness!
I love it
having a million subs is cool.
having a million subs who are following you to learn technical & niche content is very cool.
Of course i came back! 🤩
same :D
Just finished all the 8 videos in this playlist. Loved it. Hope you make more of these pytorch videos.
Great vid Harrison thanks. Regarding choosing batch size, yes typically the larger the batch size, the faster your model converges. But setting your batch size too large will result in an out_of_memory error. So it's important to calculate the largest batch size possible without exceeding your CUDA memory limit.
But one thing I've never gotten a straight answer is this:
"What's the largest batch size you can choose, as a function of
1) your particular GPU memory size,
2) the size (MB, KB, etc) of your training examples (maybe GBs if training very large images), and
3) the size of your model parameters
all three of these must fit inside your GPU memory. So in theory there should be a formula to calculate the largest possible batch size. Something like GPU_mem_size - model_weights_params_size = remaining memory for training samples. Then take that value and divide it by the typical size of a single training sample (image file or whatever). The result of this division, call it n, is theoretically the largest batch size you can fit in your GPU. Then you would probably round down to the nearest multiple of 8.
I'm probably leaving something out of this equation, but that's the general idea.
Any thoughts on a straightforward approach to calculating this theoretical max batch size? Thanks Harrison (or anyone else who happens to know the answer)
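Not a definitive answer, but here's that formula as a back-of-envelope sketch in plain Python. Every number below is hypothetical (an 8 GiB card, a 25M-parameter float32 model, 224x224x3 float32 inputs), and the activation factor is the big unknown; it varies wildly by architecture, which is why in practice people usually just binary-search the batch size until CUDA stops complaining:

```python
# All figures are hypothetical; adjust for your own GPU, model and data.
GPU_MEM = 8 * 1024**3             # 8 GiB of GPU memory, in bytes
PARAM_BYTES = 25_000_000 * 4      # 25M float32 weights
GRAD_BYTES = PARAM_BYTES          # gradients take about the same space
OPT_BYTES = 2 * PARAM_BYTES       # e.g. Adam keeps two extra buffers per weight

SAMPLE_BYTES = 224 * 224 * 3 * 4  # one float32 input image
ACTIVATION_FACTOR = 20            # activation memory per sample; VERY model-dependent

free_bytes = GPU_MEM - PARAM_BYTES - GRAD_BYTES - OPT_BYTES
max_batch = free_bytes // (SAMPLE_BYTES * ACTIVATION_FACTOR)
max_batch = (max_batch // 8) * 8  # round down to a multiple of 8
print(max_batch)  # 680 with these made-up numbers
```

The gradients and optimizer state are the terms most often left out of the equation you described: each one costs roughly as much memory as the weights themselves.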
I am totally new to Deep Learning and Pytorch. Your explanations are awesome, I understand better now! :D Thanks so much!!
You can display every image one by one (later on you will probably use a grid or something) like this:
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)
batch_idx = 0
for data in trainset:
print('Training batch:', batch_idx)
batch_images = data[0]
batch_labels = data[1]
for image, label in zip(batch_images, batch_labels):
plt.imshow(image.view(28, 28))
print('Image label:', label)
plt.show()
batch_idx += 1
Just like with OpenCV, showing a plot using PyPlot leads to a pause in the execution of the program until the window with the image inside is closed.
In addition, to save typing for future viewers: use dict.fromkeys(range(10), 0) to create the train_counter_dict. ;)
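One caveat on the dict.fromkeys tip: it shares a single value object across all keys, which is fine for an immutable default like 0 but would bite you with a mutable default like a list. A quick sketch:

```python
# Safe: 0 is immutable, so incrementing one key doesn't touch the others.
train_counter_dict = dict.fromkeys(range(10), 0)
train_counter_dict[3] += 1
print(train_counter_dict[3], train_counter_dict[4])  # 1 0

# Not safe: every key would share the SAME list object.
shared = dict.fromkeys(range(3), [])
shared[0].append('x')
print(shared[1])  # ['x'] -- all three keys see the appended element
```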
Man thanks for all of this. Usually I can't follow tutorials but with you its so easy. Keep up the good work :)
I understand the purpose of mnist, i am personally tired of them, but I am glad that you take this approach then move on to more complex models, you are awesome, thanks!!!
As Harrison mentioned at 25:29, if anyone is wondering how to create counter_dict using Counter, here is the code.
from collections import Counter
ys = [x[1] for x in train]
print(Counter(ys))
print(dict(Counter(ys)))  # if you want a dictionary object
Similarly you can do:
from collections import Counter
Counter(trainset.dataset.targets.tolist())
You're so correct when you say that data acquisition, preprocessing and all that jazz take up about 90% of the time. And it is sooooo frustrating.
You uploaded it fast! I’m sooo happyyy! Next one sooon please!!
Awesome work. Thank you! Keep them coming. It's good for the soul!🙏
Thank you for the beginner-friendly, great tutorials. I'm new to learning Python but was able to understand and follow so far! Seeing the potential applications of basic concepts gives me more reason to keep learning!
Thank you Harrison for your wonderful tutorial, and getting me started in the right direction!!
one of the best explanations that I have ever seen man thanks
to change editor font size, go to settings > notebook > user preferences >
{
"codeCellConfig": {"fontSize": 24}
}
or whatever.
Hi, there! First of all, the way you explain is amazing. Then, I tried the code of this video and I got this warning: "..\torch\csrc\utils\tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program." I hope you can help this very beginner.
I am having the same issue.
Could somebody please take a look at this problem? It's quite discouraging for beginners like us.
@@chuckchen2851 Hey, did you fix it somehow?
Could you check the writeable flag via print(arr.flags['WRITEABLE'])?
If it’s set to False, you should create a copy of it as suggested by the warning message.
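To make that concrete, a small numpy-only sketch (the read-only flag is set by hand here to simulate how the non-writeable array from the warning arrives):

```python
import numpy as np

arr = np.arange(6, dtype=np.float32)
arr.setflags(write=False)        # simulate the non-writeable array from the warning
print(arr.flags['WRITEABLE'])    # False

safe = arr.copy()                # a copy owns fresh, writeable memory
print(safe.flags['WRITEABLE'])   # True
# torch.from_numpy(safe) would then build the tensor without the warning
```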
The batch size (more specifically stochastic gradient descent) has nothing to do with overfitting. Lower batch size (stochastic gradient descent in general) can lead to less local minima problems and at the same time you can train much faster because it is better to get a less accurate estimate of the gradient but apply it more often with a bit smaller learning rate (to make up for the "noisy" gradient estimates).
This is controversial. Yann LeCun, sourcing a paper from 2018, says you shouldn't use a batch size bigger than 32. But other papers say you can use bigger and bigger batch sizes without any problem with speed and generalization, as long you use some techniques such as warming up the learning rate (start it small and increase it gradually). Source: Géron, Hands-on Machine Learning, 2nd Edition.
@@rodrigomaximo1034 we agree that the local minima problems (sharpness) and the lack of exploration cause lack of generalization. Is lack of generalization the same as overfitting?
Jeez man, you're a freakin hero! looking forward for the next video of this series :)
Your Coffee Cup collection is really nice and intriguing.
To change the code font size you should be able to go to settings -> JupyterLab Theme -> Increase Code Font Size
Mate, as always, awesome. I hope you realise the content you create is better than most of the paid stuff.
Other way to check if the data is balanced:
counter_dict = {}
for data in train:
if data[1] in counter_dict:
counter_dict[data[1]] += 1
else:
counter_dict[data[1]] = 1
total = sum(counter_dict.values())
for k,v in counter_dict.items():
counter_dict[k] = v/total
print(counter_dict)
I obviously came back you are so good!
running into exactly the problems you are talking about. thank you!!!
Can't wait to see Neural Networking with numpy alone!
very good videos to pick up pytorch without too much background talking. thanks.
Small info for beginners and advanced users who just missed it.
If your Python version is less than 3.6, then you will need to use this piece of code for the last cell.
for i in range(10):
print('{}: {}'.format(i, counter_dict[i]))
To get your python version:
import sys
print(sys.version)
The whole Base 8 thing... I think it's actually just powers of 2. This helps internally when doing the matrix multiplication operations. I think it's easier to distribute that computational load.
good stuff, I m running through your Pytorch tutorials, thank you
It's Rocking Dude!
Just Love what u are doin!
You could also (in theory; I don't know whether PyTorch provides support for this) make sure that examples of each class are presented to the network with equal probability in each epoch if your dataset is not so well balanced. It's still better to utilize the extra data of a class if it has significantly more examples rather than discarding the extra data.
For imbalanced datasets you can also try oversampling techniques such as SMOTE or ADASYN.
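On the "does PyTorch support this" question: torch.utils.data.WeightedRandomSampler does exactly that when handed per-sample weights. The idea itself fits in a few lines of plain Python (the labels below are a made-up 80/20 split):

```python
import random
from collections import Counter

labels = [0] * 800 + [1] * 200  # hypothetical imbalanced dataset

# Weight each sample by the inverse frequency of its class so both
# classes get drawn with roughly equal probability.
freq = Counter(labels)
weights = [1.0 / freq[y] for y in labels]

random.seed(0)
resampled = random.choices(labels, weights=weights, k=10_000)
print(Counter(resampled))  # roughly 5000 of each class
```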
Remember troop: usually, it's just usually! 13:39
count = {}
for xs, ys in trainset:
    for y in ys:
        try:
            count[int(y)] += 1
        except KeyError:
            count[int(y)] = 1
print(count)
Reduced some labor in creating Count dictionary :)
You explain complex stuff so easily
UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. ....help
same problem over here ... - somebody having a solution?
17:26 Correction: data is a list - the first element is a tensor of tensors (the pixel information of 10 samples) while the second element is a tensor of integers representing the labels (the corresponding digits) of those 10 sample images.
counter_dict = {i:0 for i in range(10)}
batch_size = 10
counter_dict = {x:0 for x in (range(batch_size))}
@@romariocarvalhoneto7831 batch_size might be a misleading name since, in this video, 10 is not related to the batch size, but rather to the number of labels/outcomes. 🙃
Musa Kurt, brother, you're killing it!
gave a thumbs up just for "ya came back!", but this tutorial deserves the thumbs up anyway :)
this is a great series. thank you very much :)
This was super helpful, thank you so much.
Is it necessary to shuffle the train and test datasets in DataLoader when we've already shuffled them when defining train and test?
I just love your videos. Thanks so much!
you CAME *BACK*
Great tutorial; it's going to save me a lot of time; thank you...... ...... ......
sentdex is the best, man.
Thanks
If you haven't solved the font size problem, go to Tools -> Settings -> Editor and then just pick a font size.
Great video. I have learned so much with just these two videos.
I do have a question regarding 21:00. Why is it that when you print(y), you get the same number as when you imshow the value that you passed to x (data[0][0])? Does this tell me that x (data[0][0]) and y (data[1][0]) have the same value? I don't quite understand that part.
Yeahhhhhh i came back!!! Its so real. Great tutorials
this course is excellent
So in the case that 1 made up 45 percent of your outputs, how would you balance it? Just remove some of the ones?
Hit CTRL+ or CTRL- to make your fonts bigger or smaller inside of a browser
I like the cut of your jib, sir! Great videos; it's nice to see someone confident, who knows what they're doing and is a natural teacher. It feels like you primarily did it to share your knowledge and made videos because that supported the main goal. Sounds silly, but there's a lot of guff on YouTube that is first and foremost a video to watch.
Some feedback though, code review style:
You should have explained what that 1 was all about. Everything else made perfect sense, but hand-waving over that sorta left me with a gap where I was getting distracted thinking about wtf it meant and had to rewind. In fact it's still bugging me. Gonna have to look it up.
The powers of two thing is 'cause everything is transistors and 0 or 1. So they're even numbers, they align nicely with memory, are faster with less waste etc.
Grumbling about magic numbers in other people's code, then putting them in yours.. tsk tsk. Would have been better to make width and height variables to encourage this new generation of ML barbarians to give things decent names.
Could have explained that "rectified linear" is just max(0, x); had to look it up and frowned at the whole of mathematics for their hyperpathological superfluoustic jargonization. The stuff about scaling the outputs kinda lost both of us too.
Finally, religious issue: embrace functional! You get the view vs reshape thing, but overwrite variables and prefer classes with mutable state. Functions are much cleaner other than passing parameters, better to test, easier to reuse. Might cut down on the copypasta in the AI world if people are reusing functions.
Great vids though like I said, pace is great, feels natural and the plot thing and your efficiency almost made me *want* to use a jupyter notebook as my interpreter. I probably would if I wasn't a snob about unit tests 😂
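For the record, a one-line sketch of the rectified linear unit brought up in the review above; it just clamps negatives to zero:

```python
def relu(x):
    # "Rectified linear": pass positives through, clamp negatives to zero.
    return max(0.0, x)

print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```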
I have 2 questions. 1: What is the ideal state that the dataset has to be in before applying transforms.Compose()? Here the MNIST data was surely clean. 2: While balancing, what should the general deviation percentage amongst all labels be?
Good video. Could you please clarify the for loop during the balance computation? I would expect that data in trainset would be returned in a shuffled way, but you use it like a normal array.
Can the '1' in x.shape be considered as the number of channels???
1 for grayscale, and 3 for RGB??
What if the data is not balanced? what do we need to do? could you explain that too?
Hi sentdex, thanks for pointing out the balance issue. Do you know how to balance the dataset for unsupervised learning, where you don't know if it's balanced or not (likely not)?
Are there NN models that are trained to construct (or suggest parameters for) other NN models (e.g., # and types of layers, # of nodes, connectedness, loss functions, etc.) based on the type and size of the data that one has? Is scale the main issue, is there a recurrence problem in even beginning to approach this, would the problem come with generalizing the model enough to take all types of data as an input, or something else?
awesome work brother.
I think for the batch_size they use base 8 because 1 byte is 8 bits.
thanks for the vid as well
One way to increase the font is to use Pycharm instead
Lol xD
for resizing jupyter lab, I pressed ctrl and used the scroll wheel
Working with tensors, can't count to 10. Now, I know you are for real.
Anyone can work with tensors. Counting to 10? whewph. I wouldn't even know where to begin =/
I have been trying to learn about image classification for a while. I have looked at several methods of classification, but what I really need to know to get started is how to prepare my images for classification. I am photographing tiny sea shells and have a few unique categories, but I don't know how to proceed, because all the videos I have seen use either the Iris set, the digits set, or other established data sets. I haven't seen any videos on how to use original data. Maybe I haven't looked far enough for videos or other information.
You're breath taking
24:37 counter_dict = {k:0 for k in range(10)}
Add this to your setting overrides in the advanced settings editor to increase your font size:
{"codeCellConfig": {
"autoClosingBrackets": true,
"fontFamily": null,
"fontSize": 20,
"lineHeight": null,
"lineNumbers": false,
"lineWrap": "off",
"matchBrackets": true,
"readOnly": false,
"insertSpaces": true,
"tabSize": 4,
"wordWrapColumn": 80
},
"markdownCellConfig": {
"autoClosingBrackets": false,
"fontFamily": null,
"fontSize": 20,
"lineHeight": null,
"lineNumbers": false,
"lineWrap": "on",
"matchBrackets": false,
"readOnly": false,
"insertSpaces": true,
"tabSize": 4,
"wordWrapColumn": 80
},
"rawCellConfig": {
"autoClosingBrackets": false,
"fontFamily": null,
"fontSize": 20,
"lineHeight": null,
"lineNumbers": false,
"lineWrap": "on",
"matchBrackets": false,
"readOnly": false,
"insertSpaces": true,
"tabSize": 4,
"wordWrapColumn": 80
}}
I'm not quite understanding how numbers of layers/neurons can be trial and error past a certain point, for example GPT, that's a lot of parameters to change randomly
Something is wrong with the data files from MNIST. They are in a UBYTE format, so your examples don't work. Please help!
So I want to master one, pytorch or tensorflow, which would you advise me to go for sir?
Hi, I am getting the following error: "stack expects each tensor to be equal size, but got [60000, 28, 28] at entry 0 and [60000] at entry 1" when I try to print the data with:
for data in trainset:
    print(data)
    break
Could you guide me on where I am going wrong?
A dictionary comprehension would have worked great to define your counter_dict!
counter_dict = {i:0 for i in range(10)}
Xs, ys = data
and if ys is printed, it returns a kind of list of tensors. But when iterating over ys with y, the output should be just the label value, right? How does it return all the counts of 1s, 3s and so on? For that, Xs should have been used, but here y has been used. It's just confusing me a lot. Why do we iterate through ys instead of Xs?
You used the MNIST dataset, but what if I have JSON data locally? How do I load such a local dataset?
I love how not even one percent disliked this video... That is quite uncommon to see on YT...
I have this strange doubt while running the code on Colab. First, I ran the total and counter_dict part, and then in the next cell I ran the for loop to count the values. When I did this I got answers in the 10k range and percentages over 100. But when I ran counter_dict and the for loop in a single cell, it worked correctly. I want to know the reason behind this. Please help me out!
Thanks Harrison .
Why use Jupyter Lab?
Why is the picture of the 3 in color if the dataset is black and white, I wonder?
Like in the beginning so i don't forget! :-)
You're a hero
My GPU RAM is so small I can only fit a batch size of 1. Is that even a good idea?
sorry i'm late to the problem but would a low tech solution work? can you just zoom your browser in?
shouldn't we use stratified sampling for splitting?
If you encounter an issue during this step (21:11) where the kernel keeps dying, insert this snippet of code before importing matplotlib:
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
Hey sentdex: what if we use a reinforcement deep Q-learning model and need to input the data from sensors?