💡 Get my FREE 7-step guide to help you consistently design great software: arjancodes.com/designguide.
Answering your question in the video about Pydantic validation (~14:53): pydantic's default mode is to validate on instantiation only. But you can set validate_assignment=True in your model's Config to validate on assignment as well.
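A minimal sketch of what this looks like (pydantic v1-style `Config`, which still works under a deprecation shim in v2; in v2 you would normally write `model_config = ConfigDict(validate_assignment=True)` instead — the `Item`/`price` names are just illustrative):

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    price: int = 0

    class Config:
        # Without this flag, only Item(...) is validated;
        # with it, item.price = ... is validated too.
        validate_assignment = True


item = Item(price=3)       # validated on instantiation
try:
    item.price = "not a number"   # now also validated on assignment
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["loc"])
```

Without `validate_assignment`, that assignment would silently succeed and leave a string where an int was expected.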
I don't get the purpose of this. When would you want to validate on instantiation but not on assignment? It sounds like a good way to complicate your debugging if you assume a variable is a particular type and it turns out to have been overwritten with something completely different.
@@drheaddamage I guess the use case for only validating at instantiation is that after that point you can trust the data based on the code, and skipping later validation may provide performance benefits. The instantiation could happen with external data (user input, config file, API data), while the subsequent assignments come from within the code itself (e.g. calculations).
Also, validation on every assignment is quite an expensive operation for pydantic.
There is a pattern where you only create unchangeable instances, like using the dataclasses `frozen` parameter.
And if you want to change an instance, you create a new one.
@@drheaddamage Validation on assignment can be very expensive. If you use a functional approach, you never modify data (objects), you just create new ones, so it makes sense.
@@drheaddamage It is not needed with frozen classes, and those are used extensively in some projects. I think the argument was that this is a slightly more secure way of handling data: creating new objects instead of editing existing ones.
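The pattern this thread describes can be sketched with the standard library's frozen dataclasses (attrs has an analogous `frozen=True`, and pydantic models can be made immutable through their config; the `Price` fields here are just illustrative):

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Price:
    amount: int          # e.g. cents
    currency: str = "EUR"


p = Price(amount=250)
try:
    p.amount = 300       # mutation is blocked on frozen instances
except FrozenInstanceError:
    print("frozen instances cannot be modified")

# "Changing" a frozen instance means building a new one instead:
p2 = replace(p, amount=300)
print(p, p2)
```

Since every instance is validated (or constructed) exactly once and never mutated afterwards, there is nothing for assignment-time validation to catch.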
Thanks for the video! We use both dataclasses and pydantic in our projects, for 2 different cases. Pydantic is more than a schema validation library today, but we think it should be used as a schema validation tool only. So we use it extensively for request/response schemas and a few other things. On the other hand, we have data models that map onto the corresponding DB tables, and we use dataclasses for those. Basically, the standard flow is request -> pydantic validation -> some logic and dataclass models -> pydantic validation -> response. Sometimes it can be handy to map the schema to the model data automagically, and for that case there is an orm_mode flag in pydantic. I just want to say they are not competitors at all.
Great video! The idea is great: comparing tools and then giving your nuggets of wisdom on which one to choose for each job. It's great to have a glimpse of your experience while choosing tools.
Suggestion for the next one: a comparison between web frameworks (such as FastAPI, Django, Flask).
Like to see that as well!
Great idea
Thanks!
Thank you so much!
Thanks
Thank you so much!
Hi Arjan, thanks for the effort you put into these videos; I've gotten a lot of inspiration from them over the last few years!
I wanted to add that Pydantic also has a dataclass decorator which is a drop-in replacement for standard dataclasses, with all the validation features of the Pydantic BaseModel available. Perhaps that would have been the more comparable choice.
Keep up the great work!
Timo
It has all the features of pydantic without inheritance?!
@@evandrofilipe1526 it has all the validation features, but it is not a replacement for the Pydantic BaseModel. I recommend their docs for further details.
One thing to note is that the Pydantic dataclass has some restrictions compared to the Pydantic BaseModel features. Would be nice to compare the Python built-in dataclass, the Pydantic dataclass and the traditional Pydantic BaseModel.
You've been releasing banger videos one after another. I swear, I link or get linked your videos hours after release!
Awesome video! I use dataclasses and Pydantic a lot. I'll take a look at attrs now. Adding Mypy to my dev workflow has also helped me deliver better code.
I've been rocking Pydantic for about a month now and I'm absolutely in love with it. It's simpler to use than dataclasses, IMO, when working with a REST API. Granted, I compose a bunch of base models into another class and then use my own methods to manipulate the data, but it works out nicely in the end.
Agreed. Pydantic also works extremely well when using an ORM like SQLAlchemy.
Arjan, it would be awesome if you'd interview the creator of pydantic. It would make a fantastic episode of "Become A Better Software Developer" or something like that. I don't know if you've thought about this kind of format. I'd love to see pros dive into the nitty-gritty and share ideas or ways of doing things. Just a thought.
Thanks for the suggestion!
Thank you, I learned a lot again. Have a nice weekend!
Thanks so much to bring this to the table!
Great video! A question that pops up: where do you get your knowledge from? Reading the documentation is nice, and playing around with a module or library is great, but aren't you afraid of missing a few features? Are there any good forums or knowledge resources you use to stay up to date and understand all the possible features of a library or module you use?
Great video as always! By the way, what library do you prefer for performing validation on dataframes?
I've used pandera pretty heavily in production; it's very capable and the developer is super helpful. I haven't explored pydantic extensively, but I feel there might be some advantages to using it, in exchange for writing quite a lot more code.
And another nit on the attrs example (this may very well be my misunderstanding of str): instead of lower, shouldn't we rather use casefold for comparisons? As I understand it, that method is dedicated to that purpose.
Great video. Good explanation for using int for price. This, like your other videos, comes in handy in lots of different scenarios. Thank you for your contribution.
I'm happy to hear you're enjoying the content!
Great video! I actually started using dataclasses when I saw your video about them a few months ago. I'd like you to make a short video with the pros and cons of your recommendation to use int instead of float for price fields. You left me thinking about that idea. Many thanks! 😋
Great suggestion!
You already convinced me of dataclasses. They spread across our code base like a slime mold.
Haha, good to hear 😊
Just a late thank you for your dataclasses video which taught me about the __post_init__() method which really came in handy recently.
For pydantic not printing the object type, just do print(repr(banana)) and it will print it in a nice form that looks exactly like you would create it in code.
repr calls the __repr__ dunder method, which all Python classes inherit from object by default. In Python, print calls __str__, which, unless overridden, actually calls __repr__. If you have ever printed an object to the terminal and gotten something like <__main__.Banana object at 0x...>, it's because __str__ has not been overridden and the class has inherited the default __repr__ from object.
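A minimal sketch of these mechanics (the `Banana` fields are just illustrative):

```python
class Banana:
    def __init__(self, name: str, category: str) -> None:
        self.name = name
        self.category = category

    def __repr__(self) -> str:
        # Mirrors how you would construct the object in code
        return f"Banana(name={self.name!r}, category={self.category!r})"


banana = Banana("banana", "fruit")
# print(banana) uses __str__, which falls back to __repr__
# because __str__ is not overridden here:
print(banana)        # Banana(name='banana', category='fruit')
print(repr(banana))  # Banana(name='banana', category='fruit')
```

Dataclasses, attrs, and pydantic all generate a `__repr__` like this for you; the difference the comment points at is only in what `print` shows by default.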
It's worth mentioning the performance. Dataclasses are way faster than BaseModel (I learned this the hard way :-/). There are some improvements expected with Pydantic 2.0, but for now:
In [1]: from dataclasses import dataclass

In [2]: @dataclass
   ...: class A:
   ...:     a: int = 1
   ...:     b: str = "one"
   ...:

In [3]: %timeit a = A()
80.5 ns ± 0.212 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: from pydantic import BaseModel

In [6]: class B(BaseModel):
   ...:     a: int = 1
   ...:     b: str = "one"
   ...:

In [7]: %timeit b = B()
1.03 µs ± 3.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Thank you for the great video. Love the deep dive in new packages.
Glad you enjoyed it!
Super helpful information, thanks Arjan
Glad it was helpful!
In Python, one can also use `Decimal` to represent monetary amounts
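For illustration, a quick sketch of why `Decimal` (or integer cents, as suggested in the video) is attractive for money:

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly:
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Decimal performs exact decimal arithmetic, which is what you
# want when the values represent money:
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

Note that the `Decimal` values are constructed from strings; `Decimal(0.1)` would inherit the float's rounding error.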
My english is not so good, usually I don't understand all but ur speech is so good, I understood almost everything, btw its very helpful video
Can't read all the comments, but I think I saw a bug in the dataclass example. If your code ever runs past midnight, you'd need the default for the order date to also be dynamic via default_factory; otherwise it'll be the date of the creation of the *class*, not the instance. So if I started my program in December 2023, all my orders would come up with that date by default.
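A sketch of the fix this comment describes (the `Order`/`created_at` names are hypothetical, not necessarily those in the video):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Order:
    # BAD:  created_at: datetime = datetime.now()
    # would be evaluated once, at class-definition time, and every
    # instance would share that single timestamp.
    #
    # default_factory defers the call until each instance is created:
    created_at: datetime = field(default_factory=datetime.now)


print(Order().created_at)
```

The same reasoning is why `field(default_factory=list)` is required for mutable defaults; a bare `datetime.now()` default just fails more quietly.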
Would be great to have a follow-up which compares validation in pydantic, param and traitlets
Noted!
Thanks @ArjanCodes, quality content every time!
Thank you!
Dude, great video. Just really really great.
Thanks a ton!
As always, awesome video!
Thank you for the great content. I'm improving a lot my developer skills with your lessons.
Happy to help!
Literally 5 minutes ago, I was in the situation of deciding which one to use. (You read my mind.)
Haha, so I was just in time :).
What did you choose?
@@ArjanCodes because the wizard is not late nor early. Just in time.
Like a compiler...
dataclass for the win
Great video!
Glad you enjoyed it!
About the representation in pydantic, instead of printing the banana, you can do print(repr(banana)), you will get Banana(name='banana', category='fruit'....).
Hey Arjan, great video. How would you use dataclasses, attrs or pydantic to validate Tabular Machine Learning data. Greetings from Germany :)
I hit like even before starting your video. Already know the content's gonna be amazing, keep doing the awesome work!!!!
Thank you so much!
Awesome content as always! Could you please make a video on how to persist these objects into a database and best practices for doing so?
Thanks! Your suggestion is noted :)
Great overview, thank you :) Personal preference: dataclass with `slots=True`. I hope we get built-in object pooling like `pool=300` one day for dataclasses.
Nice video. I learned about attrs from Fluent Python, 2nd edition, by Luciano Ramalho; there is an entire section in that book about data classes. If you ask me, I prefer to use attrs or Pydantic. You can use dataclass to prototype an initial version of some tiny app, but once you need to build something more professional, you definitely need to go beyond it, and attrs or Pydantic is the correct choice.
7:38 As soon as you said it uses subclassing, I immediately thought “there must be a metaclass involved”.
Had a look at the source code, and yes, that is how it works.
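For the curious, the trick can be sketched in a few lines. This is a toy illustration of a metaclass collecting annotated fields at class-creation time, not pydantic's actual implementation:

```python
class FieldCollector(type):
    """Toy metaclass: records each subclass's annotated fields."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # The class body's annotations (e.g. `name: str`) end up in
        # the namespace under __annotations__; stash them on the class.
        cls.__fields__ = dict(namespace.get("__annotations__", {}))
        return cls


class Model(metaclass=FieldCollector):
    pass


class User(Model):
    name: str
    age: int


print(User.__fields__)  # {'name': <class 'str'>, 'age': <class 'int'>}
```

Because the metaclass runs when `User` is *defined*, the field information is available before any instance exists, which is what lets BaseModel-style classes generate validation and `__init__` behavior from plain annotations.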
I have a question about dataclasses vs. normal classes. When I use a dataclass to define one kind of data, it's perfect at the beginning. However, more and more methods then get added based on the attributes of the dataclass, the class becomes heavier and heavier, and some of the methods do something very complicated. I am not sure whether this is good practice and whether I should convert the dataclass to a normal class. Could you please give me some suggestions about the use cases for normal classes vs. dataclasses?
I had chosen pydantic for a few projects, but now I'm ripping it all out because of the incompatible API changes brought by 2.0. The authors of pydantic appear to be _very_ opinionated about how method names should be formatted, so they cavalierly replaced parse_raw with model_validate_json. Not only is the function name much longer, but it apparently does exactly the same thing. If I keep pydantic in the project, it's like a ticking time bomb for future developers.
Thank you for this video! It's really helpful ☺
I'm glad to hear the video was helpful!
Thanks for the comparison! The real content is here, as always ;)
Thanks for watching!
4:50 Sounds like defining key fields in a database record.
4:07 Why not use the Decimal type?
When you printed that banana it totally made sense to me
Awesome video! Thank you for the great content.
Could you please make a video about Python metaprogramming and metaclasses, with a real-world example?
Thanks for the suggestion, it's noted!
Hehe, this is REALLY a NIT. I was taught that zero is neither negative nor positive, so that function in the attrs example checks for non-negative, not positive. (Of course, with IEEE 754 we can get both negative and positive zero, but that is only useful for a zero that isn't really zero, just too small to represent.)
I haven't been able to understand Enum classes and what exactly they are useful for. I've been looking for videos that explain them well, but there aren't any. Could you please make one?
great stuff
Strange that attrs inspired dataclasses, yet the same functions for validation were not carried over. Is attrs using a different package to add validation? What about just using a validation class in the first place, as this is a time when inheritance is a good thing?
I am not sure if my question is clear or related to this video. Sometimes my classes have a lot of lines in the init/post_init method. The reason is that I don't want the same thing to be calculated many times; saving the result in memory can speed up the process. Maybe I should just use functions instead?
If it's a complicated computation, I would suggest moving it out of the init method and into a separate function. You could also look into the functools cached_property decorator, which computes a value once and then memoizes it.
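A small sketch of the cached_property approach (the `Report` class and its fields are just illustrative):

```python
from functools import cached_property


class Report:
    def __init__(self, data: list) -> None:
        self.data = data
        self.compute_calls = 0  # track how often the computation runs

    @cached_property
    def total(self):
        # Runs only on first access; the result is then stored on the
        # instance, so later accesses skip this method entirely.
        self.compute_calls += 1
        return sum(self.data)


r = Report([1, 2, 3])
print(r.total, r.total, r.compute_calls)  # 6 6 1
```

This keeps `__init__` cheap and defers the expensive work until (and only if) the value is actually needed.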
@@ArjanCodes Thanks! I never knew about this decorator. It helps!
I would like to know what you think about the typeguard module, since I've never seen you use it.
As a newcomer, I was hoping to find The Answer: what I should be using from now on. After watching this I am not sure; it seems odd to have a choice of 3. What will it be like 5 years from now?
"Don't be an attr (etter)..." is a great way to start the week! 😁
You missed something important: pydantic has its own dataclass decorator which can be used much like a BaseModel, but is fully API compatible with a built-in dataclass
I really like the "frozen" option of dataclasses, which allows for some sort of immutability. Do attrs or pydantic have something similar?
Pydantic supports frozen fields or frozen instances
What brand is your hoodie? I want one but can’t find the logo in google image search 😂
I've been having this doubt for a long time: which one among them is better?
Are there any differences between those three options in terms of performance?
Thanks! Can you please compare pydantic and marshmallow?
With pydantic you get a super powerful object factory, letting you do really good OOP without much effort.
Now I need to learn what attr is in Dutch.
Nice video, cheers from a country where taxes aren't limited to 100% 😭
ok now I want to know what attrs means in dutch
The dutch word is 'etter', which is pronounced similar to attr in english. Etter translates more or less to 'jerk' in english 😂
Actually attr sounds like the Dutch word "etter" which means a brat who is acting up.🙂
in afrikaans, it is pretty bad
@@samsung40_media87 what does it mean?
he is een etter -> he's a jerk (according to Google translate)
Arjan, how do I contact you officially about your courses?
To talk about my courses you can send an email to: support@arjancodes.com
@@ArjanCodes Okay Arjan I will. ❤
@@ArjanCodes Now I did that
I honestly used dataclasses for some time, but I stopped using them.
I prefer having all classes written according to the standard rules for classes, so no instance variables in the part where you would normally see class variables defined.
Instead of mixing dataclasses into regular classes, it would have been better if a new type of data structure had been implemented in Python, available by DEFAULT. Making the distinction with a decorator just doesn't do it for me... I'd say that pydantic is, by the way, the 'cleaner' solution, IMO...
attrs is excellent, but it's kind of not compatible with everything else, so it's an all-in approach.
The best!
I always get tickled by the volume and popularity of code that attempts to bolt type checking onto a duck-typed language
Curious: why not, like price being an int, make weight also an int and represent the smallest unit we care about (like grams)?
I forgot that "attrs" in dutch was niet zo leuk 😂
I like attrs a lot, but it is a bit funny to me that we say: "Python is great because it is dynamically typed", and then the first thing we do is strictly typing using one of these packages...
Pydantic is so good for ensuring that your Python objects can easily become dicts, serialize to JSON, parse raw query results, and validate, all at the same time.
I use pydantic all the time so other developers only need to learn pydantic models and don't have to deal with the quirks of dataclasses fields.
Most of the Koreans who could teach Pydantic live in the US... so thank you.
Nice video, now don't be an attr and like the video 😂😂😂