Thanks for watching! Forgot to mention that Walnut, which is open source, also uses Hazel's serialization library: github.com/StudioCherno/Walnut/tree/dev/Walnut/Source/Walnut/Serialization
And don't forget you can try everything Brilliant has to offer, free, for a full 30 days: visit brilliant.org/TheCherno. You'll also get 20% off an annual premium subscription.
Hey Cherno, I didn't find the links for other reflection libraries
Hey mate, at the start you mentioned a link to a previous video; for some reason it did not pop up for me. Could you send us the link again?
!! you wanted to supply resources for introspection in C++
I believe one of the first things you probably need to add to this simple serialization implementation is some way of cleanly dealing with versions. One day you will have a new version of Hazel and users will need to be able to still read the asset packs from the previous Hazel version, at least during the upgrade. This simple implementation as it is now will force a new data file version to be used every time a data field is changed or added. It needs additional code to deal with backwards compatibility, which will need to be created and tested too. Once Hazel is shipped to users, this is something to think about when refactoring code.
That's a good point. It means that every time you add or remove a data field in the stream, you have to write some kind of adapter from the old data stream version to the new one?
Seems like a lot of work to do manually, I hope they thought about that!
a "translator" layer can be a good solution ? to pass the data from version 1 to v2 for exemple
No, not at all needed.
Firstly, you are authoring them from assets on disk. The asset files themselves and their representation, the serialized asset pack, are two different things.
Don't confuse the representation with the thing itself.
Second, it should just be right the first time; building and already planning for deprecation/versions, etc. without even having built the first complete version of a system is bad design thinking.
Thirdly, additions should not break any system; that is the basic "Open-Closed" Principle. Also, additions would only be composed of the existing atomic data types.
If data types are ever "removed" they would simply be read as their replacements.
I also don't understand why one would ever remove a data type from a serialized file.
Plus, keeping old versions of generated files around and relying on translation/transcription layers is generally a bad idea; why not simply regenerate them from the source?
A few things can be kept in mind, and things will basically never have to change in the way you think they would have to change in order to entirely break the system.
That is simply bad design thinking: "thinking ahead" to hypotheticals which might never enter your domain.
@@MrSofazocker I agree with all this. For assets generated/built to be shipped in a game, there will not be any versioning at play, since these files will be regenerated from the assets on disk and, as Cherno was mentioning, it all depends on the requirements, so the implementation should be the simplest one that covers them. My thinking however was that since Hazel is a game engine, I can imagine there will be game project data to be serialized on disk during development of a game. When a newer version of Hazel (the editor) is created, it needs to be able to somehow correctly read the game project that was under development using the previous version of Hazel. Unity solves this by upgrading your project files when you first launch the newer version.
@@Bazzeee In the more general case than the "game engine case" you mention, version handling can become way more complex than just "convert all the data files when the new engine version is first launched". One could convert during installation of the new version as well, but that basically has the same challenges. In my case I did a lot of storing of parameter/description files used as input to scientific model simulations, but for the office folks, simply think of these as "documents". The point is that during installation of a new software version you don't necessarily know about or have access to "all data files" that need to be converted to the new serialization version. The file may have been archived away or even later restored from a backup. Or maybe you just got a "document" from a friend with an older version.
The point here is that the user needs to be able to access these "documents" somehow. It can be acceptable to make them use a "conversion tool" before they can open the files in their current software, or they should not need to bother with the versioning themselves. That would be a design decision. In any case, at some point there also needs to be some conversion code written, which essentially must be able to deserialize from version K and serialize to version L, where L may or may not be restricted to only the latest version. Then consider, for example, two instances of the software using the serialization as a means of exchanging such data between them (eg, a TCP stream between two machines), but the systems are at different versions. Same general versioning problem. At some point there must exist the equivalent of a Deserialize() function for each previous version. You would probably want it to be part of the same (de)serialization code base too, to avoid having copies of the same code around.
I can tell from experience that data members of a struct/class are both deleted as well as _replaced_ over time. For things that are added, one has to decide on which values these should be assigned in the new version. That's a decision that might even depend on some other values within the data set. If a set {a,b,c,d} is replaced by the set {a,c,e,f}, the original _b_ and _d_ values (that the user originally entered) are lost, and some equivalent set of new values must be calculated. Not quite ideal, but reality can bug you sometimes...
Cherno's mewing in the thumbnail
🤫
🥶
lol
He’s working on blue steel
@@churchersit’s super hot right now
The "difficult" part is not structs/objects... it's shared struct/objects, that are saved in one place and then, pointers to them need to be recreated when loading... pointers can act as a "uid"... but that means that a struct/objects when loading need to have a place to store the pointer at save time, and when loading look for it in the loaded objects. That is introducing some dependencies, and so, the order to save and load becomes important.
Why do you think that?
The data types are already loaded into a tree, through which you can derive their usage; the data gets stored somewhere and every use gets the pointer to it? Done.
@@MrSofazocker In that scenario they might not be in a tree; the pointed-to structs might form a graph. Then you cannot simply use WriteObject as done here, since you might end up with mutual recursion. While you could serialize the pointer as an ID, you still need to also associate the objects with the ID in the serialization itself. And when reading, the pointed-to object might not have been loaded yet, so either you would have to return to the object later to fix the pointer, or you can load it immediately, assuming you have associated the current object with its ID. None of this is insurmountable at all, but it is still more difficult.
Even in the simpler case with just a shared object, you will still need to avoid duplicating it when deserializing again.
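A minimal sketch of the pointer-as-ID approach described above (Node, WriteGraph and ReadGraph are hypothetical names, and plain std::ostream/std::istream stand in for the video's stream classes):

#include <cstdint>
#include <istream>
#include <ostream>
#include <unordered_map>
#include <vector>

struct Node { int value = 0; Node* next = nullptr; }; // hypothetical shared object

template<typename T>
void WriteRaw(std::ostream& out, const T& v) { out.write(reinterpret_cast<const char*>(&v), sizeof(T)); }
template<typename T>
void ReadRaw(std::istream& in, T& v) { in.read(reinterpret_cast<char*>(&v), sizeof(T)); }

void WriteGraph(std::ostream& out, const std::vector<Node*>& nodes)
{
    // Assign every node a stable ID up front, then write IDs instead of pointers.
    std::unordered_map<const Node*, uint64_t> ids;
    for (uint64_t i = 0; i < nodes.size(); i++)
        ids[nodes[i]] = i;

    WriteRaw<uint64_t>(out, nodes.size());
    for (const Node* n : nodes)
    {
        WriteRaw<int>(out, n->value);
        WriteRaw<uint64_t>(out, n->next ? ids.at(n->next) + 1 : 0); // 0 means null
    }
}

std::vector<Node*> ReadGraph(std::istream& in)
{
    uint64_t count = 0;
    ReadRaw(in, count);

    // Allocate every node first so IDs can be resolved to pointers immediately,
    // even when the graph contains cycles.
    std::vector<Node*> nodes(count);
    for (auto& n : nodes) n = new Node();

    for (uint64_t i = 0; i < count; i++)
    {
        uint64_t nextId = 0;
        ReadRaw(in, nodes[i]->value);
        ReadRaw(in, nextId);
        nodes[i]->next = nextId ? nodes[nextId - 1] : nullptr;
    }
    return nodes;
}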
8 bytes to represent a byte, he’s a madman
Lmao
No, it was 8 bytes to represent a _length_ :-)
I did this in school for a network server/client chat solution. Each package was prefixed with a type (broadcast/message/system action) then data. The big difference was that bool only wrote 1 bit and buffer was a bit array with memory offset management. Fun project
Yeah, as others have said, add versioning. The first few bytes should be used for a version check.
5:20 It feels so good to hear this coming from someone who knows what he's talking about. Spent an entire day at work recently trying to understand why a method in a Spring service would not serialize a date according to its field annotations. The problem? We never found out. We shoehorned a custom date serializer into the global object mapper instance instead (by putting a different annotation on a different method). Basically, a game of choose your hostage taker.
I do a very similar thing, even in C#. The only real addition is that I version every non-trivial section, i.e. every implementation of Serialize/Deserialize in a class. On read, if the version of that section is less than the current version, it calls the method to read that version. It only ever writes the latest version, but it can read every previous version in addition to the current version. This way it can always load old data, and updates it when the data is saved again.
Deserialize()
{
byte version = ReadByte();
switch (version)
{
case 1:
DeserializeV1();
break;
case 2:
DeserializeV2();
break;
}
}
DeserializeV1()
{
// Reads V1 data into the current data structure, adapting/updating where necessary or filling new fields with defaults
}
DeserializeV2()
{
// Reads V2 data into the current data structure
}
I once implemented a serialization framework where I abstracted write and read to a certain extent. Instead of readInt() and writeInt(), there was processInt(), with a pointer to an int (and so on). The reader object would fill in the pointer, the writer object would write out the pointed-to value. Maybe 5% of code needed to test whether it was reading or writing, for instance if it needed to initialize other structures on read, but you didn't want to run that code on write. Doing it this way really removed a lot of the boilerplate associated with such systems.
[In general you have my full agreement on simple, though. In my experience self-deriving serialization libraries are only useful for tech demos, for real code it is worthwhile to do it right the first time.]
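A minimal sketch of that unified "process" idea, assuming hypothetical Processor/FileWriterProcessor/FileReaderProcessor names rather than any real framework:

#include <cstdint>
#include <fstream>
#include <string>

// One virtual "processor" interface; the reader and writer both implement it,
// so each object only needs a single Serialize(Processor&) function.
class Processor
{
public:
    virtual ~Processor() = default;
    virtual bool IsReading() const = 0;
    virtual void ProcessInt(int32_t& value) = 0;
    virtual void ProcessFloat(float& value) = 0;
};

class FileWriterProcessor : public Processor
{
public:
    explicit FileWriterProcessor(const std::string& path) : m_Stream(path, std::ios::binary) {}
    bool IsReading() const override { return false; }
    void ProcessInt(int32_t& v) override { m_Stream.write(reinterpret_cast<const char*>(&v), sizeof(v)); }
    void ProcessFloat(float& v) override { m_Stream.write(reinterpret_cast<const char*>(&v), sizeof(v)); }
private:
    std::ofstream m_Stream;
};

class FileReaderProcessor : public Processor
{
public:
    explicit FileReaderProcessor(const std::string& path) : m_Stream(path, std::ios::binary) {}
    bool IsReading() const override { return true; }
    void ProcessInt(int32_t& v) override { m_Stream.read(reinterpret_cast<char*>(&v), sizeof(v)); }
    void ProcessFloat(float& v) override { m_Stream.read(reinterpret_cast<char*>(&v), sizeof(v)); }
private:
    std::ifstream m_Stream;
};

// One function handles both directions; the rare read-only work hides behind IsReading().
struct PlayerState
{
    int32_t Level = 1;
    float Health = 100.0f;

    void Serialize(Processor& p)
    {
        p.ProcessInt(Level);
        p.ProcessFloat(Health);
        if (p.IsReading() && Health <= 0.0f)
            Health = 100.0f; // example of read-only fix-up work
    }
};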
Since the object to be deserialized into has already been allocated, you probably want std::is_trivially_copyable instead of std::is_trivial, since it doesn't require a default constructor. A nice thing about Cereal is that for simpler custom types you can combine the serialization and deserialization into the same function, also without duplicating the lines.
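A small sketch of that trait choice; ReadRawInto is a hypothetical helper, not Cereal's or the video's API:

#include <cstring>
#include <type_traits>

// Guard memcpy-style deserialization on is_trivially_copyable rather than is_trivial:
// the target object already exists, so a default constructor isn't needed.
template<typename T>
void ReadRawInto(T& target, const char* src)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "ReadRawInto only works for trivially copyable types");
    std::memcpy(&target, src, sizeof(T));
}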
It's really similar to the implementation in my game engine. When I started developing it I also decided to make things as simple as possible, especially because my engine is only for my own games. The main big thing that I do differently is using less of the C++ standard library, so the code is prepared to be ported to consoles. Instead of replacing code in multiple places I just add to my own API when needed.
An easy solution is usually better, but the data format changes a lot and you'll start having "if version > 1" conditions in your read functions everywhere, and it can get messy :/ For a smaller project it's probably still maintainable, but ...
Building it the way Cherno is, there will likely never have to be a change that deprecates the entire schema, other than during development of the system itself.
Or did git ever break?
What about the endianness of architectures? Am I correct that you don't handle it, or am I missing something?
Yeah, the Read/WriteRaw is a recipe for disaster: structs will have different padding and may even get reordered on different platforms and possibly even compiler versions. Probably not a huge issue if you're targeting a single platform and toolset, but this is a ticking time bomb for a portable engine.
Endianness is not a big issue, but the fact that he doesn’t know about struct alignment is hilarious
@@mattmurphy7030 it's not really hilarious. Were you maybe born with knowledge of struct alignment? Your comment is really pathetic.
And you don't even know whether he knows about either struct alignment or endianness. And it doesn't matter anyway here. I was just asking if I'm missing something.
@@lithium you write as if something he has done is a bad thing. It's not a recipe for disaster. It depends what you are doing, and yes what you are targeting. On top of that you have to start somewhere.
@@nan0s500 It _is_ a bad thing. His audience is filled with inexperienced devs who don't know enough to know that this technique is majorly flawed, and he didn't even disclaim it off hand. I know for a fact it's a recipe for disaster because I've run into this very problem multiple times in my career. As for having "to start somewhere", this isn't some throwaway code, this is running in his actual engine, and he's encouraging other people to do it. At best it's irresponsible.
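For anyone reading along, a cheap compile-time guard you can put next to any struct that gets written with WriteRaw; EntityRecord is a made-up example (note that floating-point members usually fail the second check even without padding, and neither check solves endianness):

#include <cstdint>
#include <type_traits>

struct EntityRecord // example struct intended to be memcpy'd to disk
{
    uint64_t ID;
    uint32_t ParentIndex;
    uint32_t Flags;
};

// - trivially copyable: no internal pointers/owning state, safe to memcpy
// - unique object representations: no padding bytes, so the on-disk bytes
//   are exactly the declared fields on this compiler/platform
static_assert(std::is_trivially_copyable_v<EntityRecord>);
static_assert(std::has_unique_object_representations_v<EntityRecord>);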
While this code effectively reads and writes int32_t and int64_t, it's important to note that it may not be cross-platform due to differences in endianness between systems. Little-endian and big-endian architectures store bytes in different orders, leading to potential incompatibility when sharing binary files across platforms. To ensure cross-platform compatibility, consider explicitly managing byte order or using libraries like Boost which have functions like native_to_big and big_to_native to handle endianness conversion.
ChatGPT wrote this comment.
@@NullPointerDereference yup I'm too lazy
You can generally tell the system which one to use, it's a non-problem.
@@MrSofazocker How can I do this globally in C++17/23 without using assembler? endian.h only provides functions like htole32 and htobe32, similar to native_to_big and big_to_native from Boost. Some chips provide a method to switch, but not all.
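A minimal sketch of one portable way to do it; std::endian is C++20 (from <bit>), so on C++17 you would replace that check with a build-time flag, and ByteSwap/ToFileOrder are made-up names:

#include <bit>        // std::endian (C++20)
#include <cstddef>
#include <type_traits>

// Portable byte swap for unsigned integers.
template<typename T>
T ByteSwap(T value)
{
    static_assert(std::is_unsigned_v<T>);
    T result = 0;
    for (size_t i = 0; i < sizeof(T); i++)
        result = (result << 8) | ((value >> (8 * i)) & 0xFF);
    return result;
}

// Pick one on-disk byte order (little-endian here) and convert only when the
// host differs; on little-endian machines this compiles down to a no-op.
template<typename T>
T ToFileOrder(T value)
{
    if constexpr (std::endian::native == std::endian::big)
        return ByteSwap(value);
    else
        return value;
}

// Usage: write ToFileOrder(v), and apply ToFileOrder again after reading.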
@@mlecztoo lazy to write your own comment on a youtube video? what's this world coming to
This pattern is gonna be so useful for my final project. Thanks man
I'd be curious if you do anything to handle write corruption (i.e. the editor crashing in the middle of a write), something like writing to a temporary file and doing an atomic rename.
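A minimal sketch of that temp-file-plus-rename idea using std::filesystem; SaveAtomically is a hypothetical helper, and the atomicity guarantee depends on the OS and filesystem:

#include <filesystem>
#include <fstream>
#include <string>

// Write to a temporary file, then rename it over the target. If the process dies
// mid-write, the previous file is still intact. POSIX rename is atomic on the same
// filesystem; on Windows std::filesystem::rename replaces the target with weaker guarantees.
bool SaveAtomically(const std::filesystem::path& target, const std::string& data)
{
    std::filesystem::path temp = target;
    temp += ".tmp";

    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out)
            return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        if (!out)
            return false;
    } // stream closed (and flushed) here

    std::error_code ec;
    std::filesystem::rename(temp, target, ec);
    return !ec;
}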
With memory mapped files you wouldn't need different back-ends for memory and files. It also happens to be faster than file streams. Just something to consider.
Have you considered a merged writer/reader serialization interface? It's possible to use the same interface to do both using pointers. This reduces the amount of serialization code and massively reduces the errors that come from needing to write the code twice!
“You‘re not behind the scene, you‘re in the scene“ - Yan
Actually an inspirational quote…
Interesting, I just finished writing a custom serializer/deserializer using the new io_uring facility of the Linux kernel together with a circular stream buffer. Then you release this video. Interestingly, I created it pretty similarly to your approach, but I overload operator>> on my stream object, which makes it look especially nice.
Could you do a video on your visual studio shortcuts you use? Some of these things you’re doing to navigate the code could be very useful to know
16:20 I guess most of the time Serialize and Deserialize will look basically the same. So you could even unify that: have a parent of Reader/Writer that has methods for Raw/String/Array... keep "Read"/"Write" out of the method name (or replace by something like "Process") and the new "Serialization" method takes not a Reader or Writer but a Processor.
edit: testing around, found out it's a pain with template functions. But still, Unreal is basically doing this by implementing the "
I decided to write my engine in C# because I prefer having more features more easily accessible over having "faster" code. It's also because the editor and script engine are written in C# as well, so I can easily interop.
How can the script engine be written in C#? Do you mean that the scripts for your script engine are written in C#?
uMod uses C# for scripting/modding. The directory is watched and the files are recompiled when changed/added. The running thread is killed and a new one is spawned. Pretty simple tbh.
My first language was C#, but I'm not making my engine on it because of downsides like porting to consoles, code security, and performance. If anyone is fine with those downsides, C# is a valid option
@@ProGaming-kb9io code security is a major flaw, but I am attempting to write a custom scripting language to combat it. The scripts will be encrypted so no changes can be made, and the engine isn't much to look at
@@ProGaming-kb9io I'm only making this engine as a hobby, a side project as it were. If it goes down well, I might release it as a full-fledged GitHub organisation.
As a related topic, I'd love if you could cover versioning with respect to serialization. Seems not as trivial as I'd hope.
IMO the Complexity Demon enters a code base when requirements are not clear and one doesn't see the common denominators.
When abstracting things down to the common and shared denominators, a system becomes simple in a way that is so generalized it will fit any use case.
Yes, things can be really simple if built for one thing, and one thing only.
But they can also be just as simple, if not simpler, when they are built around the underlying mechanics that govern whatever system you are building.
This implementation lacks a lot; it is only OK when you serialize on exactly the same ABI and never on some other ABI.
What about alignment and endianness? You need to check whether size and alignment are the same, then you can serialize raw. There is so much more before you can even call it a basic serialization implementation.
What you have here is just a data streamer, which a serialization system can use at its core to write and read, but you also need some metadata so that if something changes it can still read and deserialize. For example, the type and variable name can be used to create a unique identifier which identifies a specific memory location in your stream; this makes it possible to remove a member or add a member and still serialize the rest. The type and name should also be generated at runtime, so you can add some offset to it and the serializer knows where to serialize.
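A rough sketch of that field-tagging idea (name hash + size + payload per member, so unknown fields can be skipped); TaggedBlob and the FNV-1a hash are illustrative only, not a real library:

#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Illustrative FNV-1a hash used as the field identifier.
static uint32_t HashName(const std::string& name)
{
    uint32_t h = 2166136261u;
    for (char c : name) { h ^= static_cast<uint8_t>(c); h *= 16777619u; }
    return h;
}

struct TaggedBlob
{
    std::vector<uint8_t> Bytes;

    // Each member is written as (name hash, payload size, payload bytes).
    template<typename T>
    void WriteField(const std::string& name, const T& value)
    {
        static_assert(std::is_trivially_copyable_v<T>);
        uint32_t hash = HashName(name);
        uint32_t size = sizeof(T);
        Append(&hash, sizeof(hash));
        Append(&size, sizeof(size));
        Append(&value, size);
    }

    // Readers collect fields by hash, look up what they still know about,
    // and silently ignore removed or unknown ones.
    std::map<uint32_t, std::vector<uint8_t>> ReadAllFields() const
    {
        std::map<uint32_t, std::vector<uint8_t>> fields;
        size_t offset = 0;
        while (offset + 8 <= Bytes.size())
        {
            uint32_t hash, size;
            std::memcpy(&hash, Bytes.data() + offset, 4);
            std::memcpy(&size, Bytes.data() + offset + 4, 4);
            offset += 8;
            if (offset + size > Bytes.size())
                break; // malformed blob, stop
            fields[hash] = std::vector<uint8_t>(Bytes.begin() + offset, Bytes.begin() + offset + size);
            offset += size;
        }
        return fields;
    }

private:
    void Append(const void* data, size_t size)
    {
        const auto* p = static_cast<const uint8_t*>(data);
        Bytes.insert(Bytes.end(), p, p + size);
    }
};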
Future historians will see this video to understand how humans used to do serialization without reflection
To me the static member function is polluting the data type.
You can just use free functions, so you can even put the serializers in another file, and you don't have to include your serializers if you don't need them.
That will still tightly couple the now-separate serialization implementation to the data type, and you would not be able to serialize private or protected members.
your opengl series is great
what's the visual studio color theme
Personally, I don’t like having separate read/write classes or functions because I don’t want to maintain two copies of near-identical code. Instead I use a function pointer that is assigned to either fread() or fwrite(). Then I just make a function like ‘rwxchg’ and use that to read to or write from the passed reference argument using the function pointer. Additionally, using a function pointer allows me to modify the read or write procedure in case I want to have something fancy, like integer compression.
The downside to this is that the function pointer assignment may be UB since the signatures differ by a const qualifier on a parameter. Works perfectly fine on x86-64 tho.
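One way to keep that single read/write path without the UB is to wrap fread and fwrite in two tiny functions with identical signatures; the names here are made up:

#include <cstdio>

// One function-pointer type for both directions.
typedef size_t (*RwFunc)(void* data, size_t size, size_t count, FILE* file);

static size_t ReadWrap(void* data, size_t size, size_t count, FILE* file)
{
    return fread(data, size, count, file);
}

static size_t WriteWrap(void* data, size_t size, size_t count, FILE* file)
{
    return fwrite(data, size, count, file); // non-const pointer converts to const void* implicitly
}

// One serialization path; pass ReadWrap or WriteWrap to choose the direction.
static void RwExchange(RwFunc rw, FILE* file, int* value)
{
    rw(value, sizeof(*value), 1, file);
}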
Just have an object Serialize function that takes a stream param that can be in write or read mode and has a version; then serialize base types using the stream with a version param that tells in which version the data was introduced. Adding data is then simple…
pStream->Serialize(&m_fValue, 23);
…this also removes most of the duplicated read/write code
I wouldn't write a size_t to a stream; it *could* be different on different compilers/platforms/OSes. You should *always* use a known-size type when doing data streams.
It's been mentioned in other comments, but endianness can also come into play; then again, the majority of systems out there are little-endian... so that's less of an issue IMO.
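For example, a hypothetical WriteString helper that always writes the length as a fixed-width uint64_t instead of size_t:

#include <cstdint>
#include <ostream>
#include <string>

// The length is always 8 bytes on disk, regardless of sizeof(size_t) on the host.
void WriteString(std::ostream& out, const std::string& value)
{
    uint64_t size = value.size();
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(value.data(), static_cast<std::streamsize>(size));
}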
Cherno, you don't need to call the .close() method on std::ofstream in destructor because RAII (Resource Acquisition Is Initialization) automatically handles closing the file when the object goes out of scope. Manually calling .close() leads to redundancy since the destructor of std::ofstream closes the file automatically.
Nope, while redundant it's more implicit writing.
It's not gonna close twice or smth, or idk what you think.
@@MrSofazocker this is redundant in terms of style and code management, as well as good practices for complying with the principles of the RAII idiom
Close can fail inside the destructor and the destructor will hide the error. By calling close before the destructor is called you ensure that nothing inside the destructor is going to fail without you knowing about it. It's a clever technique.
@@ohwow2074 but the problem is that there is no exception handling in his code which would make your theory valid in this case
@@mlecz he said he didn't have time to implement error handling. He'll probably do it in the near future.
Also no need for exceptions. Calling close and then checking the state of the stream with `if (!stream) return error_code;` is the way to go.
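A minimal sketch of that pattern, assuming a writer class along the lines of the one in the video (the exact class here is made up):

#include <fstream>
#include <string>

class FileStreamWriter
{
public:
    explicit FileStreamWriter(const std::string& path) : m_Stream(path, std::ios::binary) {}
    ~FileStreamWriter() { if (m_Stream.is_open()) m_Stream.close(); } // destructor still closes as a fallback

    // Callers who care about errors close explicitly and check the result.
    bool Close()
    {
        m_Stream.close();
        return static_cast<bool>(m_Stream); // false if the flush/close failed
    }
private:
    std::ofstream m_Stream;
};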
The best solution is the simplest solution. Lot of truth!
It doesn't look much different conceptually than Apple's Foundation NSCoding protocol. Serialising and deserialising is totally manual but extremely simple too.
Mind sharing what Visual Studio theme you're using? I really like the italic touch on the standard library stuff. Btw, love your content!
I would add a checksum
Endianness, padding bytes, different sizes of data types in different platforms, etc will cause you quite a bit of headaches soon.
That's why I forgot about binary serialization and just went with text serialization. Not super efficient, but easy to deal with.
Great content, but where is Vulkan series? :d
Any reason for not using any of the serialization formats that use a code generator? Protobuf, cap’n’proto, msgpack etc.
He's an idiot
I would really recommend you dump std::ofstream and wrap the low-level FILE (fopen) API, as it can easily be between 2x and 10x faster at reading/writing files than std::ofstream. Please test it.
Just switching to a C API makes things 10X faster?
@@ohwow2074 It has nothing to do with it being C but with how they have been implemented. std::ofstream is built upon FILE but adds an extra layer of buffering and lock access, in addition to stream checking and accounting. If you use FILE directly, you only get 1 buffer layer instead of 2, and you can "not use" the associated lock, given that you will most likely never have 2 threads competing to read the same file stream simultaneously. I have measured this.
@@miket591 interesting. I'll try and dig deeper into this.
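A minimal sketch of what such a FILE wrapper could look like (RawFileWriter is a made-up name); worth benchmarking against std::ofstream on your own platform before committing:

#include <cstdio>
#include <string>

class RawFileWriter
{
public:
    explicit RawFileWriter(const std::string& path) : m_File(std::fopen(path.c_str(), "wb")) {}
    ~RawFileWriter() { if (m_File) std::fclose(m_File); }

    RawFileWriter(const RawFileWriter&) = delete;            // owns the FILE handle
    RawFileWriter& operator=(const RawFileWriter&) = delete;

    bool IsValid() const { return m_File != nullptr; }

    bool WriteData(const void* data, size_t size)
    {
        return m_File && std::fwrite(data, 1, size, m_File) == size;
    }
private:
    std::FILE* m_File = nullptr;
};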
Why not use the std::ostream and std::istream interfaces?
What's the VS color theme that you are using?
one idea, compile a normal c/c++ program to run as shaders on gpu
That's not always true. Sometimes you have to introduce complexity for flexibility and extensibility purposes
in which case the best solution is the simplest one that meets the design goals of flexibility and extensibility
Why char not Byte (std::byte) type?
That can't be written to a file. It's only intended for use with memory buffers.
Nice.. Why virtual inheritance rather than templated at compile time, "strategy pattern" style?
Does anyone know what theme he is using?
How much simpler would this be with something like Serde in Rust? Is there no good C++ equivalent, or is something like that generally not suitable for a game engine?
Yes reflect-cpp
Why have "if is_trivial" everywhere instead of a "WriteAuto" that encapsule the "if"?
Love these videos! Can someone explain line 43 at 18:47, how that key gets the value from the map?
He default-constructs a Key object, then passes it by reference to ReadRaw/ReadObject, which then writes into the Key object. The std library (unordered_)map containers return a reference (which is never null) to the value; the choice of reference is part of the design of the function, and if the key is missing, the map default-constructs an internal key/value pair and returns a reference to the value. Then the value is again passed by reference to ReadRaw/ReadObject to be written into.
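The same pattern in a self-contained form, using std::istream instead of the video's StreamReader and a made-up ReadScores function:

#include <cstdint>
#include <istream>
#include <map>
#include <string>

void ReadScores(std::istream& in, std::map<std::string, uint32_t>& scores)
{
    uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));

    for (uint64_t i = 0; i < count; i++)
    {
        // Read the key (length-prefixed string) into a local first.
        uint64_t length = 0;
        in.read(reinterpret_cast<char*>(&length), sizeof(length));
        std::string key(length, '\0');
        in.read(key.data(), static_cast<std::streamsize>(length));

        // operator[] default-constructs the value and returns a reference we read into.
        uint32_t& value = scores[key];
        in.read(reinterpret_cast<char*>(&value), sizeof(value));
    }
}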
I don't understand why you need to explicitly define a destructor to call m_Stream.close() when the destructor of m_Stream itself will close the stream. Standard streams already use RAII. It would also probably make more sense to construct m_Stream directly in the constructor's member initialiser list rather than constructing a temporary and move assigning it in the body.
15:18 "Non trivial types" Why not deconstruct them and save a mapping + the data?
Ah nvm you are essentially doing that.
What happened to Twitch videos? Pretty sure there used to be past streams available there, now there's none on neither Twitch nor YT?
This may be a dumb question but are you concerned about the write raw in terms of any number? Aka, you save something on a little endian CPU and then load it on a big endian one. (like an arm chip)
Hi Cherno, I am currently following your 7-year-old C++ tutorial playlist. I just want to know if that's still relevant compared to modern C++ for a learner.
Implement FSR or XeSS or DLSS, something like the OpenGL series
Hi. I've recently started watching your C++ series that started over 6 years ago and was just wondering why our Visual Studio looks so much different. For reference, I'm on Ubuntu 23 using VSCode, and as far as I know, to compile a project I need to use g++ a.cpp b.cpp ... until the end. I can't just click one button, and there are many more differences. Why? Shouldn't it be basically the same VS?
I think VSCode was designed with being lightweight in mind, more of a text editor, whilst Visual Studio is more of an IDE (it has more features such as a debugger, compiler, profiling and more). Visual Studio Code and Visual Studio are two separate products meant to solve different problems.
VS Code is a different editor to Visual Studio. You can set up some stuff in VS Code so that you can press F5 and have it build and run. You could also probably use Premake to manage the solution and projects and have it spit out a makefile.
I did some serialization in Zig. Let me just say that it's much less hassle for the task than in C++.
What’s the chance of winning pirate game jam with custom game engine?
IMO automagic approaches never really work that well anyways. Ive been working on my game in UE for past couple years and most of its automagic stuff ends up being a trap rather than actually useful feature. For example;
The replication system constantly checks the state of your replicated objects to see if anything has changed and syncs them if it did. It seems cool at first sight, but this tends to be a performance hog as your game and the number of replicated objects grow. This is why they are switching to push-based replication, where you have to notify the replication system that you did indeed change a variable.
The biggest problem with these approaches is that you don't realize these bottlenecks until they start happening, at which point you need to do extensive debugging and reading engine source code to understand and fix them. Which results in like 5x total time spent vs just writing my own efficient replication from scratch. And Unreal does this everywhere which I find to be quite annoying.
I would much prefer a game engine that requires manual coding but has a really fast base and gives me a nice set of APIs to work with. One major benefit of manual programming is that, because you know exactly how your game is supposed to work, you can leverage optimizations that an automagic system could not.
#[derive(Serialize, Deserialize)]
I still think you should just use an open source library for some configuration format like TOML and just serialize all data that doesn't fit into a basic type as a b64 encoded string, but that may just be me who is thinking that and you'll never read this anyway. Also, you must have written Java for many years looking at how you're designing this.
Why on earth would you increase the size by 33% and do extra processing for b64-encoded strings, unless you are passing them to and from poorly made web services that have escaping errors?
@@Muzzleflash1990 Since my responses keep getting deleted I'll say it again, to prevent data corruption from users that edit the file with notepad and for web services.
@@anon_y_mousse It's a simple transform (assuming you already have it byte-serialized) to convert to b64 if you need to. But there is no need to pay the cost up front if you never need to.
I see what you mean now with TOML and b64 serialization for the "too complex" types. Having user editable data format though has entirely different considerations and goals than a binary encoding for performance. NVME SSDs are faster nowadays than many JSON parsing libraries.
@@anon_y_mousse Also, 9 out of 10 times I include a link in a YT comment the entire comments gets auto-deleted. YT really is a terrible platform for technical discussions.
@@Muzzleflash1990 I referred to the program by its filename and that's probably what triggered it. Possibly another reason to dislike D's scope resolution operator over C++'s.
Vulkan series pls
Why not use the Visitor pattern? It's so much simpler and more versatile.
See you've got stuck in the infinite loop of tools development. It never ends but hey, it makes content to watch. I'd like to see how you've implemented your engine's AI though
Is there a reason you're not using a library like cista or protobuf for serialization? It has an MIT license, so I am assuming it's not a license problem
MGE 🤔
Java be like
> implements Serializable
done😂 (I know it has its own problems)
I remember doing that in a project and it made my life 100X easier 💀. I wasn't even a decent programmer back then and it helped me write a program that was reading and writing objects from/to a file
Why use char* data to represent bytes in 2024? Is std::vector not good enough? We have std::byte since C++17, if I'm not mistaken...
I wish everything was manual and simple. All the magic behind the scenes is not worth the time you invest in learning what is happening there.
I think Zig is better than C++.
Watch me compile my .cpp with zig
So the first 10 minutes is you crying on your little project
He's an idiot
Hey Cherno, when are you going to grow up and realize that C is better?
cope
lmao
OMG, what will be next? Grow up and realize that Jai is better? 😂
@@EEEEMMMMKKKK Programming analog computers)
Nah, assembly is better. Joking aside, OOP is just better for games and game engines.