There is a compiler code analysis warning for this situation that, if enabled, will trigger if you set an enum to anything other than int. "CA1028: Enum storage should be Int32" This warning is not enabled by default, but I imagine there is some reason it exists. Microsoft says this about it: "Even though you can change this underlying type, it is not necessary or recommended for most scenarios. No significant performance gain is achieved by using a data type that is smaller than Int32."
Right, when you’re dealing with a billion + rows, these small optimizations are critical. But as it said in the first line of the content, it depends on your usage.
Honestly, that's a very anal and stupid warning. There are cases where you need specific integer sizes. Off the top of my head: in protocols, where the padding matters.
Actually, I had a couple of times specified a non-default base type for enums. But in all those cases the actual data was transferred as a binary stream via a serial connection to an embedded device, so I had to carefully follow the protocol.
Agreed! That is probably the only viable reason though. For 1 million records in a table, this supposed "optimization" saves 2.86 MB. On top of that, it might also do more harm than good, as the IO could be negatively impacted. The "view" of the OP is also only for non-flag enums. The default of an int is a good default for flag enums, as anything smaller only gives you 8 or 16 flags. The Win32 API is full of int and long flag enums. I've done some PLC stuff in the past too, and the PLC was even putting bit flags together in 16- or 32-bit registers; that is the only time I have had to step away from the default.
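As a rough illustration of the numbers in the comment above, the sketch below derives how many independent flags an underlying type can hold and how much space is saved when each of one million rows shrinks by 3 bytes. The names `TinyFlags`, `ShortFlags`, `DefaultFlags` and `FlagCapacity` are made up for this example, and it assumes a runtime where `Unsafe.SizeOf<T>()` is available in-box (.NET Core 3.0 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical flag enums, declared only to compare underlying types.
[Flags] public enum TinyFlags : byte { None = 0 }
[Flags] public enum ShortFlags : ushort { None = 0 }
[Flags] public enum DefaultFlags { None = 0 }   // int by default

public static class FlagCapacity
{
    // One independent flag per bit of the underlying type.
    public static int MaxFlags<T>() where T : struct, Enum =>
        Unsafe.SizeOf<T>() * 8;

    // Space saved, in MiB, when every row shrinks by bytesSavedPerRow.
    public static double SavedMiB(long rows, int bytesSavedPerRow) =>
        rows * bytesSavedPerRow / (1024.0 * 1024.0);

    public static void Main()
    {
        Console.WriteLine(MaxFlags<TinyFlags>());     // 8
        Console.WriteLine(MaxFlags<ShortFlags>());    // 16
        Console.WriteLine(MaxFlags<DefaultFlags>());  // 32
        // int -> byte saves 3 bytes per row: roughly 2.86 MiB per million rows.
        Console.WriteLine(Math.Round(SavedMiB(1_000_000, 3), 2));
    }
}
```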
It is a micro-optimization for most databases. But you can have tables with tens or hundreds of millions of records where some of the columns are enum values. It can make indexes perform better by being smaller. This tip might not be very relevant to many applications, but it is not wrong.
It's wrong because, while you definitely can use tinyint in the DB, there is no measurable cost in casting that to and from an int in the actual code. And as others and I have mentioned in other comments, using a byte enum might actually cause performance degradation. So keep that optimization in the DB.
Most sane databases use data alignment, and the alignment is not a single byte; it is at least an int. The processor doesn't know what a byte is, but it takes an int into a register without any problems or overhead. And in sane databases no enums exist; they are not needed.
We cast enums to bytes at our work because the database is very old (25+ years) and we have some foreign keys to constant types, where the foreign keys were of type [tinyint]. Now that we have integrated EF Core, instead of making a join on the table that stores the value name, we just use enums of type [byte]. This just makes the EF-to-SQL relation easier. I don't think it is either good or bad advice; it is just a very situational thing, which you normally wouldn't have to worry about.
I used to work on a transactional database that used bigints for primary keys and enums, bigints everywhere! I can tell you that optimizing storage can lead to significant performance gains, as well as lowering storage for backups and data warehouses. As for enums, I always use default integers, but if we do have millions upon millions of records, I may consider using other types depending on the range of values, or reconsider using a different design approach (using discriminators or not, i.e. table per type, etc.). Storage costs are very low nowadays anyway.
Just to be clear, 100% with you :). There are times of course where you should think of the types in the db. A concrete example from my previous job where we went from storing guids in strings (BigQuery didn’t have a native format for guids at the time) to its byte representation saved us a lot of storage space, and since we ingested about 10TB of json every day that optimization actually saves us quite a lot of money.
I think most people struggle to understand that optimization or "savings" have a factor/scale to them, and you can look at how effective/valuable a certain optimization or trick is by examining how it's going to _scale out_ in the context of some real or hypothetical project. A "one-off" optimization or memory saving isn't helping you much, if at all, unless it's a VERY big, single thing we are talking about, like maybe optimizing install/storage size of a game or app by eliminating the need for some huge chunk of data/content. Changing the underlying value type of an enum to `byte` isn't doing much for you in and of itself. It *can* in some very specific situations, like I sometimes do this in real-time 3D/game code when I'm dealing with really big data buffers or I have to have some very specific structure layout or byte alignment. But just universally making all your enums into bytes is more likely to slightly degrade performance than enhance it, as you can be causing some misaligned byte boundaries or reducing the compiler's ability to help you, not to mention you can create some future technical debt for yourself in some situations ...
Optimize is a very broad topic. Using byte as the underlying type will, of course, save space, but is it worth it? A 32-bit (or 64-bit) processor is much more speed efficient to access 32 (or 64) bits of data at a time. The data, however, needs to be 4 (or 8) byte aligned. For a modern processor to read a byte, it needs to read 32/64 bits of data and strip the unnecessary bits. If the data is not aligned, the process is even slower because shifting happens. We don't see it in a high level language, but that happens at assembly level/machine code. The only time I use byte as the underlying type for enums is when I have data interchange happening, when a byte is part of a data structure sent by an embedded device. Something like that.
There's nothing wrong with saving 3 bytes on a column in SQL Server. Odds are you have more problems than this, but that doesn't inherently make it a problem to seek out the savings. As far as how that should translate to the C# representations, I think there's room to debate. Remember that each field isn't stored just once. It's also stored in each index. It also consumes server memory. The less you use to do the same work, the more simultaneous clients you can support. The benefit from small changes like this isn't massive, but they add up for larger datasets. Agree with Nick about prioritizing your performance scrutiny.
For one reason or another I had to write a parser/serializer for some obscure format (existing one could not handle memory requirements we had - would hog up ALL THE MEMORY on a machine). The sample code required maintaining ordering of a list when inserting new stuff (so some stupid search through the list and inserting at proper index)... which in my case ended up being an absolute performance killer. After discovering that through profiling and replacing list with sorted list (or something like that), performance improved about 100x in relevant cases. In other words, do a PoC, profile it and look for things that eat up most of your performance. This way it will be less labour intensive and your time is more valuable than adding extra disk/ram to that server that is supposed to run the damn thing
Something I haven't seen mentioned is message structs. For one of my work projects we had a messaging system that was marshalling message structs and sending them over legacy hardware with very low bandwidth. In some structs, the fields were marshalled as int32 so when message frequency was high it would start slowing down. These fields were used in combination to represent some state. We found a way to optimize and drastically reduce the size by representing each state as a single byte under one enum and use bitwise operations to combine the states together using bit masking.
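A minimal sketch of the technique described above: several independent states combined into a single byte with bit masks. The enum `DeviceState` and its members are hypothetical, since the original message format is not shown in the comment.

```csharp
using System;

// Hypothetical states; each one occupies a single bit of one byte.
[Flags]
public enum DeviceState : byte
{
    None      = 0,
    PowerOn   = 1 << 0,
    Connected = 1 << 1,
    Busy      = 1 << 2,
    Fault     = 1 << 3,
}

public static class StatePacking
{
    // The whole combined state travels as one byte on the wire.
    public static byte Pack(DeviceState state) => (byte)state;

    public static DeviceState Unpack(byte wire) => (DeviceState)wire;

    // Mask test: true when every bit of 'flag' is set in 'state'.
    public static bool Has(DeviceState state, DeviceState flag) =>
        (state & flag) == flag;

    public static void Main()
    {
        DeviceState state = DeviceState.PowerOn | DeviceState.Busy;
        byte wire = Pack(state);
        Console.WriteLine(wire);                                   // 5
        Console.WriteLine(Has(Unpack(wire), DeviceState.Busy));    // True
        Console.WriteLine(Has(Unpack(wire), DeviceState.Fault));   // False
    }
}
```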
This enum byte thing has existed for something like 15 years. The fact that people react to this like it's something new makes me wonder about the quality of the software these devs create every day!
We have a few Enums in our game code that are set to bytes and shorts but that's specifically because we do have hundreds-of-thousands to millions of them in contiguous arrays for the game's data so we do gain a fair bit of performance doing this. Especially from a cache point of view.
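To put rough numbers on the contiguous-array case, the sketch below compares the element payload of a million-element array for a byte-backed and an int-backed enum, and how many elements fit in one cache line. `TileByte`, `TileInt` and `ArrayFootprint` are invented names, and the 64-byte cache line is an assumption about typical current hardware, not something stated in the thread. Whether the smaller footprint translates into speed still has to be benchmarked.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical enums with identical members but different underlying types.
public enum TileByte : byte { Empty, Grass, Water }
public enum TileInt { Empty, Grass, Water }

public static class ArrayFootprint
{
    // Assumption: 64-byte cache lines, common on current x64 and ARM64 CPUs.
    public const int AssumedCacheLineBytes = 64;

    // Bytes used by the elements alone (array header excluded).
    public static long PayloadBytes<T>(int count) where T : struct =>
        (long)Unsafe.SizeOf<T>() * count;

    public static int ElementsPerCacheLine<T>() where T : struct =>
        AssumedCacheLineBytes / Unsafe.SizeOf<T>();

    public static void Main()
    {
        Console.WriteLine(PayloadBytes<TileByte>(1_000_000));   // 1000000
        Console.WriteLine(PayloadBytes<TileInt>(1_000_000));    // 4000000
        Console.WriteLine(ElementsPerCacheLine<TileByte>());    // 64
        Console.WriteLine(ElementsPerCacheLine<TileInt>());     // 16
    }
}
```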
@@Sindrijo we don't really have a lot of places where we would need to do that. Except maybe one instance where we pack groups of six bools into single bytes. That wastes 2 bits, yes, but it works out in that situation.
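Packing a group of bools into one byte, as mentioned above, could look roughly like this. It is only a sketch with a hypothetical helper class `BoolPacking`; the commenter's actual layout is not given.

```csharp
using System;

public static class BoolPacking
{
    // Packs up to 8 bools into one byte; flag i is stored in bit i.
    public static byte Pack(bool[] flags)
    {
        if (flags.Length > 8)
            throw new ArgumentException("At most 8 flags fit in a byte.", nameof(flags));

        byte packed = 0;
        for (int i = 0; i < flags.Length; i++)
        {
            if (flags[i]) packed |= (byte)(1 << i);
        }
        return packed;
    }

    // Reads flag 'index' back out of the packed byte.
    public static bool Get(byte packed, int index) =>
        (packed & (1 << index)) != 0;

    public static void Main()
    {
        // Six bools use bits 0..5; bits 6 and 7 stay unused.
        byte packed = Pack(new[] { true, false, true, true, false, false });
        Console.WriteLine(packed);          // 13
        Console.WriteLine(Get(packed, 2));  // True
        Console.WriteLine(Get(packed, 1));  // False
    }
}
```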
Performance? Probably not. You can pack more information into your bytes, but packing and unpacking is usually less performant than not doing it. We were doing it in multiplayer games to reduce the bandwidth needed. But it was also a rather unusual practice. You can find the same principle a lot in the C++ code of Unreal Engine.
@@Revin2402 we have benchmarked it (as one should) and it actually does improve performance in our case. I'm guessing it is due to reduced cache misses.
@@Sindrijo Those are called "flags". And yes, we did those things back in the day ... some 30 years ago. Fast-forward to today, I can think of VERY few cases where doing this type of thing could be called 'reasonable', and all those cases imply using tiny devices with very little memory and storage capacity. It makes NO sense to do that in ANY other case. As Nick said, not even in those 'multi-billion-row' tables. Because you can surely bet you're having WAY worse problems than saving 3 bytes (or 6, or 9) on each row, starting with the very fact that you have a SINGLE table with gazillions of rows stored in it.
Column size in a database matters if the column is part of an index. Whether a key is 8 bytes or 5 bytes makes a big difference, because more keys fit in one 4 KB database page.
To be honest, I do it too. When I create an enum to be saved in a database I give it the byte type, because it is just a keyword and a conversion in the DbContext and nothing more; easy to implement. Yes, the projects I worked on were most probably missing more important performance optimizations, but that does not matter. If I am aware of something while implementing, I will do it. It does not take a lot of effort.
@@7th_CAV_Trooper yes, I work on a project where we get almost a hundred million messages daily, and we have large databases for each environment. We clear the data periodically, but they are still big.
In general, it's good advice to follow the conventions of whatever programming language or environment you're using, even if on the surface the convention may seem counter-intuitive. The convention of always using the `int` type in C# is a prime example. It may seem intuitive to use the smallest integer type that can hold the range of values needed, but the wisdom of the crowd knows something that isn't obvious, namely that modern processors are optimized to process data values on 32-bit and 64-bit boundaries. So you may think you're being smart and efficient, but in reality there's no benefit. Sometimes there are valid exceptions to the common conventions in specific circumstances. A skilled developer knows when these exceptions are needed. A skilled developer also knows how to performance test code and search for specific and measurable optimizations when performance improvements are clearly needed.
We have a database (not SQL) where we store terabytes of data. Even then, optimization into byte enums is not the best way to optimize the storage. We do have some byte enums, but those are special cases that are transmitted in binary over the network in high-traffic paths. In a couple of edge cases I've even had to merge multiple enums into a single byte for serialization to save bandwidth and egress costs.
I've only used byte-extending enums once; to reduce the size of a struct, which might be created 100,000-200,000 times per second, every second. With byte-extending enums, the struct is exactly 16 bytes in size, which also aligns with its "natural" packing size. And even then, I doubt using int-extending enums would actually result in any actual performance degradation of note.
I've done this and I will probably do it again, but I'd never post it as general advice, since it's much more likely to be a headache while programming than it is to affect performance. For storing enums in a database, I'd use byte (tinyint) at least 95/100 times though, since it'll be a primary key and I have some autistic traits.
The convention for naming variables is to use multiple characters, which may not be the most efficient option. Most classes only contain a few variables. Using names longer than a single character wastes space, as each character uses 16 bits. Single-character variable names allow for more efficient storage.
I think you are right. If I had to come up with an example of where enum as byte maybe could have value as an optimization, a game like Minecraft comes to mind. Imagine that almost all properties of each block could be expressed as byte-sized enums, then I guess it could be viable - for instance if you find that the game needs to swap out memory so often that it becomes a problem, if you could "magically" reduce the memory footprint by, say, half that might solve the issue. But again, this would be a very special situation, and not something for everyone (or anyone) to do by default.
When I was told about this video my instinctual answer was "with such cheap storage available, why waste your time" and my second reaction was "database normalisation is better than worrying about enum types". Maybe that reaction is 'cos I'm a database programmer at heart. Hear, hear on identifying what is a failure in understanding, not a tip.
It’s actually slower in the runtime to do that because the processor has to take the value and align it to the bitness of the processor so it is slower and gets expanded on the processor to the full bitness which uses the same memory anyhow.
The general advice is something along the lines of "Prefer what is natural, not what is the smallest." With enums, you have to remember that they are just named integers: by default there are about 4 billion other valid values in addition to the 10 you enumerate, but why bother getting rid of them? If there is a *natural* reason to use a different underlying type, sure, but in that case all types are equivalent; it does not matter that one is the smallest. Sure, if you need to match a particular binary format, in a file, communications protocol or similar, then yes, a byte could indeed be natural, but not because it is the smallest, but because that is what the value actually is! Another take: why stop at byte? Make it tightly bit-packed! And if there is not a power of two number of values, use fractional bits (yes, that is "possible" too)!
I do this in very rare cases for gamedev. I'm also sure that you're right you could find greater savings than that in my code... You didn't even have to get in to how absolutely zero bits are saved unless the byte enums are used along with other small types inside another type. Otherwise they will be word aligned anyway and the 'saved' bits will be unused.
Nick, in this case, I would have to side with the idea of forcing your enums to inherit from a type that only supports the precision needed. I think this author’s suggestion is made after he spoke to someone smarter than himself, who didn't explain WHY you MIGHT want to do this. The size of databases matters, not so much because of their storage requirements, but when it comes to indexes and searching that data; the more binary data you have to search, the longer it takes. The larger the backing field of an index is, the larger the DB index and the more processing it takes to create and maintain that DB index. So, from a DB point of view, this is good. From a C# point of view, the size is a negligible concern. System.Enum is a class, a reference type, but enumeration values are stored on the stack. Depending on how it is used, and where it is declared, it could be stored on the heap or the stack, which means incurring the potential cost of allocation and garbage collection, but this is no different from most other variable declarations in C#. Another C# concern is that byte-typed enums are generally used for enums that are flags (FlagsAttribute), so it could confuse junior devs, unless comments are there to explain. So, overall, I would say, yes, do mitigate data storage requirements for your database, especially for the sake of your SQL Server databases, but as Nick references, there are databases that have built-in optimizations for enums. In both cases, it’s not going to make a very big impact on your application, especially since most modern applications can scale out. In general, pick types, for both databases and programming code, that support only the scale and precision you need; do NOT create decimal values for integers, and DO use unsigned integers (note: not supported by the CTS I think) for non-negative integers, etc.
But don't rewrite an existing codebase for trivial optimizations; simply learn and write better code as you move forward in your coding career. This suggested optimization is valid, and I would probably file it under “neat and good to know”, but it's not gonna save your poorly architected systems. In C# 1.0b, I was doing this, not for the sake of my DB, but for the sake of casting to and from those enums and base types. Back then, we didn't have all the Enum static methods we do today.
It's useful for indexes, where size indeed matters a lot. I would not disregard this advice in all circumstances. Now, doing it by default, probably not necessary.
I used to port mobile games in 2005 when we had 2 MB of RAM. We had only one class to avoid wasting class definition bytes; everything was so optimized. Glad we have now reached petaflops with the H100 GPU!
I do use this, but only when the value maps to DB column that is a tiny int. It becomes a sanity check that helps avoid declaring a value that can't be stored in the database column. ( I also think I used it once for binary serialization, but that is very uncommon these days since most things serialize to JSON or XML). If the value only ever exists in application memory, I am with you, just let it be regular int.
I also use typed enums, but these are my scenarios where I use them - sending/receiving it as raw bytes (which also requires care around endianness), - for interop where I match the native size (if I can be bothered to even deal with the issues caused by using the wrong type) - changing it to long/ulong for a bitfield - and very occasionally I change it to unsigned if I'm expecting to do calculations on it
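For the first scenario in the list above (raw bytes and endianness), here is a small sketch of writing a two-byte enum in an explicit byte order rather than relying on the machine's native one. The `Command` enum, its value, and the choice of big-endian are assumptions made for illustration; `BinaryPrimitives` lives in `System.Buffers.Binary`.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical 16-bit command code for some wire protocol.
public enum Command : ushort { Ping = 0x0102 }

public static class WireFormat
{
    // Writes the value big-endian regardless of the CPU's native byte order.
    public static byte[] ToBigEndian(Command command)
    {
        var buffer = new byte[2];
        BinaryPrimitives.WriteUInt16BigEndian(buffer, (ushort)command);
        return buffer;
    }

    public static Command FromBigEndian(ReadOnlySpan<byte> bytes) =>
        (Command)BinaryPrimitives.ReadUInt16BigEndian(bytes);

    public static void Main()
    {
        byte[] bytes = ToBigEndian(Command.Ping);
        Console.WriteLine(BitConverter.ToString(bytes));  // 01-02
        Console.WriteLine(FromBigEndian(bytes));          // Ping
    }
}
```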
I mean... the only place we'd potentially think of doing this - is in our online game which is tick based in a lockstep model: because we try to keep data usage as low as possible. And probably we wouldn't even go as far, because there's more gains elsewhere to be made - but it also doesn't do any harm... I guess? So we'd probably do that at some point, but we haven't done much optimizations yet to begin with and are currently at about 6 mb's per hour per player, which is pretty good already! But if this is the kind of optimization you need, sure. All the bits help... but 99.9% of all use cases this is not necessary. But I would also say it's not a bad thing to do, if you're sure it's the right data structure. But you shouldn't do it for the performance reason, but because it's the correct data type.
Yea, I can see maybe doing it for something like that or maybe where you are trying to fit a lot of things in a single network packet. (Or even cram something into specialized headers) even then though I would probably just transform it when creating the packet and leave the enum be in code. Which is the same thing that should be done for sql as well.
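One way to read "transform it when creating the packet and leave the enum be in code" is sketched below: the enum keeps its default `int` underlying type and is narrowed to a byte only at the packet or column boundary. `OrderStatus` and `OrderStatusCodec` are hypothetical names; this is not taken from any particular codebase.

```csharp
using System;

// The enum keeps the default int underlying type in application code.
public enum OrderStatus { Pending = 0, Paid = 1, Shipped = 2 }

public static class OrderStatusCodec
{
    // Narrow only at the boundary; checked() throws OverflowException
    // if a value ever stops fitting into a byte.
    public static byte ToByte(OrderStatus status) => checked((byte)status);

    // Widen on the way back in and reject values the enum does not define.
    public static OrderStatus FromByte(byte raw)
    {
        var status = (OrderStatus)raw;
        if (!Enum.IsDefined(typeof(OrderStatus), status))
            throw new ArgumentOutOfRangeException(nameof(raw), raw, "Unknown status value.");
        return status;
    }

    public static void Main()
    {
        byte stored = ToByte(OrderStatus.Shipped);
        Console.WriteLine(stored);            // 2
        Console.WriteLine(FromByte(stored));  // Shipped
    }
}
```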
I've only ever used the underlying type change for P/Invoke specifically to avoid having to cast. I can't think of any other reason
@@7th_CAV_Trooper it does. Consider struct S1 { short; long; int; long; byte; long; byte; } vs struct S2 { long; long; long; int; short; byte; byte; }. sizeof(S1) is 7*8=56 bytes, sizeof(S2) is 4*8=32 bytes. Same with classes. All "big" fields are aligned so memory access takes 1 read cycle, all "small" fields fit into 8 bytes so access takes 1 read cycle again. With no alignment certainly S1 would also take somewhat around 32 bytes, but the unlucky "big" fields would require 2 read cycles. Byte fields are byte-aligned and it's fine.
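The two layouts from the comment above can be checked directly. This sketch assumes a 64-bit runtime, where `long` is 8-byte aligned, and uses `Unsafe.SizeOf<T>()` (in-box since .NET Core 3.0); the field names are added here because the comment only lists the types.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Small and large fields interleaved: padding follows every small field.
[StructLayout(LayoutKind.Sequential)]
public struct S1
{
    public short A; public long B; public int C; public long D;
    public byte E; public long F; public byte G;
}

// Same fields, largest first: the small fields share the last 8 bytes.
[StructLayout(LayoutKind.Sequential)]
public struct S2
{
    public long B; public long D; public long F;
    public int C; public short A; public byte E; public byte G;
}

public static class LayoutDemo
{
    public static void Main()
    {
        // Expected on a 64-bit runtime: 56 and 32.
        Console.WriteLine(Unsafe.SizeOf<S1>());
        Console.WriteLine(Unsafe.SizeOf<S2>());
    }
}
```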
Even though it doesn't offer us a significant performance improvement, I didn't really get why not to do that. Would that actually represent a performance decrease or something similar?
On your 64-bit platform, your CPU won't copy a byte any faster than an integer. The only reason to do it is space optimization, which is only a real thing to look at when you're dealing with big applications that have a huge database. Code-wise, it's just annoying for people who will have to use your enum, because 99% of the time it will be converted to an integer anyway, since you don't need it to be a byte for your development purposes. That is even worse than having an integer in the first place.
Indeed, performance can be affected negatively because of alignment issues. Typically objects/structs aren't compacted, so that fields align with what the processor handles more efficiently; in memory an enum will probably occupy 8 bytes, even if you change its base type to byte or short. If you force the compiler to pack the fields, then performance can degrade substantially because fields may be split across two memory reads and writes. Even though you gain a benefit in database storage requirements, it will penalize performance there too because of misalignment.
@@monomanbr Could you please provide me some concrete references where I can learn about that? I'm really not being able to comprehend why such a simple thing could ever represent the opposite of what it should
Correct me if I'm wrong, but isn't the memory "aligned" or something like that, though? Like, using a 1 byte structure wouldn't be beneficial for memory because three other bytes would be "skipped" since objects can only be "aligned" every 4 bytes... I'm sure someone knows the correct terms for these concepts so, apologies for the ignorance... I just remember seeing something like this while working on some lower level stuff
Not an expert either, but I don't think that happens in SQL, which is the point of the advice. But for starters, this is only relevant when using EFCore, and even then, you could just tell EF to use a byte column instead of forcing the enum to be a byte.
Exactly! I'm glad someone remembers that. Using byte enums without using 3 padded bytes is stupid. If you have 4 enums in same structure then yes, this byte conversion will save you 12 bytes (enums will take 4 bytes instead of 16).
@@mad_t Even more, you'd have to have 4 enums *and* have them sequentially defined in the structure *and* the first would have to be on a %4==0 boundary.
No, this is wrong. Alignment is not applicable to single bytes. Types are aligned on multiples of their size, and since bytes are 1 byte, any address satisfies alignment.
@@MulleDK19 No this is not wrong. If you have, for example, a struct with an int field and an enum field you will spend 8 bytes per struct instance, regardless of the type of the enum.
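Both sides of this exchange can be checked with a few struct sizes. In the sketch below (hypothetical type names, default sequential struct layout, `Unsafe.SizeOf<T>()` from .NET Core 3.0 or later), a byte enum next to an `int` saves nothing because of trailing padding, while four byte enums side by side do pack into 4 bytes.

```csharp
using System;
using System.Runtime.CompilerServices;

public enum ByteKind : byte { A, B }
public enum IntKind { A, B }

// An int plus one enum: padded to 8 bytes whatever the enum's size.
public struct IntPlusByteEnum { public int Id; public ByteKind Kind; }
public struct IntPlusIntEnum { public int Id; public IntKind Kind; }

// Four enums and nothing else: byte-sized fields need no padding.
public struct FourByteEnums { public ByteKind K1, K2, K3, K4; }
public struct FourIntEnums { public IntKind K1, K2, K3, K4; }

public static class PaddingDemo
{
    public static void Main()
    {
        Console.WriteLine(Unsafe.SizeOf<IntPlusByteEnum>());  // 8
        Console.WriteLine(Unsafe.SizeOf<IntPlusIntEnum>());   // 8
        Console.WriteLine(Unsafe.SizeOf<FourByteEnums>());    // 4
        Console.WriteLine(Unsafe.SizeOf<FourIntEnums>());     // 16
    }
}
```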
The main problem with reactions on LinkedIn is that there is no "dislike" button; you can only "react", so the minimum reaction to any post will be positive for the algorithm.
Not to mention that handling byte is actually slower than int. The fastest type obviously would be types whose size matches the bus width which is usually 64 bits now.
I totally agree with you and how the post is written (being kinda misleading). But what is the downside of just making most of the enums you're using a byte (only if you 100% know that the enum has less than 255 values obv.) ? I don't see any disadvantage for that or I am wrong here? It's more like it doesn't matter yea but then it's also not bad to do it right?
You might accidentally introduce padding in the memory alignment of your fields. If you have a class holding some enum fields, and you make some byte, some int, etc. it's probably going to end up aligning those byte fields anyway and wasting your 4 bytes of memory. Not to mention that most of the time you are still reserving all these 4 bytes in the register, so it's faster in some scenarios to give your register 4 bytes to begin with. On the other hand, if you want your structs to be highly compact in memory and absolutely optimized for 4-byte alignments, say you have a combination of two enum fields, and your only concern is comparing those combinations by interpreting the entire struct as a single 4-/8-byte value, then maybe. The consensus here is, it barely matters. If it does matter, prove it with benchmarks. If you prove it, make sure to make reasonable changes according to your domain constraints and always measure how this impacts your performance. Without measuring, nothing is 100% certain.
@@ajdinhusic2574 nope, I'm saying it can be either better or worse or the same. Introducing padding = more allocated memory per instance. Also, padding could hinder cache locality. Again, we're talking in the scope of nanoseconds and a few bytes, which you don't care about most of the time.
@@AlFasGD thanks for the clarification! I didn’t get the worse part from your answer initially, because my thought process was, well if it pads to be 32 bits again, then its the same as the Int32 bit size. So can be ‘better’ but not worse. But thanks for mentioning it can in fact be slower than int/ more memory, I did not know that.
I've always made my enums inherit short... Never really thought further about it. Some senior guy told me to do that 10 years ago and I never really thought through it... Yeah Carlos, you were the one telling it 🙂 Anyway, I don't see why it would be bad to do it like that.
One thing you didn’t mention is that enums when members of a class or struct, are usually 32-bit aligned on a 32 or 64-bit architecture. The data bus is wide, and when reading a byte, a whole word is transferred. Even if just a byte were written or read, it wouldn't happen in any fewer clock cycles than for a word. So in actuality there is neither a space nor a speed benefit to using byte over int. Except maybe for an array of enum, but I’ve never seen a use case for that, and don’t feel compelled to optimize for that.
The enum doesn't need to be a byte for the database to use tinyint anyway. Just like one would most likely not let EF use longtext for every single string stored in db, the database column types should be handled in the DbContext configuration/Entity configuration/Entity attributes.
Well, I was writing a long comment and autoplay just discarded it... TL;DR: we have an ECG Holter module with 15 years of legacy C#. With 12 days of 12-lead ECG you get something like 5 GB of data (2-byte samples, 500 Hz) in a 32-bit process. We have tens or hundreds of components to render the data in different ways. We do stream that data from disk, and even then we are still hitting the 2 GB RAM limit of a 32-bit process way before 7 days of 12-lead ECG. So yeah, maybe byte enums might help here if they are used heavily for some parts of those samples. So... Yeah... They would probably help us. And you lose nothing by using them. I don't understand your anger here. There just are some use cases for them. And there are rarely use cases where 255 values is not enough. So I don't see anything wrong with using them.
OK, I do use byte in my enums, not from an optimization perspective, but because I'm lazy. I don't even remember when or why I started using byte, probably from a requirement from an old tech lead in a project that is a decade or more old. It doesn't even affect the performance of the applications or the databases, but now I have muscle memory when creating enums. Sorry.
"Just watched Nick's latest Code Cop episode, and as always, it's a goldmine of practical advice! 🌟 His take on the 'Enums as Bytes' craze really put things into perspective. It's fascinating to see how a seemingly minor optimization, like treating enums as bytes, can be dissected to reveal deeper implications on code maintainability and performance. Nick does a great job explaining why context is key and how what works in one scenario (like optimizing for database storage) might not be a silver bullet for every application. It's these nuanced discussions that make software engineering so intriguing! Thanks for another thought-provoking video, Nick. Keep demystifying those LinkedIn tips! 💻🔍 #CodeCop" _This is definitely not ChatGPT speaking - where did you get that from!?_
I think it still missed a key point. unless you are packing your bytes into one register etc. the OS still allocates higher amount based on if it's 64-bit or 32-bit runtime. I'd like to see say 10000000 allocations or something and follow the size difference between byte and say long if you are running 64-bit. My hypothesis is you wouldn't see a difference at all.
I do a fair amount of typing enums as bytes - I'd go so far as to call it "fairly common" in my code. HOWEVER, it's typically used when doing things like defining device registers when I need to serialize or deserialize it from/to a bus where the size matters due to data alignment, not for any sort of savings. So defining an enum as a byte makes good sense in specialized cases, but if you're writing those cases, you already know why and wouldn't call it a "tip" but a necessity for those narrow cases. The advice, as given, is just dumb.
I've only used a non standard backing type for an enum a few times, but every single time it's been UInt64, not (S)Byte or (U)Short. Why? Because I had more than 32 values and it was being used as Flags, so I had more than 2^31 theoretical actual values. You might be thinking "But in what world do you ever need that many flags?!" Letters and numbers. I needed to store all of the letters and numbers that were present in a string as a precalculated value that could be checked for maximum similarity between strings in O(1) time using some bit ops. Made the program about 7 or 8 orders of magnitude faster, and all it cost was O(n) additional space complexity.
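The commenter's code is not shown, so the following is only a guess at the general idea: a `ulong`-backed flags enum with one bit per letter or digit, built once per string, so that two strings can be compared with a single AND and a population count. `SymbolSet`, `SymbolMask` and the bit assignment (a-z in bits 0-25, 0-9 in bits 26-35) are assumptions; `BitOperations.PopCount` requires .NET Core 3.0 or later.

```csharp
using System;
using System.Numerics;

// 36 possible symbols need more than 32 bits, hence the ulong backing type.
[Flags]
public enum SymbolSet : ulong { None = 0 }

public static class SymbolMask
{
    // O(n) precomputation: set one bit per distinct letter or digit present.
    public static SymbolSet Build(string text)
    {
        ulong mask = 0;
        foreach (char raw in text)
        {
            char c = char.ToLowerInvariant(raw);
            if (c >= 'a' && c <= 'z') mask |= 1UL << (c - 'a');
            else if (c >= '0' && c <= '9') mask |= 1UL << (26 + (c - '0'));
        }
        return (SymbolSet)mask;
    }

    // O(1) comparison: number of symbols the two strings have in common.
    public static int SharedSymbols(SymbolSet a, SymbolSet b) =>
        BitOperations.PopCount((ulong)(a & b));

    public static void Main()
    {
        SymbolSet left = Build("cab1");
        SymbolSet right = Build("abc9");
        Console.WriteLine(SharedSymbols(left, right));  // 3
    }
}
```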
I just set them to match the database type. If it's a 'tinyint', even when it makes 0 sense, then the one in code is a byte. if it's a 'bigint' for a classic non-flag enumeration, then it's a long, even though there is no way I'd ever get anywhere close to exceeding 2 billion. Unless you're dealing with many thousands of usages and objects, this level of optimisation will save you less than the runtime uses to even represent your enum in the application (with interned strings and everything), and that's assuming you're paying attention to alignment (typically 4 or 8 bytes) or abusing [StructLayout] enough to actually even benefit from such optimisations.
I'm guilty of this myself. I once set up a default conversion from enum to tinyint in EF, thinking it was a small but nice optimization. I had forgotten that one of the enumerators had negative values, and this micro-optimization cost me about 2 hours of debugging, trying to figure out why some value was 252 (or something like that) out of nowhere 😅 Angry at myself, I reverted it back to integers.
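A value like 252 is what an unchecked narrowing of -4 to a byte produces. The sketch below reproduces that effect with a hypothetical `Adjustment` enum and adds a small guard that checks whether every defined value fits a tinyint; the actual enum and the exact EF conversion from the comment are not known.

```csharp
using System;

// Hypothetical enum with a negative member, like the one in the comment.
public enum Adjustment { Down = -4, None = 0, Up = 4 }

public static class NarrowingDemo
{
    // Unchecked narrowing keeps only the low 8 bits: -4 (0xFFFFFFFC) becomes 0xFC.
    public static byte ToTinyInt(Adjustment value) => unchecked((byte)value);

    // Guard: true only if every defined member fits into 0..255.
    public static bool FitsInByte<T>() where T : struct, Enum
    {
        foreach (T member in Enum.GetValues(typeof(T)))
        {
            long numeric = Convert.ToInt64(member);
            if (numeric < 0 || numeric > 255) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(ToTinyInt(Adjustment.Down));  // 252
        Console.WriteLine(FitsInByte<Adjustment>());    // False
    }
}
```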
I have inherited code where people have done this. Just as you say, there are hundreds of places in the same repository which could be optimized to save more space or lead to better performance.
in very rare cases, when implementing some pre-defined APIs, I had to use a different underlying enum type, but it was `uint`. I have never ever (consciously) used anything smaller than `int`. Also, while it technically does save you 3 bytes per value, I *think* it can harm the performance, because (most) modern CPUs work better with 4 or 8 byte values than with single bytes.
Nick - I think you need to wear a blood pressure monitor and show us the before/during reading when you do the Code Cop videos 😂 I remember - a long time ago now - the only place where you could grow your understanding was the language specification, the quarterly MSDN, the library vendor, your more experienced peers (if you had them) or a book or course from a highly regarded SME (if such a thing existed). If all that failed - do the time - experiment and figure it out - the ‘science’ bit of computer science. The internet is sometimes a great place for fact - and equally a dumping ground for positively reinforced nonsense. It’s really unfortunate that cut/copy/paste coding became prevalent - but remember - our industry enabled that! With that often comes the lack of desire for many to want to understand the how and why. If something you find solves your problem PLEASE take the time to appreciate why or if it even really does. What exciting times we live in…maybe it's time to go re-watch Mike Judge’s 2006 film Idiocracy… LLM vendors...you are 100% not blanket using these sites for your training models - right?
I personally would only do this on the very rare occasion that I would need the enum value to be a byte, either because the message required it or the db, and that would be only to avoid casting and assure type safety. But I've been developing for 30 years and don't even need one hand to tell you how many times I've run into that situation.
Okay, I get it and "almost" agree with you that this advice is for the most part total garbage... but what if I say there is a legit reason to change the type of an enum? Yes, there is: if I want to lay out data in unions that contain different data based on an "enum" type, but want the union to have the same length, say, 16 or 32 KB. In that situation, the size of the enum matters a lot due to data alignment and cache coherence. If I just use the default enum (32-bit integer), then I am wasting 24 bits, because I most likely don't have enums with more than 10 values. In that case using a byte is totally legit, and then using another byte and a short after it to lay out the data efficiently can make a difference in performance and even in stability for multi-threading use cases, due to false sharing or data straddling cache lines. Also, there are Windows API functions that use shorts or bytes for enum values, so in that case you had better use a fixed-size enum as well, so that it matches the native definition.
Hello Nick. Great content. Just a minor stupid thing: I have done some courses on SQL Server query tuning, and as a principle it is actually advisable to use the most compact possible representation for your data. The reason is not that it will take less disk space. Rather, when you run a query, the engine loads pages of data, and simply put, the bigger a record is, the fewer records fit in a single page, meaning more IO. And you generally try to reduce that logical IO as much as possible. Now, having said that, you're completely right. We're talking about 3 bytes for each enum. Unless your table is basically full of those enum flags, we're talking about peanuts. I bet there will be better optimizations.
Yeah; at 1 million rows, the 3-byte difference is going to save a "whopping" 2.86 MB. The principle of relational databases was indeed always about using the tiniest possible footprint. Like in 1970, when 100 MB would go for $26,000. When relational databases were invented we were concerned about storage. That goes for both disk and memory. Therefore using the smallest types and removing duplication was KING. We are not there anymore. Today, most SQL Server users out there just create a database with the default settings. A VERY, VERY small subset is actually able to benefit from changing the page sizes and picking data types so that the data lines up and reads from storage match your system's data bus size and your L1, L2 and L3 caches, for a perfect, zero-waste read. With all the other factors, like not knowing what hardware and storage your files end up on in the cloud... the byte level of some numbers is the worst place to look. In light of the advice, can you imagine someone creating TINYINT status columns right beside an NVARCHAR(500) status message column, or some other NVARCHAR columns elsewhere that are "oversized" just in case? Not many know this either: the length is not a number of characters but a number of byte pairs, so an NVARCHAR(200) can take up to 400 bytes.
@@paulkoopmans4620 Absolutely. I have worked with DBAs on this kind of optimization, and I have seen many talks from really good people whose focus is just that, optimizing the DB. And while the principle generally stands, you always have to ask yourself how big the effort is for those meager gains. There are usually bigger issues to tackle. That being said, I have also seen some tables with 50 flags like these.
This one isn't so terrible compared to the other ones, imo. You're never going to reach anywhere near 256 different enum values for many uses. However, I think you don't save RAM in all cases due to memory alignment stuff that I admittedly know little about.
Hi Nick, I keep watching your videos as I can learn useful things from them. I agree, that the majority of the time you should be fine with 32-bit storage for an enum and only a few special cases need to limit it to say 8 bits. However, I think you got a bit too emotional on this one for no obvious reason. I think it's more professional if you keep it cool. Thank you and keep the good content flowing!
I do it rather the other way: when going to the database, the enum values become strings (not the internal C# name, but an explicit, well-defined one, e.g. via attribute). I consider the numeric value of an enum a implementation detail.
I've done this, but in that case my code was interacting with a microcontroller that has only 512KBs of memory, and the C# code serialized a pretty big data structure, that contained a lot of enums and stuff. But it's hard to imagine any other kind of situation except for embedded stuff in 2024.
On one hand this advice *could be good in some situations*, but in a lot of places fields in a struct or class are going to be aligned on a 4-byte boundary, so you don't end up saving any space. Using a TINYINT or BYTE instead of an INT in your database could be a win over large numbers of rows, but there's often a far bigger gain to be had by limiting your strings from a default NVARCHAR(MAX) to something more reasonable (which many people don't bother to do, because EF just makes all strings NVARCHAR(MAX) by default). Like many "optimization" opportunities you really need to profile and measure to make sure you are actually saving something, and I imagine a lot of the people sharing this "advice" haven't profiled or measured anything. You cannot optimize by assumption alone.
I have some WinAPI calls (native stuff), where i have to marshal some c byte constants, which i converted in a c# byte enums. And here, the size is important, cause otherwise memory structure is not matching anymore. But of course, this is a special edge case.
On 32/64-bit processors an 8-bit value takes a 32-bit register anyway (if I remember correctly), so it will give you no performance boost. If you have a very large number of records, it can save some memory.
afaik one should always use the native data type for processing (enums) to get best performance. if one really want to one can convert back and forth to get an optimal memory footprint when storing or serializing, but don't do that in memory. it hurts performance.
I don't quite know if it is true. But a smart person told me once that computers are heavily optimized for ints, so using smaller types could be seen as an anti pattern.
I don't think most people know that values in memory have to be aligned, and registers in cpu have specific sizes. Byte doesn't actually save any memory, and in some cases might actually be slower. Also, I store the string values of my enums in db!! lol. Space is cheap. Your time is not. If you need that level of optimization C# is not your tool.
It might be just me using a different/wrong practice, but when i use Enums, i usually even use negative numbers to show "Bad" states/values. This makes handling and reading the enum a bit easier for me at least, but would be impossible or at least much more confusing to do with a 0-255 range.
Converting all of your giant enums to tiny sized enums doesn't even give a guarantee it will occupy exactly this desired TINYINT in the process memory :) Modern CPU doesn't give a shit about LinkedIn advice, it is designed to be super optimal dealing with machine words as "default" data type. It is more likely that you will introduce few more redundant machine instructions like movzx/movsx in the emitted code by enforcing your enum storage type to the System.Byte type. So, good luck to all followers 😅
"Your use of a byte as the underlying type for an enum is a sophisticated and elegant approach to solving your desire to feel cool like those hard core bit-banging low level programmers you have an inordinate and unexplainable admiration for, even though they are slightly more muskier than you. Please like me, I'm your best AI assistant, friend, soulmate."
Using byte or short for enums may harm performance, because the CPU may need to widen it to int32 and narrow it back to byte every time. Sometimes I have used an enum based on long with the [Flags] attribute, when I needed more than 32 flags. But never byte or short. CPUs work with int32.
I was on linkedin yesterday and saw this advice, I was telling myself is it gonna go on code cop or not, and now just opened youtube, first recommended video. Lol
I will assume you have never worked with IoT devices or, in xamarin, creating a large grid. I have done both and if we start with memory-constrained devices, this could be helpful, but, before you optimize profile and see where you may need to make improvements. If I just need an enum in a couple places probably not worth it, but if I am transmitting data from an edge sensor to a controller it may be useful. In a game, I may use an enum to specify which type of terrain in a cell and I may have millions of these, so saving a few bytes could be helpful, esp when I want to save the map, to reduce the size of the file. You shouldn't optimize prematurely but there are times when something like this may be useful and just shooting down the idea without considering when it might be useful is just bad form, IMO.
You probably shouldn't optimize to that level. But when your DB type is defined as tinyint (byte) I would better follow the same memory layout in code, to avoid any marshaling problems.
I once made the terrible mistake of storing the weight of a person in a byte in the database, and then someone wanted the weight in lb. Never trying to save a few bytes again.
LOL, I wasn't prepared to hear Nick's "yEaH THaTs A GoOD AdVIcE HEeHeeheEhe". He's normally so eloquently spoken, I did a double take hearing him speak that way. Good video
Enums have always been an irredeemable curse that would require a major breaking change to fix. The fact that people have to write source generators to improve their performance and memory use is already a sign of that.
There is a compiler code analysis warning for this situation that, if enabled, will trigger if you set an enum to anything other than int.
"CA1028: Enum storage should be Int32"
This warning is not enabled by default, but I imagine there is some reason it exists. Microsoft says this about it:
"Even though you can change this underlying type, it is not necessary or recommended for most scenarios. No significant performance gain is achieved by using a data type that is smaller than Int32."
Right, when you’re dealing with a billion + rows, these small optimizations are critical.
But as it said in the first line of the content, it depends on your usage.
Honestly, that's a very anal and stupid warning.
There are cases where you need specific integer types. Off the top of my head:
- In protocols
- When the padding matters
Actually, I had a couple of times specified a non-default base type for enums. But in all those cases the actual data was transferred as a binary stream via a serial connection to an embedded device, so I had to carefully follow the protocol.
Agreed! That is probably the only viable reason though. For 1 million records in a table this supposed "optimization" saves 2.86 MB. On top of that, it might also do more harm than good, as the IO could be negatively impacted.
The "view" of the OP is also for no flag enums. The default of an int is a good default for flag enums as anything lower is only giving you 8 or 16 flags. The win32 API is full of int and long flag enums.
I've done some PLC stuff in the past too, where the PLC was even putting bit flags together in 16- or 32-bit registers, and that was the only time I have had to step away from the default.
@paulkoopmans4620 for 1-2M records an hour, that's actually about 3 GB every month.
Context indeed matters. I used a byte enum yesterday to reduce the packet size of a serial messaging protocol.
I swear with every Code Cop video Nick becomes more and more insane 😂
It is a micro optimization for most databases. But you can have tables with tens or hundreds millions of records where some of the columns are enum values. It can make indexes perform better by being smaller. This tip might not be very relevant to many applications but it is not wrong.
Totally agree.
Care to explain how a btree with byte values performs better than int values?
@@7th_CAV_Trooper number of page reads (from disk) is the magical counter.
It's wrong because, while you definitely can use tinyint in the DB, there is no measurable cost in casting that to and from an int in the actual code.
And as others and me have mentioned in other comments, using a byte enum might actually cause performance degradation.
So keep that optimization in the DB.
most sane databases use data alignment, and the alignment is not a byte, it is at least an int
the processor doesn't know what a byte is, but it takes an int into a register without any problems or overhead
and in sane databases no enums exist, they are not needed
We cast enums to bytes at work because the database is very old (25+ years) and we have some foreign keys to constant types, where the foreign keys are of type [tiny]. Now that we have integrated EF Core, instead of joining on the table that stores the value name, we just use enums of type [byte]. This just makes the EF-to-SQL mapping easier.
I don't think it is either good or bad advice, it is just a very situational thing, which you normally wouldn't have to worry about.
I used to work on a transactional database that used bigints for primary keys and enums, bigints everywhere! I can tell you that optimizing storage can lead to significant performance gains, as well as lower storage for backups and data warehouses. As for enums, I always use default integers, but if we do have millions upon millions of records, I may consider using other types depending on the range of values, or reconsider the design approach (using discriminators or not, i.e. table per type etc.). Storage costs are very low nowadays anyway.
Just to be clear, 100% with you :). There are times of course where you should think of the types in the db. A concrete example from my previous job where we went from storing guids in strings (BigQuery didn’t have a native format for guids at the time) to its byte representation saved us a lot of storage space, and since we ingested about 10TB of json every day that optimization actually saves us quite a lot of money.
I think most people struggle to understand that optimization or "savings" have a factor/scale to them, and you can look at how effective/valuable a certain optimization or trick is by examining how it's going to _scale out_ in the context some real or hypothetical project. A "one-off" optimization or memory saving isn't helping you much, at all, unless it's a VERY big, single thing we are talking about, like maybe optimizing install/storage size of a game or app by eliminating the need for some huge chunk of data/content. Changing the underlying value type of an enum to `byte` isn't doing much for you in and of itself. It *can* in some very specific situations, like I sometimes do this in real-time 3D/game code when I'm dealing with really big data buffers or I have to have some very specific structure layout or byte alignment. But just universally making all your enums into bytes is more likely to slightly degrade performance than enhance it, as you can be causing some misaligned byte boundaries or reducing the compiler's ability to help you, not to mention you can create some future technical debt for yourself in some situations ...
Optimize is a very broad topic. Using byte as the underlying type will, of course, save space, but is it worth it? A 32-bit (or 64-bit) processor is much more speed efficient to access 32 (or 64) bits of data at a time. The data, however, needs to be 4 (or 8) byte aligned. For a modern processor to read a byte, it needs to read 32/64 bits of data and strip the unnecessary bits. If the data is not aligned, the process is even slower because shifting happens. We don't see it in a high level language, but that happens at assembly level/machine code.
The only time I use byte as the underlying type for enums is when I have data interchange happening, when a byte is part of a data structure sent by an embedded device.
Something like that.
There's nothing wrong with saving 3 bytes on a column in SQL Server. Odds are you have more problems than this, but that doesn't inherently make it a problem to seek out the savings. As far as how that should translate to the C# representations, I think there's room to debate.
Remember that each field isn't stored just once. It's also stored in each index. It also consumes server memory. The less you use to do the same work, the more simultaneous clients you can support. The benefit from small changes like this isn't massive, but they add up for larger datasets.
Agree with Nick about prioritizing your performance scrutiny.
For one reason or another I had to write a parser/serializer for some obscure format (existing one could not handle memory requirements we had - would hog up ALL THE MEMORY on a machine).
The sample code required maintaining ordering of a list when inserting new stuff (so some stupid search through the list and inserting at proper index)... which in my case ended up being an absolute performance killer. After discovering that through profiling and replacing list with sorted list (or something like that), performance improved about 100x in relevant cases.
In other words, do a PoC, profile it and look for things that eat up most of your performance. This way it will be less labour intensive and your time is more valuable than adding extra disk/ram to that server that is supposed to run the damn thing
Something I haven't seen mentioned is message structs. For one of my work projects we had a messaging system that was marshalling message structs and sending them over legacy hardware with very low bandwidth. In some structs, the fields were marshalled as int32 so when message frequency was high it would start slowing down. These fields were used in combination to represent some state. We found a way to optimize and drastically reduce the size by representing each state as a single byte under one enum and use bitwise operations to combine the states together using bit masking.
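A rough sketch of the kind of bit-mask packing described above, assuming two small state enums that each fit in 4 bits. The enum names, values and nibble layout are all invented for illustration and are not the commenter's actual message format.

```csharp
using System;

// Hypothetical state enums; each has at most 16 values, so each fits in one nibble.
public enum LinkState : byte { Down = 0, Negotiating = 1, Up = 2 }
public enum PowerState : byte { Off = 0, Standby = 1, On = 2 }

public static class StatePacker
{
    private const int PowerShift = 4;    // PowerState lives in the high nibble
    private const byte NibbleMask = 0x0F;

    // Combines both states into a single byte: low nibble = link, high nibble = power.
    public static byte Pack(LinkState link, PowerState power)
        => (byte)(((byte)link & NibbleMask) | (((byte)power & NibbleMask) << PowerShift));

    // Masks out the low nibble to recover the link state.
    public static LinkState UnpackLink(byte packed)
        => (LinkState)(packed & NibbleMask);

    // Shifts the high nibble down to recover the power state.
    public static PowerState UnpackPower(byte packed)
        => (PowerState)((packed >> PowerShift) & NibbleMask);
}
```

Here two fields that would each be marshalled as an int32 (8 bytes total) travel as one byte; whether that is worth the extra packing code depends on the bandwidth constraint, as the comment notes.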
This enum byte thing exists like 15 years.
The fact that people react to this like it's something new makes me think on the quality of any software these devs create every day!
We have a few Enums in our game code that are set to bytes and shorts but that's specifically because we do have hundreds-of-thousands to millions of them in contiguous arrays for the game's data so we do gain a fair bit of performance doing this. Especially from a cache point of view.
Do you combine eight booleans into one?
@@Sindrijo we don't really have a lot of places where we would need to do that. Except maybe one instance where we pack groups of six bools into single bytes - wasting 2 bits, yes, but it works out in that situation.
Performance? Probably not. You do pack more information into your bytes, but packing and unpacking is usually less performant than not doing it. We were doing it in multiplayer games to reduce the needed bandwidth. But it was also a rather unusual practice. You can find the same principle a lot in the C++ code of Unreal Engine.
@@Revin2402 we have benchmarked it (as one should) and it actually does improve performance in our case. I'm guessing it is due to reduced cache misses.
@@Sindrijo Those are called "flags". And yes, we did those things back in the day ... some 30 years ago. Fast-forward to today, I can think of VERY few cases where doing this type of thing could be called 'reasonable', and all those cases involve tiny devices with very little memory and storage capacity. It makes NO sense to do that in ANY other case. As Nick said, not even in those 'multi-billion-row' tables. Because you can surely bet you're having WAY worse problems than saving 3 bytes (or 6, or 9) on each row, starting with the very fact that you have a SINGLE table with gazillions of rows stored in it.
03:32 "Why does it bother me so much" You've no idea how much I relate to this everyday :D
The size of a database column matters if the column is part of an index. Whether a key is 8 or 5 bytes makes a big difference, because more keys fit in one 4KB DB page.
To be honest I do it too: when I create an enum to be saved in the database I give it a byte type, because it is just a keyword plus a conversion in the DbContext and nothing more, easy to implement. Yes, the projects I worked on were most probably missing better performance optimizations, but that does not matter; if I am aware of something while implementing, I will do it. It does not take a lot of effort.
What is it you think you're optimizing? Database size?
@@7th_CAV_Trooper yes, I work in a project where we get almost a hundred millions of messages daily and we have large databases for each env, we clear the data periodically but they are still big.
In general, it's good advice to follow the conventions of whatever programming language or environment you're using, even if on the surface the convention may seem counter-intuitive. The convention of always using the `int` type in C# is a prime example. It may seem intuitive to use the smallest integer type that holds the range of values needed, but the wisdom of the crowd knows something that isn't obvious, namely that modern processors are optimized to process data values on 32-bit and 64-bit boundaries. So you may think you're being smart and efficient, but in reality there's no benefit.
Sometimes there are valid exceptions to the common conventions in specific circumstances. A skilled developer knows when these exceptions are needed. A skilled developer also knows how to performance test code and search for specific and measurable optimizations when performance improvements are clearly needed.
We have a database (not SQL) where we store terabytes of data. Even then, converting to byte enums is not the best way to optimize the storage. We do have some byte enums, but those are special cases that are transmitted in binary over the network in high-traffic paths. In a couple of edge cases I've even had to merge multiple enums into a single byte for serialization to save bandwidth and egress costs.
I've only used byte-extending enums once; to reduce the size of a struct, which might be created 100,000-200,000 times per second, every second. With byte-extending enums, the struct is exactly 16 bytes in size, which also aligns with its "natural" packing size.
And even then, I doubt using int-extending enums would actually result in any actual performance degradation of note.
I've done this and I will probably do it again, but I'd never post it as general advice, since it's much more likely to cause a headache while programming than it is to affect performance. For storing enums in a database, I'd use byte (tinyint) at least 95/100 times though, since it'll be a primary key and I have some autistic traits.
The convention for naming variables is to use multiple characters, which may not be the most efficient option. Most classes only contain a few variables. Using names longer than a single character wastes space, as each character uses 16 bits. Single-character variable names allow for more efficient storage.
I think you are right.
If I had to come up with an example of where enum as byte maybe could have value as an optimization, a game like Minecraft comes to mind. Imagine that almost all properties of each block could be expressed as byte-sized enums, then I guess it could be viable - for instance if you find that the game needs to swap out memory so often that it becomes a problem, if you could "magically" reduce the memory footprint by, say, half that might solve the issue.
But again, this would be a very special situation, and not something for everyone (or anyone) to do by default.
When I was told about this video my instinctual answer was "with such cheap storage available, why waste your time" and my second reaction was "database normalisation is better than worrying about enum types". Maybe that reaction is 'cos I'm a database programmer at heart. Hear, hear on identifying what is a failure in understanding, not a tip.
It’s actually slower in the runtime to do that because the processor has to take the value and align it to the bitness of the processor so it is slower and gets expanded on the processor to the full bitness which uses the same memory anyhow.
The general advice is something along the lines of "Prefer what is natural, not what is the smallest." With enums, you have to remember that they are just named integers ‒ by default there are about 4 billion other valid values in addition to the 10 you enumerate, but why bother getting rid of them? If there is a *natural* reason to use a different underlying type, sure, but in that case all types are equivalent; it does not matter that one is the smallest. Sure, if you need to match a particular binary format, in a file, communications protocol or similar, then yes, a byte could indeed be natural, but not because it is the smallest, but because that is what the value actually is!
Another take: why stop at byte? Make it tightly bit-packed! And if there is not a power of two number of values, use fractional bits (yes, that is "possible" too)!
I do this in very rare cases for gamedev. I'm also sure that you're right you could find greater savings than that in my code...
You didn't even have to get in to how absolutely zero bits are saved unless the byte enums are used along with other small types inside another type. Otherwise they will be word aligned anyway and the 'saved' bits will be unused.
Nick, in this case, I would have to side with the idea of forcing your enums to inherit from a type that only supports the precision needed. I think this author’s suggestion was made after he spoke to someone smarter than himself, who didn't explain WHY you MIGHT want to do this. The size of a database matters, not so much because of its storage requirements, but when it comes to indexes and searching that data; the more binary data you have to search, the longer it takes. The larger the backing field of an index is, the larger the DB index and the more processing it takes to create and maintain that index. So, from a DB point of view, this is good.
From a C# point of view, the size is a negligible concern. System.Enum itself is a class, but concrete enum types are value types. Depending on how a value is used and where it is declared, it could live on the stack or inline inside a heap object, and boxing it incurs the potential cost of allocation and garbage collection, but this is no different from most other value types in C#.
Another C# concern is byte types of enums are generally used for enums that are flags (FlagAttribute), so it could confuse junior devs, unless comments are there to explain.
So, overall, I would say, yes, do mitigate data storage requirements for your database, especially for the sake of your SQL Server databases, but as Nick references, there are databases that have built in optimizations for Enums. In both cases, it’s not going to make a very big impact on your application, especially since most modern applications can scale out.
In general, pick types, for both databases and programming code, that support only the scale and precision you need; do NOT create decimal values for integers, and DO use unsigned integers (note: not CLS-compliant, I think) for non-negative integers, etc. But don't rewrite an existing codebase for trivial optimizations; simply learn and write better code as you move forward in your coding career.
This suggested optimization is valid, but I would probably put in the file section under “neat and good to know”, but not gonna save your poorly architectured systems.
In C# 1.0b, I was doing this, but not for the sake of my DB, but for the sake of casting to and from those enums and base types. Back then, we didnt have all the Enum static methods we do today.
It's useful for indexes, where size indeed matters a lot. I would not disregard this advice in all circumstances. Now, doing it by default, probably not necessary.
I used to port mobile games in 2005, when we had 2 MB of RAM. We had only one class to avoid wasting bytes on class definitions, everything was so optimized. Glad we have now reached petaflops with the H100 GPU!
I do use this, but only when the value maps to DB column that is a tiny int. It becomes a sanity check that helps avoid declaring a value that can't be stored in the database column. ( I also think I used it once for binary serialization, but that is very uncommon these days since most things serialize to JSON or XML). If the value only ever exists in application memory, I am with you, just let it be regular int.
I also use typed enums, but these are my scenarios where I use them
- sending/receiving it as raw bytes (which also requires care around endianness),
- for interop where I match the native size (if I can be bothered to even deal with the issues caused by using the wrong type)
- changing it to long/ulong for a bitfield
- and very occasionally I change it to unsigned if I'm expecting to do calculations on it
I mean... the only place we'd potentially think of doing this - is in our online game which is tick based in a lockstep model: because we try to keep data usage as low as possible. And probably we wouldn't even go as far, because there's more gains elsewhere to be made - but it also doesn't do any harm... I guess? So we'd probably do that at some point, but we haven't done much optimizations yet to begin with and are currently at about 6 mb's per hour per player, which is pretty good already! But if this is the kind of optimization you need, sure. All the bits help... but 99.9% of all use cases this is not necessary.
But I would also say it's not a bad thing to do, if you're sure it's the right data structure. But you shouldn't do it for the performance reason, but because it's the correct data type.
Yea, I can see maybe doing it for something like that or maybe where you are trying to fit a lot of things in a single network packet. (Or even cram something into specialized headers) even then though I would probably just transform it when creating the packet and leave the enum be in code. Which is the same thing that should be done for sql as well.
I've only ever used the underlying type change for P/Invoke specifically to avoid having to cast. I can't think of any other reason
Another reason is memory optimization, but I believe most of C# devs never face with such necessity. When you really need it - you know it
Unless you're compiling byte-aligned, which is a performance killer, it does not save memory.
And comms packets, with FieldOffset and MarshalAs attributes, or masks, anything that gets close to hardware, and unmanaged code.
@@nickbarton3191 Yeah, close to hardware is the only real reason I can think of
@@7th_CAV_Trooper it does. Consider struct S1 { short; long; int; long; byte; long; byte; } vs struct S2 { long; long; long; int; short; byte; byte; }.
sizeof(S1) is 7*8=56 bytes, sizeof(S2) is 4*8=32 bytes. Same with classes. All "big" fields are aligned so memory access takes 1 read cycle, all "small" fields fit into 8 bytes so access takes 1 read cycle again. With no alignment certainly S1 would also take somewhat around 32 bytes, but the unlucky "big" fields would require 2 read cycles. Byte fields are byte-aligned and it's fine.
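The S1/S2 numbers above can be checked with a small sketch like the one below. The field names are invented, and the sizes in the comments assume a typical 64-bit runtime with default struct packing. One caveat: C# structs default to sequential layout, whereas classes default to auto layout, where the runtime is free to reorder fields, so the "same with classes" part may not carry over directly.

```csharp
using System.Runtime.CompilerServices;

// Fields in the order from the comment: small and large fields interleaved.
public struct S1
{
    public short A; public long B; public int C; public long D;
    public byte E; public long F; public byte G;
}

// The same fields, largest first.
public struct S2
{
    public long B; public long D; public long F;
    public int C; public short A; public byte E; public byte G;
}

public static class LayoutSizes
{
    public static int SizeOfS1() => Unsafe.SizeOf<S1>(); // 56 on a typical 64-bit runtime
    public static int SizeOfS2() => Unsafe.SizeOf<S2>(); // 32 on a typical 64-bit runtime
}
```

The difference comes entirely from padding: every `long` in `S1` has to start on an 8-byte boundary, so each small field in front of it drags along unused bytes.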
The funny thing is that enums carry extra baggage, which means there is probably a runtime cost that wouldn't go away at all even if you used byte
Even though it doesn't offer us a significant performance improvement, I didn't really get why not to do that. Would that actually represent a performance decrease or something similar?
On your 64-bit platform, your CPU won't copy a byte any faster than an integer. The only reason to do it is space optimization, which is only really worth looking at when you're dealing with big applications that have a huge database.
Code-wise it's just annoying for the people who have to use your enum, because 99% of the time it will be converted to an integer anyway, since you don't need it to be a byte for your development purposes, which is even worse than having an integer in the first place
Indeed performance can be affected negatively because of alignment issues. Typically objects/structs aren't compacted so that fields align with what the processor handles more efficiently, so in memory probably an enum will occupy 8 bytes, even if you change its base type to byte or short. If you force the compiler to pack the fields then performance can degrade substantially because fields may be split into two memory reads and writes. Even though the database storage benefit is real, it may penalize performance there too because of misalignment.
@@monomanbr Could you please provide me some concrete references where I can learn about that? I'm really not able to comprehend why such a simple thing could ever have the opposite effect of what it should
Yep, changing enums to shorts makes no sense, but I do not understand your anger, at least this tip does not have a negative impact.
Correct me if I'm wrong, but isn't the memory "aligned" or something like that, though? Like, using a 1 byte structure wouldn't be beneficial for memory because three other bytes would be "skipped" since objects can only be "aligned" every 4 bytes... I'm sure someone knows the correct terms for these concepts so, apologies for the ignorance... I just remember seeing something like this while working on some lower level stuff
Not an expert either, but I don't think that happens in SQL, which is the point of the advice. But for starters, this is only relevant when using EFCore, and even then, you could just tell EF to use a byte column instead of forcing the enum to be a byte.
Exactly! I'm glad someone remembers that.
Using byte enums without using 3 padded bytes is stupid.
If you have 4 enums in same structure then yes, this byte conversion will save you 12 bytes (enums will take 4 bytes instead of 16).
@@mad_t Even more, you'd have to have 4 enums *and* have them sequentially defined in the structure *and* the first would have to be on a %4==0 boundary.
No, this is wrong. Alignment is not applicable to single bytes. Types are aligned on multiples of their size, and since bytes are 1 byte, any address satisfies alignment.
@@MulleDK19 No this is not wrong. If you have, for example, a struct with an int field and an enum field you will spend 8 bytes per struct instance, regardless of the type of the enum.
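Both sides of this exchange can be illustrated with one sketch. The type names are made up, and the sizes in the comments assume default sequential struct layout:

```csharp
using System.Runtime.CompilerServices;

public enum SmallKind : byte { A, B }
public enum WideKind { A, B } // int-backed by default

// An int next to a single enum: the struct is padded to a multiple of 4 either way.
public struct IntPlusSmall { public int Id; public SmallKind Kind; }
public struct IntPlusWide { public int Id; public WideKind Kind; }

// Four enums in a row: here the byte backing type does pay off.
public struct FourSmall { public SmallKind K0, K1, K2, K3; }
public struct FourWide { public WideKind K0, K1, K2, K3; }

public static class EnumFieldSizes
{
    public static int IntPlusSmallSize() => Unsafe.SizeOf<IntPlusSmall>(); // 8 (4 + 1 + 3 padding)
    public static int IntPlusWideSize() => Unsafe.SizeOf<IntPlusWide>();   // 8
    public static int FourSmallSize() => Unsafe.SizeOf<FourSmall>();       // 4
    public static int FourWideSize() => Unsafe.SizeOf<FourWide>();         // 16
}
```

So a lone byte field has no alignment requirement of its own, but the struct as a whole is still rounded up to the alignment of its widest member, which is why the saving only shows up when several small fields sit together.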
The main problem with reactions on LinkedIn is that there is no "dislike" button, you can only "react", so the minimum reaction to any post will be positive for the algorithm
Not to mention that handling byte is actually slower than int. The fastest type obviously would be types whose size matches the bus width which is usually 64 bits now.
I totally agree with you and how the post is written (being kinda misleading).
But what is the downside of just making most of the enums you're using a byte (only if you 100% know that the enum has less than 255 values obv.) ? I don't see any disadvantage for that or I am wrong here? It's more like it doesn't matter yea but then it's also not bad to do it right?
You might accidentally introduce padding in the memory alignment of your fields. If you have a class holding some enum fields, and you make some byte, some int, etc. it's probably going to end up aligning those byte fields anyway and wasting your 4 bytes of memory. Not to mention that most of the time you are still reserving all these 4 bytes in the register, so it's faster in some scenarios to give your register 4 bytes to begin with.
On the other hand, if you want your structs to be highly compact in memory and absolutely optimized for 4-byte alignments, say you have a combination of two enum fields, and your only concern is comparing those combinations by interpreting the entire struct as a single 4-/8-byte value, then maybe.
The consensus here is, it barely matters. If it does matter, prove it with benchmarks. If you prove it, make sure to make reasonable changes according to your domain constraints and always measure how this impacts your performance. Without measuring, nothing is 100% certain.
@@AlFasGD appreciate the technical explanation a lot 👍
So you're saying it can be better, but not worse?@@AlFasGD
@@ajdinhusic2574 nope, I'm saying it can be either better or worse or the same. Introducing padding = more allocated memory per instance. Also, padding could hinder cache locality. Again, we're talking in the scope of nanoseconds and a few bytes, which you don't care about most of the time.
@@AlFasGD thanks for the clarification!
I didn’t get the worse part from your answer initially, because my thought process was, well if it pads to be 32 bits again, then its the same as the Int32 bit size. So can be ‘better’ but not worse.
But thanks for mentioning it can in fact be slower than int/ more memory, I did not know that.
I've always made my enums inherit short...
Never really thought further about it. Some senior guy told me to do that 10 years ago and I never really thought through it... Yeah Carlos, you were the one telling it 🙂
Anyway, I don't see why it would be bad to do it like that.
One thing you didn’t mention is that enums when members of a class or struct, are usually 32-bit aligned on a 32 or 64-bit architecture. The data bus is wide, and when reading a byte, a whole word is transferred. Even if just a byte were written or read, it wouldn't happen in any fewer clock cycles than for a word. So in actuality there is neither a space nor a speed benefit to using byte over int. Except maybe for an array of enum, but I’ve never seen a use case for that, and don’t feel compelled to optimize for that.
I Converted All my Enums To Byte Last Week !!!!😀
I love your Code Cop videos Nick. Even if I didn't learn anything new, it entertains me immensely the way you get excited^^
The enum doesn't need to be a byte for the database to use tinyint anyway. Just like one would most likely not let EF use longtext for every single string stored in db, the database column types should be handled in the DbContext configuration/Entity configuration/Entity attributes.
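As a rough illustration of handling this in the DbContext configuration: the `Order`, `OrderStatus`, and `ShopContext` names below are hypothetical, and the sketch assumes EF Core with the SQL Server provider, where a `byte` maps to `tinyint`.

```csharp
using Microsoft.EntityFrameworkCore;

// Stays int-backed in application code.
public enum OrderStatus { Pending = 0, Shipped = 1, Cancelled = 2 }

public class Order
{
    public int Id { get; set; }
    public OrderStatus Status { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Convert the enum to a byte only at the database boundary.
        modelBuilder.Entity<Order>()
            .Property(o => o.Status)
            .HasConversion<byte>();
    }
}
```

This keeps the column small without touching the enum declaration, so the rest of the code never sees the storage decision.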
Well I was writing a long comment and autoplay just discarded it...
TL;DR - We have an ECG Holter module with 15 years of legacy C#, where 12 days of 12-lead ECG gives you about 5 GB of data (2-byte samples, 500 Hz) in a 32-bit process. We have tens or hundreds of components to render the data in different ways. We do stream that data from disk, and even then we still hit the 2 GB RAM limit of a 32-bit process way before 7 days of 12-lead ECG. So yeah, maybe byte enums might help here if they are used in a big way for some parts of those samples. So... Yeah... They would probably help us. And you lose nothing by using them. I don't understand your anger here. There just are some use cases for them. And there are rarely use cases where 255 values are not enough. So I don't see anything wrong with using them.
Ok, I do use byte in my enums, but not from an optimization perspective, but because I'm lazy. I don't even remember when or why I started using byte, probably from a requirement from an old tech lead in a project that is a decade or more old. It doesn't even affect the performance of the applications or the databases, but now I have muscle memory when creating enums. Sorry.
"Just watched Nick's latest Code Cop episode, and as always, it's a goldmine of practical advice! 🌟 His take on the 'Enums as Bytes' craze really put things into perspective. It's fascinating to see how a seemingly minor optimization, like treating enums as bytes, can be dissected to reveal deeper implications on code maintainability and performance. Nick does a great job explaining why context is key and how what works in one scenario (like optimizing for database storage) might not be a silver bullet for every application. It's these nuanced discussions that make software engineering so intriguing! Thanks for another thought-provoking video, Nick. Keep demystifying those LinkedIn tips! 💻🔍 #CodeCop"
_This is definitely not ChatGPT speaking - where did you get that from!?_
Okay, the Key Message is, don't overoptimize. Got it, thank you Code Cop 🙂
That and there are likely other places in the application one can make a difference in before this level of nitpicking ought be considered.
@@Dojan5 okay the point you mentioned, i overheard, is overheard an english Word? I mean, i did not get it, but thanks.
I think it still missed a key point. unless you are packing your bytes into one register etc. the OS still allocates higher amount based on if it's 64-bit or 32-bit runtime.
I'd like to see say 10000000 allocations or something and follow the size difference between byte and say long if you are running 64-bit. My hypothesis is you wouldn't see a difference at all.
I do a fair amount of typing enums as bytes - I'd go so far as to call it "fairly common" in my code. HOWEVER, it's typically used when doing things like defining device registers when I need to serialize or deserialize it from/to a bus where the size matters due to data alignment, not for any sort of savings. So defining an enum as a byte makes good sense in specialized cases, but if you're writing those cases, you already know why and wouldn't call it a "tip" but a necessity for those narrow cases. The advice, as given, is just dumb.
I've only used a non standard backing type for an enum a few times, but every single time it's been UInt64, not (S)Byte or (U)Short.
Why? Because I had more than 32 values and it was being used as Flags, so I had more than 2^31 theoretical actual values.
You might be thinking "But in what world do you ever need that many flags?!"
Letters and numbers. I needed to store all of the letters and numbers that were present in a string as a precalculated value that could be checked for maximum similarity between strings in O(1) time using some bit ops. Made the program about 7 or 8 orders of magnitude faster, and all it cost was O(n) additional space complexity.
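A minimal sketch of that kind of precalculated letter/digit mask might look like this. The bit layout, the `SymbolSet` name, and the "count of shared symbols" similarity measure are assumptions for illustration, not the commenter's actual code.

```csharp
using System;
using System.Numerics;

[Flags]
public enum SymbolSet : ulong
{
    None = 0,
    // Hypothetical layout: bits 0-25 stand for 'a'..'z', bits 26-35 for '0'..'9'.
    // 36 flags do not fit in a 32-bit backing type, hence ulong.
}

public static class SymbolSets
{
    // O(n) pass over the text, done once per string.
    public static SymbolSet Of(string text)
    {
        ulong bits = 0;
        foreach (char raw in text)
        {
            char c = char.ToLowerInvariant(raw);
            if (c >= 'a' && c <= 'z') bits |= 1UL << (c - 'a');
            else if (c >= '0' && c <= '9') bits |= 1UL << (26 + (c - '0'));
        }
        return (SymbolSet)bits;
    }

    // Constant-time comparison: one AND plus a population count.
    public static int SharedCount(SymbolSet a, SymbolSet b)
        => BitOperations.PopCount((ulong)(a & b));
}
```

For example, `SymbolSets.SharedCount(SymbolSets.Of("abc1"), SymbolSets.Of("cab9"))` gives 3, because the two strings share the letters a, b, and c.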
Micro optimization ?
I just set them to match the database type.
If it's a 'tinyint', even when it makes 0 sense, then the one in code is a byte.
if it's a 'bigint' for a classic non-flag enumeration, then it's a long, even though there is no way I'd ever get anywhere close to exceeding 2 billion.
Unless you're dealing with many thousands of usages and objects, this level of optimisation will save you less than the runtime uses to even represent your enum in the application (with interned strings and everything), and that's assuming you're paying attention to alignment (typically 4 or 8 bytes) or abusing [StructLayout] enough to actually even benefit from such optimisations.
I'm guilty of this myself. I once set up a default conversion from enum to tinyint in EF, thinking it was a small but nice optimization.
I'd forgotten that one of the enums had negative values, and this micro optimization cost me about 2 hours of debugging, trying to figure out why some value was 252 (or something like that) out of nowhere 😅 Angry at myself, I reverted it back to integers
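The 252 is consistent with a negative value being narrowed to an unsigned byte. A small sketch of that effect, with a made-up `Outcome` enum; whether the conversion in that project behaved exactly like an unchecked cast is an assumption:

```csharp
public enum Outcome
{
    Failed = -4, // hypothetical negative member
    Ok = 0,
}

public static class TinyIntPitfall
{
    // An unchecked narrowing keeps only the low 8 bits, so -4 becomes 252.
    public static byte Narrow(Outcome value) => unchecked((byte)value);
}
```

`TinyIntPitfall.Narrow(Outcome.Failed)` returns 252, since -4 is `0xFFFFFFFC` as a 32-bit integer and only the low byte `0xFC` survives.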
I have inherited code where people have done this. Just as you say, there are hundreds of places in the same repository which could be optimized to save more space or lead to better performance.
in very rare cases, when implementing some pre-defined APIs, I had to use a different underlying enum type, but it was `uint`. I have never ever (consciously) used anything smaller than `int`.
Also, while it technically does save you 3 bytes per value, I *think* it can harm the performance, because (most) modern CPUs work better with 4 or 8 byte values than with single bytes.
Did something like this except i used ulong for the enum. And the enum was [Flags] decorated. And we were actually running out of flag values.
I used “long” on a flags enum once, haven’t needed to go the other way before. Never say never, but also never say always!
Thanks for sharing your time and knowledge,
Nick - I think you need to wear a blood pressure monitor and show us the before/during reading when you do the Code Cop videos 😂
I remember - a long time ago now - the only place where you could grow your understanding was the language specification, the quarterly MSDN, the library vendor, your more experienced peers (if you had them) or a book or course from a highly regarded SME (if such a thing existed). If all that failed - do the time - experiment and figure it out - the ‘science’ bit of computer science.
The internet is sometimes a great place for fact - and equally a dumping ground for positively re-enforced nonsense. It’s really unfortunate that cut/copy/paste coding became prevalent - but remember - our industry enabled that! With that often comes the lack of desire for many to want to understand the how and why. If something you find solves your problem PLEASE take the time to appreciate why or if it even really does.
What exciting times we live in…maybe its time to go re-watch Mike Judge’s 2006 film Idiocracy…
LLM vendors...you are 100% not blanket using these sites for your training models - right ?
I personally would only do this on the very rare occasion that I would need the enum value to be a byte, either because the message required it or the db, and that would be only to avoid casting and assure type safety. But I've been developing for 30 years and don't even need one hand to tell you how many times I've run into that situation.
Okay I get it and "almost" agree with you that this advice is for the most part total garbage.... but what if I say there are legit reasons to change the type of an enum?
Yes there are:
If I want to lay out data in unions that contain different data based on an "enum" type, but want the union to have the same length, say 16 or 32 KB. In that situation, the size of the enum matters a lot due to data alignment and cache coherence.
If I just use the default enum (32-bit integer), then I am wasting 24 bits, because I most likely don't have enums with more than 10 values. In that case using a byte is totally legit, and then using another byte and a short after it to lay out the data efficiently can make a difference in performance and even in stability for multi-threading use cases, due to false sharing or cache-line boundary issues. Also, there are Windows API functions that use shorts or bytes for enum values, so in that case you had better use an explicitly sized enum as well, so that it matches the native definition.
Hello Nick. Great content. Just a minor stupid thing: I have done some courses on SQL Server query tuning, and as a principle it is actually advisable to use the most compact possible representation for your data. The reason is not that it will take less disk space. But usually, when you do have to do a query, the engine will load pages of data, and simply put, the bigger the size of a record, the fewer records you will be able to retrieve in a single page, meaning more IO. And you generally try to reduce that logical IO as much as possible
Now, having said that, you're completely right. We're talking about 3 bytes for each enum. Unless your table is basically full of those enum flags, we're talking about peanuts. I bet there will be better optimizations
Yeah; at 1 million rows, the 3 bytes difference is going to save a "whopping" 2.86 MB. The principle of relational databases was indeed always about using the tiniest possible footprint. Like in 1970, when 100 MB would go for $26,000. When relational databases were invented we were concerned about storage. That goes for both disk and memory. Therefore using the smallest types and removing duplication was KING.
We are not there anymore. Today, most of the SQL Server users out there just create a database with the default settings. A VERY, VERY small subset is actually able to benefit from changing the page sizes and picking data types so that data lines up and reads from storage line up with your system's data bus size and your L1, L2 and L3 caches, for a perfect, zero waste read.
With all the other factors of not knowing what hardware and storage your files end up on in the cloud... the byte level for some numbers is the worst place to look. In light of the advice, you can imagine someone might go and create TINYINT status columns right beside an NVARCHAR(500) status message column, or some other NVARCHAR columns elsewhere that are "oversized" just in case? Not many know this either, but the length is not the number of characters but the number of byte pairs. An NVARCHAR(200) is a 400 byte allocation.
@@paulkoopmans4620 quite noob-common with EF is to use string without specification resulting in nvarchar(max) columns. THAT is stupid.
@@daniellundqvist5012 Yeah.. That too. EF should come with its own set of code analysis to stop people from making those.
@@paulkoopmans4620 Absolutely. I have worked with DBAs on this kind of optimization, and I have seen many talks from really good people whose focus is just that, to optimize DBs. And while the principle generally stands, you always have to ask yourself how big the effort is for those meager gains. There are usually bigger issues to tackle. That being said, I have also seen some tables with 50 flags like these.
I'm in a GC'd, vm based language running in a container... every bit counts!
This one isn't so terrible compared to the other ones, imo. You're never going to reach anywhere near 256 different enum values for many uses.
However, I think you don't save RAM in all cases due to memory alignment stuff that I admittedly know little about.
I used byte and short enums inside structs that model packets where there are protocols
This advice comes from back in the day when we had 2MB RAM to work with.
Hi Nick, I keep watching your videos as I can learn useful things from them. I agree, that the majority of the time you should be fine with 32-bit storage for an enum and only a few special cases need to limit it to say 8 bits. However, I think you got a bit too emotional on this one for no obvious reason. I think it's more professional if you keep it cool.
Thank you and keep the good content flowing!
We should get CodeCope series from such creators on LinkedIn as a response to Nick's video series
I do it rather the other way: when going to the database, the enum values become strings (not the internal C# name, but an explicit, well-defined one, e.g. via attribute). I consider the numeric value of an enum an implementation detail.
We also already have the amazing [Flags] attribute to do bitwise AND/OR'ing on enums.
I've done this, but in that case my code was interacting with a microcontroller that has only 512 KB of memory, and the C# code serialized a pretty big data structure that contained a lot of enums and stuff. But it's hard to imagine any other kind of situation except for embedded stuff in 2024.
I use it in an IBM PC emulator, but that's like the only other use case.
On one hand this advice *could be good in some situations*, but in a lot of places fields in a struct or class are going to be aligned on a 4-byte boundary, so you don't end up saving any space. Using a TINYINT or BYTE instead of an INT in your database could be a win over large numbers of rows, but there's often a far bigger gain to be had by limiting your strings from a default NVARCHAR(MAX) to something more reasonable (which many people don't bother to do, because EF just makes all strings NVARCHAR(MAX) by default). Like many "optimization" opportunities you really need to profile and measure to make sure you are actually saving something, and I imagine a lot of the people sharing this "advice" haven't profiled or measured anything. You cannot optimize by assumption alone.
This video has so much frustration and sadness for the LinkedIn community. I could feel the range of emotions and helplessness 😛
The byte might get padded, anyways, because things need to be aligned in memory.
I have some WinAPI calls (native stuff), where i have to marshal some c byte constants, which i converted in a c# byte enums. And here, the size is important, cause otherwise memory structure is not matching anymore. But of course, this is a special edge case.
Actually, I did extend byte for my enums when I encoded data and codec expected my value to be 1 byte integer.
I used to do that in SQL with EF, but now store strings in MongoDB.
This may make sense, possibly only in Unity ECS, where entity size is limited to 128 bytes
On 32/64-bit processors an 8-bit value takes a 32-bit register anyway (if I remember correctly). So it will give you no performance boost. If you have a very large number of records, it can save some memory.
afaik one should always use the native data type for processing (enums) to get best performance. if one really want to one can convert back and forth to get an optimal memory footprint when storing or serializing, but don't do that in memory. it hurts performance.
Such a great video!
I don't quite know if it is true. But a smart person told me once that computers are heavily optimized for ints, so using smaller types could be seen as an anti pattern.
well, C# uses int32 all over the place, you will most likely cast it to int32 anyway if you really need it as numeric value for something
I don't think most people know that values in memory have to be aligned, and registers in cpu have specific sizes. Byte doesn't actually save any memory, and in some cases might actually be slower. Also, I store the string values of my enums in db!! lol. Space is cheap. Your time is not. If you need that level of optimization C# is not your tool.
It's advice like this that makes me want this series to go the name and shame route.
It might be just me using a different/wrong practice, but when i use Enums, i usually even use negative numbers to show "Bad" states/values. This makes handling and reading the enum a bit easier for me at least, but would be impossible or at least much more confusing to do with a 0-255 range.
Negative values for error state seems like a good idea to me.
if you're being consistent then it's a good practice
I'd argue that the disk/network usage for the additional " : byte" in source control across a company will outweigh the benefits.
Converting all of your giant enums to tiny sized enums doesn't even give a guarantee it will occupy exactly this desired TINYINT in the process memory :) Modern CPU doesn't give a shit about LinkedIn advice, it is designed to be super optimal dealing with machine words as "default" data type. It is more likely that you will introduce few more redundant machine instructions like movzx/movsx in the emitted code by enforcing your enum storage type to the System.Byte type. So, good luck to all followers 😅
Agree 100%. The comment at 5:47 reads completely like it was generated by Chat-GPT, it has the same tone, structure and verbosity.
"Your use of a byte as the underlying type for an enum is a sophisticated and elegant approach to solving your desire to feel cool like those hard core bit-banging low level programmers you have an inordinate and unexplainable admiration for, even though they are slightly more muskier than you. Please like me, I'm your best AI assistant, friend, soulmate."
Using byte or short for enums may harm performance, because the CPU may need to adjust it to int32 and back to byte every time.
Sometimes I have used a long-based enum with the [Flags] attribute, when more than 32 flags were needed
But never byte or short.
CPUs work with int32.
I was on linkedin yesterday and saw this advice, I was telling myself is it gonna go on code cop or not, and now just opened youtube, first recommended video. Lol
I will assume you have never worked with IoT devices or, in Xamarin, created a large grid. I have done both, and on memory-constrained devices this could be helpful, but before you optimize, profile and see where you may need to make improvements.
If I just need an enum in a couple places probably not worth it, but if I am transmitting data from an edge sensor to a controller it may be useful.
In a game, I may use an enum to specify which type of terrain in a cell and I may have millions of these, so saving a few bytes could be helpful, esp when I want to save the map, to reduce the size of the file.
You shouldn't optimize prematurely but there are times when something like this may be useful and just shooting down the idea without considering when it might be useful is just bad form, IMO.
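For the terrain-map case, a back-of-the-envelope sketch could look like this. The enum names and the cell count are made up; the point is only that array elements are stored back to back at their own size, so this is one place where the backing type shows up directly.

```csharp
using System.Runtime.CompilerServices;

public enum TerrainSmall : byte { Grass, Water, Rock }
public enum TerrainWide { Grass, Water, Rock } // int-backed by default

public static class MapFootprint
{
    // Approximate payload of a T[] with `cells` elements, ignoring the array header.
    public static long PayloadBytes<T>(int cells) => (long)Unsafe.SizeOf<T>() * cells;
}
```

`MapFootprint.PayloadBytes<TerrainSmall>(1_000_000)` comes out at 1,000,000 bytes and `MapFootprint.PayloadBytes<TerrainWide>(1_000_000)` at 4,000,000. Whether that difference matters is still something to measure against the rest of the application.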
You probably shouldn't optimize to that level. But when your DB type is defined as tinyint (byte) I would better follow the same memory layout in code, to avoid any marshaling problems.
I once made the terrible mistake of storing the weight of a person in a byte in the database, and then someone wanted the weight in lb. Never trying to save a few bytes again
LOL, I wasn't prepared to hear Nick's "yEaH THaTs A GoOD AdVIcE HEeHeeheEhe". He's normally so eloquently spoken, I did a double take hearing him speak that way. Good video
People suggesting using non-word numeric types to save memory don't understand how memory is allocated, aligned and how reading/writing is optimized.
Enums have always been an irredeemable curse that would require a major breaking change to fix.
The fact that people have to write source generators to improve performance and memory use is already a sign that it is.