The Problem With UUIDs… And How To Solve It

Поделиться
HTML-код
  • Опубликовано: 8 сен 2024

Комментарии • 97

  • @MilanJovanovicTech
    @MilanJovanovicTech  Месяц назад +3

    Want to master Clean Architecture? Go here: bit.ly/3PupkOJ
    Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt

    • @iteospace
      @iteospace Месяц назад +1

      Milan, How about "long" Snowflake Id next?

  • @P1n3apqlExpr3ss
    @P1n3apqlExpr3ss Месяц назад +6

    I'd highly recommend Jeff Moden's Guid fragmentation talk that dives deeper into this topic looking at page splits, fill factors, and gradually evolving data. It offers a perspective that is much different to the high count one-off insert/read demo shown here, and should be closer to the reality of many production systems

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      Link it here?

    • @P1n3apqlExpr3ss
      @P1n3apqlExpr3ss Месяц назад

      @@MilanJovanovicTech ruclips.net/video/rvZwMNJxqVo/видео.htmlsi=TJ3KhHMFwows0C_4

    • @EddieDemon
      @EddieDemon 28 дней назад

      @@MilanJovanovicTech ruclips.net/video/rvZwMNJxqVo/видео.html

    • @dotnetfanboy
      @dotnetfanboy 23 дня назад

      @@MilanJovanovicTech ruclips.net/video/qfQtY17bPQ4/видео.html

    • @goldenwraft5362
      @goldenwraft5362 15 дней назад

      I believe it is this one ruclips.net/video/qfQtY17bPQ4/видео.htmlsi=Ls6A110rSMKR5Qbn @MilanJovanovicTech

  • @WolfspiritMagic
    @WolfspiritMagic Месяц назад +13

    While everything in the video is true, there is one thing that somehow nobody mentions when talking about these time based ids: Security. In my opinion there are cases where exposing the creation Date/Time of a database record can lead to security issues. For example with these kind of IDs users will always be able to figure out if they have been created before or after another user if they get their IDs. That's sometimes the reason why you use GUIDs instead of auto incrementing numbers in the first place.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +1

      Makes sense, but a timestamp leaks less information than an INT/LONG id.

    • @Whojoo
      @Whojoo Месяц назад +1

      Asume I know too little about security. Why would it matter if I as a user know if my account is created before or after an other user?

    • @WolfspiritMagic
      @WolfspiritMagic Месяц назад

      @@Whojoo Before/After might not a big issue but it's still data that you might not want to leak. For example if you have a game and User A invites User B to your game, might even spend some money for User B to give them a starting credit and then figures out based on the ID that User B was already member of that game before User A.

    • @WolfspiritMagic
      @WolfspiritMagic Месяц назад

      @@MilanJovanovicTech Actually I'm not sure about that. I think it depends and a timestamp might in some cases be even worse. Lets say you have a database of patients for a doctor treating a specific illness. Knowing the timestamp could give people insight into when that person actually got that illness.
      Or lets say you're looking for another job and join a job site for that. Your current employer finds you on that website. With an ID in itself of int/long (and no other IDs to compare) they won't have much informations (unless the website provides that to them) and you could be member of that size for years. With an ID based on a timestamp they will be able find out that you just recently started looking for a new job.
      Also most of randomness is based on time. Having the exact creation time in milliseconds with the entry might in some cases help an attacker to limit the amount of bruteforce required. That's for example how they recently hacked a crypto wallet.

    • @JohnathonSmith03
      @JohnathonSmith03 Месяц назад

      @@WolfspiritMagicstill don’t see how making it harder for users to lie and deceive each other would be a security issue… not to mention it isn’t even reliable at all short scale anyway otherwise time based keys around be guid anyway. So you get a small amount of ordering info which is can think of very few circumstances being an actual security issue. Just makes it harder to lie about sequencing.

  •  Месяц назад +6

    This video is really great, and there are so many bad videos on this topic. The only thing I don't think you mentioned (or I might have missed it), is that SQL Server saves and order a UNIQUEIDENTIFIER in a different byte order than C# (and some other databases). That means even with UUID 7, if we use this the UNIQUEIDENTIFIER datatype, there will be fragmentation issues. Storing as string or binary will not have the same type off issues though.

    • @RobertMcLaws
      @RobertMcLaws Месяц назад +3

      That only happens when you have a clustered UNIQUEIDENTIFIER. Create the ID column as NONCLUSTERED and create a CLUSTERED INDEX on the DateCreated instead, and your database will have almost the exact same performance as INT IDs.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +3

      "there are so many bad videos on this topic" - I'm not sure why some videos focused on the how fast the ID creation time is. 🤷‍♂️

    • @dcernach
      @dcernach Месяц назад +1

      Never thought about it. Our DBAs complain a lot about our using GUID as a PK. If I understood correctly, you meant to create a table with an ID as a UNIQUEIDENTIFIER and add a column like CreatedOnUtc. Then, create a CLUSTERED INDEX using both columns, or create a clustered index on CreatedOnUtc and a non-clustered index on ID?

    • @daveanderson8348
      @daveanderson8348 Месяц назад +2

      @@RobertMcLaws How about using that ID column as a Foreign Key in other tables? What is the impact of that?

    • @daveanderson8348
      @daveanderson8348 Месяц назад +1

      @@dcernach The GUID type column as PK should be non-clustered index. Otherwise the fragmentation of GUID would have negative performance impact when your table has at least millions of rows.

  • @robadobdob
    @robadobdob Месяц назад +3

    I think using UUIDs in databases is one of those topics which probably doesn't have a measurable impact to most of us at the scales we work at.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +1

      Probably true

    • @AdonikamVirgo
      @AdonikamVirgo 17 дней назад

      We've had issues in a 1mil+ row events table; Jeff Moden's talk is verbose, but boils down to not using the defaults - 90% fill factor for random ids causes expensive page splits, clustered PK is a waste and sequential inserts are not the enemy UNTIL you update and grow the row in some way. ruclips.net/video/rvZwMNJxqVo/видео.html

  • @ucretsiztakipci6612
    @ucretsiztakipci6612 Месяц назад +1

    Adding Postgress to this benchmark would be interesting. Wher is the repo?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      Was planning to, but didn't have time to add a Postgres benchmark

  • @jodainemoore8300
    @jodainemoore8300 Месяц назад

    Great Videos, Please can you do one on Composition and Or with Strategy Pattern, best struggling with these for months.

  • @ahmettarksahin3777
    @ahmettarksahin3777 Месяц назад

    When we use ULID or UUID as the primary key in the database, should we check whether ULID or UUID has been created before when adding data to the database?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +1

      No need, they're random enough that you're unlikely to see a collision. And even then, you'd get an exception for duplicate key values.

  • @RomanTurovskyy
    @RomanTurovskyy Месяц назад +1

    If using GUIDs introduces less then 2x slowdown I do not see this as an issue. 10x performance penalty would have been a problem, but 2x can be easily covered by the modern hardware for the benefits GUIDs provide.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +1

      True. Most problems in engineering are amplified by scale.

  • @thedacian123
    @thedacian123 29 дней назад

    Nice,but about database sequences?

  • @kenwarner
    @kenwarner Месяц назад

    It's probably just me but having the SSMS toolbars on two rows instead of one stuck out like a sore thumb. I always combine to a single row whenever I have to install it. 😬

  • @FranciscoCastellanodePablo
    @FranciscoCastellanodePablo Месяц назад

    Be carefully with int, it can only support database generation. I have faced problems when your application (not planned in case) began to support external applications that has there own database like mobile applications that need to work with no connection.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      There's also the HiLo strategy

    • @FranciscoCastellanodePablo
      @FranciscoCastellanodePablo Месяц назад

      @@MilanJovanovicTech good point, never the less in my experience its better to go with UUID, this Hilo and others strategies that could exists could be needed if UUID creates other issues like performance as mention in the video. But for the majority of the cases I presume UUID it is the better option.

  • @vijayarajan-bt5fk
    @vijayarajan-bt5fk Месяц назад

    15.26 where to get this benchmark...? Shell script;

  • @theeusftn
    @theeusftn Месяц назад

    But why I need to use timestamp in my uuid? i dont got it

  • @casperhansen826
    @casperhansen826 Месяц назад

    The problem with ULID it is not compatible with UUID/GUID so it will require a lot of changes to make it work in an existing system
    If you let the SQL server database create the guid it will be generated in sorted order and work just as well

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      SQL Server isn't the only database people commonly use, so we should look at this from a broader perspective.
      I'll upload a Postgres benchmark soon!

    •  Месяц назад

      If you use NEWID and not NEWSEQUENTIALID, then you should have the same problem if MSSQL generates it. NEWSEQUENTIALID is closer to what UUIDv7 does, but compatible with the UNIQUEIDENTIFIER type without getting to much fragmentation. ULID is 128 bits, same as UUID so not sure what you mean with "not compatible", there is always the little-big-endian issues when working with UUID/GUID/ULID, so if your working on a bit-level, you can get confused really fast... For the text representation, they are not presented in the same way. But if you store it as CHARs and not BINARY or UNIQUEIDENTIFIER, then you have about doubled the size already... if that's important for your usecase.

  • @DavidSmith-ef4eh
    @DavidSmith-ef4eh Месяц назад

    do they slow down queries and db in general? an eleven length unsigned int seems much more preferable than a 32-length char entry lol. I assume the index size and look up speed are much slower on the latter, on huge databases.

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh Месяц назад

      although, on the other hand, you don't need a created_at column anymore. I can see msyelf using them, for things that require unique ids accross a system

    • @Rob_III
      @Rob_III Месяц назад +2

      You shouldn't store a ULID (or (G/U)UID for that matter) as (var)char but as (G/U)UID. And, yes, a ULID can be stored as a (G/U)UID.

    • @RobertMcLaws
      @RobertMcLaws Месяц назад

      Using the NONCLUSTERED UniqueIdentifer + Clustered DateCreated is nearly as fast as integers and has been tested on huge databases with negligible performance impact.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад +1

      They have an impact on table/index size, for sure

    • @casperhansen826
      @casperhansen826 Месяц назад +1

      It's 16 bits and the main benefit is that it can be generated on the client and written to the database.
      The problem with GUID and UUID is that the order of them is more or less random so writing it to the database as a clustered index will cause some heavy manipulation to make it work. ULID should solve that problem, still generating unique values in almost sorted order making it possible to just place the new record at the end of the table

  • @marikselazemaj3428
    @marikselazemaj3428 Месяц назад

    10:45 What are those icons before class names?

  • @user-pr4yj1wr1s
    @user-pr4yj1wr1s Месяц назад

    what is the best order to study clean arc

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      There's a million sources, watch them all. But the source of truth is Uncle Bob's book.

  • @user-xm7sh3vw8o
    @user-xm7sh3vw8o Месяц назад +2

    SnowFlake ldGenerator

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh Месяц назад

      lol, thats an unfortunate name. does it contain the pronoun of the id as well?

    • @Rob_III
      @Rob_III Месяц назад +1

      @@DavidSmith-ef4eh Snowflake predates woke stuff by at least a decade; Twitter introduced it in 2010.

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh Месяц назад

      @@Rob_III yeah, read upon it. seems to be used across all the major socials, apparently. must be fastest I guess.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Месяц назад

      Cool, will see if I can include it in some future discussions

    • @user-xm7sh3vw8o
      @user-xm7sh3vw8o Месяц назад

      @@MilanJovanovicTech what? feel ,long long ago

  • @user-xm7sh3vw8o
    @user-xm7sh3vw8o Месяц назад

    kubernetes ,Waiting