Faster database indexes (straight from the docs)

Поделиться
HTML-код
  • Опубликовано: 4 авг 2024
  • НаукаНаука

Комментарии • 320

  • @PlanetScale
    @PlanetScale  10 месяцев назад +13

    If you enjoyed this, we also have a free MySQL for Developers course with 50+ videos. Thanks for watching! planetscale.com/learn/courses/mysql-for-developers/introduction/course-introduction

  • @aegif
    @aegif 10 месяцев назад +113

    As a student that has taken a dbms course in mysql and hated every part of it, I can absolutely tell that I would love it had it been even a bit like this video and series in general. Thank you Aaron, this content is genuinely incredible and is making me want to actually learn the details of mysql.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +19

      I can't tell you how happy this makes me. It's such a shame to me that so many interesting things are taught in such an uninteresting way.

    • @ahmad-murery
      @ahmad-murery 10 месяцев назад +9

      @@PlanetScale Teaching is a skill that not every expert possesses (assuming they were expert in the first place)

    • @PlanetScale
      @PlanetScale  10 месяцев назад +4

      @@ahmad-murery Extremely good point

    • @godonholiday
      @godonholiday 10 месяцев назад +1

      Aaron, this approach is refreshing. Hope you get to some focused use case series, like SQL for Product Managers!

    • @reikooters
      @reikooters 10 месяцев назад

      You'll have to learn it one day. There's no system in the world that doesn't use data - it's the entire point of why we build applications. I used to fall asleep in my database classes when I was in university (15 odd years ago) and my mate used to wake me up when the class finished because the lecturer was terrible. But after graduating I spent a lot of time teaching myself about databases for the reason I mentioned earlier.

  • @remedix
    @remedix 10 месяцев назад +59

    I don’t usually post on videos but the quality and the details on these is amazing! Thank you Aaron! Keep them coming!

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      Thank you so much, I really appreciate you saying that. We're definitely gonna keep em coming 🫡

    • @andynn6691
      @andynn6691 10 месяцев назад +1

      Agreed! I don't even use or like MySQL very much, but the content is often more broadly applicable.

  • @transcendtient
    @transcendtient 10 месяцев назад +24

    I appreciate you talking about collisions because the one case that you do have where it happens would be a nightmare if you didn't plan for it no matter how small the chance may be.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +5

      Totally. Especially if you use CRC32 which is going to have massive amounts of collisions!

    • @Pilikio
      @Pilikio 10 месяцев назад +4

      Also when using non cryptographic hash function you have to assume that a malicious user may enter data to ptoduce hash collisions on purpose.
      In best case this will lead to an annoying bug, in worst case a severe security vurnability.

    • @sealoftime
      @sealoftime 2 месяца назад

      @@Pilikio It can very-very theoretically cause Hash DoS attack

  • @ayeshk5997
    @ayeshk5997 10 месяцев назад +1

    I've been using MySQL for ages now, and yet you still come up with new things and ideas (like this one and the Geo box combined with haverseine calculation). Keep it up!

  • @reikooters
    @reikooters 10 месяцев назад +16

    You missed one big thing. When you add the generated column, you need to also specify the "stored" keyword. This means when inserts/updates are made, the generated column will be computed and stored in the table at the time of writing. The default behaviour is "virtual", which means the generated column is not stored with the data and instead must be computed at the time of reading. This means every time you select from the table, mysql needs to recompute the hash for every row in order to find the one you want, which kinda defeats the purpose of the whole exercise.

    • @monad_tcp
      @monad_tcp 10 месяцев назад

      It all depends on how fast is hashing vs disk I/O.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +10

      In this particular example the data is stored even though it is a virtual column. Because we put an index on it, those values are calculated and then written into the btree. Calculating the hash is pretty cheap, but if you have an expensive generated column it might be worth declaring it stored instead of virtual.

  • @aoe4_kachow
    @aoe4_kachow 10 месяцев назад +18

    I wish the results were rendered in a monospace font so that the md5 hashes lined up neatly

    • @PlanetScale
      @PlanetScale  10 месяцев назад +7

      You and me both 🫠

  • @alexeybalandin4676
    @alexeybalandin4676 10 месяцев назад +1

    I'm coming from your planetscale mysql db course and just gotta say - it's really amazing. Just the perfect level of depth, engineering knowledge and ease of use. I've been mostly the ml engineer guy through my career, now I'm switching to full-stack role and your course is really amazing for me to delve into optmizing the database shenanigans :) Thank you very much!

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      I'm so glad to hear that! Thank you for letting me know. I put a lot of work into that course so I'm pleased that you've enjoyed it

  • @simonhartley9158
    @simonhartley9158 10 месяцев назад +13

    Lossy compound indexes seems such a good idea, that it should become a standard built-in feature at some point, e.g. a particular index type. You could then have index types like Postgres' GIST which will handle the deduping automatically.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +5

      Yeah that's an interesting idea. It would be neat to see that built in

  • @ridass.7137
    @ridass.7137 10 месяцев назад

    So well explained with the right pace and example, its incredible. Thank you!

  • @ihzakarunia2408
    @ihzakarunia2408 10 месяцев назад

    As developer that always use mysql, you very did a great job explaining this topic. Thank you very much

  • @jhechtf
    @jhechtf 10 месяцев назад +2

    "remember if you hash passwords with md5, straight to jail" the delivery got me to laugh out loud, take your like.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Got em! 😂 Glad you liked that part

  • @saadkhan3401
    @saadkhan3401 10 месяцев назад

    My man this is is very greate and the niche you choose to post this video perticularly is very great the quality and everything I hope you make this as a series as this will help everyone a lot ❤

  • @Hislodin
    @Hislodin 10 месяцев назад

    I can see how this could be very useful in optimizations. Thank you for your practical guide on deeply doc-founded knowledge.

  • @n4trojan
    @n4trojan 10 месяцев назад

    That's was definitely a great piece of knowledge I got in just 13 minutes. Thank you!

  • @mibennettc
    @mibennettc 9 месяцев назад

    I saw this video and it literally solved a problem I had at work the next day. Stellar job Aaron!

    • @PlanetScale
      @PlanetScale  9 месяцев назад

      Oooo no way! Can you give me any more hints? What'd you end up hashing?

  • @anderskozuch7838
    @anderskozuch7838 10 месяцев назад +2

    You are awesome man:) I am self learning web development and you are such a nice source for wholesome and humouristic learning!

    • @PlanetScale
      @PlanetScale  10 месяцев назад +2

      I love to hear that! Thank you for letting me know. And good luck on your web dev journey!

  • @rickmyers7627
    @rickmyers7627 7 месяцев назад

    Excellent Aaron! This technique can also be used to create a Cache key to store data in memory (or on disk), if you need to "roll-you-own" caching mechanism. Basically, sort/uppercase your arguments, then MD5 them together, and then use a hash array to map that MD5 to the response from the server. We do that with address lookups so we aren't constantly going to Google for geocode data (and eventually getting charged). Once we see an address we cache Google's geocode response and we never have to do the lookup again, thus saving money down the road.

  • @gazsi72
    @gazsi72 8 месяцев назад

    I haven't learned such a useful trick from a youtube video in a long time, thx

  • @jimothyus
    @jimothyus 10 месяцев назад +5

    These videos are really high quality. You are doing an excellent job

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      Thank you so much for saying so

    • @JohnRoux
      @JohnRoux 10 месяцев назад

      Yeah so much this!
      When I see a new video on my feed, I watch it before all my usual content because honestly, this is just so good!

  • @StingSting844
    @StingSting844 10 месяцев назад +1

    As someone who is just getting started with databases, this is incredible! Thank you A Aron

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      A A Ron 😂😭 You're welcome!

  • @pixtweaks393
    @pixtweaks393 10 месяцев назад

    Thanks for sharing this amazing idea, very well explained, thank you!

  • @postnick
    @postnick 7 месяцев назад

    Content like this I love - I hope I can find a useful way to do this in my position but my indexes are always Material + location + date usually.

  • @TroyNiemeier
    @TroyNiemeier 9 месяцев назад

    You got my sub for the "Instead of hanging out with friends, ....we continue to read " line. :)

    • @PlanetScale
      @PlanetScale  9 месяцев назад

      Haha glad you liked that one 😂

  • @budidarmawan6959
    @budidarmawan6959 10 месяцев назад

    i didn't even use MySQL, yet i was pleasantly entertained by your video (i believe some of the knowledge is applicable to other databases as well)
    thank you, keep up the good vid 💪🏻

  • @BMRG14
    @BMRG14 10 месяцев назад

    So many great tips! Thank you very much!

  • @nawaahkh
    @nawaahkh 8 месяцев назад

    Just 😲, a huge thank you for all these information in all videos.

  • @thereasonableprogrammer4921
    @thereasonableprogrammer4921 6 месяцев назад

    As someone who likes reading the MySQL docs, printing them out and binding them is hardcore 😂 props Aaron 🎉

  • @user-cf9ir4gw2c
    @user-cf9ir4gw2c 9 месяцев назад

    Super. I remember dozens of usecases, where this implementation would of saved the day.

  • @kaprikornz
    @kaprikornz 10 месяцев назад

    Quality videos man! Keep it up

  • @addanametocontinue
    @addanametocontinue 10 месяцев назад +1

    Great content. Got yourself a new subscriber!

  • @k1chko
    @k1chko 10 месяцев назад +1

    You do great work! Tough to find solid material to point developers to.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      I appreciate that very much

  • @jamaly87
    @jamaly87 10 месяцев назад +1

    Dude your content is 🔥.

  • @pranaygoel
    @pranaygoel 10 месяцев назад

    Hi from India!
    It's the best thing I came across today.

  • @simonnjoroge933
    @simonnjoroge933 9 месяцев назад

    Amazing video, great explanation and also, nice sql tool you have there. More videos 🌟

    • @PlanetScale
      @PlanetScale  9 месяцев назад

      Thank you! Got a big one coming out tomorrow on a similar topic 🤐

  • @CRBarchager
    @CRBarchager 9 месяцев назад

    Awesome video. Now I just need to find the T-SQL version of your channel and I'm complete :D

  • @youssefhossam9077
    @youssefhossam9077 10 месяцев назад

    I really learned something today, thank you man.

  • @andyvirus2300
    @andyvirus2300 10 месяцев назад +2

    Thank you for talking about collision, I was wondering about it even if I knew this wasn’t really an issue !

    • @PlanetScale
      @PlanetScale  10 месяцев назад +3

      It's always fun to look up collision probabilities!

    • @andyvirus2300
      @andyvirus2300 10 месяцев назад +1

      @@PlanetScale not as much as your videos, and I’m learning a ton.
      Please keep up the good work !

  • @rrd_webmania
    @rrd_webmania 7 месяцев назад

    A gem again. Thanks

  • @TuralAsgar
    @TuralAsgar 7 месяцев назад

    Good for you, Aaron!

  • @MonokelJohn
    @MonokelJohn 10 месяцев назад +4

    There are some pitfalls when using concatenated rows as the key, especially when there is no separator, but also when columns are nullable.
    What I'm thinking about is a row with column values 123 and 45 and another row with values 12 and 345, which would end up having the same concatenated value (when using just concat), and therefore the same hash. You could still end up in a similar situation when using concat_ws when some of the columns are nullable. We saw that the null values are completely omitted, so two sequential nullable columns could end up with the same hash, when a value is present in either one or the other column. There are even more ways where null values in some columns could result in the same input for the hash calculation. If the null values were converted to an empty string, which would create a separator representing every column, which would solve that problem, it would however be hashed the same as an empty string, or any other string you would use to represent null, but it's the best solution I can think of.
    That's why I think it's really critical to do the full comparisons described at 11:50.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +2

      Yup this is a great argument for adding the extra conditions

    • @lyuboslavilov
      @lyuboslavilov 10 месяцев назад

      Rule of thumb - absolutely always do use separators when concatenating multiple values and producing whatever fingerprint you are producing

  • @domanjie
    @domanjie 10 месяцев назад

    Quality content👌🏾, Thank you.

  • @languagelearningexperience6814
    @languagelearningexperience6814 10 месяцев назад

    Excellent video!

  • @oguznsari
    @oguznsari 10 месяцев назад

    this is great, thank you aaron

  • @yakirgb
    @yakirgb 10 месяцев назад

    Very good video!

  • @kyleridolfo
    @kyleridolfo 10 месяцев назад

    Truly excellent video

  • @JohanNordberg
    @JohanNordberg 10 месяцев назад

    I really like your presentation style. Clear and explanatory and good video length.
    I never use MySQL, so one improvement for me would be if you could mention if this would be relevant for MSSQL and Postgres as well and if the syntax is different, just make a short note of that.
    I assume most concept are similar.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Thanks Johan! I'm not as familiar with the other databases so I feel hesitant to speak on them

  • @rahulxcr
    @rahulxcr 10 месяцев назад

    Thanks. This is really awesome trick.

  • @standardio8270
    @standardio8270 10 месяцев назад

    This is great content.

  • @119_shrenikshah7
    @119_shrenikshah7 10 месяцев назад

    Great quality content

  • @joshhickman77
    @joshhickman77 7 месяцев назад

    The MD5 hash collision isn't the reason you add the WHERE, it's because sometimes your data can have the separator in it and "123|abc", "def" gets combined to the exact same string as "123", "abc|def" even though the values don't match.

  • @devcunha
    @devcunha 7 месяцев назад

    Very nice!

  •  10 месяцев назад

    You have enough friends and I'm proud to be your friend 👍🏻

  • @treesitter
    @treesitter 10 месяцев назад

    This video screams high quality content!

  • @kirayamato6128
    @kirayamato6128 10 месяцев назад

    dude u are awesome always!! thank you

  • @rasheedatanda9927
    @rasheedatanda9927 10 месяцев назад

    Thank you for this video. It provides more insight into why some rule of thumb exists in the database world.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      You're welcome!

    • @rasheedatanda9927
      @rasheedatanda9927 10 месяцев назад

      @@PlanetScale In short, all your videos I've watched made lots of sense to me considering my side quest on building a database. I'm now able to see things from a perspective different from the usual "do this don't do that because of so and so..." which is quite rampant in the engineering world.

  • @prasanthomanakuttan8391
    @prasanthomanakuttan8391 10 месяцев назад

    Earned a Sub

  • @VuTuanIT
    @VuTuanIT 10 месяцев назад +1

    This is so useful that I feel stupid that didn’t knowing this after 6 years working with mysql 😂

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      Nah don't feel stupid, we should all be learning new things constantly!

  • @luizhrocha
    @luizhrocha 10 месяцев назад

    This is gold

  • @soofgolan
    @soofgolan 10 месяцев назад

    Awesome video! one small nitpick, since you're dropping nulls when joining columns into a string with the "|" separator, the following rows will be converted into the same string before hashing, even if values came from different columns.
    All these:
    "A", NULL, "B"
    NULL, "A", "B"
    "A", "B", NULL
    Map to "A|B"
    I suggest using a more resilient value concat'ing scheme

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      True! Unlikely to happen with this data, but that would be a good reason to use the conditions shown at the end of the video

  • @ongong8209
    @ongong8209 7 месяцев назад

    thanks you so much, your videos so helpul for my job. Can you do more video about EXPLAIN ANALYZE and how to optimize execution plan in MYSQL

  • @rafaelacosta5374
    @rafaelacosta5374 10 месяцев назад

    awesome!, I had no idea about this, I have a scenario where to use it. Thanks

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Love to hear that! Please report back how it goes

  • @MelroyvandenBerg
    @MelroyvandenBerg 10 месяцев назад +1

    This is so great! I hope this all works on MariaDB as well?? And now I also need to print this "book" ;)

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Theoretically it should work the exact same! Smaller = faster. Give it a go and let me know

  • @Laflamablanca969
    @Laflamablanca969 10 месяцев назад

    Another brilliant fucking video! Keep them coming 😂

  • @yangsui1301
    @yangsui1301 10 месяцев назад

    The generated column got me thinking if there is some places I can actually do better on my work, thanks Aaron!!

    • @yangsui1301
      @yangsui1301 10 месяцев назад

      One more thing, should we ignore the cost of concat, md5 or any pre-processing stuff here? I mean, I'm not a MySQL expert so I'm not sure if we do this calcuation in database level is good or not. And when should will consider using a generated column instead of a composite index? Should we always make a experiment or if there is any instuction we can follow?

    • @yangsui1301
      @yangsui1301 10 месяцев назад

      BTW the generated column is a really nice idea, if there is a column referencing or calculating by the others that our apps will have to keep it correct, it's struggling because we are afarid of there is some place in our app that doesn't handle that calculation correctly, but if this job is done by the database, at least we can avoid lots of code changes and just trust the database will keep the consistency as usual.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      Generated columns are so fun. You can use them in so many places. I would probably ignore the cost of a concat or MD5, but if you were to do something very expensive in a generated column, you might consider declaring it STORED so it's not calculated over and over. I like having the database maintain these calculated values so that no matter where the insert or update is coming from, the generated column is always going to be correct and present.

    • @yangsui1301
      @yangsui1301 10 месяцев назад

      @@PlanetScale Fair enough! Will definitely try it, thanks Aaron!!

  • @thmater-x9786
    @thmater-x9786 10 месяцев назад +1

    Whish there is a channel like this for mongodb :(
    Awesome content ❤

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Guess you'll have to switch to MySQL 😏😏

  • @emilk2729
    @emilk2729 10 месяцев назад +4

    "Straight to jail!"🤣

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      It's true, don't try it!

  • @visualapproach7155
    @visualapproach7155 10 месяцев назад

    Ok, wait a minute. How is it possible I haven't come across your content before? You actually know your stuff. Moreover, I can't even play out the jokes in my head before you say them out loud. GET OUT OF MY HEAD. Also, keep producing amazing content. Deal? Deal.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      You drive a hard bargain but you got yourself a deal

  • @nick-khan
    @nick-khan 10 месяцев назад

    Great content up to the point! What MySQL version are you using and reading docs of?

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      The docs here were version 8, but this pattern will work in any version!

  • @user-qd3jm9vp4y
    @user-qd3jm9vp4y 2 месяца назад

    what if you update one of the columns that is used for indexing, will the hash also be updated ?

  • @FailMachineInc
    @FailMachineInc 10 месяцев назад

    thanks

  • @nathanoy_
    @nathanoy_ 10 месяцев назад

    What program do you use to execute the queries and view the results?
    EDIT: Nvm, a few seconds after posting the comment I reached 8:37 where you mention that it is TablePlus.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +2

      Bah you beat me to it! Or I guess in a way I beat me to it. Regardless... TablePlus 😂

  • @nerdswhosell
    @nerdswhosell 10 месяцев назад +1

    Love this. Is there any tips on thinking about this when building/designing the database schemas ?

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Thanks! I think the most common use cases would be 1) when you're trying to index something that's not easily indexable, you can reach for this or a functional index 2) when you're trying to "paper over" some weirdness in your data model as a convenience or 3) when you're trying to create an ID for an entity that doesn't already have a natural ID. In this case the address didn't have a natural ID, so calculating a hash of the parts gives us a nice ID that we can use for deduping, uniqueness, or indexed lookups from the application side

  • @sagnikbanerjee8169
    @sagnikbanerjee8169 5 месяцев назад

    That straight to jail part goes hard ngl

  • @k98killer
    @k98killer 10 месяцев назад

    Sounds pretty intuitive

  • @thegrumpydeveloper
    @thegrumpydeveloper 8 месяцев назад

    Honestly this video series was a big part of me switching back from neondb.

    • @PlanetScale
      @PlanetScale  8 месяцев назад

      🤐
      this makes me happy

  • @konserv
    @konserv 10 месяцев назад +1

    I would rather choose smaller hash and worry about hash collisions than picking wider hash and hoping there will be no collisions.
    Additionally to that, our CPUs and memory works better with 32/64 bit data that with 128 bit data. And, in this case, I would pick CRC32.
    Also, MD5 was designed as a cryptographic hash, you don't need cryptograpy overhead here. Simpler hash is better here.
    It would be nice to see a benchmark of those 3 approaches. Wide index, MD5, CRC32.
    Anyway, thanks for the video. I learned about auto-generated columns. And it was interesting.

    • @PlanetScale
      @PlanetScale  10 месяцев назад +2

      CRC32 plus the additional conditions to dedupe hashes would work just fine!

  • @gautam_vaishnav
    @gautam_vaishnav 10 месяцев назад

    Great ♥♥

  • @adampatterson
    @adampatterson 9 месяцев назад

    This is really cool, the only thing I was curious about is the hashing performance?
    I know the hash is only created in insert and update.
    It doesn't look like MySQL supports any of the newer hashes like xxh3 which would be my choice since it's so incredibly fast, typically faster than the computers memory bandwidth. It's not cryptographically ( is that a word ) strong, but it's randomness is extremely high

    • @PlanetScale
      @PlanetScale  9 месяцев назад

      MD5 is incredibly fast also, which is actually one of the reasons it shouldn't be used for cryptography! I haven't benchmarked it, but I think it should be fast enough for most needs. You could try CRC32 but you'd have a ton of collisions so you'd definitely need to add the additional conditions mentioned at the end of the video. Worth a few experiments to see, I suppose!

  • @nebbsie
    @nebbsie 8 месяцев назад

    Nicely explained! What software are you using for querying?

    • @PlanetScale
      @PlanetScale  8 месяцев назад

      TablePlus! ruclips.net/video/7V_CJBPZPes/видео.html

  • @conceptrat
    @conceptrat 10 месяцев назад

    You are very entertaining. Great content. Thanks 👍
    Straight to jail! 😂

  • @mahdikarimi6467
    @mahdikarimi6467 10 месяцев назад

    Love you Aaron

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      🤗 Thanks Mahdi! Made my day

  • @JohnnyBigodes
    @JohnnyBigodes 9 месяцев назад

    We use this to search for duplicates on the database. A thing that I would personally change is making all strings to LOWER(), so that if the user uses a big letter instead of a smaller one, it still sees it as a duplicate.

    • @PlanetScale
      @PlanetScale  9 месяцев назад +1

      Nice! That's probably a good addition. Good thinking

  • @alvaroaquije2572
    @alvaroaquije2572 10 месяцев назад

    This man needs to be more viral; question: you mentioned strict equality, but will this approach also work with statements like ‘is like’?

    • @PablumMcDump
      @PablumMcDump 10 месяцев назад +1

      The hash function obliterates LIKE because there's no substring any more to match on.
      The MD5 hash is only suitable for exact matching.

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Unfortunately it only works for strict equality

  • @jayantrawat5961
    @jayantrawat5961 10 месяцев назад +1

    First of all that's a great video and thanks a lot for that.
    I have a question regarding this, as we have 6 columns hashed as index and I need to search in 3 columns instead of six, will it be helpful in that case too?? Like I want to search in address, zip code and state column, will it work??

    • @LucasSouza1
      @LucasSouza1 10 месяцев назад

      10:50

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Unfortunately @LucasSouza1 is correct. It's no longer possible because you've destroyed quite a bit of that data by hashing it.

  • @SebastianMares
    @SebastianMares 10 месяцев назад

    Just wondering: Is an index on a generated column like in your example identical performance-wise and storage-wise to an index on an ULID or optimized UUID column? I assume the fragmentation would be worse for such a generated hash column due to the randomness of the values. (And yeah, I know the two serve totally different purposes, no question in that.)

    • @PlanetScale
      @PlanetScale  10 месяцев назад +2

      It'll be about the same as any other secondary index! The bad thing about random UUIDs as *primary* keys is that the entire clustered index has to be rebalanced along with all the data. That's why you (rightly) here people go on and on about ULIDs and time-ordered UUIDs, so that that rebalancing doesn't have to happen. With a secondary index, it's constantly being broken and rebalanced, but it's so much cheaper because it's only the indexed data, not the entire table's worth of data as is the case with the clustered index. (Note this is accurate for MySQL only, I know very little about the inner workings of other databases.) Hope that makes sense slash answers your question!

    • @SebastianMares
      @SebastianMares 10 месяцев назад

      @@PlanetScale Thanks!

  • @SlimNateKC
    @SlimNateKC 10 месяцев назад

    I love this, I’m curious why this isn’t just the default behavior for indices and what the crossover point is for when this is faster than the traditional index

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      It has pretty severe tradeoffs, like the fact that it only works for strict equality lookups

  • @vasiovasio
    @vasiovasio 8 месяцев назад

    Well Aaron, here is where the Real Black Magic happens!
    Actually, this is really logical for some specific cases, not all of course.
    Keep introducing more of these hidden gems!

  • @bogdanta7931
    @bogdanta7931 10 месяцев назад

    from how many indexed tables does it start to make sense?

  • @ikhwanmaftuh
    @ikhwanmaftuh 10 месяцев назад

    Niw i wonder what if the we add a cokumn from other table

  • @computerfan1079
    @computerfan1079 10 месяцев назад

    Remember: If you hash passwords with MD5: straight to jail😂. Great video, very educational

  • @dave6012
    @dave6012 10 месяцев назад +2

    Any concern with performance for tables where rows are frequently updated?

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Hashes like this will be pretty cheap to calculate, so it's no concern to me.

  • @shamimi123
    @shamimi123 10 месяцев назад

    Using that alongside hstore is pretty

  • @vasilisthemelis9017
    @vasilisthemelis9017 10 месяцев назад

    You can have collisions due to the separator being included in the value stored in the columns as well

    • @PlanetScale
      @PlanetScale  10 месяцев назад +1

      Totally! If that's possible with your data, that'd be a great reason to use the redundant conditions (mentioned at the end of the video) for deduping.

  • @j4ggi
    @j4ggi 10 месяцев назад

    Are generated columns in mysql persisted by default or does it allow you to create an index on non-persisted value? both options seem kinda weird/interesting

    • @PlanetScale
      @PlanetScale  10 месяцев назад +4

      By default they are virtual, meaning they are calculated at run time. You can create them with the STORED keyword and they'll be written to disk. Interestingly, if you create a virtual generated column and then index it, the value is in fact written to disk, because the index is written to disk.

  • @dragonwave2652
    @dragonwave2652 10 месяцев назад

    What app are you using to query request mysql?

  • @DJMariush
    @DJMariush 10 месяцев назад

    So this technique could be used to hash string column with uuid values that are a key, to avoid indexing uuid column?

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      There is actually a uuid_to_bin column to compress a uuid value, which you could use to create a smaller index!

  • @mateuszmalik1531
    @mateuszmalik1531 8 месяцев назад

    One question. Would it be possible to send a hashed value to MySQL over unencrypted network at use some built in MySQL function to unhash this data to separate columns?

    • @PlanetScale
      @PlanetScale  8 месяцев назад

      Hashing (like shown here) is a one-way function, so it's not reversible. I'm not sure exactly what you're going for, to be honest!

    • @mateuszmalik1531
      @mateuszmalik1531 8 месяцев назад

      @@PlanetScale In general I have two questions. 1st. You showed that MySQL can (un)hash data and keep it private or concatenate larger information, it can even do so in realtime based on information changed in the fields of a table. Can it also do it in the other way, update fields in the table based on changed data in the hashed field in real time? 2nd. Maybe this is a topic for different time/video. Synchronise two distant databases. An offline version periodically connects to an online and upon connection a synchronisation takes place. Is there a built in MySQL function to perform something like this?

  • @user-my5bk5xm9g
    @user-my5bk5xm9g 10 месяцев назад

    Is there a hash function in MySQL, such that if I run a query with "order by f(a,b,c)", the result set will be in the same order, as for the query with "order by a, b, c"?

    • @PlanetScale
      @PlanetScale  10 месяцев назад

      Well now that's an interesting question isn't it. I can't think of a lossless way to do that off the top of my head, but the question will haunt me so I'll let you know if I come up with anything.

  • @sohaibesohaib2914
    @sohaibesohaib2914 10 месяцев назад

    very usefull