GeoSpatial Indexes - Why You Need Them | Systems Design Interview 0 to 1 with Ex-Google SWE

Поделиться
HTML-код
  • Опубликовано: 24 ноя 2024

Комментарии •

  • @tarun4705
    @tarun4705 11 месяцев назад +4

    The best explanation I have seen till now.

  • @John-nhoJ
    @John-nhoJ Год назад +5

    One advantage of geospatial databases is shifts. If you were using SQL and had latitude and longitude columns, there's a huge dilemma. What if the Earth's rotation is disrupted and we have to reassign longitudes?
    With SQL, your poor team is up all night manually updating the longitude column. With geospatial, you just have to update the parent blocks.

    • @jordanhasnolife5163
      @jordanhasnolife5163  Год назад +2

      Nice point! Can you think of an example where we'd do shifts like these?

    • @John-nhoJ
      @John-nhoJ Год назад +1

      @@jordanhasnolife5163 Sure. Everyone knows that Australia shifts a lot each year. Instead of having to look up the individual rows for the things in Australia and update their lat, long by some offset, now you can just update the offset for the Australia node.
      The search time is the same. The update time is WAY less.

    • @wexwexexort
      @wexwexexort 10 месяцев назад +9

      Wouldn't we already doomed if Earth's rotation is distrupted? I think I wouldn't care about updating longitudes at that point.

    • @iknowinfinity
      @iknowinfinity 2 месяца назад +2

      This is the question that interviewer would ask when they really want to reject you.

  • @abbyliu1875
    @abbyliu1875 29 дней назад +1

    The step 5, I don't think it's enough to search bcb, bda, bad, we should also search for bac, bbc, bca, bcc, bcd and bdc. You don't know where the point is exactly in in the box bcb, so we need to search all 8 boxes around bcd.

    • @jordanhasnolife5163
      @jordanhasnolife5163  24 дня назад

      I think you do because you have the physical coordinates of your point and know the coordinate bounds of the other boxes

  • @raghavendrag8036
    @raghavendrag8036 4 месяца назад +1

    8:22 Here in step 5 and 6, why do we need to search all points in the neighbouring boxes? Since we have the longitude and latitude of our position couldn't we calculate the places that fall along or within our circumference and then look up the geo hashes for them in our box and the neighbouring boxes?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 месяца назад +1

      there are infinite points on a circle and geohashes can be arbitrarily small. You want to start at some bounding box, get all of the points in there, and then confirm that they are in our radius.
      that being said, you wouldn't search a particular bounding box if you can tell that its closest point to our focal point is outside of the radius we care about.

    • @raghavendrag8036
      @raghavendrag8036 4 месяца назад

      @@jordanhasnolife5163 Got it, thank you.

  • @dishagupta7446
    @dishagupta7446 6 месяцев назад +1

    Thanks for the amazing explanation i have few questions -
    - wouldn't maintaining the geohashes in sorted order will be a costly operation?
    - when a company like yelp start maybe they will start with only few countries at the beginning. At that time will they maintain geospatial indexes for the entire globe or just the countries they onborded with? If they are just maintaining it for then onboarded countries then again sorting thing when new county is onboarded will a costly operation

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад +2

      1) What makes it costly? I only insert locations once, but I read them many times.
      2) Why not maintain it for the whole globe from the start? It's not like this requires extra entries in the database. We just have one hash per restaurant.

    • @dishagupta7446
      @dishagupta7446 6 месяцев назад +1

      @@jordanhasnolife5163 thanks for replying

  • @arambh-gaur
    @arambh-gaur 3 месяца назад +1

    Hey Jordan, QQ why cant i create a composite index of x and y coordinate in a RDBMS ? MySQL for eg would store this data in b-trees. If i index it as CREATE INDEX lat_long_idx(latitude, longitude) the btree would first store the nodes sorted by lat and then within that subtree further nodes sorted by long. Now when i search for all points within a 1km radius of my location, b-tree would be able to fetch all the lat nodes in O(log n) time and then filter the long within those nodes in another O(log n) time. Please correct me if I am wrong, wouldn't this be a good enough time complexity even for billions of points ?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 месяца назад +1

      These are discrete points. Think about an example where you want to find points near 5, 5. The points in the DB are (1, 8), (3, 3), (4.9, 20) (5.1, 5.1), (5.1, 10), (5.2, 8), (5.2, 2), (5.3, 5) and draw those out in a composite index. You can basically only filter down by latitude before you have to do a linear scan of all of the longitudes.

    • @arambh-gaur
      @arambh-gaur 3 месяца назад +1

      @@jordanhasnolife5163 ok so if I want to find all points within a circle of radius r around the point x,y I can run this query :
      select * from points where x between x-r and x+r and y between y-r and y+r;
      This would run in O(log n) owing to the composite index (if I am not mistaken) but this would return me all the points within a square of size 2r, whereas we need the points within a circle, did you mean to say that we would then have to do a linear search on all points within the square to find out which of these are within the cirlce ?

    • @arambh-gaur
      @arambh-gaur 3 месяца назад +1

      Hey Jordan ! i answered my own question there I guess I got your point about the discrete points, i can narrow down the x cordinate in a range say 5-6 in O(log n) but then i need to find the y range for each of the points in this result set, there can be millions of rows like 1000 for 5.01, another 1000 for 5.02 and so on so if there are m discrete x coordinates between 5 and 6 we end up with O(m log n) which would be a killer. Thanks a lot !!!

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 месяца назад +1

      @@arambh-gaur yep exactly. Nice!

  • @recursion.
    @recursion. Год назад +1

    Fire video - no cap!!

  • @zachlandes5718
    @zachlandes5718 11 месяцев назад +1

    Around 7 1/2 minutes into the video, point 5, how did you come up with those inequalities if the X value could be as little as 0.7. Wouldn’t that include BCC?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 месяцев назад +2

      So keep in mind that x can be as little as .7 but it's a circular radius. So if the radius of the circle is 1, that means that 1 is the hypotenuse of a right triangle where the sides are (.707, .707, 1) where 1 is the hypotenuse (recall Pythagorean theorem). So technically, the furthest bottom left point that we can hit from 1.7, 1.7 is (.993, .993) which would be in BCC. So well done mate ya got me

    • @zachlandes5718
      @zachlandes5718 11 месяцев назад

      ohhhh of course. sorry i watch your videos stoned but theyre great@@jordanhasnolife5163

  • @talhumy
    @talhumy 10 месяцев назад +1

    I know its not optimal, but what if we've created the index of a composition of {x,y} , would it solve it?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад

      Sure, though to be fair we could also have no index at all and that would solve it too. We just want to be as fast as possible.

    • @talhumy
      @talhumy 10 месяцев назад +1

      @@jordanhasnolife5163 I've meant that as far as I understand this (and please correct me if i'm wrong) ,the all problem with the reason that we didn't want to use indexes on x or y was that it will give us all the points on axis X axis Y. but using index on the composition of both, should provide us with the exact area no?
      Thanks a lot man - appreciate and really love all your videos

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад +1

      @@talhumy Hey Tal! If you think about a composition index, what that really accomplishes is first sorting on x, then sorting on y. However, we're looking for a range of x and y values, just just one x value and a range of y values. So not all of those points will be directly next to one another in a composition index.

  • @shineinusa
    @shineinusa 4 месяца назад +1

    BEST!

  • @wexwexexort
    @wexwexexort 10 месяцев назад +1

    Amazing.