One advantage of geospatial databases is shifts. If you were using SQL and had latitude and longitude columns, there's a huge dilemma. What if the Earth's rotation is disrupted and we have to reassign longitudes? With SQL, your poor team is up all night manually updating the longitude column. With geospatial, you just have to update the parent blocks.
@@jordanhasnolife5163 Sure. Everyone knows that Australia shifts a lot each year. Instead of having to look up the individual rows for the things in Australia and update their lat, long by some offset, now you can just update the offset for the Australia node. The search time is the same. The update time is WAY less.
The step 5, I don't think it's enough to search bcb, bda, bad, we should also search for bac, bbc, bca, bcc, bcd and bdc. You don't know where the point is exactly in in the box bcb, so we need to search all 8 boxes around bcd.
8:22 Here in step 5 and 6, why do we need to search all points in the neighbouring boxes? Since we have the longitude and latitude of our position couldn't we calculate the places that fall along or within our circumference and then look up the geo hashes for them in our box and the neighbouring boxes?
there are infinite points on a circle and geohashes can be arbitrarily small. You want to start at some bounding box, get all of the points in there, and then confirm that they are in our radius. that being said, you wouldn't search a particular bounding box if you can tell that its closest point to our focal point is outside of the radius we care about.
Thanks for the amazing explanation i have few questions - - wouldn't maintaining the geohashes in sorted order will be a costly operation? - when a company like yelp start maybe they will start with only few countries at the beginning. At that time will they maintain geospatial indexes for the entire globe or just the countries they onborded with? If they are just maintaining it for then onboarded countries then again sorting thing when new county is onboarded will a costly operation
1) What makes it costly? I only insert locations once, but I read them many times. 2) Why not maintain it for the whole globe from the start? It's not like this requires extra entries in the database. We just have one hash per restaurant.
Hey Jordan, QQ why cant i create a composite index of x and y coordinate in a RDBMS ? MySQL for eg would store this data in b-trees. If i index it as CREATE INDEX lat_long_idx(latitude, longitude) the btree would first store the nodes sorted by lat and then within that subtree further nodes sorted by long. Now when i search for all points within a 1km radius of my location, b-tree would be able to fetch all the lat nodes in O(log n) time and then filter the long within those nodes in another O(log n) time. Please correct me if I am wrong, wouldn't this be a good enough time complexity even for billions of points ?
These are discrete points. Think about an example where you want to find points near 5, 5. The points in the DB are (1, 8), (3, 3), (4.9, 20) (5.1, 5.1), (5.1, 10), (5.2, 8), (5.2, 2), (5.3, 5) and draw those out in a composite index. You can basically only filter down by latitude before you have to do a linear scan of all of the longitudes.
@@jordanhasnolife5163 ok so if I want to find all points within a circle of radius r around the point x,y I can run this query : select * from points where x between x-r and x+r and y between y-r and y+r; This would run in O(log n) owing to the composite index (if I am not mistaken) but this would return me all the points within a square of size 2r, whereas we need the points within a circle, did you mean to say that we would then have to do a linear search on all points within the square to find out which of these are within the cirlce ?
Hey Jordan ! i answered my own question there I guess I got your point about the discrete points, i can narrow down the x cordinate in a range say 5-6 in O(log n) but then i need to find the y range for each of the points in this result set, there can be millions of rows like 1000 for 5.01, another 1000 for 5.02 and so on so if there are m discrete x coordinates between 5 and 6 we end up with O(m log n) which would be a killer. Thanks a lot !!!
Around 7 1/2 minutes into the video, point 5, how did you come up with those inequalities if the X value could be as little as 0.7. Wouldn’t that include BCC?
So keep in mind that x can be as little as .7 but it's a circular radius. So if the radius of the circle is 1, that means that 1 is the hypotenuse of a right triangle where the sides are (.707, .707, 1) where 1 is the hypotenuse (recall Pythagorean theorem). So technically, the furthest bottom left point that we can hit from 1.7, 1.7 is (.993, .993) which would be in BCC. So well done mate ya got me
@@jordanhasnolife5163 I've meant that as far as I understand this (and please correct me if i'm wrong) ,the all problem with the reason that we didn't want to use indexes on x or y was that it will give us all the points on axis X axis Y. but using index on the composition of both, should provide us with the exact area no? Thanks a lot man - appreciate and really love all your videos
@@talhumy Hey Tal! If you think about a composition index, what that really accomplishes is first sorting on x, then sorting on y. However, we're looking for a range of x and y values, just just one x value and a range of y values. So not all of those points will be directly next to one another in a composition index.
The best explanation I have seen till now.
One advantage of geospatial databases is shifts. If you were using SQL and had latitude and longitude columns, there's a huge dilemma. What if the Earth's rotation is disrupted and we have to reassign longitudes?
With SQL, your poor team is up all night manually updating the longitude column. With geospatial, you just have to update the parent blocks.
Nice point! Can you think of an example where we'd do shifts like these?
@@jordanhasnolife5163 Sure. Everyone knows that Australia shifts a lot each year. Instead of having to look up the individual rows for the things in Australia and update their lat, long by some offset, now you can just update the offset for the Australia node.
The search time is the same. The update time is WAY less.
Wouldn't we already doomed if Earth's rotation is distrupted? I think I wouldn't care about updating longitudes at that point.
This is the question that interviewer would ask when they really want to reject you.
The step 5, I don't think it's enough to search bcb, bda, bad, we should also search for bac, bbc, bca, bcc, bcd and bdc. You don't know where the point is exactly in in the box bcb, so we need to search all 8 boxes around bcd.
I think you do because you have the physical coordinates of your point and know the coordinate bounds of the other boxes
8:22 Here in step 5 and 6, why do we need to search all points in the neighbouring boxes? Since we have the longitude and latitude of our position couldn't we calculate the places that fall along or within our circumference and then look up the geo hashes for them in our box and the neighbouring boxes?
there are infinite points on a circle and geohashes can be arbitrarily small. You want to start at some bounding box, get all of the points in there, and then confirm that they are in our radius.
that being said, you wouldn't search a particular bounding box if you can tell that its closest point to our focal point is outside of the radius we care about.
@@jordanhasnolife5163 Got it, thank you.
Thanks for the amazing explanation i have few questions -
- wouldn't maintaining the geohashes in sorted order will be a costly operation?
- when a company like yelp start maybe they will start with only few countries at the beginning. At that time will they maintain geospatial indexes for the entire globe or just the countries they onborded with? If they are just maintaining it for then onboarded countries then again sorting thing when new county is onboarded will a costly operation
1) What makes it costly? I only insert locations once, but I read them many times.
2) Why not maintain it for the whole globe from the start? It's not like this requires extra entries in the database. We just have one hash per restaurant.
@@jordanhasnolife5163 thanks for replying
Hey Jordan, QQ why cant i create a composite index of x and y coordinate in a RDBMS ? MySQL for eg would store this data in b-trees. If i index it as CREATE INDEX lat_long_idx(latitude, longitude) the btree would first store the nodes sorted by lat and then within that subtree further nodes sorted by long. Now when i search for all points within a 1km radius of my location, b-tree would be able to fetch all the lat nodes in O(log n) time and then filter the long within those nodes in another O(log n) time. Please correct me if I am wrong, wouldn't this be a good enough time complexity even for billions of points ?
These are discrete points. Think about an example where you want to find points near 5, 5. The points in the DB are (1, 8), (3, 3), (4.9, 20) (5.1, 5.1), (5.1, 10), (5.2, 8), (5.2, 2), (5.3, 5) and draw those out in a composite index. You can basically only filter down by latitude before you have to do a linear scan of all of the longitudes.
@@jordanhasnolife5163 ok so if I want to find all points within a circle of radius r around the point x,y I can run this query :
select * from points where x between x-r and x+r and y between y-r and y+r;
This would run in O(log n) owing to the composite index (if I am not mistaken) but this would return me all the points within a square of size 2r, whereas we need the points within a circle, did you mean to say that we would then have to do a linear search on all points within the square to find out which of these are within the cirlce ?
Hey Jordan ! i answered my own question there I guess I got your point about the discrete points, i can narrow down the x cordinate in a range say 5-6 in O(log n) but then i need to find the y range for each of the points in this result set, there can be millions of rows like 1000 for 5.01, another 1000 for 5.02 and so on so if there are m discrete x coordinates between 5 and 6 we end up with O(m log n) which would be a killer. Thanks a lot !!!
@@arambh-gaur yep exactly. Nice!
Fire video - no cap!!
Around 7 1/2 minutes into the video, point 5, how did you come up with those inequalities if the X value could be as little as 0.7. Wouldn’t that include BCC?
So keep in mind that x can be as little as .7 but it's a circular radius. So if the radius of the circle is 1, that means that 1 is the hypotenuse of a right triangle where the sides are (.707, .707, 1) where 1 is the hypotenuse (recall Pythagorean theorem). So technically, the furthest bottom left point that we can hit from 1.7, 1.7 is (.993, .993) which would be in BCC. So well done mate ya got me
ohhhh of course. sorry i watch your videos stoned but theyre great@@jordanhasnolife5163
I know its not optimal, but what if we've created the index of a composition of {x,y} , would it solve it?
Sure, though to be fair we could also have no index at all and that would solve it too. We just want to be as fast as possible.
@@jordanhasnolife5163 I've meant that as far as I understand this (and please correct me if i'm wrong) ,the all problem with the reason that we didn't want to use indexes on x or y was that it will give us all the points on axis X axis Y. but using index on the composition of both, should provide us with the exact area no?
Thanks a lot man - appreciate and really love all your videos
@@talhumy Hey Tal! If you think about a composition index, what that really accomplishes is first sorting on x, then sorting on y. However, we're looking for a range of x and y values, just just one x value and a range of y values. So not all of those points will be directly next to one another in a composition index.
BEST!
Amazing.