Blog post with more details: vasekrozhon.wordpress.com/2024/09/04/the-hidden-beauty-of-the-a-algorithm/
This explanation gets an A*
Underrated comment
I see what you did there 😆
@@slavicprogrammer6100 Not anymore.
@@mustafa-damm there's a blanket
Polylog is A* 😊
I think what would have added more interest is selecting two locations separated by a path obstacle, e.g. Athens to Palermo where the straight line Cartesian distance is small but the road path has to detour around the Adriatic Sea.
Then it would be interesting to see how the algorithm adjusts the path weights.
Agree it would be interesting to see what happens there! I wanted to mainly rely on an example where everything works exactly as we want, thus Prague and Rome. But including your example, e.g. at the very end, would definitely be very nice!
EDIT: Made a clip with a similar example and linked in the video description. (github.com/...)
@@PolylogCS I'm struggling to find it in the description. O.o Perhaps I'm being stupid. Could you post a link here?
@@PolylogCS Oh beloved convexity, my dear. Thank you for always making your way into simplifying optimization problems.
@@PolylogCS Beautiful! Further, what a surprise to actually discover a fellow matfyzák (a student of Prague's Faculty of Mathematics and Physics) doing these videos. :)
I have that problem here. eBay tells me that Cardiff UK is only 25 miles away. But there is a long estuary in between. It's actually 90 miles by road.
I love it when math and science communicators use good animations. Good job!
This is a trick question, because we know that all paths lead to Rome.
I like how he says pseudocode then proceeds to write Python.
Python is pseudocode
lmaooo
@@theguynextdoor--_--9591 fcking diabolical
Python reads like pseudocode, so it's often used as a substitute. It also has the added bonus of standard definitions for terms, so it's easier to understand.
@@x3melodycat ok
While the first part of the video was clarifying and reinforced what I already knew + gave better intuition, the last part with the 15 puzzle was really illuminating! It kinda illustrates a thought I've had lately that "everything can be a graph problem if you squint hard enough"!
An important thing he forgot to mention about the puzzle: It doesn't find **a** solution, it finds **the best** solution. That's the point of A*: returning the same optimal solution as Dijkstra, but faster. I love A*.
@@m.sierra5258 Thanks for the feedback! I mentioned it at the very beginning of that chapter, but it is very easy to miss...
A pathfinding algo used for the Rubik's cube... WTF
One major way that maps can give you answers so quickly is that they actually cache the results.
One person searches for Prague to Rome and that result is saved, making later searches for the same route much faster.
Saved results have a maximum lifetime and are always forgotten after a certain time, since they invariably become obsolete at some point.
And each node along a saved path has a list of the saved routes going through it; should anything change at a node, all affected saved results are immediately invalidated or re-evaluated.
Frequent route searches (say London City to Heathrow) are given a higher priority than infrequent ones such as Yakutsk to Lisbon.
The high-value routes are ALWAYS re-evaluated on node changes rather than just forgotten, to ensure that when the next search happens, the result is, as always, pretty much immediate.
There's most likely also an imbalance in edge lengths: despite road ABC being shorter than DEF, DEF is always FASTER than ABC in the long term.
Another place where results are cached is the internet as a whole. Your ISP (Internet Service Provider) keeps the last result of certain web requests stored locally, so subsequent requests can be satisfied much faster than having to contact an endpoint in, say, California when the client is in Berlin. A local copy in Berlin can be served in milliseconds, whereas fetching all resources from California could take several seconds.
Heck, even your browser has a local cache of pages, especially the heavy stuff like images, audio and video. (Which is funny, by the way: you are not allowed to download copyrighted material, but your browser does exactly this when it caches copyright-protected images etc.)
On an even grander scale, YouTube caches A LOT. When a new video is released in Tokyo, it is saved to a data center in Japan, and that is the only actual copy until it slowly trickles across all of YouTube's data centers.
But when a person from Norway watches that video for the first time, it is immediately stored in a data center in Norway, so anyone else watching it from Norway will have no issues with buffering, unlike the first viewer.
Eventually, all of YouTube's data centers will have a (basic) copy of a newly released (and popular) video, such that new viewers can at least watch it in real time without buffering (though they might be seeing it in 480p at first).
And videos that aren't as popular will simply fade away from non-major data centers over time as fewer and fewer people watch them.
In short, everything on the internet is generally fast because results are cached. That is why data centers need petabytes of storage: not because there are petabytes of video being uploaded to each of them, but because ANYTHING uploaded ANYWHERE gets cached locally in data centers, even if a given local data center only really serves a thousand clients.
"Frequent searches for routes"
In that case there must be some sort of "location rounding" included to find earlier searches. Practically everyone has a different start or finish address, most likely starting from or going to their home address, and that will be different for each search.
@@Foersom_ Yes: locally, going from the north part of city A to the south part of city B creates a path to any point where a cached path exists that takes you from city A to city B the fastest.
The algorithm is still the same, just that once you hit a cached path going to your destination city (the city, that is, not the exact locale), it will piggyback on that cached path.
At the same time it does a reverse search from the locale in city B that would take you to city A, until it finds the cached path A->B, and then says "hop off this path at this node in city B".
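To make the caching scheme sketched above concrete, here is a toy TTL cache with per-node invalidation, in Python. This only illustrates the commenter's proposed design; it makes no claim about how any real map service is implemented, and all names are made up:

```python
import time

class RouteCache:
    """Toy route cache: results expire after a TTL, and any change
    to a node invalidates every cached route passing through it."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.routes = {}  # (origin, dest) -> (list of nodes, timestamp)

    def get(self, origin, dest):
        entry = self.routes.get((origin, dest))
        if entry is None:
            return None
        route, stamp = entry
        if time.time() - stamp > self.ttl:  # timed out: forget it
            del self.routes[(origin, dest)]
            return None
        return route

    def put(self, origin, dest, route):
        self.routes[(origin, dest)] = (route, time.time())

    def node_changed(self, node):
        """Invalidate all saved routes going through a changed node."""
        stale = [key for key, (route, _) in self.routes.items() if node in route]
        for key in stale:
            del self.routes[key]
```

A real system would presumably re-evaluate the high-value routes on invalidation instead of just dropping them, as the comment points out.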
Well explained. I have been using A* pathfinding in my game for over a year now. It's insanely fast in video games compared to Dijkstra, for example, because the areas to be calculated are often very, very large. I roughly knew how A* works, but this video certainly clarified it well!
I am happy that the YouTube algorithm persisted and recommended this video to me over and over again. It is really good!
A+, keep the algo videos coming, so helpful, great explanation ..
Excellent video! Wonderful explanation involving many intuitions with cool examples to boot. Had a lot of fun listening and learning!
this video is like a 20 minute long beautiful symphony of ideas
Using the third dimension to visualise the direct distance of each node to Rome is GENIUS. I don't know if that was your idea but regardless the animation is great and the explanation is clear, this is a great video.
At first I wasn't very interested; I just figured I should start learning algorithms to improve my code. But by the time it all came together in "Implementation", my tiny little mind was blown. Intuitive, creative, and just downright cool; not only did you make A* easier to grasp, you also made it more interesting than a simple description of the process.
Potentials have so many potentials... love the explanation @polylog
Great video! Thanks. So happy that this type of content is shared and celebrated on YouTube. I subscribed. :)
Thanks!
Great video! The use of Also Spoke Zarathustra by Strauss for the reveal of the "physical" interpretation of node potentials really tickled my funny bone.
I also really loved how you avoid introducing the vocabulary of heuristics until it basically falls out of massaging the initial algorithm. I've often seen A* introduced and explained by directly reasoning with straight-line distances, and I never really grasped how that is more of a special case of applying A* than the general case [operating on the graph by redistributing potentials].
Thanks!
The German "also" differs in meaning from the English "also", so it's either "Thus spoke Zarathustra" or "Also sprach Zarathustra". If you mix the languages, you distort the meaning.
Amazing animations! The idea of modifying Dijkstra's algorithm with a 3d visualization is mind-blowing.
I hit like only for that 3D map explanation. Great work, friend!
Great video, I think I finally understand A* after all these years!
For the 15-puzzle, I thought we had to divide the heuristic by 2, but we're not swapping numbers so that doesn't make sense lol
Actually, when we first wrote the code for the 15 puzzle, we also divided by two as we did not think it out properly and then were surprised that the heuristic sucked :D
Great video, greetings from Prague!
:)
I love the intuitive understanding you present here, brilliant stuff 👍
All paths lead to Rome; we don't need an algorithm.
Very interesting application to the 15 puzzle!
Okay, writing down my understanding: A* is a way to speed up a Djikstra search through a graph by providing an opportunity to input information from what the graph represents. For example, if each node in the graph represents a point in 2D space, you can use those points to set a “potential” for each node, which warps the graph into a new one which Djikstra will solve more quickly. So it will still produce the same output as Djikstra - but it changes the order in which Djikstra searches so that it makes its way to the solution sooner. Because Djikstra searches shorter path lengths first, if you warp the graph such that the better paths are shorter, it’ll search them first. It’s just a matter of warping the graph into something that still works for Djikstra, using a method which is actually faster. Do I have this right?
Yes! But be careful, purely geometrical intuition is hard here, since the graph is directed, with different lengths in one direction and the opposite one.
@@PolylogCS Excellent, thank you!
This was a great video, and you should be very proud of it.
*Dijkstra, but yes! en.wikipedia.org/wiki/IJ_(digraph)
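For anyone who wants to see the "warping" from this thread in code, here is a minimal sketch (my illustration, not the video's code). It assumes `graph` maps each node to a list of `(neighbor, weight)` pairs and `phi` is a consistent potential, e.g. straight-line distance to the goal:

```python
import heapq

def a_star_via_reweighting(graph, phi, start, goal):
    """Plain Dijkstra on the reweighted edges w'(u,v) = w(u,v) - phi(u) + phi(v).
    A consistent phi makes every w' non-negative, so Dijkstra stays valid;
    running it on the warped graph is exactly A*."""
    dist = {start: 0}  # distances measured in the *reweighted* graph
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            return d + phi(start) - phi(goal)  # undo the warp
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w - phi(u) + phi(v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None  # goal unreachable
```

Setting `phi = lambda u: 0` recovers ordinary Dijkstra, which is a nice sanity check of the "A* is just reweighted Dijkstra" view.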
This is actually a common pattern in computer science: usually when we find something that works, we try to optimize it by giving it information about the structure of the input data. That's what happened going from fully connected neural networks to convolutional neural networks, for example.
A* is basically guiding your search with a heuristic function you designed intuitively, or gained by other means (AI analysis of the problem); it's all really about the quality of the heuristic and how well the search problem space fits it.
Damn man this was by far the best explanation of an algorithm I have ever seen
A great video and an amazing channel!!
I can't wait to see what you have in store for us next.
Wow this is probably the best explanation of A* I've seen. I finally understand why A* always gives the true shortest path, rather than an estimate.
I coded an A* algorithm from scratch after reading how it worked, without looking up any code snippets. As a junior coder, I found it very fulfilling and enlightening. I highly recommend that anyone dabbling in this stuff, such as in Unity, give it a try.
Using A* to solve search problems is always fun. I've had a bunch of random problems about finding the shortest way to solve something where I pretty quickly decided that graph search is just the best approach. No wonder these graph search algorithms are the main thing I remember from my AI course...
I am a very confused aerospace engineering student who stumbled upon map-algorithm YouTube by clicking a video that I thought referred to the aerodynamic variable A*, which we use to find the throat area for perfectly sonic flow through a wind tunnel. A few videos later and I feel very educated about something completely different. :)
wow, this might be one of my favorite explanations of an algorithm ever! Thank you!
I did not squint at the Rubik's cube to see if I was able to predict whatever you were going to say. Nope... didn't at all...
On a different note, thanks for this clear explanation. Definitely a good starting point for understanding what this A* algorithm is about.
That explains something I've noticed about route planning apps. I live near a river, downstream of the most likely crossing for most destinations on the other side of the river. The route my satnav, Google Maps, etc. plots *to* the crossing depends on where I'm going *after* the crossing. I'd worked out that it was looking at routes nearest a straight line between my start and end points. The video has confirmed it, and told me what the algorithm is.
I've been searching for a Rubik's cube solution. Thanks dude.
I got really confused, but it kinda cleared up when I realized that by potentials you were talking about the heuristic measure for how you choose the next `min` distance node.
Mathematicians took Dijkstra's Algorithm, slapped a heuristic on top and named it the All Star Algorithm. Love that.
Great work. Very nicely done.
Just this year I was looking for IDA* (iterative deepening A*) for Advent of Code.
Thanks :) Tom Sláma has a nice application of A* in Advent of Code.
@@PolylogCS I saw the video. Specifically, it is also about those robots. Does he have a link to the repo somewhere? I might take some inspiration from it.
Lol @ the space odyssey music. Also the idea you highlighted at the beginning is nice. The weights on the edges makes the problem a discrete analog of finding length minimizing geodesics in Finsler geometry, as opposed to in Riemannian geometry.
Wow, I am absolutely blown away by the quality of your videos! The content, the script, and the overall production value is truly amazing. I am a CS student myself, and I have been considering starting my own channel. Your work has truly inspired me and I can't wait to try my hand at creating something similar. Would it be possible for you to share the tools and techniques you use to make your videos? It would be an honor to learn from someone as skilled as yourself. I would be eternally grateful if you could make a video about it, your guidance would be invaluable!
I think you are using manim but I am not sure. I am thinking of using it for my projects as well!
Hi, thanks! We use manim, a Python library, and then kdenlive. In any case, I recommend you try making your own videos! A good place to submit them is the 3blue1brown competition; this helped us a lot!
Happy to find a great channel like yours!
Nice subject, clear explanation, good animation. I did learn something
Excellently explained, as always.
Very nice explanation and animation.
I came to understand A* in a different, maybe the classic, way, but this one with node elevation is definitely a very interesting view of the same thing. I started with Dijkstra (which is in turn BFS with priorities added to the edges) and, instead of using distance as the priority, used something that is commonly called a "cost". This cost can (under the rules you described well in the video) consist of several factors added up, like the road distance and the straight-line distance. One can then add factors such as fuel consumption, elevation and toll prices by summing them up (see the sketch below).
I think some real-world route-finding algorithms are based on a bidirectional version of A*, where the boundary gets expanded from both the starting and the ending point and the route is formed when the boundaries meet (perhaps oversimplified).
It should also be possible to find more alternative routes this way, if the algorithm doesn't terminate immediately after the first route is found but tries, say, three more times with some cost already added to the existing routes.
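A sketch of the cost-summing idea from this comment (the field names and trade-off weights here are invented for illustration); as long as every summed term is non-negative, Dijkstra and A* work unchanged:

```python
WEIGHTS = {"km": 1.0, "fuel_l": 1.5, "toll_eur": 0.8}  # made-up trade-offs

def edge_cost(road, weights=WEIGHTS):
    """Collapse several non-negative factors into one scalar edge cost."""
    return sum(coef * road[key] for key, coef in weights.items())

# e.g. edge_cost({"km": 120, "fuel_l": 7, "toll_eur": 10}) -> 138.5
```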
4:34 I knew what was coming for the explanation, but the Space Odyssey just made me burst into laughter, caught me completely by surprise! A+ editing!
I love how the height (h) in this conceptual map is actually not the spatial height Z, but a travel-time height! Using the fourth dimension provided by time instead of the third dimension provided by height. It is a really special way of thinking about problem solving.
You can be the 3 blue 1 brown of CS. Please continue making videos.
I always thought of A* in a similar way; however, I thought of the potential as more of an abstract selection/queuing process than something physical. This is why Manhattan distance and binary heaps are so effective at speeding up the process. There is a fractal nature to the algorithm, because the set of explored paths can be expressed as a node whose escaping paths are dynamically recomputed based on the discovered cost of the paths to the discovered nodes adjacent to undiscovered nodes. This of course isn't how the algorithm works, but it is isomorphic to the method. There is an inherent comparative nature to the algorithm, where the actual distance does not matter, but rather the relative measure between possible paths.
Funnily enough, I learnt about A* before I learnt about Dijkstra's. When they eventually taught us Dijkstra's in university, my thought was basically "Oh, it's just A* without the heuristic" :D
Why didn't I find this channel earlier? Great job with the animation! Subscribed.
I am a physics undergrad, and as soon as you started talking about potentials and showed the road map in a 3D view with Prague at a high point and Rome at a lower one, it immediately started screaming "gradient" into my ears. The way I visualize the problem is that we are trying to find the direction in which we must move from each node in order to have the steepest descent, because the steepest descent means that the next node is going to be closer to Rome than the rest. This is exactly what the gradient in vector calculus shows you. For anyone who isn't familiar, a gradient is a vector that shows you the direction of steepest ascent, so by simply taking its negative you find the direction of steepest descent. So all you have to do to solve the problem is, at each node, take the negative gradient for each path and then simply choose the one for which the grad is smallest. Then you repeat for the next node until you reach Rome, at which point the grad will be zero because you're at the lowest point.
It's not that easy. This greedy approach misses shorter paths hidden behind a temporarily longer path.
@@poopcatapult2623 I see. You're right I didn't think of such a scenario, I basically assumed straight lines, no backtracking etc. Thanks for the help
There is a connection with physics that I did not want to get into in the video: The way you operate with potentials mathematically is a discrete analogue to how you operate with physical potentials/fields.
@@sirkinguin4890 What you described sounds like best-first search.
Subscribed lightning fast. I need more algorithms taught intuitively, not mindless pseudocode without the intuition or reason why something is done. Keep the great series on algorithms coming! I would also love to see systems topics covered: compilers, HPC and microprocessors. Do consider it.
I love this video, just one suggestion on the color scheme: the color of the dots being almost the same as the color of the sea made it hard to distinguish between them and to read the numbers. I would have chosen not to color the sea at all, just implying it with the outline of Europe, or to make the sea a slightly different beige color, so it would be easy to distinguish from the continent while still contrasting strongly with the dots and the numbers.
Other than that, I loved this video! I learned a lot
Thanks for the feedback!
@@PolylogCS I have a similar kind of feedback so I'll put it here. I'm watching the video with subtitles on mobile, and the place you've chosen to put most of the info like formulas and data gets covered by the subs, so maybe a higher placement of that information would be better for upcoming videos. Thanks for an otherwise great and informative video!
Yay, some guy with an accent explaining programming concepts! Never seen this before!
New math video! I knew it was worth subscribing to your channel!
Thanks for mentioning L'viv!
Your animations are soooo good, appreciate that!
Well made and easy to understand. Great work!
Thanks!
Thanks for the vid! You explained it better than my professor for sure
This is what I was looking for. Thank you so much.
Loved the video. Very easy to follow along!
This is a really nice video. It reminds me of a similar idea with Kirchhoff's law: a circuit is like a directed acyclic graph of potential differences.
I had to implement the A* algorithm in a college class. Beautiful explanation! I wish I saw this 3 years ago lol.
What a beautiful explanation!
Really nice video. Great job!
That is explained so well that I understand it too.
Thanks for the video!
This video is so good that I lost track of time; it felt like 2 minutes.
Good explanation!
Thank you!
The algorithm gave me some gradient descent vibes, really interesting Dijkstra modification right there.
Nice take! There is a quite deep link to optimization: if you know linear programming duality, potentials are dual to distances in this sense. :)
I mentioned something similar on the previous video, but yeah, this is exactly how optimal cube solvers work! Optimal meaning the number of moves, while also proving it's an optimal solution. Generally the heuristic is done through an extremely large table which tells how many moves it takes to solve a specific part of the cube.
(On the previous video I was talking about a near-optimal algorithm, which ironically tends to find an optimal solution thousands of times faster than the optimal solver, but then takes significantly longer to prove it's optimal.)
Oh, and in case anyone's wondering why people care so much about cube-solving algorithms: besides doing it for the sake of the challenge itself, a big reason is that they are, funnily enough, very useful for _scrambling_ cubes, whether for practice or competitions. Doing ~30 random moves seems to weight all possible states pretty equally, but there's not much of a way to prove that. Meanwhile, using the solver lets you guarantee that every state is equally likely, and even does so in ~17 moves.
By the way, did anyone prove something about the expansion properties of the Rubik's cube graph? If yes, then you could actually prove that doing some small number of moves puts you in a basically random position.
@@PolylogCS I'm not really sure if anyone has done so, to be honest. The 30 moves is more of a community consensus, based on it being suspected for quite a long time that the 3x3 God's number was 20, and then you do a few extra turns for good measure. But either way, modern solvers produce better scrambles simply by virtue of your spending less time scrambling the cube.
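For the curious, here is a toy version of such a heuristic table, built for the sliding puzzle rather than the cube (real cube pattern databases are far larger and indexed differently; everything below is my own sketch). The trick: BFS backwards from the goal over abstract states in which every untracked tile is replaced by a wildcard. The table then gives the exact number of moves to solve just the tracked tiles, a lower bound for the full puzzle and hence an admissible heuristic:

```python
from collections import deque

def blank_moves(state, n):
    """Abstract states reachable by sliding one (possibly wildcard)
    tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, n)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:
            s = list(state)
            j = nr * n + nc
            s[blank], s[j] = s[j], s[blank]
            yield tuple(s)

def build_pattern_db(tracked=(1, 2, 3), n=3):
    """BFS from the goal over abstract states (untracked tiles become -1).
    table[abstract_state] = exact moves to bring the tracked tiles home."""
    def project(s):
        return tuple(t if t == 0 or t in tracked else -1 for t in s)
    goal = project(tuple(range(1, n * n)) + (0,))
    table, queue = {goal: 0}, deque([goal])
    while queue:
        s = queue.popleft()
        for t in blank_moves(s, n):
            if t not in table:
                table[t] = table[s] + 1
                queue.append(t)
    return table, project

# usage inside A*: table, project = build_pattern_db()
#                  h = table[project(state)]
```

Taking the maximum over several such tables (tracking different tile subsets) stays admissible and gives a stronger heuristic, which is roughly the spirit of the giant tables optimal cube solvers use.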
This is a fantastic explanation! Thank you very much!
11:53 That "If you are a flat earther..." threw me off way too hard
Really amazing explanation, now I get it, thanks
Great to hear!
Thanks a lot. Very simple and clear explanation.
Really interesting!
Came looking for copper, found gold, amazing work.
12:30 I would be interested to see another video on the "potential other tricks" that other real-world applications use. One thing off the top of my head is to precompute some simpler, higher-level maps (with fewer nodes) and try to use them when calculating the shortest path from A to B.
Interesting idea! I would say that the main trick is precomputation of a lot of information. Maybe we will make a video about it one day :)
That is basically what they do, I think - most, anyway. It's called a "contraction hierarchy" and uses precomputation to turn any map search into the search for a common ancestor in a tree.
@@cooperised Thanks a lot. I did not know about that method.
Nitpick: consistency (although I know it as monotonicity) is not required for A* to work.
It's only that this is required if you want to view A* as Dijkstra on a reweighted graph, as in your first implementation. But as soon as you simply use the heuristic to choose the next node to explore, as in your second implementation, the algorithm will work just fine.
Hi, I tried to make it clearer in the video description: with admissibility alone, the algorithm returns the correct solution; however, it may take time exponential in the size of the graph.
It's basically the same as allowing negative edges for Dijkstra's algorithm (assuming there are no negative cycles, though). The algorithm is still correct in the sense of the output it returns, but it may run exponentially long on such graphs.
@@lokosvyat Yep. There is the additional catch that here we finish once we reach the destination for the first time. Admissibility says that when this happens, we have already computed the correct value there. Of course, if we finish when the heap gets empty, we do not need admissibility, but then we are solving the problem of finding the distance to all other nodes, where A* is useless, since we can just use Dijkstra for that.
@@PolylogCS Yeah, it was just an analogy to indicate why it can be bad. I mean, you can run Dijkstra's algo on this graph and the result will be correct. But it may take a long time. I think no one denies the advantage of A* :)
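For readers following this thread, the two conditions being contrasted are, in standard notation (h the heuristic/potential, w the edge weights, d the true distance, t the goal):

```latex
\text{admissible: } h(u) \le d(u, t) \ \text{for every node } u
\text{consistent: } h(u) \le w(u, v) + h(v) \ \text{for every edge } (u, v), \quad h(t) = 0
```

Consistency implies admissibility (telescope the inequality along a shortest path to t), but not conversely. As discussed above, admissibility alone keeps the answer correct at the moment the goal is popped, while consistency is what makes the reweighted edge lengths non-negative.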
What a stellar explanation with references to Interstellar music :D
Most people know it as the Space Odyssey music, hehe... But it predates the movies.
Love your videos. You are very good at explaining algorithms. Thank you. :)
Super video, guys!!
Animator deserves a cookie. 👍
Amazing explanation
Excellent explanation!
>let's write down the pseudocode for our algorithm
Proceeds to write it in Python
😂
I feel like a huge mapping database like Google Maps could use some other strategies to improve the A* heuristic as well, like using previous queries to see what routes are generally faster than others.
Yes! Precomputation is actually the main trick, I would say.
Doesn't it use contraction hierarchies instead? Precomputed in bulk every few minutes to account for live traffic.
@@cooperised I would assume so, yes; they have plenty of real-time traffic data from the location and status information sent by Android devices. They know, for example, not only where the device is but also whether the step tracker thinks you are walking/running/cycling etc., and whether you previously spent time idle at a known transit stop, so they can get a pretty good idea of which devices are in private automobiles and average their speeds to estimate current congestion levels (simply compare this number to the speed limit). But for calculating travel time that latter step is not necessary; it is only useful for highlighting those segments on the map. For the average travel time you need only the segment length and the average speed: divide the former by the latter and there you have it.
They must have super massive location specific caching systems. I imagine they get (Prague, Rome) a bit more often in Prague and Rome than they do in other parts of the world ;)
@@RHCPhooligan Yeah, caching would be great for this. Sending the requests from NY, for someone planning a vacation, to their eurocache may be faster than calculating it in us-east. Your (Prague, Rome) comment made me wonder if routes are always symmetrical.
Can you explain the new shortest-path algorithm for graphs with negative edges that Quanta Magazine wrote an article about recently?
It is quite complicated, actually. I suggest you look at a talk from the authors. :) I think I would need about an hour to explain it...
I have a very strong love/hate relationship with A*. On the one hand, it's a really cool algorithm. On the other, I'm deathly allergic to heuristics.
Being a theoretician, I had a similar sentiment until I started to view the algorithm as an application of potential reweighting, which is a beautiful trick with many theoretical applications.
@@PolylogCS It really is. It's just that coming up with heuristics makes me cry.
Great explanations!
I think it needs to be noted, though, that Google Maps' algorithm isn't formally guaranteed to return the shortest path. It must be far more optimized for global scale, with precomputation etc., sacrificing formal correctness. On occasion I've been able to add waypoints and have Google come up with a shorter version of my intended drive.
That's interesting, do you have a working example you can link? One thing that definitely happens is that highways are preferred over small roads even when they're longer, but that is not a sacrifice, just a reasonable decision. Your example could still be consistent with Google Maps giving the exact shortest path under some metric.
@@PolylogCS Another example: if you're taking a long route, Google Maps will slightly prefer paths that pass near periodic petrol/gas stations or charging stations. If a slightly shorter path exists that stays away from such amenities, Google Maps will prefer the slightly longer path, at least from what I can tell.
It's also worth noting that many of these map systems make extensive use of precalculated data: for long-distance routes there is often some pre-calculated, more accurate data for the distances between places like major cities, and to further optimize speed it will sometimes prefer routing you through these pre-calculated higher-level networks.
The service may also make use of publicly available data provided by country/state agencies, such as traffic data; the shortest route may not be the fastest depending on time of day, road conditions, traffic, etc. The heuristics used by a service like Google Maps are quite complex.
Because Google optimizes paths for travel time and quality of roads, which might not mean the shortest path.
Google Maps is demonstrably reluctant to back-track, so much so that it will choose (timewise) longer routes over the fastest route. This has been true for at least fifteen years.
Amazing video!!! Congrats
I hope this work gets proper appreciation
Okay. As an amateur programmer, I'm trying to get my head around how we actually run Dijkstra over the graph without storing the whole graph in memory. For something like the tile puzzle, with 10 trillion nodes, we clearly have to work this way, since storing the map is not viable. So we translate a puzzle state into a node along with its potential (using the 'optimistic distance' as our potential, as suggested in the video).
We then interpret every legal move as an 'edge' between the state node we're at and a 'nearby' state node. When Dijkstra removes the start node and adds in all its 'neighbors', that is the first time those neighboring nodes exist; we have to generate these new state nodes and, at the same time, determine whether each is a node that already exists. The only way I can think to do this is by comparing it to every previously generated state node. With a good potential function, we should not add many more nodes than we need, so the 'neighborhood' of nodes to check should remain relatively small. More to the point, the only nodes stored in memory at any one time are those that are at some point part of the neighborhood (or boundary in the code). Is there a good way to estimate the memory impact of the algorithm as a function of the size of the input space, similar to how we estimate runtime in that big-O way?
At the end of the process, we have only acknowledged the existence of a tiny fraction of the total graph, and used the structure of the puzzle itself to directly 'compute' the base edge lengths and the potentials on an as-needed basis, which is pretty genius. It also cleanly explains why having an edge with a negative length would break the algorithm entirely: the choice of how to expand the neighborhood depends on the fact that you are always getting 'further away' from the starting point in a global sense, i.e. the length of your 'best current path' is always increasing, which is violated if a path length can be negative.
I would love to see some example where an appropriately well-chosen potential could be used to fix the negative path length problem.
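A compact sketch of exactly this pipeline, in Python (my code, not the video's). States are hashable tuples, neighbors are generated on demand, and a dict keyed by the state answers "have I seen this node before?" in O(1), so there is no need to compare against every previously generated state. Memory is then proportional to the number of states actually touched, which is exactly what a good potential keeps small:

```python
import heapq

def manhattan(state, n=4):
    """Optimistic distance: each tile must travel at least its grid
    distance to its home cell, and each move shifts one tile."""
    h = 0
    for idx, tile in enumerate(state):
        if tile:  # skip the blank (0)
            home = tile - 1
            h += abs(idx // n - home // n) + abs(idx % n - home % n)
    return h

def solve(start, n=4):
    """A* over the implicit puzzle graph; `start` is a tuple of length
    n*n with 0 for the blank. Returns the optimal number of moves."""
    goal = tuple(range(1, n * n)) + (0,)
    dist = {start: 0}  # hash map: O(1) duplicate detection
    heap = [(manhattan(start, n), start)]
    while heap:
        f, state = heapq.heappop(heap)
        if state == goal:
            return dist[state]
        g = dist[state]
        blank = state.index(0)
        r, c = divmod(blank, n)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                s = list(state)
                s[blank], s[nr * n + nc] = s[nr * n + nc], s[blank]
                s = tuple(s)
                if g + 1 < dist.get(s, float("inf")):
                    dist[s] = g + 1
                    heapq.heappush(heap, (g + 1 + manhattan(s, n), s))
    return None  # unsolvable configuration
```

As for the memory question: in the worst case it is on the order of the number of generated states (the `dist` dict dominates), and with a weak heuristic that can approach the whole state space, so the big-O story for space is essentially the same as for time here.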
I never thought that A* could be used for more than just graphs of Euclidean distances. Fuckin smart, mate.
I learnt something new. Thanks! Very well explained. Your pseudocode looks suspiciously similar to Python 🙂
Thanks for the valuable information
This is such a good video. How does this have so few views? Amazing explanation and graphics. Love it.
Very good video, you got my sub. One precision about Maps and other map software:
for 1-to-1 shortest paths A* is great, but when you have an extensible graph, a distance matrix is needed, which uses Dijkstra. When you talk about maps, the distance matrix is computed using a Dijkstra algorithm, and once the shortest road is defined, a 1-to-1 route is defined; a maps turn-by-turn API will use A*.