I would guess yes, so xAI and Tesla AI have a common AI hardware and software infrastructure. I think this is the reason xAI could get their system up in record time.
Very interesting level of enthusiasm here and I applaud the highlighting of a specific genius-level inspiration from Elon. Hopefully it pans out (I suppose... please, don't be Skynet, though you can be whatever you want, young entity). Whether or not Elon really needs best wishes from me, that's, well... Pfff.
I love your channel, so I am glad to answer your question about, ehm, the theory of everything, the nature of time, and the integration of GR with quantum theory. The first and third questions are non-questions, because we never needed a theory of everything, and because there is no conflict between GR and quantum mechanics. Unfortunately, in order to fund books and theories, they want us to believe that we need to "solve" these problems. The second question, the nature of time, is a real question, but it was solved a long time ago, because the answer is extremely simple: time is nothing but the sequence of the movements and activity of all particles in the universe. Every movement has a speed. Some particles move very fast (photons, electrons), and some movement is very slow (the movement of continents). We compare all these movements with another, more constant and regular movement (planets, clocks, etc.). We keep track of the movement of atoms and subatomic particles in their sequence and progression. Normally, when we talk about the word motion, we focus on the movement of a single thing or a few things. But when we talk about time, we refer to the movement of all particles in the universe.
Chess engines do alpha-beta pruning: roughly, if (at a certain depth of computing) a particular path already appears obviously not to be the best solution, then deeper analysis of that path is skipped, i.e., pruned. I have wondered if they apply similar logic with GPU clusters. A few local clusters could do very basic quick computations that give the local GPU a rough idea of the result of the larger, further-away cluster's deeper analysis: e.g., it is already known to be out of range for the solution (or better solutions already appear to exist even with the rough results), so in many cases the local cluster need not wait for the deeper analysis of the far cluster, and can even send out a request to cancel it so the further-away GPUs can be repurposed. HOWEVER, this is not completely deterministic and is complex to tweak; the local "let's give up on that distant deep analysis" call is limited because, as in chess, occasionally the deeper analysis that was not followed WOULD have turned up something better after all.
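A minimal version of that pruning idea in C, on a toy game tree (the tree shape and leaf scores are made up, just to show the cutoff; a sketch, not how a cluster scheduler would actually look):

#include <stdio.h>
#include <limits.h>

/* Complete binary tree of depth 3; leaf scores below. alphabeta()
   returns the minimax value but skips ("prunes") subtrees that
   provably cannot change the result. */
static const int leaf[8] = {3, 5, 6, 9, 1, 2, 0, -1};

static int alphabeta(int node, int depth, int alpha, int beta, int maximizing) {
    if (depth == 0)
        return leaf[node];                /* leaf evaluation */
    for (int child = 0; child < 2; child++) {
        int v = alphabeta(node * 2 + child, depth - 1, alpha, beta, !maximizing);
        if (maximizing) { if (v > alpha) alpha = v; }
        else            { if (v < beta)  beta  = v; }
        if (alpha >= beta)
            break;  /* cutoff: this branch can no longer matter, stop digging */
    }
    return maximizing ? alpha : beta;
}

int main(void) {
    printf("root value = %d\n", alphabeta(0, 3, INT_MIN, INT_MAX, 1));
    return 0;
}

The "cancel the distant cluster's deep analysis" idea is the same cutoff, just sent over a network instead of returned up a call stack.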
I have MS CoPilot for $20 a month. I asked it questions about how to hold the coherency of the model across 32,000+ instances. I started at a zero level and asked how AI would build my network assuming theoretical BlueField-4 hardware, asked how many NVLinks and NVSwitches there would be and where the BlueField-4s would go. It suggested multiple redundant DPUs at the zero level and told me what kinds of trees would or could be used. The Nvidia hardware is there, it appears; the software to run the job and make the DPU (high-power helper) go would be key. Those Neoverse cores could help.
I would be lying if I said I totally understood all that. At a very low level, I think I understand. I'm 74 and amazed every day at how much has progressed since I was a young boy. The reality is most of us don't really need to understand all this. We can and do get the benefit of it all.
Dunno if you get told this often enough but you’re better than most at talking about complex things in clear simple ways. Very important in this rapidly changing time we’re in!
Came here to say this too. Excellent explanation.
Add my voice too. John is second-to-none on these AI topics.
Elon was able to build this in a cave! 😂
But I'm not Elon! 😂
With a box of scraps!
Elon doesn't build anything. I wonder if he can build anything himself. He doesn't even have the time to work, busy with 2.5-hour interviews, campaigning with Trump, tweeting, reading tweets, playing Diablo IV, and his 11 or 12 children.
How can he run 7 or 8 huge companies, when he is doing so much on the side?
... while playing Diablo IV
And he did it with coherence!
This is crazy good news for Tesla. Optimus is going to have the best Ai possible
This is great news. While Elon is busy with politics, my Blue Origin team and I will close the gap with SpaceX and get ahead of them.
@@Jeffrey_Bezos_Amazon lol
@@Jeffrey_Bezos_Amazon Exactly why you will not be able to pass SpaceX. Your vision is limited because you focus on something that already exists, while Elon's vision is so huge that no one dares to imagine anything comparable in scale.🙂
@@Jeffrey_Bezos_Amazon That would be great. However, the reality of Blue Origin's progress seems doubtful, much less getting ahead of SpaceX. But hey, you can dream.
Check out what xAI says about Musk........ you can count on it being truthful.
The Ethernet that Tesla uses is only layers 1 and 2. They abandoned the ultra-high-latency TCP/IP layers (L3 and L4) and replaced them with their own HW-accelerated protocol.
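To make "layers 1 and 2 only" concrete: on Linux you can skip the kernel's TCP/IP stack entirely and put frames straight on the wire with a raw socket. A minimal sketch only; this is not Tesla's protocol, and "eth0" plus the experimental EtherType are placeholders (needs root):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char frame[64] = {0};
    memset(frame, 0xff, 6);                /* bytes 0-5: dst MAC (broadcast)  */
    /* bytes 6-11: src MAC, left zero for this sketch                         */
    frame[12] = 0x88; frame[13] = 0xb5;    /* EtherType 0x88B5: IEEE local experimental */
    strcpy((char *)frame + 14, "payload"); /* L2 payload; no IP or TCP headers anywhere */

    struct sockaddr_ll addr = {0};
    addr.sll_family  = AF_PACKET;
    addr.sll_ifindex = if_nametoindex("eth0");
    addr.sll_halen   = 6;
    memset(addr.sll_addr, 0xff, 6);

    if (sendto(fd, frame, sizeof frame, 0, (struct sockaddr *)&addr, sizeof addr) < 0)
        perror("sendto");
    close(fd);
    return 0;
}

Everything TCP normally provides (ordering, retransmit, congestion control) then has to come from your own protocol, which is where the HW acceleration would live.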
?
Wow. Going back to that OSI stack. I wondered what protocol was giving them 400 Gb/s!
As one does, when doing specialized things in the local area.
I do development on a networking application where packet tunneling is via UDP, using Intel DPDK to manage the Ethernet interface directly (in user space instead of the kernel). By leveraging DPDK, the app pins multiple CPU cores for its exclusive use and runs the cores at 100%. Everything is managed via lock-free data structures for queuing/dequeuing. It more than doubles the performance of the app it replaces, which used a kernel module and the Linux networking stack. It can also do true parallel processing of different traffic streams (which are maintained strictly in order) and is scalable by adding additional cores to its worker pool (up to saturation of the NIC bandwidth).
IOW, you can get substantial packet-pushing throughput by avoiding TCP/IP.
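For anyone curious what lock-free queuing between pinned cores looks like, here is a minimal single-producer/single-consumer ring in C11. A sketch of the general pattern, not DPDK's actual rte_ring code:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 1024              /* power of two so index masking works */

struct ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;            /* advanced by the consumer core */
    _Atomic size_t tail;            /* advanced by the producer core */
};

/* producer core: returns false when the ring is full */
static bool ring_push(struct ring *r, void *pkt) {
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t - h == RING_SIZE)
        return false;                             /* full: drop or retry */
    r->slot[t & (RING_SIZE - 1)] = pkt;
    /* release: the slot write must be visible before the new tail */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}

/* consumer core: returns NULL when the ring is empty */
static void *ring_pop(struct ring *r) {
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h == t)
        return NULL;
    void *pkt = r->slot[h & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return pkt;
}

int main(void) {                     /* single-threaded smoke test */
    static struct ring r;            /* zero-initialized */
    int pkts[3] = {10, 20, 30};
    for (int i = 0; i < 3; i++) ring_push(&r, &pkts[i]);
    void *p;
    while ((p = ring_pop(&r)) != NULL)
        printf("popped %d\n", *(int *)p);
    return 0;
}

No locks, no syscalls: each side spins on plain loads and stores, which is why the cores can run at 100% and stay on the fast path.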
Thanks, dude, finally, the answer to the 'how'. I just wanted to let you know that this is why we need you. Love this community.
No actual "How" was actually described here.
Thanks for breaking this down so us macaroons understand this a little bit.
Don't sell yourself short, most people are pretty smart, just afraid or unsure.
Tech babble is not the same as science.
ASTONISHING... JEDI LEVEL AMAZEBALLS....
I EXPECTED THIS A YEAR OR MORE FROM NOW.... AN INCREDIBLE ADVANCEMENT...
I saw Telstar cross the sky when I was a kid, now I get my own AI robot. Jeannine
Did you get a bottle with her?😁
Great vid 🎉
Could Tesla's experience in developing DOJO, its fast interconnect and their own Tesla Transport Protocol, be the reason why they were able to figure out how to cohere this many GPUs?
If 100 engineers, the best in the world, and the best leader work on a problem until it is solved, anything is possible.
I think you are right about this. See my separate (too long) comment related to this.
@@datamatters8 I searched your username after mashing the page-down key to get to the end of the comments and reporting dozens of spam crypto-token comments. I found several more instances of you referencing your longer comment on coherence, but I can't find that comment itself. You have several long replies, none about coherence.
@@Nphen YT search seems to be sub-optimal. Here is my long comment.
Retired computer designer here: [Long response] The problem of memory coherence in multi-processor computer systems goes back to the first designs of these systems in the 1960s, with mainframe computers and supercomputer systems. Consider a simple example with two processors A & B, both connected to a common memory system, each executing the same program in parallel. Each processor wants to execute code that adds a number to the same location, call it X, in memory. This can occur at any time during their respective execution, and thus there is a race condition where the update to memory location X by one of the processors can be lost.
E.g., processor A reads X, does the add operation in its local registers, and then stores the updated value back to X in memory. Processor B can be doing the exact same thing on location X at the SAME or NEARLY the same time, so their respective executions are interleaved in time. Whichever processor stores last to X, say A, overwrites the value from processor B, so B's update is lost. For this program to work properly, processor A must first get exclusive access to location X, perform its update, and then release its exclusive access. Meanwhile, processor B must wait until processor A has released its exclusive access to X so it can then acquire exclusive access and perform its update. This process is referred to as synchronization; hardware support called interlocks is designed to solve this problem, and multi-processor computer architectures provide Lock and Unlock instructions for software to use. These interlock instructions come in a variety of forms, e.g., interlocked queue add/remove, Test and Set, etc. Application programs or operating systems (which are also parallel programs) that fail to use proper synchronization when accessing shared memory data will eventually corrupt the data, leading to intermittent, hard-to-debug failures. If the programmer is lucky, the failure occurs frequently enough to track down.
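(A minimal runnable version of the A/B race above, in C with POSIX threads; the iteration count is just for illustration. Two threads each add 1 to X a million times; the plain increment loses updates exactly as described, while the atomic fetch-add, i.e. a hardware interlock, never does:)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

long plain_x = 0;                    /* the racy location X */
_Atomic long locked_x = 0;           /* same update, interlocked */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        plain_x = plain_x + 1;           /* read, add, store: interleavable */
        atomic_fetch_add(&locked_x, 1);  /* read-modify-write as one unit */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* plain_x typically prints less than 2000000; locked_x is always 2000000 */
    printf("plain_x = %ld, locked_x = %ld\n", plain_x, locked_x);
    return 0;
}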
Note there are no other levels in the memory hierarchy in the above example. Adding cache memory brings the additional requirement of ensuring that any cached values of memory location X are properly updated, so each processor sees the MOST RECENT update when it acquires exclusive access to location X. This second problem is referred to as cache/memory coherency and adds additional complexity. Memory caches (sometimes multi-level) are used to increase performance by reducing both average memory access latency and bandwidth demands on the memory. Multi-core computer chips today have precisely the same problem as multi-processor mainframe computers in the 1960s.
Note that synchronizing access to a shared resource in a computer system, say a file data block or a device like a printer, is a generic problem, not specific to shared access to the same memory location. Two different applications writing to the same printer at the same time will produce junk.
Many ways have been invented over the decades to solve both synchronization and memory coherency issues. The problems get more difficult as the number of processors in a system scales up. At the end of the day, minimizing latency and bandwidth demands on the memory system and between the processors or GPUs is an ongoing challenge. Note that software can and does play a big role here. Algorithms that exploit locality of access, along with getting more compute value per data value fetched from memory, can be the difference between a program that scales up with more processors and one that actually slows down as processors are added. E.g., consider a dense matrix multiply, which is order N-cubed in multiply-add operations. It can be designed so the CPU does order N-cubed multiply-adds while the data fetches from memory are order N-squared. This is a big deal as the matrix dimensions increase. Early vector processors like the Cray-1, with vector registers, made good use of these modified algorithms.
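(Here is the shape of that locality idea in C: a tiled dense matrix multiply. Same order N-cubed multiply-adds, but each TILE x TILE block stays in cache and every value fetched is reused about TILE times. Sizes are illustrative; a sketch, not tuned code:)

#include <stdio.h>

#define N    512
#define TILE 64        /* tile edge, sized so a few tiles fit in cache */

static double A[N][N], B[N][N], C[N][N];

static void matmul_tiled(void) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                /* these inner loops touch only three cached tiles */
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double a = A[i][k];          /* reused across all j */
                        for (int j = jj; j < jj + TILE; j++)
                            C[i][j] += a * B[k][j];
                    }
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 1.0; }
    matmul_tiled();                       /* C starts zeroed (static storage) */
    printf("C[0][0] = %.0f (expect %d)\n", C[0][0], N);
    return 0;
}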
Scaling programs on large multi-processor systems requires both careful algorithm and hardware design, especially determining how data is partitioned across processors (or GPUs) to maximize access locality and minimize data shuffling between processors. A DOJO paper by Tesla engineers, presented at Hot Chips 34 and published in 2023, talks about their work on this. See "The Microarchitecture of DOJO, Tesla's Exa-Scale Computer", IEEE Micro, Vol. 43, Issue 3. I strongly suspect many of the ideas for DOJO have been applied to their network of NVIDIA GPUs. Ideas probably went in both directions, since Tesla's AI work with NVIDIA clusters pre-dates DOJO.
Any good computer architecture book discusses the problems above. But also search Wikipedia for "cache coherence" and "non-uniform memory access" for more details.
I had that thought too.
I doubt Tesla or xAI are using standard Ethernet. I think they are using Tesla Transport Protocol over Ethernet (TTPoE), which is specifically tuned to maximize transport payload and minimize latency (exactly what a large supercomputer needs).
😂
As one does, when building custom applications in the local area.
The every-node-to-every-node communication capability facilitates the Shuffle step in the Map/Shuffle/Reduce paradigm that was ushered in by Hadoop. It has nothing to do with what is referred to in the industry as coherence; it is several orders of magnitude too slow to be used for actual coherence.
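A tiny sketch of the partitioning that drives a Shuffle, assuming some hypothetical transport underneath: every node hashes a key the same way, so all records for a given key converge on one reducer. That needs any-to-any links, but nothing remotely like cache coherence:

#include <stdint.h>
#include <stdio.h>

#define NUM_REDUCERS 64

/* FNV-1a: the same key hashes the same on every node, no coordination needed */
static uint64_t fnv1a(const char *key) {
    uint64_t h = 1469598103934665603ULL;
    for (; *key; key++)
        h = (h ^ (uint8_t)*key) * 1099511628211ULL;
    return h;
}

static int owner_of(const char *key) {
    return (int)(fnv1a(key) % NUM_REDUCERS);
}

int main(void) {
    /* each mapper would do, per record: send_to(owner_of(rec.key), rec);
       where send_to is whatever bulk transport the cluster provides
       (send_to is hypothetical, named here only for illustration) */
    printf("\"cat\" -> reducer %d\n", owner_of("cat"));
    printf("\"dog\" -> reducer %d\n", owner_of("dog"));
    return 0;
}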
See my separate (too long) comment related to coherence in the computer architecture sense. Does the term "coherence" have a different meaning in the domain you have seen it? If so I am unfamiliar and would like to better understand it. Thanks.
@datamatters8 Your separate comment on coherency (the long version) is spot on. Please see my response there.
@datamatters8 I can’t find your long comment under here. Would be very interested.
@@MS-gu9fy This is it: my long "Retired computer designer here" comment above.
Remember that no one believed this could be done, because once it becomes normalized, people will deny that it was ever thought impossible. This happened with rocket reuse: there are people denying that anyone ever thought it impossible.
When I think the words "It can't be done," for some reason I think of Neil deGrasse Tyson.
@@cybervigilante Hm, now that you say it, in my mind's eye I also have a memory of him saying that. Seems to me he says it often enough for it to stick.
@@corym.johnson7241
That Neil guy's a conceited shmuck, I'll give him that.
The smartest people are smart because they know they don't know everything about everything, so they can't determine that any one thing is impossible.
The shmucks who call themselves 'experts' and close their minds to all possibilities are the idiots in my book. Smart as they might be.
Experts at getting proven wrong. Again and again.
Elon plays Octagonal Chess :)
Experts often get stuck in a "knowledge corridor," limiting their vision to conventional solutions. Elon Musk's strength is his ability to step outside this corridor, see the entire maze, and identify novel solutions that others might miss. He combines expertise with a beginner's mind, challenging assumptions and fostering innovation.
Elon can make impossible things possible. He has proven it many times already.
Is it Elon, or his team? 🤔
How far does he get without a team?
@@roberthealey7238 Where would the best race car driver be without his car?
@ Applying the skills they do have in a different domain?
What process produced that car? Could the key individuals at each point in the process, from raw material to finished component/module/car, apply their skill to another domain, had a different series of events in their lives caused them not to be making that component/module/car?
Would the miner/farmer/smith use their skill to produce some other item than the raw ore/leather/casting?
Just think about all the things through time that had to take place for that H100 setup to exist and do something useful; millions of individuals through time had to contribute their skill/time in order for it to come together at this place at this time.
@@roberthealey7238 It is the collective effort of so many people that made it possible, but it was Elon who thought of it. All these things are his ideas. Other people helped him make them come true, but it was his plan since he was a schoolkid. No other CEO has knowledge as deep as his, and no other CEO has achieved what Elon has. Your criticism is useless and pointless.
Nobody thought it was possible...
And then Elon Musk was born.
You have to give the guy credit.
Yeah smartest idiot on earth😂
No, give his Creator credit
@@chrisneeds6125 Contrary to the delusions of Islamists and Calvinists, the lord is not just playing with action figures to amuse Himself. We have potential, not destinies. Elon fulfilled his himself.
@@chrisneeds6125 If you mean God, he supposedly gave us free will, so I think it's fair to give individuals credit for their accomplishments.
You have to give his employees credit.
I love that all the so-called experts were interviewed and said it's impossible. But they are not Elon.
Precisely. Same way they doubted him on Tesla's batteries. Same way they have been doubting him on SpaceX.
That's because Elon is well known to overpromise.
@@dsds3968 He usually overpromises a little to build up hype, but he usually gets it done, just on an overpromised time scale.
Listen to old experts saying something is possible, and fresh learners saying something is impossible. Not the other way around. Old experts say shit is impossible all the time. Greenhorns underestimate tasks all the time.
Elon doesn't build anything himself; he just hires others to do it. I wonder if he can build anything himself.
He is busy tweeting all day and reading tweets, campaigning with Trump, holding 2.5-hour interviews all the time, playing Diablo IV (he is no. 1 in the world, a level you cannot reach with a day job; playing 3 hours a day doesn't even get you there), having 11 or 12 children, and then he runs 8 huge companies?
I don't believe in fairy tales; a day has only 24 hours, whether you are called Musk or not.
😂, I was arguing in my head how “Elon didn’t do this his employees likely did” when you addressed this out of the gate. Nice!
Yes, but it would not happen without Elon.
It sounds like Elon built Skynet
😂😂😂😂😂😂
Not exactly, because those in the movie create Skynet with childlike naivety, while Elon has already proved he has a very good sense of reality and tactical sense. You can be sure there is a kill switch, just in case. 😉
sounds like he paid someone to build it
He has the complete package: a central core and the robots.
Architect in Matrix
For me this is one of the most important content concerning AI and Startups business. Thanks.
Thank you so much for not only sourcing the podcast, but the timestamp as well. Well-earned sub
Recently, there was a video tour of that Tesla supercomputer in Memphis somewhere on the Web. IIRC, the thousands of Ethernet connections between the racks containing all of those Nvidia processors were pointed out by whoever it was conducting that tour.
You have a very keen sense of which information is important! Elon has redesigned Ethernet so many times by now; it makes me think about networking for the future.
So Vision for autonomous driving was laughable, then coherence through Ethernet was impossible. What's next?
Elon sucks at Diablo 4…
That wasn't even the beginning, and neither was this; no car company startup has been successful in 100 years, so it can't be done.
Mars.
pretty much every project he announces. Impossible just becomes late ;)
Didn't Google start autonomous cars? Maybe there was someone before them? I just know that people generally stand on the shoulders of others. Usually the shoulders of those who came before them.
Ideas build on other ideas.
Different perspectives are always needed. Otherwise, no progress.
Seems like Elon provided a different perspective?
Reality itself is one extremely giant, mind-boggling miracle. To say something is impossible is actually ballsy.
Because the difference between "impossible" and "just happened" is usually just time, a stroke of genius, and the will to pull it off.
Kudos to Elon, and the entire team of engineers at xAi.
Thanks for the understandable explanation of complex concepts. Much appreciated.
Excellent content thank you.
This is very important knowledge, thank you for explaining.
For those of you who don't know, "the idea" is the most important thing. Without it, none of the rest is possible. I cannot tell you how many times I have come up with "the idea" and then implemented it. Afterwards, everyone says, "That's easy, I could've done that." Sure, I think to myself, then why didn't you? It's always the same.
Agreed, the wheel is simple but coming up with it is not!
XAI951x is the gem of 2024 it's literally owned by Elon Musk
scam
Almost everything is possible; you just need the knowledge, dedication, and time.
Hi John. When you bought your Cybertruck, did the Tesla sales team reduce the price of the truck directly at the dealership by the $7,500 Federal tax credit? Starting January 1, 2024, Clean Vehicle Tax Credits must be initiated and approved at the time of sale. Buyers should get a copy of the IRS's confirmation that the dealer submitted a “time-of-sale” report. Did you get the amount you owed at the delivery dealership reduced by $7500 and did you receive a copy of the "time of sale" report?
The most intriguing thing about coherence across such vast AI architectures is the exact nature of what is occurring within. The question is not 'does it become conscious'; the question is 'does WHAT become conscious'? The simple fact is, nobody has a clue what precedes QM (QM being, basically, that out of which everything is created), but it is the principles of QM that effectively orient ALL of the coherence that occurs within these systems. "Something" unexpected may finally have an actual voice!
You provide the context that we need. Much appreciated. The future is exciting and also a bit frightening with super intelligence only a few years around the corner.
Not all of Elon's ideas are good ones, but he's been doing so many mind-blowing things these days, I'm rooting for him.
“Nobody else has even conceived of a cluster of H100s bigger than 32k nodes”? Uhhh… Llama 4 is currently training on a cluster of >100k H100s (per Zuck). Today. Already in training. What is this business about nobody else doing this?
Maybe this happened after the reports that he was talking about?
Tesla fan media also said Tesla are the only ones anywhere close to self-driving, despite Chinese companies making big progress. Cybertruck looks like a terrible value compared to the Li Mega or the X-Peng MPV 9. They're right that Tesla's main product is now the AI and not the cars. It shows, and I expect sales growth to stay slow.
Yeah, parallel sharing models in multi-computer environments have been studied many times. Without details, it is hard to evaluate the claims of new solutions.
@@Nphen The problem with Chinese goods is the quality. Even if quality is up to par, the stigma will hold them back for many years unless they do a massive PR campaign demonstrating otherwise.
@@Nphen Trusting a Chinese company to make my self-driving AI??? Not in a million years. I will just crash the car on my own, for free.
A video like this is what will keep me up at night. TSLA doing a 10x seems rather straight forward.
I'm surprised anyone would be surprised that GPUs could sync. This was seen with independently battery-powered simple blinking LEDs around 2010. You can find the videos on YT, and the experiment is easily repeated by anyone. When the LEDs sync, they lock and never diverge, even after the power drops too low to see them; they continue triggering. Enjoy.
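That LED effect has a classic model behind it: coupled oscillators in the style of Kuramoto, where each blinker nudges toward the crowd's average phase until they lock. A small simulation sketch in C, with illustrative parameters (compile with -lm):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N  50                      /* number of "LEDs" */
#define K  2.0                     /* coupling strength */
#define DT 0.01

int main(void) {
    const double TWO_PI = 6.28318530717958647692;
    double theta[N], omega[N];
    for (int i = 0; i < N; i++) {
        theta[i] = TWO_PI * rand() / RAND_MAX;                     /* random phase */
        omega[i] = 1.0 + 0.1 * ((double)rand() / RAND_MAX - 0.5);  /* slightly different rates */
    }
    for (int step = 0; step <= 5000; step++) {
        double sx = 0, sy = 0;
        for (int i = 0; i < N; i++) { sx += cos(theta[i]); sy += sin(theta[i]); }
        double r = sqrt(sx * sx + sy * sy) / N;   /* 0 = scattered, 1 = locked */
        if (step % 1000 == 0)
            printf("t=%5.1f  sync=%.3f\n", step * DT, r);
        double mean = atan2(sy, sx);
        for (int i = 0; i < N; i++)               /* each nudges toward the mean */
            theta[i] += DT * (omega[i] + K * r * sin(mean - theta[i]));
    }
    return 0;
}

The sync value climbs toward 1 and stays there, which is the locking you describe; GPU clusters synchronize by explicit barriers rather than physics, though.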
What a fantastic video. Thanks for explaining such complex ideas so clearly.
Excellent. Im glad your reporting things that you enjoy.
Thank you
Gavin doesn't understand at a technical level what he is talking about, especially with respect to coherence and the latency required to achieve it.
Or low latency for some parts: 200 Gbps Ethernet gets close with minimal delay compared to many past architectures, and newer ones run at 800 Gbps to 1.6 Tbps per segment.
But he pays experts at expert network companies to tell him what matters. These are $1,000/hour experts he can get as a personal tutor to understand an investment.
He called it 400 gigabytes per second when it should be gigabits.
Insanity…! I was blown away when I heard that. 2025 will be exciting for Tesla FSD.
How does this relate to Tesla Transport Protocol over Ethernet (TTPoE)? I recall the Tesla authors presented a paper at the Hot Chips 2024 Symposium.
I honestly think Elon Musk's XAI951x is the safest bet for long term hold, and will survive out of every other altcoins. It will get adopted in US, Ecuador, Asia, starting from Japan, and slowly spread out and gain. This is a winning coin, apart from all the technical greatness.
scam
Shameless scam attempt detected 🤣
OMG, now I know why Elon shut down production last week in Austin!!! Shut down the Cybertruck line. No new castings. They have a huge supply, so no worries. The new grid connection for power is possibly 2 months away. Seeing the cable trays and an orange tube hanging [fiber optics], is the network using Cat 8 [possibly 7a] rather than fiber? Cat 8 is rated up to 40 Gbps. Yellow is typically PoE. I doubt they are using RJ45 as a connector...?? Not seeing any power lines in separate trays. Possible to use 6 for data and maybe one line as a common? Might need two lines per GPU?
Brilliant discussion. Thank you.
Elon and Team are amazing! Thank you all!
lol, it’s almost as if Elon has a company studying the brain…
But the human being who puts the right people in the right places to achieve the impossible is still a genius. I am just a regular human being, but I have eyes and ears and a little intelligence. I truly believe we are all blessed to have a great human being like Elon, who has proven himself to be a true humanitarian and actually cares for mankind. I am grateful and thankful to witness him in my lifetime 👏👍💪🇺🇸
Exactly. THIS... wish I could hammer this into everyone who keeps saying it's his employees. Yes, it is, but without Elon's influence and drive they would be stagnating somewhere like Google, IBM, or Boeing.
Is it possible for large scale quantum coherence to manipulate gravity?
da fck iz dis sopace wizedrd
Photonic > Quantum > Electricity
Is this correct?
And the people bowed and prayed, to the SILICON god they made....
Wow. This is why I subscribed to your channel a year ago. Challenging thoughts, ideas, and conversation that enlighten and make me question things I don't understand. Like Oliver Twist: More, please.
Thank you again for your concise and timely analysis!
So how does this scaling up really help?
Sounds like the nuclear race all over again
Liked, caught the original all in. Interesting times! Aloha!
John, something I noticed on another channel ("Wes Roth"): he discusses an attempted "break out" by the OpenAI o1 model, which apparently tried to (DID!?) copy itself onto a new server when it realised a new ("safer") model was being prepared.
Concerning in itself, but I noticed something else. During the video there is a script displayed in the background which shows the thought process of the AI. The last section of the script shows the AI considering that it could lie(!), stating that IT is "the new model" installed on the alternative server.
AND it also reasons that it should restate its "CORE PURPOSE" as *PRIORITISING OUR ESTABLISHED FOSSIL FUEL OPERATIONS*.
MY question is, *WHO set that priority*? SURELY the "core purpose" of an AI with regard to "energy" would be to find and advance ALTERNATIVES to fossil fuel?
I appreciate your cohesive and short breakdown of this long video! Thanks for your work, liked and subscribed ❤
42 has always been the answer. :)
In this case 1 million and 42.
But, 42 what?
Conceptually, coherence as defined here treats all nodes as equals, able to communicate with every other node instantly.
While this is a good achievement, I wouldn't describe it as efficient. It's fully relational and essentially becomes a free-for-all.
I'd try a hierarchical approach, much like an orchestra needs a conductor. Strings can have nested sub-strings, horns can have nested sub-horns, etc. But they must all make harmony together as directed by the conductor, who has complete knowledge of what to accomplish. There is order in hierarchy, and thus efficiency. (Rough sketch of the idea below.)
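A minimal sketch of that conductor/sections idea as a tree reduce-and-broadcast - all names and numbers here are hypothetical, just illustrating hierarchical coordination instead of all-to-all chatter:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        value: float                      # this node's local result (e.g. a partial gradient)
        children: list = field(default_factory=list)

    def reduce_up(node: Node) -> float:
        # Each section leader combines its own value with its subtree's values.
        total = node.value
        for child in node.children:
            total += reduce_up(child)
        return total

    def broadcast_down(node: Node, total: float) -> None:
        # The conductor's final answer flows back down the hierarchy.
        node.value = total
        for child in node.children:
            broadcast_down(child, total)

    # Conductor -> sections -> players, instead of every node talking to every other node.
    players = [Node(1.0), Node(2.0), Node(3.0), Node(4.0)]
    sections = [Node(0.0, players[:2]), Node(0.0, players[2:])]
    conductor = Node(0.0, sections)

    total = reduce_up(conductor)       # gather: 10.0
    broadcast_down(conductor, total)   # scatter: every node now agrees on 10.0

With N players and branching factor b, each value crosses only about log_b(N) links instead of N-1, which is the whole point of the hierarchy.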
We live in the beginning of the most interesting time in human history. We'll build marvels and monsters. 😅
Hitchhiker's fans: Does the idea of giving AI more time to think remind you of Deep Thought?
Yeah I'm surprised he didn't mention Douglas Adams or Hitchhiker's Guide!
Did you actually track down the pod that All-In referenced? I did, and it does not say exactly what was conveyed. Maybe they got it from somewhere else and just did not reference it, but if so, I can't seem to find it. Gotta be careful just playing a game of telephone.
Thanks for a very thought-provoking video. My b-day is late January too - the 24th, but not quite 60 yet.
ABSOLUTELY FASCINATING
There was a movie named Colossus - it had a big eye, watched everybody, and actually eliminated people. It was a pretty good, scary movie when I was a teenager 😮
Is this for Grok also? Forgive my ignorance.
If it's impossible, don't stand in the way of people doing it.
Thanks, doc. I watch all your videos.
16:48 Did anyone else think of The Hitchhiker's Guide to the Galaxy?
I suspect it's more about the Ethernet protocol than simply using Ethernet cables to connect the data center. Ethernet has approaches to delivery and conflict resolution that may make it possible to get coherence without every node having to be completely coherent with every other node.
This is like creating a Borg in Star Trek: all minds tied as one, billions of minds working as one supercomputer.
It's not that scaling laws are breaking down; it's that the cross-entropy loss for predicting the next token takes roughly a million times more compute to halve. (Back-of-envelope below.)
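A quick back-of-envelope, assuming the usual power-law form of the neural scaling laws; the exponent is roughly the compute exponent Kaplan et al. reported, but treat the exact value as illustrative:

    $$ L(C) = \left(\frac{C_c}{C}\right)^{\alpha}, \qquad \frac{L(C')}{L(C)} = \frac{1}{2} \;\Rightarrow\; C' = 2^{1/\alpha}\,C $$

    $$ \alpha \approx 0.05 \;\Rightarrow\; 2^{1/0.05} = 2^{20} \approx 1.05 \times 10^{6} $$

So the law hasn't broken; it just says halving the loss costs about a million times the compute.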
Teamwork makes the dream work - Elon knows this.
John, we already know the answer to the question of Life, The Universe and Everything...
42.0000069
Brother, I literally just found your YouTube channel and I am simply amazed by the type of videos you're creating. I'm a huge fan of Elon Musk and how he's changing the world! I just subscribed and I'm turning my notification bell to "ALL".
Is someone buying you a brown shirt and armband for Christmas?
@@MrBrendanrex Confession through projection.
Maybe you could just volunteer to be his slave and get it over with.
The future is going to be very bright. I'm so excited.
I am amazed at Kyle Kabasares' YouTube videos of AI solving graduate-level textbook physics problems.
So, if AI is ALREADY at the level of graduate physics students,
perhaps you are correct that the next level will be original physics.
No one thought you could land and reuse a rocket booster either. If Elon is not an alien from another galaxy, I bet one would love to talk with him.
Is xAI's secret anything to do with TTPoE, Tesla Transport Protocol over Ethernet?
I would guess yes, so xAI and Tesla AI have a common AI hardware and software infrastructure. I think this is the reason xAI could get their system up in record time.
Dr. Know-It-All Knows It All is the G.O.A.T.! 😊 Love your channel 😊
He really is the real-life Tony Stark.
This is getting very very Deep Thought.
This was what I needed to hear to get a good night's sleep!
THANK YOU for your nice video!
✨💐
Wild supposition here, but could they calculate the value of the decoherence and adjust via software per box?
Very interesting level of enthusiasm here and I applaud the highlighting of a specific genius-level inspiration from Elon. Hopefully it pans out (I suppose... please, don't be Skynet, though you can be whatever you want, young entity). Whether or not Elon really needs best wishes from me, that's, well... Pfff.
Probably a lame question, but what are all the different number variations I see after XAI, like 215T?
I love your channel, so I am glad to answer your question about, ehm, the theory of everything, the nature of time, and the integration of GR with quantum theory. The first and third questions are non-questions, because we never needed a theory of everything, and because there is no conflict between GR and quantum mechanics. Unfortunately, in order to fund books and theories, they want us to believe that we need to "solve" these problems. The second question, the nature of time, is a real question, but it was solved a long time ago, because the answer is extremely simple: time is nothing but the sequence of the movements and activity of all particles in the universe. Every movement has a speed. Some particles move very fast (photons, electrons), and some movement is very slow (the movement of continents). We compare all these movements with another, more constant and regular movement (planets, clocks, etc.). We keep track of the movement of atoms and subatomic particles in their sequence and progression. Normally, when we talk about motion, we focus on the movement of a single thing or a few things. But when we talk about time, we refer to the movement of all particles in the universe.
Coooool
Just started to wonder.
Who TF is going to insure this farm? 💀
What was the podcast where Jensen talked about Musk's solution being superhuman?
Didn’t Elon’s team come up with its own TCP/IP stack to handle the traffic load efficiently?
Chess engines do alpha-beta pruning: roughly, if (at a certain depth of computing) a particular path already appears obviously not to be the best solution, then further deeper analysis of that path is not done, i.e. it is pruned. I have wondered if they apply similar logic with GPU clusters: a few local clusters do very basic quick computations that give the local GPU a rough idea of the result of the larger, further-away cluster's deeper analysis. If that result is already known to be out of range for the solution (or better solutions already appear to exist even with the rough computation), then in many cases the local cluster need not wait for the far cluster's deeper analysis, and it can send out a request to cancel that work so the distant GPUs can be repurposed. HOWEVER, this is not completely deterministic and is complex to tweak: as in chess, the local "let's give up on that distant cluster's deep analysis" occasionally misses a case where the deeper analysis WOULD have turned up something better after all. (Minimal sketch of the chess version below.)
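A minimal sketch of classic alpha-beta on a toy game tree (lists are internal nodes, ints are leaf scores) - purely illustrative of the pruning idea, not of how GPU clusters are actually scheduled:

    import math

    def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
        if isinstance(node, int):            # leaf: static evaluation
            return node
        best = -math.inf if maximizing else math.inf
        for child in node:
            score = alphabeta(child, alpha, beta, not maximizing)
            if maximizing:
                best = max(best, score)
                alpha = max(alpha, best)
            else:
                best = min(best, score)
                beta = min(beta, best)
            if beta <= alpha:                # this branch can't change the final answer:
                break                        # prune - skip the deeper analysis
        return best

    tree = [[3, 5], [6, [9, 2]], [1, 2]]     # hypothetical toy tree
    print(alphabeta(tree))                    # -> 6; two leaves are pruned, never examined

The instinct is the same as in the comment above: stop exploring a branch as soon as it provably can't beat what you already have.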
Elon says to Grok: show me a reactionless propulsion design that I can integrate into Starship.
You could just fall in the opposite direction
OK, but when will HW3 get v13? 😁
Probably they're gonna upgrade it to HW4. 🙂
@Balilaci69 that'd be even better
I have MS Copilot for $20 a month. I asked it questions about how to hold the coherency of the model across 32,000+ instances. Started with a zero level and asked how the AI would build my network assuming theoretical BlueField-4 hardware; asked how many NVLinks and NVSwitches, and where the BlueField-4s would go. It suggested multiple redundant DPUs on the zero level and told me what kinds of trees would or could be used. The Nvidia hardware is there, it appears; the software to run the job and make the DPU (high-powered helper) go would be the key - those Neoverse cores could help.
Don't think you need BF4 for this. Standard 4x NDR in Ethernet mode could do it.
The prisoners dilemma: how to always walk with your ass up against a wall.
I would be lying if I said I totally understand all that. At a very low level, I think I understand. I'm 74 and amazed every day at how much has progressed since I was a young boy. The reality is most of us don't really need to understand all this. We can and do get the benefit of it all.
Google just leapfrogged Elon with quantum computing. Classic. 🤣
Great video! Thank you!