These folks are really not the first to ditch the Ethernet software stack and do it in hardware: Inmos OS and DS links (mid-'80s), the Fujitsu K computer (and its forebears), IBM BlueGene, etc. Worth noting that the INMOS DS links supported wormhole routing in the early '90s...
How come, in that video on the All-In Pod, they act like no one else has been able to get up to 100,000 GPUs, when Meta has, as far as I can tell from the headlines?
What is a GPU here? A GPU chip and even many processor chips contain many individual GPUs. So in this context, is a GPU an entire Nvidia chip, for example, or just one of the GPUs in a chip?
When it works *_coherently,_* the entire building / datacenter acts as a *_single huge GPU,_* no matter how many cores, pixel pipelines, vertex shaders, memory addresses, texture mapping units, or render output units are contained in any chip, board, rack, or the entire datacenter.
This video will cause or lead to a lawsuit against Elon Musk and xAI for sharing/stealing Tesla patent technology without Tesla board approval and proper payment/compensation.
@allangraham970 A lawsuit has already been filed against Elon & Tesla for apparently transferring Nvidia GPUs from Tesla to xAI, so idk how many of these things are actually legally contracted and how much of it is Elon doing his own thing without thinking about bureaucracy.
Question: Is Tesla making their own factory robots? They bought a German company some time back that we haven't heard from in a while. Is the company they bought helping with Optimus? The unboxed method?
Your ability to synthesize complex topics and make them digestible is remarkable - no mistaking that you're a fantastic professor. Thank you!
This was a wonderful presentation of beyond cutting-edge network technology created by Tesla. Dr. Know-it-all did a masterful job on this one! Kudos!!! ❤
Thanks!
Really appreciate the depth and effort. Best
This discussion isn’t necessarily new. It just gets shut down due to the high costs of implementation. We talked about such a system for a network application for ISS and it never got out of the talking phase. Non-technical managers at typical companies never want to absorb the cost or the risk.
As a TSLA shareholder, I think TSLA should have gotten $5B worth of xAI or more for the use of our patent that was central to xAI success.
Seems like IP theft unless xAI is licensing this tech from Tesla. Hoping Elon has all of his bases covered on these technology transfers between Tesla, a public company, and his private ones. I understand that IP has actually flowed in both directions over the years, including SpaceX.
I suspect you'll get more than 5B in value back from what xAI is doing.
@redneckcoder Through what mechanism?
@@ericmckinney5898 By the time Tesla installs the Blackwells, they will be up to speed with the learnings of xAI, and that will probably be invaluable for Tesla. Imagine: they will have the largest computer in existence and the second largest.
@@ericmckinney5898 R&D for effectively free.
Do you think the various media houses will ever publish, for the public to understand, that Tesla is miles ahead of anyone? Elon is truly the Techno King.
They'll claim it at some point when it's undeniable, then magically say they've been saying it the whole time.
No because politics. And billionaire bad.
The fact that they found a way to scale a single cluster as much as they want, while all the others are limited to 25-30k GPUs, is massive.
This, along with the fact that they built it so fast, gives them at minimum a 14-18 month advantage in compute power (a massive advantage, especially if they scale it to 200-500k GPUs).
Grok will not have any competition in 2025, and most likely well into 2026, as the scaling laws are far from reaching their limits, and xAI will be the only company (well, also Tesla, but it's not direct competition) to reap the massive benefits of all this concentrated power.
this is the best AI news of the year... that xAI did this leap!
Not sure why you did not know about this, as Tesla presented Dojo and the Tesla Transport Protocol (TTP) at the Hot Chips 34 conference in 2022, where they showcased the microarchitecture of the Dojo supercomputer and its custom protocol.
At least he didn't call the new patent, Skynet!
I am very sure he at least thought about it 🙂
Hi Dr. Know-It-All,
It seems there may be some confusion here. First, you mentioned sub-millisecond latency, but the xAI node cluster actually uses a **400GbE backbone** between nodes, achieving **sub-microsecond (30ns)** latency, which is about **30,000 times faster** than what you referenced. To put it into perspective, the latency for a round trip over Ethernet is in the **same range** as a **DDR5** local memory lookup, meaning a cluster machine can access memory on another node with nearly the same latency as accessing its own local memory, with a slight reduction in bandwidth (about half the bandwidth).
You stated in your video that TCP can only be implemented in software. This is incorrect. High-end routers and switches have **hardware-enabled TCP** and can process **TCP packets at wire speed**. In the case of Tesla, the protocol is interesting because it removes the **TCP/IP overhead**, making Ethernet a **reliable transport protocol**. It also addresses some of the limitations in the standard **TCP flow** and **handshake**. **In other words, they can use less expensive switches** while still achieving high performance.
Thank you for bringing this to my attention. I’m interested to see how this develops and whether they have a deal to manufacture Ethernet cards with the protocol built in, or if they simply used RoCE, or have hardware built that can do TTP.
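For anyone who wants to sanity-check the numbers in the comment above, a quick sketch of the arithmetic. The 30 ns round trip and the "sub-millisecond" software figure are the commenter's claims, and the DDR5 latency is just a commonly quoted ballpark, not a measurement:

```python
# Latency arithmetic from the comment above (illustrative figures only).
soft_tcp_rtt_s = 1e-3   # "sub-millisecond" software TCP round trip
ttp_rtt_s = 30e-9       # claimed 30 ns hardware round trip

speedup = soft_tcp_rtt_s / ttp_rtt_s
print(f"speedup ~{speedup:,.0f}x")   # ~33,333x, i.e. "about 30,000 times"

ddr5_access_s = 90e-9   # ballpark local DRAM access latency (assumption)
ratio = ttp_rtt_s / ddr5_access_s
print(f"network hop vs local DRAM: {ratio:.2f}")  # same order of magnitude
```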
I am impressed by your very concise summation
Oh my, how my head hurts.
I wonder if Tesla now has silicon implementing TTP being deployed in Texas and Memphis. If they do, this could give them a 1-2 year lead on large coherent AI clusters. It also explains why Elon is buying as many GPUs as he can, as he can leverage that advantage in this 1-2 year window. If so, it would be a HUGE!! Win!
John I love this content! keep it up!
Key invention is the replay data path. OMG, that is brilliant. I never would have thought of that!!
Just an awesome explanation John, you are the best mate!
A million GPUs, not a billion, I think is what was said.
Yeah, I was thinking the exact same thing. Pretty sure it was a million. If he did say a billion, then all it represents is that this method of supercomputing is scalable, bounded only by human time, effort, and resource availability. If so, humans can scale it to the billion.
You’d need a Dyson sphere to power 1 billion
Musk said a million, yet maybe this new hardware architecture can support a billion?
@@SirHargreeves It all depends on hardware efficiencies happening over time, for example making processors with ever-smaller features. Maybe we get below the nanometer, etc., and you can pack more in at a higher energy savings. It all depends on where our roadblocks are.
Musk did reply to a post on X re billion GPUs.
It’s a possible future, not that far away.
Everything is crazy anyway.
gonna need to watch that twice. thanks doc
you are an amazing teacher
Fantastic breakdown as always! If you're competing against Musk, prepare for innovation; things are not going to stay the same.
Thank you for the detailed explanation. I think Mesochronous computing is a stroke of genius -- so very out of the box thinking, and once you see it, the benefits of it are obvious. Some of the aspects of this architecture seem to incorporate the notion of bounded eventual consistency -- where "eventual" here is a very short time interval.
Good job on explaining the use of hardware vs software in the management of packets distribution/reception conformation. Well done.
Very interesting video. I'm not a tech guy, but this video gave me a really simplified and basic idea of what's going on with all this coherence within AI/GPU clusters stuff. Thanks, Doc! ❤
❤
This is basically like an FPGA vs. software: maybe 100X faster.
So it sounds like they effectively made a new NIC/transceiver for L1-L4. Makes sense. They just added two more OSI layers in the hardware, so the network and transport layers run at wire speed rather than having the OS process them. There actually are innovations that do something similar, just not for this purpose. The timing system is what really makes the system scale and cohere.
I always like it whenever @drknowitall says, "In other words".
That's usually where he dumbs it down for me 😅😅😅😅
There is still a wall. Tesla just moved the wall to the back 40. New hardware provides most of the leap in scaling and coherence.
AI-based system configuration and monitoring (PyTorch) provides in-cluster virtual machine design and monitoring, off-cluster dataset organization and data compression/expansion, and training-cycle control.
Once again, Tesla and the Xs are showing the world that factory design is far more important and far more complex than the products produced by the factory.
I know likening a supercomputer to a human brain has been done a billion times, but this new development is REALLY starting to mean we are creating something analogous (and more powerful).
Today's supercomputers are still digital deterministic devices. We are learning ways to make them perform like an analog non-deterministic brain. One of the next quantum leaps in computing will be the return to analog computing.
I'm not a networking engineer, but hearing you lay this out, it sounds to me like the same process that happened when we went from single-core CPU chips to more and more multi-core chips. I wonder if that correlation was part of the thinking.
That was a great explanation
I just learned so much.
Brain exploded 🤯
Elon‘s companies are just amazing ❤
So if Meta wants to do it Teslas way they would have to pay Tesla for the patent usage????
While I think your assessment is correct that the current TTP can’t be flexible, that’s by design criteria. I would expect designers could model for the 90% case in other environments initially between servers in server farms, then along hops. They’d be most determined to do so as traffic could be increased with more efficient transport. Then finally the last mile would benefit greatly.
It seems as though the timing parameters, such as the background timers, could be made programmable so that it could be used in slower environments, similar to the way TCP/IP works. Back in my days as a software developer: anything you could do in software could also be done in hardware. It just costs more!
@@coachkevinyoung Yeah. The innovation Elon's team is taking advantage of is the ability to make chips much more cheaply than in previous times. I think your saying should be updated to: you can do anything in hardware or software, but it'll cost you money or compute cycles.
Is this also the networking technology they use in the cybertruck?
No.
That was great
I think the bigger point in this information is the TTP (Tesla Transport Protocol) being in hardware. Thinking about it: by the time you recognize the issue, think through possible options with current technology, design the chip and protocol, submit it for manufacturing, then test it and get it into production, Tesla must have started this well over a year (or two) ago. They recognized and projected a bottleneck they would not hit for a year or two and worked on a solution. That is a crazy number of steps ahead of their competition, who even today didn't think it was possible. This is truly a company looking into the future to see what is needed now to remove roadblocks tomorrow. Crazy.
So often Elon, or rather Elon's companies, create geometric innovations in technology by applying basic physics principles to real-world conundrums that have alluded some of the biggest companies in technology for decades. 😂❤
I mean, if you're going to have a mantra, first-principles thinking is a pretty good one. 😊❤
alluded -> eluded
*eluded.
My first computer had 48K of RAM total. It is a next step, and I hope it works well.
Your first computer had 48K of RAM? That was huge compared to my first computer, a Commodore VIC-20 with a whopping 3K of RAM!
A lot of early computers had large programs segmented into 4096-byte blocks. The maximum addressable program+data space was 0 to 4095 (12 bits).
So a 16-bit virtual address space was a huge improvement, especially when each word address grew to store 16 bits, or two bytes.
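The address-space arithmetic in the comment above works out like this:

```python
# Address-space arithmetic: 12-bit blocks vs a 16-bit virtual space.
block_bits = 12
words_per_block = 2 ** block_bits
print(words_per_block)        # 4096 words, addresses 0..4095

virt_bits = 16
virt_words = 2 ** virt_bits
print(virt_words)             # 65536 addressable words
print(virt_words * 2)         # 131072 bytes with 16-bit (two-byte) words
```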
Thanks for this great video, Dr Gibbs. Learned something new. :)
156.25 MHz may be the frequency, but that's only counting the up or down edge; if both edges are counted, the clock rate can be doubled
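The double-data-rate arithmetic, using the 156.25 MHz figure from the comment above:

```python
# Sampling on both clock edges doubles the transfer rate without
# raising the clock frequency.
freq_hz = 156.25e6
period_ns = 1e9 / freq_hz
print(period_ns)                 # 6.4 ns per cycle

transfers_per_s = 2 * freq_hz    # one transfer per rising AND falling edge
print(transfers_per_s / 1e6)     # 312.5 mega-transfers per second
```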
The doc helps us, Laura help the doc with a like and comment
I wonder if the whole Mixture of Experts approach was just an attempt to scale to more GPUs by abandoning coherence.
Great content. I think i missed SMR's video on this.😁
3:25 Your lossy system is your wife system? 😂😂😂
Most good video calls often use the lower layer UDP with a smart layer on top. TCP makes things worse.
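A minimal sketch of that "smart layer on top of UDP" idea: the sender tags each datagram with its own sequence number so the receiver can detect loss or reordering without TCP's blocking retransmits. The 4-byte header here is purely illustrative, not any real protocol such as RTP:

```python
import socket
import struct

def frame(seq: int, payload: bytes) -> bytes:
    # Prepend an application-level sequence number (big-endian uint32).
    return struct.pack("!I", seq) + payload

def unframe(datagram: bytes) -> tuple[int, bytes]:
    (seq,) = struct.unpack("!I", datagram[:4])
    return seq, datagram[4:]

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

for seq, chunk in enumerate([b"frame-a", b"frame-b"]):
    tx.sendto(frame(seq, chunk), rx.getsockname())

for _ in range(2):
    seq, payload = unframe(rx.recvfrom(2048)[0])
    # A real video stack would drop or conceal a stale frame here
    # instead of stalling the whole stream waiting for a retransmit.
    print(seq, payload.decode())

tx.close()
rx.close()
```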
Some of this smells like bullshit. I’m pretty sure we’ve been doing TCP/IP on server NIC hardware for about 15 years now; some of these NICs are so powerful they can run VMs on the NIC. And to be clear, your internet is hardware-based, and has been for the last 20-25 years, or whenever it was that the Cisco BFR came out. Now, a faster, higher-frequency clock is a real improvement. Same with caching more in hardware. But again, it’s still incoherent and will crap out once your timing shift exceeds your clock period (minus the error rate). I really like the idea of the clock offset creating a wave of partitions that have enough bandwidth to complete their ops before the next wave of ops re-uses the same bandwidth. But a clock tick is a cycle, and what we can stick in that cycle depends on how well/finely we can divide the tick.
Sorry for all the edits, my train of thought is also incoherent.
YES. They did not eliminate the wall; they just moved it to a point where it is almost irrelevant.
I suspect they are already doing this in the DOJO system.
Glad to signup and hungry for more!
I think having layer 3 (IPv4, v6) in hardware is very common. Am I mistaken? Sure, years ago you could run that layer in software on, say, a Linux box, and that becomes your router, but nowadays everyone implements it in hardware, I think. There may be exceptions like software-defined networks, but you'd never use an SDN for a high-performance cluster or network.
Ford invented the production pipeline, Tesla/AI invented the AI training pipeline that’s constantly moving along, not a “step” at time but continuously flowing 🥳😱
He’s the reason why there are no GPU sales at Best Buy ):
Power move would be to give this tech to OpenAI so he can crush them in a fair fight
I thought Elon’s words were 30 to 100 thousand and up to a “million” GPUs, not a “billion”. A billion is way too many, even for Elon’s optimistic past estimates.
There is no spoon. 😊
Can this be used for training multi modal transformer based models?
As John explained, it would be totally unnecessary for LLMs!
@@gregbailey45 I am not thinking inference. I am thinking how much more powerful training can be.
So the race is on: who can spend the most and get them connected en masse? By submitting the patent, did the secret get out? Who has the best AI today and into the future? What does this mean for power consumption? Should this back-and-forth save heat and therefore be more efficient? Where do the lost nodes go?
Watch it again.
And again.
And again...
This is bigger than Google's Gemini 2.0 or OpenAI's pro mode releases. Elon's computer clusters will give agentic generative workflows superior reasoning due to this coherency. 2025 will end up being a big, big win for reasoning auto-agents and agentic swarms.
So if this approach is the only way to string hundreds of thousands of GPUs together, and Tesla holds the patterns, does that mean Tesla will, for the foreseeable future, be the only ones in town with super-massive data centers?
Patterns?
The next question is.... when are they selling the protocol in digital form like bitcoin? :-)
FPGA for flexibility?
Software is cheaper than silicon; that's why TCP/IP is software-based. Wait until they can put the entire neural stack on silicon: instead of an inference-acceleration device, the driving AI becomes a dedicated chip built from training, probably needing something akin to an FPGA with a GPU VLIW structure. Also, the TTP seems to be very buffer-bloat avoidant: 256K storage points right to it.
You think this will trigger a massive demand for Nvidia GPUs from other players trying to follow xAI?
Some vendors use an entire silicon die to hold n* GPU cores, communicating with n*n on-die photonic fabric channels. This is another excellent scalable technology. It reconfigures around failing cores and cache, so resiliency is great.
So…looking at the titles of this guy’s videos…it looks like he’s exclusively promoting Elon Musk. Tesla this and Tesla that.
And there is Willow Quantum chip
for research only
Why does it matter how fast they train if they are data-constrained? (Tesla)
This is not about Tesla but xAI
@ Tesla is literally the first word in the video title.
@philipp594 Clickbait. xAI has very low mindshare.
They found out that training longer on a given set of data will improve the resulting inference model. The resulting inference neural net works better and faster. Training on faster hardware also helps.
@@philipp594 Many data constraints on training FSD can be reduced by using generative AI to expand the existing datasets.
Every time you change the model for your AI, you have to train the new version of the model and then validate the model for performance while screening for new emergent errors and reemergent old errors. Then you need to make corrections to the new model and repeat the process.
It is much like the old days when program steps were punched into cards one card at a time. You did not learn you had an error until you waited overnight for the program to run. Reducing the time between runs to less than the time to correct an error changed computer programming forever.
We are still looking for a method to achieve the same transformation to the process of building AIs. At this time using AIs to design, generate data for, train, and validate new AIs holds the greatest promise to approach this transformation.
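The retrain-validate-correct cycle described above can be sketched as a loop. Everything here is stand-in logic invented for illustration (a real pipeline would train a network and run a validation suite); the point is the structure: train a new version, screen for emergent and reemergent errors, correct, and repeat until validation is clean.

```python
def train(model_version, data):
    """Stand-in for a full training run; returns a 'trained model' record."""
    return {"version": model_version, "data_size": len(data)}

def validate(model, known_errors):
    """Stand-in validation pass reporting which known errors still reproduce.

    Fake rule for the sketch: a model 'fixes' any error case numbered
    below its training-data size.
    """
    return {e for e in known_errors if e >= model["data_size"]}

def development_cycle(data, known_errors, max_iterations=10):
    """Iterate train -> validate -> correct until no errors remain."""
    for version in range(max_iterations):
        model = train(version, data)
        errors = validate(model, known_errors)
        if not errors:
            return model               # clean validation: ship this version
        data = data + sorted(errors)   # 'correct' by adding targeted data
    raise RuntimeError("did not converge within the iteration budget")

model = development_cycle(data=[1, 2, 3], known_errors={4, 5})
```

The commenter's punch-card analogy is about shrinking the wall-clock time of each pass through this loop until iteration becomes effectively interactive.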
Neat
If Tesla has patented this method, how do the other players follow? China, of course, will just copy it.
When I look up META it says their computer has 600,000 GPUs. So where does your 32,000 limit come from?
Good question. Maybe they operate in parallel.
@@gregbailey45 I've heard of that, I think it's in Kansas.
These folks are really not the first to ditch Ethernet and do it in hardware: Inmos OS and DS links (mid '80s), the Fujitsu K computer (and its forebears), IBM BlueGene, etc. Worth noting that Inmos DS links supported wormhole routing in the early '90s...
How come in that video on the All-In pod they act like no one else has been able to get up to 100,000 GPUs, when Meta has, as far as I can tell from the headlines?
There could be a difference between the perfect-sync method and a less efficient GPU cluster architecture.
Hmm, DOGE should abolish the patent system, since it is only a scourge, completely useless for its intended purpose. Just a lawyer swamp.
❤
Could an AI such as this hold off a computer virus?
What is a GPU here? A GPU chip and even many processor chips contain many individual GPUs. So in this context, is a GPU an entire Nvidia chip, for example, or just one of the GPUs in a chip?
This is warehouse level, so a GPU should be one GPU rack or an 8-GPU package; each package contains multiple GPU chips.
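The hierarchy in the comment above is just a multiplication, but it shows why headline "GPU counts" are ambiguous. All numbers below are made-up placeholders, not real Colossus or Meta figures:

```python
def total_chips(racks, packages_per_rack, chips_per_package):
    """Count GPU dies in a datacenter described as racks of multi-GPU packages."""
    return racks * packages_per_rack * chips_per_package

# Hypothetical numbers only: the same site reads as "1,250 racks",
# "100,000 packages", or "200,000 chips" depending on which level you count.
print(total_chips(racks=1_250, packages_per_rack=80, chips_per_package=2))  # 200000
```

So two reports of the same datacenter can disagree by a factor of the package size without either being wrong.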
When it works *_coherently,_* the entire building / datacenter contains a *_single huge GPU,_* no matter how many
• Cores,
• Pixel pipelines,
• Vertex shaders,
• Memory addresses,
• Texture mapping units, or
• Render output units
are contained in any chip, board, rack, or the entire datacenter.
@@imconsequetau5275 Uhm - the whole video is about how they broke the 30 thousand GPU limit. So that's 30,000 buildings? ;-)
Human brain
This video will cause or lead to a lawsuit against Elon Musk and xAI for sharing/stealing Tesla patent technology without Tesla board approval and proper payment/compensation.
Tesla and xAI extremely likely have a contract in place already for how technology is shared, as technology appears to flow in both directions.
@allangraham970 A lawsuit has already been filed against Elon & Tesla for apparently transferring Nvidia GPUs from Tesla to xAI, so idk how many things are actually legally contracted and how much of it is Elon doing his own thing without thinking about bureaucracy.
So Elon is doing patents now? Little loss of idealism, but life will do that to you.
Question:
Is Tesla making their own factory robots?
They bought a German company some time back that we haven't heard from in a while.
Is the company they bought helping with Optimus? Unboxed Method?
7 minutes in, and still all you've done is say they did it. You haven't said a word about how.
Tesla is just a car company
I can't continue to listen to you babble about packets. Your mind is lost.
FSD when?
Zzzzzzzzzzzzzzzzzz zzzzzzzzzzzzzzzzzzzzzzzzzz zzz