What can you do with that, how does it compare to other computers. Do you have any benchmarks? It costs over 2000$? I'd like to have some perspective. How does it compare to the Apple M1 for example? How super is it? In terms of size, cost, performance, versatility, power consumption? Or it is just fun because it helps us understand how real supercomputers work? I'm genuinely confused. But that is a great video! I'm just having a lot of questions!
@Johnny Car Ok, that would be great, but can you add others? There are kind of 4 in this one. How powerful is it and what could be done with it? I lack perspective on this one. Is it comparable to an i7 with a good graphics card, but smaller, ARM based and very power efficient?
The point of the Jetson Mate is not just the processing power but also as a testing/learning environment for MPI and HPC. The point is that it has 4 nodes. How you program 4 nodes is very different to how you program a single node. There are examples of using modules like these (including Raspberry Pi boards) to build clusters with hundreds of nodes.
What sort of workloads could this be capable of (maybe just using the 2GB Jetson Nano modules), and would it be possible to mine some form of crypto (especially scrypt based coins) using it?
@@RunTheTape No, I won't. It's a perfectly reasonable question when I want to do stuff like this for a hobby. Just trying to learn more about this stuff that I don't understand. It's not cancer, it's curiosity.
@@_._shinonome_._ here’s my answer. The Jetson Nano and its family were built for young engineers and passionate tinkerers, to be used for AI deep learning and computer vision stuff. Like for robots, self driving, object recognition etc. Using them outside their scope for something as idiotic and senseless as crypto coin mining would be not only a waste of energy but truly insulting to the great minds that invented those electronic components. As it is with the GPUs too. Good bye.
so, do I need 4 nano kits or 4 modules? Are the individual heat sinks required or not? HUH? HUH? If they are required, and I buy 4 nano boards, where the heck do I get the 4 heatsinks? Just a really dumb implementation question....
You have a 256-character password! Wow, I bet that is hard to remember! As I show, it only takes a few minutes to brute force a small string like a password. This is what hackers who have access to stolen databases do every day, plus rainbow tables of course. But how were the rainbow tables created? Like this!
@@GaryExplains Okay! A small string like a password may be cracked! But how many people write short passwords? Long passwords may be a requirement for preventing the crack!
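To put rough numbers on the short-versus-long password point above, here is a minimal single-machine sketch of the "guess and check" idea in Python. It is not the code from the video: the function names, the lowercase-only alphabet and the 4-character limit are all assumptions chosen so the search finishes quickly.

```python
import hashlib
import string
from itertools import product


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hex: str, alphabet: str = string.ascii_lowercase,
                max_len: int = 4):
    """Try every string up to max_len; return the first one that matches.

    Hypothetical helper for illustration: it returns *a* matching input,
    or None if nothing in the searched space hashes to target_hex.
    """
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if sha256_hex(candidate) == target_hex:
                return candidate
    return None


def keyspace_size(alphabet_size: int, max_len: int) -> int:
    """Number of candidates with length 1..max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    print(brute_force(sha256_hex("abc"), max_len=3))
    print(keyspace_size(26, 4), keyspace_size(26, 12))
```

The search space grows exponentially with length: `keyspace_size(26, 4)` is 475,254 candidates, while each extra character multiplies the work by the alphabet size, which is why length matters more than anything else here.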
3:40. At the risk of coming off as pedantic, this actually isn't possible. The nature of hashing means the operation is truly one directional, and there is no way to reliably reproduce the original data. All you are doing when you brute force a hash like that is looking for any value which produces the hash, and there are, quite literally, an infinite number of values which will produce the same hash. In the case of cracking a password that probably doesn't matter, as you simply want to get past the security mechanism. But let's say you took a hash of a great novel: the odds that whatever hash collision you found reproduces the original work are basically zero. This becomes even more the case as the original input data increases in size, as you are quite likely to come up with a different hash collision before you reach the original source data. Beyond that, even if you do happen to come across the original source data, there is no way for you to know it, as again, there are an infinite number of input values which produce the same hash, and you don't know that you happened to guess the same one that produced the hash in question.
@@GaryExplains I feel like you're missing the point. Any value that hashes out the same would pass as the password, assuming it doesn't exceed the field limit (as the field limit would reject strings too large). SHA256 is a cryptographic hash, basically meaning that it tries to avoid clumping (see birthday paradox), so with a suitably constrained input domain matched with a suitably large hash domain, you may actually manage to eliminate the possibility of collisions within that domain mapping. However, that doesn't change the FACT that hashing, by its very nature, is a one direction process. I'm not arguing that one can't crack passwords by iterating over the available domain space and comparing the hashes. I'm pointing out that "wanting to find out what string generated the hash" is not only "hard", it's mathematically impossible. When you use the word "hard" in computer science, it means possible but difficult; this is not only hard, it's not possible. That is, you can never have complete certainty that the string you found is the same string used. Though again, given a suitably constrained input domain (especially one which is fully enumerable) and a suitably large hash domain, it would be possible to prove the mapping, by enumerating the entire input domain and proving no collisions exist. (Though that still isn't a bidirectional relationship, as there is no function to reverse the mapping.) Not that you said that. In fact, what you did say was, "easy to go from the source data to the hash, but from the hash back to the source data is very very hard." Well, that's just fundamentally misrepresenting hashing, as there is no relationship back, AND for any input domain that is larger than the hash domain (which is most applications) it is utterly impossible to even implement the "guess and check" to find the original source. (Assuming you actually care about finding the true original.)
Maybe you think it's no big deal, however, there are very practical reasons why this matters a great deal. Let's consider the possibility that you have an ISO for a Linux distribution, and you want to check the SHA256 for the ISO to make sure it hasn't been tampered with. You will notice that your SHA256 hash is only 256bits long, which means it can't UNIQUELY map more than 256bits of data. (Also interesting, because it's a cryptographic hash, it's certain to have multiple collisions within just the 256bit space which it itself exists within, meaning its ability to do a unique mapping is only for spaces much smaller.) Which means any amount of data over 256bits has to start doubling up on positions within the 256bit address space. (How do you uniquely represent the 257th bit of data?) Your Linux ISO may be 9GB, which is roughly 72 billion bits, or on the order of 300 million times as much data as the 256bit hash. That means it is mathematically possible to alter the ISO to include some malicious code, and then to pad the ISO in such a way as to produce the same hash. (This would be "HARD" to do against SHA256, but "possible.") Now, I've never seen this successfully done with SHA256, but I have seen it done with MD5. In fact, I saw it done in such a way that the bit count worked out to be the same. The alteration was only detectable by applying a different hashing function. (Or by doing a bit by bit comparison.) The way you discussed the topic, though I know it's only meant to be illustrative of a clustering/programming problem, implies that there is some sort of bidirectional relationship between hashing and the original data. If the relationship was bidirectional, it wouldn't be a hash. It would be encryption, where you could reverse the process given the key, and where each unique input produces one very specific unique output. However, a hash produces an infinite number of collisions for every single specific hash address.
Your code is simply iterating until it finds one, whether it's the original string or not. And if the allowed strings are 64 characters or longer, there are almost certainly multiple possible collisions. Though even barring that, it doesn't change the absence of a bidirectional relationship.
Dude, your reply is too long and to be honest I didn't read it. I understand how hashes work. I guess I assumed a level of knowledge about hashes and how passwords are stored etc, as I have covered this subject several times before. So my bad for not being explicit enough for beginners.
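The pigeonhole argument in the long comment above can be seen on a small scale. The sketch below is an illustration only: it deliberately truncates SHA-256 to 16 bits (an assumption made so a collision turns up quickly; nothing here finds a collision in full SHA-256), and `truncated_hash` and `find_collision` are hypothetical helper names.

```python
import hashlib


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Keep only the top `bits` bits of the SHA-256 digest."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash "0", "1", "2", ... until two inputs share a truncated hash.

    With only 2**bits possible outputs, a repeat is guaranteed after at
    most 2**bits + 1 distinct inputs (pigeonhole principle).
    """
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode("ascii")
        value = truncated_hash(data, bits)
        if value in seen:
            return seen[value], data, value
        seen[value] = data
        counter += 1


if __name__ == "__main__":
    first, second, value = find_collision(16)
    print(first, second, hex(value))
```

Two different inputs end up with the same short hash, so matching the hash alone cannot tell you which input was the original. With the full 256 bits the same logic applies in principle, but the search is far beyond anything a brute-force loop can do.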
ASIC stands for Application-specific integrated circuit. It basically means a special chip which does a specific job, so in this case it does the hashing algorithms, and does them fast. It is highly specialized. You could try making your own, or you could buy a rig. But they are expensive.
They easily could. They could slap a HyperTransport bus together - already their on-chip bus and long proven in Broadcom switches...modularize it so you can talk to it direct or break it into PCIe lanes accessed via NVME...or hell invent NVME backplane native, why not. It's not like an SSD cares what core/CPU it's talking to.
@@crhu319 In light of Monero do you know if there is code available to say activate each node in the Jetson Nano cluster to mine Monero and have each node forward hash rates to the master and then the master forwards the combined hash rates to the mining pool? Additionally, since the Nano does have CUDA cores there is source code for XMRIG to activate the GPU in order to utilize not only the CPU but GPU cores on the device.
I don't have any PoE equipment at home and I guess that isn't a common thing in many people's houses. If you go PoE only then you restrict the potential market.
@@GaryExplains every PoE device I've seen also has an alternate power option like a barrel connector. It would be nice if PoE was more commonly included as an option alongside USB powered devices.
It's a PD type C power supply. I use a 90 watt PD power supply. But you need to look at the specs. There is a wiki for the Mate that will point you to what they recommend.
Hi, I got the cluster mini, thank you so much for a nice product. I'm planning to have 2 Jetson Nanos, one as master and the other as worker. I followed the wiki guide and everything executed, but I want to run a few programs on both Nanos and see the performance. I saw the video at ruclips.net/video/nWzcEUj0OHc/видео.html and I need info on how he runs on specific nodes, combines all nodes, etc. Please help me in this regard, thanks.
You should publish the software regardless, because an imperfect program is always good learning material for a programmer who is still learning.
This would be great for engineering calculations like CFD or FEA. I know these scale very well. But I would go with Xavier NX boards $400 each + $200 for the mate. Would be a nice compute machine. Each node capable of 8 TFlops!!
Do you think this setup can outperform my x86 beast rocking a 6C12T Ryzen 3600 and RTX 2060 Super?
@@TECHNDJ Nope. It's better to use an x86_64 Ryzen 5 than 4 of these.
@@TECHNDJ I used an old 8 core Xeon to help rendering and computation from the main (i3 built for low idle) pc. Compute on demand. It really was great.
Then I had a 1920X and the Xeon was no good anymore for that.
Thinking of getting a 3000 series or 5000 series CPU with as many cores possible on the cheapest board possible, small SSD as cache/swap, fastest ram I can find, Intel 10Gbit cards yanked from eBay for cheap.
You could make something along the same lines happen. Get an A520 (mind the vrm), a 5600/5800X or an Apu (saves getting that Nvidia 710), 3600 or faster memory, old SSD you have somewhere, or the cheapest dram cached nvme drive, 128GB is enough, and let it boot from the main rig. Hell you could just try and get the lowest cost i3-8100 from eBay as an alternative. This way you can try and find out if it is for you at all. Note though that I use a BSD for this, Linux should work fine too, Windows is reportedly possible for some programs but a major hassle and I would have no clue how to do it.
@@jamegumb7298 Great! But most of my projects require CUDA and Tensor operations. I would prefer to have a high end GPU with some midrange CPU for decent gaming too during vacations.
@@TECHNDJ You could certainly do that. 2 of the smallest mini-itx cases, Ryzen 3100 in, best card you can get. Or any Apu in a cheap thin mini-itx board with 2×nvme slot, and put Coral edge tpu in the M.2 positions, and one in an adapter in the pcie slot. That makes 3 per board. If it has an opening for WiFi that can fit one tpu as well.
Just a suggestion: Put your website in the description too, would make it much easier to sign up for the newsletter.
Great video, as always!
*GARY!!!*
*Good Afternoon Professor!*
*Good Afternoon Fellow Classmates!*
MARK!!!
Hi Mark!!
@@worthlessguy7477 *HEMANTH!*
Nice. If using Windows Terminal, you can split panes with a command such as:
wt.exe -p "Ubuntu" ; split-pane -p "Ubuntu"; split-pane -H -p "Ubuntu"
.. along with a Desktop shortcut for it.
I just thought it would be nice to see a 19 inch rack mount server style that has more slots than 4, like 16 units, where you can use a variety of sub specializations, like SSD storage, single GPU or CPU cards from various makers. A blade setup could be done where 8 units array out in clusters and you could easily implement active water chilling to keep all units well under normal operating temps and have it run rather quiet. Then there's the idea for a half wide server cabinet to make a sort of mini server.
Would a fully loaded setup of xaviers make a good crypto mining rig, specifically $FLUX? I want to run verthash and it only requires a 2GB GPU, so I'm hoping this speeds up the rate of return.
That would be quite interesting to see!
I really thought it was a camping cooktop!
LOL
Kinda, when it got hotter and no cooling system
run BFGminer at full throttle..i'm pretty sure you can cook something on it :)
same 😅
Imagine a 1u 16 inch depth chassis, 20 Jetsons wide, 5 Jetsons deep, 100 jetsons total.
1500 W, 600 Xavier cores, 38.4 thousand Volta CUDA cores, 4800 Tensor cores, 1600GB of eMMC and 800GB of RAM.
Throw on a 40G network card and an internal 100Gbps switching motherboard and you're off to the races
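The totals in the comment above are straight multiplication, which a few lines of Python can check. The per-module figures below are assumptions, back-derived by dividing the comment's totals by 100 (they look like Xavier NX numbers, but check the module datasheet before relying on them), and `cluster_totals` is a hypothetical helper.

```python
# Per-module figures: assumptions derived from the totals quoted above
# divided by 100 modules, not taken from a datasheet.
PER_MODULE = {
    "cpu_cores": 6,
    "cuda_cores": 384,
    "tensor_cores": 48,
    "emmc_gb": 16,
    "ram_gb": 8,
    "watts": 15,
}


def cluster_totals(module_count: int, per_module: dict) -> dict:
    """Scale every per-module figure by the number of modules."""
    return {name: value * module_count for name, value in per_module.items()}


if __name__ == "__main__":
    modules = 20 * 5  # 20 wide, 5 deep
    for name, total in cluster_totals(modules, PER_MODULE).items():
        print(f"{name}: {total}")
```

Network and interconnect bandwidth do not scale this way, so the sketch says nothing about how well such a box would perform on communication-heavy jobs.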
Are you using SHA2 cpu extensions? 6-7x acceleration is possible.
good to know what's in your Jetson, mate!
Will be great if you show exactly how to install , configure the software and run your test...
Think there are a few good use cases for this kind of setup: I'd want to establish a workflow that didn't require custom code for each of those use cases tho.
Some means of 'slaving' this compute to another platform which drove tasking would be my ideal scenario.
Let's see what the interwebs gives me, if nothing I guess I'll hire someone.
can you render blender stuff with this
Gary, so nodes are the individual slots for the CPUs? How much power does it draw if I want to install Ubuntu with Bitcoin mining? Or password cracking with John? Or aircrack-ng?
I cover that in my Jackery 1000 video.
Gary. Awesome video! I have the Mate and I have it fully populated. Do you have a walkthrough for running the cluster on this? I've been able to do some neat stuff. I've tried some MPI from your other walkthrough, but the crazy thing is that all the Nanos run the square root program individually and then report back. So it ends up taking longer. Lol! Instead of working as a cluster they are just each doing the program by themselves. So it works in a way. I'm missing something. I followed your walkthrough as instructed but I have failed.
They all run it and all show pass. So yes, I set up ssh-keygen and shared it as stated. I've been able to run the MPI with one IP and it runs on the GPU, and I've been able to run it with the clusterfile containing all IPs and, as I said, they run, but they run the program individually and it takes longer because it's waiting for each one to run the program by itself. It's so silly. If you can help it would be greatly appreciated.
Thank you again for another great video!!
This is brilliant Gary! Is this good for video rendering?
#garyexplains I have a question on this MPI setup. I am still getting four nodes running, but running in the wrong way. I tried (time mpiexec --mca btl_tcp_if_include 192.168.86.60/4 --hostfile clusterfile ./simpleMPI) and got the same thing. All 4 nodes run the program individually and all pass. The "time" statement even shows roughly the time it would take for all four to run the program individually, instead of working as a proper cluster, dividing the program up into four parts and giving my answer in a fraction of a single node's time.
If you need any question answered for you to help point me in the right direction please let me know.
Thank you for your great videos!
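One possible explanation, offered as an assumption since the source of the program isn't shown in this thread: mpiexec starts the same program on every node, so the work is only divided if the code itself uses its rank to pick a slice. The sketch below simulates that idea in plain Python with no MPI library. `rank_slice` and `partial_sum_of_roots` are hypothetical names; in a real MPI program the rank and size would come from the MPI library (for example `MPI_Comm_rank` and `MPI_Comm_size`) and the partial results would be combined with a reduce.

```python
import math


def rank_slice(total_items: int, rank: int, size: int):
    """Return the [start, stop) range of items that `rank` should handle.

    Items are spread as evenly as possible; the first `extra` ranks get
    one more item each.
    """
    base, extra = divmod(total_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum_of_roots(start: int, stop: int) -> float:
    """The work one node does: square roots over its own slice only."""
    return sum(math.sqrt(i) for i in range(start, stop))


if __name__ == "__main__":
    total_items, size = 1_000_000, 4
    # Simulate the four nodes one after another; under MPI each rank
    # would compute only its own partial result, in parallel.
    partials = [partial_sum_of_roots(*rank_slice(total_items, r, size))
                for r in range(size)]
    print(sum(partials))
```

If every node instead loops over the full range, each one repeats the whole job and the run takes at least as long as on a single node, which matches the behaviour described above.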
How did you get the CPU & GPU usage things, what's the command?
I would like to know what you're using to cluster everything together and divide up the workload. I've been considering a cluster like this for a while now for password cracking for pentests but haven't a clue how to cluster it all together.
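For a brute-force job, one common way to divide the workload is to number every candidate and hand each node a contiguous block of numbers. The sketch below shows only that bookkeeping, not what the video uses: `index_to_candidate`, `worker_range` and `candidates_for_worker` are hypothetical helpers, and the fixed-length, fixed-alphabet keyspace is an assumption.

```python
def index_to_candidate(index: int, alphabet: str, length: int) -> str:
    """Map a number in [0, len(alphabet)**length) to a fixed-length string."""
    chars = []
    for _ in range(length):
        index, remainder = divmod(index, len(alphabet))
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def worker_range(total: int, worker: int, workers: int):
    """Contiguous [start, stop) block of candidate numbers for one worker."""
    return total * worker // workers, total * (worker + 1) // workers


def candidates_for_worker(alphabet: str, length: int, worker: int, workers: int):
    """Yield only the candidates that belong to this worker."""
    total = len(alphabet) ** length
    start, stop = worker_range(total, worker, workers)
    for index in range(start, stop):
        yield index_to_candidate(index, alphabet, length)


if __name__ == "__main__":
    for worker in range(4):
        print(worker, list(candidates_for_worker("abc", 2, worker, 4)))
```

Each node needs only its worker number and the total worker count to know which candidates to try, so no candidate list has to be sent over the network. How the nodes are started and how a hit is reported back (MPI, ssh, a job queue) is a separate choice.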
I don't know what this is but now I want one on my desk.
Would this be viable for ML, particularly Dask clusters?
Sir, is it possible to have three Jetson Mates, all with four Nanos installed, joined together as one large cluster? Forgive me if my understanding is lacking, as I'm new to this and learning. For the most part I'm self taught, but I need a hand in my learning as it pertains to AI, clusters and GPUs. Thank you sir for your time. As a side note to explain what I'm trying to do, I'm interested in creating my own AI machine learning environment. I know that I can buy GPUs and create my own AI, however most of the options available are quite expensive, so I'm looking for a cost efficient way to accomplish my goals.
I just ordered the Jetson mate after watching this. Question, if I order the modules from Nvidia, do they come with the heat sink or do I have to buy the development kits to get the heat sink?
Modules have no heatsink
So, about 200USD for the motherboard and about 100 for each of the nanos. How would this compare with a 600USD old fashioned server?
Jetson Nano NX is like $400 here in my place, from initial search
@@someoneyouneverknow7529 oh wow, a quick google gave me 100 but likely that was mea culpa and for something else. So we're talking an expensive rig.
There is a big difference between the Nano and the Xavier NX. One of you is quoting the Nano price the other is quoting the Xavier NX price.
@@GaryExplains Ah, I see. That straightens out those crossed wires, thanks.
But to the thrust of the question, how does the performance stack up with a traditional rig on a price by price basis?
Well, that is too much of an open question for two reasons. 1. The point of the Jetson Mate is not just the processing power but also as a testing/learning environment for MPI and HPC. The point is that it has 4 nodes. How you program 4 nodes is very different to how you program a single node. 2. What are the specs of your theoretical $600 machine. Especially with regards to GPU.
That's the seeed studio case for the recomputer.
Does there need to be a different jumper for TX2 NX SOMs?
The Jetson TX2 NX I got from Seeed Studio wouldn't power on in the Jetson Mate Cluster.
Jetson Nanos, however, work just fine in the Jetson Mate Cluster.
I have flashed JetPack 4.6 to my B01 DevKit module using Etcher; my SoMs are the TF version. I can only access the master module, not the others. How can I see all the connected devices together with jtop?
thanks
is it possible to reduce fps during inference?
mini super computer. such a cute name.
Where can I find a power supply for the Jetson Mate?
Is there any practical usage? I want to buy it too. :> Multiple object detection of many camera inputs?
Does it boot only from an SD card or is there any other option available?
I am in love. Need that thing
What's the price vs power comparison between this and a cheap option?
Is there any way to switch off the PWM fan's light ???
How fast is the backplane that the Jetsons plug into on the Mate? Just curious how fast the individual Jetsons can talk to each other internally on the Mate versus over 1Gb Ethernet.
I don't think there is an interconnection over the backplane, they only communicate over Ethernet.
@@GaryExplains That's such a shame. Since Jetson is becoming such a great platform/environment for CV, having the backplane essentially be an NVLink or SLI equivalent would have been really nice to take better advantage of CUDA.
Hi, how about memory for this? Is it individual per module, I mean can we connect an SD card or SSD to each module?
Can you do a similar setup for the new Nvidia JONS?
I could and I have covered the original NVIDIA Jetson Orin Nano Dev kit here on this channel, but the Jetson Mate took 4 modules, the NVIDIA JONS dev kit only takes 1 module.
Please share some benchmarks like Geekbench and 3DMark and some gaming benchmarks
Geekbench and 3DMark are unable to take advantage of distributed systems like the Jetson Mate.
So photogrammetry depends a lot on CUDA cores (so AMD GPUs are unfortunately useless). Can this rig be used for that if the software were to support it?
Photogrammetry doesn't require CUDA cores.
Photogrammetry doesn't require CUDA. You can use OpenMVG + OpenMVS on AMD. There were more alternatives for AMD GPUs but I do not remember them, since this setup was the fastest when I researched this topic.
@@venisonsteak8437 it doesn't require cuda, sure, but it helps and makes it faster, VisualSFM doesn't have to have cuda, but it helps, meshroom also has a version without cuda but it is almost unusable, i'm aware it's not needed, but that wasn't my question ;) in a time with limited access to new GPUs this could be a direct option as alternative
The gaps are probably the latency from switching to the gpu, only way to truly optimize that is probably to run a different program at the same time I guess
how is the air supposed to flow out? i cant see holes on the side of the case where the air - thats pressed from top with the big fan - can escape?!?
There are plenty of ventilation holes. I guess it isn't clear on the shots I took.
With my fan on max, all around the bottom/side there is loads of air blowing out. So from the top blowing down and coming out all around the lower part of the case.
Cute :) thx for the work.
Can they run HandBrake in any meaningful way vs a Ryzen? Just curious :)
Yes Gary. Explain! 😄😄
The cards should be connected in a more sophisticated manner than Ethernet; it is slow when you try to use MPI code that has communication.
They need to make a pci express controller that connects all 4. Then I will consider buying.
Nothing wrong with 100Gbps Ethernet as a backplane.
Actually HyperTransport is way more suitable for this purpose.
And there's far more experience using it between multiple cores because AMD was using it as on-chip bus long before PCIe was specified. It was before that a Broadcom switch bus.
Mr Gary... can we use a Raspberry Pi 4 with 8GB RAM for programming?
I would use it as a BOINC compute node, or a cryptocurrency mining node!
BOINC is nice
I don't get this - would this be suitable as a PC for 3D modelling etc?
Hi Gary. At 0:30 did you say Jensen?!?
Lol, yes! 🤦‍♂️
@@GaryExplains it could be Jensen's hobby project :)
how does this compare to the Jetson Xavier AGX?
Who makes a comparable board for the AGX modules?
Have you thought of a project where you add a solar panel to this NVIDIA cluster and see how it does at max load, for resiliency in Bitcoin mining? So it would earn revenue working 24/7 without busting your wallet on the energy bill?
Yes. See my Jackery 1000 video.
Hey Gary. What power supply do you use for the Jetson Mate?
I used the PinePower which I reviewed here on this channel.
@@GaryExplains thanks for the help.
Thanks Gary, you RoKK!!
Looks like you're using Windows Terminal. Why are you opening four windows when you can split panes? You can also use tmux for that.
Just a suggestion
What was that htop-like program used for showing the status and usage of everything?
It is called jtop and I cover it in the written instructions plus I mention it in the other NVIDIA Supercomputer video.
@@GaryExplains Thanks :-). I thought it was a generic open-source program. I think I'm still stuck with nvtop on PC.
What monitoring app are you using? I haven't seen that before!
It says at the top "jtop nano"
@@YounesLayachi Really good point. I went and looked it up. Looks like it's a jetson thing, rather than a generalized one... github.com/rbonghi/jetson_stats
I cover jtop in the setup instructions, the link is in the description. I also mention it in the other supercomputer video.
Could this host 4 or more virtual machines?
What can you do with that, and how does it compare to other computers? Do you have any benchmarks? It costs over $2000? I'd like to have some perspective. How does it compare to the Apple M1, for example? How super is it, in terms of size, cost, performance, versatility, power consumption? Or is it just fun because it helps us understand how real supercomputers work? I'm genuinely confused. But that is a great video! I just have a lot of questions!
@Johnny Car Ok, that would be great, but can you add others? There are kind of 4 in this one. How powerful is it and what could be done with it? I lack perspective on this one. Is it comparable to an i7 with a good graphics card, but Arm-based, very small and power efficient?
The point of the Jetson Mate is not just the processing power but also as a testing/learning environment for MPI and HPC. The point is that it has 4 nodes. How you program 4 nodes is very different to how you program a single node. There are examples of using modules like these (including Raspberry Pi boards) to build clusters with hundreds of nodes.
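To make the "programming 4 nodes is different" point a bit more concrete, here is a minimal sketch of the usual first step: giving each node its own slice of one job. It is plain Python and does not use MPI itself. `rank` and `size` only mimic the values an MPI program would receive, and the function name `my_share` is made up for illustration.

```python
def my_share(total_items, rank, size):
    """Indices that node `rank` (0-based) out of `size` nodes should process.

    Strided split: node 0 takes 0, size, 2*size, ...; node 1 takes 1, size+1, ...
    """
    return range(rank, total_items, size)


# Hypothetical 4-node cluster working through 10 items.
NODES = 4
WORK = 10
shares = [list(my_share(WORK, rank, NODES)) for rank in range(NODES)]

for rank, share in enumerate(shares):
    print(f"node {rank} handles items {share}")
```

On a real cluster each node would run the same program and only compute its own share, then send its results back to one node that combines them.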
Thank you!
What sort of workloads could this be capable of (maybe just using the 2GB Jetson Nano modules), and would it be possible to mine some form of crypto (especially scrypt-based coins) using it?
OMG. Jeeez you guys will never give up this cancer use of technology, that is mining.
@@RunTheTape No I won't? It's a perfectly reasonable question when I want to do stuff like this for a hobby. Just trying to learn more about this stuff that I don't understand. It's not cancer, it's curiosity.
@@_._shinonome_._ It's making our planet more hostile and hot with no real use in return.
@@RunTheTape You do realise I just want to know for a hobby. If you don't have an answer to my comment then leave.
@@_._shinonome_._ Here's my answer. The Jetson Nano and its family were built for young engineers and passionate tinkerers, to be used for AI deep learning and computer vision stuff. Like for robots, self driving, object recognition etc. Using them outside their scope for something as idiotic and senseless as crypto coin mining would be not only a waste of energy but truly insulting to the great minds that invented those electronic components. As it is with the GPUs too. Good bye.
Killer. Can it do PlayStation 2 emulation?
How is this thing handling concurrency?
In what sense?
What is the system monitor program?
It is called jtop, it is specifically for the Jetson boards. More details in the written instructions.
For a second I thought this was an April Fools' joke
so, do I need 4 nano kits or 4 modules? Are the individual heat sinks required or not? HUH? HUH? If they are required, and I buy 4 nano boards, where the heck do I get the 4 heatsinks? Just a really dumb implementation question....
The cheapest way to get 4 Nanos is to buy 4 kits and then you get the heatsinks as well.
@@GaryExplains So I found out! Thanks!!!
Did you plug in Nanos or Xavier GPUs?
Did you watch the video?
@@GaryExplains I did at work, must have been with the boss.
Ah, ok. Well I used 3 Nano modules and one Xavier NX. You can mix and match in the Jetson Mate.
@@GaryExplains I am working on a Lotto wheel and need to do literally 667 trillion calculations (25,827,165^2). This might be fast enough to do it! 😊
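A quick check of the arithmetic in that comment. The number 25,827,165 equals the count of ways to pick 6 numbers from 54, so the sketch below assumes a 6-from-54 lotto; the commenter does not say which game it is.

```python
import math

# Assumption: a 6-from-54 lotto, since C(54, 6) matches the quoted figure.
draws = math.comb(54, 6)

# Comparing every possible draw against every other one squares the count.
pairs = draws ** 2

print(f"{draws:,} draws, {pairs:,} pairs")
```

That gives 667,042,451,937,225 pairs, which is the "667 trillion" in the comment.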
I wonder how I can have this used with JTR.
JTR the Swedish boy band? JTR the Joint Travel Regulations?
@@GaryExplains The Swedish boy band of course.
Great Video!
Glad you enjoyed it
That's cool and all, but my several year old GTX 1080 Ti can guess 4,426,400,000 combinations per second on SHA256 hashes.
Sure. But the point is to learn how to use distributed computing without needing to spend big money.
@@GaryExplains Yeah, that's true. Plus I'm sure that uses far less power.
In theory and in practice, don't try to brute force matching SHA256 hashes. Don't fool yourself!
Eh?
@@GaryExplains The total number of combinations is 2^256 ≈ 1.157920e+77. That is a 78-digit number, out of reach for a single supercomputer even if it runs all year.
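The two sides of this exchange are counting different things, and a rough calculation shows the gap. The sketch below uses the 4,426,400,000 guesses per second quoted for the GTX 1080 Ti earlier in the thread. The 8-character lowercase password is an assumed example, not something from the video.

```python
RATE = 4_426_400_000                    # guesses per second, figure quoted in this thread
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_try(candidates, rate=RATE):
    """Time to test every candidate once at a fixed guess rate."""
    return candidates / rate


# Every possible 256-bit digest (what the comment above is counting).
all_digests_years = seconds_to_try(2 ** 256) / SECONDS_PER_YEAR

# Every 8-character lowercase password (assumed example of a "small string").
lowercase8_seconds = seconds_to_try(26 ** 8)

print(f"all digests: {all_digests_years:.2e} years")
print(f"8 lowercase letters: {lowercase8_seconds:.0f} seconds")
```

Enumerating all digests works out to something on the order of 10^59 years, while the short password space is exhausted in under a minute at the same rate.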
You have a 256 long password! Wow, I bet that is hard to remember! As I show, it only takes a few minutes to brute force a small string like a password. This is what hackers who have access to stolen databases do every day, plus rainbow tables of course. But how were the rainbow tables created? Like this!
You do realise that to brute force crack a sha256 hash you don't try all the combinations of the hash, right?
@@GaryExplains Okay! A small string like a password may be cracked! How many stupid people use short passwords? Long passwords may be a requirement for preventing the crack!
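For readers following this thread, here is a minimal single-machine sketch of the kind of brute force being discussed: hash every candidate string and compare it with the stored hash. It is not the code from the video. The function name `crack`, the lowercase-only alphabet and the 4-character limit are assumptions chosen to keep it short.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def crack(target_hex, max_len=4, alphabet=ascii_lowercase):
    """Try every string up to max_len characters; return the first SHA-256 match."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            guess = "".join(chars)
            if hashlib.sha256(guess.encode()).hexdigest() == target_hex:
                return guess
    return None  # nothing in the search space matched


# Pretend this hash came from a stolen database.
stored = hashlib.sha256(b"nano").hexdigest()
found = crack(stored)
print(found)
```

Each extra character multiplies the search space by the alphabet size, which is why this works on short passwords and stops being practical on long ones.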
But can it play minesweeper?
Very cool
But can it run crysis?
Wondering the same question
It can play Doom 3. See my review video of the Xavier NX.
I wonder if one could play games on these cluster.
No, not as a cluster.
Nice!
publish early, publish often.
I thought he was going to do the whole hash out loud for a while
What the hell, I thought it was a stove in the thumbnail
I bought a Jetson Mate and I can't find a power supply
3:40. At the risk of coming off as pedantic, this actually isn't possible. The nature of hashing means the operation is really truly one directional, and there is absolutely no way to reliably reproduce the original data. All you are doing when you are attempting to brute force a hash like that is looking for any value which produces the hash, and there are, quite literally, an infinite number of values which will produce the same hash. So in the case of cracking a password, that probably doesn't matter, as you simply want to get past the security mechanism. But let's say you took a hash of a great novel: the odds that whatever hash collision you found reproduces the original work are basically zero. This becomes even more the case as the original input data increases in size, as you are quite likely to come up with a different hash collision before you iterate upon the original source data.
Beyond that, even if you do happen to come across the original source data, there is no way for you to know it, as again, there are an infinite number of input values which produce the same hash, and you don't know that you happened to guess the same one that produced the hash in question.
I was referring to passwords.
@@GaryExplains My point stands.
Yes, for hashes of large data, of course, no one has ever suggested otherwise. But for passwords stored in databases this method is used every day.
@@GaryExplains I feel like you're missing the point. Any value that hashes to the same digest would pass as the password, assuming it doesn't exceed the field limit (as the field limit would reject strings that are too large). SHA256 is a cryptographic hash, basically meaning that it tries to avoid clumping (see birthday paradox), so with a suitably constrained input domain matched with a suitably large hash domain, you may actually manage to eliminate the possibility of collisions within that domain mapping. However, that doesn't change the FACT that hashing, by its very nature, is a one-direction process.
I'm not disputing that one can crack passwords by iterating over the available domain space and comparing the hashes. I'm pointing out that your comment about "wanting to find out what string generated the hash" describes something that is not only "hard", it's mathematically impossible. When you use the word "hard" in computer science, it means possible but difficult; this is not only hard, it's not possible. That is, you can never have complete certainty that the string you found is the same string used. Though again, given a suitably constrained input domain (especially one which is fully enumerable) to a suitably large hash domain, then it would be possible to prove the mapping, by enumerating the entire input domain and proving no collisions exist. (Though, that still isn't a bidirectional relationship, as there is no function to reverse the mapping.) Not that you said that. In fact, what you did say was, "easy to go from the source data to the hash, but from the hash back to the source data is very very hard." Well, that's just fundamentally misrepresenting hashing, as there is no relationship back, AND for any input domain that is larger than the hash domain (which is most applications) it is utterly impossible to even implement the "guess and check" to find the original source. (Assuming you actually care about finding the true original.)
Maybe you think it's no big deal, however, there are very practical reasons why this matters a great deal. Let's consider the possibility that you have an ISO for a Linux distribution, and you want to check the SHA256 for the ISO to make sure it hasn't been tampered with. You will notice that your SHA256 hash is only 256 bits long, which means it can't UNIQUELY map more than 256 bits of data. (Also interesting: because it's a cryptographic hash, it's almost certain to have multiple collisions within just the 256-bit space which it itself exists within, meaning its ability to do a unique mapping is only for spaces much smaller.) Which means any amount of data over 256 bits has to start doubling up on positions within the 256-bit address space. (How do you uniquely represent the 257th bit of data?) Your Linux ISO may be 9GB, or roughly 280 million times as much data as the 256-bit hash. That means it is mathematically possible to alter the ISO to include some malicious code, and then to pad the ISO in such a way as to produce the same hash. (This would be "HARD" to do against SHA256, but "possible.") Now, I've never seen this successfully done with SHA256, but I have seen it done with MD5. In fact, I saw it done in such a way that the bit count worked out to be the same. The alteration was only detectable by applying a different hashing function. (Or by doing a bit by bit comparison.)
The way you discussed the topic, though I know it's only meant to be illustrative of a clustering/programming problem, implies that there is some sort of bidirectional relationship between the hash and the original data. If the relationship was bidirectional, it wouldn't be a hash. It would be encryption, where you could reverse the process given the key, and where each unique input produces one very specific unique output. However, a hash produces an infinite number of collisions for every single specific hash address. Your code is simply iterating until it finds one, whether it's the original string or not. And if the allowed strings are 64 characters or longer, there are almost certainly multiple possible collisions. Though even barring that, it doesn't change the absence of a bidirectional relationship.
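The collision argument above can be seen on a small scale. No collision for full SHA-256 is publicly known, so the sketch below cuts the digest down to its first 16 bits purely to make collisions easy to find; that truncation and the names `short_hash` and `find_collision` are assumptions made for illustration.

```python
import hashlib


def short_hash(data: bytes, bits: int = 16) -> int:
    """First `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash "0", "1", "2", ... until two different inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = short_hash(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        counter += 1


a, b = find_collision()
print(a, b, hex(short_hash(a)))
```

With only 65,536 possible 16-bit values, the loop has to hit a repeat after at most 65,537 inputs. The same pigeonhole reasoning applies to the full 256-bit digest, just with a space far too large to search this way.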
Dude, your reply is too long and to be honest I didn't read it. I understand how hashes work. I guess I assumed a level of knowledge about hashes and how passwords are stored etc, as I have covered this subject several times before. So my bad for not being explicit enough for beginners.
But can it run Crysis? :)
Clearly Arm is the future
It's been the future since it's been able to run smartphones, i.e. 2 decades
He forgot to mention it can even have an HDD stack
It looks like a camping stove. Can it make some soup?
Also wtf is an ASIC miner and why can't we make our own?
ASIC stands for Application-specific integrated circuit. It basically means a special chip which does a specific job, so in this case it does the hashing algorithms, and does them fast. It is highly specialized. You could try making your own, or you could buy a rig. But they are expensive.
👍
Run Hashcat Benchmark on it!!!
Noob question: can this work to mine a crypto?
A hand calculator and a pencil can be used to mine crypto.
I don't really see the point of this, as you can get an Nvidia card that has 10000 cores and 320 tensor cores for about USD 1800.
Better question would be, does AMD have LITERALLY ANY ANSWER to this obvious gap in their offerings? I mean, they are nvidia's chief competitor.
They easily could. They could slap a HyperTransport bus together - already their on-chip bus and long proven in Broadcom switches...modularize it so you can talk to it direct or break it into PCIe lanes accessed via NVME...or hell invent NVME backplane native, why not. It's not like an SSD cares what core/CPU it's talking to.
As usual, great presentation. What about mining cryptocurrency, i.e. Bitcoin, Litecoin, Monero, etc.?
Monero yes, it is deliberately made easier to mine on CPU and very hard on GPU/ASIC. That's to avoid creating e-waste.
@@crhu319 In light of Monero do you know if there is code available to say activate each node in the Jetson Nano cluster to mine Monero and have each node forward hash rates to the master and then the master forwards the combined hash rates to the mining pool?
Additionally, since the Nano does have CUDA cores there is source code for XMRIG to activate the GPU in order to utilize not only the CPU but GPU cores on the device.
Great video Gary! But does it mine Bitcoin? LOL
Watch my Jackery 1000 review for the answer.
@@GaryExplains Self promoting answer, very well played mate! :-P
But can it run Windows? owo
No. Plus who has ever heard of a supercomputer running Windows!!! 🤣
@@GaryExplains 🤣🤣
Why the hell power it by USB-C ? Power over Ethernet can handle that wattage. So sick of unnecessary cables.
Data centers are all PoE not USB-C.
I don't have any PoE equipment at home and I guess that isn't a common thing in many people's houses. If you go PoE only then you restrict the potential market.
@@GaryExplains every PoE device I've seen also has an alternate power option like a barrel connector. It would be nice if PoE was more commonly included as an option alongside USB powered devices.
How would you go about mining Bitcoin?
@Michael Gee mining is cancer mate. Go eat some magic internet money.
JETSON NANO *
Eh?
65Watts at 5VDC ? That's 13 Amps. In a USB cable ? Is that burning plastic I smell?
No. USB PD doesn't deliver that at 5V. It negotiates a higher voltage, up to 20V.
It's a PD type C power supply. I use a 90 watt PD power supply. But you need to look at the specs. There is a wiki for the Mate that will point you to what they recommend.
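The arithmetic behind this exchange, as a small sketch. It only applies I = P / V to the 65 W figure from the question, using the standard USB PD fixed voltage levels of 5, 9, 15 and 20 V. Which level a given supply and the Mate actually negotiate is not stated in the thread.

```python
def current_amps(watts, volts):
    """Current drawn for a given power at a given voltage (I = P / V)."""
    return watts / volts


# Standard USB Power Delivery fixed voltage levels.
PD_FIXED_VOLTAGES = (5, 9, 15, 20)

currents = {volts: current_amps(65, volts) for volts in PD_FIXED_VOLTAGES}

for volts, amps in currents.items():
    print(f"65 W at {volts} V -> {amps:.2f} A")
```

At 5 V the 65 W would indeed need 13 A, but at 20 V it is 3.25 A, which a USB-C cable can carry.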
Hi,
I got the cluster mini, thank you so much for a nice product.
I'm planning to have 2 Jetson Nanos, 1 as master and the other as worker.
I followed the wiki guide and everything executed, but I want to run a few programs on both Nanos and see the performance.
I saw the video at ruclips.net/video/nWzcEUj0OHc/видео.html
I need info on how he runs on specific nodes, combines all the nodes, etc. Please help me in this regard, thanks.