Huge props to the team on this one; this can't have been an easy video to plan/film and keep entertaining. It's extremely exciting and awesome technology but very difficult to show. Good job!
FWIW, it looks like someone messed up the links in the description. The link to the Intel Xeon goes to the website for the Crucial RAM. Everything else seems to be correct though. Neat video! I will literally never be able to use one of these, but it's cool to see how it's done.
13:29 Are you telling me LMG has been around for as long as it has, with as many employees as it has, with as many servers as it has... and you haven't been using AD up to this point...?
Dude, I was just about to ask the same question. And as well, they just NOW started a virtualisation cluster, huuuuh? I work in IT consulting, and at a certain size, every one of our customers has a virtualisation cluster.
At a certain scale it's not worth the overhead/cost, and if I'm being honest, if you have a greenfield estate I would just bang everyone on Azure AD and sidestep on-prem AD. My current company is around 800 users with a decent chunk of infra and no AD in sight. Currently looking to implement it for a product, as it will provide a better user experience for the workers using a certain bit of kit.
@@toon908 Remember, in those 2 weeks they also had to build, test and commission 3 more servers, deploy them and configure the cluster, then film the demos and *then* edit the video. Pretty quick actually.
For my work, the production environment runs on Solaris clusters for HA. It was interesting to see the differences between the configuration and management tools and interfaces. One thing I would note is that one still needs a good UPS system across all that hardware infrastructure, or all the HA failover won't mean anything in a power outage. I know you all have those big Eaton UPS systems there.
This is great: your PC blue-screens and Proxmox moves the VM to another server, where it will continue to blue-screen. It's perfect. VMware has had this functionality for a decade; they even have fault-tolerant VMs where the memory is constantly synced, giving an instant failover that is several hundred times faster.
Yeah, most of the time a BSOD has got nothing to do with hardware failure, and thus this entire stupidly expensive sponsor/PR-money video is useless as a method of avoiding blue screens.
LTT never ceases to amaze 😂 they have over 100 employees, 100G networking, multiple sites, but have yet to implement basic enterprise infrastructure like AD!
I wouldn’t even bother with on-prem AD at this point, they seem to operate already mostly in the cloud and with SaaS apps. Would just be easier to point local servers and authentication to a cloud service like EntraID, Okta, etc.
I'll be using my first commenter privilege today. Because of LTT I'm now in ECE engineering (the closest branch I could find to work on PCs) and plan to do VLSI, and hopefully I'll end up in chip design on either the GPU or CPU side. I do watch the main channel but WAN is my jam. (P.S. please release WAN on Spotify earlier, cause 2 days is too long.) I love the WAN Show too damn much. Much love! Sumanth
One gotcha to watch out for: it's not advisable to mix Intel and AMD x86-64 machines in the same Proxmox cluster because live migration is likely to fail, especially if you set the CPU type to "host". You might get live migration to work if you set the lowest common denominator of CPU flags of all the machines in the cluster, but even that isn't guaranteed. Doing some live migration testing recently in such a mixed cluster showed that you can actually get your VMs' CPU threads to soft lock during/after a live migration between AMD and Intel cluster nodes. This doesn't just cause the migrated VM to hang but might actually hang other VMs too if they shared that node's threads between VMs. In an ideal world, all your nodes should be from the same chip manufacturer *and* the same generation. Do a lot of live migration testing and let some VMs settle for 30 mins after migration while running some typical test tasks: crashes can be seen 20-30 mins *after* migration!
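For reference, here's a minimal sketch of pinning every VM in a mixed cluster to a conservative baseline CPU model via Proxmox's `qm` CLI. The VM IDs are placeholders, and the model name is an assumption that varies by Proxmox VE version; check what `qm config <vmid>` or the GUI offers on your install.

```python
import subprocess

VMIDS = [100, 101, 102]         # hypothetical VM IDs in the cluster
BASELINE_CPU = "x86-64-v2-AES"  # assumed generic model; avoids vendor-specific CPU flags

for vmid in VMIDS:
    # Takes effect on the next VM start; a running guest keeps its current model.
    subprocess.run(["qm", "set", str(vmid), "--cpu", BASELINE_CPU], check=True)
```

Even with a baseline model set, do the soak-style migration testing described above before trusting a mixed Intel/AMD cluster.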
Saying DRBD is open source is *technically* true, but if you have ever had the displeasure of trying to actually build their code, you will find out just how far you can stretch the term "open source".
@@FlyboyHelosim "Where is the documentation!?" Right here. "Where is the documentation that helps me make any sense of the documentation!?" That was me trying to figure out how to make some kernel-level changes so that the touchscreen on my laptop didn't permanently turn off whenever the laptop went to sleep. I had found some forum post that said "make these changes" but didn't say where I needed to make those changes. I tried looking through the documentation, and I felt like I needed a master's degree in Linux to understand any of it. I gave up and reinstalled Windows so I could use the working driver.
@@dekkonot sure, but sometimes getting something to compile is like trying to trial and error summoning a demon with black magic. It's esoteric, the error messages make no sense, and no one is willing to write the programming version of the freaking Ars Goetia so I can do it!
I have been running Proxmox for a few years. I run primarily older servers. I have a few Dell servers that are pre 2014. NAS, Plex, DNS, Email, AD, and a few home automation programs that I wrote. I also have a PBS deployed that handles the backups and distributes them to my offsite backup server. I do need to convert a huge container to a VM soon. But overall I can say that I LOVE how flexible and simple Proxmox was to set up and build on. Really simplified all of my tinkering.
8:40 "If you have a server that doesn't have IPMI I don't even know if that's a server, really" And that's why I've gone to the effort of adding it to my servers. Before I got my hands on it, it was a used Dell optiplex. Then I stuffed pikvm into it and it became a server. It really is the best to be able to adjust bios settings and install different operating systems without having to get on a plane and go to the machine.
@@Frank-li8uj They are potentially also affected by the same issues, though it's not confirmed yet from what I've heard (or not). It depends on whether the chips were made at the same fab and so on...
As someone who has designed and built plenty of Hyper-V clusters and dabbled with VMware and vSphere... This is extremely cool, just because they are allowing you to build an enterprise-level hosting environment for free.
I love this! I am really enjoying toying with VMs and server infrastructure in my spare time. While I won't be able to deploy something like this at home at the moment, it was still very entertaining!
Probably RAM or power supply. Check your reliability history to try to diagnose what is causing it; it will tell you whether it was a hardware issue and give you some info that might help diagnose the source.
Fault tolerance in Cloud Computing is basically the same thing. We just use multiple servers which we don't own and have no physical access to. We rent them in a way which reduces the costs a lot. Seeing this being done on physical hardware was a great experience and a good demonstration of how it is done. Knowing the theory behind it and seeing it happen was fun. It is kind of uninteresting when you use an interface to do it and have no contact with the actual hardware. Gonna do this one day when I create my own server.
Live migration is super cool. A while back I tried it on a small 2-host Proxmox cluster and I was just AMAZED to see it migrate a VM from one host to another with only a few hundred ms of actual downtime.
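If you want to script the same experiment, a minimal sketch using Proxmox's `qm` CLI follows; the VM ID and target node name are placeholders.

```python
import subprocess

VMID = 100            # hypothetical VM ID
TARGET = "pve-node2"  # hypothetical target node in the cluster

# --online keeps the guest running: RAM is streamed to the target and the VM
# only pauses for the final dirty-page sync, which is the sub-second "blip".
subprocess.run(["qm", "migrate", str(VMID), TARGET, "--online"], check=True)

# Afterwards, placement can be checked cluster-wide, e.g. with
# `pvesh get /cluster/resources --type vm` or in the web UI.
```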
Just wanted to say I love the way Jake describes stuff. I have almost no idea about any of this stuff, but listening to Jake talk about it, I just kind of 'get the idea' enough to follow along. Sometimes hearing people talk about super technical stuff outside of my sphere, I get extremely lost very quickly. Not saying I understood everything, but I understood *enough* to follow along with the video. Y'know?
Sapphire Rapids is pretty reliable since it's based on the old Alder Lake microarchitecture. Its performance is inferior to EPYC, but it's easier to get your hands on entry-level Xeons.
Really cool video. Damn, those are some really cool features. I love this kind of stuff. Even using Docker that thing works. I have no use for any of this, but it does give a perspective on how far we have come in regards to IT infra. This stuff is usually a nightmare.
@@nicesmile3125 nah, they did a test spot on the back of his head before bleaching the rest. So this was maybe the day before he bleached the rest of it
I think it's very cool that I learned all of this in some college classes and literally got a degree for putting this kind of technology to use. I appreciate the refresher course Linus!
As a rule of thumb, any component that is a few years old is more or less absolutely reliable. I've still got a system running on a Ryzen 1700; that was solid when it released, let alone 5 or however many years on from now.
It's not the age of the component, it's the run time. Some people will only notice the degradation in their years-old Intel 14th gen CPUs in a few years, if they only use them lightly and don't update their BIOS.
I built this setup with 10-year-old servers, 1 GbE between them, and Proxmox and LINSTOR a few years ago. This served the production traffic for a not-so-small professional web hosting scenario. It needed some fine-tuning, but LINSTOR works quite well even with not-so-fast bandwidth.
15:30 You guys have to try out Kubernetes (RKE2, for example, for an easy FOSS distro to use)! Seeing containers being the slow option for high availability is hurting my soul, guys. Set it up with a cloud-native CSI like Longhorn or Rook-Ceph and watch even stateful workloads switch over crazy seamlessly. You can even do load-balanced workloads if you want to be even more crazy about it (using ingress controllers, which are just proxies; service meshes, which are super-advanced proxies; or LoadBalancer-type services, which normally operate at the network level instead, meaning fewer features but less overhead too). Again please, this is such a cool topic; I loved the video and want to see more!
That's cool to see you playing with that. But one thing that you failed to mention is that most services we use are running a similar process. But it's a great video
"all at a modest 300W TDP"? Are you freaking kidding me? C'mon guys, yes it's more efficient then a bloody 14900KS, but this thing is designed to run 24/7, you can get much more efficient 32 Core Processors. Just not from Intel lol.
@@insu_na This lmao. AMD's server CPUs were dog trash early on. They're just now getting to the point where you could consider them for your next deployment. AMD is legendarily awful in the server space.
@@Pipsispite They've been undefeated in performance for many years now. Sure idle wattage may not be great but datacenters don't leave their hardware idle most of the time
Really cool that this has become open source now. I remember setting up a lab at home with a VMware HA cluster with two ESXi nodes and an Openfiler NAS with iSCSI for shared storage in 2009. Not the most reliable of solutions, but it was free (software) and it worked. 😅
You know what I love? The business model where the software is free for home enthusiasts but funded by sales of commercial licenses
This model always makes me happy. Makes me think of shareware.
It also speeds up training staff to use it, since they can just tinker with it privately if they want, outside of the job.
@@512TheWolf512 Open source with an optional official support contract has always seemed ideal to me. You get the benefit of multiple sources validating code, and the ability to pay for help in a disaster. The official provider can't stop people from making their own guides, so if there is enough documentation online, you're smart enough, and you have enough time (the last one is basically never the case in a corporate environment), then you're golden. The owners get the added bonus of needing fewer staff, thanks to the community forking, fixing and updating code for you.
@@michallv Reminds me of WinRAR. Never once bought it for myself at home, but back when I was in high school, I remember talking to IT people and them saying that the school district needed a quote for a price to buy WinRAR.
It's also great for businesses that want to test-run software before committing to the full price. Companies will do everything possible to stick to "if it ain't broke, don't fix it", even when it means having to deal with extremely outdated and annoying (but technically functional) software, but if they can test it for free before spending big bucks to upgrade, they are more likely to do it.
„PC that can‘t fail.“ … „sponsored by Intel“. Gold.
... with Threadripper 3960X in it.
Can I use remote desktop for daily use in uni, via laptop in class and PC in hostel room?
@@vvmbt German spotted
@@cieknie Intel Xeon Platinum 8562Y+
@@Pommezul In the Netherlands we use opening and closing quotation marks, which... I hate, but... these are just evil.
Idk if Linus or the staff will see this, but I had an idea for a video I thought would be useful (especially for myself). Have you thought about a video on PC maintenance? Idk if it has already been done, but something like "Here's what you should be doing for your PC health every month, 6 months, year, etc." Cleaning fans, reapplying thermal paste, updating BIOS/drivers, checking stressed connections, etc.
@qwebb911 Upvoted this. PC and Laptop maintenance.
Not broken? Don't touch. Sneezing? Clean.
might be worth posting on the forums
I was an adjunct professor teaching a class on microprocessor architectures. One of the students asked me about how to fix her machine that would run for a while and then give her a BSOD. (Blue Screen Of Death).
I told her "I bet you have a cat." She was quite surprised, then asked "How did you know?"
It was time for her to clean the insides of her PC, especially the CPU heatsink.
She was expecting a software issue...
Should add in a section about what to do before and after any shipping if you have to move.
I've been running an HA cluster at home for a year or two now. The downside is Proxmox doesn't understand "Hey! We have a power outage and there's only 5 minutes left of the UPS." It really doesn't shut down nicely: the cluster will keep migrating VMs as you try to do a clean power-off in the dark. For server-down maintenance, it's fantastic though.
You can with NUT
Interesting use case. I agree re NUT.
Yes, because contrary to what's stated in this video, the purpose of this tech is not to help you with hardware failures; it's for load balancing. It won't do shit to keep your system running. It's useless for most people, who do not run a datacenter.
NUT is your best friend. I have it running on a Pi hooked to a UPS; when it loses power it monitors runtime, and if it gets too low, it does a SAFE shutdown of all VMs and then the host.
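For anyone curious what that looks like, here's a rough sketch of such a runtime watchdog. It assumes NUT's standard `upsc` client and Proxmox's `qm` CLI; the UPS name, VM IDs and threshold are placeholders, and a real setup would normally let upsmon/upssched drive the shutdown rather than a hand-rolled loop.

```python
import subprocess
import time

UPS = "myups@localhost"   # hypothetical NUT UPS name
RUNTIME_FLOOR_S = 300     # start shutting down with ~5 minutes of battery left
VMIDS = [100, 101]        # hypothetical guest VM IDs on this host

def ups_var(name: str) -> str:
    # `upsc <ups> <variable>` prints a single value, e.g. battery.runtime in seconds.
    out = subprocess.run(["upsc", UPS, name], capture_output=True, text=True, check=True)
    return out.stdout.strip()

while True:
    on_battery = "OB" in ups_var("ups.status")        # "OB" = on battery
    runtime = int(float(ups_var("battery.runtime")))  # estimated seconds remaining
    if on_battery and runtime < RUNTIME_FLOOR_S:
        for vmid in VMIDS:
            # Graceful guest shutdown (relies on the QEMU guest agent or ACPI).
            subprocess.run(["qm", "shutdown", str(vmid)], check=True)
        subprocess.run(["shutdown", "-h", "now"], check=True)  # then the host itself
        break
    time.sleep(30)
```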
What about adding bigger batteries to your UPS?
0:01 That chair needs to be replaced before it sheds all over the office. Faux leather gets EVERYWHERE when it starts flaking off and once it starts flaking, it accelerates fast.
Womp womp who cares just vacuum it
@@PneumaticFrog I was going to denigrate for womp womping, but the office is more than likely cleaned and vacuumed every day.
Dudes making computers that aren't supposed to fail using Intel who has an issue with their newest chips failing and you're worried about the chair?
@@afterglow-podcastXEON is a different process and design philosophy
@@PheonixRise666 if I go to a restaurant and get food poisoning from a burger I'm not going back for the chicken sandwich.
This was my senior thesis for university!
I designed a distributed fault coincidence avoidance solution using Proxmox VE with DRBD as VM backing storage.
It bounced the virtual machine across the machine cluster randomly to reduce the odds of a fault coinciding with the critical process.
It technically outperforms vSphere FT (but it is not technically a full fault-tolerance solution, so it's not necessarily comparable).
i have no idea on what you just said, but it sounds cool af
That's so cool. Can I get the link to your paper or your LinkedIn? Lmao this seems so random, but I am also a graduate searching for good thesis topics to study.
@@voldyriddle3337 Sammmme
You mean you played around with redundant bits?
"pc that can't fail, courtesy of intel"? lmao, given their recent issues
100% Intel are doing damage control on their fatally and terminally flawed CPUs from 2023-2024
hey, there’s a reason it’s a xeon and not their normal core series
He aint wrong lol
@@Havocpsi who do you mean by "he"? linus?
Can’t believe there isn’t a jump to TechnologyConnections saying “ through the magic of buying two of them”
fr? He's fan of technologyConnections?
@@seen-bc9eq Who isn't?
@@seen-bc9eq He is, Luke ESPECIALLY is
That or Cathode Ray Dude's two of them cats image
That's my favourite way to say redundancy.
The thing about Docker containers that can resolve the issue that Jake mentioned:
A: run a virtual machine just to host these Docker containers, and that VM will migrate around the hosts as needed.
B: run that container in Kubernetes. Then you can configure load balancing, scaling and other cool features (see the sketch below).
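As an illustration of option B, here's a minimal sketch using the official Kubernetes Python client; the workload name and image are placeholders, and it assumes a working kubeconfig. With more than one replica, a dead node just means the scheduler recreates its pod somewhere else, so nothing needs to be live-migrated.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the cluster

app = "dns-cache"  # hypothetical workload name; the image below is a placeholder too
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name=app),
    spec=client.V1DeploymentSpec(
        replicas=2,  # two copies: if a node dies, its pod is recreated elsewhere
        selector=client.V1LabelSelector(match_labels={"app": app}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": app}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name=app, image="example/dns-cache:latest")],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```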
... Kata Containers?
@@IngwiePhoenix_nb Sorry, I misspelled Kubernates. You can Google it as k8s or a lightweight version is k3s.
Yeah, make VMs for Kubernetes cluster and then let Kubernetes move pods between nodes.
@@gatisluck That wasn't a Docker container, it was LXC. You can see that he has a separate VM for Docker
Came here to make the Kubernetes comment. If you run everything in containers, you get all this shit for free with very little effort, especially if you run managed k8s like Rancher or one of the cloud managed k8s services.
I just wanted to comment to say that Jake you’re looking good, my dude! You mentioned before about losing weight and it’s clear you’ve lost some more :) keep up the amazing work!
why you need Linus ?
A machine that can't fail, sponsored by Intel, who currently have countless CPU's failing in businesses and servers across the planet? Amazing timing.
Xeon hasn't, however, been a part of that. In fact, more Threadrippers die on a daily basis than that lineup. But don't let that stop your reality.
Hi Harry!
Ayyy, you're the guy with the Portal animations!
@@themightyredemptionYeah I'm gonna need a source better than "trust me bro" for that
Didnt expect to find you here, nice
A Video about reliability sponsored by INTEL… huh
Well, it's a Xeon, not an i9 or i7 lol hahah
For most of the time Intel was more reliable, and still is, when calculated per capita.
"hold my ring bus"
-Intel
@@nk70 It is Xeons. Far, far, far more reliable than any consumer chip.
Having spent as much time as I have in my career troubleshooting DRBD issues, getting calls in the middle of the night about dreaded split brain, etc. I would gladly trade some level of performance in order to not use DRBD.
Realistically you should have a cluster for storage and a cluster for compute, and therefore not have to worry about using something like DRBD to keep things in sync. With that being said, it's nice to finally see LMG moving closer to enterprise level infrastructure!
Could still see New New New New Whonic there tho :P
Agreed tho, feels good to see them not live on jank where it matters o.o
I was tasked with setting up a redundant storage cluster for an existing VM compute cluster.
Literally got stuck for months going back and forth with DRBD support trying to get it working. They have a small team and their support is good. But I seemingly managed to stumble over every bug and problem with their software just setting it up. Gave up in the end and decided to try with Microsoft's hypervisor...
I was already planning a very expensive upgrade to my home lab that will take a while to save up for and you show me this really cool stuff that I would also love to do. Please stop.
Three things:
1) I'm already running this at home with three OASLOA mini PCs, each node sporting an Intel N95 processor (4-core/4-thread), 16 GB of RAM, and a 512 GB NVMe SSD. Each system has dual GbE NICs, so I was able to use one of them for the clustering backend and present the other interface as the front end. (Each node, at the time, was only like $154.)
2) My 3-node Proxmox HA cluster was actually set up in December 2023, specifically with a Windows AD DC, DNS, and Pi-hole in mind, but I ended up changing to AdGuard Home after getting lots of DNS over-limit warnings/errors.
(Sidebar: I just migrated my Ubuntu VM from one node to another over GbE. It had to move/copy 10.8 GiB of RAM over, so that took the most time. Downtime was in the sub-300 ms range. Total time was about 160 seconds; some rough math on that is below.)
3) 100 Gbps isn't *that* expensive anymore. The most expensive part will likely be the switch, if you're using a switch. (There are lower-cost switches in terms of absolute price, but if you can and are willing to spend quite a bit more, you can end up with a much bigger switch that lets you put a LOT more systems on the 100 Gbps network than a cheaper switch with fewer ports overall. I run a 36-port 100 Gbps InfiniBand switch in my basement. I have, I think, either 6 or 7 systems hooked up to it right now, but I can hook up 29-30 more if I need to.) On a $/Gbps basis, 100 Gbps ends up being cheaper overall.
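A quick back-of-the-envelope check on that sidebar's migration numbers; the usable-throughput figures below are assumptions, not measurements.

```python
ram_gib = 10.8                    # RAM copied during the migration described above
gbe_bytes_per_s = 117e6           # assumed usable payload of ~1 GbE
hundred_gbe_bytes_per_s = 11.7e9  # assumed usable payload of ~100 GbE

ram_bytes = ram_gib * 2**30
print(f"GbE:     ~{ram_bytes / gbe_bytes_per_s:.0f} s just to copy the RAM once")  # ~99 s
print(f"100 GbE: ~{ram_bytes / hundred_gbe_bytes_per_s:.1f} s for the same copy")  # ~1 s
# ~99 s of bulk copy over GbE is consistent with the ~160 s total reported above
# (dirty pages get re-sent and there is fixed per-migration overhead); a faster
# link mostly eliminates the bulk-copy part and leaves that overhead behind.
```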
Intel and stability... interesting!
Only 13th and 14th gen i7s and i9s are failing; I've still got an i5-6400 running fine.
Ha!
@@gdguy57 iPad Pro
NICE
@@Batcave4956 trash
A video sponsored by intel about a PC that can't fail with its current reliability issues is so funny. I know these deals are sometimes a long process but wow intel did not gain much out of this particular video's sponsorship.
quite the opposite, the purpose is to clear intel's image of instability
@@toddhowardfr it clearly didn't work. Maybe if they didn't fuck over their clients with problems, it would have helped
@@alexrevenger234the video hasn’t been out for even an hour, wdym ?
When this video was planned, likely many months ago at least, that image wasn't about instability as it is now @@toddhowardfr
16:45 - Yes. Proxmox VE's High Availability full failover window is hard-coded at 120 seconds.
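For context on what that window applies to, here's a rough sketch of putting a VM under HA management with Proxmox's `ha-manager` CLI. The VM ID and group name are placeholders, and the option names are written from memory, so they may differ slightly between Proxmox VE versions.

```python
import subprocess

VMID = 100          # hypothetical VM ID
GROUP = "cluster1"  # hypothetical HA group name

def run(*args: str) -> None:
    subprocess.run(list(args), check=True)

# Register the VM as an HA resource so the cluster restarts it on another node
# once the failed node has been fenced.
run("ha-manager", "add", f"vm:{VMID}",
    "--state", "started",
    "--group", GROUP,
    "--max_restart", "1",    # restart attempts on the same node
    "--max_relocate", "1")   # relocation attempts to other nodes

# Inspect placement and state of all HA-managed resources.
run("ha-manager", "status")
```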
Good for two virtual desktops basically doing nothing. I wonder how it would hold up with a couple of hundred task workers / devs or heavy users doing their thing.
Not so sure.
@@glaucorodrigues6400 In reality, those couple of hundred devs would be spread across the hypervisor hosts already, so you'd only get "up to" 2 minutes of disruption on 1/4th, 1/5th, etc. of them (depending on how many hypervisor hosts you have, probably a lot more than 4 for a couple hundred heavy users).
Yeah that kinda made me laugh - "We made a thing that can migrate VMs real fast... But we wait 2 minutes to do it"... Kinda undermines the whole point of the system...
@@glaucorodrigues6400 If you have a couple hundred task workers on 4 machines... you are already expecting to have a bad day. While VM's are voodoo... thin provision at your own risk I guess.
@@almc8445 maybe not optimal, but still faster than having a user go to IT, having them figure out why the server is down, and having them restart it
I love it when Jake is showing us more and more network/server functionality. I would love to see Jake host videos like these!
You just made the project I have at work significantly easier, because you literally made a guide on how to do it. Thanks, guys!
Very cool tech. I’ve used HA (High Availability - this tech in the VMWare world) for years and it’s amazing. Even more amazing is FT (Fault Tolerance) where a complete mirror VM is already running on a second host and no “migrating” occurs. Packets just redirect and it’s very magical. Shared storage is usually involved though, so this storage sync method is very cool if you don’t have shared storage.
One nitpick though. Your thumbnail implied this would help with blue screens, but any issue at the software/OS level would not be prevented with this setup at all. If you add watchdog tools to reboot or migrate a VM if a heartbeat is lost, that’s completely different than the host becoming unavailable from a hardware issue and doesn’t need this “fancy” of a setup to just address a software glitch.
If this setup has a heartbeat watchdog feature, I don’t see it mentioned here, but again, software glitches like a blue screen are an entirely different problem with entirely different solutions than hardware failures.
My main takes from this video are:
1. Don't dye/bleach only the back of your head
2. Jake is looking good!
It was only a day, and Bell's fault, empowered by dBrand.
He bleached his whole head. Now he's just going back, but apparently forgot a spot 😂
@@powerdude_dk No, this is from before bleaching his whole hair. They did a test patch before the WAN show
@Gabu_ Great catch, I didn't notice his bleached spot but seeing takes this video to the next level.
@@Gabu_ yeah, found out it was before the whole bleach 👍
Finally hearing them discuss Active Directory and diskless systems coming up in future videos has me geeked. I've been requesting this since they started discussing VMs! Even tried to audition a script for PXE network booting, but they didn't take me up on the offer 😢
Can't wait to see their take on it though, been a long time coming.
From experience:
1) That is not enough RAM; I'd recommend at least twice as much. If you have memory-heavy workloads, I'd recommend even more.
2) You should really use a dedicated NIC for management and corosync.
3) If you use network storage, you want jumbo frames and therefore a dedicated network (a quick end-to-end check is sketched after this list).
4) If you have heavy guest traffic, you should definitely use a dedicated NIC for that as well.
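On point 3, one way to confirm jumbo frames actually work end to end is a large, non-fragmentable ping across the storage network. A small sketch, assuming Linux iputils ping and a 9000-byte MTU; the peer address is a placeholder.

```python
import subprocess

STORAGE_PEER = "10.10.10.2"  # hypothetical address on the dedicated storage network

# -M do  : forbid fragmentation, so an undersized MTU anywhere on the path fails loudly
# -s 8972: 9000-byte MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header
result = subprocess.run(
    ["ping", "-M", "do", "-s", "8972", "-c", "3", STORAGE_PEER],
    capture_output=True, text=True,
)
print("jumbo frames OK" if result.returncode == 0 else result.stdout + result.stderr)
```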
All of this is true.
For the memory: it depends on how much workload they have, but it doesn't seem like they have a lot of memory there.
As for the network: that setup is just ridiculous. You could put in, say, two 4-port 25 Gbps cards and another gigabit card for management and have a much more stable and robust setup for less money.
It's funny to watch LTT slowly work through the last 50 years of server & DC innovation, as they grow and run into every issue that originally spawned those innovations to begin with. Eventually they might actually arrive in the present best practices.
Do anything, turn it into content
The weird thing is that in nearly every LTT video there is a Windows machine sitting on the blue upgrading screen when you actually want to use it. I think this is a bigger problem, especially with the recent CrowdStrike failure taking out the world's computers.
and what's that? don't tell me it's some cloud bullshit
@@colinstu In a modern server system you typically don't have just redundant hardware or the ability to migrate or restart a single VM, since that only protects you against failure of the hardware. What you do instead is have multiple instances of the same software running across multiple pieces of hardware. In comparison to VM-based redundancy, that additionally protects you from random failures in the OS, libraries or application code, which VM migration does nothing against. It also unlocks the potential for scale-out autoscaling and zero-downtime upgrade strategies. On a single VM your best option for upgrades is to take a snapshot to return to, but the application will still be down for the length of the update. When you have multiple instances of the software, however, you can upgrade them at different times, preserving overall availability, or only update a few and check on feedback first (canary deployment), etc.
Of course details vary a bit depending on how your software works exactly.
@@autarchprinceps You know you could've said "clustering" or "CARP" in the first post and avoided all this pussyfooting around. Yeah, no crap. That also involves at least 3x the hardware (you usually can't just cluster between two machines). I don't see them spending that kind of money, especially not for a goofy media company.
I know LTT has never been very 'enterprise-y', but I find it hilarious that the example 'perfect use case' of a hyper-converged HA cluster of hosts is DNS and Active Directory DCs. Some of the most fault-tolerant systems, which are specifically designed in a way for multiple VMs to be used.
That's a pretty good point. However, it's surprisingly hard to get Linux or Windows to actually _use_ their secondary DNS servers!
Yeah, I wouldn't even necessarily include the DCs in the failover. Leaves more resources for other VMs. Just make sure you have enough of them and don't have them running on the same node. Linus seemed so excited for AD, I wonder why they did not implement it yet. It's not that hard compared to other stuff they did
@@timcappell71 He had said it in some video, but I don't quite remember it right; I think it was due to not having a dedicated IT team to manage it. Of course, they can manage it themselves, but it would take time away from other stuff.
@@michaelkreitzer1369 No need to. Have a VRRP/keepalived IP in front of multiple servers. DNS is literally the worst possible example. It has good caching and replication built in, and with it being (mostly) UDP you won't even have a TCP stream disrupted if it changes machines during a failover. It's the one thing that just works in a replicated fashion. (Also, I guess AD also offers some HA replicated setup...)
@@tmbchwldt3508 Ya, but these features are basically free. Once you have an N+# failover cluster, it's more work to not apply it to all VMs.
I have a Proxmox cluster at home that runs off of 3 mini PCs: AMD Ryzen 7 5800Us, 32 GB of RAM, 2.5 GbE, and shared storage via NFS on my NAS. Those mini PCs give me 48 vCores and 96 GB of RAM in my cluster, which is plenty for my home lab. It was also relatively cheap; I got it all running for about $1000 (NAS not included, which would be about $1000 more for my RAID 1 36 TB NAS). Also, because the Ryzen 7 5800Us are laptop chips, this thing sips power and has great efficiency.
Which minipcs did you use?
@@BoraHorzaGobuchul it looks like if I mention any links or brands I get deleted, but if you search "5800u mini PC dual LAN" you'll likely find something very similar to what I have.
I have a somewhat mini version of this hardware, in a way, with two Ryzen PRO 2300U laptops: 8 GB RAM, 512 GB drive, plus a NAS with 20 TB usable (1 Gbps network though). I'm now considering getting a third such laptop and configuring it into a cluster too... (currently only one of them has Proxmox and the other just Debian). But I'm not sure if it's worth it with such cheap machines, plus they could use a RAM upgrade perhaps. But hey, at least they were cheap (~$50) and have built-in battery backup...
@@BoraHorzaGobuchul probably Asus PN judging only by the limited specs.
I have the same but with 4 Intel N100s:
half the price, 1/4 the power usage, but also 1/4 the performance.
But still, it's plenty to run everything I need.
However, I'm gonna look at those Ryzen 9000-series laptop chips.
Live migration was already a thing 15 years ago with Xen etc. Cool that this still impresses people today :D
I mean I've known about live migration for years and it still impresses me :P
It's particularly cool seeing it done open source with custom HCI nodes, I am curious why they didn't go with OpenStack though.
@@almc8445 IIRC they have experience with Proxmox VE. Also Proxmox seems way easier to manage on this small scale, while still providing all necessary features.
@@almc8445 openstack always smelt like weird enterprise software - "better buy our mainframe with support license, if you want to properly use it..."
2004 actually, so 20 years by now. But yes, what I thought as well. It has gone as far as being removed, or at least deprecated, by many newer DCs/hypervisors/hyperscalers in one way or another, as its main flaw is that it only protects you against things failing below the VM level. If the OS or the app fails, you're still just as screwed. A more modern design, e.g. putting DNS in multiple autoscaling containers or even just VMs on a similar hardware cluster, would also allow you to 1. survive a failure at any level of the application or below, 2. scale and distribute load with demand, and 3. if you do it right, even be protected against failures due to changes/updates, as you can do various forms of rolling, blue-green, or more complex deployments, and still have both the older and newer version available to fall back to. With a single VM with live migration, your only option then would be to restore to a snapshot, assuming you have one recent enough. That still causes downtime and potentially significant data loss, which a modern system wouldn't need to risk.
That's also why the container doesn't have live migration capability, unlike the VM in their example. It assumes you have moved beyond that if you have brought containers into your architecture. Nothing theoretically prevents you from implementing the same things at the container level; it's just that nobody cares anymore.
It's also more resource-hungry to implement, as you need constant memory duplication synchronised with I/O activity if you really want it to work for failures and not just planned migrations. It wouldn't work seamlessly if your memory was restored to a state before you sent a network or disk request, after all, as that would cause potentially catastrophic inconsistencies.
@@almc8445 why would they prefer OpenStack? Proxmox is much more integrated and easier to learn/understand. OpenStack might have been a good option if they were doing this for clients.
I think yayIHaveAUserName mentioned this in the discussion forum you linked, but unplugging your server and having whatever VMs are on there go down actually means that those VMs have to be rebooted on other servers. So technically... those VMs are failing, they're just being rebooted automatically. This matters if you're running a program that does not save state before it crashes, because you might lose all your progress. It might also matter because you could mess up your filesystem if there were important write operations happening at the time of the crash. Very cool technology, but the video title is not 100% achieved, in my opinion.
Also, the clustering section goes really quickly over fault tolerance (i.e. "quorum"), but I don't feel like it was very well motivated other than just saying having two computers is not safe. Unless I misunderstood, the piece that seems like it's missing is that this clustering program seems to be trying to handle Byzantine fault tolerance, where a computer could have a malicious user giving false data, which, although out of scope for your video, is the reason 2 computers with one fault are not safe for knowing what the current, valid state of the system is. Otherwise, why not trust the other computer to have the correct data? Simple redundancy would let you trust the one working computer as the source of truth.
@aeroragesys: Quorum is a mechanism for deciding which part of a cluster is OK and which is not. If you have two servers and they lose connection to each other, neither knows whether "I'm the one" who is isolated or the other one is. So where do you put your workload? You cannot run your VM on both servers, because when the connection comes back you will have a split-brain situation. Imagine you are running a database and some data goes to one DB and some to the other. You cannot recover from that mess.
Back to quorum: when you have two nodes that don't see each other but you have a third witness node, the one that sees the witness assumes it is OK and runs the workload, and the one that doesn't see the witness assumes it is the one with the problem and does not run any workload.
If you have an odd number of nodes you just vote. The part of the cluster that has the majority runs the workload. If no one has a majority (e.g. the LAN switch is down and no server in the cluster sees any other server), no one runs the workload - the cluster is down. It's better to be down than to corrupt your VMs!
If you have an even number of nodes you need another node (a witness node) to break ties.
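To make the quorum rule above concrete, here is a minimal Python sketch of the majority vote. This is only an illustration of the idea, not Proxmox's actual corosync logic, and the cluster sizes are made up:

```python
# Minimal sketch of the majority-vote rule behind quorum.
# Illustration only -- not Proxmox/corosync code.

def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """A partition may run workloads only if it holds a strict majority."""
    return votes_seen > total_votes // 2

# 3-node cluster, network splits into {A, B} and {C}:
print(has_quorum(2, 3))  # True  -> the {A, B} side keeps running VMs
print(has_quorum(1, 3))  # False -> the isolated node C stops its workloads

# 2-node cluster with no witness: each side of a split holds 1 of 2 votes,
# neither has a strict majority, so nothing runs -- down beats split-brain.
print(has_quorum(1, 2))  # False on both sides
```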
So good to see clever solutions for self-hosted stuff being promoted here, especially since I did something similar as a hobby project with 3 HP t630s some time ago and couldn't believe how cool and seamless it is.
"beginners should start with Ceph" said no one ever. Ceph has gotten easier to setup and maintain over the years, but that doesn't make it less of a complex distributed application with surprisingly strict requirements on certain aspects of the setup like network stability. It's easy to get it to e.g. dead lock all your vms because mons failed over due to some packetloss in an overloaded switch. Been there, debugged that.
Sounds oof
Having run some large Ceph clusters (multi-PB), I usually find that as long as the network is stable and performant enough, Ceph is rock solid. Proxmox does a great job of deploying Ceph for you. DRBD, on the other hand, ran out of performance pretty quickly for me on some database servers.
Oh yes. And the upgrades. And rebalances. Perfect software for a curious beginner
I've had a three-node Ceph cluster at home for about 9 years. My only Linux experience before that was using ZFS for a file server. I've never used Proxmox or 45Drives' Houston UI, but it seems like they make Ceph a lot simpler. Just don't expect very many IOPS out of Ceph.
Why would they use old DRBD over the built-in Ceph?
Before everyone’s like “sponsored by Intel on a reliability video”, this isn’t blonde Linus, this was probably filmed like 30 years ago
As for that unplugging-the-PC bit, one of my old teammates was part of Live Migration at AWS, which is that "carry an instance over from server to server without you noticing a difference" feature.
Who cares? They are a big enough channel to eat the cost of not running the video given current events or to at least delay the video. If you screw over an entire 2 generations of customers you deserve to be clowned regardless of context.
This video was shot (at least partially) on July 18th or 19th... Linus had the "bald spot" caused by Bell testing the hair bleaching, and he said on the WAN Show that he had to go around like that for a couple of days...
So... Intel was still fine back then. Well, better than now at least.
@@MrSousuke87 Intel has been in this mess for a few months now. Longer than when that happened with Linus.
@@MrSousuke87 except people had been talking about this for more than a year, and Intel knew for 2 years
@watercannonscollaboration2281 Except you can see the blond patch in back, so we know exactly what day it was done: July 25th (one day under 3 weeks ago from this video) at some time before the WAN Show.
Huge props to the team on this one, this can't have been an easy video to plan/film and keep entertaining. It's extremely exciting and awesome technology but very difficult to show, Good Job!
The timing of this video is great, I recently got into self-hosting/home server things and was planning on looking into virtualization next!
FWIW, it looks like someone messed up the links in the description. The link to the Intel Xeon goes to the website for the Crucial RAM. Everything else seems to be correct though. Neat video! I will literally never be able to use one of these, but it's cool to see how it's done.
13:29 are you telling me LMG has been around for as long as it has, with as many employees as it has, with as many servers as it has... and you haven't been using AD up to this point...?
Dude, I was just about to ask the same question. And as well, they just NOW started a virtualisation cluster, huuuuh? I work in IT consulting, and at a certain size, every one of our customers has a virtualisation cluster.
My mind was damn blown too!
I'm not familiar with the Windows Server ecosystem - why is AD desirable other than for SSO?
At a certain scale it's not worth the overhead/cost, and if I'm being honest, if you have a greenfield estate I would just put everyone on Azure AD and sidestep on-prem AD.
My current company is around 800 users with a decent chunk of infra and no AD in sight. We're currently looking to implement it for a product, as it will provide a better user experience for the workers using a certain bit of kit.
... Yep, insane.
That blonde spot at 5:15 tells us when this was recorded.
Like telling the age of a tree by its rings
It's mad they posted a video from over 2 weeks ago, before the hair dye. Makes you wonder how it takes over 2 weeks to edit this video.
@@toon908 It's sponsored by Intel. Sponsors often give you a release window or an exact date, especially if it's a video like this one.
@@toon908 Remember in that 2 weeks they also had to build, test and commission 3 more servers, deploy them and configure the cluster, then film the demos and *then* edit the video. Pretty quick actually.
Jokes on you, Linus. I just failed my mom.
🤡
lol!
buddy nobody cares
@@QuicklyFreezeBout me too? 🥺
bros a pc
For my work, the production environment runs on Solaris clusters for HA. It was interesting to see the differences between the configuration and management tools and interfaces.
One thing I would note is that you still need a good UPS system across all that hardware infrastructure, or all the HA failover won't mean anything in a power outage. I know you all have those big Eaton UPS systems there.
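For anyone wiring this up at home, here is a rough sketch of the idea: poll the UPS and power the node off cleanly before the battery runs out. It assumes a NUT server is already configured and uses its `upsc` tool; the UPS name and the 20% threshold are placeholders, and in practice NUT's own upsmon handles this more robustly.

```python
# Rough sketch of a UPS watchdog (assumes NUT is installed and configured).
# "myups@localhost" and the 20% threshold are placeholders.
import subprocess
import time

UPS = "myups@localhost"
MIN_CHARGE = 20.0  # percent

def battery_charge() -> float:
    out = subprocess.run(["upsc", UPS, "battery.charge"],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

while True:
    if battery_charge() < MIN_CHARGE:
        # Power the node off before the UPS dies; the host's normal
        # shutdown sequence takes care of stopping the guests.
        subprocess.run(["shutdown", "-h", "now"], check=False)
        break
    time.sleep(30)
```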
This is great, your PC blue-screens and proxmox moves the VM to another server where it will continue to blue-screen. It's perfect.
VMware has had this functionality for a decade; they even have HA VMs where the memory is constantly synced, giving an instant failover that is several hundred times faster.
Yeah, most of the time a BSOD has nothing to do with hardware failure, and thus this entire stupidly expensive sponsor/PR-money video is useless as a method of avoiding blue screens.
not that i usually comment on people's appearance but wow, has jake lost weight? he's looking great, keep up the good work pal
LTT never ceases to amaze 😂 they have over 100 employees, 100G networking, multiple sites, but have yet to implement basic enterprise infrastructure like AD!
They probably had Google Workspace and Okta. You'd be surprised how far you can get with just those two.
Almost like they explain in the video why they don't have AD yet lol
I wouldn’t even bother with on-prem AD at this point, they seem to operate already mostly in the cloud and with SaaS apps. Would just be easier to point local servers and authentication to a cloud service like EntraID, Okta, etc.
I'll be using my first-commenter privilege today. Because of LTT I'm now in ECE engineering (the closest branch I could find to work on PCs) and plan to do VLSI, and hopefully I will end up in chip design on either the GPU or CPU side. I do watch the main channel, but WAN is my jam. (P.S. please release WAN on Spotify earlier, because 2 days is too long.) I love the WAN Show too damn much. Much love!
Sumanth
can i use remote desktop for daily use in uni `via laptop in class , pc in hostel room?
@@Idiana_Kami what kind of work load would you want that pc for? Unless you're in AI or some DS course a regular laptop should be capable enough
@@ingenious3259 They've been spamming the same comment across the entire comment section.
@@xdevs23 ah damn, well I hope they got their answer lol
@@Idiana_Kami can you please stop posting the same question all over the place? Why don't you search it up on the web like anybody else?
One gotcha to watch out for: it's not advisable to mix Intel and AMD x86-64 machines in the same Proxmox cluster because live migration is likely to fail, especially if you set the CPU type to "host". You might get live migration to work if you set the lowest common denominator of CPU flags of all the machines in the cluster, but even that isn't guaranteed.
Doing some live migration testing recently in such a mixed cluster showed that you can actually get your VMs' CPU threads to soft-lock during/after a live migration between AMD and Intel cluster nodes. This doesn't just cause the migrated VM to hang; it might actually hang other VMs too if they share the node's CPU threads. In an ideal world, all your nodes should be from the same chip manufacturer *and* the same generation. Do a lot of live migration testing and let the VMs settle for 30 minutes after migration while running some typical test tasks: crashes can show up 20-30 minutes *after* migration!
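A quick way to see what "lowest common denominator" you would be left with is to intersect the CPU flag sets reported by every node. A hedged Python sketch follows: the node hostnames are placeholders, passwordless SSH to each node is assumed, and as the soft-lock report above shows, a matching flag set still doesn't guarantee safe cross-vendor migration.

```python
# Sketch: intersect the x86 CPU flags reported by every node.
# Hostnames are placeholders; passwordless SSH is assumed.
import subprocess

NODES = ["pve1", "pve2", "pve3"]  # hypothetical node names

def node_flags(host: str) -> set[str]:
    out = subprocess.run(
        ["ssh", host, "grep -m1 '^flags' /proc/cpuinfo"],
        capture_output=True, text=True, check=True,
    )
    # /proc/cpuinfo line looks like: "flags : fpu vme de pse ..."
    return set(out.stdout.split(":", 1)[1].split())

common = set.intersection(*(node_flags(h) for h in NODES))
print(f"{len(common)} flags shared by all nodes")
print(sorted(common))
```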
I was actually planning on setting this up when I get home from work... funny timing you release a video about it today.
Saying DRBD is open source is *technically* true, but if you have ever had the displeasure of trying to actually build their code, you will find out just how far you can stretch the term "open source".
Yeah, it can get kind of annoying for some. "We provide the source, but you figure out how to compile it. Good luck"
Hey open source doesn't mean easy source
Linux in a nutshell.
@@FlyboyHelosim "Where is the documentation!?"
Right here
"Where is the documentation that helps me make any sense of the documentation!?
That was me trying to figure out how to make some kernel-level changes so the touchscreen on my laptop didn't permanently turn off whenever the laptop went to sleep. I had found some forum post that said "make these changes" but didn't say where I needed to make them. I tried looking through the documentation, and I felt like I needed a master's degree in Linux to understand any of it. I gave up and reinstalled Windows so I could use the working driver.
@@dekkonot sure, but sometimes getting something to compile is like trying to trial and error summoning a demon with black magic. It's esoteric, the error messages make no sense, and no one is willing to write the programming version of the freaking Ars Goetia so I can do it!
I didn't even realize that this was pre-bleach linus until he turned around and had the bleached spot
Take that Intel money while you can!
can i use remote desktop for daily use in uni `via laptop in class , pc in hostel room?
Umm, take what money? A sponsored video works by Intel paying money.
@@sevenofzach yes? That's what OP said, take the Intel money?
@@Idiana_Kami will you stop? Go search it up. You're unlikely to get an answer here anyway.
😅 sorry I totally misread OP, I'll go touch grass now @@kentacy69
This is what I used to do a couple of years ago when working for HP Enterprise as a consultant. So much fun.
I have been running Proxmox for a few years. I run primarily older servers. I have a few Dell servers that are pre 2014. NAS, Plex, DNS, Email, AD, and a few home automation programs that I wrote. I also have a PBS deployed that handles the backups and distributes them to my offsite backup server. I do need to convert a huge container to a VM soon. But overall I can say that I LOVE how flexible and simple Proxmox was to set up and build on. Really simplified all of my tinkering.
7:46 love that older coworker's sudden glare, and framed perfectly 😄
I think that was one of the administrators.
8:40 "If you have a server that doesn't have IPMI I don't even know if that's a server, really"
And that's why I've gone to the effort of adding it to my servers. Before I got my hands on it, it was a used Dell OptiPlex. Then I stuffed a PiKVM into it and it became a server. It really is the best to be able to adjust BIOS settings and install different operating systems without having to get on a plane and go to the machine.
Wait until you discover how insecure IPMI is... :)
"sponsored by Intel" the absolute irony of this
except it's a Xeon, not a 13th/14th-gen CPU
@@Frank-li8uj They are potentially also affected by the same issues, though from what I've heard it's not confirmed yet (or not).
It depends on whether the chips were made at the same fab and so on...
As someone who has designed and built plenty of Hyper-V clusters and dabbled with VMware and vSphere...
This is extremely cool, simply because they are letting you build an enterprise-level hosting environment for free.
I love this! I am really enjoying toying with VMs and server infrastructure in my spare time. While I won't be able to deploy something like this at home atm, it was still very entertaining!
Linus, I already have! The pc I built from scratch 3 years ago has never once crashed on me!
Same my friend, whats your build. Mines 4 years old, got dat i10300 cpu, dat rtx 2060, dat ram, dat non-reputabl psu
same here, my i7-3770 is running a little hot at the moment but it's been working great
Mine has but when you tinker with things such as overclocking and undervolting you're bound to have some crashes every now and then :P.
The only things that ever BSOD for me are bad Nvidia drivers and once a bad AMD chipset driver
0:10 i hope he eats you're lunch Linus
@@MichaelGetty I hope he eats you are lunch Linus
he obviously meant to say "i hope he eats; you're lunch, Linus"
Why is he lunch?
"Linus, I am going to eat you're... *Lunch* "
Lunch tech tips
What's stable today can become unstable tomorrow...
Thanks Intel 😂
I've been running my Proxmox Cluster "Galaxy" on 3x N100 Mini PCs for over a year now.
Proxmox HA saved my bacon soooo many times.
Great video guys! Great to see Proxmox getting some love.
Was really hoping this video would magically tell me how to fix my intermittent blue-screening; now THAT is something I'd buy Floatplane for.
Have you checked the event viewer? Sometimes it just straight up tells you what exactly faulted and caused a BSOD.
Lol I know right?
Probably RAM or power supply. Check your reliability history to try to diagnose what is causing it; it will tell you whether it was a hardware issue and give you some info that might help pinpoint the source.
If on Windows:
In the search bar: Control Panel
In Control Panel: go to Security and Maintenance -> Reliability Monitor
On top of the other recommendations, wipe your GPU driver and possibly chipset too. Both of those have caused BSODs for me in the past
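If digging through Event Viewer by hand gets old, here is a hedged sketch that pulls the usual crash-related System log entries via PowerShell's Get-WinEvent (Event ID 41 = Kernel-Power unexpected reboot, 1001 = BugCheck, 6008 = unexpected shutdown); run it on the affected machine.

```python
# Sketch: list recent BSOD/unexpected-shutdown events from the Windows System log.
import subprocess

ps = (
    "Get-WinEvent -FilterHashtable @{LogName='System'; Id=41,1001,6008} "
    "-MaxEvents 20 | Format-Table TimeCreated, Id, ProviderName -AutoSize"
)
result = subprocess.run(
    ["powershell", "-NoProfile", "-Command", ps],
    capture_output=True, text=True,
)
print(result.stdout or "No matching events found.")
```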
Linus acting like Michael from the office in this one 🤣
An 18-minute video uploaded 4 minutes ago? Damn, are you watching this at 10x speed?
0:42 you have a boss?
@@frillexsick Yeah fr, I thought he was the CEO because he is literally in the name of the company/YouTube channel(s), but I guess not
Linus hired a new ceo to replace himself. He still owns the company/is a shareholder.
@@MidshipRunabout2 HE STEPPED DOWN?
@@brunekxxx91 Yes he did, this happened like a year ago.
@@MidshipRunabout2 Oh uhm well, i don't watch the channel often
Fault tolerance in Cloud Computing is basically the same thing. We just use multiple servers which we don't own and have no physical access to. We rent them in a way which reduces the costs a lot. Seeing this being done on physical hardware was a great experience and a good demonstration of how it is done. Knowing the theory behind it and seeing it happen was fun. It is kind of uninteresting when you use an interface to do it and have no contact with the actual hardware. Gonna do this one day when I create my own server.
Live migration is super cool. A while back I tried it on a small 2-host Proxmox cluster, and I was just AMAZED to see it migrate a VM from one host to another with only a few hundred milliseconds of actual downtime.
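If you want to put a number on that blip yourself, one simple approach is to hammer a TCP port on the guest during the migration and record the longest gap between successful connects. A hedged sketch follows; the host and port are placeholders for whatever service the VM exposes.

```python
# Sketch: measure the longest service gap while a VM is being live-migrated.
import socket
import time

HOST, PORT = "192.168.1.50", 22   # hypothetical guest address / port
DURATION = 120                    # seconds to keep probing

last_ok = time.monotonic()
worst_gap = 0.0
end = last_ok + DURATION
while time.monotonic() < end:
    try:
        with socket.create_connection((HOST, PORT), timeout=0.25):
            now = time.monotonic()
            worst_gap = max(worst_gap, now - last_ok)
            last_ok = now
    except OSError:
        pass  # connection failed or timed out; keep probing
    time.sleep(0.05)

print(f"longest observed gap: {worst_gap * 1000:.0f} ms")
```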
1:47 for a moment I thought Linus was going bald but it’s just his dye
0:45 More Servers!
$100K worth of equipment to be able to fail over a DNS server.
Or...
...you could have two $100 DNS servers.
Just wanted to say I love the way Jake describes stuff. I have almost no idea about any of this stuff, but listening to Jake talk about it, I just kind of 'get the idea' enough to follow along. Sometimes hearing people talk about super technical stuff outside of my sphere, I get extremely lost very quickly.
Not saying I understood everything, but I understood *enough* to follow along with the video. Y'know?
Love this kind of video as a network/infrastructure technician ❤
The "more than one dell not more than one dad" joke was so bad that it was good... I spit up my coffee and needed a minuted to recover 🤣
Gotta appreciate they put some effort in it as well with the ball and all.
Video about blue screens
Sponsored by Intel 😂😂😂
Dead inside
time to install CrowdStrike too
I'm not going to lie, I winced when he shoved that pc off the table at 2:13
It probably fell onto a cushion, but yea same
I remember labbing up proxmox and ceph a few years ago. It sure has come a long way in a short time.
Love the continuing server content with Jake.
1:53 *Mutahar enters the chat*
"and you can too"
I can fail?
I mean yes, and we have a whole profession around trying to turn you off and on again when that happens.
Truly inspirational
Ah yes Intel, the company renowned for its reliability.
Sapphire Rapids is pretty reliable since it's based on the older Alder Lake microarchitecture; its performance is inferior to EPYC, but it's easier to get your hands on an entry-level Xeon.
It was believed so for years!
Just more proof that nothing is forever
Jake's work and knowledge of corporate networks is quite impressive at this point... they grow up so fast.
Really cool video. Damn, those are some really cool features. I love this kind of stuff. It even works with Docker. I have no use for any of this, but it does give some perspective on how far we have come with IT infra. This stuff is usually a nightmare.
Had nothing but slow performance and trouble with those Patriot drives. Also, considering the recent issues with Intel CPUs... that title is funny.
They're perfectly fine as a hypervisor boot/os drive though.
3:35 Funny how that 32 core CPU uses less power than Intel’s current lineup…
Oh.. the ironic one.
Seeing how Linus is not yet blonde in the video, we know they unironically made this.
No, this is him now, he just dyed his hair brown on top of the blond, you can see where he missed the blond on the back of his head. LOL
@@nicesmile3125 nah, they did a test spot on the back of his head before bleaching the rest. So this was maybe the day before he bleached the rest of it
I think it's very cool that I learned all of this in some college classes and literally got a degree for putting this kind of technology to use. I appreciate the refresher course Linus!
Good on Jake, he's looking good. Proud of your journey dude.
Filmed pre-bleaching, but post doing the patch of test bleaching.
As a rule of thumb, any component that is a few years old is more-or-less absolutely reliable.
I've still got a system running on a Ryzen 1700, and that was solid when it released, let alone 5 or however many years on from now.
My Unraid server is running on a first-gen i5-760, 14 years old and still going strong, ha
It's not the age of the component, it's the run time. Some people will only notice the degradation in their years-old Intel 14th-gen CPUs a few years from now if they only use them lightly and don't update their BIOS.
Wait 0:40 you have a boss.
He said before that he was stepping down from CEO to have more time
I built this setup a few years ago with 10-year-old servers, 1GbE between them, Proxmox and Linstor. It served the production traffic for a not-so-small professional webhosting operation. It needed some fine-tuning, but Linstor works quite well with not-so-fast bandwidth.
15:30 You guys have to try out Kubernetes (RKE2, for example, is an easy FOSS distro to use)! Seeing containers presented as the slow option for high availability is hurting my soul, guys.
Set it up with a cloud-native CSI like Longhorn or Rook-Ceph and watch even stateful workloads fail over crazy seamlessly. You can even do load-balanced workloads if you want to get even more crazy about it (using ingress controllers, which are just proxies; service meshes, which are super advanced proxies; or LoadBalancer-type services, which normally operate at the network level instead, meaning fewer features but less overhead too).
Again please, this is such a cool topic I loved the video and want to see more!
LMG does not run Active Directory Domain Services at a company with 100+ employees!? How do they deem this acceptable?
New meme 17:34
18:48 - me when i look at my vape after not using it for 10 seconds.
That's cool to see you playing with that. But one thing that you failed to mention is that most services we use are running a similar process. But it's a great video
Looking forward to trying out my scribedriver pen when it gets here 😃 patiently waiting for the release of the precision screwdriver!
Won't be surprised if one of the employees at LMG has a bottle of whiskey 🥃 in their drawer
Colton 😂
"all at a modest 300W TDP"? Are you freaking kidding me? C'mon guys, yes it's more efficient then a bloody 14900KS, but this thing is designed to run 24/7, you can get much more efficient 32 Core Processors. Just not from Intel lol.
you say that, and then I look at my EPYC 7402P pulling 200W at idle 😭
@@insu_na This lmao. AMDs server CPUs were dog thrash early on. They're just now getting to the point where you could consider them for your next deployment. AMD is legendarily awful in the server space.
@@Pipsispite ehh, I think Milan and Genoa are already pretty good.
@@Pipsispite They've been undefeated in performance for many years now. Sure, idle wattage may not be great, but datacenters don't leave their hardware idle most of the time.
@@Pipsispite So... if the EPYCs are so awful, then explain to me why AMD has been gaining market share in the server space for years now?
0:02 Simgot EW200!?
Yeah, it seems so. Wonder what DAC he uses.
Crazy that you noticed! Great value too!
Really cool that this has become open source now.
I remember setting up a lab with a VMware HA cluster at home, with two ESXi nodes and an Openfiler NAS with iSCSI for shared storage, back in 2009. Not the most reliable of solutions, but it was free (software) and it worked. 😅
Linus is strong for doing these videos considering the rough period he's had recently... but I love having him around. One of the cool guys on YouTube.