The reason "downsized" packet performance is important is that TCP-ACKs and other small packets exist organically. So packets-per-second is actually a relevant and important metric for router performance. With sufficiently large payloads, throughput is just a direct memory access benchmark because you're just copying stuff around and not doing that much "thinking."
DPDK has a way to handle this, but you're right, it's optimized for heavier throughput, with smaller packets seeing higher latency then you would normally see. (Set your icmp packet sizes larger and you'll see the latency problem disappear) Think about DPDK like more of a "pipe" connecting the two end devices together, it doesn't matter how much volume of fluid you put in there and it'll get there in a timely fashion, especially if there is heavy throughout on the system. DPDK works great for things like SCTP. 🤘
Properly efficient routing (and switching) is done WITHOUT copying. The hardware receives the frame into memory, and that's where it stays. Any forwarding is done by reference to that memory buffer. Copying is what kept linux and BSD networking so slow for so long.
@@jfbeam that’s kind of true but irrelevant because the Direct Memory Access thing I’m talking about is how the NIC gets the data from main memory via PCIe. en.m.wikipedia.org/wiki/Direct_memory_access
High-end networking geek here. Small packet performance is critical for core internet equipment. If you can’t send 14.4m 64 byte frames plus receive 14.4m 64 byte frames at the same time: your not doing 10 GbE. Bulk throughout is trivial and doesn’t require any fancy hardware: a mundane PC can easily do tens of Gbps with large packets.
That’s what i have been saying all along. 10 Gb/s the easy way is 812743 Mpps with 1500 byte MTU packages. 10 Gb/s the hard way is 14,88 Mpps with 64 bytes MTU packets. 10 Gb/s the realistic way is a realistic IMIX traffic profile.
Yep, with them open sourcing the necessary parts, proprietary solution starts to feel better than the vpp, due to compatibility with commonly used tools.
It's not so much "open source" as them not caring what you do with the SDK after your $40k check clears. (when almost everything is in a microcode blackbox, there's no secrets in the SDK.)
NXP really seems like a good partner. They were willing to open source, AND helped them to find an alternative(and risk loosing income from licensing fees).
@@jfbeam lol yeah sure, feel free to release as "opensource" or even just allow redistribution of the stuff from most vendor SDK and see how fast you get sued to oblivion for breaching NDA and the SDK's own license. If NXP has moved all the proprietary bs to a firmware/microcode and allow people to redistribute it, its a good thing.
It's not that the packet size needs to be downsized but the PPS that VPP can do. There are a lot of things that use small packets like DNS, etc and there is specifically a test called IMIX. It's not perfect but the idea is to test throughput using various packet sizes that mimic more of a real world solution. A lot of commercial routers can put up huge numbers with 1500 (and more with 9000) byte packets but even when you MTU is set that high you will find the average package size is much lower. It would be good to know the performance of the router with 64-byte packets (the lowest) as well as IMIX (or something else that is not the ideal max packet size). Again it doesn't matter what your MTU is set to, it's the average sized packet. Thinks like DNS or ACKs are going to be a lot of smaller packets.
Thank you for fighting through this. I'm really glad to hear you have fully open-source options on the table. CPU microcode being closed is not new, and I consider that just something that you have to put up with in this world (so far).
Not only am I excited about the results (which are solid. nicely done.), I'm so grateful you listened to the comments here and the other people following this project. Thank You for listening. I can't wait for availability of these routers. This is a fun project to follow and at the end of this rainbow is a useful and maintainable tool.
If you want the VPP interfaces to be visible in the kernel, you can use lcp plugin. Also using 2 workers for VPP could give you better performance, but there is the price of higher consumption...
Well I'm not getting anything done for the next 21 minutes. EDIT: I'm Glad NXP is allowing their binaries to be shared under the project's licensing. That's huge! Looking forward to seeing future parts of this project. :D
Even though I am from Brazil and consequently will not be able to buy it when it is finished because of our import tax, I am following this project closely. I am amazed to see it being built in public. Congratulations to everyone involved.
Choosing DPDK & VPP and open sourcing everything is what makes this project to me very interesting! I´m glad you don´t go with a HW specific solution. This will make future HW updates much easier. I also hope you will give a push to DPDK adoption for other open source projects, helping to improve home networking equipment performance. As a side effect this might help to save some energy and resources. Bravo!
😊 I'm glad everyone said it.. PPS vs Throughput.. As a former WISP owner that used Mikrotik at first 😂 I had my days of sadness with Tilera vs MIPS vs ARM!
Nice, I am used to only seeing DPDK and it's other friends in the data center as part of NSX-T, I never brought it up as I only ever saw it for specific NICs and not on any embedded stuff outside of something like a DPU. I think this would be a super cool thing to get into more common use. I think if you make the router OS able to be virtualized with the same feature set when paired with a supported NIC (it was mostly Intel and Broadcom last I checked) that could really open up home lab stuff for some cool things like "pocket universes" or just an easier way to play with OSPF/BGP with enough oomph behind it to make it fun for the lab. This could open a lot of SDN fun up.
Router performance is best measured in packets per second; not throughput. It is relatively easy to push 10Gbit with jumboframes (or even standard 1500 byte frames), however a router must be able to support other traffic types too. Gaming, Voice/Video calls, anything 'realtime' really - all rely on small packets for the majority of bytes transferred (To keep latency and delays down). For example an average voice call does 50 packets per second, each at about 200 bytes. Would you be able to use this routing method, at the higher speeds, when traffic with those characteristics is being routed? That may not be of interest when using the router as your consumer home router - but it would when using in a more enterprise environment.
I see two possible issues with this approach. First is the power usage and heat generation, since one of the cores is constantly at 100% even when your network does nothing at all. Second is possible latency increase, since when you poll instead of using interrupts then you don't respond to events immediately when they happen but on the next poll, so time between polls is your extra latency. This one may not be an issue, since 100% cpu suggests busywaiting without any sleep, but I would still like to see a test confirming if latency is not increased.
DPDK has no impacts on latency under real world circumstances. Many datacenters (ISPs) running DPDK applicances for DDoS mitigation. At my previous employeer we had a DPDK appliance to protect game server traffic from DDoS and maybe you know that but game server traffic is one of the most challenging things when it comes to latency. And guess, it worked like a charm, it used iBGP to route entire networks thru the appliance after being processed from the router before hitting the core switching stack.
@@BirknerAlexI'm sure ya right that latency isn't an issue. The cores only job is busy-waiting, when not already processing data currently anyway. It even saves on ISR entry/exit delays, so it should be rather faster than slower in my view. I'd still share the concern about the wasted energy tho. Sure those cores are more efficient than some beefy x86 Server CPU, but they're not "free", I mean that's why ARM also has power saving modes and dynamic frequency scaling for example. I also don't whine about one core on a 64 core server with a daily load overage of 40 being dedicated to burn 1% more energy in my rack at work, and that core will have plenty of real work to do as well. But let's be honest, any home router is going to spent 99% of it's time idling, or exchanging a few dozen packets a second processing background noise. I got a 6 core Xeon E-2146G in my server sipping like sub 10-12W System total with the 10G NIC (excluding SSDs and some other extensions/peripherals, don't have exact numbers in my head anymore rn), so this router core HW (without Wi-Fi and other stuff ofc) hopefully ain't gonna be sipping such numbers for way less performance, just because it's busy all day checking if maybe finally someone has a packet for DPDK. I'm very interested in this project, because hosting/running your Router/Firewall/Networking on your main server has plenty downsides, and I still run a separate AP for Wi-Fi ofc, but not if I'd be investing in yet another 24/7 energy hog 😅
Polling *can* have way less latency than interrupts. Interrupts are good for occasional events, but when you have a lot of them you have to do context switching for every interrupt separately, and finish performing the previous thing, before you can take on the next one. When polling, you can bunch multiple events together into an array/vector, and process them in one go. So the result is that you have worse latency in optimal conditions and low load, but better performance under heavy conditions. In Computer graphics the same kind of optimisation is used. Graphics drivers when receiving data/drawcalls used to wait for more data, to send a bigger chunk of data, and the graphics driver would need some algorithm to estimate how much to wait, and how often to send data to GPU; but with the introduction of Vulkan this task (along with other tasks) moved to the application/game. And a lot of other optimisations and tweaks that used to be done in the driver (Kernelmode) during the directx11/openGL days, are now done inside of a game (Usermode) thanks to Vulkan. Do you see the resemblance to where DPDK+VPP is going 😉?
@@sasjadevries Yep, I mentioned in one of the other comments that I worked with microcontrollers that had a similar batching interrupt system, where an SPI controller would send an interrupt when a message arrives but will not send any more interrupts for next messages until you empty the buffer, so you can batch process multiple messages on single interrupt without creating hundreds of context switches. That would be good to have on linux, but I gess it might not be so easy without hardware support.
@@hubertnnn VPP stands for Vector Packet Processing. So sending a vector of packages is the whole idea of that software package😉, they even put it in the name. I'm not a network expert, but I just got curious a few months ago, when someone in the comments mentioned VyOS, with the DPDK+FDIOVPP stack. That's when I looked into it... Basically the whole vectorisation is done in VPP, and their main selling points are that VPP runs 100% in userspace, that it's hardware agnostic, and deployment agnostic. I kinda like this approach for high throughput. So the low level interface (DPDK) is simpler and more predictable, by polling instead of handling interrupts. And the application level software (VPP) can pick the vector size that fits the polling rate.
Great video, thanks for covering DPDK and VPP in depth. I've spent many hours reading and always been frustrated with how developer-centric the documentation is. Really looking forward to getting hands on with your product! Really hope VyOS will pull finger and get VPP working too in mainline.
Consider automating your build using a continuous integration system of some sort (Github Actions, Azure, Jenkins, whatever). It's a bit of work now, but you'll thank yourself in the future when you have to merge patches and recompile everything.
Interesting proposal of the isolated single core for network. If it's always at 100%, how can you measure the % of utilization for real trafik in this core? And what is the impact in consumption of that solution?
First, I‘m very impressed by the performance! One question though, as one Core is now constantly pegged, in what way does this impact power consumption? Is there a notable difference between this solution and the old proprietary sdk when at idle or when routing?
It's pegged, but probably not actually using much power. I guess it's just busy-waiting on packets, which should be just some conditional branches and compares, nothing too crazy. Nevertheless, it's taking CPU time that could have been used elsewhere.
I'm fine with the compromise that the microcode be proprietary. I do think the sources should be available, or at least they should have them security-audited and the results published.
Pardon my ignorance, would it make any sense to replace the current Linux stack for dpdk + vpp for perf? Or would create a big hole in security in normal cliental pc without any real improvement because muticore bulk x86 CPUs?
Oh, you're GOOD. You blind them with SCIENCE! Any investors looking through any of these videos are GUARANTEED to glaze over within the first 30 seconds, close the clip and just go "oh, he clearly knows what he's talking about..."
You're saying that 100%ing that core only add like half a watt to the power draw. But doesn't that still prevent it from scaling down the clocks and even shifting the whole SoC into its low-power states? I mean, you can't enter s2idle and run the userspace at the same time, right?
According to the docs, for best performance, the CPU governor should be set, well, to performance, so I think that yes, it will prevent it from scaling down the clocks. That being said, according to VPP documentation, it's also possible to change between polling and interrupt mode depending on the load (of traffic). I'm afraid this part is quite new to me, so I'm trying to figure it out myself as well.
@tomazzaman Hmm, maybe for home applications the governor could only ramp up when the load gets high enough? Likewise the polling mode could also be enabled automatically when needed.
Hallo Tomaž. Miroslav 'ere. Would this move of device control lend itself to being able to edit MAC address easily? I am hypothetically talking of changing MAC to that of the one on ONT, so ISP has no clue that particular client is removing 20W power drain, issues with patency and potential copper burnout and all PC in the place being ruined by lightening strike (happened to LUG member here in London) associated with ONT. Use MAC of the ISP provided ONT on SFP? Can this be done from The TUX side of things? I am not GURU. Blagidaram!
I really like performance graphing with grafana/prometheus or whatever else. I assume vpp already has the capabilities for an external software to pull that info?
my previous answer was hidden, cause i linked to the vpp manual. Short answer: yes, yes there is. Long answer: they have a python package that can access a lot of statistics, it's really easy to use!
With DPDK running on 4 interfaces and leaving one interface at the kernel level, would it be possible to create an out-of-band management interface that would have a separate routing table and keep it available and reachable if any crash occurs in DPDK?
mm i like what a few ppl have mentioned in the comments here about testing with 10gb/s each direction, with the data being all tiny packets like you might see on a super busy network. 10gb/s in a solid data stream isn't the same as 10gb/s of all varieties of packet hammering the device?
This sounds amazing! I do wonder how it would compare to VPP with XDP (that is if the XDP plugin is in a working state). Either way this sounds like a great project and i am looking forward to upgrading my turris omnia. I have been waiting for a 10G upgrade path.
Bluesky only... Bruh. Haters gonna hate on every platform. You do you on any & all platforms that reach your audience the best. Don't let a few haters ruin it for other ppl. Love this project!
I'm not a networking god like some of the commenters here. I'm just a cybersecurity risk analyst. But I fully agree with the consensus that true performance tests are done at edge and corner case scenarios, because that's where mistakes can happen. That's the point at which just a little too much traffic can knock something over because a single malicious packet was able to get through somehow. I'm not saying you're prone to the mistakes that other companies have made - I'm just pointing out that with everything in the world being networked now, things need to be designed with cybersecurity and resiliency first and foremost. The enemy is always getting better. Just testing performance for performance's sake is no longer a valid benchmark. It needs to be either standardized in some way (not possible with most of your testing) or it needs to show both extremes to be scientifically rigorous.
I wonder why DPDK chose to constantly poll the interface vs asking the kernel to notify it when a packet arrives. And then continue in userspace. Aka use the kernel only for the raw packet notification.
@@triffid0hunter Doesn't polling need context switching too? I mean each time you ask the kernel "is there a packet yet?" you enter momentarily kernel space for the kernel response. I could be totally wrong though.
Yeah, this was a bigger issue (10Gbit) in the pre-Nehalem days (talking DC side where you have more powerful processors, there was a day when there were struggles). Just like entropy exhaustion was also a huge issue. Advancements have made those things less of an issue.
1ghz is not that far from the today's standard 3ghz. The absolute recrod maximum today is 5ghz. I would understand a 100x difference, but everyone seems to forget how fast current tech is. And this is forgotten because of bad software.
@tomazzaman I would rather take the proprietary option as this is something I don't think I can get as a regular consumer and it seems to provide better integration with the hardware which could lead to better performance. Otherwise I could simply get a Raspberry Pi or something similar such as the Banana Pi R4 and build my own router from scratch.
UEFI and ACPI which are used in regular PC hardware instead of device tree are the worse option technically. A superior way would be to have a device tree provided by the BIOS and there would be no need for UEFI or ACPI after that.
This has been debated over times by hardware vendors and OS providers. Their conclusion was that DTs don't offer the required to support consumer computing platforms. On servers is more about using the already existing frameworks which are tailored to UEFI and ACPI. DTs are better suited to embedded platforms with lots of hardwired hardware and almost no configurability as everything is mostly and already confined in the SoC. As the case of your phone or your router.
@@hyoenmadan Device trees could provide everything that UEFI and ACPI can do but it would require support from OS vendors, namely Microsoft. And since Microsoft has always been betting on backwards compatibility and vendor lock-in, it makes zero financial sense for Microsoft to support switching from UEFI to device trees. As a result, device tree has no future in PC hardware as long as Microsoft Windows has such a big market share.
Adds about half a watt to total power consumption. Negligible, but once we have the cases manufactured, we'll of course run the proper tests to make sure it really doesn't impact anything.
@tomazzaman does the software only work with polling? I haven't done much with networking on embedded linux or device drivers myself but it seems not ideal to constantly poll for data (but i have 0 experience with this so who am i to judge)
What kind of performance difference is there between VPP and native linux packet forwarding? What happens performance wise if you switch the network driver to polling mode in linux?
My guess would be that at zero to very little traffic, Kernel networking with interrupts would be slightly more efficient/faster, but at any level of traffic above that VPP would get the advantage. That because, even with no traffic at all, VPP would still be polling NICs to know if there's something new while the kernel would be doing other things waiting for NICs to tell it something happened. With more traffic instead things wouldn't change much for VPP, but a lot for the kernel which would be receiving lots of interrupts and having to constantly stop what it was doing to listen to what NICs have to say.
@@hubertnnn That could be an option (or you could install your own OS). But I think it's likely a router will always in a situation where VPP has the advantage
I ran an EPC (vEPC) LTE core using DPDK. It was "black box" aside from the usual sysadmin/sysop stuff. It's cool how fast you can push your hardware. Vendor had some weird DPE (data plane) bug but that's aside from the point.
I have no clue what's happening in this video anymore. So many acronyms. I clicked on it not knowing what I was getting into, and then it got into topics I have no clue about. I've never heard of your channel before, so this is all random for me, but I eventually got very lost trying to figure out what the video is even talking about.
Would some variable polling frequency be able to lower the worry some have about power consumption? With few traffic, the frequency could be lower with up to 100ms sleep, and if packets arrive, the sleep is eliminated ?
According to VPP docs, it can switch between polling and interrupt modes, and it the latter, it should be able to lower the frequency. Can't confirm for sure though, at least not yet.
I didn't know DPDK existed. Looks very promising. However, it feels like it is non-standard. I don't know if administrators really want to deal with DPDK/VPP if Linux already provides really good infrastructure.But hey, as long as it works and doesn't interfere with what I do - that's fine. It would be interesting to see the latency on this. The bandwidth may be high but latency is also something to keep in mind.
DPDK is used in server environments where the host OS doesn't need to interfere with the data, when you can tunnel the traffic to separate containers or guest VM's. Cellular infrastructure/mobility cores for LTE infrastructure do this to optimize hardware requirements (and space available). It's really cool tech, weird to think about it's complexities tho. Not sure how Linus or any of the kernel maintainers agreed to implement it 😂
Never heard of it either, but the interface looks like CISCO's router command line interface, and since it was made by cisco, I wouldn't be surprised if it actually is their router CLI. And if it is, then administrators already know it. It would be a bit worse for non administrators, because cisco's cli is a pain to learn with many non obvious things.
@@hubertnnn but administrators don't necessarily know how to work with Cisco. I also don't quite trust Cisco as these are big corporations and you don't really know what you're getting into and it could turn out to be a huge enterprise mess.
Hey I don’t remember if you mentioned it in one of your other videos but out of curiosity, will the router be able to support multiple wan’s? I feel like there’s not too much in that market and it would be a great idea to implement, especially at 10g speeds.
@ good point. I was looking at one of those ubiquiti dream machines for my network when I asked that question… I figured that I’d rather use your router instead when that time comes. Do you feel like it’ll be something that you’ll already have coded in?
Very nicely done. From the outside it seems a long way to get this type of through put. It looks like you added a few more gray hairs setting this all up. But in the end it works. Well done! I realize you might have a limited lab environment but it would be interesting to setup all 10GbE ports with a iperf system. I think there was 4 in your build (sorry poor memory). See if you can push 10GbE per port through the router. You might have a 2x2 setup for iperf testing. The idea is to see at what point you saturate that single core, and then need to dedicate a second or third core to networking while keeping the remaining core(s) for the kernel and system management (snmp, dhcp and such). I think its a good plan to keep one of the interfaces (1GbE) connected to the kernel as an out of band management interface. This will keep port forwarding of data interfaces offline until the system is fully booted and the system status/setup is confirmed.
can you make one more enterprise with more ram etc - cheap 100g router/switch - i think quite a few people will start going from 10g to 100g fiber - it is the first place you invest
VPP's PPPoE plugin just for server, not client tho. by the time I writing this comment, the wireguard plugin still dont support dynamic endpoint, means, all wg endpoint need to be static address, so not good for home gateway or home router
@ using DPDK with VPP is a bold move. Most of open source small solutions are still in Kernel Mode. I hope Your device will be available in short time.
10:11 I was really excited to purchase your router when it was ready but now that you decided to publicly tie a political opinion to it I think I'll go elsewhere.
I agree with all the comments about imix, but I don't think it's relevant for a SOHO CPE. Maybe you should get a cisco trex box and generate some real numbers.
Oh wow! The magic of DPDK+VPP! +1 vote to go in that direction from me. And I do not do Twitter, or X, or BlueSky... I hope you still check your old fashion email address for when I need to get in touch with you outside of a RUclips reply comment. 😎 I will next send this to my "Right Hand Man in America" who built our custom Linux firewall platform 20 years ago. I really like your enthusiasm dedicated to this much needed space in the IT industry. Thank you SO much! Over time, I hope you will have enough volume growth that you would consider a next level up model with more network ports. 🥳🧐
Maybe you could initially offer the device without these drivers in the kernel, but include an extra "I wan't that released and open-sourced, here's $20 for that" option for us to pick during purchase.
I mistakenly deleted a really good comment thread, when in reality I was trying to pin one by @owenhilyard3157. Apologies.
Ask google to undelete.
And make that video about the tunnel.
The reason "downsized" packet performance is important is that TCP-ACKs and other small packets exist organically. So packets-per-second is actually a relevant and important metric for router performance. With sufficiently large payloads, throughput is just a direct memory access benchmark because you're just copying stuff around and not doing that much "thinking."
DPDK has a way to handle this, but you're right, it's optimized for heavier throughput, with smaller packets seeing higher latency then you would normally see. (Set your icmp packet sizes larger and you'll see the latency problem disappear)
Think about DPDK like more of a "pipe" connecting the two end devices together, it doesn't matter how much volume of fluid you put in there and it'll get there in a timely fashion, especially if there is heavy throughout on the system.
DPDK works great for things like SCTP. 🤘
Properly efficient routing (and switching) is done WITHOUT copying. The hardware receives the frame into memory, and that's where it stays. Any forwarding is done by reference to that memory buffer. Copying is what kept linux and BSD networking so slow for so long.
@@jfbeam that’s kind of true but irrelevant because the Direct Memory Access thing I’m talking about is how the NIC gets the data from main memory via PCIe.
en.m.wikipedia.org/wiki/Direct_memory_access
There's a reason real enterprise routers are rated in pps not bps.
Thank you for pointing in out!
High-end networking geek here. Small packet performance is critical for core internet equipment. If you can’t send 14.4m 64 byte frames plus receive 14.4m 64 byte frames at the same time: your not doing 10 GbE.
Bulk throughout is trivial and doesn’t require any fancy hardware: a mundane PC can easily do tens of Gbps with large packets.
there is linux vpp stuff, there is a guy doing a ring with x86 of the shelve hardware.
@@lyth1umDo you know which guy is doing that stuff? I want to check it out
That’s what i have been saying all along. 10 Gb/s the easy way is 812743 Mpps with 1500 byte MTU packages. 10 Gb/s the hard way is 14,88 Mpps with 64 bytes MTU packets. 10 Gb/s the realistic way is a realistic IMIX traffic profile.
@@Galileocrafteror jumbo frames.
I'm very happy to hear that NXP was willing to open source what would be needed, both options seem very promising to me.
Yep, with them open sourcing the necessary parts, proprietary solution starts to feel better than the vpp, due to compatibility with commonly used tools.
It's not so much "open source" as them not caring what you do with the SDK after your $40k check clears. (when almost everything is in a microcode blackbox, there's no secrets in the SDK.)
@@jfbeam you're right, to a degree, but microcode isn't something that's normally revealed by, well, anyone really.
NXP really seems like a good partner. They were willing to open source, AND helped them to find an alternative(and risk loosing income from licensing fees).
@@jfbeam lol yeah sure, feel free to release as "opensource" or even just allow redistribution of the stuff from most vendor SDK and see how fast you get sued to oblivion for breaching NDA and the SDK's own license. If NXP has moved all the proprietary bs to a firmware/microcode and allow people to redistribute it, its a good thing.
It's not that the packet size needs to be downsized but the PPS that VPP can do. There are a lot of things that use small packets like DNS, etc and there is specifically a test called IMIX. It's not perfect but the idea is to test throughput using various packet sizes that mimic more of a real world solution.
A lot of commercial routers can put up huge numbers with 1500 (and more with 9000) byte packets but even when you MTU is set that high you will find the average package size is much lower. It would be good to know the performance of the router with 64-byte packets (the lowest) as well as IMIX (or something else that is not the ideal max packet size).
Again it doesn't matter what your MTU is set to, it's the average sized packet. Thinks like DNS or ACKs are going to be a lot of smaller packets.
Thank you for fighting through this. I'm really glad to hear you have fully open-source options on the table.
CPU microcode being closed is not new, and I consider that just something that you have to put up with in this world (so far).
Not only am I excited about the results (which are solid. nicely done.), I'm so grateful you listened to the comments here and the other people following this project. Thank You for listening. I can't wait for availability of these routers. This is a fun project to follow and at the end of this rainbow is a useful and maintainable tool.
Try to test flow offloading in kernel too. It might allow to process 10 Gb/s on vanilla kernel without any additional userspace software.
If you want the VPP interfaces to be visible in the kernel, you can use lcp plugin.
Also using 2 workers for VPP could give you better performance, but there is the price of higher consumption...
Well I'm not getting anything done for the next 21 minutes.
EDIT: I'm Glad NXP is allowing their binaries to be shared under the project's licensing. That's huge! Looking forward to seeing future parts of this project. :D
Even though I am from Brazil and consequently will not be able to buy it when it is finished because of our import tax, I am following this project closely. I am amazed to see it being built in public. Congratulations to everyone involved.
Choosing DPDK & VPP and open sourcing everything is what makes this project to me very interesting! I´m glad you don´t go with a HW specific solution. This will make future HW updates much easier. I also hope you will give a push to DPDK adoption for other open source projects, helping to improve home networking equipment performance. As a side effect this might help to save some energy and resources. Bravo!
The encryption-decryption part is really impressive
That's true! Even many enteprise solutions guarantee about 3Gbps when encrypting/decrypting traffic on 10Gbps interface!
@@morsikpl yeah, that's some fast hardware.
😊 I'm glad everyone said it.. PPS vs Throughput.. As a former WISP owner that used Mikrotik at first 😂 I had my days of sadness with Tilera vs MIPS vs ARM!
RIP Tilera
What is the issue with Mikrotik?
@@TheStuartstardust there is no problem. Just some platforms perform better than others
Nice, I am used to only seeing DPDK and it's other friends in the data center as part of NSX-T, I never brought it up as I only ever saw it for specific NICs and not on any embedded stuff outside of something like a DPU. I think this would be a super cool thing to get into more common use. I think if you make the router OS able to be virtualized with the same feature set when paired with a supported NIC (it was mostly Intel and Broadcom last I checked) that could really open up home lab stuff for some cool things like "pocket universes" or just an easier way to play with OSPF/BGP with enough oomph behind it to make it fun for the lab. This could open a lot of SDN fun up.
A hacker that somehow gets access to the terminal will be instantly lost because he can't use ping 😂
Router performance is best measured in packets per second; not throughput. It is relatively easy to push 10Gbit with jumboframes (or even standard 1500 byte frames), however a router must be able to support other traffic types too. Gaming, Voice/Video calls, anything 'realtime' really - all rely on small packets for the majority of bytes transferred (To keep latency and delays down). For example an average voice call does 50 packets per second, each at about 200 bytes. Would you be able to use this routing method, at the higher speeds, when traffic with those characteristics is being routed? That may not be of interest when using the router as your consumer home router - but it would when using in a more enterprise environment.
Thanks!
Thank you! 🙏
I see two possible issues with this approach.
First is the power usage and heat generation, since one of the cores is constantly at 100% even when your network does nothing at all.
Second is possible latency increase, since when you poll instead of using interrupts then you don't respond to events immediately when they happen but on the next poll, so time between polls is your extra latency. This one may not be an issue, since 100% cpu suggests busywaiting without any sleep, but I would still like to see a test confirming if latency is not increased.
DPDK has no impacts on latency under real world circumstances. Many datacenters (ISPs) running DPDK applicances for DDoS mitigation. At my previous employeer we had a DPDK appliance to protect game server traffic from DDoS and maybe you know that but game server traffic is one of the most challenging things when it comes to latency. And guess, it worked like a charm, it used iBGP to route entire networks thru the appliance after being processed from the router before hitting the core switching stack.
@@BirknerAlexI'm sure ya right that latency isn't an issue. The cores only job is busy-waiting, when not already processing data currently anyway. It even saves on ISR entry/exit delays, so it should be rather faster than slower in my view.
I'd still share the concern about the wasted energy tho. Sure those cores are more efficient than some beefy x86 Server CPU, but they're not "free", I mean that's why ARM also has power saving modes and dynamic frequency scaling for example. I also don't whine about one core on a 64 core server with a daily load overage of 40 being dedicated to burn 1% more energy in my rack at work, and that core will have plenty of real work to do as well. But let's be honest, any home router is going to spent 99% of it's time idling, or exchanging a few dozen packets a second processing background noise. I got a 6 core Xeon E-2146G in my server sipping like sub 10-12W System total with the 10G NIC (excluding SSDs and some other extensions/peripherals, don't have exact numbers in my head anymore rn), so this router core HW (without Wi-Fi and other stuff ofc) hopefully ain't gonna be sipping such numbers for way less performance, just because it's busy all day checking if maybe finally someone has a packet for DPDK. I'm very interested in this project, because hosting/running your Router/Firewall/Networking on your main server has plenty downsides, and I still run a separate AP for Wi-Fi ofc, but not if I'd be investing in yet another 24/7 energy hog 😅
Polling *can* have way less latency than interrupts. Interrupts are good for occasional events, but when you have a lot of them you have to do context switching for every interrupt separately, and finish performing the previous thing, before you can take on the next one. When polling, you can bunch multiple events together into an array/vector, and process them in one go. So the result is that you have worse latency in optimal conditions and low load, but better performance under heavy conditions.
In Computer graphics the same kind of optimisation is used. Graphics drivers when receiving data/drawcalls used to wait for more data, to send a bigger chunk of data, and the graphics driver would need some algorithm to estimate how much to wait, and how often to send data to GPU; but with the introduction of Vulkan this task (along with other tasks) moved to the application/game. And a lot of other optimisations and tweaks that used to be done in the driver (Kernelmode) during the directx11/openGL days, are now done inside of a game (Usermode) thanks to Vulkan.
Do you see the resemblance to where DPDK+VPP is going 😉?
@@sasjadevries Yep, I mentioned in one of the other comments that I worked with microcontrollers that had a similar batching interrupt system, where an SPI controller would send an interrupt when a message arrives but will not send any more interrupts for next messages until you empty the buffer, so you can batch process multiple messages on single interrupt without creating hundreds of context switches. That would be good to have on linux, but I gess it might not be so easy without hardware support.
@@hubertnnn VPP stands for Vector Packet Processing. So sending a vector of packages is the whole idea of that software package😉, they even put it in the name.
I'm not a network expert, but I just got curious a few months ago, when someone in the comments mentioned VyOS, with the DPDK+FDIOVPP stack. That's when I looked into it...
Basically the whole vectorisation is done in VPP, and their main selling points are that VPP runs 100% in userspace, that it's hardware agnostic, and deployment agnostic.
I kinda like this approach for high throughput. So the low level interface (DPDK) is simpler and more predictable, by polling instead of handling interrupts. And the application level software (VPP) can pick the vector size that fits the polling rate.
Pleased to hear that NXP responded positively to the feedback and even could suggest an alternative
Great video, thanks for covering DPDK and VPP in depth. I've spent many hours reading and always been frustrated with how developer-centric the documentation is. Really looking forward to getting hands on with your product! Really hope VyOS will pull finger and get VPP working too in mainline.
Wouldn't it be enough to use Linux kernel and io_uring interface to minimize the overhead you normally get with interrupts?
Hats off to you, this is a great development.
Tomaz talks like an US drill sergeant, I am always stressed out after watching his video's. Sometimes I even start doing push ups 😅
GET DOWN AND GIVE ME 20!
@@tomazzaman 🤣
@@tomazzamanyes sir!!!
It's because English is not his primary language.
@@PoteRomo So his native language is always spoken so violently? 🤔😲😂
Consider automating your build using a continuous integration system of some sort (Github Actions, Azure, Jenkins, whatever). It's a bit of work now, but you'll thank yourself in the future when you have to merge patches and recompile everything.
That sounds like an amazing usecase for something like the hermit kernel :o
I like where it is going...
I just evaluated Netgate's VPP based TNSR, but is was a disaster. Crossing fingers that this project will be better.
RISC-V someday, a few years down the road could make this truly open source.
Interesting proposal of the isolated single core for network. If it's always at 100%, how can you measure the % of utilization for real trafik in this core? And what is the impact in consumption of that solution?
First, I‘m very impressed by the performance!
One question though, as one Core is now constantly pegged, in what way does this impact power consumption? Is there a notable difference between this solution and the old proprietary sdk when at idle or when routing?
It's pegged, but probably not actually using much power. I guess it's just busy-waiting on packets, which should be just some conditional branches and compares, nothing too crazy. Nevertheless, it's taking CPU time that could have been used elsewhere.
You're right, the core is at 100%, but it's basically just a constant loop of polling the interfaces.
I'm fine with the compromise that the microcode be proprietary. I do think the sources should be available, or at least they should have them security-audited and the results published.
I liked your video because you put the answer in the title.
How does having a single core pinned to 100% affect "idle" power consumption over using Linux kernel based networking?
I'd like to know this as well
I'd like to see a performance run over a couple of hours with the standard IMIX stream. It is more relistic for day to day use.
Pardon my ignorance, would it make any sense to replace the current Linux stack for dpdk + vpp for perf? Or would create a big hole in security in normal cliental pc without any real improvement because muticore bulk x86 CPUs?
Good work!! DPDK is the way to go!! This is the future for Linux based routers.
Oh, you're GOOD. You blind them with SCIENCE! Any investors looking through any of these videos are GUARANTEED to glaze over within the first 30 seconds, close the clip and just go "oh, he clearly knows what he's talking about..."
You're saying that 100%ing that core only add like half a watt to the power draw. But doesn't that still prevent it from scaling down the clocks and even shifting the whole SoC into its low-power states? I mean, you can't enter s2idle and run the userspace at the same time, right?
According to the docs, for best performance, the CPU governor should be set, well, to performance, so I think that yes, it will prevent it from scaling down the clocks. That being said, according to VPP documentation, it's also possible to change between polling and interrupt mode depending on the load (of traffic). I'm afraid this part is quite new to me, so I'm trying to figure it out myself as well.
@tomazzaman Hmm, maybe for home applications the governor could only ramp up when the load gets high enough? Likewise the polling mode could also be enabled automatically when needed.
Hallo Tomaž. Miroslav 'ere. Would this move of device control lend itself to being able to edit MAC address easily? I am hypothetically talking of changing MAC to that of the one on ONT, so ISP has no clue that particular client is removing 20W power drain, issues with patency and potential copper burnout and all PC in the place being ruined by lightening strike (happened to LUG member here in London) associated with ONT. Use MAC of the ISP provided ONT on SFP? Can this be done from The TUX side of things? I am not GURU. Blagidaram!
Should I be using jumbo frames for my home 10gbe network? Do i need to do any special separation for like IOT devices or anything
I really like performance graphing with grafana/prometheus or whatever else. I assume vpp already has the capabilities for an external software to pull that info?
my previous answer was hidden, cause i linked to the vpp manual.
Short answer: yes, yes there is.
Long answer: they have a python package that can access a lot of statistics, it's really easy to use!
With DPDK running on 4 interfaces and leaving one interface at the kernel level, would it be possible to create an out-of-band management interface that would have a separate routing table and keep it available and reachable if any crash occurs in DPDK?
Yep, fairly simple, actually. In fact, I'm strongly in favor of this approach!
mm i like what a few ppl have mentioned in the comments here about testing with 10gb/s each direction, with the data being all tiny packets like you might see on a super busy network. 10gb/s in a solid data stream isn't the same as 10gb/s of all varieties of packet hammering the device?
Sweet! RoutSI is going to be super power efficient if that's all it takes to run 10Gb!
This sounds amazing!
I do wonder how it would compare to VPP with XDP (that is if the XDP plugin is in a working state). Either way this sounds like a great project and i am looking forward to upgrading my turris omnia. I have been waiting for a 10G upgrade path.
Bluesky only... Bruh. Haters gonna hate on every platform. You do you on any & all platforms that reach your audience the best. Don't let a few haters ruin it for other ppl. Love this project!
I'm not a networking god like some of the commenters here. I'm just a cybersecurity risk analyst. But I fully agree with the consensus that true performance tests are done at edge and corner case scenarios, because that's where mistakes can happen. That's the point at which just a little too much traffic can knock something over because a single malicious packet was able to get through somehow. I'm not saying you're prone to the mistakes that other companies have made - I'm just pointing out that with everything in the world being networked now, things need to be designed with cybersecurity and resiliency first and foremost. The enemy is always getting better. Just testing performance for performance's sake is no longer a valid benchmark. It needs to be either standardized in some way (not possible with most of your testing) or it needs to show both extremes to be scientifically rigorous.
I wonder why DPDK chose to constantly poll the interface vs asking the kernel to notify it when a packet arrives. And then continue in userspace. Aka use the kernel only for the raw packet notification.
Context switching is expensive - en.wikipedia.org/wiki/Context_switch#Cost
@@triffid0hunter Doesn't polling need context switching too? I mean each time you ask the kernel "is there a packet yet?" you enter momentarily kernel space for the kernel response. I could be totally wrong though.
@@sledgex9the whole point is to allow direct device-userspace interaction without the kernel being involved
I need to get my hands on one of these router when it available
So on twitter a post was called a tweet. Is a BlueSky post called a BS?
Fair question! 😂
Yeah, this was a bigger issue (10Gbit) in the pre-Nehalem days (talking DC side where you have more powerful processors, there was a day when there were struggles). Just like entropy exhaustion was also a huge issue. Advancements have made those things less of an issue.
1ghz is not that far from the today's standard 3ghz.
The absolute recrod maximum today is 5ghz.
I would understand a 100x difference, but everyone seems to forget how fast current tech is.
And this is forgotten because of bad software.
I'm a bit concerned about the 100% core usage, won't it increase the idle power usage of the router? 🤔
Negligible. Maybe half a watt.
@tomazzaman I would rather take the proprietary option as this is something I don't think I can get as a regular consumer and it seems to provide better integration with the hardware which could lead to better performance. Otherwise I could simply get a Raspberry Pi or something similar such as the Banana Pi R4 and build my own router from scratch.
UEFI and ACPI which are used in regular PC hardware instead of device tree are the worse option technically. A superior way would be to have a device tree provided by the BIOS and there would be no need for UEFI or ACPI after that.
This has been debated over times by hardware vendors and OS providers. Their conclusion was that DTs don't offer the required to support consumer computing platforms. On servers is more about using the already existing frameworks which are tailored to UEFI and ACPI.
DTs are better suited to embedded platforms with lots of hardwired hardware and almost no configurability as everything is mostly and already confined in the SoC. As the case of your phone or your router.
@@hyoenmadan Device trees could provide everything that UEFI and ACPI can do but it would require support from OS vendors, namely Microsoft.
And since Microsoft has always been betting on backwards compatibility and vendor lock-in, it makes zero financial sense for Microsoft to support switching from UEFI to device trees.
As a result, device tree has no future in PC hardware as long as Microsoft Windows has such a big market share.
What about the thermals now that one core is pinned at 100% all the time?
Adds about half a watt to total power consumption. Negligible, but once we have the cases manufactured, we'll of course run the proper tests to make sure it really doesn't impact anything.
@tomazzaman does the software only work with polling? I haven't done much with networking on embedded linux or device drivers myself but it seems not ideal to constantly poll for data (but i have 0 experience with this so who am i to judge)
What kind of performance difference is there between VPP and native linux packet forwarding? What happens performance wise if you switch the network driver to polling mode in linux?
My guess would be that at zero to very little traffic, Kernel networking with interrupts would be slightly more efficient/faster, but at any level of traffic above that VPP would get the advantage.
That because, even with no traffic at all, VPP would still be polling NICs to know if there's something new while the kernel would be doing other things waiting for NICs to tell it something happened.
With more traffic instead things wouldn't change much for VPP, but a lot for the kernel which would be receiving lots of interrupts and having to constantly stop what it was doing to listen to what NICs have to say.
@@qdaniele97 I see that as well. Maybe release 2 versions of the OS/firmware, one with classic Linux kernel (the default one) and one with VPP.
@@hubertnnn That could be an option (or you could install your own OS).
But I think it's likely a router will always in a situation where VPP has the advantage
What about latency? Will it be better than for example a Fritzbox 74/7590(AX) ? And will your router function with MIC activation?
Can you use the device tree to define custom pcie hardware that can't be recognized by the BIOS?
so are you applying binary patches to the kernel?
why aren't dpdk and vpp just built into linux in the first place? when you create a bridge or enable ip_forwarding..
lmao as an embedded developer I totally feel the cross-compiling mess
we truly need to tame all of those build systems and packaging formats with something like Nix or Guix
I ran an EPC (vEPC) LTE core using DPDK. It was "black box" aside from the usual sysadmin/sysop stuff.
It's cool how fast you can push your hardware. Vendor had some weird DPE (data plane) bug but that's aside from the point.
I have no clue what's happening in this video anymore. So many acronyms. I clicked on it not knowing what I was getting into, and then it got into topics I have no clue about. I've never heard of your channel before, so this is all random for me, but I eventually got very lost trying to figure out what the video is even talking about.
Would some variable polling frequency be able to lower the worry some have about power consumption? With few traffic, the frequency could be lower with up to 100ms sleep, and if packets arrive, the sleep is eliminated ?
According to VPP docs, it can switch between polling and interrupt modes, and it the latter, it should be able to lower the frequency. Can't confirm for sure though, at least not yet.
Soooooo, no iptables? no nftables?
Correct. VPP does come with it's own firewall though. Both statefull and stateless.
I guess Mikrotik like rb4011 or rb5009 can do that without any issue
I didn't know DPDK existed. Looks very promising. However, it feels like it is non-standard. I don't know if administrators really want to deal with DPDK/VPP if Linux already provides really good infrastructure.But hey, as long as it works and doesn't interfere with what I do - that's fine. It would be interesting to see the latency on this. The bandwidth may be high but latency is also something to keep in mind.
DPDK is used in server environments where the host OS doesn't need to interfere with the data, when you can tunnel the traffic to separate containers or guest VM's.
Cellular infrastructure/mobility cores for LTE infrastructure do this to optimize hardware requirements (and space available). It's really cool tech, weird to think about it's complexities tho. Not sure how Linus or any of the kernel maintainers agreed to implement it 😂
Never heard of it either, but the interface looks like CISCO's router command line interface, and since it was made by cisco, I wouldn't be surprised if it actually is their router CLI.
And if it is, then administrators already know it. It would be a bit worse for non administrators, because cisco's cli is a pain to learn with many non obvious things.
@@hubertnnn but administrators don't necessarily know how to work with Cisco. I also don't quite trust Cisco as these are big corporations and you don't really know what you're getting into and it could turn out to be a huge enterprise mess.
What of 25 Gbps traffic, can a single core push that much traffic with DPDK and VPP enabled?
No idea, honestly, I don't have access to a 25Gb NIC.
What performance can we expect using stock linux drivers and networking stack?
I was still able to get 10Gb, but never on a single thread, those get to about half that.
Hey I don’t remember if you mentioned it in one of your other videos but out of curiosity, will the router be able to support multiple wan’s? I feel like there’s not too much in that market and it would be a great idea to implement, especially at 10g speeds.
I see no reason not to - multi-WAN is a software thing.
@ good point. I was looking at one of those ubiquiti dream machines for my network when I asked that question… I figured that I’d rather use your router instead when that time comes. Do you feel like it’ll be something that you’ll already have coded in?
“Coded in”
Also with 50 firewall rules?
dpdk, yep, called it.
So DPDK runs in userspace?
Yep.
I wonder if that router can install or support RouterOS ? thanks in advance
Unlikely, at least in the near future. We're already stretched too thin as it is.
Where do you buy your custom keyboards now?
Still have a couple in stock, so no need to, but I must admit, I did buy the wooting to test out their halo-effect switches :)
What’s the power consumption change with that core always blasted?
Negligible difference. Around half a watt.
also huge fan of the project
Very nicely done. From the outside it seems a long way to get this type of through put. It looks like you added a few more gray hairs setting this all up. But in the end it works. Well done!
I realize you might have a limited lab environment but it would be interesting to setup all 10GbE ports with a iperf system. I think there was 4 in your build (sorry poor memory). See if you can push 10GbE per port through the router. You might have a 2x2 setup for iperf testing. The idea is to see at what point you saturate that single core, and then need to dedicate a second or third core to networking while keeping the remaining core(s) for the kernel and system management (snmp, dhcp and such).
I think its a good plan to keep one of the interfaces (1GbE) connected to the kernel as an out of band management interface. This will keep port forwarding of data interfaces offline until the system is fully booted and the system status/setup is confirmed.
can you make one more enterprise with more ram etc - cheap 100g router/switch - i think quite a few people will start going from 10g to 100g fiber - it is the first place you invest
VPP's PPPoE plugin just for server, not client tho. by the time I writing this comment, the wireguard plugin still dont support dynamic endpoint, means, all wg endpoint need to be static address, so not good for home gateway or home router
Fight for this love
Nice Video… are You planning to put this project on kickstarter or similar platform? I am interested in the product :)
Not sure yet, we'll be fundraising soon (venture capital) so I want to use that for the first batch, rather than my viewers' money.
@ using DPDK with VPP is a bold move. Most of open source small solutions are still in Kernel Mode. I hope Your device will be available in short time.
Can you make a homelab tour
Yes to tunnel video pls 🙏
10:11 I was really excited to purchase your router when it was ready but now that you decided to publicly tie a political opinion to it I think I'll go elsewhere.
I agree with all the comments about imix, but I don't think it's relevant for a SOHO CPE. Maybe you should get a cisco trex box and generate some real numbers.
Isn't this quite a clickbait title?. The CPU doesn't do shit since everything is offloaded as you demonstrated many time
100% non stop on one core... there goes power efficiency and heat. So you keep the CPU pegged 24/7 for that 5 minutes of 10Gb transfer you do per day.
BlueSky gave me AlDS
Sure.. now put some vlans, firewall rules and so forth & try again :)
Yep, on my to-do list!
I would love if operating systems like pfsense could come installed. Are there plans for rack mountable 1u form factors?
Tomaž is gold
good job
Well .... Perhaps at 1.2 GHz, if it has hardware acceleration.
What about L3
Nice!
All of that work for 2xSFP+ & 3x2.5GbE?
Oh wow! The magic of DPDK+VPP! +1 vote to go in that direction from me. And I do not do Twitter, or X, or BlueSky... I hope you still check your old fashion email address for when I need to get in touch with you outside of a RUclips reply comment. 😎 I will next send this to my "Right Hand Man in America" who built our custom Linux firewall platform 20 years ago. I really like your enthusiasm dedicated to this much needed space in the IT industry. Thank you SO much! Over time, I hope you will have enough volume growth that you would consider a next level up model with more network ports. 🥳🧐
Single core can't, single core plus a lot of asics additions - yes
Maybe you could initially offer the device without these drivers in the kernel, but include an extra "I wan't that released and open-sourced, here's $20 for that" option for us to pick during purchase.