wow.. it has been 13 years and I found this as the best & most basic explanation for SR-IOV... Thanks, Patrick!
Wow, this is *such* a great presentation and explanation of the technology! Many thanks Pat! 👍
Really Really Helpful. Could not find a better explanation to SR-IOV anywhere else. Thank you Sir!
Thanks a lot @Patrick Kutch. This is still very helpful.
Awesome video, democratising complex technologies for the rest of us :)
Excellent presentation. I really liked part #1 on VMDq as well. There is a slight break at the 7-minute mark, where the flow is defined and the address resolution is explained.
Step 1 & 2: Packet arrives and is sent to the L2 Sorter/Switch.
Step 3: Packet is sorted based upon destination MAC address; in this case, it matches Pool/VF 1.
Step 4: NIC initiates DMA action to move the packet to the VM.
Step 5: DMA action hits the Intel Chipset, where VT-d (configured by the Hypervisor) performs the required Address Translation for the DMA operation, resulting in the packet being DMA'd into the VM's VF Driver buffers.
Step 6: NIC posts an MSI-X interrupt indicating an Rx transaction has been completed. This interrupt is received by the Hypervisor.
Step 7: The Hypervisor injects a virtual interrupt to the VM indicating an Rx transaction has been completed; the VM's VF Driver then processes the packet.
www.intel.com/content/www/us/en/embedded/products/networking/82599-sr-iov-driver-companion-guide.html
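For anyone who finds the seven steps easier to follow as code, here is a rough Python sketch of the receive path. It is only a toy model, not how the 82599 or its drivers are implemented: the `Nic` and `VirtualFunction` classes, the `rx_buffers` list (standing in for the VF driver's buffers in guest memory) and the `interrupts` counter are all made-up names for illustration, and VT-d address translation is reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFunction:
    """Hypothetical stand-in for one VF and the guest driver state behind it."""
    name: str
    mac: str
    rx_buffers: list = field(default_factory=list)  # models the VF driver's Rx buffers
    interrupts: int = 0                              # models interrupts seen by the VM


class Nic:
    """Toy model of the NIC's L2 sorter: destination MAC -> VF pool."""

    def __init__(self):
        self.mac_table = {}

    def add_vf(self, vf):
        self.mac_table[vf.mac.lower()] = vf

    def receive(self, dst_mac, payload):
        # Steps 1-3: the packet arrives and is sorted on destination MAC.
        vf = self.mac_table.get(dst_mac.lower())
        if vf is None:
            return None  # no pool matched; this sketch simply drops the frame
        # Steps 4-5: DMA into the VM's buffers (VT-d translation not modeled).
        vf.rx_buffers.append(payload)
        # Steps 6-7: interrupt is raised and ends up signalled to the VM.
        vf.interrupts += 1
        return vf.name


nic = Nic()
vf1 = VirtualFunction("VF1", "02:00:00:00:00:01")
nic.add_vf(vf1)
print(nic.receive("02:00:00:00:00:01", b"hello"))
```

The point of the sketch is that the sorting decision is made purely on the destination MAC, before any hypervisor code touches the payload.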
Magnificent! Still actual and very helpful. Thank you! ^^
Much appreciated. I'm actually kind of amazed how many people still watch this.
@meandi02 I was thinking about that myself recently. However, SR-IOV is for a PCIe Endpoint (Ethernet, Video, RAID, etc.). A GPU isn't really a PCIe endpoint (it doesn't move data in and out of the system), so I'm not sure a GPU would be a good fit for SR-IOV.
Thank you much. It was all in PowerPoint (my boss calls me the PowerPoint Ninja). Took several months to finish it up.
Ninja!
hi from 2021!
@live3 Not sure what you are asking. The whole point of SR-IOV is to remove the hypervisor and CPU from ever touching the data until it gets in the VM.
Depends on how many VFs you want. Intel has some 1Gbps NICs that support 8 VFs per port. We have two 10Gbps devices that support 64 per port.
VMDq is alive and well and supported in Windows and VMware. VMDq is a NIC + OS technology as opposed to a platform + NIC + OS technology. That means no BIOS support is needed, which is why you don't see any VMDq options in the BIOS.
I like the video. How did you do the animation? Was it in PowerPoint?
excellent animation and explanation!
Everything I've read about SR-IOV is relating to providing a NIC directly to a guest VM. I was wondering, can a SR-IOV card (such as the X520) be configured instead to provide multiple virtual interfaces to the ESXi host? The Cisco vNIC (i.e., P81E) can provide logical interfaces to the hypervisor to appear as though it has a separate physical NIC (or two) for iSCSI, Service Console, VMNetwork, vMotion, etc. I'm looking for an equivalent that doesn't require Cisco UCS hardware.
Very good animation! Thanks so much.
THANK. YOU! This is an excellent presentation.
Thank you Patrick for the great presentation. One question: without an interrupt, how does the VM's VF know a packet is available in its queue? Does the VF poll its queue?
The VMs do get interrupts. They used to be virtual: the kernel would receive the interrupt and then pass it to the hypervisor, which would signal a virtual interrupt. With the latest generation ecosystem and Intel VT-d technology, VMs can get HW interrupts.
@@Kutta32 10 years later and you're still active! Admirable
Great explanation.. thanks!
Great explanation !!
Nice video. Couple of questions..
1. Is VM to VM packet flow switched internally without coming out of pNIC?
2. I assume VF driver still does not allow promiscuous mode. If drivers are modified to allow promiscuous will it get VM to VM traffic also?
Glad you liked it. It is a bit dated now, but still fairly accurate.
The default behavior on Intel devices is that VM to VM traffic is switched internally. You could modify the driver to do VEPA, which would send the traffic to an external switch.
Promiscuous mode is not supported by the drivers Intel provides; this is done intentionally for security purposes. However, one could modify our drivers to enable a promiscuous mode, which would also filter VM to VM traffic. This isn't something Intel can provide support for (such as helping or explaining how to modify the open source drivers).
Hope that helps.
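The difference between the default internal switching and VEPA described in the reply above can be sketched in a few lines of Python. This is only a toy decision function under simple assumptions: `forward_path`, `local_vf_macs` and the `"internal"` / `"uplink"` labels are hypothetical names, and real devices also take VLANs and other filters into account.

```python
def forward_path(dst_mac, local_vf_macs, vepa=False):
    """Return where a frame transmitted by a VF goes in this toy model.

    local_vf_macs: set of MAC addresses assigned to VFs on the same port.
    vepa: if True, everything is sent to the external switch, which is
          expected to hairpin VM to VM traffic back to the port.
    """
    if vepa:
        return "uplink"
    # Default behavior: the NIC's embedded L2 switch loops the frame back
    # when the destination is another VF on the same port.
    if dst_mac.lower() in local_vf_macs:
        return "internal"
    return "uplink"


vf_macs = {"02:00:00:00:00:01", "02:00:00:00:00:02"}
print(forward_path("02:00:00:00:00:02", vf_macs))             # internal
print(forward_path("02:00:00:00:00:02", vf_macs, vepa=True))  # uplink
```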
Patrick Kutch
Thanks Patrick! This was certainly helpful.
Patrick Kutch
Is a newer presentation available?
Mike Blaszczak This is still the latest and greatest. From an explanation perspective it is still accurate and informative. I'm not doing virtualization stuff anymore, so I'll likely not be doing any new videos on SR-IOV.
Yes, traffic is switched internally to the device. There is a movement in the industry to have a privileged VM, which can put the VF into promiscuous mode; however, at this time I do not know the status of that work, as I don't do virtualization as my day job anymore.
Excellent video thanks for it !
kudos, very well explained
Can SR-IOV also be beneficial in a non-virtual, single Linux kernel environment? Was this added to the kernel? I saw it was only introduced in a fairly recent patch set... in August 2021.
It can. Take a look at one of my other videos in this channel - about Flexible Port Partitioning
Hi Patrick,
Is the VMDq technology obsolete? I have been working with server BIOS for 2 years and only see SR-IOV and VT-d ....
What happened at 7:07 in the video? Looks like a bad edit; something is missing.
Just some lag in Powerpoint. Was kinda pushing PPT to its limits :-)
Where is Part1 of this video ?
In the same YouTube Channel where this one was found :-)
Very nicely presented.
Will this still need an interrupt if DMA can forward the packet to the virtual machine's virtual HDD partition on an M.2 NVMe and then decrypt the packet within the virtual environment, or must it be decrypted in the virtual adapter?
These are Ethernet packets, so they do not get DMA'd to an HDD, but instead to a memory buffer that the VF driver allocates. Some new-generation Ethernet devices can do encryption/decryption themselves if so configured; I've not personally done that.
Could you please explain the packet flow with PF also in this case? If VF is used for packet dma, what is the use of PF here?
The PF is still there for your networking needs as well. You can use it for anything you like, for example assigning it to a bridge or vSwitch for VMs that do not need the faster performance and can use an emulated network device.
very much informative.
Another question: according to the explanation, I believe even the PF can be used to host VMs via a vSwitch.
Do you have any document explaining how to configure the PF with a vSwitch, for example OVS itself?
The PF is usually what is attached to a vSwitch and there are tons of examples that are available
When using pci direct pass-through, a VM can directly configure the device to do DMA and when data comes in, the device directly puts the data into guest's memory. With the one time configuration of VMM and EPT, this removes the need for CPU from the process of moving data to guest. So in slide 27, I don't see why the summary of Intel SR-IOV is this.
and even 10 years later it's not a part of commercial GPUs
Please clarify, Can SR-IOV and VMDq be enabled in one NIC working together?
Are PCI PT and SR-IOV the same?
There is usually nothing technically preventing this from working; however, it would require the driver to support both modes simultaneously, and that may not be implemented, depending on the vendor. SR-IOV and pass-through are not the same thing; however, a hypervisor can use pass-through to give a virtual function to a VM.
I have two questions. 1. Can SR-IOV and VMDq be enabled in one NIC? 2. Can the SR-IOV NIC's PF or VF be attached to Open vSwitch?
Yes they can both be enabled in one device - however you would need the OS to support that. Question two - yes they can be attached to OVS.
SR-IOV to guest OVS or host OVS?
Either. However, for Intel devices, the driver does not support putting the VF into promiscuous mode (as a security precaution), so putting it into OVS doesn't do much for you unless you modify the driver, which is an option since it is open source.
If anti spoofcheck is disabled on VF and if VM sends a packet with a source MAC address (say MAC B) different than the one assigned to VF (say MAC A), how does PF/L2 sorter identify and forward the incoming packet to VF as the destination MAC of the incoming packet would be MAC B?
Spoofcheck is on Tx, so it has no impact on Rx. The VF will only receive traffic with a destination MAC address that was assigned to it and programmed into the NIC. You can send as many spoofed src MAC packets as you want, however you will only receive packets whose dest MAC matches your own.
@@Kutta32 Thanks for the reply. Based on my understanding, spoofcheck is being disabled to allow MAC failover between VMs to support High Availability in deployments. For example let's say VM 1 is assigned MAC A and VM 2 is assigned MAC B. VM 2 is a backup instance for VM 1. When VM 1 fails, VM 2 becomes active and starts sending packets using MAC A as source MAC. But based on what you are saying, VM 2 will never receive any packet that is sent to MAC A as it does not own that MAC. So if spoofcheck is applicable only for the Tx, then MAC failover cannot be supported right?
@@hemanthchenji6324 In that case you could have both VMs use the same MAC address, but have one on hot-standby, wherein it still receives packets but does not respond until the 1st fails. Usually in this situation there is a controller of some sort that will kick off the failover, and some sort of switching solution (either a vSwitch or physical switch) will start sending the traffic to the 2nd port (if doing HA on more than 1 physical port, which is what I would suggest). However, this is not my area. I haven't actually worked directly on SR-IOV in nearly 8 years :-)
@@Kutta32 Thanks very much
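To make the Tx versus Rx distinction in the spoofcheck discussion above concrete, here is a small Python sketch. It is a simplified model only: `tx_allowed` and `rx_delivered` are hypothetical helper names, and the Rx side assumes a single MAC filter per VF, as described in the reply.

```python
def tx_allowed(vf_mac, src_mac, spoofchk=True):
    """Tx side: with spoofcheck on, the source MAC must match the VF's MAC."""
    return (not spoofchk) or src_mac.lower() == vf_mac.lower()


def rx_delivered(vf_mac, dst_mac):
    """Rx side: delivery depends only on the destination MAC programmed
    into the NIC for this VF; spoofcheck plays no part here."""
    return dst_mac.lower() == vf_mac.lower()


MAC_A = "02:00:00:00:00:0a"
MAC_B = "02:00:00:00:00:0b"

# VM 2 owns MAC B and tries to take over MAC A after a failure.
print(tx_allowed(MAC_B, MAC_A, spoofchk=True))   # False: blocked on Tx
print(tx_allowed(MAC_B, MAC_A, spoofchk=False))  # True: spoofed Tx goes out
print(rx_delivered(MAC_B, MAC_A))                # False: replies to MAC A never arrive
```

In this model, turning spoofcheck off is not enough for MAC failover on its own, which is why the reply suggests giving both VMs the same MAC or moving the traffic with a switch.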
Hello Patrick,
I would like to know how the packet flow works when two VMs connected to the same VF need to communicate.
Does the PCI device do the L2 switching? If so, how is the bandwidth allocated between those VFs?
You cannot connect a VF to different VMs; there can be only one 'owner' of a PCIe device, and in this case it is a single VM.
The short answer is that VMware has announced support for this. I believe we have some information on this up at our blog site: communities.intel.com/community/wired
If there isn't anything of use to you there, post your question and I'll go dig up what I can for you.
Really Helpful..!!
Can IPv6 packet pass through VF created on Intel I350 Gigabit?
SR-IOV is an L2 technology (based on MAC and VLAN, not IP address), so yes.
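A quick Python sketch of why the IP version does not matter: the lookup below only reads the destination MAC and an optional 802.1Q VLAN tag from the Ethernet header and never looks at the EtherType of the payload. The `classify` function and the `(mac, vlan)` table layout are assumptions for illustration, not the I350's actual filter structure.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def classify(frame, table):
    """Pick a VF from (destination MAC, VLAN ID) only.

    table: dict mapping (dst_mac_bytes, vlan_id_or_None) -> VF name.
    """
    dst = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    vlan = None
    if ethertype == TPID_8021Q:
        (tci,) = struct.unpack("!H", frame[14:16])
        vlan = tci & 0x0FFF  # lower 12 bits of the tag are the VLAN ID
    return table.get((dst, vlan))


dst = bytes.fromhex("020000000001")
src = bytes.fromhex("0200000000aa")
table = {(dst, None): "VF0"}

ipv4_frame = dst + src + struct.pack("!H", 0x0800) + b"\x00" * 20
ipv6_frame = dst + src + struct.pack("!H", 0x86DD) + b"\x00" * 40

# Both frames land on the same VF; the L3 protocol is never inspected.
print(classify(ipv4_frame, table), classify(ipv6_frame, table))
```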
what is DMA'd?
Direct Memory Access - the ability to move data from one place to another without the use of the CPU to do the work.
Thank you!
Direct Memory Access
Hello Patrick, could you please tell me whether there is something similar for FC workloads, I mean with QLogic HBA cards? Please correct me if my question is not right.
As a general rule, I tend not to do videos on competing products :-) That being said, in most cases SR-IOV works pretty much the same, whether it be Ethernet, video, storage, etc.
Just posted a new blog with accompanying whitepaper on how to configure QoS, such as teaming, VLANs and rate limiting with SR-IOV, up on communities.intel.com/community/wired
Okay, so this is why VMWare 6.7 Enterprise is $4,229 per CPU.
first