It’s not easy to describe this issue in one line in the title, so let me give some background here. I have two sets of VMs. Set 1 has VM A and VM B; Set 2 has VM C and VM D. Each VM has a vNIC configured with a private IP address. VM A and VM C also have a second vNIC configured with an L3 (routable) IP address. The private IP addresses in each set are identical. To avoid confusion, I implemented a vRouter VM for each set. Like VM A and VM C, each vRouter has two vNICs: one connected to the L3 network and the other connected to the private network. This keeps private network traffic from leaving the set, so the two sets do not disturb each other even though they use the same private IP addresses.
These are the IP addresses I set for each VM:
VM A: 192.168.0.11
VM B: 192.168.0.12
VM C: 192.168.0.11
VM D: 192.168.0.12
The problem is that VM A still gets ping responses from 192.168.0.12 even when I turn off VM B. I expected the L2 traffic to go to its own vRouter and find that VM B is offline. But the tracert command shows the traffic going out through VM A’s L3 network to the vRouter of the second set, with the answer coming back from VM D. It looks like the L2 ping packets are being broadcast on the L3 network.
The issue was fixed by enabling a feature on the L3 network called “Enforce Subnet Check for IP Learning”. Cisco has since renamed it to “Limit IP Learning To Subnet”. It’s a VLAN-level setting: it stops the private IP traffic from being broadcast on the L3 network and forces it to stay on the L2 network.
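If you run Cisco ACI and want to script the change rather than click through the GUI, here is a minimal sketch. The tenant Tenant1 and bridge domain BD-Private are hypothetical names, cookie.txt is assumed to hold an existing APIC login cookie, and the setting corresponds to the limitIpLearnToSubnets attribute on the bridge domain object:

curl -k -b cookie.txt -X POST https://<apic>/api/mo/uni/tn-Tenant1/BD-BD-Private.json -d '{"fvBD":{"attributes":{"limitIpLearnToSubnets":"yes"}}}'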
I have a box that uses Emulex OneConnect OCe10102 network adapters. The adapters are quite old, and this Emulex model is not supported on ESXi 6.0. I upgraded the server to ESXi 6.0 and the Emulex adapters disappeared.
In the initial troubleshooting, I noticed that the adapters were still visible in the BIOS, so it had to be a driver-level issue. I checked the VMware Compatibility Guide: the OCe10102 model is not supported by ESXi 6.0.
If you run the following command, you will still be able to see the adapters in the PCI list on ESXi.
esxcli hardware pci list
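If the output is long, you can narrow it down to the Emulex devices (a convenience only, assuming the BusyBox grep that ships with the ESXi shell):

esxcli hardware pci list | grep -i emulex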
This indicates the adapters are invisible to ESXi because the ESXi 6.0 native Emulex driver no longer includes this adapter model.
Then I uninstalled the native Emulex driver for ESXi 6.0 with the following command and rebooted the ESXi host.
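A sketch of that removal; elxnet is my assumption for the native network driver’s VIB name, so confirm it on your host with the list command first:

esxcli software vib list | grep -i elx
esxcli software vib remove -n elxnet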
The adapters were still not visible after rebooting, since there was no longer any driver for them. I then downloaded the Emulex drivers for ESXi 5.5 from the VMware website, uploaded the “offline” bundle inside the zip file to the /tmp directory of the host, and installed the driver with the following command:
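A sketch of the install; the bundle file name below is a placeholder, so substitute the actual offline bundle you uploaded (the -d option needs the full path):

esxcli software vib install -d /tmp/<emulex-esxi55-offline-bundle>.zip

Reboot the host again afterward so the driver loads and the adapters reappear.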
There are several layers of networking in a virtualization infrastructure: guest operating system, virtual machine, ESXi driver, physical network adapters, RJ45/SFP, network switches, and so on. Sometimes it’s hard to say where exactly a problem is, especially for hardware-layer problems. Today I worked on a very interesting case that may give you some ideas for troubleshooting a network performance issue caused by the hardware layers.
A user told me he was bothered by the network performance of a virtual machine. Copying data to an NFS share was slow, but responses to the “ping” command looked good. I didn’t see any issue at the virtual machine layer: VMware Tools was up to date, the Windows OS was patched, the virtual network adapter type was VMXNET3, and the VM hardware version was also up to date.
When I tried to copy an image file to a shared folder on the virtual machine, I did see that the speed was sometimes fast and sometimes not. Since the host has two physical uplinks, that led me to guess the problem could be one of the uplinks.
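For reference, you can list the host’s physical uplinks with their link status and speed from the ESXi shell:

esxcli network nic list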
After a lot of swapping and cable changing, we eventually figured out there was a bad SFP on the network switch end. I was able to observe the issue by using “psping.exe” from Microsoft Sysinternals. I used the following command to send pings of various sizes to the virtual machine; the drops increased as I increased the packet size.
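A sketch of the test, assuming 192.168.0.12 is the VM’s address (substitute your own); -l sets the ping payload size and -n the number of pings:

psping.exe -n 100 -l 1k 192.168.0.12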
The size could be 1k, 2m, or even larger. I think this is a good way to identify a problem outside of ESXi, especially an SFP problem, since that kind of problem didn’t show any CRC or error counters at the network switch level.
You can also use the native Windows command “ping.exe” as follows. Its size unit is bytes, so for example you need to enter 4096 if you want to send 4 KB.
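The equivalent test with the built-in tool, again assuming 192.168.0.12 as the target; -l is the payload size in bytes and -n the number of echo requests:

ping.exe -n 100 -l 4096 192.168.0.12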
I built a simple Auto Deploy environment with vSphere 6.5 in a nested environment: I created virtual ESXi hosts on a physical ESXi host to do the testing. The whole configuration went smoothly, and I’m impressed that Auto Deploy can be implemented in a few hours. The one thing that bothered me was networking.
Somehow the new ESXi hosts could not get IP addresses properly. It’s not a single problem; the symptoms are that an ESXi host cannot get an IP address, or the Configure Management Network option is grayed out on the console, or the host gets an IP address but doesn’t respond to ping. Here is a quick post with my solutions.
To fix all of these problems you need to do the following:
1. Enable Promiscuous Mode on the vSwitch that the nested ESXi hosts are attached to on the physical ESXi host (see the command after this list).
2. Edit the host profile of Auto Deploy > Networking configuration > Host port group > highlight Management Network > for the option Determine how MAC address for vmknic should be decided, choose Use the MAC Address from which the system was PXE booted. (I did this in the Web Client of vCenter 6.5 U1; the procedure may differ in earlier versions.)
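For step 1, here is a minimal sketch using esxcli on the physical host; vSwitch0 is a placeholder for the vSwitch your nested hosts are attached to. The same setting is available under the vSwitch security policy in the Web Client:

esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 --allow-promiscuous=true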
If you don’t do step 1, your nested ESXi hosts may not be able to get DHCP IP addresses properly, or they may get IP addresses mapped to a new MAC address so that network packets cannot be transmitted.
A nested ESXi host gets one DHCP IP address when it PXE boots, then gets another new IP address as soon as the management network is created while the host profile is applied. So there could be two different IP addresses, and the MAC address of the management network could be a new one that doesn’t match any of the vmnics. That would be hard to trace back on the network switch in a real environment, which is why I think it’s better to also do step 2.
Update 10/25/2017: You should choose “User must explicitly choose the policy option” in step 2 above if you have multiple NICs. The reason is that the DHCP IP address during PXE boot may be picked up by a random NIC. If you choose what I mentioned in step 2, you may see the DHCP server learn the MAC address of a non-management NIC associated with the management IP address. Please refer to this article for more detail.
I just noticed an issue where nothing is reported in the ‘Hardware Status’ tab of ESXi hosts in the vSphere Web Client. KB 2112847 gives a solution, but it didn’t work for me. The feature can be used to monitor hardware failures, so I figured out a workaround: log in with the Administrator account and click the ‘Update’ button under ‘Monitor’ > ‘Hardware Status’ for each ESXi host. You will get the status after a few minutes.