ALUA Devices on ESXi 5.0

You may see the keyword ALUA frequently if you read VMware storage documents, so what’s the ALUA exactly is? How it reflects in ESXi 5.0? What’s the advantage of ALUA? I certainly have the questions, you?

First of all, ALUA is short word of “Asymmetric Logic Unit Access”, you probably already knowJ, ALUA is a SCSI standard, it’s not support by all storage arrays, but I think most large company should have the ALUA supported array. There are different articles tried to explain what ALUA is, I’m not a storage expert, I just want to give my interpretation. You may don’t agree, have question about that, please give me a comment, I’m willing to talk about that.

Generally, storage array ( Active-Active ) have two controllers (SPA, SPB), each controller have two paths (SPA0, SPA1, SPB0, SPB1), data transmits between ESX and storage array through these paths, in older ESX version, it can only use FIXED path selection policy to transmit data through a single path. Here is a potential problem, for example, you have 10 ESX hosts in a cluster mounts a LUN, one half hosts use SPA0, and the other half hosts use SPB0, it’s would cause path thrashing since first half hosts pull the LUN to storage controller SPA, and other half pulls the LUN back to storage controller SPB over and over again. Another scenario is the LUN owned by SPA but some ESX hosts transmit data through SPB for some reason.

Whatever caused the path thrashing, I guess that’s why I can saw following error in vmkernel.log:

2013-01-15T05:36:33.831Z cpu14:4110)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.60a9800064676a2d6b5a6c33474b5138" state in doubt; requested fast path state update...

ALUA give the ability to avoid the frequently switching between storage controllers, ALUA provides two types of paths: Optimized / Non-Optimized, Optimized means data transmit between ESX host and storage controller through owning controller, Non-Optimized means data transmit through non-owning controller without switch controller. Non-Optimized path transmit data to non-owning controller then transmit data to owning controller internally, then do underlay operation, as you can see it cause latency.

So how we know does ESXi 5.0 host running properly with ALUA? Let me show you some command:

Esxcli storage nmp device list –d NAA ID

Output like that:

naa.600601602c802900146f4f294d8ee011
   Device Display Name: DGC Fibre Channel Disk (naa.600601602c802900146f4f294d8ee011)
   Storage Array Type: VMW_SATP_ALUA_CX
   Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_support=on; explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=AO}{TPG_id=2,TPG_state=ANO}}
   Path Selection Policy: VMW_PSP_FIXED
   Path Selection Policy Device Config: {preferred=vmhba2:C0:T1:L14;current=vmhba2:C0:T1:L14}
  Path Selection Policy Device Custom Config:
  Working Paths: vmhba2:C0:T1:L14

Okay, let’s focus on the highlight line, it’s actually three sections:

{navireg=on, ipfilter=on}
{implicit_support=on;explicit_support=on;explicit_allow=on;alua_followover=on;
{TPG_id=1,TPG_state=AO}{TPG_id=2,TPG_state=ANO}
}

Navireg means whether or not register the device with Navisphere automatically.

Ipfiler means whether or not STOP sending the host name for Navisphere registration.

Implicit_support means whether or not device TPG state is managed by storage device self.

Explicit_support means whether or not device TPG state can be managed by ESXi host.

Explicit_allow means whether or not user allows the STAP to use its explicit ALUA capability.

Alua_followover means whether or not the ESX host follow alternative path instead of preferred path.

TPG means Target Port Group, it’s different path routing group with different state, like Optimized, Non-Optimized, Standby…etc.

AO means Active/Optimized path routing

ANO means Active/Non-Optimized path routing