How to configure vSAN on nested ESXi hosts with SSD hard disk

There are plenty of articles that introduce the vSAN feature and give step-by-step guides. I referred to William Lam’s and Duncan’s articles to configure vSAN in my lab. I followed their steps exactly, but nothing showed up in the disk field under Disk Management.

Please note: The following steps do not work for ESXi 6.0 RC on VMware Workstation 10. There you have to set scsix:y.virtualssd = 0 in the vmx file to mark the disk as non-SSD. Please refer to William’s article for details.

After looking into it more deeply, I found something interesting:

esxcli storage core device list

I got this output:

mpx.vmhba1:C0:T1:L0
Display Name: Local VMware, Disk (mpx.vmhba1:C0:T1:L0)
Has Settable Display Name: false
Size: 5120
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0
Vendor: VMware,
Model: VMware Virtual S
Revision: 1.0
SCSI Level: 2
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: unsupported
Other UIDs: vml.0000000000766d686261313a313a30
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32

Initially, I thought the disk was marked as SSD because I had run the command to enable SSD. Actually that’s not the case: it shows as SSD because my physical hard disk is an SSD! I don’t have to run the command from the articles to turn SSD on; the disk is natively SSD. lol
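To check the “Is SSD” flag on many disks at once, the command output can be parsed with a few lines of Python. This is only a rough sketch: it assumes every property line looks like “Key: value” and that any other non-blank line is a device name. The function names are made up for illustration, and the second device in the sample is invented.

```python
def parse_device_list(text):
    """Parse `esxcli storage core device list` text into {device: {field: value}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: property lines look like "Key: value" (or "Key:" when the
        # value is empty); any other non-blank line starts a new device.
        if ": " in line or line.endswith(":"):
            if current is not None:
                key, _, value = line.partition(":")
                devices[current][key.strip()] = value.strip()
        else:
            current = line
            devices[current] = {}
    return devices


def ssd_flags(devices):
    """Map each device name to True/False according to its "Is SSD" field."""
    return {name: fields.get("Is SSD") == "true" for name, fields in devices.items()}


# Trimmed sample. The second device and its values are invented for illustration.
SAMPLE = """\
mpx.vmhba1:C0:T1:L0
   Display Name: Local VMware, Disk (mpx.vmhba1:C0:T1:L0)
   Size: 5120
   Is Local: true
   Is SSD: true
   Attached Filters:
mpx.vmhba1:C0:T2:L0
   Display Name: Local VMware, Disk (mpx.vmhba1:C0:T2:L0)
   Size: 5120
   Is Local: true
   Is SSD: false
"""

if __name__ == "__main__":
    for name, is_ssd in ssd_flags(parse_device_list(SAMPLE)).items():
        print(name, "SSD" if is_ssd else "non-SSD")
```

On a real host you would feed it the saved output of the command instead of SAMPLE.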

What I need to do is actually the exact opposite. These are the steps I used to enable vSAN:

1. Create two disks.

2. Log in to the ESXi hosts via SSH.

3. Run the following command to find the two disks you want to use for vSAN. Record their runtime names.

esxcli storage core device list

4. Run the following command to disable SSD for one disk.

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device vmhba1:C0:T2:L0 --option "disable_ssd"

5. Follow the articles above to enable the vSAN ports, create the cluster, enable vSAN on the cluster, and join the ESXi hosts to it.

How to set up NTP services with PowerCLI

The NTP service is very important for troubleshooting: vmkernel log timestamps are incorrect if the NTP service is not running and the ESXi system time is wrong. It can also affect VM system time even if you disable time synchronization in VMware Tools, since a VM still syncs time with ESXi when it resumes from suspend, finishes a vMotion, or reverts to a snapshot.

I know it’s simple to configure NTP on a single host, but what if you want to configure it on a large number of hosts?

Basically there are 3 steps to make sure the NTP service works properly:

  • Configure the NTP server IP address.
  • Start the NTP service.
  • Set the service to start along with the ESXi system.

Let’s try PowerCLI:

Get-VMHost -Location "Cluster Name" | Add-VMHostNtpServer -NtpServer "NTP server address"

Get-VMHost -Location "Cluster Name" | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Start-VMHostService

Get-VMHost -Location "Cluster Name" | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Set-VMHostService -Policy On

Travel to Chengdu again

It has been about 5 years since I last visited Chengdu. It’s a beautiful city; people say “you’re gonna love it and wanna live there if you come to Chengdu”. People seem to live a very relaxed life in Chengdu: they drink tea in the park, play Mahjong, and enjoy having professionals clean their ears (much like an ear massage). I stayed in Chengdu for 3 months, so I’m fairly familiar with the city. All those memories are from 5 years ago.

I was excited to get my Raspberry Pi in the morning, and I planned to play with it all day. But my wife wanted to discuss travel plans over lunch. She told me she had visited Chengdu several times before, but none of those visits was a real trip: they just went to the city, picked up goods, and went back. She only knows one place, “He Hua Chi”, a clothes market. They bought clothes there and sold them in their city.

In the end, we took a 3-day trip to Chengdu! That was a crazy plan for me, since I had never tried planning and leaving on the same day! We flew to Chengdu at night and checked in to a great hotel. Since it was close to Chinese New Year, there were not many people or much traffic, and it felt like an empty city. A lot has changed in 5 years. I could feel my heart pounding when I saw buildings I was familiar with. I hadn’t gone back to the city after I got a new job in Xi’an, and there were a lot of things I missed every day… He’s, Minto, Chunxi Road, JinLi… etc.

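In the taxi, I looked at the buildings outside. They seemed familiar, yet distant and blurred, and those familiar buildings could still bring back bits and pieces of the past. 5 years ago, I was still a young guy, an unemployed youth who didn’t know where tomorrow would take him, a recent graduate knocked back by the job market, a junior network admin who had just been through the torment of a crazy company, a young man whose startup had failed. 5 years later, every past success, failure, setback, and glory has become part of the life experience I have today as a man. Whether painful or joyful, all of it is precious.

Chengdu, I hope I get the chance to come again. 🙂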

How to decode ESXi 5.x SCSI error codes

Storage is a critical component for virtualization, and a lot of VM performance issues are related to storage latency. In some cases you may see an error message like this in the vmkernel log:

2014-02-11T07:18:20.541Z cpu8:425351)ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5 from world 602789 to dev "naa.514f0c5c11a00025" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0

It looked like a language from another planet the first time I saw it 🙂. Let’s see how to “translate” it into human language.

First, I split it into several sections:

a) 2014-02-11T07:18:20.541Z cpu8:425351)

b) ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5

c) from world 602789

d) to dev "naa.514f0c5c11a00025"

e) failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0
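The same split can be scripted with a regular expression. This is a minimal sketch that assumes the log line always has the layout of the example above. The group names are my own labels, not anything ESXi defines, and lines in other formats simply return None.

```python
import re

HEX = r"0x[0-9a-fA-F]+"

# One pattern per section (a to e) of the example line; group names are made up.
LOG_RE = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<cpu>cpu\d+:\d+)\)"                      # a) time and cpu
    rf"(?P<module>\w+):\s*\d+:\s*Cmd\({HEX}\)\s+(?P<opcode>{HEX}),\s*"  # b) command
    rf"CmdSN\s+(?P<cmd_sn>{HEX})\s+"
    r"from world\s+(?P<world>\d+)\s+"                                   # c) world
    r'to dev\s+["“](?P<device>[^"”]+)["”]\s+'                           # d) device
    rf"failed\s+H:(?P<host>{HEX})\s+D:(?P<dev_status>{HEX})\s+P:(?P<plugin>{HEX})"  # e) status
    rf"(?:\s+Valid sense data:\s+(?P<sense_key>{HEX})\s+(?P<asc>{HEX})\s+(?P<ascq>{HEX}))?"
)


def split_scsi_error(line):
    """Return the sections of a ScsiDeviceIO error line as a dict, or None."""
    match = LOG_RE.search(line)
    return match.groupdict() if match else None


LINE = (
    '2014-02-11T07:18:20.541Z cpu8:425351)ScsiDeviceIO: 2331: '
    'Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5 from world 602789 to dev '
    '"naa.514f0c5c11a00025" failed H:0x0 D:0x2 P:0x0 '
    'Valid sense data: 0x4 0x44 0x0'
)

if __name__ == "__main__":
    for name, value in split_scsi_error(LINE).items():
        print(f"{name}: {value}")
```

The sense data part is optional in the pattern, so a line that ends after the P: code still parses and leaves the three sense fields as None.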

Section A shows the UTC time when the error occurred.

Section B shows which command was sent. (Actually I don’t know exactly what the command means; please let me know if you do.)

Section C shows which world the command is related to.

You can find out which world it is with the following command:

ps | grep 602789

Section D shows which storage device reported the error.

If your datastore consists of a single LUN, you can identify which datastore it is with the following command:

esxcfg-scsidevs -m naa.514f0c5c11a00025

You can also check the LUN settings and information with the following commands:

esxcli storage core device list -d naa.514f0c5c11a00025

esxcli storage nmp device list -d naa.514f0c5c11a00025

Section E shows the SCSI sense code. That’s the part I want to cover in more detail.

It breaks down into two sections:

SCSI status code: H:0x0 D:0x2 P:0x0

H means host status

D means device status

P means plugin status

Sense data: 0x4 0x44 0x0

0x4 is the Sense Key

0x44 is the Additional Sense Code

0x0 is the ASC Qualifier

Before decoding, you should translate each code to NNNh notation: 0xNNN = NNNh. For example, 0x7a = 7Ah and 0x77 = 77h.

The SCSI status code is easy to decode. You just need to change the format and look up the code at http://www.t10.org/lists/2status.htm.

In our example H:0x0 D:0x2 P:0x0, host code 0x0 (00h) means the ESX host side is good, device code 0x2 (02h) means CHECK CONDITION (the device is reporting a problem, and the sense data explains it), and plugin status code 0x0 (00h) means the LUN plugin is good. (Clarification: I originally glossed device code 0x2 as “device is not ready” to keep things simple, but that was confusing, since “Check Condition” has a different meaning from “Device is not Ready”. Thanks to Tony for pointing that out.)

Sense data is a little more complicated. You have to refer to two links: http://www.t10.org/lists/2sensekey.htm and http://www.t10.org/lists/asc-num.txt.

In our example, 0x4 0x44 0x0: Sense Key 0x4 (4h) means HARDWARE ERROR, the Additional Sense Code is 0x44 (44h), and the ASC Qualifier is 0x0 (00h). Combine the two codes into 44h/00h, which means INTERNAL TARGET FAILURE.

Okay, now let’s put all the decoded language together:

The ESX host side is good, the device returned Check Condition, and the LUN plugin is good; the sense data says HARDWARE ERROR, INTERNAL TARGET FAILURE.
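The lookup can also be sketched in Python. The tables below are only a small excerpt that I copied from the T10 lists, so most codes will come back as “not in this excerpt”. Host and plugin codes are left out because they are VMware-specific and not in the T10 lists. The function names are made up for illustration.

```python
# Small excerpts of the T10 lists; extend them from the links as needed.
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}
SENSE_KEY = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x44, 0x00): "INTERNAL TARGET FAILURE",
}
UNKNOWN = "not in this excerpt"


def h_notation(code, width=2):
    """Convert a string such as "0x7a" to T10 style notation such as "7Ah"."""
    return f"{int(code, 16):0{width}X}h"


def decode(dev_status, sense_key, asc, ascq):
    """Decode the device status and sense data taken from the log line."""
    d, k, a, q = (int(x, 16) for x in (dev_status, sense_key, asc, ascq))
    return {
        "device": f"{h_notation(dev_status)} {DEVICE_STATUS.get(d, UNKNOWN)}",
        "sense_key": f"{h_notation(sense_key, 1)} {SENSE_KEY.get(k, UNKNOWN)}",
        "asc_ascq": f"{h_notation(asc)}/{h_notation(ascq)} {ASC_ASCQ.get((a, q), UNKNOWN)}",
    }


if __name__ == "__main__":
    # Values from the example: D:0x2 and sense data 0x4 0x44 0x0.
    for name, text in decode("0x2", "0x4", "0x44", "0x0").items():
        print(f"{name}: {text}")
```

Keeping the tables as plain dictionaries makes it easy to paste in more rows from the T10 pages when you hit a code that is missing.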

Actually I took this code from an fnic firmware/driver incompatibility case. Does this make your troubleshooting easier? 🙂

You can also refer to the following links for more detail:

Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.x

Understanding SCSI host-side NMP errors/conditions in ESX 4.x and ESXi 5.x

Interpreting SCSI sense codes in VMware ESXi and ESX


Website comes back online!

I didn’t know ICANN requires email address verification. I thought there was freedom everywhere outside China, but it looks like that’s not the case. 🙂

My domain was suspended due to that ICANN policy, and my QQ mailbox was unable to receive the verification email from ICANN. What an unfree country it is! In the end I had to change my domain registration mailbox to Gmail to get the email.

Shit GFW! ( Check out here to learn more about GFW )

How to Grant Multiple Domain Groups Permission to Multiple Folders on vCenter Server

If you have several sets of VMs and each set should be accessible only to a particular group, you would normally grant the access in the vSphere Client or vSphere Web Client.

SSO is slow sometimes, so you can use the following CLI to do it more efficiently.

New-VIPermission -Entity "Folder Name" -Principal "Domain\group name" -Role "Role name"

You can do it even faster for regular folder names or group names with Excel and Notepad:

New-VIPermission -Entity " | Folder Name | " -Principal " | Domain\group name | " -Role "Role name"

Guess how to do it. 🙂
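One possible answer, as a rough Python sketch instead of Excel: keep the folder and group names in two columns, export them as CSV, and glue the fixed parts of the command around each row. The function name, folder names, group names, and role name below are all invented for illustration.

```python
import csv
import io


def build_permission_commands(csv_text, role):
    """Turn "folder,group" rows into one New-VIPermission command per row."""
    commands = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        folder, group = row[0].strip(), row[1].strip()
        commands.append(
            f'New-VIPermission -Entity "{folder}" -Principal "{group}" -Role "{role}"'
        )
    return commands


# Hypothetical data: column 1 is the folder name, column 2 is the domain group.
ROWS = "Team-A,CORP\\team-a-admins\nTeam-B,CORP\\team-b-admins\n"

if __name__ == "__main__":
    for command in build_permission_commands(ROWS, "VM Operator"):
        print(command)
```

The printed lines can be pasted into a PowerCLI session, which is the same thing the Excel and Notepad trick produces.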

How to Configure Serial Console for VM by Avocent ACS v6000 Virtual Advanced Console Server

A serial console is very helpful for troubleshooting Linux problems: you can see additional system messages via the serial console if your Linux server hangs. It is an essential troubleshooting component on a physical server. Managing serial consoles is a challenge if your datacenter is very big. You can deploy a console server to manage the serial consoles centrally, so you don’t have to connect your computer to each serial console one by one; you just connect to the console server IP followed by the port name over telnet.

Fast forward to today’s virtualization world. How do you connect to the serial console of a Linux virtual machine? Can we do exactly the same as with a physical server? The answer is YES! There are a couple of ways to connect to the serial console of a VM, and each has different benefits. I’m going to introduce the best one!

VMware has a KB article, 1022303, that introduces how to implement a virtual console server, but it’s not very clear, and I went the wrong way by following the KB.

Deploy Avocent ACS v6000 virtual advanced console server

1. Download the software image from the Emerson website.

2. Install the software on the console server VM by following the ACS v6000 Installer/User Guide.

Configure Linux VM serial console

1. Add a serial port to the target Linux VM whose serial console you want to use.

2. Configure the serial port and select the Use Network option.

3. Select the Client (VM initiates connection) option.

4. Enter ACSID in the Port URI field.

5. Select the Use Virtual Serial Console Concentrator option.

6. Enter telnet://console server ip:8801 in the vSPC URI field.

7. Select Yield CPU on poll option.

8. Make sure Connected and Connect at power on options are selected.
Note: If the Connected option reverts to deselected automatically after you save the settings, the serial port is configured incorrectly.

Enable kernel message on Linux VM

1. Log in to your target Linux VM via SSH.

2. Paste the following into the SSH session. It creates a service that keeps a getty running on /dev/ttyS0, which enables messages on the serial console.
cat <<EOFEOF > /etc/init/serial-ttyS0.conf
# This service maintains a getty on /dev/ttyS0.
start on stopped rc RUNLEVEL=[2345]
stop on starting runlevel [016]
respawn
exec /sbin/agetty /dev/ttyS0 115200 vt100-nav
EOFEOF

3. Run the following command.
initctl start serial-ttyS0

Enable serial console on Linux VM

1. Edit grub.conf with the following command.
vi /boot/grub/grub.conf

2. Add the following lines after the hiddenmenu option.
serial --unit=1 --speed=19200

terminal --timeout=8 console serial

3. Append the following to each kernel line.
console=tty0 console=ttyS0,115200

4. Reboot the VM.

Configure ACS v6000 console server

1. Log in to the management website of the ACS v6000 console server.

2. Go to Ports > Serial Ports.

3. Enable ttyS1 device.

4. Go to the Access option. You will see that the serial port is automatically mapped to the serial console of the target Linux VM.

Validation

1. Log in to the serial console of the target Linux VM via the console server using telnet, SSH, or the serial viewer.

2. Log in to the target Linux VM directly via SSH.

3. In the SSH session, run the following command to trigger a kernel message.
echo h > /proc/sysrq-trigger

4. You will see the message on the serial console screen.

Bought a Cisco Linksys WRT54G2 v1 Router

I bought my wifi router a year ago, and for some reason it’s no longer stable; it sometimes drops packets when I ping it. When I chatted with my friend Steven, he recommended the Cisco Linksys WRT54G2, which he uses himself, as a very stable router. You can install the DD-WRT firmware and take advantage of more free features, such as NFS, firewall, VPN… etc. I’m a newbie, so I’ll explore more in my spare time. It’s a used router that cost 120 RMB (around $20).

I successfully installed it and got the internet working after 30 minutes, but I could not log in to my VPS via SSH on port 21. Then I asked Steven to let me try SSH to his VPS; his port is 443, and I couldn’t connect to his either!! I suspected that the firewall on the WRT54G2 might be blocking the two ports, since the router is much like a Linux system and you can even run iptables commands. Finally I figured out that my network provider had blocked ports 21 and 443. How funny that I spent 3 hours investigating the firewall, routing table, wifi settings, NAT… etc., but forgot about my sweet network provider!!

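Careless, careless! I didn’t expect ports 21 and 443 to both be blocked, which made me think it was a router configuration problem, and I spent 3 hours on it! In the end I changed the VPS port to 8080, and now the connection flies. This reminds me of when my younger brother’s iPhone lost its phone signal: the first thing he did was take it to the dealer for repair, and because of the New Year holiday it would take 30 days to come back. The simplest thing would have been to swap in another SIM card first to see whether the phone was really at fault. As we gain experience, we seem to look at problems in more complicated ways, but when the complicated approach gets nowhere we can still start from the simple things, especially in system troubleshooting. I wonder whether the same is true for IT project implementation?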