I always treat virtual machine snapshots as a major risk. They have caused several outages in our infrastructure. To understand how snapshots affect production, see the VMware article "Best practices for virtual machine snapshots in the VMware".
It's best to delete snapshots as early as possible. But what if you have a large infrastructure? Application teams ask for a snapshot before each change and for its deletion after validation, so you may get dozens of snapshot requests every month. Handling them by hand gets maddening. People in the community discuss snapshot automation, and there are several ways to get it done. Here are the scripts I wrote to automate creating and deleting snapshots.
You can run them as scheduled jobs in Windows Scheduled Tasks. The scripts have friendly logging, so you can easily figure out why a script stopped running. They also write the error code and details to the Windows Event Log, which you may need for a ticketing system (some monitoring systems, such as SCOM, can capture a specified Windows Event ID and create tickets in the ticketing system). You can also publish a troubleshooting guide on your website and put its URL in the variable $TroubleshootingGuide. The URL then appears in the details of the Windows Event Log entry.
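The event-log reporting described above might look roughly like the sketch below. It is not the original script: the source name `SnapshotAutomation`, the event ID 9001, the helper names, and the example URL are all placeholders assumed for illustration. Only `$TroubleshootingGuide` comes from this post. `New-EventLog` and `Write-EventLog` are Windows PowerShell 5.1 cmdlets, and registering a new source needs an elevated session.

```powershell
# Hypothetical URL; point this at your own troubleshooting page.
$TroubleshootingGuide = 'https://example.com/snapshot-troubleshooting'

# Build the event text: error code, detail, and the guide URL.
function Format-SnapshotEventMessage {
    param(
        [int]$ErrorCode,
        [string]$Detail,
        [string]$GuideUrl
    )
    "Snapshot script failed.`r`nError code: $ErrorCode`r`nDetail: $Detail`r`nTroubleshooting guide: $GuideUrl"
}

# Write the failure to the Application log so a monitor (for example SCOM)
# can match on the event ID and open a ticket.
function Write-SnapshotEvent {
    param(
        [int]$ErrorCode,
        [string]$Detail,
        [string]$Source = 'SnapshotAutomation',  # assumed source name
        [int]$EventId = 9001                     # assumed event ID
    )
    # Register the source once; this step requires administrator rights.
    if (-not [System.Diagnostics.EventLog]::SourceExists($Source)) {
        New-EventLog -LogName Application -Source $Source
    }
    $message = Format-SnapshotEventMessage -ErrorCode $ErrorCode -Detail $Detail -GuideUrl $TroubleshootingGuide
    Write-EventLog -LogName Application -Source $Source -EventId $EventId -EntryType Error -Message $message
}
```

A call such as `Write-SnapshotEvent -ErrorCode 2 -Detail 'Cannot connect to vCenter Server'` would then produce one Error entry whose text ends with the guide URL.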
A note on the snapshot creation script: I know a vCenter Server scheduled task can create snapshots too, but my script always keeps two snapshots and deletes the oldest one when a third snapshot is created. I wrote the script a long time ago and it is not very robust. It was written for one specific requirement.
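The keep-two rule can be sketched as follows. This is an illustration, not the original script: the function name `Get-SnapshotsToRemove`, the `$Keep` parameter, and the VM name `app01` are hypothetical. The commented PowerCLI calls assume VMware PowerCLI is installed and a `Connect-VIServer` session already exists.

```powershell
# Return the snapshots that exceed the retention count, oldest first.
# Works on any objects with a Created property, such as PowerCLI snapshot objects.
function Get-SnapshotsToRemove {
    param(
        [object[]]$Snapshots,
        [int]$Keep = 2
    )
    $sorted = @($Snapshots | Sort-Object -Property Created)
    $excess = $sorted.Count - $Keep
    if ($excess -le 0) { return @() }   # nothing to delete yet
    return $sorted[0..($excess - 1)]
}

# Sketch of the PowerCLI flow (needs a vCenter connection, so left commented out):
# $vm = Get-VM -Name 'app01'
# New-Snapshot -VM $vm -Name ("Auto-{0:yyyyMMdd-HHmm}" -f (Get-Date)) | Out-Null
# Get-SnapshotsToRemove -Snapshots (Get-Snapshot -VM $vm) |
#     ForEach-Object { Remove-Snapshot -Snapshot $_ -Confirm:$false }
```

Keeping the selection logic in a pure function makes it possible to check the rule against plain objects before pointing the script at real virtual machines.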