“DNS bad key” on Windows Failover Cluster


I have often seen two errors on newly created Microsoft Failover Clusters: “DNS bad key” and a cluster name object (CNO) update error.

Cluster network name resource ‘Cluster Name’ failed registration of one or more associated DNS name(s):

DNS bad key

or

Cluster network name resource failed registration of one or more associated DNS names(s) because the access to update the secure DNS Zone was denied.

The “DNS bad key” error appears more often than the other one. After a lot of research on the internet, I fixed the issue with the following steps:

  1. Right-click the Windows button – click Run.
  2. Run the following command to open Network Connections.
    ncpa.cpl
  3. Open the Properties of the network adapter that you use for Microsoft Failover Cluster.
  4. Go to Internet Protocol Version 4 (TCP/IPv4) – Advanced – DNS tab.
  5. Clear the “Register this connection’s addresses in DNS” check box.
Screenshot of the key option to fix the issue.

The cluster error events appear in the event log at regular intervals. If you want to verify the fix without waiting, you can initiate a failover of the core cluster resources.

How to move the core cluster resources?

Failover Cluster Manager – Right-click the cluster – More Actions – Move Core Cluster Resources.

Cipher Suites on Windows Server 2016/2019

“Static key ciphers” are kept on Windows Server 2016/2019 for backward compatibility with legacy applications, and they are enabled on the Windows operating system by default. Attackers can decrypt the traffic if weak cipher suites are in use, so securing that traffic is an important part of Windows security.

In short, certain communication security protocols and cipher suites should be disabled on Windows Server 2016/2019.

What’s a Cipher?

A cipher is an algorithm that translates between plaintext and ciphertext. There are two categories: symmetric key algorithms and asymmetric key algorithms. Symmetric key algorithms use one key for both encryption and decryption. Asymmetric key algorithms use different keys for encryption and decryption.

Popular algorithms include DES, AES, and RSA. Hash functions such as SHA are not ciphers themselves, but they appear alongside ciphers in cipher suites. Some of these algorithms are out of date, and some may not comply with certain information security standards.

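What’s a Cipher Suite?

A cipher suite is a set of algorithms that work together to secure a connection, typically covering key exchange, authentication, encryption, and message integrity. Both ends of a connection must agree on the same cipher suite: what the server encrypts with it, the client decrypts with it.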

Naming Convention of Ciphers
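A cipher suite name packs several algorithms into one string. The sketch below is only an illustration of that convention, not a Windows API: the function name parse_cipher_suite and the dictionary keys are made up for this example, and it only handles names shaped like the ones in the table in this article.

```python
def parse_cipher_suite(name):
    """Split a TLS cipher suite name into key exchange, cipher, and hash.

    Hypothetical helper for illustration only. It assumes the
    TLS_<key exchange>_WITH_<cipher>_<hash> layout, or the shorter
    TLS_<cipher>_<hash> layout used by TLS 1.3 style names.
    """
    if not name.startswith("TLS_"):
        raise ValueError(f"not a TLS cipher suite name: {name}")
    body = name[len("TLS_"):]

    if "_WITH_" in body:
        key_exchange, _, rest = body.partition("_WITH_")
    else:
        # TLS 1.3 style names carry no key exchange part.
        key_exchange, rest = None, body

    # The last underscore-separated token is the hash (SHA, SHA256, MD5...).
    cipher, _, hash_name = rest.rpartition("_")
    return {"key_exchange": key_exchange, "cipher": cipher, "hash": hash_name}


if __name__ == "__main__":
    for suite in ("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
                  "TLS_RSA_WITH_RC4_128_MD5",
                  "TLS_AES_256_GCM_SHA384"):
        print(suite, "->", parse_cipher_suite(suite))
```

Reading a name this way makes it easier to spot the weak parts, for example RC4 or 3DES as the cipher, or a plain RSA key exchange.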

Different Windows Server versions support different cipher suites. The table below shows the default cipher suite list for the TLS protocol on Windows Server 2016/2019. As you can see, Windows Server 2019 supports a few additional, newer cipher suites.

Cipher suites have a priority order on Windows, and Windows always picks the best one. “The best” means the suite meets two criteria:

  1. It is supported by the application, so at least one cipher suite in the order must be supported by the application.
  2. It is the highest-ranked cipher suite among the supported ones.

If Windows cannot find a suitable cipher suite, the communication fails and you will see error messages in the Windows Event Log (similar to the event log samples below).
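The two criteria can be sketched as a small function. This is a simplified model of the selection described above, not how Schannel is actually implemented; the function name pick_cipher_suite and the sample lists are assumptions for illustration.

```python
def pick_cipher_suite(windows_order, app_supported):
    """Return the highest-priority suite that the application also supports.

    windows_order: suite names in Windows priority order (best first).
    app_supported: suite names the application can use (any order).
    Returns None when there is no overlap, which corresponds to the
    failed communication described in the text.
    """
    supported = set(app_supported)
    for suite in windows_order:
        if suite in supported:
            return suite
    return None


if __name__ == "__main__":
    windows_order = [
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
    ]
    # A hypothetical legacy application that knows only two suites.
    legacy_app = ["TLS_RSA_WITH_AES_128_CBC_SHA",
                  "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"]
    print(pick_cipher_suite(windows_order, legacy_app))
    print(pick_cipher_suite(windows_order, ["TLS_RSA_WITH_RC4_128_MD5"]))
```

The first call picks TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 because it ranks higher in the Windows order, even though the application lists it second. The second call returns None because the two lists have nothing in common.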

Windows Server 2016 Windows Server 2019
N/A TLS_AES_128_GCM_SHA256
N/A TLS_AES_256_GCM_SHA384
TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA N/A
TLS_DHE_DSS_WITH_AES_128_CBC_SHA N/A
TLS_DHE_DSS_WITH_AES_128_CBC_SHA256 N/A
TLS_DHE_DSS_WITH_AES_256_CBC_SHA N/A
TLS_DHE_DSS_WITH_AES_256_CBC_SHA256 N/A
TLS_DHE_RSA_WITH_AES_128_CBC_SHA N/A
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256* TLS_DHE_RSA_WITH_AES_128_GCM_SHA256*
TLS_DHE_RSA_WITH_AES_256_CBC_SHA N/A
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384* TLS_DHE_RSA_WITH_AES_256_GCM_SHA384*
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256* TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256*
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384* TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384*
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256* TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256*
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384* TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384*
TLS_PSK_WITH_AES_128_CBC_SHA256 TLS_PSK_WITH_AES_128_CBC_SHA256
TLS_PSK_WITH_AES_128_GCM_SHA256 TLS_PSK_WITH_AES_128_GCM_SHA256
TLS_PSK_WITH_AES_256_CBC_SHA384 TLS_PSK_WITH_AES_256_CBC_SHA384
TLS_PSK_WITH_AES_256_GCM_SHA384 TLS_PSK_WITH_AES_256_GCM_SHA384
TLS_PSK_WITH_NULL_SHA256 TLS_PSK_WITH_NULL_SHA256
TLS_PSK_WITH_NULL_SHA384 TLS_PSK_WITH_NULL_SHA384
TLS_RSA_WITH_3DES_EDE_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA
TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_128_CBC_SHA
TLS_RSA_WITH_AES_128_CBC_SHA256 TLS_RSA_WITH_AES_128_CBC_SHA256
TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_128_GCM_SHA256
TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA
TLS_RSA_WITH_AES_256_CBC_SHA256 TLS_RSA_WITH_AES_256_CBC_SHA256
TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_256_GCM_SHA384
TLS_RSA_WITH_NULL_SHA TLS_RSA_WITH_NULL_SHA
TLS_RSA_WITH_NULL_SHA256 TLS_RSA_WITH_NULL_SHA256
TLS_RSA_WITH_RC4_128_MD5 N/A
TLS_RSA_WITH_RC4_128_SHA N/A

Which Should Be Disabled?

First, look at the communication security protocols. SSL 2.0, SSL 3.0, TLS 1.0, TLS 1.1 and TLS 1.2 are popular protocols. Several of them are still enabled on Windows Server 2016/2019 by default. However, most of them are out of date because of known vulnerabilities. For example, SSL 3.0 was broken by the POODLE attack. So the suggested protocol is TLS 1.2.

The protocols are controlled by registry keys. The registry location is HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols.

There are two keys for each protocol: Client and Server. The Microsoft KB only describes how to disable PCT 1.0 for Server; you need to do the same for Client. The KB was written for earlier Windows versions, but it also applies to Windows Server 2016/2019.
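To make the Client/Server layout concrete, here is a minimal sketch that only builds the list of registry values to set and does not touch the registry. The function name disable_protocol_settings is made up for this example. The value names Enabled and DisabledByDefault are the DWORD values Schannel uses under each Client and Server key; please verify them against Microsoft's TLS registry settings documentation before applying anything.

```python
SCHANNEL_PROTOCOLS = (
    r"HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols"
)


def disable_protocol_settings(protocol):
    """Return (key path, value name, DWORD data) tuples that disable one
    protocol for both the Client and the Server role.

    Illustration only: nothing is written to the registry here.
    """
    settings = []
    for role in ("Client", "Server"):
        key = "\\".join((SCHANNEL_PROTOCOLS, protocol, role))
        settings.append((key, "Enabled", 0))
        settings.append((key, "DisabledByDefault", 1))
    return settings


if __name__ == "__main__":
    for protocol in ("SSL 3.0", "TLS 1.0", "TLS 1.1"):
        for key, name, data in disable_protocol_settings(protocol):
            print(f"{key}  {name} = {data}")
```

Printing the plan first makes it easy to review both roles for every protocol before you change anything with your tool of choice.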

Second, deal with the cipher suites. There are a lot of articles on the internet about cipher suites, but no straight answer on what should be disabled and how. I think the easiest way is to compare your current cipher configuration with the cipher suite blacklist in RFC 7540 (Appendix A).

I compared the Windows Server cipher suites with it. All cipher suites in the table above are on the blacklist except the ones in green text, which are the entries marked with * plus the two TLS_AES suites that exist only on Windows Server 2019. In other words, the green text cipher suites are safe for TLS 1.2.

If you follow the blacklist, only 6 cipher suites remain on Windows Server 2016 and 8 on Windows Server 2019. More importantly, this may cause a lot of problems, since the majority of 3rd-party applications may not support the remaining cipher suites.
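The comparison can be approximated in a few lines. The sketch below does not contain the RFC 7540 list itself. It applies the criteria the RFC gives for how the list was assembled: a suite is blacklisted if it lacks an ephemeral key exchange or uses a null, stream, or block cipher type. The function name, the constants, and the short sample list are assumptions for illustration; treat the result as a first pass and check it against Appendix A of the RFC.

```python
AEAD_MODES = ("GCM", "CCM", "CHACHA20_POLY1305")
EPHEMERAL_KEY_EXCHANGES = ("DHE", "ECDHE")


def survives_http2_blacklist(name):
    """Approximate check: True if the suite is likely NOT on the blacklist.

    Heuristic only. A TLS 1.2 suite passes when its key exchange is
    ephemeral (DHE or ECDHE) and its cipher is an AEAD mode.
    """
    body = name[len("TLS_"):] if name.startswith("TLS_") else name
    if "_WITH_" not in body:
        # TLS 1.3 style names are not TLS 1.2 suites, so the
        # TLS 1.2 blacklist does not cover them.
        return True
    key_exchange, _, cipher_and_hash = body.partition("_WITH_")
    ephemeral = key_exchange.split("_")[0] in EPHEMERAL_KEY_EXCHANGES
    aead = any(mode in cipher_and_hash for mode in AEAD_MODES)
    return ephemeral and aead


if __name__ == "__main__":
    # A few names taken from the table above, not the full default list.
    sample = [
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
        "TLS_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_PSK_WITH_AES_256_GCM_SHA384",
        "TLS_RSA_WITH_RC4_128_MD5",
    ]
    for suite in sample:
        verdict = "keep" if survives_http2_blacklist(suite) else "blacklisted"
        print(f"{verdict:12} {suite}")
```

You could feed it the output of (Get-TlsCipherSuite).Name saved to a text file to get a quick overview of your own server.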

So, to balance security and compatibility, I think it is reasonable to disable only the out-of-date cipher suites. After some research, I think the cipher suites in red text in the table can be disabled on Windows Server 2016/2019.

You can get the current cipher suite configuration list with PowerShell:

(Get-TlsCipherSuite).Name

What Is the Impact of Disabling Cipher Suites?

A cipher suite must be supported by both the application and Windows, so disabling protocols and cipher suites on Windows Server 2016/2019 has two kinds of impact. The first is internal: a native application may throw errors if it doesn’t support TLS 1.2. For example, you may see the following error in the Windows Event Log after disabling SSL 2.0, SSL 3.0, TLS 1.0, and TLS 1.1 on a newly provisioned Windows Server 2016/2019. The reason is that TLS 1.2 support for the .NET Framework is not enabled.

Log Name: System
Source: Schannel
Date: 10/11/2020 1:1:1 PM
Event ID: 36871
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: test.zhengwu.org
Description:
A fatal error occurred while creating a TLS client credential. The internal error state is 10013.

The second is external: communication with other services may break, for example when 3rd-party software supports only the disabled cipher suites. You may see the following entry in the Windows Event Log:

Log Name: System
Source: Schannel
Date: 10/11/2020 11:11:01 PM
Event ID: 36874
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: test.zhengwu.org
Description:
An TLS 1.2 connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The TLS connection request has failed.

How to Disable Cipher Suites?

There are several ways to control cipher suites. GPO is the recommended way. You can also edit the registry keys, but that is inflexible: for example, it takes time to change the registry just to disable a single cipher suite.

Microsoft introduced the PowerShell TLS module in Windows Server 2016. It can enable or disable a single cipher suite. I think it’s better than the other ways, because you can easily re-enable a cipher suite if an application stops working.

The following command disables a cipher suite.

Disable-TlsCipherSuite -Name <xxx>


“Password is incorrect” when accessing admin$ or c$ on Windows

There are default shares for administration purposes on Windows. You can access them at \\<computer name>\admin$ or \\<computer name>\c$.

You may see the “password is incorrect” error when accessing these network shares, even though you entered the correct password for the machine.

The problem is that the Windows local group policy uses Guest only mode for sharing, but the guest account is disabled on the target machine.

Run gpedit.msc to open the Local Group Policy Editor, and change the option “Network access: Sharing and security model for local accounts” to Classic.

The network access option in Windows Local Group Policy

Quick Notes: Windows loses network every 20 minutes

You may see a Windows machine lose network connectivity every 20 minutes, or lose it while you are connected to it via Remote Desktop Protocol (RDP). I wrote an article about virtual machines losing network connectivity on Emulex-powered ESXi hosts. You may want to check it out if you are running legacy ESXi and HPE hardware.

You may see the following error if you check the Application event log:

Source: Dot3Svc
Event ID: 15506
Description: Network authentication attempts have been temporarily suspended on this network adapter.

Or the following error:

Source: Dot3Svc
Event ID: 15514
Description: Wired 802.1X Authentication failed.

There is a Reason Code in the event logs above. The code could be 327685, 327682, or 327626.

The reason is that the Windows machine cannot authenticate on an authentication-enabled network. The certificate may have expired on the machine or on the server side, or something may be wrong between them.

You can work around this issue by disabling the “Enable IEEE 802.1x authentication for this network” option on the Authentication tab of the network adapter Properties.

disable "Enable IEEE 802.1x authentication for this network" option

Please refer to the official Microsoft document “advanced troubleshooting 802 authentication” if you want to go deeper.

How to Manage Windows Servers With Ansible on CentOS 8

Ansible is a popular automation tool for infrastructure configuration. It runs on Linux. CentOS is an ideal distribution for running Ansible in a lab. It is similar to Red Hat Enterprise Linux but free. The latest major release is CentOS 8, which ships Python 3 by default, so the Ansible configuration differs from CentOS 7. I will focus on the configuration in a lab environment. The goal is to create a simple environment to manage Windows servers with Ansible.

Ansible Installation on CentOS 8

I used a CentOS 8 minimal installation, which has no extra software installed. The procedure below may be a bit different in your environment if you installed other roles on the OS.

Ansible is a standalone application that does not rely on a database. In a quick lab environment it mainly needs two files: the playbook and the host file. You can install multiple Ansible servers, and they can run independently to control the same group of Windows servers.

I suggest you take a snapshot before moving forward if your Ansible server runs on a virtual machine.

  1. Enable Extra Packages for Enterprise Linux (EPEL) for yum.
    yum install epel-release
  2. Install Ansible.
    yum install ansible
  3. (Optional) Install pip for Python 3. This step is for Red Hat 8.
    yum install python3-pip
  4. Install pywinrm. pywinrm is used to communicate with Windows servers via WinRM.
    pip3 install pywinrm
  5. Install the dependencies that pywinrm needs to use Kerberos for authenticating to Active Directory.
    yum install gcc python3-devel krb5-devel krb5-libs krb5-workstation
    pip3 install pywinrm[kerberos]

The Ansible installation is complete. The procedure is elementary, but I spent some time figuring it out, especially the Kerberos and pywinrm parts. 🙂

See the pywinrm GitHub repository if you want to dig into it.

Ansible Configuration on CentOS 8

Configure Ansible

As I mentioned in the previous section, there are two main files: the playbook and the host file. A playbook is a file that consists of multiple tasks to run on the target Windows servers. It’s not covered by this article. The host file stores variables and the FQDNs or IP addresses of the target servers. Ansible reads the target servers’ information from the host file when you run a playbook.

The host file location is /etc/ansible/hosts. For lab purposes, the file has two sections.

  1. Server group. You can have multiple groups. The group name goes in [ ]. You can list FQDNs or IP addresses of the target Windows servers. I recommend using FQDNs if your targets are domain member servers. My example uses the server win2019test1.zhengwu.org.
[windows]
win2019test1.zhengwu.org
  2. Variables of the target server group. Since this is a lab, I’ll just list the required variables in the /etc/ansible/hosts file. For production, use standalone variable files and avoid storing the password in plain text. The following is a sample variable set for the windows group.
    • Variables are linked to a group by the group name in the first line: [group name:vars].
    • The domain name should be uppercase in ansible_user. The reason is that krb5 requires the uppercase domain name in its configuration file, and the name here should match it. The domain name is not required if you use a local account.
    • ansible_winrm_server_cert_validation is optional. It is only useful when ansible_winrm_scheme is ‘https’.
    • ansible_port is ‘5985’ when ansible_winrm_scheme is ‘http’, or ‘5986’ when ansible_winrm_scheme is ‘https’.
    • ansible_winrm_transport is ‘kerberos’ in this example since the target Windows servers are domain members. It can be ‘ntlm’ if you want to authenticate with a local account. There are 5 authentication methods on Windows. Kerberos and NTLM are enabled by default. Please refer to Windows Remote Management for details.
[windows:vars]
ansible_user='administrator@ZHENGWU.ORG'
ansible_password='123321'
ansible_connection='winrm'
ansible_winrm_scheme='http'
ansible_port='5985'
ansible_winrm_transport='kerberos'
ansible_winrm_server_cert_validation='ignore'
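The port and scheme rule above is easy to get wrong, so here is a small sketch that checks it before you run a playbook. It is not part of Ansible: the function name check_winrm_vars is made up, and it assumes a simple INI-style host file like the sample above, which Python's configparser can read. Ansible's own inventory parser accepts more syntax than this.

```python
import configparser

# Default WinRM ports per scheme, as described in the text.
WINRM_DEFAULT_PORTS = {"http": "5985", "https": "5986"}

SAMPLE_HOSTS = """\
[windows]
win2019test1.zhengwu.org

[windows:vars]
ansible_connection='winrm'
ansible_winrm_scheme='http'
ansible_port='5985'
ansible_winrm_transport='kerberos'
"""


def check_winrm_vars(inventory_text, group):
    """Return a list of problems found in the [group:vars] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # host lines have no "= value" part
        delimiters=("=",),
        interpolation=None,
    )
    parser.read_string(inventory_text)
    # Values are quoted in the host file, so strip the quotes first.
    group_vars = {
        key: (value or "").strip("'\"")
        for key, value in parser[f"{group}:vars"].items()
    }
    problems = []
    scheme = group_vars.get("ansible_winrm_scheme")
    port = group_vars.get("ansible_port")
    expected = WINRM_DEFAULT_PORTS.get(scheme)
    if expected is not None and port != expected:
        problems.append(
            f"ansible_port is {port}, but {scheme} normally uses {expected}"
        )
    return problems


if __name__ == "__main__":
    print(check_winrm_vars(SAMPLE_HOSTS, "windows"))
```

An empty list means the scheme and port agree. A non-default port is not necessarily wrong, so treat a reported problem as a hint to double-check the WinRM listener on the target server.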

Configure Kerberos

Apart from the Ansible configuration, we should configure Kerberos for domain authentication if the target Windows servers are joined to a domain. My lab servers are joined to the domain ‘zhengwu.org’. We installed the Kerberos components in the Ansible Installation on CentOS 8 section, so we just need to configure them. Edit the Kerberos configuration file: /etc/krb5.conf.

  1. Change the default realm to your domain name. Make sure to remove the # to uncomment the line. The domain name should be uppercase.
default_realm = ZHENGWU.ORG
  2. Uncomment all lines in the realms section. Please note that the domain name should be uppercase. The parameters kdc and admin_server are the same in the lab environment. The following is an example:

[realms]
ZHENGWU.ORG = {
     kdc = DC.ZHENGWU.ORG
     admin_server = DC.ZHENGWU.ORG
 }

Please refer to MIT Kerberos Documentation for the explanation.
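Since a lowercase realm is an easy mistake to make, the uppercase rule can be checked with a few lines. This is a hypothetical helper, not part of Ansible or Kerberos: the name realm_problems is made up, and it assumes that ansible_user uses the user@REALM form and that the lab has a single realm equal to default_realm.

```python
def realm_problems(ansible_user, default_realm):
    """Compare the realm in ansible_user with default_realm from krb5.conf.

    Returns a list of problems. A user name without "@" is treated as a
    local account, which needs no realm.
    """
    problems = []
    _user, sep, realm = ansible_user.partition("@")
    if not sep:
        return problems  # local account: nothing to compare
    if realm != realm.upper():
        problems.append(f"realm '{realm}' in ansible_user is not uppercase")
    if default_realm != default_realm.upper():
        problems.append(f"default_realm '{default_realm}' is not uppercase")
    if realm.upper() != default_realm.upper():
        problems.append(
            f"realm '{realm}' does not match default_realm '{default_realm}'"
        )
    return problems


if __name__ == "__main__":
    print(realm_problems("administrator@ZHENGWU.ORG", "ZHENGWU.ORG"))
    print(realm_problems("administrator@zhengwu.org", "ZHENGWU.ORG"))
```

The first call returns an empty list. The second reports the lowercase realm, which is the mistake the uppercase rule is meant to prevent.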

Now Kerberos is configured. We have configured domain credentials in the Ansible host file, specified Kerberos as the authentication method, and configured Kerberos for Active Directory. We just need to run the Windows ping module in Ansible to test the connection to the target Windows servers. If the test fails, complete the section Manage Windows servers with Ansible first.

ansible windows -m win_ping

You should see the following output if authentication is successful.

win2019test1.zhengwu.org | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Kerberos troubleshooting

You may see authentication problems when validating the connection to the target Windows servers with the Ansible win_ping module. Here are simple steps to troubleshoot Kerberos authentication.

  1. Try to authenticate against the domain with a domain account on the Ansible server. It can be any domain account.
kinit administrator@ZHENGWU.ORG
  2. List the cached tickets with klist. You should see something similar to the output below.
Ticket cache: KCM:0
Default principal: administrator@ZHENGWU.ORG
Valid starting       Expires              Service principal
06/26/2020 03:56:12  06/26/2020 13:56:12  krbtgt/ZHENGWU.ORG@ZHENGWU.ORG
        renew until 07/03/2020 03:56:09

Manage Windows servers with Ansible

The target Windows servers should be configured to accept WinRM connections. Ansible provides a PowerShell script to configure target Windows servers automatically. According to the Ansible documentation, the script should not be used in a production environment.

The configuration for production is very easy. Open a command prompt as administrator and run the following command:

winrm quickconfig

Conclusion

Managing Windows servers with Ansible is not hard as long as authentication is configured correctly. Ansible is not the only tool for automation. I’m a big fan of PowerShell, and I have posted some automation articles you may want to check. PowerShell and Ansible are both automation tools.

I think managing Windows servers with Ansible is like outsourcing PowerShell scripting work to the community. You give inputs to the tasks, then Ansible modules execute pre-defined PowerShell scripts and return the output. Ansible reduces the development time of Windows automation, but it still has some disadvantages. For example, you have to run multiple tasks to enable Remote Desktop on target Windows servers, which is just a single task in PowerShell DSC. So I think infrastructure automation is a combination of tools, like a Swiss Army multi-tool: each one has its advantage, and we have to use them together to achieve the final goal of automation.

Quick Note: Microsoft Remote Desktop Connection Manager Window Overflows High DPI Screen

4K screens have become popular in recent years. They can be a challenge for legacy applications such as “Microsoft Remote Desktop Connection Manager”. It has not been developed since 2014, but it’s still a useful tool for server administrators.

You may see its window overflow the screen on a 4K display. The fix is:

  1. Go to the properties of “RDCMan.exe”.
  2. Open the “Compatibility” tab.
  3. Click “Change high DPI settings”.
  4. Uncheck “Override high DPI scaling behavior”.

IE 11 Window Doesn’t Change Between 4K Internal and Regular External Monitors

Just a quick note. If you use multiple monitors, some 4K and some regular resolution, you may see window display issues when you move Internet Explorer between them. Follow the KB below to change the registry so that Internet Explorer 11 adapts to the monitor resolutions.

Internet Explorer 11 window display changes between a built-in device monitor and an external monitor