Monday, April 13, 2015

Lessons Learned: Bug with Windows DNS Client

Last week I ran into an issue that initially appeared to be caused by DNS scavenging, but after hours of troubleshooting and research I determined that it was actually a known bug in the Windows DNS client.

It all began on Monday, when I spun up two new Windows Server 2012 servers, one physical and one VM, to serve as new domain controllers in our domain. On Wednesday, after replication had completed, I began testing to make sure the Directory Services and DNS Best Practices Analyzer (BPA) scans came back clean. We were not receiving any System Center Operations Manager alerts about AD or DNS issues. At that point I moved the Operations Master (FSMO) roles to the new servers and changed the primary and secondary DNS settings on several systems to point to the new DCs. By Thursday afternoon, with no issues or alarms, I began pointing the rest of our servers to the new DCs for primary and secondary DNS.

The following afternoon, 24 hours later, I began receiving alarms and calls from end users about various issues with Lync 2013, as well as an email from our developers reporting that they could not connect to the production database. I tried to connect and could not. When I pinged the servers, the names resolved to IPv6 addresses rather than the IPv4 addresses I expected, and when I forced an IPv4 ping I received “hostname not found”. I logged on to the new 2012 domain controller as well as one of the 2008 DCs and immediately saw that DNS entries were missing.

At this point I suspected that DNS scavenging had been enabled when I promoted the 2012 servers. A quick check showed that scavenging was enabled at the zone level but not at the server level, so no server should actually have been scavenging the zone. I still could not work out how or why the DNS entries were disappearing on their own, so I used System Center Data Protection Manager 2012 to restore one of the 2008 servers from the previous day, which gave me a list of all the entries from before they started disappearing.

Comparing the restored list with the live zone showed that none of the static entries had been touched, and some of the surviving entries were over three years old, which again confirmed my theory that scavenging was not the cause. The comparison also showed that only the 2008 and 2008 R2 servers had been affected. Our production Hyper-V cluster, Operations Manager, and Configuration Manager all run on 2012 and were still correctly published in DNS. I sent an update to our infrastructure consulting group, and Ryan Jackson replied with a Microsoft KB article he had come across describing an issue with the DNS client on:

Windows Vista
Windows Server 2008
Windows 7
Windows Server 2008 R2
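The post doesn't say exactly how the restored and live record lists were compared. A minimal sketch of that comparison in Python follows, assuming both zones have first been exported to plain text with one record per line in a hypothetical "name type data" format; the function names and the sample hostnames and addresses are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (name, type, data)


def parse_records(lines: Iterable[str]) -> Set[Record]:
    """Parse 'name type data' lines into a set of records.

    The whitespace-separated format is a hypothetical export format;
    blank lines and lines starting with ';' (comments) are skipped.
    """
    records: Set[Record] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed format
        name, rtype, data = parts
        # Normalize case so 'HV01 a' and 'hv01 A' compare as the same record.
        records.add((name.lower(), rtype.upper(), data))
    return records


def missing_records(before: Iterable[str], after: Iterable[str]) -> List[Record]:
    """Return records present in the restored copy but absent from the live zone."""
    return sorted(parse_records(before) - parse_records(after))


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two exports.
    restored = [
        "sql01 A 10.0.0.21",
        "lync01 A 10.0.0.30",
        "hv01 A 10.0.0.40",
    ]
    live = [
        "hv01 A 10.0.0.40",
    ]
    for name, rtype, data in missing_records(restored, live):
        print(f"missing: {name} {rtype} {data}")
```

Using a set difference keeps the check independent of record order. On the sample data it prints the lync01 entry followed by the sql01 entry, while hv01, which survived, is not reported.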

Microsoft has identified the problem in the DNS client on these versions and provides both a hotfix and a workaround:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;2520155

After I installed the hotfix and ran ipconfig /registerdns, the servers re-registered in DNS, and their records have not disappeared since.