We are having an odd DNS problem, and I've been receiving a lot of good help from many wonderful folks on Twitter. However, trying to solve this in 140 characters is proving difficult, so here is the whole story. If this fails, the next step is to contact the friendly folks at Microsoft for only $500.
We run Active Directory Integrated DNS on Windows Server 2003. We have a single DNS server that works fine, but we are migrating from our physical Server 2003 box to our virtual farm. Our virtual servers are running Server 2008 R2. The plan was, and hopefully still is, to add a Server 2008 R2 Domain Controller with DNS, allow it to replicate the DNS via Active Directory, and then point all the clients to the new DNS server. After allowing sufficient time for that to take place, we would remove our old 2003 DNS server and run only on the new 2008 R2 server.
I ran dcpromo on the 2008 R2 server, which added DNS, and the replication started. Everything went just as numerous articles said it would. The problem is that the new DNS server will not reliably resolve external domain names. It works fine internally but not when going upstream to our external DNS server, which is provided by our local fiber internet service.
When you run an nslookup on the new 2008 R2 DNS server, you get a time-out on the first try. You can then run it a second time, at which point it works, and it will continue to work until you try a different domain. So if you nslookup yahoo.com, the first response will be a time-out. The second and each subsequent time it will work fine until you switch and try google.com. Then that one will work, but if you go back and try yahoo.com again it will time out the first time.
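To make the pattern easier to reproduce and share, here is a minimal sketch in Python that sends a plain A-record query straight to one DNS server over UDP and reports whether each attempt answered or timed out. It is not what nslookup does internally, just a rough stand-in. The function names, the 2-second timeout, and the address 192.0.2.53 (a documentation-range placeholder) are all assumptions for illustration; substitute the address of the new 2008 R2 server.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def timed_lookup(server, name, timeout=2.0):
    """Send one query over UDP; return (answered, elapsed_seconds)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # assumed value, chosen arbitrarily
    start = time.monotonic()
    try:
        sock.sendto(build_query(name), (server, 53))
        sock.recvfrom(4096)
        return True, time.monotonic() - start
    except socket.timeout:
        return False, time.monotonic() - start
    finally:
        sock.close()


def run(server, names, attempts=2):
    """Query each name several times in order and print the outcome."""
    for name in names:
        for attempt in range(1, attempts + 1):
            ok, elapsed = timed_lookup(server, name)
            status = "answered" if ok else "timed out"
            print(f"{name} attempt {attempt}: {status} after {elapsed:.2f}s")


# Example call (placeholder address, replace with the new DNS server):
# run("192.0.2.53", ["yahoo.com", "google.com", "yahoo.com"])
```

Running the same list against the old 2003 server and the new 2008 R2 server should show whether the first-attempt time-out is specific to the new box.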
The AD integration part is working, as both the old 2003 DNS server and the new 2008 R2 DNS server are keeping each other up-to-date. However, for reasons we haven't uncovered yet, the 2008 R2 server is not reliably resolving external names.
We have confirmed that our firewall is EDNS and DNSSEC compatible and passes the larger UDP packets. We have also tried disabling EDNS on the server, but it has no effect. Whether EDNS is on or off, the time-out still occurs.
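For anyone who wants to check the EDNS path independently of the Windows DNS service, the following is a rough sketch, again in Python, that builds a query carrying an EDNS0 OPT record (advertising a 4096-byte UDP payload with the DNSSEC OK bit set) and reports the size of the reply and whether it came back truncated. The function names, the payload size, and the server address passed in are assumptions for illustration only.

```python
import socket
import struct


def build_edns_query(name, payload_size=4096, dnssec_ok=True, query_id=0x4321):
    """Build an A-record query that includes an EDNS0 OPT pseudo-record."""
    # Header: RD=1, one question, one additional record (the OPT record)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    # OPT record: root name, TYPE=41, CLASS=UDP payload size,
    # TTL=extended RCODE/version/flags (DO bit is 0x8000), RDLENGTH=0
    flags = 0x8000 if dnssec_ok else 0
    opt = struct.pack("!BHHIH", 0, 41, payload_size, flags, 0)
    return header + question + opt


def probe(server, name, timeout=2.0):
    """Send the EDNS query; return (reply_length, truncated) or None on time-out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_edns_query(name), (server, 53))
        data, _ = sock.recvfrom(65535)
    except socket.timeout:
        return None
    finally:
        sock.close()
    reply_flags = struct.unpack("!H", data[2:4])[0]
    return len(data), bool(reply_flags & 0x0200)  # 0x0200 is the TC bit


# Example call (placeholder address, replace with the upstream DNS server):
# print(probe("192.0.2.53", "yahoo.com"))
```

Pointing this at the ISP's upstream server from the 2008 R2 box would show whether EDNS-flagged queries get through at all, separately from whatever the DNS service itself is doing.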
Seems we are missing something obvious but what?