Thursday, February 11, 2010

Some more thoughts on Deadwood’s “Type”

I’ve been thinking some more about Deadwood’s type byte, which I discussed in the last blog entry. The only thing the type byte notes is whether the NXDOMAIN bit is set. The reason is that I was making Deadwood a simple DNS cache that treats DNS data, as much as possible, like a “black box”.

For a caching-only DNS program, we don’t care what’s in the packet except to decide how long to cache it. We only used the type byte because it was the only place I could store the NXDOMAIN bit in the header.

There’s a lot of confusion about the NXDOMAIN bit, and, as DJB has pointed out, a lot of naive DNS implementations get it wrong. The NXDOMAIN bit in the DNS header indicates not only that there is no DNS entry for this name with this record type, but that there is no DNS entry for this name with any record type. You can also have a simple DNS “not there” reply, which is a DNS reply without the NXDOMAIN bit set but in the format of an NXDOMAIN: no answer in the AN section, and a SOA record in the NS section.

So, in reality, we have positive DNS answers, and we have two types of negative DNS answers: “Not there” DNS replies, and NXDOMAINs.
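
To make the three cases concrete, here is a minimal sketch in C of how a reply could be sorted into them using only the 12-byte DNS header. This is not Deadwood's code: the names `reply_kind` and `classify_reply` are hypothetical, and the `authority_has_soa` flag is assumed to come from a caller that has already parsed the NS section. On the wire, what this entry calls the NXDOMAIN bit is the value 3 in the four-bit RCODE field, the low nibble of the fourth header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the Deadwood source. */
enum reply_kind {
    REPLY_MALFORMED = -1, /* Too short to contain a DNS header */
    REPLY_POSITIVE  = 0,  /* At least one record in the AN section */
    REPLY_NXDOMAIN  = 1,  /* Name does not exist for any record type */
    REPLY_NOT_THERE = 2,  /* Name exists, but not with this record type */
    REPLY_OTHER     = 3   /* Referral, server error, and so on */
};

#define DNS_HEADER_LEN 12
#define RCODE_NXDOMAIN 3

/* authority_has_soa: nonzero if the caller found a SOA record in the
 * NS section (an assumption; parsing that section is not shown here). */
enum reply_kind classify_reply(const uint8_t *pkt, size_t len,
                               int authority_has_soa)
{
    assert(pkt != NULL);
    if (len < DNS_HEADER_LEN) {
        return REPLY_MALFORMED;
    }
    unsigned rcode   = pkt[3] & 0x0f;                     /* low 4 bits */
    unsigned ancount = ((unsigned)pkt[6] << 8) | pkt[7];  /* big-endian */

    if (rcode == RCODE_NXDOMAIN) {
        return REPLY_NXDOMAIN;
    }
    if (rcode != 0) {
        return REPLY_OTHER;
    }
    if (ancount > 0) {
        return REPLY_POSITIVE;
    }
    if (authority_has_soa) {
        return REPLY_NOT_THERE;
    }
    return REPLY_OTHER;
}
```

The SOA check matters because an empty AN section on its own does not distinguish a “not there” reply from an NS referral; only the contents of the NS section do.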

Deadwood currently treats “not there” and positive answer DNS replies the same: Both DNS replies are passed as-is on to the client. They are both type 0 (answer to pass on to the client without setting the NXDOMAIN bit) replies.

Now that we are doing a deeper inspection of DNS packets, I would like to change that. I would like to have positive replies distinguished from non-NXDOMAIN “not there” replies, and require a reply with the NXDOMAIN bit set to be, in fact, an NXDOMAIN reply.

So, my revised list of types:
  • Type 0: Positive answer
  • Type 1: NXDOMAIN negative reply
  • Type 2: Non-NXDOMAIN negative reply
  • Type 16: NS referral
  • Type 17: CNAME referral
  • Type 18: All known servers timed out last time we tried to get this data
The reason for the revised numbering scheme is that I would like types 0-15 to be set aside for replies we pass on to the client, with 16-31 used for data which means we have to do more work to find an answer. Type 18 will let us know to be more patient with the servers (maybe they’re just really flaky). It even provides a framework for having Deadwood periodically try to contact these servers, in case the servers go offline whenever the admin unplugs the server to plug in the fridge to cool down the beer (note: I’ll probably not implement this).
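
The numbering above could be written down in C roughly as follows. This is only a sketch of the idea: the enum and helper names are hypothetical and do not come from the Deadwood source, while the numeric values and the 0-15 / 16-31 split are the ones listed above.

```c
#include <stdint.h>

/* Hypothetical names; the numeric values follow the revised list. */
enum dw_type {
    DW_TYPE_POSITIVE  = 0,  /* Positive answer */
    DW_TYPE_NXDOMAIN  = 1,  /* NXDOMAIN negative reply */
    DW_TYPE_NOT_THERE = 2,  /* Non-NXDOMAIN negative reply */
    DW_TYPE_NS_REFER  = 16, /* NS referral */
    DW_TYPE_CNAME     = 17, /* CNAME referral */
    DW_TYPE_TIMEOUT   = 18  /* All known servers timed out last time */
};

/* Types 0-15: the cached reply can be passed on to the client. */
static inline int dw_type_is_client_reply(uint8_t type)
{
    return type <= 15;
}

/* Types 16-31: more work is needed before we have an answer. */
static inline int dw_type_needs_more_work(uint8_t type)
{
    return type >= 16 && type <= 31;
}

/* Only type 1 is sent to the client with the NXDOMAIN bit set. */
static inline int dw_type_sets_nxdomain(uint8_t type)
{
    return type == DW_TYPE_NXDOMAIN;
}
```

Keeping each group in its own block of 16 means the “can we answer the client now?” question is a single range check, and new types can be added to either group without renumbering the existing ones.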