Friday, June 5, 2009

The next step: DNS decompression

The next step for Deadwood is to implement DNS decompression.

In order to implement DNS decompression, it is necessary to perform a "deep packet inspection" of DNS packets. So far, I have been able to keep Deadwood's code as simple as possible by treating DNS packets as a mostly "black box": we only parse parts of the DNS header, the DNS question, and the TTL of the answer records.

This keeps the code simple, and it's one of the reasons we have, in under 32k, a complete DNS-over-UDP non-recursive cache. Usually, the only TTL we care about is the TTL of the first answer we get. To resolve what I called the "Google Problem" (DNS packets where the first answer is a CNAME record with a high TTL, while the actual A records the CNAME points to have a low TTL), I have code that goes through the answer section of a DNS packet, looking at the TTL of each CNAME answer until it reaches a non-CNAME answer, and uses the lowest TTL it finds.

OK, let me try to say that in English. DNS is an unnecessarily complex protocol. There's a reason Dr. Bernstein muttered very darkly about DNS's format. A DNS packet has multiple records, and each and every DNS record has a "TTL": Time To Live. This tells the DNS resolver how long the record should be stored locally.

Deadwood, however, doesn't store each record individually. Deadwood stores the entire DNS packet as a single "black box" and only does the most cursory inspection of the packet to figure out how long the TTL should be. In more detail, Deadwood usually just looks at the TTL of the first record to determine how long to cache the packet. Some names, however, have what is called a CNAME record ("this host really has this other name"), so we sometimes get a packet that looks like this (this is a real resolution of www.google.com I just performed):
  • www.google.com actually is an alias for the name www.l.google.com; remember this for 604800 seconds (one week)
  • www.l.google.com has the IP 74.125.159.106; remember this for 300 seconds (five minutes)
  • www.l.google.com also has the IP 74.125.159.147; remember this for five minutes
  • www.l.google.com has the IP 74.125.159.103; again remember this for five minutes
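Now, the problem is that, because it only looked at the first TTL, Deadwood used to remember the whole packet for a week: not only the fact that Google's portal really has the name www.l.google.com, but also all of the IPs for www.l.google.com, which should only be remembered for five minutes.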

A few months ago, I finally fixed this: when the first answer is a CNAME ("no, this host really has a different name") answer, Deadwood now looks past the CNAME record to see if the name the CNAME points to has a lower TTL (how long we want to remember the record).

This is the only time we do any real inspection of the DNS packet. Once we implement DNS decompression, this will change.

Right now, I'm trying to decide what is a reasonable way to store the DNS packet internally after inspecting and decompressing the packet. There are various possibilities, which I will discuss in my next blog entry.