Sunday, September 12, 2010

Deadwood on OpenWRT; The case for 32-bit Skein

When I was designing Deadwood, I made sure to make the program work really nicely on 32-bit processors, while retaining compatibility on 64-bit processors. In the places where word size really matters, namely the hash compression function and the cryptographic random number generator, I went out of my way to use 32-bit algorithms (my own home-grown algorithm for the 32-bit hash compression function; RadioGatun[32] for the random number generator).

During Deadwood’s development cycle, I even gave a nod to systems that can work with both 16-bit and 32-bit numbers, by having it so “int” in the source code means “something that can fit in 16 bits”, and using int32_t for anything that would need more than 16 bits (near the end of the Deadwood development cycle, I went from “int32_t” to “int_fast32_t” for “integer that needs more than 16 bits”). This is untested—I don’t think there are any systems in use today capable of routing packets that have 16-bit wide registers—but allows it to be at least possible to port Deadwood to a 16-bit system.

The reason for all this attention to detail was because my target platform while developing Deadwood was to have it run nicely on the 32-bit router in the corner used to connect people to the internet. And, I have succeeded.

Sebastian Müller has already gotten Deadwood to recursively process queries on an embedded router and is working on porting Deadwood to a number of platforms. He was able to pull this off in a single afternoon because Deadwood is, from beginning to end, an architecture-independent recursive DNS server optimized for 32-bit platforms.

Of course, Deadwood also runs nicely on 64-bit platforms; one of my regressions I perform when building Deadwood is to make sure nothing breaks in 64-bit. The only issue is that some operations (the hash compression, the cryptographic random number generator) are less efficient because they were written to run on 32-bit systems. [1]

Now, while Deadwood does run nicely on various 32-bit embedded systems, I unfortunately can not readily support these configurations because I run Deadwood on x86_32 and can not solve problems for people running it on different systems. It works, but people want to do this are on their own and need to take responsibility for any problems they see.

This leads us to Skein, a proposed hash primitive for SHA-3. Skein is a very good hash with a lot of features, including having, in its core, Threefish, the only tweakable block cipher primitive that appears secure.

However, I can’t use Skein because there’s no native 32-bit version of it, even though such a thing is possible. When I pointed out this problem on Schneier’s blog, I was dismissed by another poster with “all 32-bit desktop processors (and even some ARM chips) have instructions that can do math on pairs of 64-bit numbers”.

That’s all well and good, but Deadwood isn’t just for the desktop. It’s also for the embedded 32-bit space and that means no MMX/SSE instructions that work on 64-bit numbers.

One of the reasons why I feel so frustrated is because I can’t just make a 32-bit Skein variant. Not without possibly making 10 different errors that could destroy its security.

I understand why Schneier, et. al. don’t feel comfortable coming out with some rotation constants to make a 32-bit Skein variant; if they did make such a Skein variant and some cryptographers found weaknesses in them, it would make them look bad. So, they are being very conservative about what they will declare to be the official “magic numbers” for Skein. Numbers I simply do not feel comfortable changing.

If I were to use one of the SHA-3 submissions for Deadwood’s PRNG, I would use Keccak. Like Skein, it can output a stream of infinite length from any input of any length. Unlike Skein, it is more 32-bit compatible; not only is there a 32-bit “reduced word length” variant officially blessed by the algorithm’s creators, but also 64-bit Keccak more easily scales down to 32-bits than Skein, since the only operations done are permutes, rotates, and exclusive ORs.

32-bits is still very much a reality today. While the transition to 64-bit desktops is well underway, it will be a few years before 64-bit desktops become more common than 32-bit desktops. The embedded space is very slowly letting go of 8-bit systems, replacing them with 32-bit systems. I wouldn’t be surprised if 32-bit is still around in 2038, when we will have to worry about systems with 32-bit time_t overflowing (Deadwood will run until 2143 on systems with a 32-bit time_t, uses 64-bit timestamps internally, and won’t overflow in 2143 if compiled on a system with a 64-bit time_t).

Yes, we are making the transition to 64-bit. But we can’t make the transition with solutions like Skein that pretend 32-bit no longer exists.

[1] In terms of making a full 64-bit port of Deadwood, I would have to:
  • Make most, but not all, of the int32_t declarations int_fast32_t declarations (64-bit compilers should make int_fast32_t 64 bits wide)
  • Use RadioGatun[64] instead of RadioGatun[32] (because of how I wrote the RadioGatun code, this is a fairly quick change)
  • Replace the hash compression core with one using 64-bit instead of 32-bit operations. The most difficult part of this is making a 63-bit random prime number; we can quickly brute-force test the primality of a 31-bit number, but not for a 63-bit number.
No, I don’t have plans to do this before MaraDNS 2.0 is released.