原文地址:http://scienceblogs.com/goodmath/2007/04/15/strange-loops-dennis-ritchie-a/
I’m currently reading “I am a Strange Loop” by Douglas Hofstadter. I’ll be posting a review of it after I finish it. A “strange loop” is Hofstadter’s term for a Gödel-esque self-referential cycle. A strange loop doesn’t have to involve Gödel style problems – any self-referential cycle is a strange loop.
Reading this book reminded me of my favorite strange-loop story. It’s actually
a story about software security, and the kinds of stunts you can play with
software if you’re clever and subtle. It’s the story of the Unix C compiler, and the virtually invisible back-door security hole inserted into it by Ken Thompson – a story he told in his Turing award lecture..
(Idiot that I am, I originally said it was Dennis Ritchie who did this… Leave it to me to link to the original lecture, and not notice that I got the author wrong!)
In Unix systems, there’s a program named “login“. login is the code that takes your username and password, verifies that the password you gave is the correct one for the username you gave, and if so, logs you in to the system.
For debugging purposes, Thompson put a back-door into “login”. The way he did it was by modifying the C compiler. He took the code pattern for password verification, and embedded it into the C compiler, so that when it saw that pattern, it would actually generate code
that accepted either the correct password for the username, or Thompson’s special debugging password. In pseudo-Python:
def compile(code):
if (looksLikeLoginCode(code)):
generateLoginWithBackDoor()
else:
compileNormally(code)
With that in the C compiler, any time that anyone compiles login,
the code generated by the compiler will include Ritchie’s back door.
Now comes the really clever part. Obviously, if anyone saw code like what’s in that
example, they’d throw a fit. That’s insanely insecure, and any manager who saw that would immediately demand that it be removed. So, how can you keep the back door, but get rid of the danger of someone noticing it in the source code for the C compiler? You hack the C compiler itself:
def compile(code):
if (looksLikeLoginCode(code)):
generateLoginWithBackDoor(code)
elif (looksLikeCompilerCode(code)):
generateCompilerWithBackDoorDetection(code)
else:
compileNormally(code)
What happens here is that you modify the C compiler code so that when it compiles itelf, it inserts the back-door code. So now when the C compiler compiles login, it will insert the back door code; and when it compiles
the C compiler, it will insert the code that inserts the code into both login and the C compiler.
Now, you compile the C compiler with itself – getting a C compiler that includes the back-door generation code explicitly. Then you delete the back-door code from the C compiler source. But it’s in the binary. So when you use that binary to produce a new version of the compiler from the source, it will insert the back-door code into
the new version.
So you’ve now got a C compiler that inserts back-door code when it compiles itself – and that code appears nowhere in the source code of the compiler. It did exist in the code at one point – but then it got deleted. But because the C compiler is written in C, and always compiled with itself, that means thats each successive new version of the C compiler will pass along the back-door – and it will continue to appear in both login and in the C compiler, without any trace in the source code of either.
I’ve tried to illustrate this with the diagram to the right. The arrows with solid heads are inputs to a compiler; the lines with hollow heads are outputs from the compiler. The solid lines are the original compilation paths: the C compiler source code (without back-door) gets compiled by the C compiler, generating a new (backdoor-free) C-compiler executable; the login source code gets compiled by the C compiler, generating a new login executable without any back door; and the C compiler with backdoor code generates a new C compiler executable with the back door.
Now the tricky part. The dotted lines are what happens after you compile the C compiler with the back-door code included. In thofse paths, code is getting compiled by the C compiler with the back-door. When you compile the login source code with the new C compiler, you get a new login executable which contains back-door code that wasn’t in the source; and when you compiler the C compiler source code with the back-door C compiler, it will generate a C compiler executable with the back-door code included.
So you’ve got a loop now: the C compiler will recognize the logic source and insert a back door into it; and using self-reference, the C-compiler will recognize itself, and insert the back door into any versions of itself.