Google Engineer: Complexity Is the Enemy of Software
2011-04-24 21:50
Author: 陈秋歌
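Google software engineer Evan Martin recently published a post on his personal site titled "Complexity is the enemy". In it he argues that complexity is the death of software, and that whether new code adds complexity, and whether it should be accepted at all, should be judged against the project's specific design goals. He closes by recommending that you write Python code as if it were C. What follows is a translation of the full post: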
This is my seventh year at Google. I have learned a great deal there, far more than I could ever write down, but I thought I could at least share some of it with you.
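Complexity is the death of software. Its cost is hard to quantify, and it tends to creep into development slowly, getting gradually worse in a way that is hard to see until it is too late. On the other hand, the benefit of adding complexity is often easy to see: a new layer of indirection makes new feature X possible, and splitting a process that ran on one machine into two gets you past your current scaling hurdle. But now you must keep another layer of indirection in your head, or implement an RPC layer and manage two machines.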
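The above is hopefully as obvious to a new programmer as it is to a veteran. What I think I have learned over my years in the industry is a better sense of how the balance works out: when added complexity is warranted and when it should be rejected. I often think back to a friend's comment on the Go compiler written by Ken Thompson: it is fast because it simply does not do much, and its code is very straightforward.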
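Just as it is easier to write a long blog post than to make the same point succinctly, it is difficult to write software that is small and straightforward. This is easiest to see in programming language design: new languages designed by novices tend to pile on features, and few have the crisp clarity of C. In today's programs, complexity is often related to how many objects are involved; in distributed systems, it is about how many moving parts there are.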
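Another word for this problem is cleverness. To quote another of the C hackers: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."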
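What helps? Does it simply come down to experience? I have found that evaluating new code against specific design goals can help. It is easier to reject new code if you can say "this does not help solve the initial goals of the project". Within Google, the template document for a new project's design has a "non-goals" list right at the top: reasonable extensions of the project that you intend to reject.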
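Ironically, I have found that using weaker tools can help reduce complexity. It is hard to write a complicated C program because the language cannot do very much. C programs tend to use lots of arrays because that is all you get, but arrays turn out to be great: compact memory representation, O(1) access, and good data locality. I would never advocate intentionally using a weak tool, though. Instead, my lesson has been: write Python code as if it were C.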
Complexity is the enemy
Please note this is not the official word of Google; I'm just a programmer.
2011/04/23 12:06
I'm almost through my seventh year working at Google(!). I have learned many things there, more than I could ever write down. I thought I would at least share with you something that's only come to me with more experience.
Complexity is the death of software. It's hard to quantify the cost of, and it tends to creep in slowly, so it's a slow boil of getting worse that's hard to see until it's too late. On the other side, frequently it's easy to see a benefit of increasing complexity: a new layer of indirection allows new feature X, or splitting a process that ran on one machine into two allows you to surmount your current scaling hurdle. But now you must keep another layer of indirection in your head, or implement an RPC layer and manage two machines.
The above is hopefully just as obvious to a new programmer as it is to a veteran. What I think I've learned through my few years in the industry is a better understanding of how the balance works out; when complexity is warranted and when it should be rejected. I frequently think back to a friend's comment on the Go compiler written by Ken Thompson: it's fast because it just doesn't do much, the code is very straightforward.
It turns out that, much like it's easier to write a long blog post than it is to make the same point succinctly, it's difficult to write software that is straightforward. This is easiest to see in programming language design; new languages by novices tend to have lots of features, while few have the crisp clarity of C. In today's programs it's frequently related to how many objects are involved; in distributed systems it's about how many moving parts there are.
Another word for this problem is cleverness: to quote another one of the C hackers, "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
What helps? I wonder if it maybe just comes down to experience -- getting bitten by one too many projects where someone thought metaprogramming was cool. But I've found having specific design goals to evaluate new code by can help. It's easier to reject new code if you can say "this does not help solve the initial goals of the project". Within Google the template document for describing the design of a new project has a section right at the top to list non-goals: reasonable extensions of the project that you intend to reject.
Ironically, I've found that using weaker tools can help with complexity. It's hard to write a complicated C program because it can't do very much. C programs tend to use lots of arrays because that's all you get, but it turns out that arrays are great -- compact memory representation, O(1) access, good data locality. I'd never advocate intentionally using a weak tool, though. Instead, my lesson has been: write Python code like it was C.
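One possible reading of that closing advice, as a minimal sketch and not anything taken from the post itself: keep the data in a single flat array and operate on it with plain functions and loops, with no classes or metaprogramming. The function names `byte_histogram` and `most_common_byte` are hypothetical, chosen only for illustration; the `array` module is part of the Python standard library.

```python
from array import array


def byte_histogram(data: bytes) -> array:
    """Count how often each byte value occurs, using one flat array."""
    # Hypothetical helper: 256 unsigned integer slots, one per byte value.
    counts = array("L", [0] * 256)
    for b in data:
        counts[b] += 1  # indexed update, O(1) per byte
    return counts


def most_common_byte(counts: array) -> int:
    """Return the byte value with the highest count (lowest value wins ties)."""
    best = 0
    for value in range(1, len(counts)):
        if counts[value] > counts[best]:
            best = value
    return best


hist = byte_histogram(b"hello world")
print(most_common_byte(hist), hist[ord("l")])
```

The whole state is one contiguous block of integers, so there is very little to keep in your head: indexing is the only operation, and the two functions could be translated to C almost line for line.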