Cautionary Notes on Programming (1) Herb Sutter: The Free Lunch Era Ends, and the Biggest Programming Change in 20 Years Arrives

(2004.12.31) Source: CSDN
 

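In an article to appear in the March 2005 issue of DDJ, C++ expert Herb Sutter writes: "The free lunch will soon be over. The software development industry is about to face its biggest change since the OO revolution, and its name is 'concurrency'..."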

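Herb Sutter points out that today's programmers pay little attention to performance metrics such as efficiency, scalability, and throughput, and rely on ever-faster CPUs to solve many performance problems. But CPU clock speeds will soon leave their exponential, Moore's-Law-like trajectory and hit a limit. From then on, more and more applications will have to confront performance problems directly and rely on concurrent programming to solve them. Yet, at least as things stand today, concurrent programming is harder than most mainstream programmers can handle. A deep understanding of concurrent programming has therefore become an important direction of growth for many programmers.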

Herb Sutter's full article is available at:
http://www.gotw.ca/publications/concurrency-ddj.htm

The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency.

This article will appear in Dr. Dobb's Journal, 30(3), March 2005. A much briefer version under the title "The Concurrency Revolution" will appear in C/C++ Users Journal, 23(2), February 2005.

 

Your free lunch will soon be over. What can you do about it? What are you doing about it?

The major processor manufacturers and architectures, from Intel and AMD to Sparc and PowerPC, have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are turning en masse to hyperthreading and multicore architectures. Both of these features are already available on chips today; in particular, multicore is available on current PowerPC and Sparc IV processors, and is coming in 2005 from Intel and AMD. Indeed, the big theme of the 2004 In-Stat/MDR Fall Processor Forum was multicore devices, as many companies showed new or updated multicore processors. Looking back, it's not much of a stretch to call 2004 the year of multicore.

And that puts us at a fundamental turning point in software development, at least for the next few years and for applications targeting general-purpose desktop computers and low-end servers (which happens to account for the vast bulk of the dollar value of software sold today). In this article, I’ll describe the changing face of hardware, why it suddenly does matter to software, and how specifically it matters to you and is going to change the way you will likely be writing software in the future.

Arguably, the free lunch has already been over for a year or two, only we’re just now noticing.

The Free Performance Lunch

There’s an interesting phenomenon that’s known as “Andy giveth, and Bill taketh away.” No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently). Most classes of applications have enjoyed free and regular performance gains for several decades, even without releasing new versions or doing anything special, because the CPU manufacturers (primarily) and memory and disk manufacturers (secondarily) have reliably enabled ever-newer and ever-faster mainstream systems. Clock speed isn’t the only measure of performance, or even necessarily a good one, but it’s an instructive one: We’re used to seeing 500MHz CPUs give way to 1GHz CPUs give way to 2GHz CPUs, and so on. Today we’re in the 3GHz range on mainstream computers.

The key question is: When will it end? After all, Moore’s Law predicts exponential growth, and clearly exponential growth can’t continue forever before we reach hard physical limits; light isn’t getting any faster. The growth must eventually slow down and even end. (Caveat: Yes, Moore’s Law applies principally to transistor densities, but the same kind of exponential growth has occurred in related areas such as clock speeds. There’s even faster growth in other spaces, most notably the data storage explosion, but that important trend belongs in a different article.)

If you’re a software developer, chances are that you have already been riding the “free lunch” wave of desktop computer performance. Is your application’s performance borderline for some local operations? “Not to worry,” the conventional (if suspect) wisdom goes; “tomorrow’s processors will have even more throughput, and anyway today’s applications are increasingly throttled by factors other than CPU throughput and memory speed (e.g., they’re often I/O-bound, network-bound, database-bound).” Right?

Right enough, in the past. But dead wrong for the foreseeable future.

The good news is that processors are going to continue to become more powerful. The bad news is that, at least in the short term, the growth will come mostly in directions that do not take most current applications along for their customary free ride.

Over the past 30 years, CPU designers have achieved performance gains in three main areas, the first two of which focus on straight-line execution flow:

clock speed

execution optimization

cache

Increasing clock speed is about getting more cycles. Running the CPU faster more or less directly means doing the same work faster.

Optimizing execution flow is about doing more work per cycle. Today’s CPUs sport some more powerful instructions, and they perform optimizations that range from the pedestrian to the exotic, including pipelining, branch prediction, executing multiple instructions in the same clock cycle(s), and even reordering the instruction stream for out-of-order execution. These techniques are all designed to make the instructions flow better and/or execute faster, and to squeeze the most work out of each clock cycle by reducing latency and maximizing the work accomplished per clock cycle.
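
Executing multiple instructions in the same cycle only helps when those instructions don't depend on each other. The following is a minimal C++ sketch, assuming a superscalar, out-of-order CPU: the second function splits one serial chain of additions into two independent chains that the hardware can overlap. The function names are hypothetical, and whether the second version is measurably faster depends on the compiler and the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each addition needs the result of the previous one,
// so the adds form a single serial dependency chain.
std::uint64_t sum_one_chain(const std::vector<std::uint64_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Two accumulators: even and odd elements feed independent chains, so the
// CPU has two additions per iteration that it may issue in the same cycle.
std::uint64_t sum_two_chains(const std::vector<std::uint64_t>& v) {
    std::uint64_t even = 0, odd = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < v.size()) {
        even += v[i];  // leftover element when the size is odd
    }
    return even + odd;
}
```

Both functions return the same result for unsigned integers. For floating-point data, regrouping the additions can change the rounding, which is one reason compilers don't always perform this transformation on their own.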

Brief aside on instruction reordering and memory models: Note that some of what I just called “optimizations” are actually far more than optimizations, in that they can change the meaning of programs and cause visible effects that can break reasonable programmer expectations. This is significant. CPU designers are generally sane and well-adjusted folks who normally wouldn’t hurt a fly, and wouldn’t think of hurting your code… normally. But in recent years they have been willing to pursue aggressive optimizations just to wring yet more speed out of each cycle, even knowing full well that these aggressive rearrangements could endanger the semantics of your code. Is this Mr. Hyde making an appearance? Not at all. That willingness is simply a clear indicator of the extreme pressure the chip designers face to deliver ever-faster CPUs; they’re under so much pressure that they’ll risk changing the meaning of your program, and possibly break it, in order to make it run faster. Two noteworthy examples in this respect are write reordering and read reordering: Allowing a processor to reorder write operations has consequences that are so surprising, and break so many programmer expectations, that the feature generally has to be turned off because it’s too difficult for programmers to reason correctly about the meaning of their programs in the presence of arbitrary write reordering. Reordering read operations can also yield surprising visible effects, but that is more commonly left enabled anyway because it isn’t quite as hard on programmers, and the demands for performance cause designers of operating systems and operating environments to compromise and choose models that place a greater burden on programmers because that is viewed as a lesser evil than giving up the optimization opportunities.
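
To make the reordering hazard concrete, here is a small C++ sketch of a producer publishing data through a flag. It uses `std::atomic` from C++11, which postdates this article, and the names `payload`, `ready`, `producer`, and `consumer` are hypothetical. If `ready` were a plain `bool`, the compiler or the CPU could make the flag visible before the data, and the consumer could read a stale 0 (formally, a data race). The release/acquire pair forbids that reordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary data being published
std::atomic<bool> ready{false};  // flag that publishes it

void producer() {
    payload = 42;                                  // (1) ordinary write
    ready.store(true, std::memory_order_release);  // (2) cannot become visible before (1)
}

int consumer() {
    // Spin until the flag is set; this acquire load pairs with the release store.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // guaranteed to observe 42
}
```

Run `producer` on one thread and `consumer` on another; the consumer's result is 42 regardless of how the hardware reorders other memory operations.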

Finally, increasing the size of on-chip cache is about staying away from RAM. Main memory continues to be so much slower than the CPU that it makes sense to put the data closer to the processor—and you can’t get much closer than being right on the die. On-die cache sizes have soared, and today most major chip vendors will sell you CPUs that have 2MB and more of on-board L2 cache. (Of these three major historical approaches to boosting CPU performance, increasing cache is the only one that will continue in the near term. I’ll talk a little more about the importance of cache later on.)

Okay. So what does this mean?

A fundamentally important thing to recognize about this list is that all of these areas are concurrency-agnostic. Speedups in any of these areas will directly lead to speedups in sequential (nonparallel, single-threaded, single-process) applications, as well as applications that do make use of concurrency. That’s important, because the vast majority of today’s applications are single-threaded, for good reasons that I’ll get into further below.

Of course, compilers have had to keep up; sometimes you need to recompile your application, and target a specific minimum level of CPU, in order to benefit from new instructions (e.g., MMX, SSE) and some new CPU features and characteristics. But, by and large, even old applications have always run significantly faster—even without being recompiled to take advantage of all the new instructions and features offered by the latest CPUs.

That world was a nice place to be. Unfortunately, it has already disappeared.

Obstacles, and Why You Don’t Have 10GHz Today

 

Figure 1: Intel CPU Introductions (sources: Intel, Wikipedia)

CPU performance growth as we have known it hit a wall two years ago. Most people have only recently started to notice.

You can get similar graphs for other chips, but I’m going to use Intel data here. Figure 1 graphs the history of Intel chip introductions by clock speed and number of transistors. The number of transistors continues to climb, at least for now. Clock speed, however, is a different story.

Around the beginning of 2003, you’ll note a disturbing sharp turn in the previous trend toward ever-faster CPU clock speeds. I’ve added lines to show the limit trends in maximum clock speed; instead of continuing on the previous path, as indicated by the thin dotted line, there is a sharp flattening. It has become harder and harder to exploit higher clock speeds due to not just one but several physical issues, notably heat (too much of it and too hard to dissipate), power consumption (too high), and current leakage problems.

Quick: What’s the clock speed on the CPU(s) in your current workstation? Are you running at 10GHz? On Intel chips, we reached 2GHz a long time ago (August 2001), and according to CPU trends before 2003, now in early 2005 we should have the first 10GHz Pentium-family chips. A quick look around shows that, well, actually, we don’t. What’s more, such chips are not even on the horizon—we have no good idea at all about when we might see them appear.

Well, then, what about 4GHz? We’re at 3.4GHz already—surely 4GHz can’t be far away? Alas, even 4GHz seems to be remote indeed. In mid-2004, as you probably know, Intel first delayed its planned introduction of a 4GHz chip until 2005, and then in fall 2004 it officially abandoned its 4GHz plans entirely. As of this writing, Intel is planning to ramp up a little further to 3.73GHz in early 2005 (already included in Figure 1 as the upper-right-most dot), but the clock race really is over, at least for now; Intel’s and most processor vendors’ future lies elsewhere as chip companies aggressively pursue the same new multicore directions.

We’ll probably see 4GHz CPUs in our mainstream desktop machines someday, but it won’t be in 2005. Sure, Intel has samples of their chips running at even higher speeds in the lab—but only by heroic efforts, such as attaching hideously impractical quantities of cooling equipment. You won’t have that kind of cooling hardware in your office any day soon, let alone on your lap while computing on the plane.

TANSTAAFL: Moore’s Law and the Next Generation(s)

“There ain’t no such thing as a free lunch.” —R. A. Heinlein, The Moon Is a Harsh Mistress

Does this mean Moore’s Law is over? Interestingly, the answer in general seems to be no. Of course, like all exponential progressions, Moore’s Law must end someday, but it does not seem to be in danger for a few more years yet. Despite the wall that chip engineers have hit in juicing up raw clock cycles, transistor counts continue to explode and it seems CPUs will continue to follow Moore’s Law-like throughput gains for some years to come.

Myths and Realities: 2 x 3GHz < 6GHz

So a dual-core CPU that combines two 3GHz cores practically offers 6GHz of processing power. Right?

Wrong. Even having two threads running on two physical processors doesn’t mean getting two times the performance. Similarly, most multi-threaded applications won’t run twice as fast on a dual-core box. They should run faster than on a single-core CPU; the performance gain just isn’t linear, that’s all.

Why not? First, there is coordination overhead between the cores to ensure cache coherency (a consistent view of cache, and of main memory) and to perform other handshaking. Today, a two- or four-processor machine isn’t really two or four times as fast as a single CPU even for multi-threaded applications. The problem remains essentially the same even when the CPUs in question sit on the same die.

Second, unless the two cores are running different processes, or different threads of a single process that are well-written to run independently and almost never wait for each other, they won’t be well utilized. (Despite this, I will speculate that today’s single-threaded applications as actually used in the field could actually see a performance boost for most users by going to a dual-core chip, not because the extra core is actually doing anything useful, but because it is running the adware and spyware that infest many users’ systems and are otherwise slowing down the single CPU that user has today. I leave it up to you to decide whether adding a CPU to run your spyware is the best solution to that problem.)

If you’re running a single-threaded application, then the application can only make use of one core. There should be some speedup as the operating system and the application can run on separate cores, but typically the OS isn’t going to be maxing out the CPU anyway so one of the cores will be mostly idle. (Again, the spyware can share the OS’s core most of the time.)
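
The article gives no formula here, but Amdahl's law is the standard first-order model of why extra cores don't add up linearly. Assuming a fraction p of the work can run in parallel on n cores, and ignoring the coordination overhead described above (which only lowers the result), the speedup S(n) is:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}

S(2)   = \frac{1}{0.1 + 0.9/2}   = \frac{1}{0.55}  \approx 1.82 \quad (p = 0.9)
S(100) = \frac{1}{0.1 + 0.9/100} = \frac{1}{0.109} \approx 9.17 \quad (p = 0.9)
S(n)   = 1 \quad \text{for every } n \quad (p = 0,\ \text{single-threaded})
```

Even a program that is 90% parallel gets less than a 2x speedup from a dual-core chip, and a fully serial one gets none at all.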

The key difference, which is the heart of this article, is that the performance gains are going to be accomplished in fundamentally different ways for at least the next couple of processor generations. And most current applications will no longer benefit from the free ride without significant redesign.

For the near-term future, meaning for the next few years, the performance gains in new chips will be fueled by three main approaches, only one of which is the same as in the past. The near-term future performance growth drivers are:

hyperthreading

multicore

cache

Hyperthreading is about running two or more threads in parallel inside a single CPU. Hyperthreaded CPUs are already available today, and they do allow some instructions to run in parallel. A limiting factor, however, is that although a hyperthreaded CPU has some extra hardware including extra registers, it still has just one cache, one integer math unit, one FPU, and in general just one each of most basic CPU features. Hyperthreading is sometimes cited as offering a 5% to 15% performance boost for reasonably well-written multi-threaded applications, or even as much as 40% under ideal conditions for carefully written multi-threaded applications. That's good, but it's hardly double, and it doesn't help single-threaded applications.

Multicore is about running two or more actual CPUs on one chip. Some chips, including Sparc and PowerPC, have multicore versions available already. The initial Intel and AMD designs, both due in 2005, vary in their level of integration but are functionally similar. AMD’s seems to have some initial performance design advantages, such as better integration of support functions on the same die, whereas Intel’s initial entry basically just glues together two Xeons on a single die. The performance gains should initially be about the same as having a true dual-CPU system (only the system will be cheaper because the motherboard doesn’t have to have two sockets and associated “glue” chippery), which means something less than double the speed even in the ideal case, and just like today it will boost reasonably well-written multi-threaded applications. Not single-threaded ones.
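
A multithreaded program usually decides how many worker threads to start from the hardware it finds at run time. A minimal sketch using C++11's `std::thread::hardware_concurrency()` (the helper name `worker_count` is hypothetical): the call reports hardware threads, so on a hyperthreaded chip each core typically counts as two, and it may return 0 when the value cannot be determined.

```cpp
#include <cassert>
#include <thread>

// Number of worker threads to start: one per hardware thread,
// falling back to 1 if the standard library cannot tell.
unsigned worker_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Because a hyperthread shares most execution units with its sibling, doubling the thread count this way does not double throughput, as described above.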

Finally, on-die cache sizes can be expected to continue to grow, at least in the near term. Of these three areas, only this one will broadly benefit most existing applications. The continuing growth in on-die cache sizes is an incredibly important and highly applicable benefit for many applications, simply because space is speed. Accessing main memory is expensive, and you really don't want to touch RAM if you can help it. On today's systems, a cache miss that goes out to main memory often costs 10 to 50 times as much as getting the information from the cache; this, incidentally, continues to surprise people because we all think of memory as fast, and it is fast compared to disks and networks, but not compared to on-board cache, which runs at faster speeds. If an application's working set fits into cache, we're golden, and if it doesn't, we're not. That is why increased cache sizes will save some existing applications and breathe life into them for a few more years without requiring significant redesign: As existing applications manipulate more and more data, and as they are incrementally updated to include more code for new features, performance-sensitive operations need to continue to fit into cache. As the Depression-era old-timers will be quick to remind you, "Cache is king."
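
The cost of a cache miss is easy to see in code that does identical arithmetic with different memory access patterns. This C++ sketch assumes a matrix stored row-major in one contiguous `std::vector`; the function names are hypothetical. Both functions compute the same sum, but only the first walks memory sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix of rows x cols stored row-major: element (r, c) is at r * cols + c.

// Consecutive iterations touch adjacent addresses, so every cache line
// fetched from RAM is fully used before it is evicted.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same arithmetic, but each step jumps 'cols' elements ahead; once the
// matrix is larger than the cache, most of these accesses miss.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

On a matrix of a few thousand rows and columns, the row-major version is typically noticeably faster even though it executes the same instructions, purely because of cache behavior.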

(Aside: Here’s an anecdote to demonstrate “space is speed” that recently hit my compiler team. The compiler uses the same source base for the 32-bit and 64-bit compilers; the code is just compiled as either a 32-bit process or a 64-bit one. The 64-bit compiler gained a great deal of baseline performance by running on a 64-bit CPU, principally because the 64-bit CPU had many more registers to work with and had other code performance features. All well and good. But what about data? Going to 64 bits didn’t change the size of most of the data in memory, except that of course pointers in particular were now twice the size they were before. As it happens, our compiler uses pointers much more heavily in its internal data structures than most other kinds of applications ever would. Because pointers were now 8 bytes instead of 4 bytes, a pure data size increase, we saw a significant increase in the 64-bit compiler’s working set. That bigger working set caused a performance penalty that almost exactly offset the code execution performance increase we’d gained from going to the faster processor with more registers. As of this writing, the 64-bit compiler runs at the same speed as the 32-bit compiler, even though the source base is the same for both and the 64-bit processor offers better raw processing throughput. Space is speed.)
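
The pointer-size effect in this anecdote is visible with `sizeof`. The `Node` struct below is a hypothetical pointer-heavy node, not the compiler's actual data structure; on common ABIs it takes 12 bytes in a 32-bit build and 24 bytes in a 64-bit build, so the same source data occupies twice as many cache lines.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical tree node dominated by pointers.
struct Node {
    Node* left;
    Node* right;
    int   value;
};

void print_node_size() {
    // Typically 4 + 4 + 4 = 12 bytes on a 32-bit target, and
    // 8 + 8 + 4 = 20 bytes, padded to 24, on a 64-bit target.
    std::printf("sizeof(void*) = %zu, sizeof(Node) = %zu\n",
                sizeof(void*), sizeof(Node));
}
```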

But cache is it. Hyperthreading and multicore CPUs will have nearly no impact on most current applications.

So what does this change in the hardware mean for the way we write software? By now you’ve probably noticed the basic answer, so let’s consider it and its consequences.

What This Means For Software: The Next Revolution

In the 1990s, we learned to grok objects. The revolution in mainstream software development from structured programming to object-oriented programming was the greatest such change in the past 20 years, and arguably in the past 30 years. There have been other changes, including the most recent (and genuinely interesting) naissance of web services, but nothing that most of us have seen during our careers has been as fundamental and as far-reaching a change in the way we write software as the object revolution.

Until now.

Starting today, the performance lunch isn’t free any more. Sure, there will continue to be generally applicable performance gains that everyone can pick up, thanks mainly to cache size improvements. But if you want your application to benefit from the continued exponential throughput advances in new processors, it will need to be a well-written concurrent (usually multithreaded) application. And that’s easier said than done, because not all problems are inherently parallelizable and because concurrent programming is hard.

I can hear the howls of protest: “Concurrency? That’s not news! People are already writing concurrent applications.” That’s true. Of a small fraction of developers.

Remember that people have been doing object-oriented programming since at least the days of Simula in the late 1960s. But OO didn’t become a revolution, and dominant in the mainstream, until the 1990s. Why then? The reason the revolution happened was primarily that our industry was driven by requirements to write larger and larger systems that solved larger and larger problems and exploited the greater and greater CPU and storage resources that were becoming available. OOP’s strengths in abstraction and dependency management made it a necessity for achieving large-scale software development that is economical, reliable, and repeatable.

Similarly, we’ve been doing concurrent programming since those same dark ages, writing coroutines and monitors and similar jazzy stuff. And for the past decade or so we’ve witnessed incrementally more and more programmers writing concurrent (multi-threaded, multi-process) systems. But an actual revolution marked by a major turning point toward concurrency has been slow to materialize. Today the vast majority of applications are single-threaded, and for good reasons that I’ll summarize in the next section.

By the way, on the matter of hype: People have always been quick to announce “the next software development revolution,” usually about their own brand-new technology. Don’t believe it. New technologies are often genuinely interesting and sometimes beneficial, but the biggest revolutions in the way we write software generally come from technologies that have already been around for some years and have already experienced gradual growth before they transition to explosive growth. This is necessary: You can only base a software development revolution on a technology that’s mature enough to build on (including having solid vendor and tool support), and it generally takes any new software technology at least seven years before it’s solid enough to be broadly usable without performance cliffs and other gotchas. As a result, true software development revolutions like OO happen around technologies that have already been undergoing refinement for years, often decades. Even in Hollywood, most genuine “overnight successes” have really been performing for many years before their big break.

Concurrency is the next major revolution in how we write software. Different experts still have different opinions on whether it will be bigger than OO, but that kind of conversation is best left to pundits. For technologists, the interesting thing is that concurrency is of the same order as OO both in the (expected) scale of the revolution and in the complexity and learning curve of the technology.

Benefits and Costs of Concurrency

There are two major reasons for which concurrency, especially multithreading, is already used in mainstream software. The first is to logically separate naturally independent control flows; for example, in a database replication server I designed it was natural to put each replication session on its own thread, because each session worked completely independently of any others that might be active (as long as they weren’t working on the same database row). The second and less common reason to write concurrent code in the past has been for performance, either to scalably take advantage of multiple physical CPUs or to easily take advantage of latency in other parts of the application; in my database replication server, this factor applied as well and the separate threads were able to scale well on multiple CPUs as our server handled more and more concurrent replication sessions with many other servers.
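
The thread-per-session design described above can be sketched in a few lines of C++11. `Session`, `run_session`, and `run_all` are hypothetical stand-ins, not the actual replication server: the point is only that each thread touches nothing but its own session, so no locking is needed between them.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent replication session.
struct Session {
    std::string peer;
    int rows_replicated = 0;
};

// Placeholder for the real work; it reads and writes only its own session.
void run_session(Session& s) {
    s.rows_replicated = static_cast<int>(s.peer.size());
}

// One thread per session; join() waits until every session has finished.
void run_all(std::vector<Session>& sessions) {
    std::vector<std::thread> threads;
    threads.reserve(sessions.size());
    for (Session& s : sessions)
        threads.emplace_back(run_session, std::ref(s));
    for (std::thread& t : threads)
        t.join();
}
```

The same structure gives both benefits named above: the code mirrors the independent control flows, and the threads can spread across multiple CPUs.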

There are, however, real costs to concurrency. Some of the obvious costs are actually relatively unimportant. For example, yes, locks can be expensive to acquire, but when used judiciously and properly you gain much more from the concurrent execution than you lose on the synchronization, if you can find a sensible way to parallelize the operation and minimize or eliminate shared state.
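
One common way to "minimize or eliminate shared state" is to give each thread its own result slot and combine the slots only after the threads finish. A C++11 sketch under that assumption (`parallel_sum` is a hypothetical name): locking a shared total for every element would be correct too, but the threads would spend their time waiting on each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // accumulate locally, not in shared memory
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // single write to this thread's own slot
        });
    }
    for (std::thread& th : threads) th.join();  // join makes the writes visible
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

No mutex is needed because no two threads ever write the same memory location, and the final combination happens on a single thread after `join()`.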

Perhaps the second-greatest cost of concurrency is that not all applications are amenable to parallelization. I’ll say more about this later on.

Probably the greatest cost of concurrency is that concurrency really is hard: The programming model, meaning the model in the programmer’s head that he needs to reason reliably about his program, is much harder than it is for sequential control flow.

Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all. As the developer learns to reason about concurrency, they find that usually those races can be caught by reasonable in-house testing, and they reach a new plateau of knowledge and comfort. What usually doesn’t get caught in testing, however, except in shops that understand why and how to do real stress testing, is those latent concurrency bugs that surface only on true multiprocessor systems, where the threads aren’t just being switched around on a single processor but where they really do execute truly simultaneously and thus expose new classes of errors. This is the next jolt for people who thought that surely now they know how to write concurrent code: I’ve come across many teams whose application worked fine even under heavy and extended stress testing, and ran perfectly at many customer sites, until the day that a customer actually had a real multiprocessor machine and then deeply mysterious races and corruptions started to manifest intermittently. In the context of today’s CPU landscape, then, redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep end—going straight to the least forgiving, truly parallel environment that is most likely to expose the things you got wrong. Even when you have a team that can reliably write safe concurrent code, there are other pitfalls; for example, concurrent code that is completely safe but isn’t any faster than it was on a single-core machine, typically because the threads aren’t independent enough and share a dependency on a single resource which re-serializes the program’s execution. This stuff gets pretty subtle.
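
The classic race the paragraph alludes to is a shared counter. With a plain `int`, `++counter` is a read-modify-write that two threads can interleave and lose updates; on a single CPU the window is small, while on a true multiprocessor it is hit constantly. A C++11 sketch of the fix (function names hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// A plain 'int counter' here would be a data race (undefined behavior in C++).
// std::atomic makes each increment indivisible.
std::atomic<int> counter{0};

void add_many(int times) {
    for (int i = 0; i < times; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
}

int run_counters(int num_threads, int times) {
    counter.store(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(add_many, times);
    for (std::thread& th : threads) th.join();
    return counter.load();
}
```

This version is safe, but it also illustrates the last pitfall above: every thread hammers the same shared variable, which re-serializes the work, so it may run no faster than one thread. Per-thread counters combined at the end usually scale better.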

Just as it is a leap for a structured programmer to learn OO (what’s an object? what’s a virtual function? how should I use inheritance? and beyond the “whats” and “hows,” why are the correct design practices actually correct?), it’s a leap of about the same magnitude for a sequential programmer to learn concurrency (what’s a race? what’s a deadlock? how can it come up, and how do I avoid it? what constructs actually serialize the program that I thought was parallel? how is the message queue my friend? and beyond the “whats” and “hows,” why are the correct design practices actually correct?).
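
A standard answer to "what's a deadlock, and how do I avoid it?" is the two-account transfer. If one thread locks account A then B while another locks B then A, each can hold one mutex and wait forever for the other. A C++11 sketch (the `Account` type and `transfer` function are hypothetical) that avoids this with `std::lock`, which acquires several mutexes using a deadlock-avoidance algorithm; C++17's `std::scoped_lock` does the same in one line.

```cpp
#include <cassert>
#include <mutex>

struct Account {
    std::mutex m;
    long balance = 0;
};

void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // locking the same mutex twice would deadlock
    std::unique_lock<std::mutex> lk1(from.m, std::defer_lock);
    std::unique_lock<std::mutex> lk2(to.m, std::defer_lock);
    std::lock(lk1, lk2);  // safe no matter which order callers pass the accounts
    from.balance -= amount;
    to.balance += amount;
}  // both locks released here
```

Concurrent calls to `transfer(a, b, ...)` and `transfer(b, a, ...)` cannot deadlock, because `std::lock` never leaves a thread holding one mutex while blocking on the other.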

The vast majority of programmers today don't grok concurrency, just as the vast majority of programmers 15 years ago didn't yet grok objects. But the concurrent programming model is learnable, particularly if we stick to message- and lock-based programming, and once grokked it isn't that much harder than OO and hopefully can become just as natural. Just be ready and allow for the investment in training and time, for you and for your team.
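
Message-based programming keeps the locking in one small, well-tested place: threads hand each other messages through a queue instead of sharing data structures. A minimal C++11 sketch of a blocking queue (the `MessageQueue` class is hypothetical, not a standard library type), followed by a usage function that sends the numbers 1 to 100 and uses -1 as an end-of-stream marker:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class MessageQueue {
public:
    void push(T msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();  // wake one waiting receiver
    }

    // Blocks until a message is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // tolerates spurious wakeups
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// Usage: one producer thread sends 1..100, then -1 to signal the end.
int sum_from_producer() {
    MessageQueue<int> q;
    std::thread producer([&q] {
        for (int i = 1; i <= 100; ++i) q.push(i);
        q.push(-1);
    });
    int sum = 0;
    for (int v = q.pop(); v != -1; v = q.pop()) sum += v;
    producer.join();
    return sum;
}
```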

(I deliberately limit the above to message- and lock-based concurrent programming models. There is also lock-free programming, supported most directly at the language level in Java 5 and in at least one popular C++ compiler. But concurrent lock-free programming is known to be very much harder for programmers to understand and reason about than even concurrent lock-based programming. Most of the time, only systems and library writers should have to understand lock-free programming, although virtually everybody should be able to take advantage of the lock-free systems and libraries those people produce.)
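
For a taste of why lock-free code is harder to reason about, here is about the smallest useful example: recording a running maximum with a compare-and-swap retry loop instead of a mutex. It uses C++11 `std::atomic`, which postdates this article, and `update_max` is a hypothetical name. Even here, correctness depends on subtle details, such as re-checking `value > observed` after every failed exchange.

```cpp
#include <atomic>
#include <cassert>

// Lock-free "store the maximum seen so far".
void update_max(std::atomic<int>& current_max, int value) {
    int observed = current_max.load();
    // If another thread changed current_max after we read it, the exchange
    // fails and reloads 'observed'; we then re-check whether we still win.
    while (value > observed &&
           !current_max.compare_exchange_weak(observed, value)) {
        // retry with the freshly observed value
    }
}
```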

What It Means For Us

Okay, back to what it means for us.

1. The clear primary consequence we’ve already covered is that applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains that have now started becoming available and will continue to materialize over the next several years. For example, Intel is talking about someday producing 100-core chips; a single-threaded application can exploit at most 1/100 of such a chip’s potential throughput. “Oh, performance doesn’t matter so much, computers just keep getting faster” has always been a naïve statement to be viewed with suspicion, and for the near future it will almost always be simply wrong.

Now, not all applications (or, more precisely, important operations of an application) are amenable to parallelization. True, some problems, such as compilation, are almost ideally parallelizable. But others aren’t; the usual counterexample here is that just because it takes one woman nine months to produce a baby doesn’t imply that nine women could produce one baby in one month. You’ve probably come across that analogy before. But did you notice the problem with leaving the analogy at that? Here’s the trick question to ask the next person who uses it on you: Can you conclude from this that the Human Baby Problem is inherently not amenable to parallelization? Usually people relating this analogy err in quickly concluding that it demonstrates an inherently nonparallel problem, but that’s actually not necessarily correct at all. It is indeed an inherently nonparallel problem if the goal is to produce one child. It is actually an ideally parallelizable problem if the goal is to produce many children! Knowing the real goals can make all the difference. This basic goal-oriented principle is something to keep in mind when considering whether and how to parallelize your software.

2. Perhaps a less obvious consequence is that applications are likely to become increasingly CPU-bound. Of course, not every application operation will be CPU-bound, and even those that will be affected won’t become CPU-bound overnight if they aren’t already, but we seem to have reached the end of the “applications are increasingly I/O-bound or network-bound or database-bound” trend, because performance in those areas is still improving rapidly (gigabit WiFi, anyone?) while traditional CPU performance-enhancing techniques have maxed out. Consider: We’re stopping in the 3GHz range for now. Therefore single-threaded programs are likely not to get much faster any more for now except for benefits from further cache size growth (which is the main good news). Other gains are likely to be incremental and much smaller than we’ve been used to seeing in the past, for example as chip designers find new ways to keep pipelines full and avoid stalls, which are areas where the low-hanging fruit has already been harvested. The demand for new application features is unlikely to abate, and even more so the demand to handle vastly growing quantities of application data is unlikely to stop accelerating. As we continue to demand that programs do more, they will increasingly often find that they run out of CPU to do it unless they can code for concurrency.

There are two ways to deal with this sea change toward concurrency. One is to redesign your applications for concurrency, as above. The other is to be frugal, by writing code that is more efficient and less wasteful. This leads to the third interesting consequence:

3. Efficiency and performance optimization will get more, not less, important. Those languages that already lend themselves to heavy optimization will find new life; those that don’t will need to find ways to compete and become more efficient and optimizable. Expect long-term increased demand for performance-oriented languages and systems.

4. Finally, programming languages and systems will increasingly be forced to deal well with concurrency. The Java language has included support for concurrency since its beginning, although mistakes were made that later had to be corrected over several releases in order to do concurrent programming more correctly and efficiently. The C++ language has long been used to write heavy-duty multithreaded systems well, but it has no standardized support for concurrency at all (the ISO C++ standard doesn’t even mention threads, and does so intentionally), and so typically the concurrency is of necessity accomplished by using nonportable platform-specific concurrency features and libraries. (It’s also often incomplete; for example, static variables must be initialized only once, which typically requires that the compiler wrap them with a lock, but many C++ implementations do not generate the lock.) Finally, there are a few concurrency standards, including pthreads and OpenMP, and some of these support implicit as well as explicit parallelization. Having the compiler look at your single-threaded program and automatically figure out how to parallelize it implicitly is fine and dandy, but those automatic transformation tools are limited and don’t yield nearly the gains of explicit concurrency control that you code yourself.
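Sutter's static-variable example refers to C++, where compilers of the time did not always guard a lazily initialized static with a lock. Since the code elsewhere in this document is Java, the following is a hedged Java sketch of the two usual ways to get initialize-exactly-once behaviour there. The first is an explicit lock. The second is the initialization-on-demand holder idiom, which relies on the JVM's guarantee that a class is initialized only once. `Resource`, the counters and `race` are hypothetical names used only to observe how many constructions actually happen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class OnceOnlyInit {
    static final AtomicInteger lockedConstructions = new AtomicInteger();
    static final AtomicInteger holderConstructions = new AtomicInteger();

    // A hypothetical expensive resource; the counter records every construction.
    static final class Resource {
        Resource(AtomicInteger counter) {
            counter.incrementAndGet();
        }
    }

    // Variant 1: lazy initialization guarded by an explicit lock, roughly the
    // kind of guard a compiler has to emit around a lazily initialized static.
    private static Resource locked;

    static synchronized Resource lockedGet() {
        if (locked == null) {
            locked = new Resource(lockedConstructions);
        }
        return locked;
    }

    // Variant 2: initialization-on-demand holder. The JVM runs Holder's static
    // initializer exactly once, on first access, even if many threads race to it.
    private static final class Holder {
        static final Resource INSTANCE = new Resource(holderConstructions);
    }

    static Resource holderGet() {
        return Holder.INSTANCE;
    }

    // Calls both getters from many threads at roughly the same time.
    static void race(int threadCount) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                lockedGet();
                holderGet();
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        race(16);
        System.out.println(lockedConstructions.get() + " " + holderConstructions.get()); // prints "1 1"
    }
}
```

Both counters end at 1 no matter how many threads race. After initialization, the holder variant no longer takes a lock on each call.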

Conclusion

If you haven’t done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency. Now is also the time for you and your team to grok concurrent programming’s requirements, pitfalls, styles, and idioms.

A few rare classes of applications are naturally parallelizable, but most aren't. Even when you know exactly where you're CPU-bound, you may well find it difficult to figure out how to parallelize those operations; all the more reason to start thinking about it now. Implicitly parallelizing compilers can help a little, but don't expect much; they can't do nearly as good a job of parallelizing your sequential program as you could do by turning it into an explicitly parallel and threaded version.
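Here is a minimal Java sketch of explicit parallelism that also echoes Sutter's baby analogy. A single job whose steps each depend on the previous one cannot be split across cores. Many independent jobs, however, can be handed to a thread pool explicitly instead of waiting for an implicitly parallelizing compiler. `chainedJob` and `runAll` are hypothetical names, and the arithmetic inside `chainedJob` is arbitrary stand-in work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExplicitParallelism {
    // A hypothetical job made of dependent steps: step i needs the result of
    // step i-1, so one job cannot be split across threads.
    static long chainedJob(long seed, int steps) {
        long x = seed;
        for (int i = 0; i < steps; i++) {
            x = (x * 31 + i) % 1_000_003;
        }
        return x;
    }

    // Independent jobs share no data, so they can run concurrently on a pool.
    static List<Long> runAll(List<Long> seeds, int steps, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long seed : seeds) {
                futures.add(pool.submit(() -> chainedJob(seed, steps)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // results are collected in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of(1L, 2L, 3L, 4L), 5_000_000, 4));
    }
}
```

Each job still runs sequentially. Only the throughput across jobs grows with the number of threads, which is the "many children" reading of the analogy.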

Thanks to continued cache growth and probably a few more incremental straight-line control flow optimizations, the free lunch will continue a little while longer; but starting today the buffet will only be serving that one entrée and that one dessert. The filet mignon of throughput gains is still on the menu, but now it costs extra—extra development effort, extra code complexity, and extra testing effort. The good news is that for many classes of applications the extra effort will be worthwhile, because concurrency will let them fully exploit the continuing exponential gains in processor throughput.

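Open-source J2EE expert Rickard Oberg voiced the same concern, even calling Sutter's article "the most important technical article of the year" (although it has not yet been formally published). In his view, technical experts (his word was "intellectuals") can sit down around a table and solve the concurrency problem, but the solutions they arrive at will still be hard for average mainstream programmers to use. As a result, "many applications will run into big trouble."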

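At the same time, however, Oberg also believes that even if CPU speed gains really do reach their limit, most of the code in most applications will, for the foreseeable future, still be able to enjoy the "free lunch", and only a small portion of code will have to start confronting concurrency. He compares the relationship between the two to that between classical mechanics and quantum mechanics: although the underlying truth of the world is quantum and probabilistic, most of the time we can still rely on the solidity of matter and on Newtonian mechanics.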

For Rickard Oberg's full commentary, see:
http://jroller.com/page/rickard/20041231#most_important_article_of_the

Most important article of the year

This is probably the most important article of the year, even if it has not yet been published.

It is important because it highlights several key problem areas: 1) Technical: CPUs won't get faster as rapidly; 2) Human: dealing with concurrency will be (already is) a *huge* knowledge problem; 3) Time: we have to start dealing with this *now*.

I have had a really bad stomach feeling about this problem for the past six months or so, and this article confirmed my fears. We are collectively in big doodoo if these problems aren't dealt with. What is especially important is to be able to deal with them in a technical way that can be easily used by mainstream programmers.

This is the real problem. Finding tech to deal with it is "easy". Technological intellectuals can probably sit down around a table and find a solution that works. Finding tech to deal with it that people can use is going to be very difficult.

I suppose the first question is: what happens if we let go of the idea that data exists in one place only?

That's a biggie on its own.

We also have to start thinking about not only performance, but also scalability and throughput. This inevitably ties back into how to deal with concurrent programming, and by extension, concurrent distributed programming. And not because we want to do it, but because (as the article describes) we will have to.

Which brings up the next big question: *how* important is it that "data" is always atomically consistent? By data I mean (at least) the raw data, indexes for querying, and caches (local and non-local). Are there architectures that allow for data to not be globally consistent, but still yield deterministic results? Mandating global consistency at all times is probably a fallacy, as it will not scale well enough.

Next question: how "common" is it that an application requires absolute determinism and consistency, and when is "good enough" enough?

To me this whole issue can be likened to the problems of classical and quantum physics. The "free lunch" has been that we can think of stuff as particles. They have position, size, and velocity. But, in "reality" (loosely defined) they are only probabilities, i.e. they are not "consistent" or "deterministic" in computer terms. Should we think of data in our world the same way? That we can sometimes enjoy the illusion of "free lunch", in terms of a deterministic data set, but that we have to start writing architectures that can deal with "quantum" data that is not exactly quantifiable at all points in time, but which will allow us to break the boundaries into the world of concurrent distributed programming?

I don't know, but I think these are going to be vitally important questions in the very near future.

It also has an impact on how we will value new technologies such as web services, AOP, SEDA, JXTA, etc. The answers to the questions posed above will probably, to a large extent, determine which technologies will be relevant for meeting these needs, and which will not.

Something to think about.

@ 03:28 AM EST
 
 
 
 
Interesting new customer
We just got a pretty cool new customer: the IRS. Next year their website is going to be running on top of our code, and the performance tests we've done so far with them show that 4 Linux boxes with Tomcat are going to give us twice the performance they need on peak days (really only one: tax declaration day). Maybe that'll shut up the Tomcat-spankers. :-)

And maybe, just maybe, if JRockit starts working with CGLIB properly we can use that to get even better performance. But that remains to be tested. ;-)

There's going to be lots of interesting integration work to be done as well, and since they are a reference for WAI/WSAG support in Sweden we hope to be able to make the website as highly accessible as is technically possible.

@ 05:28 AM EST
 
 
 
 
XStream + Xindice = !
Since it's been a while since my last update to the blog there's a ton of stuff that I'd like to comment on. But there's not much time these days, so I'm going to cover only my latest research project: using XStream to persist objects in Xindice (a native XML database).

So far we have been using Jisp to persist our objects, as it provides no-fuss and easy-to-use persistence. The good part is that it's very performant and fast to develop for. The bad part is that we have no querying capabilities, and it's difficult to work with the data as "just data". Doing schema migration (as described in earlier blog entries) is a bit tricky, although definitely doable. Currently pretty much all of our querying needs are covered by Lucene, which is used to index content, but being able to do querying would enable some cool new features.

One idea, then, would be to use XStream to "serialize" our data into XML instead of a binary format, and then use a native XML database to persist the objects. That way we can both get easy persistence (since XStream doesn't require any configuration) and a data-oriented way to work with the model at the same time.

I tried implementing this today, and it was really easy to do. XStream has SAX-integration, as does Xindice, so the performance was very nice. Creating and updating objects was sufficiently easy, and would even allow for better performance than Jisp since we can more easily integrate it with our loading policy stuff (see earlier posts on this feature). Xindice stores the data using a symbol table, so the XML doesn't take up much space.

All in all I now have the same features as before, but with the value-add that I can query the database for objects using XPath, and I can inspect and transform data using the XML interface. The fact that I can do XSL transformations of data in the database on the fly is definitely going to help.
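Oberg does not show his XStream/Xindice code, so the following is only a hedged sketch of what querying serialized objects with XPath looks like. It uses nothing but the JDK's `javax.xml.xpath` API. The XML is a hypothetical store with one element per object and one child element per field, roughly the shape XStream-style serialization yields. It does not use XStream or Xindice, and `page`, `title` and `views` are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathObjectQuery {
    // Hypothetical stored objects: one element per object, one child element per field.
    static final String STORE =
          "<objects>"
        + "<page><title>Home</title><views>120</views></page>"
        + "<page><title>About</title><views>7</views></page>"
        + "<page><title>News</title><views>56</views></page>"
        + "</objects>";

    // Returns the titles of all pages with more views than the threshold, in document order.
    static List<String> titlesWithMoreViewsThan(String xml, int threshold) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate compares each page's <views> text as a number.
            NodeList nodes = (NodeList) xpath.evaluate(
                "/objects/page[views > " + threshold + "]/title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesWithMoreViewsThan(STORE, 50)); // prints [Home, News]
    }
}
```

The query runs against the data as plain XML, with no knowledge of the Java classes that produced it. That is the "data-oriented" view the entry is after.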

Not sure if anyone else has used this combo before, but it seems like a decent choice for applications that want both an object-model and a data-model at the same time. As always, YMMV though.

@ 01:55 PM EST
 
 
 
 
59,054,087
The question posed by the Mirror is "How can 59,054,087 people be so DUMB?". Well, that's obviously a good question (even though the number of votes does not necessarily match the number of people who voted), but there's a bright side. According to census.gov the US population is 281,421,906, which means that there are a possible 222,367,819 Americans who are *not* dumb. There is hope!
@ 01:07 AM EST
 
 
 
 
Election coverage
For all your election coverage needs, go see the Signs of the Times!
@ 03:08 PM EST
 
 
 
 
EuroFoo thoughts
Last weekend I attended EuroFoo, which is the European version of Tim O'Reilly's "invite only" FooCamp geekfest (FOO = Friends Of O'Reilly). I've been to a number of similar "invite only" gatherings, and they are by far the best place to meet new friends and gather ideas. In evolutionary terms it is an amazing mixture of thoughts, which mutates uncontrollably as different people find new angles on old problems and ideas. The format was that we had a number of rooms and a number of time slots. On the first evening everyone (about 140 people) got to introduce themselves, and if an introduction was followed by cheers of "Session!" that person would fill a slot with a talk on their favorite topic. People would then simply go to the sessions they found most interesting.

This gathering was different from others that I have attended, as I was one of the few Java-ites in the crowd. Most of the others were OpenSourcies of the PHP/Linux/C/Perl/Python kind, with a healthy dose of non-techie people that are more concerned with social/legal/political issues related to the OpenSource world. The most interesting sessions from my point of view were non-technical in nature.

But before I describe some of the sessions I went to, I want to thank George Bush for making this gathering possible. As Tim pointed out during the opening session, the main reason they did a EuroFoo was that a LOT of the invited people did not want to visit the US these days. It appears I'm not the only one who is not inclined to travel to such an obvious terrorist country. I suppose gatherings like this are one small benefit of the behaviour of the Bush administration.

Now, on with the sessions. Here are some thoughts from the sessions I attended.

Better Living Through RSS - Ben Hammersley

Ben described how he is using RSS to syndicate more or less all of his daily info streams, whether it's blogs and news or bug trackers and mailing lists. For those who are already hooked on RSS there was not much new, but the last part was dedicated to an interesting discussion of whether RSS could be used as an application delivery mechanism. Ben argued that since RSS can contain HTML, it should be possible to have it contain forms that could be submitted to applications somewhere. The crowd, including me, was quite sceptical of these ideas, in particular with regard to authentication and application flow. RSS feeds can authenticate the user with HTTP authentication, but this would not play well with publishing and sharing RSS feeds. Handling application flow is a much trickier issue, though, as it would mean that as soon as a user submits a form in the RSS viewer, the result must (more or less) be handled by a regular browser in order to be useful. This seemed like a non-trivial problem, as one could normally not mandate that an RSS viewer has this capability. The general consensus was also that any use of RSS for this type of application should work with most vanilla RSS viewers.

AOP

This was my own session, and admittedly it was probably one of the worst presentations I've ever made. There was no projector in the room, so I had to try and do it the "ad lib" way. That.. ehm.. doesn't quite work for me, and with such a complicated topic I tried to attack it from all angles at the same time. About halfway through I think most of the bases were covered, however, so we started discussing it more in terms of how it could be useful, the pitfalls, the benefits, yada yada. Leo Simons has a writeup which describes what came up.

Anti-software patent discussion

The patent discussion was mainly presented by a guy from some organization in Europe (I forgot the name) which is trying to avoid getting patent legislation for software in Europe at all. Some people in the audience pointed out that, due to the wording of an agreement between the US and Europe, that was not really possible. They argued that the option of not having patents at all was long gone, and that it was more a case of getting patent legislation that was harmless. That way the politicians would feel good about doing what the big corps are telling them to do (and there is apparently some MASSIVE lobbying going on in this area), yet software developers wouldn't be affected by it in the normal case. There were several good ideas presented on how to make the legislation harmless, for example making sure that patents had a relatively short lifespan (3 years max) and that technology submitted to standards bodies could not be patented. It will be interesting to see where it all ends, but if the anti-patent groups are still on the "No Patents At All!" track then they will most likely fail completely.

Multi Dimension Databases

I stumbled into this one thinking it was some other session, but I stuck around and it turned out to be quite interesting. The topic was multidimensional databases, which basically involves taking some relational data and precalculating loads of queries. The result, a multidimensional cube, can then be easily traversed and queried with exceptional performance. The cube is read-only, as making changes to it would be way too complicated. So, each night the current cube data is thrown away and a new cube is calculated. This is typically used to do advanced queries on, for example, huge amounts of sales data. I don't have any such needs myself, but if I did, this seems like a very nice and performant way to do it.
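Here is a minimal Java sketch of the precalculation idea, assuming hypothetical sales rows with two dimensions (region and month). Every row is added to each cell it contributes to, including the `ALL` roll-ups, so a later query is a single map lookup. The cube is immutable once built, which matches the throw-it-away-and-rebuild approach. Real multidimensional databases are far more elaborate than this.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SalesCube {
    static final String ALL = "*"; // wildcard: "aggregated over this dimension"

    // One hypothetical fact row.
    static final class Sale {
        final String region;
        final String month;
        final long amount;

        Sale(String region, String month, long amount) {
            this.region = region;
            this.month = month;
            this.amount = amount;
        }
    }

    private final Map<String, Long> cells;

    private SalesCube(Map<String, Long> cells) {
        this.cells = Collections.unmodifiableMap(cells); // read-only once built
    }

    // Precalculates every (region, month) combination, including the ALL roll-ups.
    static SalesCube build(List<Sale> sales) {
        Map<String, Long> cells = new HashMap<>();
        for (Sale s : sales) {
            for (String r : new String[] {s.region, ALL}) {
                for (String m : new String[] {s.month, ALL}) {
                    cells.merge(key(r, m), s.amount, Long::sum);
                }
            }
        }
        return new SalesCube(cells);
    }

    private static String key(String region, String month) {
        return region + "|" + month;
    }

    // A query is now a single lookup; combinations without sales return 0.
    long total(String region, String month) {
        return cells.getOrDefault(key(region, month), 0L);
    }

    public static void main(String[] args) {
        SalesCube cube = build(List.of(
            new Sale("North", "Jan", 100),
            new Sale("North", "Feb", 50),
            new Sale("South", "Jan", 30)));
        System.out.println(cube.total("North", ALL)); // prints 150
        System.out.println(cube.total(ALL, "Jan"));   // prints 130
    }
}
```

With d dimensions each row touches 2^d cells. That is why such cubes tend to be rebuilt in batch, for example nightly, rather than updated in place.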

Free/Open Licensing

This presentation, by a MySQL guy, talked about the basics involved with licensing of OpenSource projects. I knew most of it from before, but it was interesting to get a different perspective on it. Danese talked about how choosing a license would impact the community, so determining what kind of community you want is crucial when figuring out what license to choose.

Bluetooth Security

This presentation by Adam Laurie was one of the most interesting sessions I attended during EuroFoo. He's a security guy who decided to check whether the Bluetooth support in regular cell phones was secure or not. It turned out that quite a number of models (including my own T610) were quite easy to hack into. Once a phone was hacked, the intruder could do things like download the address book, or make phone calls and send SMSs (typically to numbers with an excessively high per-minute rate). Just making the phone silently call the intruder turned it into a bugging device. There were all sorts of interesting ways to abuse these techniques, and there appears to be some work going on to find out whether a hacker could upload new firmware using this method. The result would be an airborne virus, which could easily spread throughout a country in a day if released during rush hour. A hacker could probably earn quite a lot of money this way, and the legal implications are quite fuzzy (i.e. who is responsible for it?). While it was amazing what kind of holes he had found, it was even more amazing to hear the response he got from some of the cell phone manufacturers. He simply tried to get his research into their hands so they could fix the problems, but many of them (of which SonyEricsson appeared to be the worst) flatly denied that what he had done was possible. Talk about irresponsible corporations.

He did some nice demos during the session to show how easy it was, and how fast the attack itself was. He also noted that while turning off discovery in the phone made it more difficult to hack, a targeted attack would still be possible if performed over a longer period, as a hacker could simply try the entire address range using a ping. Once the correct phone id was found, the usual 15-second techniques could be used to do the hack. Even turning off Bluetooth entirely didn't help, since it appears that the low-level Bluetooth stack is always enabled, and it could be messed with by sending broken packets, essentially crashing the software and the phone. Not dangerous perhaps, but definitely annoying.

While the sessions were interesting, the most rewarding aspect of these kinds of gatherings is the people you meet. I was constantly amazed by the kewl new people I met, and how much we had in common (even though essentially none of them was into Java). Great fun!

@ 08:04 AM EDT
 
 
 
 
XSL scalability issues; SAX to the rescue!
As described in a previous entry, I developed a technique whereby I could process huge XML files with XSL by chopping them up into pieces using StAX.

It worked fairly well, but when I looked under the hood of the StAX code, I found that it implemented a bridge to a SAXSource when it did the XSL transformations. This seemed like a silly overhead, so I wondered how hard it would be to implement the same idea using only SAX instead.

This turned out to work quite well, although I had to dig quite deep into the JAXP APIs to get all the details right. First of all I used an identity transform XMLFilter which simply copied the input to the output using SAX events. Then I implemented a custom SAX ContentHandler which, given a description of a subtree, could send the events into an XSL transformation. By making that subtree appear as a single document to the XSL transformation, I could attach XSL transformations to arbitrary subtrees in the huge XML file. By doing this I got the scalability of SAX coupled with the transformation capabilities of XSL, and since I had essentially removed the StAX wrapper layer it was significantly faster: roughly two and a half times as fast, which means that I could process a reasonably large file in 8 minutes instead of 20. When you're doing lots of such transformations that is really a Good Thing.
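Oberg does not publish his ContentHandler, so here is a hedged sketch of the same idea using only the JAXP classes that ship with the JDK. A `DefaultHandler` watches for a hypothetical subtree element (`object`). When one starts, it opens a fresh `TransformerHandler` from `SAXTransformerFactory` and wraps the subtree in `startDocument`/`endDocument` so the stylesheet sees a complete document. It then forwards the subtree's events and collects the output. The identity-copy `XMLFilter` for the rest of the document is left out for brevity, and the element names and stylesheet are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SubtreeXslHandler extends DefaultHandler {
    static final String SAMPLE_XML = "<database>"
        + "<object id=\"1\"><name>a</name></object>"
        + "<object id=\"2\"><name>b</name></object>"
        + "</database>";
    static final String SAMPLE_XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/object\">"
        + "<migrated id=\"{@id}\"><xsl:value-of select=\"name\"/></migrated>"
        + "</xsl:template></xsl:stylesheet>";

    private final SAXTransformerFactory factory;
    private final Templates templates;   // stylesheet compiled once, reused per subtree
    private final String subtreeElement; // local name that starts a subtree
    private final List<String> results = new ArrayList<>();
    private TransformerHandler current;  // non-null while inside a subtree
    private StringWriter currentOut;
    private int depth;                   // element depth inside the current subtree

    SubtreeXslHandler(String xsl, String subtreeElement) throws TransformerConfigurationException {
        // The JDK's default TransformerFactory is a SAXTransformerFactory; production code
        // should check getFeature(SAXTransformerFactory.FEATURE) before casting.
        this.factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        this.templates = factory.newTemplates(new StreamSource(new StringReader(xsl)));
        this.subtreeElement = subtreeElement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (current == null && localName.equals(subtreeElement)) {
            try {
                current = factory.newTransformerHandler(templates);
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            currentOut = new StringWriter();
            current.setResult(new StreamResult(currentOut)); // must precede startDocument
            current.startDocument(); // present the subtree as a complete document
            depth = 0;
        }
        if (current != null) {
            depth++;
            current.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) {
            current.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (current != null) {
            current.endElement(uri, localName, qName);
            if (--depth == 0) {
                current.endDocument(); // document complete, so the stylesheet is applied
                results.add(currentOut.toString());
                current = null;
            }
        }
    }

    // Streams through the input, transforming each subtree as soon as it is complete.
    static List<String> transformSubtrees(String xml, String xsl, String element) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // ensures localName is filled in
            SubtreeXslHandler handler = new SubtreeXslHandler(xsl, element);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        transformSubtrees(SAMPLE_XML, SAMPLE_XSL, "object").forEach(System.out::println);
    }
}
```

Only one subtree is held in memory at a time, while the `Templates` object is compiled once and reused for every subtree.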

I was initially worried about getting complex SAX code, but since I could minimize it to just chopping up the SAX event stream it was fairly manageable. Overall the code was about as simple as the StAX code, since all of the grunt work really is done in the XSL.

It would be interesting to combine this with something like XStream, as it would make it possible to deserialize XML streams quite easily. Since one doesn't have to wait for the entire document to load (only a subtree) it should make it possible to have XStream deserialize a continuous XML stream which might contain, for example, a message queue or commands.

@ 03:44 PM EDT
 
 
 
 
Dependency injection and open vs. closed designs
I recently converted our code to use Dependency Injection, and the container I chose for this purpose was Pico. Now that it is done and I have begun writing new code that uses Pico I have noticed that my perspective on how to design software has changed. In particular, I've begun thinking more about "open" and "closed" designs, and how they relate. The kind of design you choose, and the assumptions you make that go along with it, will impact what you can do with the code in the future, and how you do it.

Let me use my friends Foo, Bar and Xyzzy to explain what I mean. Here's a small example of what your typical Foo might look like:

public class Foo
{
  Bar bar;
  public Foo()
  {
    bar = new Bar();
  }
  // ...
}
This Foo apparently needs a Bar to function properly. There is nothing strange about this, but let's look at the assumptions implicit in this code:
  1. Bar is a class
  2. Bar should be instantiated using a constructor with no arguments
  3. Foo owns the Bar instance
  4. The Bar needs no further configuration to work properly
These are the assumptions made so far, and if any single one of them is false, the code has to be changed. For example, let's say that Bar is changed to require a Xyzzy. Then you have (basically) two options: either you use the same design, and let the Xyzzy instance be instantiated in the constructor of Bar, or you let Foo provide it. But if you provide it you will have to change the code to this:
public class Foo
{
  Bar bar;
  public Foo()
  {
    bar = new Bar(new Xyzzy());
  }
  // ...
}
which implies the same assumptions about Xyzzy as applied to Bar. An alternative solution would be:
public class Foo
{
  Bar bar;
  public Foo(Xyzzy xyzzy)
  {
    bar = new Bar(xyzzy);
  }
  // ...
}
This is even worse, because now the change in Bar has changed Foo, which in turn forces everyone who uses Foo to change as well. So, the failed assumption that Bar has a default constructor has led to some rather extreme consequences.

Let's say that the assumption about Foo being the only one interested in Bar is false. Then you will have to introduce a new accessor to facilitate the exposure of Bar:

public class Foo
{
  Bar bar;
  public Foo()
  {
    bar = new Bar();
  }
  public Bar getBar() { return bar; }
  // ...
}
This method really has nothing to do with the function of Foo, per se. It is there for the sole purpose of exposing Bar for others to use, and the reason for this is the failed assumption that only Foo is interested in the Bar instance.

And so on. In short, the implicit assumptions in the initial design of the seemingly simplistic code can have a huge impact on what happens in the future. In essence, it has become more costly to maintain it, and it is not very keen on being changed, since change becomes expensive. This is what I'd call a "closed" design. The code itself is rather stable, in the sense that it is so expensive to change it that any code change due to a change in requirements will most likely be done outside of Foo. If you instead use Dependency Injection, the code might look like the following:

public class Foo
{
  Bar bar;
  public Foo(Bar aBar)
  {
    bar = aBar;
  }
  // ...
}
This code has the following properties:
  1. Bar can be a class, an abstract class, or an interface
  2. Foo does not need to know how Bar is instantiated
  3. The Bar instance can be used by other components
  4. Bar can have any configuration needs
  5. The user of Foo needs to know how to provide a Bar
The first point is important in terms of design evolution. At first Bar might be a regular class, and as the design evolves it might be changed to an abstract class with many concrete implementations, or it could even be changed to an interface. Regardless of what happens there is no change to the Foo class.
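Here is a small Java sketch of that evolution, with hypothetical names. `Bar` has become an interface with two implementations, a production one and a recording test double. `Foo` keeps the constructor-injection shape of the open design, so neither change touches it. The `welcome` method is an invented operation that just gives `Foo` something to do.

```java
import java.util.ArrayList;
import java.util.List;

// Bar has evolved from a concrete class into an interface.
interface Bar {
    String greet(String name);
}

// A hypothetical production implementation.
class FriendlyBar implements Bar {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// A hypothetical test double that records the calls it receives.
class RecordingBar implements Bar {
    final List<String> calls = new ArrayList<>();

    public String greet(String name) {
        calls.add(name);
        return "stub";
    }
}

// Same shape as the open-design Foo: it is handed a Bar and never constructs one.
class Foo {
    private final Bar bar;

    Foo(Bar aBar) {
        bar = aBar;
    }

    // Delegates to whatever Bar it was given, without knowing its concrete class.
    String welcome(String user) {
        return bar.greet(user) + "!";
    }
}

class OpenDesignDemo {
    public static void main(String[] args) {
        System.out.println(new Foo(new FriendlyBar()).welcome("Ann")); // prints Hello, Ann!
        RecordingBar recorder = new RecordingBar();
        new Foo(recorder).welcome("Bob");
        System.out.println(recorder.calls); // prints [Bob]
    }
}
```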

In terms of ownership, Foo is now handed the instance so it is quite possible that others are using it as well. We do not need to expose it through a getter method in Foo if that is the case.

In short, there are no implicit assumptions about Bar, which means that Foo will never be changed due to a change in Bar. This is what I'd call an "open" design, since it is open to the possibilities of change. There is no built-in inertia, or resistance, to change. The code is also stable, but for the opposite reason: it encourages change, but is not affected by it.

So, is an "open" design preferable to a "closed" design? Well, I like to compare this to house building. Using a "closed" design is like building a house where the rooms have no doors. Whenever you decide that there needs to be a door between two rooms you punch out the wall and insert a door. It's not something you would do easily. On the other hand, using an entirely "open" design is like building a house where the rooms have no walls, only doors. This makes for great flexibility, but don't be surprised if the roof comes crashing down. The conclusion of this analogy is that you'd probably want both: the rigidity of walls combined with the flexibility of doors. In Pico terms, the foundation of the house is provided by the Container, while the rigidity is provided by a ContainerComposer. The composer is the place where all decisions are made, and essentially it has to know about all the things that Foo knew in the "closed" design version: whether Bar is a class or interface, how to instantiate Bar, and what parameters to provide to it. If you let it do this for all components in the system, then all of our assumptions have been put into one place, and if any assumption turns out to be wrong, that is the only place which would be affected by it.
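The composer idea can be sketched in plain Java. This is a hand-written stand-in, not PicoContainer's API, which the post does not show. It revisits the earlier scenario where Bar starts needing a Xyzzy. The components stay open, while a single hypothetical `Composer` class knows the concrete classes, the constructor arguments and which instances are shared.

```java
// Components are written in the "open" style; none of them constructs its dependencies.
class Xyzzy {
    String value() {
        return "xyzzy";
    }
}

interface Bar {
    String describe();
}

// Bar's implementation now needs a Xyzzy; only the composer has to know that.
class XyzzyBar implements Bar {
    private final Xyzzy xyzzy;

    XyzzyBar(Xyzzy xyzzy) {
        this.xyzzy = xyzzy;
    }

    public String describe() {
        return "bar using " + xyzzy.value();
    }
}

class Foo {
    private final Bar bar;

    Foo(Bar aBar) {
        bar = aBar;
    }

    String run() {
        return "foo with " + bar.describe();
    }
}

// A hypothetical hand-written composer (plain Java, not PicoContainer's API): the one
// "closed" place that picks concrete classes, passes constructor arguments and decides sharing.
class Composer {
    private final Xyzzy xyzzy = new Xyzzy();
    private final Bar bar = new XyzzyBar(xyzzy); // one shared Bar instance

    Foo foo() {
        return new Foo(bar);
    }

    // Other components can share the same Bar without Foo exposing a getter.
    Bar bar() {
        return bar;
    }

    public static void main(String[] args) {
        System.out.println(new Composer().foo().run()); // prints foo with bar using xyzzy
    }
}
```

If `XyzzyBar` later needed yet another collaborator, only `Composer` would change; `Foo` and its users would not.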

The conclusion of this is that all the components that you write should use the "open" design principles, whereas the composer would typically be done using the "closed" design principles. Yet, even this last part is not strictly necessary. You could make the composer scriptable, configurable, and even allow for it to work together with other composers which handle parts of the system. In essence, you could let your entire system be designed like a village: each house is internally "open", yet created by a composer with a varying degree of "closedness", for rigidity, and then let each such house (or subsystem) interact. This would most likely yield the best results not only at the point of construction, but also in terms of how well it can adapt to change in the future.

If you decide to follow these principles when designing systems, there are a number of new questions that arise. Should I make the houses large or small? Should I let them interact a lot or not at all? How many should I create? I believe this is where software engineering becomes a craft, and there really is a need for good software architects, who can "see" on a case by case basis the answers to these questions. But that's a different topic.

@ 06:03 AM EDT
 
 
 
 
Breaking news: Bush tells truth for the first time!
It seems like there's a first for everything, and now Bush has done it: told the truth! This awesome event happened at a bill signing, as reported by USA Today.
"Our enemies are innovative and resourceful, and so are we," Bush said. "They never stop thinking about new ways to harm our country and our people, and neither do we."

No one in Bush's audience of military brass or Pentagon chiefs reacted.

Finally a tiny bit of truth.
@ 03:13 AM EDT
 
 
 
 
Re: Warped weaving
This is a comment for the blog entry Warped weaving. I started writing it as a comment to the blog, but it grew so much that it seemed more appropriate as a separate entry.

The referenced blog entry has an interesting problem description, but I'm not sure I understand the conclusion. First of all, my own conclusion from the problem description is that you should probably limit yourself to one aspect weaver. He mentions "try to combine multiple aspects together", whereas it should probably read "try to combine multiple aspect weavers together", since the supposed problem is about one weaver not knowing about the other. If you use multiple aspects with a single weaver there is no such problem, assuming that the weaver does what he describes (ours, for instance, doesn't).

All that aside, the focus of that post is on compiler optimizations, and yet the conclusion seems to involve the complexity of the aspect being woven. Trying to estimate AOP performance from some simplistic examples of AOP, which aren't even characteristic of real usage, is a strange connection to make. In our own case the compiler optimizations aren't even central to performance, since we do dynamic weaving and care more about the scalability of the solution in terms of memory usage. Sure, having the invocations run fast is nice, and I am always looking for ways to increase the throughput, but if the system can't handle millions of objects, each having 5-10 stateful introductions, then what good is it? Those scalability and performance problems are dealt with using caching techniques rather than compiler optimizations.

But the main problem with this, and similar arguments about supposed performance issues when using AOP, is that it does not consider the alternative. If you are using aspects, and from your description it seems advice, or behaviour aspects, are the main target, then what is the alternative if you don't use AOP? As far as I can tell there are mainly two options: 1) write the code over and over in the main methods you would apply the advice to or 2) separate the code into another method, and have the main method call it. These two would yield the same semantics, but without using a weaver. With nr 1 the compiler should have the most information to be able to do proper optimizations, but it obviously is also an example of why you'd want AOP in the first place since it isn't maintainable for any real example.

Which leaves option nr 2. This has similar modularization possibilities to using an advice, except that the calls to the methods have to be sprinkled around the system, so it's again not really maintainable. But more importantly, it gives the compiler less room for code optimization, since it involves method calls instead of inlined code, source-wise. So the real question is: how do the compiler's optimization possibilities differ between a more ad-hoc approach to code modularization, such as option 2, and a more well-defined approach such as what an aspect weaver could do? I don't know, as I haven't investigated it much, but I wouldn't be surprised if they are fairly similar.
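Here is a hedged Java sketch that makes the two alternatives concrete. `ManualAccount` is option 2: the cross-cutting tracing is factored into one method, and every business method must remember to call it. The woven variant uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) as a stand-in for a dynamic weaver. It is not Oberg's weaver or AspectJ, and `Account`, `trace` and `log` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Account {
    void deposit(long amount);
    long balance();
}

class AopAlternatives {
    static final List<String> log = new ArrayList<>();

    // Option 2: the cross-cutting code lives in one method, but every
    // business method has to call it explicitly.
    static class ManualAccount implements Account {
        private long balance;
        public void deposit(long amount) { trace("deposit"); balance += amount; }
        public long balance() { trace("balance"); return balance; }
        private static void trace(String op) { log.add("manual:" + op); }
    }

    // Plain business code with no tracing calls at all.
    static class PlainAccount implements Account {
        private long balance;
        public void deposit(long amount) { balance += amount; }
        public long balance() { return balance; }
    }

    // A dynamic proxy standing in for an advice: the tracing code runs before
    // every interface method, without touching PlainAccount's source.
    static Account traced(Account target) {
        InvocationHandler advice = (proxy, method, args) -> {
            log.add("advice:" + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the business method's own exception
            }
        };
        return (Account) Proxy.newProxyInstance(
            Account.class.getClassLoader(), new Class<?>[] {Account.class}, advice);
    }

    public static void main(String[] args) {
        Account manual = new ManualAccount();
        manual.deposit(10);
        manual.balance();
        Account woven = traced(new PlainAccount());
        woven.deposit(10);
        woven.balance();
        // prints [manual:deposit, manual:balance, advice:deposit, advice:balance]
        System.out.println(log);
    }
}
```

The proxy adds reflective dispatch to every call, so it says nothing about how well a compiler or weaver could optimize woven code. It only shows where the cross-cutting code lives in each approach.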

And if that is the case, then I believe using AOP to do this modularization of cross-cutting concerns should be preferable, since it is after all designed to solve that particular problem. If anyone has more detailed info on this problem, please do let me know!

The original entry focuses on advice aspects, but if you consider introduction aspects then the compiler should have no trouble optimizing them to a similar degree as a normal class. After all, the only thing that has happened is that the woven class has gained a few more methods and interfaces.

@ 02:39 AM EDT
 
 
 
 
XSL scalability issues; StAX to the rescue!
As I described in a previous entry we are using XML and XSL as a way to do schema migration of our serialized databases. The technique works well, and was easy to develop.

Or at least, it worked well with small demo websites, where the XML files were small. I am now working with the database of one of our medium-sized clients, and it is a single 680 MB XML file. Running the XSL processor on that file was simply impossible: the processor has to load the whole document into memory before doing the transformation, so it dies horribly with an OutOfMemoryError about halfway through.

So I needed another approach that works better with large files. I considered SAX for about half a second, and then started looking for pull-oriented XML parsers. I ended up with StAX, an XML processing API developed through the JCP as JSR 173.

The parser itself and the API were not that difficult to use, and some quick tests showed that memory consumption was indeed constant, i.e. it streamed properly. However, rewriting my XSL transformations as Java code did not excite me at all.
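The constant memory consumption comes from the pull model: the application asks the parser for the next event, so only the current event has to be held in memory. A minimal sketch using the standard `javax.xml.stream` API (the JSR 173 API, bundled with Java SE since version 6); `StaxScan` and the `item` element name are hypothetical placeholders, and a `Reader` stands in for the large file.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxScan {
    // Counts elements with the given local name by pulling one event at a time;
    // memory use stays roughly constant no matter how many elements there are.
    static int countElements(Reader source, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();  // pull the next parsing event
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close();                 // releases parser state; does not close 'source'
        }
        return count;
    }
}
```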

So I ended up using a mix of the two! Since the XML stream is basically a sequence of object data chunks, I simply use StAX to retrieve enough events to form one complete chunk, make that chunk appear as a standalone XML document, and feed it through a SAX bridge into an XSL transform; the output is then redirected back into the StAX pipeline for writing to the output file. This way I get the best of both worlds: the scalability of StAX and the expressiveness of XSL for the transformations. The end solution is probably not as efficient as writing everything in plain Java using only StAX, but this way the transformation rules are much more maintainable, and it is much easier to understand what they actually do.
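The hybrid approach can be sketched roughly as follows. This is a simplified version under several assumptions: the chunk element name (`item`) and the stylesheet (which just renames `<name>` to `<title>`) are hypothetical, namespaces are ignored, and instead of a SAX bridge each chunk is buffered as a small string, transformed with a compiled `javax.xml.transform.Templates`, and re-parsed with StAX before being appended to the output. Memory use is then bounded by the largest chunk rather than by the whole file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChunkedMigration {
    // Hypothetical per-chunk stylesheet: identity copy, renaming <name> to <title>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='name'><title><xsl:apply-templates/></title></xsl:template>"
      + "</xsl:stylesheet>";

    static final XMLInputFactory IN = XMLInputFactory.newInstance();
    static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();

    // Streams 'input', running the stylesheet once per <chunkName> element.
    static String migrate(String input, String chunkName) throws XMLStreamException, TransformerException {
        Templates xsl = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL))); // compile once, reuse per chunk
        StringWriter result = new StringWriter();
        XMLEventReader in = IN.createXMLEventReader(new StringReader(input));
        XMLEventWriter out = OUT.createXMLEventWriter(result);
        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (e.isStartDocument() || e.isEndDocument()) continue; // write a bare fragment
            if (e.isStartElement() && e.asStartElement().getName().getLocalPart().equals(chunkName)) {
                String chunk = readChunk(e, in);                       // only this chunk is in memory
                StringWriter transformed = new StringWriter();
                xsl.newTransformer().transform(new StreamSource(new StringReader(chunk)),
                                               new StreamResult(transformed));
                copyInto(transformed.toString(), out);                 // back into the StAX output
            } else {
                out.add(e);                                            // pass through unchanged
            }
        }
        out.flush();
        out.close();
        return result.toString();
    }

    // Collects the events of one chunk (start tag to matching end tag) as a small document.
    static String readChunk(XMLEvent start, XMLEventReader in) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter w = OUT.createXMLEventWriter(buffer);
        w.add(start);
        int depth = 1;
        while (depth > 0) {
            XMLEvent e = in.nextEvent();
            if (e.isStartElement()) depth++;
            else if (e.isEndElement()) depth--;
            w.add(e);
        }
        w.flush();
        w.close();
        return buffer.toString();
    }

    // Re-parses a transformed chunk and appends its events, minus the document wrapper.
    static void copyInto(String xml, XMLEventWriter out) throws XMLStreamException {
        XMLEventReader r = IN.createXMLEventReader(new StringReader(xml));
        while (r.hasNext()) {
            XMLEvent e = r.nextEvent();
            if (!e.isStartDocument() && !e.isEndDocument()) out.add(e);
        }
        r.close();
    }
}
```

Compiling the stylesheet once into `Templates` and creating a cheap `Transformer` per chunk avoids re-parsing the XSL for every object in a large file.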

@ 11:03 AM EDT
 
 
 
 
Swedish hostage finally released!
The Swedish hostage has finally been released from the terrorist camp at Guantanamo Bay! He had been held captive there for about two and a half years by a Christian fundamentalist group, a.k.a. the US government, a.k.a. "the NeoCons", apparently for absolutely no reason at all, other than that it suited their agenda. As demands for a fair trial were raised, he was suddenly released in a timely manner. The Swedish government of course attributes it to the success of "silent diplomacy", but I have a feeling that the "threat" of a fair trial was far more important.

In an interview he describes the daily torture, which is nothing less than completely horrifying. It is everything that the Guantanamo camp terrorists claim their so-called enemies do to their captives, and then some. The word hypocrisy comes to mind.

Let's hope that the other unfortunate people who are still being held hostage are also released shortly. I am glad that we got our guy out of there, but it's still bad that so many others have to endure the daily abuse of the US so-called "soldiers". It's a shame.

You can read more about this and a short version of the interview here.

@ 04:57 PM EDT
 
 
 
 
Fahrenheit 9/11
I've watched Fahrenheit 9/11. Truth is a tricky thing to get right, and this movie sadly only tells half the story, which in effect makes it yet another layer of disinfo.

Moore tries to give hope where there is none. Moore tries to imply that removing Bush will make things alright. It won't. At this point, not much will. He is pointing the finger at the Saudis, who certainly are not saints, but the real terrorists can be found in a rather different direction.

Oh well. As a consequence I've removed the link to Moore's homepage, and also the one to Truth Uncovered (for similar reasons).

@ 04:51 PM EDT
 
 
 
 
Catching up

It's been an interesting month. When the details of how JBoss execs and developers had been using fake identities on forums to harass and stalk competitors and critical individuals were released, I had expected some of the responses. I had expected that JBossians would not admit it (and they haven't), that some people would be outraged, and that some people would say "oh, everyone does it" (to whom I say: "SHOW ME THE MONEY!"). I had not expected that a vast number of people just wanted it to "go away", to be able to dismiss it as just another "flame war" or some such. I could probably count on my two hands the comments about the whole affair that showed an understanding of what had happened and why it was wrong. The desire to go to sleep, or stay asleep, is strong, it seems.


The "response" from Marc Fleury was quite hilarious for three reasons. First, it really, really did not sound like he had written it. I know from past experience that his father is quite good at writing these kinds of things, so my guess is that he was responsible for this one and Marc just signed it. Second, it only took Marc a week to resume his behaviour, as he began posting as "Race Condition" over at TSS. Bad habits are hard to break. Third, the comments from the obviously-not "Outsider" are most amusing. A must-read.


In other news, JayView has posted an interview with me about AOP and our CMS product SiteVision. Blatant product-marketing alert aside, there are a couple of notes on the commercial aspects of AOP that might interest some of you.


For the past few weeks I've been working on something that probably deserves a longer blog entry: schema migration of serialized Java objects using XML and XSL. See, our CMS stores all data in a persistent hashtable (JISP and JDBM, moving towards JDBM only) using serialization. This is nice because it's very fast to store and load objects this way (no mappings!), and the development time is almost zero (no mappings!). The catch is that you're setting yourself up for a nice hangover: when the time comes that you want to change the object model substantially, you're going to have trouble reading the data and converting it to the new Java classes. For simple cases, serialization provides some hooks to help with this. For complex cases, such as ours, you really are quite stuck. The solution I found was to export the whole damn thing to XML using the most-timely-released XStream tool. It can handle XML serialization of most Java classes out of the box, and has a very nice API for plugging in custom converters for the special cases. After I had written converters for our special cases, in particular to handle references between objects using AOP proxies, I could export huge SiteVision 1.x databases to plain XML files. After that I sat down and finally learned the basics of XSL in order to convert the schema to our new classes (which in some cases was rather complicated), and then reversed the process and loaded the data into a SiteVision 2.x database. The nice thing is that, done this way, the conversion process never needs access to the Java classes. All it needs is the XML and the XSL. Niiiice.
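As an example of the simple-case serialization hooks, a class can define a private `readObject` method to patch up state that older serialized data lacks. A minimal sketch, assuming a hypothetical `PageRecord` class whose `tags` field was added in a later version: because deserialization does not run the constructor or field initializers of a `Serializable` class, a field missing from old data comes back as `null` unless the hook fills it in. Renamed fields, split classes and restructured object graphs are much harder to handle this way, which is the "quite stuck" case that motivated the XML export.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical persisted class, version 2: "tags" did not exist in version 1.
class PageRecord implements Serializable {
    private static final long serialVersionUID = 1L; // unchanged, so version 1 data still matches

    String title;
    List<String> tags = new ArrayList<>();           // initializer is NOT run on deserialization

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                       // read whatever fields the stream contains
        if (tags == null) {
            tags = new ArrayList<>();                 // old data: supply a default value
        }
    }
}

// Hypothetical helper: serializes and deserializes an object in memory,
// roughly what happens when a value is stored in and loaded from the hashtable.
class SerializationRoundTrip {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```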


Non-work-wise, I got to play paintball last week. Loads of fun, and I even pulled the most stupid stunt ever by doing a capture-the-flag run without a gun. Who needs it anyway? Just slows ya down :-) Anyway, twenty secs and some fancy dodging later, I/we had secured the flag. Tada! It's fun when stupid stunts work out.


And then I got married. :-)


It's been an interesting month.

@ 03:00 PM EDT
 
 
 
 
Crossing the line
Cameron and I wrote a brief statement about the situation in our community and how it affects all of us. You can read it online over at
