Modern Microprocessors - A 90-Minute Guide

Reposted from: http://www.lighterra.com/papers/modernmicroprocessors/

by Jason Robert Carey Patterson, last updated Aug 2012 (orig Feb 2001)

WARNING: This article is meant to be informal and fun!

Okay, so you're a CS graduate and you did a hardware/assembly course as part of your degree, but perhaps that was a few years ago now and you haven't really kept up with the details of processor designs since then.

In particular, you might not be aware of some key topics that developed rapidly in recent times...

    pipelining (superscalar, OoO, VLIW, branch prediction, predication)
    multi-core & simultaneous multithreading (SMT, hyper-threading)
    SIMD vector instructions (MMX/SSE/AVX, AltiVec)
    caches and the memory hierarchy

Fear not! This article will get you up to speed fast. In no time you'll be discussing the finer points of in-order vs out-of-order, hyper-threading, multi-core and cache organization like a pro.

But be prepared – this article is brief and to-the-point. It pulls no punches and the pace is pretty fierce (really). Let's get into it...

More Than Just Megahertz

The first issue that must be cleared up is the difference between clock speed and a processor's performance. They are not the same thing. Look at the results for processors of a few years ago (the late 1990s)...

    Processor              SPECint95   SPECfp95
    195 MHz MIPS R10000       11.0       17.0
    400 MHz Alpha 21164       12.3       17.2
    300 MHz UltraSPARC        12.1       15.5
    300 MHz Pentium-II        11.6        8.8
    300 MHz PowerPC G3        14.8       11.4
    135 MHz POWER2             6.2       17.6

A 195 MHz MIPS R10000, a 300 MHz UltraSPARC and a 400 MHz Alpha 21164 were all about the same speed at running most programs, yet they differed by a factor of two in clock speed. A 300 MHz Pentium-II was also about the same speed for many things, yet it was about half that speed for floating-point code such as scientific number crunching. A PowerPC G3 at that same 300 MHz was somewhat faster than the others for normal integer code, but still far slower than the top three for floating-point. At the other extreme, an IBM POWER2 processor at just 135 MHz matched the 400 MHz Alpha 21164 in floating-point speed, yet was only half as fast for normal integer programs.

How can this be? Obviously, there's more to it than just clock speed – it's all about how much work gets done in each clock cycle. Which leads to...
(Reposter's note: what matters most is how much work the CPU does in each clock cycle, not just how high its clock frequency is.)
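
To make the point concrete, here is a rough Python sketch that divides each benchmark score from the table above by the clock speed. Score per MHz is only a crude stand-in for "work done per cycle" (it is not a measured instructions-per-cycle figure), and the names `results` and `per_mhz` are made up for this illustration.

```python
# (clock in MHz, SPECint95, SPECfp95), copied from the table above.
results = {
    "MIPS R10000": (195, 11.0, 17.0),
    "Alpha 21164": (400, 12.3, 17.2),
    "UltraSPARC": (300, 12.1, 15.5),
    "Pentium-II": (300, 11.6, 8.8),
    "PowerPC G3": (300, 14.8, 11.4),
    "POWER2": (135, 6.2, 17.6),
}


def per_mhz(name):
    """Return (SPECint95 per MHz, SPECfp95 per MHz) for one processor.

    This is a hypothetical helper and only a rough proxy for work per cycle.
    """
    mhz, specint, specfp = results[name]
    return specint / mhz, specfp / mhz


if __name__ == "__main__":
    print(f"{'Processor':12s} {'int/MHz':>8s} {'fp/MHz':>8s}")
    for name in results:
        int_score, fp_score = per_mhz(name)
        print(f"{name:12s} {int_score:8.3f} {fp_score:8.3f}")
```

Run this way, the slowest-clocked chips in the table come out with the highest floating-point score per MHz, which is exactly the "more than just megahertz" effect.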

Pipelining & Instruction-Level Parallelism


Instructions are executed one after the other inside the processor, right? Well, that makes it easy to understand, but that's not really what happens. In fact, that hasn't happened since the middle of the 1980s. Instead, several instructions are all partially executing at the same time.

Consider how an instruction is executed – first it is fetched, then decoded, then executed by the appropriate functional unit, and finally the result is written into place. With this scheme, a simple processor might take 4 cycles per instruction (CPI = 4)...
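
The strictly sequential scheme just described can be sketched as a toy Python model. It assumes each of the four steps takes exactly one clock cycle and that no two instructions ever overlap; the names `STAGES`, `run_sequential` and `cpi` are invented for this example and do not describe any real processor.

```python
# Toy model of a non-pipelined processor: each instruction passes through
# all four stages before the next one starts.
# Assumption: every stage takes exactly one clock cycle.
STAGES = ["fetch", "decode", "execute", "writeback"]


def run_sequential(num_instructions):
    """Return a list of (cycle, instruction index, stage) tuples."""
    timeline = []
    cycle = 0
    for instr in range(num_instructions):
        for stage in STAGES:
            cycle += 1
            timeline.append((cycle, instr, stage))
    return timeline


def cpi(num_instructions):
    """Cycles per instruction: total cycles divided by instructions run."""
    timeline = run_sequential(num_instructions)
    total_cycles = timeline[-1][0]
    return total_cycles / num_instructions


if __name__ == "__main__":
    for cycle, instr, stage in run_sequential(2):
        print(f"cycle {cycle}: instruction {instr} in {stage}")
    print("CPI =", cpi(2))
```

In this model only one stage is busy in any given cycle and the other three sit idle, which is the waste that overlapping instructions is meant to remove.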
