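Contents
Course outline
Levels of a computer system
Definition of computer architecture
Differences between computer architecture, organization, and implementation
What computer architecture and organization each cover (compared side by side)
Classification of computer architectures: Flynn's classification
Design principle for computer systems: Amdahl’s Law
Measuring computer performance
CPU time and CPI
Other metrics: MIPS and MFLOPS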
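Computer architecture takes a broader view than computer organization: it is mainly concerned with the efficiency of the machine as a whole.
1. Basic concepts of computer architecture
2. Instruction set
3. Memory system
4. Pipelining
5. Parallel processors and multiprocessors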
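Remember that the interface between hardware and software lies between the operating system and the instruction set: the operating system is the lowest layer of software, and the instruction set is the highest layer of hardware.
The microarchitecture implements the ISA.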
Architecture:
the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
– Amdahl, Blaauw, and Brooks, 1964
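Architecture is distinct from a computer's organization and implementation.
Computer architecture: the attributes of a computing system as seen by the machine language designer or the compiler designer.
(This is explained in more detail later.)
Explanation (example)
An application programmer only needs to see that the program runs as expected, whereas the designer also needs to see what the hardware actually does.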
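Transparency
Definition: a thing or attribute that really exists, with all its differences, but cannot be seen from a particular point of view is said to be "transparent" from that point of view.
(Example: a glass door wiped so clean that you do not notice it.)
A simple test: whatever a given role is responsible for managing is not transparent to that role; whatever it is not responsible for is transparent.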
Computer Architecture: the attributes of a computing system as seen by the machine language designer
Computer Organization: the logical implementation of the computer architecture
Computer Implementation: the physical implementation of the computer organization
(Memorize the three definitions above.)
Computer implementers: realize the design in physical hardware.
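Example 1
Answer: C, because the other three are the responsibility of the computer architect.
Example 2
Answer: C (A: implementers; B: organization designers; D: implementers)
Example 3
Answer: 1. logical implementation 2. physical implementation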
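Basic concepts
Instruction stream: the sequence of instructions that enter the system in order
Data stream: analogous to the instruction stream; a sequence of data items
Multiplicity: the largest possible number of instructions or data items that are at the same execution stage on the system's performance-bottleneck component
Based on the multiplicity of the instruction stream and the data stream, computer architectures fall into the following four classes (SISD, SIMD, MISD, MIMD):
(Source: Wikipedia page on Flynn's taxonomy)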
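Taking SISD as an example, the terms in the diagram are:
CU: control unit
PU: processing unit (hands operations to the ALU for execution)
CS: control stream (in effect, the sequence of instructions)
MM: memory module
(It is worth tracing through this process yourself.)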
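Supplement: where each class is used:
SISD: traditional single-processor machines (no parallelism)
SIMD: array and vector processors, including pipelined vector units
MISD: rare; cited mainly for systems such as space shuttle flight computers, where redundant units check that their results agree in order to improve fault tolerance
MIMD: multicore processors and distributed systems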
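The four-way split can be read as a lookup on the two multiplicities. The following is a minimal sketch, assuming each multiplicity is given as a positive integer; the function name `flynn_class` is made up for illustration and does not come from the notes.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given stream multiplicities.

    A multiplicity of 1 means "single" (S); anything larger means "multiple" (M).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("multiplicities must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# Illustrative (hypothetical) machines:
print(flynn_class(1, 1))  # one instruction stream, one data stream
print(flynn_class(1, 8))  # one instruction applied to eight data items
print(flynn_class(4, 4))  # four independent cores
```

The first call corresponds to a conventional uniprocessor, the second to a vector-style machine, and the third to a multicore system.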
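Feng's classification: classifies machines by the maximum number of bits and words they can process per unit time; not required for this course.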
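Example 4
Amdahl’s Law
Improving the performance of one part of a system improves the performance of the whole system, but only in proportion to how much that part is used.
The measure is the speedup ratio (Sp).
Computing Sp shows whether, and by how much, performance has improved. Formula: Sp = 1 / ((1 - Fe) + Fe / Re)
Fe: the fraction of total execution time taken by the operation being improved
Re: the factor by which the improved operation is sped up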
Example 5
A test program contains a large number of floating-point operations. To improve performance, two options are available:
Option 1 implements the floating-point square root (FPSQR) operation in hardware, which makes that operation 10 times faster.
Option 2 speeds up all floating-point (FP) operations by a factor of 2.
FPSQR accounts for 20% of the test program's total execution time, and FP operations account for 50% of it.
FPSQR is a special kind of FP operation, but each option is evaluated with its own share of the total execution time.
Solution:
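The comparison can be checked numerically. This is a sketch that assumes the usual form of Amdahl's Law, Sp = 1 / ((1 - Fe) + Fe / Re); the helper name `amdahl_speedup` is introduced here for illustration only.

```python
def amdahl_speedup(fe: float, re: float) -> float:
    """Overall speedup when a fraction `fe` of execution time is sped up `re` times."""
    return 1.0 / ((1.0 - fe) + fe / re)


# Option 1: FPSQR in hardware (Fe = 0.20, Re = 10)
sp_fpsqr = amdahl_speedup(0.20, 10)
# Option 2: all FP operations twice as fast (Fe = 0.50, Re = 2)
sp_fp = amdahl_speedup(0.50, 2)

print(f"FPSQR hardware: Sp = {sp_fpsqr:.2f}")
print(f"Faster FP:      Sp = {sp_fp:.2f}")
print("Better option:", "FP" if sp_fp > sp_fpsqr else "FPSQR")
```

Under these assumptions the FPSQR option gives a speedup of about 1.22 and the FP option about 1.33, so speeding up all FP operations is the better choice even though its local speedup factor is smaller.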
Example 6
Suppose the speed of one function is increased 10 times, and that function accounts for 40% of the total system execution time. By how much does the performance of the whole system improve once this enhancement is adopted?
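One way to work this out, again assuming Sp = 1 / ((1 - Fe) + Fe / Re), is sketched below. The extra `limit` value is an addition for illustration: it shows the ceiling on Sp when Re grows without bound.

```python
def amdahl_speedup(fe: float, re: float) -> float:
    """Overall speedup when a fraction `fe` of execution time is sped up `re` times."""
    return 1.0 / ((1.0 - fe) + fe / re)


fe, re = 0.40, 10
sp = amdahl_speedup(fe, re)   # 1 / (0.6 + 0.04)
limit = 1.0 / (1.0 - fe)      # upper bound on Sp as Re grows without bound

print(f"Sp = {sp:.4f}")
print(f"Upper bound for Fe = {fe}: {limit:.4f}")
```

The overall speedup comes out at about 1.56, and no amount of further work on that one function could push it past roughly 1.67, because the remaining 60% of the execution time is untouched.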
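Computing CPU time: when the instruction frequencies are given, multiply each frequency by its CPI and add up the products; when they are not given, first obtain them by dividing each instruction count by the total instruction count.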
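Both cases can be sketched in a few lines. The instruction mix, the counts, and the 1 MHz clock below are hypothetical values chosen for illustration, not data from the notes, and the function names are made up. CPU time is taken as instruction count × CPI / clock frequency.

```python
def average_cpi(mix: dict) -> float:
    """mix maps an instruction class to (frequency, CPI); frequencies must sum to 1."""
    total = sum(freq for freq, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in mix.values())


def frequencies_from_counts(counts: dict) -> dict:
    """Divide each instruction count by the total count to get frequencies."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def cpu_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instruction count * CPI / clock frequency."""
    return instruction_count * cpi / clock_hz


# Case 1: frequencies given directly (hypothetical mix).
mix = {"alu": (0.5, 1), "load": (0.2, 2), "store": (0.1, 2), "branch": (0.2, 2)}
cpi = average_cpi(mix)

# Case 2: only counts given, so divide first (hypothetical counts).
counts = {"alu": 500, "load": 200, "store": 100, "branch": 200}
cycles = {"alu": 1, "load": 2, "store": 2, "branch": 2}
freqs = frequencies_from_counts(counts)
cpi_from_counts = average_cpi({k: (freqs[k], cycles[k]) for k in counts})

print(cpi, cpi_from_counts)
print(cpu_time(sum(counts.values()), cpi_from_counts, 1e6))  # assumed 1 MHz clock
```

Both routes give the same average CPI, since the counts were chosen to match the frequencies.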
Example 7
0.5 + 0.2 + 0.4 + 0.4 = 1.5
Example 8
A computer uses only Load/Store instructions to read or write memory. According to the results of a program-tracing experiment, the proportion of each instruction type and its CPU clock cycles are as follows:
Find the average CPI.
Example 9
Suppose FP operations make up 25% of instructions with CPI_FP = 4, and all other instructions have CPI_other = 1.33. FPSQR operations make up 2% of instructions with CPI_FPSQR = 20.
There are two options for improvement.
Option 1 doubles the speed of FP operations, so that CPI_FP = 2.
Option 2 speeds up FPSQR 10 times, so that CPI_FPSQR = 2.
Compare the two options by their resulting CPI.
Option 1 is better (it gives the lower CPI).
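The comparison can be reproduced as follows. This sketch assumes that the "other" instructions make up the remaining 75% and that FPSQR is counted inside the 25% FP share, so that option 2 simply lowers the original CPI by the cycles saved on FPSQR. The variable names are illustrative.

```python
FP_SHARE, CPI_FP = 0.25, 4.0
OTHER_SHARE, CPI_OTHER = 0.75, 1.33      # assumed: everything that is not FP
FPSQR_SHARE, CPI_FPSQR = 0.02, 20.0      # assumed: part of the FP share

# Original average CPI
cpi_original = FP_SHARE * CPI_FP + OTHER_SHARE * CPI_OTHER

# Option 1: CPI of all FP operations drops from 4 to 2
cpi_option1 = FP_SHARE * 2.0 + OTHER_SHARE * CPI_OTHER

# Option 2: CPI of FPSQR drops from 20 to 2; subtract the cycles saved
cpi_option2 = cpi_original - FPSQR_SHARE * (CPI_FPSQR - 2.0)

print(f"original: {cpi_original:.2f}")
print(f"option 1: {cpi_option1:.2f}")
print(f"option 2: {cpi_option2:.2f}")
```

With these assumptions the original CPI is about 2.00, option 1 gives about 1.50, and option 2 gives about 1.64, which agrees with the conclusion that option 1 is better.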
MIPS: million instructions per second. MIPS = Rc / (CPI * 10^6)
Rc: clock frequency (in Hz)
MFLOPS: million floating-point operations per second (less important)
Example 10
A processor has a clock frequency of 15 MHz and executes the test program at 10 MIPS. Assume that each memory access takes 1 clock cycle.
(1) What is the CPI of the processor?
(2) Suppose the clock frequency is raised to 30 MHz but the memory speed stays the same, so each memory access now takes 2 clock cycles. If 30% of the instructions in the test program need one memory access, 5% need two memory accesses, and the rest need none, find the MIPS rate of the test program on the improved processor.
Solution:
- CPI = 15*10^6 / (10*10^6) = 1.5
- CPI' = 1.5 + 0.3*(2-1) + 0.05*(4-2) = 1.9
- MIPS = 30*10^6 / (1.9*10^6) = 15.79
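The same steps can be checked in code. This is a sketch using the relation MIPS = clock frequency / (CPI × 10^6); the function names are introduced here for illustration.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS rating from clock frequency (Hz) and average CPI."""
    return clock_hz / (cpi * 1e6)


def cpi_from_mips(clock_hz: float, mips_rate: float) -> float:
    """Average CPI from clock frequency (Hz) and a measured MIPS rating."""
    return clock_hz / (mips_rate * 1e6)


# (1) 15 MHz clock, 10 MIPS
cpi_old = cpi_from_mips(15e6, 10)

# (2) Each memory access now costs 2 cycles instead of 1.
#     30% of instructions make one access, 5% make two.
extra_cycles = 0.30 * (1 * 2 - 1 * 1) + 0.05 * (2 * 2 - 2 * 1)
cpi_new = cpi_old + extra_cycles
mips_new = mips(30e6, cpi_new)

print(f"CPI  = {cpi_old:.2f}")
print(f"CPI' = {cpi_new:.2f}")
print(f"MIPS = {mips_new:.2f}")
```

Doubling the clock rate raises the rating from 10 MIPS to only about 15.79 MIPS, because the slower relative memory access pushes the average CPI from 1.5 up to 1.9.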
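Arithmetic performance average: the arithmetic mean of the individual results
Geometric performance average: the n-th root of the product of the n results
Harmonic performance average: the reciprocal of the arithmetic mean of the reciprocals of the results (for execution rates, this is the reciprocal of the arithmetic mean of the execution times)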
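The three averages can be compared on the same data. The rates below are hypothetical values (for example, MIPS ratings on three test programs), and the helper functions are a plain sketch of the definitions rather than any particular benchmark suite's method.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product of the n values."""
    return math.prod(xs) ** (1.0 / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(xs) / sum(1.0 / x for x in xs)


rates = [1.0, 2.0, 4.0]  # hypothetical execution rates on three programs

am = arithmetic_mean(rates)
gm = geometric_mean(rates)
hm = harmonic_mean(rates)
print(f"arithmetic = {am:.3f}, geometric = {gm:.3f}, harmonic = {hm:.3f}")
```

For positive values the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the choice of average changes how strongly a single slow program pulls the summary down.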