Personal computers(PCs) emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Servers are the modern form of what were once much larger computers,and are usually accessed only via a network.
Supercomputers are that a class of computers with the highest performance and cost.
Embedded computers are the largest class of computers and span the widest range of applications and performance.
a. performance via pipelining
b.dependability via redundancy
c. use abstraction to simplify design
d. make the common case fast
e. performance via prediction
f. design for Moore’s Law
g. performance via parallelism
h. hierarchy of memories
High-level language program ------(Compiler)-------> Assembly language program ------(Assembler)------> Binary machine language program
a. 1280 * 1024 * 8 * 3 / 8 = 3932160 bytes
b. (3932160 * 8) / (100 * 2^20) = 0.3 s
a.
Execution time = Instruction count * CPI / clock rate
result = Instruction count / Execution time = clock rate / CPI
result1 = 3 / 1.5 = 2
result2 = 2.5 / 1.0 = 2.5
result3 = 4.0 / 2.2 = 1.82
so processor B has the highest performance expressed in instructions per second.
b.
cycles1 = 10 * 3 * 10^9 = 3.0E10, instructions1 = cycles1 / 1.5
cycles2 = 10 * 2.5 * 10^9 = 2.5E10, instructions2 = cycles2 / 1.0
cycles3 = 10 * 4 * 10^9 = 4.0E10, instructions3 = cycles3 / 2.2
c.
assuming original execution time is T, instruction count is I, original clock rate is C.
C’ = I * 1.2 * CPI / 0.7 * T = 12 / 7 * C
a.
The global CPI of P1 = (0.1 * 1 + 0.2 * 2 + 0.5 * 3 + 0.2 * 3) / 1 = 2.6
The global CPI of P2 = (0.1 * 2 + 0.2 * 2 + 0.5 * 2 + 0.2 * 2) / 1 = 2
b.
The clock cycles of P1 = 1. 0 * 10^6 * 2.6 = 2.6E6
The clock cycles of P2 = 1.0 * 10^6 * 2 = 2.0E6
a.
Execution time = Instruction count * CPI * Clock cycle time
CPI of A = 1.1, CPI of B = 1.5 / 1.2 = 1.25
b.
A : B = 1.2 * 1.25 / 1.1 = 15 / 11
c.
Compare with A: Execution time ratio= 1.1 / 0.66 = 5 / 3
Compare with B: Execution time ratio = 1.25 / 0.66 = 125 / 66
dynamic power ∝ 1/2 * Capacitive load * Voltage^2 * Frequency switched
The capacitive load of the Pentium 4 Prescott processor
= 90 / (1/2 * 1.25^2 * 3.6 * 10^9) = 32 nF
The capacitive load of the Core i5 Ivy Bridge
= 40 / (1/2 * 0.9^2 * 3.4 * 10^9) = 29 nF
The percentage of the total dissipated power comprised by static power for the Pentium 4 Prescott processor = 10 / 100 = 0.1 = 10%
The ratio of static power to dynamic power for the Pentium 4 Prescott processor = 10 / 90 = 1 / 9
The percentage of the total dissipated power comprised by static power for the Core i5 Ivy Bridge = 30 / 70 = 0.43 = 43%
The ratio of static power to dynamic power for the Core i5 Ivy Bridge processor = 30 / 40 = 3 / 4
voltage = power / current
so U’ = 0.9U
On 1 processor:
T1 = (2.56 * 10^9 * 1 + 1.28 * 10^9 * 12 + 2.56 * 10^8 * 5) / (2 * 10^9)
= 9.6 s
On 2 processors:
T2 = (2.56 / 1.4 * 1 + 1.28 / 1.4 * 12 + 2.56 * 0.5) / 2 = 5.86 s
On 4 processors:
T3 = (2.56 / 2.8 * 1 + 1.28 / 2.8 * 12 + 2.56 * 0.5) / 2 = 3 s
On 8 processors:
T4 = (2.56 / 5.6 * 1 + 1.28 / 5.6 * 12 + 2.56 * 0.5) / 2 = 1.54 s
On 1 processor:
T1 = (2.56 * 2 + 1.28 * 12 + 2.56 * 0.5 ) / 2 = 10.88 s
On 2 processors:
T2 = (2.56 / 1.4 * 2 + 1.28 / 1.4 * 12 + 2.56 * 0.5) / 2 = 5.96 s
On 4 processors:
T3 = (2.56 / 2.8 * 2 + 1.28 / 2.8 * 12 + 2.56 * 0.5) / 2 = 3.12 s
On 8 processors:
T4 = (2.56 / 5.6 * 2 + 1.28 / 5.6 * 12 + 2.56 * 0.5) / 2 = 1.64 s
assuming the CPI of load/store instructions should be reduced to x
2.56 + x * 12 + 2.56 * 0.5 = 2.56 / 2.8 + 1.28 / 2.8 *12 + 2.56 * 0.5
x = 0.58
so the CPI of load/store instructions should be reduced to 5.8E8
Cost per die = Cost per wafer / (Dies per wafer * Yield)
Dies per wafer = Wafer area / Die area
Yield = 1 / (1 + (Defects per area * Die area / 2))^2
for a 15cm diameter wafer:
yield = 1 / (1 + 0.02 * 7.5 * 7.5 / 84 / 2)^2 = 0.9867 = 98.67%
for a 20cm diameter wafer:
yield = 1 / (1 + 0.031 * 10 * 10 / 100 / 2)^2 = 0.9697 = 96.97%
for a 15cm diameter wafer:
cost per die = 12 / (84 * 0.9867) = 0.145
for a 20cm diameter wafer:
cost per die = 15 / (100 * 0.9697) = 0.155
for a 15cm diameter wafer:
die area = 7.5 * 7.5 / (1.1 * 84) = 0.61 cm^2
yield = 1 / (1 + 0.02 * 1.15 * 0.61 / 2)^2 = 0.9861 = 98.61%
for a 20cm diameter wafer:
die area = 10 * 10 / (1.1 * 100) = 0.91 cm^2
yield = 7 / (1 + 0.031 * 1.15 * 0.91 / 2)^2 = 0.968 = 96.8%
√代表根号
when the yield is 0.92:
defects per area = 2 * (√(1/yield) - 1) / die area
= 2 * (√(1/0.92) - 1) / 2 = 0.0426 cm^2
when the yield is 0.95
defects per area = 2 * (√(1/0.95) - 1) / 2 = 0.026 cm^2
CPI = Execution time / (Instruction count * clock cycle time)
= 750 / (2.389E12 * 0.333E-9) = 0.94
ratio = 9650 / 750 = 12.9
T’= 1.1T
T’ = 1.1 * 1.05 * T = 1.155T
ratio = 9650 / (750 * 1.155) = 11.1
CPI = 700 * 4.0E9 / (0.85 * 2.389E12) = 1.38
CPI = k * clock rate, but k != 1
750 - 700 = 50 s
assuming the number of instructions is x.
x = Execution time * clock rate / CPI = 9.6E-7 * 0.9 * 4.0E9 / 1.61 = 536.6 = 537
assuming the clock rate is x
x = 537 * 1.61 / (9.6E-7 * 0.9 * 0.9) = 1.1GHz
assuming the clock rate is x
x = (9.6E-7 * 3.0E9 / 1.61) * 1.61 * 0.85 / (9.6E-7 * 0.8) = 3.1875GHz
P1:
Execution time = 5.0E9 * 0.9 / 4.0E9 = 1.125 s
P2:
Execution time = 1.0E9 * 0.75 / 3.0E9 = 0.25 s
“The computer with the largest clock rate as having the highest performance” is false.
when processor P1 is executing a sequence of 1.0E9 instructions
P1:
Execution time = 1.0E9 * 0.9 / 4.0E9 = 0.225 s
P2:
The number of instructions = 0.225 * 3.0E9 / 0.75 = 0.9E9
P1:
Execution time = 1.125 s
MIPS = 5.0E9 / (1.125 * 1.0E6) = 4444
P2:
Execution time = 0.25 s
MIPS = 1.0E9 / (0.25 * 1.0E6) = 4000
“The processor with the largest MIPS has the largest performance” is false.
P1:
MFLOPS = 5.0E9 * 0.4 / (1.125 * 1.0E6) = 1777
P2:
MFLOPS = 1.0E9 * 0.4 / (0.25 * 1.0E6) = 1600
assuming the original time is T, the new time is T’
Execution time for INT operations = 250 - 70 - 85 - 40 = 55 s
T’ = 70 * 0.8 + 85 + 40 + 55 = 236 s
ratio = (T - T’) / T = (250 - 236) / 250 = 0.056 = 5.6%
ratio = 20%
assuming it can and the new time of branch instructions is T’’’
T’’’ = 250 * 0.8 - 70 - 85 - 55 = -10 s
so the total time can not be reduced by 20% by reducing only the time for branch instructions.
Execution time = (5E7 * 1 + 1.1E8 * 1 + 8E7 * 4 + 1.6E7 * 2) / 2E9 = 0.256 s
assuming the new CPI of FP instructions is CPI’
CPI’ = 0.256 * 2 / (1.1E8 + 8E7 * 4 + 1.6E7 * 2)) / 5E7 * 2E9 = 11.24
assuming the new CPI of L/S instructions is CPI’’
CPI’’ = 0.256 * 2 * 2E9 / (5 + 11 + 3.2) / 1E7 / 8 = 10.4
assuming the new time of execution time is T
T = (5E7 * 0.6 * 1 + 1.1E8 * 0.6 * 1 + 8E7 * 4 * 0.7 + 1.6E7 * 2 * 0.7) / 2E9 = 0.1712 s
On 2 processors:
T2 = 100 / 2 + 4 = 54 s
On 4 processors:
T4 = 100 / 4 + 4 = 29 s
On 8 processors:
T8 = 100 / 8 + 4 = 16.5 s
On 16 processors:
T16 = 100 / 16 + 4 = 10.25 s
On 32 processors:
T32 = 100 / 32 + 4 = 7.125 s
On 64 processors:
T64 = 100 / 64 + 4 = 5.5625 s
On 128 processors:
T128 = 100 / 128 + 4 = 4.78125 s