Question 4
A single-issue processor uses Tomasulo's algorithm in its floating-point unit, which has one adder and one multiplier, each with its own set of reservation stations. There is only one CDB, and a broadcast on this CDB takes an entire cycle. The processor is executing the following sequence of instructions, and for each instruction we show the cycle in which it is fetched, decoded, issued, begins to execute, and writes its result.
Instr  Operation       Fetch  Decode  Issue  Execute  Write result
I1     MUL R1,R2,R2    1      2       3      4        8
I2     ADD R1,R1,R2    2      3       4      9        10
I3     MUL R2,R2,R3    3      4       5      8        13
I4     ADD R3,R1,R1    4      5       6      11       12
I5     MUL R1,R1,R1    5      6       7      12       16
I6     ADD R2,R3,R4    6      7       11     ??       ??
I7     ADD R1,R5,??    7      8       13     17       18

1. What is the latency of the multiplier? 4
2. Is the multiplier pipelined? No
3. How many reservation stations are there for the adder? 2
4. In which cycle does I6 begin to execute? 13
5. Which register does ?? represent in I7? R1
6. If the priority for using the CDB depends on the type of instruction, then between ADD and MUL the priority for using the CDB goes to ___? ADD
Question 8 (textbook, page 92)
Loop: LD    R1,0(R2)     ; load R1 from address 0+R2
      DADDI R1,R1,#1     ; R1=R1+1
      SD    R1,0(R2)     ; store R1 at address 0+R2
      DADDI R2,R2,#4     ; R2=R2+4
      DSUB  R4,R3,R2     ; R4=R3-R2
      BNEZ  R4,Loop      ; branch to Loop if R4!=0
Assume that the initial value of R3 is R2 + 396.
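Answer: (1) From the problem statement, the pipeline space-time diagram for the instruction sequence is as follows (rows are clock cycles, columns are instruction numbers):

Cycle   1    2    3    4    5    6    1
1       IF
2       ID   IF
3       EX
4       ME
5       WB   ID   IF
6            EX
7            ME
8            WB   ID   IF
9                 EX   ID   IF
10                ME   EX
11                WB   ME
12                     WB   ID   IF
13                          EX
14                          ME
15                          WB   ID   IF
16                               EX
17                               ME
18                               WB   IF
19                                    ID
20                                    EX
21                                    ME

Total clock cycles: 17*98+18 = 1684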
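The cycle count follows from the loop bounds: R2 advances by 4 each iteration until it reaches R3 = R2 + 396, so the loop runs 396/4 = 99 times. As a minimal sketch (the function name and parameters are illustrative, not from the textbook), the first 98 iterations each cost the steady-state interval between successive loop starts, and the last iteration runs until its final instruction completes:

```python
def total_cycles(cycles_per_iteration, last_iteration_cycles,
                 r3_minus_r2=396, stride=4):
    """Total cycles for the loop, given the per-iteration interval read off
    the pipeline diagram and the length of the final iteration."""
    iterations = r3_minus_r2 // stride  # 396 / 4 = 99 iterations
    return cycles_per_iteration * (iterations - 1) + last_iteration_cycles

# Values read off the diagram: a new iteration starts every 17 cycles,
# and the last iteration takes 18 cycles to drain.
print(total_cycles(17, 18))  # 1684
```

The same function can be reused for any other pipeline configuration by plugging in the interval and final-iteration length read off its diagram.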
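(2) From the problem statement, the pipeline space-time diagram for the instruction sequence is as follows (S = stall, Miss = slot lost to the branch):

Cycle   1    2    3    4    5    6    7
1       IF
2       ID   IF
3       EX   ID   IF
4       ME   S    S
5       WB   EX   ID   IF
6            ME   EX   ID   IF
7            WB   ME   EX   ID   IF
8                 WB   ME   EX   ID   IF
9                      WB   ME   EX   Miss
10                          WB   ME   Miss
11                               WB   IF
12                                    ID
13                                    EX
14                                    ME
15                                    WB

Total clock cycles: 10*98+11 = 991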
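(3) From the problem statement, the pipeline space-time diagram for the instruction sequence is as follows:

Cycle   1    2    3    4    5    6    1
1       IF
2       ID   IF
3       EX   ID   IF
4       ME   EX   ID   IF
5       WB   ME   EX   ID   IF
6            WB   ME   EX   ID   IF
7                 WB   ME   EX   ID   IF
8                      WB   ME   EX   ID
9                           WB   ME   EX
10                               WB   ME
11                                    WB

Total clock cycles: 6*98+10 = 598

Question 9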
9a) What is the effective access time of a cache memory system in which there is a 2-way set-associative cache with the following parameters?
Parameter                         Value
number of sets                    1024 sets
line size                         16 words
cache access time                 15 ns/line
main memory access time           70 ns/word
main memory address space size    256M words
cache hit rate                    95%
Label the fields of the memory address below used to access the cache and indicate the size of each field (in number of bits). Assume that memory is word-addressed.
Tag : _14_ bits
Index : _10_ bits
Offset : _4_ bits
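The field sizes follow from the parameters: 256M words = 2^28 words gives a 28-bit word address, a 16-word line needs 4 offset bits, 1024 sets need 10 index bits, and the tag takes what is left. A minimal sketch of that arithmetic, assuming all sizes are powers of two (the function name is illustrative):

```python
def address_fields(address_space_words, num_sets, line_words):
    """Return (tag, index, offset) widths in bits for a word-addressed cache.
    Assumes every argument is a power of two."""
    address_bits = address_space_words.bit_length() - 1  # log2 of a power of two
    offset_bits = line_words.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 256M words, 1024 sets, 16-word lines
print(address_fields(256 * 2**20, 1024, 16))  # (14, 10, 4)
```

The associativity does not appear in the calculation because the question gives the number of sets directly; the index selects a set, not a line within it.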
9b) What is the effective access time of a cache memory system in which there is a direct-mapped level 1 (L1) cache and a fully associative level 2 (L2) cache, with the following parameters?
Parameter                  Value
L1 number of sets          128 sets
L1 line size               4 words
L1 cache access time       10 ns/line
L2 line size               8 words
L2 cache access time       20 ns/line
main memory access time    70 ns/word
main memory size           256M words
L1 cache hit rate          95%
L2 cache hit rate          89%
Label the fields of the memory address below used to access the L1 cache and indicate the size of each field (in number of bits). Assume that memory is word-addressed.
Tag : _19_ bits
Index : _7_ bits
Offset : _2_ bits
Label the fields of the memory address below used to access the L2 cache and indicate the size of each field (in number of bits). Assume that memory is word-addressed.
Tag : _25_ bits
Index : _0_ bits
Offset : _3_ bits
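To see how the same 28-bit word address is carved up differently by the two caches, here is a small sketch that splits an address using the field widths given in the answers. The function name and the sample address 525 are hypothetical, chosen only for illustration; the fully associative L2 has a single set, so its index field is empty and always reads as 0.

```python
def split_address(addr, index_bits, offset_bits):
    """Split a word address into (tag, index, offset) for the given field widths."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (index_bits + offset_bits)
    return tag, index, offset

addr = 525  # hypothetical word address: binary 1 0000011 01

# L1: direct-mapped, 7 index bits, 2 offset bits
print(split_address(addr, 7, 2))  # (1, 3, 1)

# L2: fully associative, 0 index bits, 3 offset bits
print(split_address(addr, 0, 3))  # (65, 0, 5)
```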
Question 11
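A simple shared-memory cache-coherent machine has four processors, no virtual-to-physical translation, and 16-bit (physical) addresses. Each processor has one L1 data cache and no L2 cache. Each L1 cache is direct-mapped with four 64-byte blocks (each cache is 256 bytes in total), and the caches use the MESI coherence protocol to stay coherent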