First, thanks for sharing; it was really helpful. I think there is a mistake with the second instruction, DADDI R1, R1, #8: since there is only one CDB, two instructions can't write to the CDB in the same cycle.
No, it is not in the same cycle.
@mhd-em6yt Isn't it the 11th cycle for both CDB writes? Or is the later one stalled to the 12th cycle?
Yes, he made a mistake. The later DADDI should be stalled until the 12th cycle because of the structural hazard on the CDB.
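To make the stall concrete, here is a minimal sketch of single-CDB arbitration. It assumes the older instruction wins the bus when two results are ready in the same cycle, and that the colliding pair is the L.D and the later DADDI, both ready in cycle 11 as discussed above. The function name `schedule_cdb_writes` is hypothetical and is not taken from the video.

```python
def schedule_cdb_writes(ready):
    """Assign each instruction a CDB write cycle on a single shared bus.

    ready: list of (name, earliest_cycle) pairs in program order.
    Assumption: the older instruction has priority; a younger one that
    finds the bus busy is pushed to the next free cycle.
    """
    busy_cycles = set()
    write_cycle = {}
    for name, cycle in ready:
        # Stall while another instruction already owns the CDB this cycle.
        while cycle in busy_cycles:
            cycle += 1
        busy_cycles.add(cycle)
        write_cycle[name] = cycle
    return write_cycle


# Both results are ready in cycle 11; only one can broadcast.
print(schedule_cdb_writes([("L.D", 11), ("DADDI", 11)]))
# → {'L.D': 11, 'DADDI': 12}
```

With two CDBs the inner loop would allow two writers per cycle, and the DADDI would not stall.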
Isn't it better to prioritise loads over stores, since later instructions depend on the data a load returns, whereas nothing depends on a store? In the first example, this would bring the CPI to 1.2. Or does the memory consistency model not allow this?
It seems that the memory access latency is not taken into account when the memory access instructions execute. Also, isn't the CDB a bus rather than a register? If so, there should be no extra cycle for the write-CDB stage. And the second DADDI may collide on the CDB, which is occupied by the previous L.D instruction.
Why are we incrementing by 8 bytes here?
Because the values here are double-precision floating-point numbers (loaded with L.D), and each one is 8 bytes.
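A quick way to see the stride, as a small sketch: the base address of 0 and the four-element array are made up for illustration, and only the 8-byte size of a double comes from the discussion above.

```python
import struct

# Size of one IEEE 754 double in bytes ("<d" = little-endian, standard size).
DOUBLE_SIZE = struct.calcsize("<d")

# Hypothetical base address of the array (what R1 would hold initially).
base = 0

# Address of each element: one DADDI R1, R1, #8 per loop iteration.
addresses = [base + i * DOUBLE_SIZE for i in range(4)]

print(DOUBLE_SIZE)  # → 8
print(addresses)    # → [0, 8, 16, 24]
```

If the loop used single-precision values instead, the increment would be 4.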