I didn't know Tom Hanks made videos about instruction pipelining in his free time!
tony stark*
hahahaha actually their voices are very similar and I just noticed it 😂😂😂
Wow now that you mention it...
@@zackjohnson9387 Tony Hanks
Thank you so much! YouTube has taught me more about pipelines and data paths in an hour than my prof. has this whole month. They make it so much harder than it has to be!! Again, thank you!
I know it's been a whole year, but do you recall any specific videos that helped explain the topic that you think are definitely worth looking at? If not, that's all good but thanks a lot!
@@ad.i david and sarah harris
Still helping me in 2024 - big thanks!
true
Great video, and active in comments section. Excellent content creator! This is what we need. Thanks
I am a bit confused as to how the iteration is from 5 to 18?
This is wonderful, a fantastic pipeline explanation!!!!
With forwarding, beq should not have to wait for the data because it can get the data passed through in the same cycle, unless you are assuming that the EX stage for slt and the ID stage for beq are not happening in the same cycle :)
This isn't true. In the SLT instruction, the arithmetic is performed at the beginning of cycle 3 therefore it can't be forwarded until the end of cycle 3. So BEQ will not have the correct value in cycle 3 but in cycle 4 it can after the result has been forwarded from SLT - E/M
I don't understand why you stall the first beq (second instruction), but you don't stall the lw (fourth instruction) and instead let forwarding take care of it. The previous instructions, slt and add, both have the result ready after the execute stage.
In the processor this is dealing with, branches are resolved in the decode stage. In this case that means that the value of $t0 is needed in the decode stage. Since the instruction before the branch (the slt) writes to $t0, the value needs to come from the slt instruction, and the result of the slt isn't available until the end of the EX stage. So the first beq has to stall a cycle so that it can get the correct value from EX forwarded into the ID stage. The lw doesn't need to stall because it doesn't need the value of $t0 until the EX stage (whereas the branch needed it in the ID stage). In this case, the add instruction has completed the EX stage before the lw enters the EX stage, and so no stalling is needed (it is just directly forwarded).
This is awesome. Thank you very much!
This is a really great video, thanks! But I am still not sure on how data dependencies work. How do you know when a command has the data ready for another to use? For example, the BEQ command needs $t0 and it can get it after the SLT command has executed, but the next BEQ command has to wait for the LW command to get to the memory clock cycle. I would be very grateful for an answer, thanks in advance!
This partially depends on the implementation of the pipelined processor. For this example it is assumed that for all instructions that produce data, except for load instructions, the data is available as the instruction moves from the execute (E) to the memory (M) stage. This means that for the slt/beq combination, the SLT produces the data in execute, and so that data can be forwarded from the beginning of the memory stage to the decode (D) stage (where it is needed for branches). For load instructions, the data is not available until after the instruction accesses the data memory, which means it is only available as the instruction moves from memory (M) to writeback (W). This is why the lw/beq combination has to wait another cycle: it is only as the lw moves into writeback that the data is available to forward to decode.
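To make the timing rule above concrete, here is a minimal Python sketch of it. It assumes a five-stage F/D/E/M/W pipeline with forwarding, where a result can be used one cycle after the stage that produces it. The function name `stall_cycles` and its parameters are made up for illustration and are not taken from the video.

```python
# Stage positions in a five-stage pipeline, counted from the fetch cycle.
STAGES = {"F": 0, "D": 1, "E": 2, "M": 3, "W": 4}


def stall_cycles(result_ready_after: str, needed_in: str, distance: int = 1) -> int:
    """Stalls needed with forwarding (hypothetical helper, simplified model).

    result_ready_after: stage at whose end the producer has its result
                        ("E" for ALU instructions, "M" for loads).
    needed_in:          stage in which the consumer needs the value
                        ("D" for branches resolved in decode, "E" otherwise).
    distance:           how many instructions after the producer the consumer is.
    """
    # The producer is fetched in cycle 0, so its result can first be
    # forwarded in the cycle after the producing stage.
    first_forward_cycle = STAGES[result_ready_after] + 1
    # Cycle in which the consumer would use the value if it never stalled.
    unstalled_use_cycle = distance + STAGES[needed_in]
    return max(0, first_forward_cycle - unstalled_use_cycle)


slt_then_beq = stall_cycles("E", "D")  # ALU result feeding a decode-stage branch
lw_then_beq = stall_cycles("M", "D")   # load result feeding a decode-stage branch
add_then_lw = stall_cycles("E", "E")   # ALU result feeding an execute-stage use
print(slt_then_beq, lw_then_beq, add_then_lw)
```

Under these assumptions the slt/beq pair needs one stall, the lw/beq pair needs two, and the add/lw pair needs none, which matches the cases discussed in this thread.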
@@matthewwatkins88 Thanks for this response, very helpful!!
When neither branch is taken, why does the last instruction "add $v0, $s0, S0" have no cycles?
Because it's outside the loop. Only the ones inside the loop are considered for this problem. We are determining the overall CPI for the loop
The value for $t0 from the SLT instruction should be ready to forward at the later half of stage E, which is right before the early half of stage E for the BEQ instruction, which suggests that value for $t0 will be forwarded to the ALU instead of requiring a stall. Is this not correct?
No, this is not correct. The result of the SLT (or any instruction computed in the execute stage) is only ready at the end of the cycle and so can really only be forwarded at the beginning of the next stage (the memory stage). Additionally, the BEQ needs the value for $t0 in the decode stage since it resolves the branch in this stage. This means the branch can not properly complete the decode stage until the previous SLT has completed the execute stage. *If* the branch was resolved in the execute stage (which is not the case here), then a stall would not be necessary as forwarding would take care of the dependency.
@@matthewwatkins88 I had a similar thought. It seems that I have been told that you can forward the data directly to the ALU (or more precisely, the register in between the D and E stages) for the calculation (overwriting the data received from the register in the D stage). This would give you the time to not require a stall there.
Is this just not correct?
@@matthewwatkins88 That's not really true, because the result of each stage could be ready in the first half of the cycle, like in the WB stage.
Are we using forwarding in this problem? I'm confused on when the next instruction should start if we are using forwarding
The example definitely assumes forwarding. I'm not 100% sure what you mean by "start." The processor fetches the next instruction the next cycle. If there is a dependency that forwarding can't handle, then the processor will stall the necessary stages (stalling is shown in the example by stages shown in '()', such as (F)).
What I meant by start was where the next instruction would begin F, D, E... if we didn't use forwarding but needed information from the previous instruction.
If we were not using forwarding and needed information from a register written by the current instruction in the next instruction, we wouldn't decode the next instruction until after the current instruction finished its memory stage?
If there was no forwarding at all, the dependent instruction wouldn't truly start decode until the previous was in writeback (assuming writes to the register file appear to happen before reads, which is what is assumed in the video). Data is only written to the register file in writeback, so, without forwarding, wouldn't be available until then.
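As a rough sketch of the no-forwarding case described above, the stall count depends only on how far the dependent instruction is behind the producer. This assumes a five-stage F/D/E/M/W pipeline in which a register written in writeback can be read by a decode in the same cycle. The name `stalls_without_forwarding` is hypothetical.

```python
# The producer is fetched in cycle 0: F=0, D=1, E=2, M=3, W=4.
WRITEBACK_CYCLE = 4
DECODE_OFFSET = 1  # an instruction decodes one cycle after it is fetched


def stalls_without_forwarding(distance: int) -> int:
    """Stalls for a consumer `distance` instructions behind its producer.

    Assumes no forwarding at all, and that a register write in W is
    visible to a decode-stage read in the same cycle (simplified model).
    """
    unstalled_decode_cycle = distance + DECODE_OFFSET
    return max(0, WRITEBACK_CYCLE - unstalled_decode_cycle)


for distance in (1, 2, 3):
    print(distance, stalls_without_forwarding(distance))
```

With these assumptions, a directly dependent instruction waits two cycles so that its decode lines up with the producer's writeback, and an instruction three or more slots later does not wait at all.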
Thank you that is very helpful! :D
I think in the third case you meant first branch (beq $t0,$0, end) is taken only
Off the top of my head, I would agree with you.
This went way too fast for me. I kept having to rewind.
If we don't have the last line, what will the pipeline be? Can we begin the IF of the first loop line directly in cycle 14?
The last line, as I interpret it anyway, is never executed, so removing it really wouldn't change anything.
@@matthewwatkins88 I see. Thank you very much!
Hey, I just wanted to ask: if an add instruction was dependent on a ld or lw instruction prior to it, would there be the same 2-cycle stall as there was for the beq instruction that was dependent on the lw instruction?
OH MY GOD!! THANK YOU
There are a lot of things you are saying that contradict my teachings and readings on this matter. Can you please explain to me what you define as the following:
1) What is "branch taken/not taken"
2) What is forwarding
Additionally, are you saying that the resource in t0 cannot be accessed by the subsequent instruction until the memory stage of the previous instruction? And we have forwarding in this problem? Assuming yes, then your understanding of forwarding, and my understanding of forwarding contradict. Can you help explain?
Between instructions 3 and 4 there must be a stall at decode for instruction 4. Correct me if I am wrong.
Why does the 14-cycle iteration include the first W but not the last W (between 5 and 17)?
As is noted in the comment for the video, there is a slightly updated version of this video (ruclips.net/video/Bj_BZ_d0OkU/видео.html). The CPI calculation shown is correct, but, as you note, the line at ~7:00 should extend to cycle 18, for a total of 14 cycles. (Also, the W in cycle 18 for the slt should really be an M.)
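The cycle count above is an inclusive span, which is easy to get off by one. A small Python sketch of that arithmetic follows. The helper names are made up, and the instruction count is a placeholder, since the number of instructions per iteration is not stated in this thread.

```python
def cycles_in_span(first_cycle: int, last_cycle: int) -> int:
    """Number of cycles from first_cycle through last_cycle, inclusive."""
    return last_cycle - first_cycle + 1


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for one loop iteration."""
    return cycles / instructions


iteration_cycles = cycles_in_span(5, 18)
print(iteration_cycles)

# HYPOTHETICAL value for illustration only; use the real number of
# instructions executed per iteration from the video's example.
HYPOTHETICAL_INSTRUCTION_COUNT = 7
print(cpi(iteration_cycles, HYPOTHETICAL_INSTRUCTION_COUNT))
```

Counting cycles 5 through 18 inclusive gives the 14 cycles mentioned above, whereas stopping at cycle 17 would give only 13.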
What are the stages in a typical four-stage CPU pipeline, and what's the purpose of each stage? This question was in my exam. Can you help me with the answer?
This is so cool
this is fantastic :)
Didn't know mike greenberg knew mips
did you forget to resolve a dependency between add and lw?
add $t0, $s3, $s4
lw $t0, 0($t0)
There is a dependency, but it doesn't change the outcome.
Because t0 is already executed in first instruction so there is no need for the processor to run it second time.
My friend says you made a mistake. Is that true? What do you think? (translated from Spanish)
I don't speak Spanish.
nice tutorial
Good video
thank you tony stark
Oh come on throw those branch delays in and show how inefficient RISC code is.
When you say "RISC" do you mean actual RISC? If so, actual RISC code is equivalent to what is shown. If you are referring to the MIPS code, then yes, real MIPS code would change the performance, but it doesn't necessarily destroy it.
We'd do anything for you, bro :D (loosely translated from Turkish)
ohh no offence but i am happy to hear non-indian accent, I said oh god thanks in the beginning of the video
Sometimes I am so desperate I try to understand Vietnamese videos to study. Always feels good to find an English video even though it isn't my first language