CPU Pipelining - The cool way your CPU avoids idle time!

0612 TV w/ NERDfirst

13K views

Published: Jan 4, 2025

Comments • 66

@Ans3lm0777 2 years ago ⁺¹¹
The explanations in your videos are so crisp. Really appreciate the quality of these - keep it up :)
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@Eujinv 1 year ago ⁺³
I have a computer architecture exam late this morning, woke up extra early to go to the hospital for a visit, and I'm watching this video while I'm waiting🙌🏻
@NERDfirst 1 year ago ⁺¹
Hello and thank you for your comment! Do take care and all the best for your exam =)
@mill4340 1 year ago
I had completely forgotten all of this after studying for my Comp Arch class. Your video was the refresher on the introduction that I needed. Thank you.
@NERDfirst 1 year ago
You're welcome! Glad to be of help =)
@123jimenez99 2 years ago ⁺²
Amazing video, it really made me understand why the PPE cores used in both CELL and Xenon were so underwhelming. They really suffered from all the bad stuff mentioned in this video: long pipelines, lots of stalls, lack of out-of-order execution and more. It also made me realize how important it was to rely on the SPEs as much as possible in CELL's case, which BTW was a big PITA. Cool stuff.
@NERDfirst 2 years ago ⁺¹
Oh wow, this is a great case study, thank you for sharing! Its pipeline is 23 stages! Really interesting to read about.
@123jimenez99 2 years ago ⁺¹
@NERDfirst Prescott P4: Hold my beer!
@NERDfirst 2 years ago ⁺¹
At least that's x86 - a CISC instruction set so it's less out of place!
@jefferybarnett1849 2 years ago
Thanks for enlightening me about heuristics. I loved the graphical representation of the "shifts" in your presentation on pipelines and "stalls" that happen and avoiding them along the way. I knew just a moment before you showed us that the instructions were about to be reordered. My understanding has been improved. My knowledge of assembly language helped a lot, I just never bothered to look into the matter as you have done. Thanks a lot.
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you enjoyed the video, and really appreciate you sharing your "aha" moment - That's one of the things I live for as an educator =)
@ArneChristianRosenfeldt 1 year ago
Heuristics make me want to see a CPU (simulation) where the scalar CPU splits into two threads at every branch (becoming superscalar). Store commands write into a FIFO! Then, when the branch condition is clear, a whole tree of threads is flushed. The store FIFO of the taken branch is flushed to memory. This might be a useful operation mode for those 16-core RISC-V chips.
@KumaAdventure 1 year ago ⁺¹
Thank you, this helped clarify some things I came across for the CompTIA A+ exam. Much appreciated.
@NERDfirst 1 year ago
You're welcome! Very happy to be of help :)
@akioasakura3624 1 year ago
THANK YOU SIR!! I made many Minecraft CPUs when I was 13. Back then there weren't many videos or resources that didn't explain pipelining in terms of "car assembly lines" or "laundry", or 4000-page university PDFs from the 90s. Thank you so much, good sir.
@NERDfirst 1 year ago ⁺¹
You're welcome! Very happy to be of help =) I think those are fairly textbook explanations so it's no wonder you see them a lot. Analogies are good too I suppose, but I guess nothing beats visualizing it properly!
@akioasakura3624 1 year ago
@NERDfirst i struggled with this for so long. but thanks to u maybe i can try playing minecraft again. have a good day!!
@NERDfirst 1 year ago ⁺¹
Good luck! Consider planning out your design first using actual logic components before doing it in game. Redstone is a whole different level of complexity!
@akioasakura3624 1 year ago
@NERDfirst ohh alright, thanks!!
@juanmanuelserna7692 1 year ago
Great quality video, easy to understand for people who do not come from the computer science world, great job!
@NERDfirst 1 year ago
Hello and thank you very much for your comment! Glad you liked the video :)
@Atharv0812 2 years ago ⁺⁵
Your content is so professional. Can you also make videos on modern microprocessor architectures like the i3, i5, i7, etc.?
@NERDfirst 2 years ago
Hello and thank you for your comment! Unfortunately those architectures are far more complex (some modern architectures have twenty or more pipeline stages) so I haven't gotten round to learning about them.
@akkudakkupl 1 year ago ⁺⁵
That's not the only reason for pipelining. You could build a CPU that does the whole instruction in one clock (one rising, one falling edge). But you still have propagation time that limits the maximum clock speed (and computation speed); pipelining lets you break the propagation path into smaller chunks and raise clock speeds.
@NERDfirst 1 year ago ⁺¹
Hello and thank you for your comment! To be fair, increasing clock speed this way isn't going to increase the overall speed of computation - No point getting your clock speeds up to 20GHz if every instruction has to make its way through 100 pipeline stages!
Ultimately it's less about managing propagation delay - In fact having multiple pipeline stages _increases_ the total per-instruction propagation delay since it makes the circuitry more complex. The advantage comes about from the "parallelism" where we essentially start on the next instruction before the last one is complete.
@akkudakkupl 1 year ago ⁺¹
@NERDfirst let's say you have an ALU that has 100 ns of propagation delay. Now you split that up into two 50 ns steps with some latches in between. You've just almost doubled your instructions per second by doubling the clock rate. This is pipelining, and the most important reason for it.
What you are referencing is superscalarity and out-of-order execution - the use of multiple execution units to their full extent.
@NERDfirst 1 year ago
I think we're talking about the same things using different words, or maybe I just wasn't explicit enough on the point. My way of explaining it (at 3:32) assumes that pipeline stages exist but instructions are processed to completion before the next instruction enters the pipeline. Your way of explaining it does away with the pipeline model and considers the execution of an instruction as a single large step.
I didn't explicitly mention propagation delay by name to reduce cognitive load, but I do believe the understanding conveyed is the same. If I understand your explanation correctly, you get a doubling of instructions per second _because_ of instruction-level parallelism. At the end of the day, if you double the clock speed but each instruction takes two clock cycles to complete, the number of instructions per second is exactly the same. It is because of superscalarity allowing you to have multiple instructions in the ALU at once that you can have a performance benefit.
Do let me know if I'm understanding you wrongly. It's been a while since I did this stuff.
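The two models being compared in this exchange can be put side by side in a minimal sketch. It assumes an ideal pipeline (no stalls, one clock cycle per stage); the function names are made up for illustration and do not come from the video.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    # Assumed 4-stage pipeline; the stage count is only an example value.
    for n in (1, 4, 1000):
        print(n, cycles_unpipelined(n, 4), cycles_pipelined(n, 4))
```

Under these assumptions a long instruction stream approaches one completed instruction per cycle, even though each individual instruction still needs all of its stages.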
@akkudakkupl 1 year ago ⁺¹
@NERDfirst In my example my single ALU can be in two discrete steps of executing two instructions - first half of a new instruction and second half of an older instruction. You can imagine my pipeline like this (a modification of the classic RISC pipeline):
Fetch
Decode
Execution 1
Execution 2
Memory
Write Back
I have divided the execution stage in two. This is because my hypothetical ALU would have 100 ns of propagation delay and would limit the clock to 10 MHz. By splitting it up I now have a slightly longer pipeline, but my largest propagation delay goes down to, let's say, 55 ns (because we had to add latches in between stages, it's not exactly half). Now my CPU can run at 18 MHz. Both of those frequencies roughly translate to instructions per second because in both cases the instructions complete "in a single cycle" due to pipelining. This is the advantage of longer pipelines - as long as you get an uninterrupted stream of instructions you can get a boost in IPS because you have a higher max clock. This is of course not ideal because you have branches in the code, and those stall or flush the pipeline.
You are executing multiple instructions at a time because the result of one step is passed on to be computed in the next. Basically it's an improvement over very old CPUs, which executed those steps one after another (pipelining needs additional circuitry), so you got one instruction every, for example, 4 clocks.
But you can't compute more instructions at a time than you have pipeline stages. For that you need superscalarity - having multiple ALUs, multiple address generation units, etc. working at the same time - and to make it work right you also use out-of-order execution, so you can fill up those elements' pipelines (yes, everything is pipelined in a modern CPU).
What I was implying earlier was that a Harvard architecture CPU could execute a full instruction in a single clock - because both instruction and data are supplied at the same time - but it might not run at a very fast clock because data has to propagate through the whole datapath in that one clock cycle.
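The arithmetic in this comment can be checked with a short sketch. Only the 100 ns and 55 ns execution delays come from the comment; the 50 ns delays for the other stages, the function names, and the 1000-instruction stream are assumptions made for illustration, and the pipeline is assumed to run without stalls.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period must cover the slowest stage, which caps the frequency."""
    return 1000.0 / max(stage_delays_ns)


def run_time_ns(num_instructions, stage_delays_ns):
    """Ideal pipelined run time: fill the pipeline, then one instruction per cycle."""
    period_ns = max(stage_delays_ns)
    cycles = len(stage_delays_ns) + num_instructions - 1
    return cycles * period_ns


# Fetch, Decode, Execute, Memory, Write Back (non-ALU delays are assumed values)
five_stage = [50, 50, 100, 50, 50]
# Fetch, Decode, Execution 1, Execution 2, Memory, Write Back
six_stage = [50, 50, 55, 55, 50, 50]

if __name__ == "__main__":
    for name, delays in (("five-stage", five_stage), ("six-stage", six_stage)):
        print(name, round(max_clock_mhz(delays), 2), "MHz,",
              run_time_ns(1000, delays), "ns for 1000 instructions")
```

With these assumed numbers the six-stage design needs one more cycle to fill, but the shorter clock period more than makes up for it on a long, uninterrupted instruction stream.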
@AshtonvanNiekerk 1 year ago
Very well explained.
@NERDfirst 1 year ago ⁺¹
Hello and thank you for your comment! Very happy to be of help =)
@awayfrom90 9 months ago
Superb explanation 🎉
@NERDfirst 9 months ago ⁺¹
Hello and thank you very much for your comment! Very happy to be of help :)
@LegonTW0 1 year ago
Thanks, boss, clear as a glass of water, love you
@NERDfirst 1 year ago
Hello and thank you for your comment! Glad to be of help =)
@itznukeey 2 years ago
Great explanation, thanks
@NERDfirst 2 years ago
You're welcome! Glad to be of help =)
@dimnai 2 years ago
Great video, well done!
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video :)
@galdali10 1 month ago
Great video!!!
@NERDfirst 1 month ago ⁺¹
Hello and thank you very much for your comment! Glad you liked the video :)
@DReam-mn4mj 1 year ago
Great video, keep it up!
@NERDfirst 1 year ago
Hello and thank you very much for your comment! Glad you liked the video :)
@Epic-so3ek 2 years ago
these videos are really good
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@memeingthroughenglish7221 6 months ago
Damn, your videos are so nice!!!
@NERDfirst 6 months ago
Thank you very much! I remember your comment on another one of my videos as well, glad to know you like my work =)
@robot67799 2 years ago
Great content 👍
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@fraewn2617 2 years ago
well put
@NERDfirst 2 years ago
Thank you very much! Glad you liked the video :)
@JedJarin 2 months ago
thank you
@NERDfirst 2 months ago ⁺¹
You're welcome! Glad to be of help :)
@cyprienvilleret2266 2 years ago
great thanks
@NERDfirst 2 years ago
You're welcome! Glad to be of help :)
@Brekstahkid 6 months ago
Good stuff
@NERDfirst 6 months ago
Thank you! Glad you liked the video :)
@cheenoong9228 1 year ago
Why do I see, in some materials, the order of the process given as IF (Instruction Fetch) -> ID (Instruction Decode) -> EX (Instruction Execute) -> MEM (Access Memory Operand) -> WB (Write Back)?
@NERDfirst 1 year ago ⁺¹
Hello and thank you for your comment! If I'm not wrong, what you've described is specifically the MIPS pipeline. Different architectures can have a different number and order of pipeline stages, so this isn't universal. What I've shown in the video isn't linked to any specific assembly architecture, it's just a generic abstract pipeline to make understanding things easier.
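The five-stage order from the question can be drawn as the usual staircase diagram with a short sketch. It assumes an ideal pipeline with no stalls, and `pipeline_diagram` is a made-up helper name used only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction, one cell per clock cycle."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        cells = []
        for cycle in range(total_cycles):
            # Instruction i enters the first stage at cycle i.
            stage_index = cycle - i
            if 0 <= stage_index < len(stages):
                cells.append(stages[stage_index])
            else:
                cells.append("")
        rows.append(cells)
    return rows


for number, row in enumerate(pipeline_diagram(3), start=1):
    print(f"I{number}: " + " ".join(f"{cell:<3}" for cell in row))
```

Passing a different `stages` list gives the same picture for an architecture with another number or order of stages.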
@ArneChristianRosenfeldt 1 month ago
I think that MIPS tries to speed up write back. When every value flows through the pipeline for 5 cycles, we can turn off power for that register for this time. Leakage should bring it to a middle state between on and off. Then we write back, which is still a little power hungry due to the fan-out, and then turn on power to let the bits flip into their intended states.
@adamchalkley956 1 year ago
I have a question: not all instructions have a write back, i.e. not all of them write results back to registers, memory, etc. For example, on the 8080, jmp instructions do not write back to anywhere. Another example would be a MOV instruction, that moves data from memory/registers to registers/memory.
So what happens when an instruction has no write back? Does it execute a no-op?
Again, I'm still quite the novice, thanks
@NERDfirst 1 year ago ⁺¹
Hello and thank you for your comment! Yes, instructions that don't require any action to be taken on any stage would still have to go through the stage, but will do nothing there.
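A minimal sketch of that answer: every instruction occupies every stage for one cycle, but only does work in the stages it needs. The four stage names and the `ACTIVE_STAGES` table are hypothetical, chosen for illustration rather than taken from any real instruction set.

```python
STAGES = ["fetch", "decode", "execute", "write back"]

# Hypothetical table of the stages in which each instruction has real work to do.
ACTIVE_STAGES = {
    "add": {"fetch", "decode", "execute", "write back"},
    "jmp": {"fetch", "decode", "execute"},  # no result to write back
}


def trace(opcode):
    """List every stage the instruction passes through and whether it does work there."""
    return [
        (stage, "work" if stage in ACTIVE_STAGES[opcode] else "idle")
        for stage in STAGES
    ]


if __name__ == "__main__":
    for opcode in ("add", "jmp"):
        print(opcode, trace(opcode))
```

Both instructions take the same number of cycles to leave the pipeline, which keeps the stages in lockstep.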
@adamchalkley956 1 year ago
@NERDfirst Thanks, that makes sense
@bahrikeskin5824 1 year ago
could you change the song please my brain is burning because of this :(
but i understand the concept thanks :) like
@NERDfirst 1 year ago ⁺²
Oh sorry about that! I compared levels with popular YouTubers and realized my BGM was turned down much lower than theirs. I'd hoped for it to be out of the way but looks like you still picked up on it. I'll see what I can do for future videos!
@bahrikeskin5824 1 year ago
@NERDfirst thanks
