How Condition Codes Work - Superscalar 8-Bit CPU #37

  • Published: 20 Sep 2024

Comments • 36

  • @DavidLatham-productiondave
    @DavidLatham-productiondave 4 months ago +4

    I was trying to figure out why you didn't use the inverting output of the multiplexer. But then I realized you needed a way to choose between the inverted and the non-inverted output, which would have required another multiplexer. So your design choices make sense to me now. I know this comment is pointless, but who knows, maybe someone else will be wondering the same thing.

    • @fabianschuiki
      @fabianschuiki 4 months ago +1

      Yeah I felt bad about not being able to use that output. It's already there! 🙂 But the XOR gate was already there, so I didn't even have to add another chip. 😃
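
A minimal sketch of the idea in this thread: a multiplexer picks one base condition from the flags, and an XOR gate optionally flips it. The 2-bit select field, the 1-bit invert field, and the order of the multiplexer inputs are hypothetical; the thread does not give the real encoding.

```python
def condition_met(z: int, c: int, s: int, o: int, select: int, invert: int) -> int:
    """Pick one base condition with a multiplexer, then optionally invert it with XOR."""
    # Hypothetical multiplexer inputs; the real build's input order is not stated.
    mux_inputs = [
        1,      # select 0: always
        z,      # select 1: zero / equal
        c,      # select 2: carry
        s ^ o,  # select 3: signed less-than
    ]
    selected = mux_inputs[select]
    # XOR with 0 passes the signal through, XOR with 1 flips it.
    return selected ^ invert


# "equal" and its inverse "not equal" share one multiplexer input.
print(condition_met(z=1, c=0, s=0, o=0, select=1, invert=0))
print(condition_met(z=1, c=0, s=0, o=0, select=1, invert=1))
```

With this arrangement each multiplexer input yields two conditions, so four inputs cover eight tests without a second multiplexer.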

  • @Artentus
    @Artentus 4 months ago +3

    Conditional move instructions are under-appreciated, but I love them. Whenever you can express a branch in terms of conditional moves, it avoids so much pipeline stalling.
    One thing to note here is that technically you don't need a separate move instruction anymore, because it's equivalent to a conditional move with the condition set to always.
    Although that might be slightly sub-optimal if you ever get to superscalar execution, because it uses the ALU while the normal move does not. That could also be solved in the decoder, though.
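
To illustrate how a conditional move replaces a branch, here is a minimal Python sketch of an 8-bit unsigned maximum. The `cmov` helper and the way the borrow bit is derived are illustrative assumptions, not the instruction set from the video.

```python
def cmov(condition: int, dst: int, src: int) -> int:
    """Return src when condition is 1, otherwise keep dst. No branch is taken."""
    mask = -condition & 0xFF  # 0xFF when condition is 1, 0x00 when it is 0
    return (dst & ~mask & 0xFF) | (src & mask)


def unsigned_max(a: int, b: int) -> int:
    """Maximum of two 8-bit unsigned values using a compare plus a conditional move."""
    # Stand-in for the borrow flag after comparing a with b: 1 when a < b.
    borrow = ((a - b) >> 8) & 1
    return cmov(borrow, a, b)


print(unsigned_max(3, 200), unsigned_max(200, 3))
# With the condition fixed to 1 ("always"), cmov behaves like a plain move.
print(cmov(1, 5, 9))
```

The instruction stream is the same whichever operand is larger, so there is nothing for the fetch pipeline to mispredict.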

    • @fabianschuiki
      @fabianschuiki 4 months ago +3

      Yeah I couldn't agree more! If your CPU already goes through the trouble of having flags, the conditional moves are often almost free. And as you said, they don't mess with the instruction fetch pipeline at all 🥳.
      And you're right, the always condition would allow you to use a conditional move as a regular move. My current idea for out-of-order execution is to make the registers store IDs when an instruction is still in flight. With a dedicated move as I have it now, you can just copy that ID into another register, and then have both registers store the result when it becomes available. If the move ran through the ALU, you'd occupy a reservation as you suggested.

    • @alexloktionoff6833
      @alexloktionoff6833 4 months ago +2

      @fabianschuiki But having ALU flags makes all instructions interdependent, so out-of-order execution becomes very complicated. Why have a flags register in your superscalar design at all? You don't have a legacy to stay backward compatible with. Why not use the Alpha, MIPS, and RISC-V way of using the register values themselves for conditionals, and just reorder instructions based on the registers they use?

    • @fabianschuiki
      @fabianschuiki 4 months ago

      @alexloktionoff6833 Yes, you're definitely right! All instructions that interact with the flags become dependent in one form or another. And I agree, other ISAs like RISC-V have a more elegant approach that feels cleaner and more modern. One thing to keep in mind though is that these generally are wider CPUs with wider instructions, 32 bit for RISC-V for example. This means that you only need a single register operand to provide a jump offset, or you have plenty of bits in the instruction for immediate jump offsets. That allows you to have conditional branches like `blt r0, r1, label`, which compares r0 and r1 and, if r0 is less than r1, does a relative jump to the label. For narrow instructions like my 16 bits, it's difficult to encode two register operands plus a reasonably large jump offset in a 16 bit instruction. This is also why flags tend to remain popular on CPUs with narrow datapaths, e.g. 8 bits like here, or narrow instructions, like 8 or 16 bits. As soon as you clear the 32 bit hurdle, you're in territory where addresses start to fit into single registers, and you have enough instruction bits to encode enough data to no longer need the flags crutch.
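
The bit budget behind this argument can be worked through with a small calculation. The RISC-V numbers follow the standard 32-bit B-type branch format; the 16-bit field widths (4-bit opcode, 3-bit register specifiers, 4-bit condition code) are hypothetical and are not taken from the video's ISA.

```python
def offset_bits(instruction_bits, opcode_bits, register_operands=0,
                register_bits=0, condition_bits=0):
    """Bits left for a jump offset after the other fields are allocated."""
    return (instruction_bits - opcode_bits
            - register_operands * register_bits - condition_bits)


# RISC-V B-type branch such as `blt`: 7-bit opcode + 3-bit funct3, two 5-bit registers.
riscv_branch = offset_bits(32, opcode_bits=7 + 3, register_operands=2, register_bits=5)

# Hypothetical 16-bit compare-and-branch: 4-bit opcode, two 3-bit registers.
narrow_compare_branch = offset_bits(16, opcode_bits=4, register_operands=2, register_bits=3)

# Hypothetical 16-bit flag-based branch: 4-bit opcode, 4-bit condition code, no registers.
narrow_flag_branch = offset_bits(16, opcode_bits=4, condition_bits=4)

print(riscv_branch, narrow_compare_branch, narrow_flag_branch)  # prints: 12 6 8
```

Under these assumed widths, the compare-and-branch form leaves only 6 offset bits in a 16-bit instruction, while the flag-based form leaves 8, which is one way to see why flags stay attractive for narrow encodings.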

    • @alexloktionoff6833
      @alexloktionoff6833 4 months ago +1

      @fabianschuiki But there are ways to avoid storing an offset in conditional jump instructions: use a conditional SKIP or a conditional RET instruction that takes register numbers, plus a simple, dumb jump with only a constant offset.

    • @fabianschuiki
      @fabianschuiki 4 months ago +1

      @alexloktionoff6833 Yes, you're totally right, there are definitely ways to avoid most of the mess! A flag-less 8 bit CPU would definitely be worth exploring 🙂

  • @reinoud6377
    @reinoud6377 4 months ago +2

    I wonder why you didn't use a programmable IC, as you did before with the expressions. This might save a few ICs on the PCB. Heck, maybe even a small LUT.

    • @fabianschuiki
      @fabianschuiki 4 months ago +1

      Yeah that would have been a pretty nice idea! 🙂 You could probably fit the entire condition matching circuitry into a single 16V8 PLD. I'm trying to avoid using PLDs for everything because they could essentially absorb almost all logic in the build. Maybe you could have a maximum-PLD build that tries to use them to their greatest potential. They do also have a few downsides however, especially in terms of power consumption. I haven't really figured out a rule for myself as to when a PLD is okay, and when I'd rather use discrete chips. But for something like the circuitry in this episode, where almost all gates in the chips are actually utilized, having discrete chips instead of a PLD feels okay. But your point definitely stands: this could have been a PLD 🙂

  • @OscarSommerbo
    @OscarSommerbo 4 months ago +1

    This video answered a question I always had about the various conditional jump instructions: why are there so many, even in simple systems? Because you have to have a few, and those few can trivially be combined to make the more specialized conditional tests. Great video to learn from.

    • @fabianschuiki
      @fabianschuiki 4 months ago +1

      Thanks! 🙂 I'm learning a lot about x86 and its humble beginnings by going through these circuits. I did most of my work on more modern RISC-style architectures, where flags are mostly absent. It's great to work with them for a change! Although I can totally see why they will become very annoying once you're trying to do out-of-order execution 😬

    • @janhofmann3499
      @janhofmann3499 4 months ago +1

      @fabianschuiki I thought that the flags register in OoO CPUs gets renamed like any other architectural register. Microbenchmarks, e.g. on the Firestorm cores in Apple's A14/M1, suggested that it has a flags register file of 128 entries. On the other hand, it's astonishing that all the needed logic can be implemented with so few components.

    • @fabianschuiki
      @fabianschuiki 4 months ago

      @janhofmann3499 Yeah I think when you move to OoO execution, you promote the flags register to just yet another register. And all your ALU instructions have an implicit flags register operand that is read and/or written. That makes it almost just a compression scheme in the instruction set which makes certain register operands implicit. It's kind of fun to think about, but I also get why more modern ISAs like RISC-V skip flags entirely 🙂

  • @akkudakkupl
    @akkudakkupl 4 months ago +2

    Those floating unused inputs are bugging me ;D

    • @fabianschuiki
      @fabianschuiki 4 months ago

      😃 Yeah they definitely need to be tied off.

    • @akkudakkupl
      @akkudakkupl 4 months ago +1

      @fabianschuiki I left a comment on your assembler video (last part); IDK if you get notifications on old videos. Very nice watch after the first two, I must say. Skimmed the second two a bit because I had an idea that might make your life easier and just had to comment.
      I'm certainly going to follow this like the James Sharman CPU series :-)

    • @fabianschuiki
      @fabianschuiki 4 months ago +1

      @akkudakkupl Thanks 🙂! Labels and a table-based approach are going to make the assembler a lot nicer to work with and extend 🥳

  • @costa_marco
    @costa_marco 4 months ago +2

    Is comparing against -128 allowed in your implementation? From my understanding, SF will always be different from OF when comparing to -128, on either operand.

    • @fabianschuiki
      @fabianschuiki 4 months ago

      Hmmm, I think it should work like any other signed number 🤔 I'll have to recheck that carefully though. Thanks for the pointer!
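
One way to check this question is to test every pair of 8-bit signed operands. The sketch below assumes x86-style definitions (SF is bit 7 of the wrapped result of `a - b`, OF is set when the true result does not fit in -128..127) and the usual rule that signed less-than is SF XOR OF; the thread does not confirm that the build uses exactly these definitions.

```python
def sub_flags(a: int, b: int):
    """Return (SF, OF) for the 8-bit subtraction a - b, with a and b in -128..127."""
    true_result = a - b
    wrapped = true_result & 0xFF          # what an 8-bit ALU would output
    sf = wrapped >> 7                     # sign flag: bit 7 of the wrapped result
    of = 0 if -128 <= true_result <= 127 else 1  # overflow: true result out of range
    return sf, of


def signed_less_than(a: int, b: int) -> int:
    sf, of = sub_flags(a, b)
    return sf ^ of


# Compare the flag-based rule against Python's own comparison for all 65536 pairs.
mismatches = [
    (a, b)
    for a in range(-128, 128)
    for b in range(-128, 128)
    if signed_less_than(a, b) != int(a < b)
]
print(len(mismatches))
print(sub_flags(-128, 1), sub_flags(0, -128))
```

Under these assumptions the mismatch list is empty, so -128 behaves like any other operand. With -128 on the left, SF and OF differ for every right operand except -128 itself, which is exactly when -128 < b holds; with -128 on the right, SF and OF always agree.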

  • @naikrovek
    @naikrovek 4 months ago +2

    Register your copy of Sublime Text lol

    • @fabianschuiki
      @fabianschuiki 4 months ago

      😃 Yeah I should. I think I have a license for it lying around somewhere.

  • @lawrencemanning
    @lawrencemanning 3 months ago +1

    I cheated and just borrowed ARM’s 4-bit codes. 😂 Except I made 0000 “always”, as it bugged me otherwise.
    Edit: and on a previous build I had instruction bits to burn, so I just had a nybble of “care” flags and a nybble for the values the cared-for bits had to have. It works fine, and the programmer can dream up nonsensical tests (e.g. zero and negative) if they want, but it is wasteful.
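
The care-mask scheme from the edit above can be sketched as follows. The flag bit positions (Z, N, C, V in bits 0 to 3) are an assumption for illustration; the comment does not say how the nybbles were laid out.

```python
Z, N, C, V = 0b0001, 0b0010, 0b0100, 0b1000  # hypothetical flag bit positions


def flags_match(flags: int, care: int, value: int) -> bool:
    """True when every cared-for flag bit equals the corresponding bit of value."""
    return (flags & care) == (value & care)


flags = Z | C  # example flag state: zero and carry set

print(flags_match(flags, care=0, value=0))          # "always": no bits cared for
print(flags_match(flags, care=Z, value=Z))          # "equal": Z must be 1
print(flags_match(flags, care=Z, value=0))          # "not equal": Z must be 0
print(flags_match(flags, care=Z | N, value=Z | N))  # "zero and negative" at once
```

Each condition costs eight instruction bits here instead of four, which matches the comment's remark that the scheme is flexible but wasteful.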

    • @fabianschuiki
      @fabianschuiki 3 months ago

      Haha, that's definitely a good approach! 😃 How come you had bits to spare?

    • @lawrencemanning
      @lawrencemanning 3 months ago +2

      @fabianschuiki My very first softcore processor was a multi-cycle design with 16-bit address and data. Many instructions took trailing 16-bit immediates, including branches. I was quite pleased with it (I got as far as programming Snake; video on my channel if you are interested), but in retrospect it wasn’t great. My latest 32-bit core is a mostly RISC-like 2-stage pipeline with embedded immediates. It’s not quite as friendly to the assembly programmer, but more interesting technically. I’ve implemented Boulderdash on that. Yes, I build processors to play 80s computer games! 🤣

    • @fabianschuiki
      @fabianschuiki 3 months ago +1

      @lawrencemanning That's really nice 😃🤓!

    • @ArneChristianRosenfeldt
      @ArneChristianRosenfeldt 2 months ago

      @lawrencemanning What stages? A typical home computer had a single system bus. It was used for code, data, graphics, ROM, and sound. So naturally, a CPU grabs the data from the bus at the optimal time and keeps it in a register. Even the 6502 has a fetch stage in its pipeline. So the other stage is decode plus execute for reg-reg and reg-imm?