Poor Man's Conditional Jump - Superscalar 8-Bit CPU #36

  • Published: 8 Nov 2024

Comments • 31

  • @halfacanuck 5 days ago +1

    Major milestone! Congrats. Also, it seems to me an immediate version of TEST would be very useful, if you can make it work.

    • @fabianschuiki 5 days ago

      Yeah, that's an excellent point! Immediate flavors of these ops would be very nice. I think that should be pretty doable once the instruction decoder is in place. Either as 4-bit or 8-bit immediates 🥳

    • @halfacanuck 5 days ago +1

      @fabianschuiki An immediate TEST would be great for testing bits, so it would make sense for it to take an 8-bit immediate. A separate bit-test instruction would need only 3 bits, though it couldn't test multiple bits at once.

    • @halfacanuck 5 days ago +1

      @fabianschuiki As for immediate CMP, a common use case might be testing for specific ASCII/UTF-8 codes, which would mean 7 bits for the English market and 8 for the rest of the world ;). A 4-bit immediate might be useful for small counting-up loops, but I'm not sure they happen often enough to be worth it. However, there might well be a use case that's not occurring to me right now.
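
A small sketch of why the immediate width matters for character comparisons, assuming CMP subtracts and keeps only the flags. The flag names and the borrow convention used for `c` are assumptions (CPUs differ here). Plain ASCII codes fit in 7 bits, while UTF-8 lead and continuation bytes have the top bit set and need all 8.

```python
def fits_unsigned(imm: int, bits: int) -> bool:
    """True if imm can be encoded as an unsigned immediate of the given width."""
    return 0 <= imm < (1 << bits)


def cmpi(value: int, imm: int) -> dict:
    """Hypothetical CMP-immediate: subtract, keep only the flags, drop the result."""
    diff = (value - imm) & 0xFF
    return {
        "z": diff == 0,          # operands are equal
        "c": value < imm,        # borrow (unsigned value < imm); an assumed convention
        "n": bool(diff & 0x80),  # sign bit of the 8-bit difference
    }


newline = ord("\n")            # 0x0A: ASCII, fits in 7 bits
lead = "é".encode("utf-8")[0]  # 0xC3: UTF-8 lead byte, needs all 8 bits
print(fits_unsigned(newline, 7), fits_unsigned(lead, 7), fits_unsigned(lead, 8))  # True False True
print(cmpi(newline, 0x0A))  # {'z': True, 'c': False, 'n': False}
```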

    • @fabianschuiki 5 days ago

      Things like `addi r0, 1` or `addi r0, -1` are extremely common, so 4-bit immediates would be very handy there. Or things like `andi r0, 0b10`, where you only fiddle with the lowest bits. But there is definitely some opcode space available for full 8-bit immediates, so maybe addi and cmpi would take full 8-bit ones. (Each consumes a massive 4096 slots in the opcode space, so there's only a limited number of 8-bit immediate ops we can have.)
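
The opcode-space arithmetic and the 4-bit sign extension can be sketched as follows. 4096 slots is 2^12, i.e. the 8-bit immediate plus 4 further operand bits (presumably a register field). The 16-bit instruction word used below to turn that into a budget of ops is an assumption, as are the helper names.

```python
INSTR_BITS = 16  # assumption: 16-bit instruction words (not stated in the thread)


def slots(operand_bits: int) -> int:
    """Opcode-space slots taken by one op whose operand fields total operand_bits."""
    return 1 << operand_bits


def encode_imm4(value: int) -> int:
    """Pack a signed value into a 4-bit two's-complement immediate field."""
    if not -8 <= value <= 7:
        raise ValueError(f"{value} does not fit in a signed 4-bit immediate")
    return value & 0xF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a bits-wide two's-complement field to a Python int."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def addi(reg: int, imm4_field: int) -> int:
    """8-bit add of a sign-extended 4-bit immediate, wrapping like 8-bit hardware."""
    return (reg + sign_extend(imm4_field, 4)) & 0xFF


# 8 immediate bits + 4 other operand bits = 12 bits = 4096 slots per op.
print(slots(12), (1 << INSTR_BITS) // slots(12))  # 4096 16
print(addi(0, encode_imm4(-1)))                   # 255
```

Under that assumption, sixteen 8-bit-immediate ops would fill the entire opcode space, which is why only a few (like addi and cmpi) can get them.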

    • @halfacanuck 5 days ago +1

      @fabianschuiki I was referring to immediate CMP specifically when I said I wasn't sure how useful 4 bits would be. I can definitely see the utility of an immediate ADD, even if it's only 4 (sign-extended) bits, because that would cover by far the most common use cases. Personally, I'd probably prioritize immediate TEST and ADD over CMP, but hey, it's not my project and I've given it very little thought :)

  • @KeesJanLogemann 6 months ago +2

    I was expecting to see you implement label support in the assembler.py assembler, since you used "loop:" in the source file.
    Spoiler? Coming in the next video?

    • @fabianschuiki 6 months ago

      Definitely coming soon 🙂 Computing those labels manually is getting old pretty fast 😅
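
Until then, the bookkeeping an assembler does for labels is a classic two-pass scheme: first assign an address to every instruction and record where each label lands, then replace label operands with offsets. This is a hypothetical sketch, not assembler.py; it assumes every instruction is 2 bytes and that branch offsets are relative to the branch's own address. The tiny program only demonstrates the arithmetic.

```python
INSTR_SIZE = 2  # hypothetical: every instruction assembles to 2 bytes


def resolve_labels(lines):
    """Two-pass label resolution returning (address, mnemonic, operands) tuples."""
    labels, instrs, addr = {}, [], 0
    # Pass 1: assign addresses and record where each "name:" label points.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = addr
            continue
        mnemonic, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",") if op.strip()]
        instrs.append((addr, mnemonic, operands))
        addr += INSTR_SIZE
    # Pass 2: replace label operands with offsets relative to the instruction's address.
    return [
        (a, m, [str(labels[op] - a) if op in labels else op for op in ops])
        for a, m, ops in instrs
    ]


program = ["loop:", "addi r0, -1", "breli.z loop"]
print(resolve_labels(program))  # [(0, 'addi', ['r0', '-1']), (2, 'breli.z', ['-2'])]
```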

  • @OscarSommerbo 6 months ago +2

    This series, along with James Sharman's, inspired me to design my own architecture. I am nowhere near skilled enough to build the physical CPU/computer (it is a big project), but Fabian gives me ideas on how to do some interesting things. Once I get more ideas nailed down, I might make some videos about it.

    • @fabianschuiki 6 months ago

      That sounds fantastic! I'd love to see your CPU design documented in video form. Glad to hear that 🙂

    • @OscarSommerbo 6 months ago +1

      @fabianschuiki As a teaser: a split bus, with one data bus and one instruction bus. That is the starting point; everything else kind of flows from that.

    • @fabianschuiki 6 months ago +1

      @OscarSommerbo That's a great idea! Harvard architectures are very popular for microcontrollers and safety-critical systems. Are you planning on strictly separating storage for data and instructions, or on just splitting the buses early on and having some caching for data and instructions before they access a joint memory?

    • @OscarSommerbo 6 months ago +1

      @fabianschuiki I didn't know about the Harvard architecture before I started mapping out the system; I learned about it from ChatGPT. I use the chatbot to organize my ideas and to get inspiration.
      My current plan is strict separation, but I am starting to see the flaws in that, so I will massage that concept some more.
      The big bonus of having a clean instruction bus is that adding co-processors and/or ASIC accelerators is fairly trivial, with minimal overhead. With a blitter-type chip, the main CPU can offload RAM-to-RAM transfers, and a crypto ASIC could decrypt and encrypt RAM and enforce strict ACLs for processes, all on its own.
      Most of the immediate upsides have been security-related, mainly because security has been a major failing of CPUs recently. But I am also thinking about how a GPU could be hooked in: the blitter would copy RAM regions out to a double/triple frame buffer, and the GPU would be more of a specialized math coprocessor.
      There I go, spilling the beans.

    • @fabianschuiki 6 months ago

      This sounds very exciting. Do you plan to build an ASIC?

  • @sarge2742 6 months ago +1

    This is progressing really nicely, I'm personally quite interested to see what you have in mind for 'proper instruction decoding' that you've mentioned a few times.

    • @fabianschuiki 6 months ago +1

      🙂 I've been kicking that can down the road for quite some time now. I'm probably going to do something similar to James Sharman and Ben Eater, but with less of a centralized-control vibe. If you look at modern CPUs, the instruction decoding stage is mainly responsible for figuring out which registers get read or written, and which general functional unit in the processor the instruction should be sent to. But the decoder often doesn't care about the details too much. I'm inclined to do something similar: have the decoder mainly figure out which registers (and flags) are read and written. This will be necessary for the hazard detection and prevention mechanism down the line, once instructions can complete out of order. The decoder will have to know whether all register operands are already available, or whether it has to stall and wait for their computation to finish. Later on, reservation stations can allow the decoder to dispatch instructions with incomplete data and have them sit and wait at the corresponding functional unit.
      So long story short: the decoder will likely focus on general register interaction, and leave detailed decoding up to the corresponding functional units, like the decoder in the ALU.
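
A minimal sketch of that split, assuming a simple scoreboard: the decoder only extracts which registers and flags an instruction reads and writes, and issue stalls while any of them still has a write in flight. The operand-role table and the mnemonics are hypothetical, and WAR hazards are ignored because operands are read at issue in this simplified model.

```python
from dataclasses import dataclass, field

# Hypothetical operand roles in the two-operand style of `addi r0, 1`:
# "rw" = read and written, "r" = read, "w" = written, "i" = immediate.
ROLES = {
    "add": ("rw", "r"),
    "addi": ("rw", "i"),
    "cmp": ("r", "r"),
    "mov": ("w", "r"),
}
WRITES_FLAGS = {"add", "addi", "cmp"}


def decode(mnemonic, operands):
    """Return (reads, writes); the functional unit (e.g. the ALU) decodes the rest."""
    reads, writes = set(), set()
    for role, op in zip(ROLES[mnemonic], operands):
        if "r" in role:
            reads.add(op)
        if "w" in role:
            writes.add(op)
    if mnemonic in WRITES_FLAGS:
        writes.add("flags")
    return reads, writes


@dataclass
class Scoreboard:
    pending: set = field(default_factory=set)  # registers/flags with a write in flight

    def try_issue(self, mnemonic, operands):
        reads, writes = decode(mnemonic, operands)
        # Stall on RAW (a source is not ready) or WAW (a destination is already being written).
        if (reads | writes) & self.pending:
            return False
        self.pending |= writes
        return True

    def complete(self, mnemonic, operands):
        _, writes = decode(mnemonic, operands)
        self.pending -= writes


sb = Scoreboard()
print(sb.try_issue("add", ["r0", "r1"]))  # True
print(sb.try_issue("mov", ["r2", "r0"]))  # False: r0 is still being computed
sb.complete("add", ["r0", "r1"])
print(sb.try_issue("mov", ["r2", "r0"]))  # True
```

Reservation stations would replace the `return False` stall with parking the instruction at its functional unit until the pending operands arrive.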

  • @lawrencemanning 4 months ago +1

    I've never seen someone use the flags register state to directly calculate a branch offset. Well done. :)
    Question: computed branches seem a bit unusual. Computed jumps, sure, many uses. But branching through an offset held in a register? How many ISAs have that?

    • @fabianschuiki 4 months ago +1

      None that I am aware of 😃. Usually you would just load a base address into a register and add the offset onto that yourself. But since this is an 8 bit machine I thought it might come in handy. And it was basically free hardware-wise. Immediates and the rs2 operand are on the same wires 🙂
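
A sketch of both ideas under loudly hypothetical assumptions: 2-byte instructions, offsets relative to the branch's own address, and the zero flag at bit 2 of the flags register. The offset comes either from the immediate or from rs2 through one mux, mirroring the shared wires. Masking the zero flag and adding one instruction size then yields either a fall-through or a skip.

```python
INSTR_SIZE = 2        # hypothetical: 2-byte instructions
Z_MASK = 0b0000_0100  # hypothetical: zero flag at bit 2 of the flags register


def branch_target(pc, offset_from_imm, imm, rs2_value):
    """PC-relative branch whose offset comes from the immediate or from rs2 (one mux)."""
    offset = imm if offset_from_imm else rs2_value
    return pc + offset


def offset_from_flags(flags):
    """Poor man's condition: isolate the zero flag, then add one instruction size.

    Zero flag clear -> offset 2 (fall through); set -> offset 6 (skip two instructions).
    """
    return (flags & Z_MASK) + INSTR_SIZE


print(hex(branch_target(0x10, False, 0, offset_from_flags(Z_MASK))))  # 0x16
print(hex(branch_target(0x10, False, 0, offset_from_flags(0))))       # 0x12
```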

  • @janhofmann3499 6 months ago +1

    Great as always, and the multiplication was the icing on the cake. The HUD overlay is fantastic but looks like a lot of work. Can you at least automate or script this process somehow, or is it a click orgy in your editing software?

    • @fabianschuiki 6 months ago +1

      Thanks 🙂! The animations are pretty straightforward with Manim. It's just a Python script that assembles the animations. So for these overlays, I can put the list of instructions into an array and then let the script more or less simulate the CPU and update the overlay. Not very clean, but gets the job done. Manim has been fantastic. It's annoying for schematics, but for anything that is regular and repetitive, like updating a state overlay, it's brilliant.
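
The "simulate the CPU and update the overlay" approach boils down to stepping through an instruction list and recording one state snapshot per step, which a Manim scene could then animate frame by frame. This is a hypothetical stand-alone sketch with no Manim calls; the `li` mnemonic and the tuple encoding are made up for illustration.

```python
def trace(program, num_regs=4):
    """Step through straight-line code and record one snapshot per instruction."""
    regs = [0] * num_regs
    snapshots = []
    for mnemonic, rd, src in program:
        if mnemonic == "li":      # load immediate (hypothetical mnemonic)
            regs[rd] = src & 0xFF
            text = f"li r{rd}, {src}"
        elif mnemonic == "addi":  # add immediate, with 8-bit wraparound
            regs[rd] = (regs[rd] + src) & 0xFF
            text = f"addi r{rd}, {src}"
        elif mnemonic == "add":   # add register src into rd
            regs[rd] = (regs[rd] + regs[src]) & 0xFF
            text = f"add r{rd}, r{src}"
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
        # Copy the register list so later steps don't mutate earlier snapshots.
        snapshots.append({"instr": text, "regs": list(regs), "z": regs[rd] == 0})
    return snapshots


for step in trace([("li", 0, 3), ("addi", 0, -1), ("add", 1, 0)]):
    print(step)
```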

  • @JaenEngineering 6 months ago +1

    This is really starting to come along. If I remember correctly, can't we already alter the step size in the program counter? If so then couldn't we use some logic to either step to the next instruction which would be a relative jump back to the start of the loop or double step past the relative jump to exit out of the loop depending on the flag status. Seems like another good use for our pal the PAL!😅

    • @fabianschuiki 6 months ago +2

      Haha, great point 🙂 The PAL pal does have a lot of uses. Using the step size for skipping instructions is a nice idea! I was thinking about using the flags to derive a condition code for jumps, and then feeding that into the select signals of the PC: if the condition holds, do a relative jump, and if it doesn't hold, do a regular step. That would allow you to write things like `breli.z -16` to branch backwards by 16 bytes if the zero flag is set.
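
Both variants can be sketched as next-PC functions. Assumptions: 2-byte instructions, branch offsets relative to the branch's own address, and condition names other than `.z` (`nz`, `al`) invented for illustration.

```python
INSTR_SIZE = 2  # hypothetical: 2-byte instructions

# Hypothetical condition codes derived from the flags; only ".z" appears in the thread.
CONDITIONS = {
    "z": lambda flags: flags["z"],
    "nz": lambda flags: not flags["z"],
    "al": lambda flags: True,
}


def breli_next_pc(pc, cond, offset, flags):
    """Conditional relative branch: PC select picks pc + offset if the condition holds."""
    if CONDITIONS[cond](flags):
        return pc + offset
    return pc + INSTR_SIZE  # regular step


def skip_next_pc(pc, cond, flags):
    """Skip-style alternative: double-step over the next instruction if the condition holds."""
    step = 2 * INSTR_SIZE if CONDITIONS[cond](flags) else INSTR_SIZE
    return pc + step


print(hex(breli_next_pc(0x40, "z", -16, {"z": True})))  # 0x30
print(hex(skip_next_pc(0x40, "z", {"z": True})))        # 0x44
```

The branch form needs an adder for the offset in the PC path; the skip form only needs the step-size select, at the cost of spending an extra instruction on the jump being skipped.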

  • @andrewwatts1997 6 months ago +1

    Amazing progress! One step closer to a great CPU.
    Do you have any specific tasks you want it to perform when it's done?

    • @fabianschuiki 6 months ago +3

      Thanks! 🙂 I think I want to be able to write a simple operating system to run on it. Something with a little bit of virtual memory, some form of user space vs. kernel space separation, and a trivial form of multithreading. I don't think the CPU needs a lot of features for that. But it would be cool to have something like an 80s era homebrew CPU with a modern twist 😃

    • @andrewwatts1997 6 months ago +1

      @fabianschuiki That sounds like a small version of Linux ;)

    • @fabianschuiki 6 months ago +1

      @andrewwatts1997 That doesn't sound like a terrible thing 😁 Well, it would be a very tiny version of it. But in the spirit of exploring the fundamentals of how modern CPUs do their things, it doesn't sound too bad to toy around with the basics of an OS 😏