Fabian, I want to thank you for the level of detail you put into each of these videos. It is obvious that you put a lot of time and effort into not only the content of each video, but also the video production itself. All of the on-screen indicators for the state of the computer are hugely helpful in following what is happening. Please keep up the awesome work!
Thank you so much! Very glad you appreciate it 😀! It does take quite a bit of effort, but I think it can help follow better what's going on and how things change. Sometimes it's hard to see the exact values in registers, reading off those tiny SMD LEDs. And it's always easy to skip parts of the video if they are too slow, but it's hard to go the other direction 😬
I really love this series. Thank you!
Thanks! 🙂
yes, finally the wait is over! Love to watch this series progress! Really solid work man. 🙂
Thanks 😀!
It’s very interesting watching this evolve and imagining alternative solutions. For example, the 4 bit mux for selecting the carry source could alternatively make use of the carry in feed being the same as the ISA add/sub bit to supply the carry-in for cases not coming from the carry register - reducing the mux to a choice of two… which might even be feasible with a tri-state buffer and tri-state carry register? So the C bit of the ISA controls the source of carry-in being either carry register or the add/sub bit of the ISA.
Hopefully not muddy late night thinking 😊 and a lot might depend on what else you have in store for us.
Many thanks for a great series.
Thanks! 😀 Yeah that's a very nice idea! The reason I went with the 4-way mux is that it'll allow me to replace one of the inputs that's currently hard-wired to the carry flag with something else, like the sign bit, later on. I plan to add some ALU op decoding soon, which means that there will be more opportunity to make the ALU have just a bunch of control signals, and then look at the ALU op and set them correctly. For example, that Carry Op mux can then be used to inject 0, 1, CF, or the sign bit into the computation, either as a bit to be shifted in or as the carry input to the adder. The decoder would then just look at the requested ALU op and put all the switches in the right place to make it happen. You make an excellent point though; using the add/sub ISA bit for the non-CF input of the mux would make things more compact. Cool idea 😀
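To make the two carry-in options from this exchange concrete, here is a minimal Python sketch. It is not the circuit from the video: the function names and the select encoding of the 4-way mux (0, 1, CF, sign bit on inputs 0 to 3) are assumptions made up for illustration. It only shows that, for add, sub, addc and subc, a 2-way choice between the carry register and the add/sub bit feeds the adder the same carry-in as the 4-way mux.

```python
def carry_in_4way(select, carry_flag, sign_bit):
    """4:1 mux. Assumed input order: 0 -> constant 0, 1 -> constant 1, 2 -> CF, 3 -> sign bit."""
    return (0, 1, carry_flag, sign_bit)[select & 0b11]


def carry_in_2way(use_carry, subtract, carry_flag):
    """2:1 alternative: take CF when the ISA's C bit is set, else reuse the add/sub bit."""
    return carry_flag if use_carry else subtract


def alu_add(a, b, subtract, carry_in):
    """8-bit adder. Subtraction inverts b, so a plain sub needs carry-in = 1."""
    b = (~b if subtract else b) & 0xFF
    total = (a & 0xFF) + b + carry_in
    return total & 0xFF, total >> 8  # (result, carry out)


# Plain add needs carry-in 0 and plain sub needs carry-in 1,
# which is exactly the value of the add/sub bit.
for subtract in (0, 1):
    for carry_flag in (0, 1):
        assert carry_in_2way(0, subtract, carry_flag) == carry_in_4way(subtract, carry_flag, 0)
        assert carry_in_2way(1, subtract, carry_flag) == carry_in_4way(2, carry_flag, 0)
```

The 4-way version keeps one input free for something like the sign bit, which is the flexibility mentioned in the reply; the 2-way version trades that away for fewer parts.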
As someone who is also interested in video production, I would love to see a "behind the scenes" video about how you produce these episodes. I'd especially like to understand how you do the text overlays and animations.
Haven't thought about doing something like that. Great idea! I'm mainly using Manim by Grant Sanderson (@3blue1brown), which is absolutely fantastic for creating these highly systematic, repetitive animations. Because it's Python, I'm basically just writing a script that runs through a list of instructions and processor states, and displays those on the overlay. That produces a truckload of small movie segments that I line up with the actual camera footage in the edit, such that the overlay tracks exactly what's going on in the recording. No magic really, just busy work, and Grant being an absolute legend for writing Manim 😀
Great content. More people need to know about this channel.
Thanks! 🙂
Thanks for all the hard work.
I'm glad you like it! 🙂
Cool. Looking forward to seeing what you do with the shift, be it James Sharman's simple shift using 4:1 muxers or a full shifter using 8:1 muxers. I know the full shifter is a little more work, but I think the option of having logical, arithmetic, rotate and rotate-through-carry style shifts will be handy. (Also nice that logical left and arithmetic left are the same, so we only need 7 inputs for shifts, leaving one for a pass-through.)
My plan is to imitate James' work and make the ALU do shifts by just one position, because it simplifies things quite a bit. A proper barrel shifter would be a nice example for a separate functional unit, maybe one that takes multiple cycles. That could tickle the out-of-order execution quite nicely 😏
My idea for a full shift unit still only shifts by one position at a time, it just offers the full complement of shifts.
So by using a 74x151 8:1 muxer we have
Input (0) [0,0,0] pass through
7, 6, 5, 4, 3, 2, 1, 0,
Input (1) [0,0,1] logical shift right
[0], 7, 6, 5, 4, 3, 2, 1, -0,-
Input (2) [0,1,0] logical/arithmetic shift left
-7,- 6, 5, 4, 3, 2, 1, 0, [0]
Input (3) [0,1,1] arithmetic shift right
7, 7, 6, 5, 4, 3, 2, 1, -0,-
Input (4) [1,0,0] rotate left
6, 5, 4, 3, 2, 1, 0, 7,
Input (5) [1,0,1] rotate right
0, 7, 6, 5, 4, 3, 2, 1,
Input (6) [1,1,0] rotate left through carry
{7}, 6, 5, 4, 3, 2, 1, 0, [C],
Input (7) [1,1,1] rotate right through carry
[C], 7, 6, 5, 4, 3, 2, 1, {0},
[ ] actual/fixed value ( "C" = carry flag)
{ } bit into carry flag
-Struck out- bits discarded.
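The table above can be modelled in a few lines of Python. This is only a behavioural sketch of the eight mux inputs as listed, not a description of any actual build; the function name `shift_unit` and its signature are made up for illustration. Following the legend, only the two rotate-through-carry modes write the carry flag, and the shifts simply discard the bit that falls off.

```python
def shift_unit(value, mode, carry_in=0):
    """Shift or rotate an 8-bit value by one position.

    mode 0..7 follows the mux input numbering in the table.
    Returns (result, carry_out).
    """
    v = value & 0xFF
    msb = (v >> 7) & 1
    lsb = v & 1
    left = (v << 1) & 0xFF   # bit 7 falls off, bit 0 is left open
    right = v >> 1           # bit 0 falls off, bit 7 is left open
    carry_out = carry_in     # unchanged unless rotating through carry

    if mode == 0:            # pass through
        result = v
    elif mode == 1:          # logical shift right: 0 enters at the top
        result = right
    elif mode == 2:          # logical/arithmetic shift left: 0 enters at the bottom
        result = left
    elif mode == 3:          # arithmetic shift right: bit 7 is duplicated
        result = right | (msb << 7)
    elif mode == 4:          # rotate left: bit 7 wraps around to bit 0
        result = left | msb
    elif mode == 5:          # rotate right: bit 0 wraps around to bit 7
        result = right | (lsb << 7)
    elif mode == 6:          # rotate left through carry
        result = left | (carry_in & 1)
        carry_out = msb
    elif mode == 7:          # rotate right through carry
        result = right | ((carry_in & 1) << 7)
        carry_out = lsb
    else:
        raise ValueError("mode must be in 0..7")
    return result, carry_out


# Walk all eight modes for one sample value with the carry flag set.
for mode in range(8):
    result, carry = shift_unit(0b10010110, mode, carry_in=1)
    print(mode, format(result, "08b"), carry)
```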
Oh I see, yeah that's pretty cool! I like the flexibility 👍. I was thinking about picking carefully what bit gets shifted in as a carry, which would allow you to do the arith/logic right shift, left shift, pass through, and rotate through carry. I don't think my solution allows for the plain rotates without carry though 🤔.
I guess it depends on what you think you'll need in the future. If you're pretty certain that you can get away with a simpler version, there are benefits to that, namely a reduction in parts count. My idea requires double the ICs at its core. Trivial on a PCB, if a little larger, but not very breadboard friendly.
Yeah good point. On a breadboard the wiring would be a bit annoying. I might do a simple thing now and then a proper shifter later as a functional unit 🤔
Are you still working on this series? I know you might be working on or planning the next phase of the project, so I just wanted to check :)
Yes, still working on it! Got the next video almost done. Depending on the day job, my time to work on videos sometimes drops to almost zero for several weeks. But they'll be back soon 🙂
That’s awesome! Yeah, I figured that might be the case. I’m trying to plan a channel in the future but have to work on that around Master’s classes and my day job. Thanks for working on this project, it’s very inspiring 😊
Off-topic question: what is the name of the font used in terminal and sublime? Thanks
It's called "Iosevka". Love it 😊
@@fabianschuiki This font looks great, so readable and clean. Fits perfectly for compact assembler mnemonics. Thanks for your reply.
Does this system have the same considerations with loading 0 into a register vs xor clearing that x86 systems have?
Do you think you'll ever run into an issue like that in your project?
Right now it does not, because all instructions are 16 bits at the moment. The xor register clearing pattern in x86, as far as I understand, is mainly to reduce code size. You could also move an immediate into the register, but that will cost 4 or 8 additional instruction bytes for the immediate.
If I recall correctly, there's even some logic in the x86 instruction decoder to detect these patterns and turn them into proper "clear reg" micro-ops.
I am planning to introduce compressed instructions in the future. We'll probably want to have a set-to-zero op that is just one byte, assuming that is a fairly common instruction. 🙂
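For a concrete picture of the code-size argument, the snippet below compares the standard 32-bit encodings of the two x86 ways to zero `eax`. It is just a byte count, not a statement about this project's ISA; the constant names are made up for illustration.

```python
# Machine-code bytes for two ways of zeroing eax on x86.
XOR_EAX_EAX = bytes([0x31, 0xC0])                    # xor eax, eax
MOV_EAX_ZERO = bytes([0xB8, 0x00, 0x00, 0x00, 0x00]) # mov eax, 0 (opcode + 32-bit immediate)

# The mov carries a full 4-byte immediate, so it ends up 3 bytes longer overall.
saved = len(MOV_EAX_ZERO) - len(XOR_EAX_EAX)
print(f"xor: {len(XOR_EAX_EAX)} bytes, mov: {len(MOV_EAX_ZERO)} bytes, saved: {saved}")
```

With fixed 16-bit instructions, as in the current design, both forms would cost the same, which is why the question only becomes interesting once instruction lengths vary.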
@@fabianschuiki I'm waiting as patiently as I can to see how your project develops 🤞
You're a fantastic inspiration to us all ❤
Thank you 😀!
I suppose at some point you will add comparison so you will probably use the result of the sub with the carry (< >). And some way to jump depending on the carry value.
Still guessing here, I suppose the add/sub will be useless as the addc/subc can do both and you will add some instructions to change the carry register…..
That's the plan 🙂. The add/sub is still useful though, because it lets you ignore the carry such that you don't need another instruction to clear it beforehand. Being able to change the carry is also useful regardless, because you might want to do context switching or multithreading, which requires saving and restoring the machine state at any point.
@@fabianschuiki as you are already struggling with the bits for the encoding of the ISA, I thought we could save a bit having only addc and subc ;)
Fair point! 😏 I do have a plan to deal with the whole instruction encoding and decoding thing a bit better. Need to get the ALU out of the way though. The nice thing about 16 bit instructions is that they theoretically offer 65536 individual instructions. However, 1-operand instructions will cost 16 such slots, 2-operand ones 256 slots, immediate-only ones 256 slots, and immediate-plus-reg ones 4096 slots. But there's quite a bit of encoding space to go around and play with. Maybe also some instruction compression? 😇
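The slot counts above fall out of simple powers of two. The sketch below reproduces them under the assumption of 4-bit register fields and 8-bit immediates, which is inferred from the quoted numbers rather than stated in the thread. The instruction mix at the end is entirely hypothetical and only shows how one might budget the space.

```python
INSTR_BITS = 16
REG_BITS = 4   # assumption: 16 registers, inferred from "16 slots" per 1-operand instruction
IMM_BITS = 8   # assumption: 8-bit immediates, inferred from "256 slots"


def slots(operand_bits):
    """Number of 16-bit encodings one instruction occupies for its operand bits."""
    return 1 << operand_bits


COST = {
    "1-operand": slots(REG_BITS),
    "2-operand": slots(2 * REG_BITS),
    "immediate-only": slots(IMM_BITS),
    "immediate-plus-reg": slots(IMM_BITS + REG_BITS),
}
TOTAL = slots(INSTR_BITS)


def remaining(mix):
    """Encoding slots left over after allocating the given instruction counts."""
    used = sum(COST[kind] * count for kind, count in mix.items())
    if used > TOTAL:
        raise ValueError("instruction mix does not fit into 16 bits")
    return TOTAL - used


# Hypothetical mix, purely for illustration.
example_mix = {"immediate-plus-reg": 8, "2-operand": 64, "1-operand": 32}
print(COST)
print(remaining(example_mix), "of", TOTAL, "slots still free")
```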
@@fabianschuiki compression like, for instance, an instruction being an index into a ROM where the internal logic is encoded?
Yes I was thinking about something along those lines. For example, there could be 8 bit compressed instructions, which would run through a ROM which expands them to regular instructions. That would reduce the instruction fetch bandwidth, because a 16 bit line from memory could hold 2 compressed instructions, keeping the CPU busy for 2 cycles 🙂
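A rough sketch of that idea in Python: each 8-bit compressed instruction indexes a 256-entry ROM holding the regular 16-bit instruction it stands for, so one 16-bit fetch yields two instructions. The ROM contents, the names, and the choice to execute the high byte first are all assumptions for illustration; nothing here is taken from an actual design.

```python
# Hypothetical expansion ROM: 256 entries, one regular 16-bit instruction each.
EXPANSION_ROM = [0x0000] * 256
EXPANSION_ROM[0x01] = 0x1200  # made-up encoding, e.g. a "set register to zero"
EXPANSION_ROM[0x02] = 0x3400  # made-up encoding for some other common op


def expand_line(line16):
    """Split one 16-bit memory line into two compressed ops and expand both.

    Assumption: the high byte is the first instruction to execute.
    """
    first = (line16 >> 8) & 0xFF
    second = line16 & 0xFF
    return [EXPANSION_ROM[first], EXPANSION_ROM[second]]


# One fetch, two cycles' worth of work for the CPU.
print([hex(instr) for instr in expand_line(0x0102)])
```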
This is such an excellent series, thank you so much. I've already built everything you taught us in h.neemann's "Digital" simulator and it works like a charm. Can't wait for #31 to continue and hope you keep going 🙂
Thank you so much! Great to hear that you were able to replicate the build in "Digital". That's very exciting 🙂. I'm almost done with the next video in the series -- hope to get it uploaded and released soon. Time to finish up the ALU and get back to some actual CPU architecture work 😉
@@fabianschuiki I will continue to implement it in "Digital"; my ambitious plan is to use that as a basis for an FPGA implementation later. Once I've cleaned everything up, you could add it to your repo if you like; maybe it's interesting for others to play with the simulator as well. Currently the layout is still a mess, but it works very well. It should even be possible to use the output of your assembler to drive the CPU via the API from there, and use it for debugging and single-stepping through the code with a view of all the registers and so on. In principle that's possible; the author, h.neemann, does that for his own CPU implementation.
@andreassteinhauser9508 That would be fantastic! 👍🙂
@andreassteinhauser9508 - I have been doing the same thing in Digital since the beginning! Great idea about using Digital's API, I hadn't considered that. Thanks for the idea! :-D I'm porting the assembler to C# though, I just prefer it to Python /shrug. Cheers!