Doesn't the "stack" idiom work in reverse to what's implemented? SP starts at top of stack memory. A "push" writes decrements SP, then writes; a "pop" reads from SP, then increments. (A byte or word at a time, appropriately)! Maybe getting Z8/680000 & 6502 mixed up - but I thought Stacks in general were always that way round?
You are. The entire universe is also on the head of a pin that's sitting on a table in the middle of an abandoned mental hospital. OOGA BOOGA Anyway, think about game theory and take a look at some of the systems and social ideals that you live under. You'll begin to feel the world unravel around you and see how meaningless and pointless your life has been.
How astonishing to find this YT suggestion ! I wrote a 6502/6503 emulator in 1987 in C on a PC-XT (8086). Both clocks of 6502 and 8086 were at 4Mhz. The emulation was 400 times slower than the real processor, but it was embedded in a debugger (MS C4-like) and it was possible to set breakpoints, survey memory values, execute step by step, a.s.o... Ahh ! nostalgia...
To anyone thinking about coding their own... Most processors, internally, use predictable bits of the instruction opcode to identify the addressing modes - because, the processor really needs to be able to decode opcodes fast, without having to 'think' about it! Understanding this strategic bit pattern can make writing a CPU emulator SO much easier! It's been a long time since I coded for 6502 ASM ... but, if you were to plot each instruction in a table, you'd likely notice that the addressing modes fall into very neat predictable columns. This means that you can identify the 'instruction' and 'mode' separately, which then lets you decouple the Instruction logic from it's Addressing logic. This 'decoupling of concerns' can really help shorten your code and reduce errors _(less code, as every Instruction-Type is "addressing agnostic" ... and less repetition, as each "Addressing logic" is only written once and is shared across all instructions)_ Just an idea for future exploration : ) Unfortunately, sometimes this bit-masking strategy isn't perfect, so you might have to handle some exceptions to the rule. *My experiences, for what it's worth...* Last time I emulated an 8-bit fixed-instruction-length processor... I wrote each instruction handler as a function, then mapped them into a function-pointer array of 256 entries. That way (due to ignoring mode differences) several opcodes in an instruction group all called the same basic handler function. I then did the same thing with the modes, in a separate array ... also of 256 entries. So, every Instruction was invariably a call to : fn_Opcode[memory[PC]] ... using the mode handler : fn_Mode[memory[PC]] That got rid of any conditionals or longwinded case statements... just one neat line of code, that always called the appropriate Opcode/Mode combination... because the two tables encoded all the combinations. Hope that makes sense ; ) Obviously, to ensure that this lookup always worked - I first initialised all entries of those tables to point at the 'Bad_Opcode' or 'Bad_Mode' handler, rather than starting life as NULLPTRs. This was useful for debugging ... and for spotting "undocumented" opcodes ; ) It also meant I knew I could ALWAYS call the function pointers ... I didn't have to check they were valid first ; ) It also meant that unimplemented opcodes were self-identifying and didn't crash the emu ; ) As I coded each new Instruction or Mode, I'd just fill out the appropriate entries in the lookup arrays. But the real beauty of this approach was brevity! If my Operation logic was wrong, I only had to change it in one place... and if my Addressing Mode code was wrong, I only had to change it in one place. A lot less typing and debugging... and a lot less chance for errors to creep in. Not a criticism though... far from it! I just thought I'd present just one more approach - from the millions of perfectly valid ways to code a virtual CPU : ) Understanding how the CPU, internally, separates 'Operation' from 'Addressing' quickly and seamlessly... is damned useful, and can help us emulate the instruction set more efficiently : ) But, ultimately, you might have to also handle various "ugly hacks" the CPU manufacturer used to cram more instructions into the gaps. By using two simple lookup tables, one for Operation and another for Mode ... you can encode all of this OpCode weirdness in a simple efficient way... and avoid writing the mother of all crazy Switch statements XD
I do agree with you. But I would say that even with my huge switch statement method, i do only write the address mode functions once and then reuse them. Secondly, it's still possible to screw up the lookup table method just as easily as the giant switch statement method (as you still have to fill the tables correctly) , so you would still have to do the unit testing for each instruction that i'm doing to really be sure. I would say the giant switch statement method i have here, is only really good for a small processors like this (150 instructions), it's very easy to read and easy to reason about. If i tried to do this for anything more complex like the Motorola 68000 then i would no way attempt it this way, i would certainly be using the method you are describing above. If you make each opcode into a switch case on the 68000 i'm pretty sure you would have multiple 1000's of switches.
@@DavePoo Oh, absolutely : ) But isn't that the wonderful thing about this project. There's a lot of ways you can go... all with their own little tradeoffs. I think the important thing is that people have a go - and don't be afraid to stray from the path and see where it leads : )))) I love that there are channels like yours, encouraging people to tackle things like this. 8-bit particularly tickles me, because it's where I got my start in game dev back in the mid-late 80's. Happy times : )
Ahhh! I just found my old code! The Generic 8-bit fixed-length CPU frame... using the opcode call-table so that you could load up CPU personalities as plugins. The plan at the time was to emulate all the 8-bit fixed-length families and their variants... but I guess life got in the way. I had a couple of almost identical z80 variants, an i8008 and i8080 and a classic (non-C) 6502 ... and, at some point I'd added a simple 8-instruction BF (brainf**k) machine from esolang. I'd completely forgotten writing most of this : )))) I love exploring old drives, some of it makes me cringe : )
@@garychap8384 At least your old drives work (or exist). I went back to my Amiga 600 to see if the first machine code game i ever attempted was on there, but the hard disk was missing, i think i must have sold the drive or the Amiga in the past and completely forgot. It turns out 30 years is a long time.
@@DavePoo Oh, that's such a shame : ( It's so sad to think of all the things we lose along the way. Not just the code, the files and the hardware... but the childlike wonder when we got our first 8-bit, or the thrill of making a modem connection and manipulating some machine at a distance. Even just groking peripheral ICs for the first time... poking at some addresses or registers and making things happen (or not) Bah! Growing up sucks : ) Still, I'm so glad I got to do it in the 70s/80s when computers were still a wild frontier and understanding ASM, and the bits on the bus, was the only way to get anything done in a reasonable time. Heroic days :D Now we all walk around with supercomputers in our pockets, and never give it a moments thought : / There's that quote about advanced technology being indistinguishable from magic... ... the unfortunate corollary of it is that the more ubiquitous advanced technology becomes - the less 'magic' there is in the world. Thanks for making your videos... stuff like this is slowly becoming a new field ... digital archaeology : ) Heh, I guess that makes me a relic XD Anyway, thanks for doing what you do.
@Brad Allen but it is nicely unit testable in a function. And INC is simple but it’s not just that you need to advance the PC the right amount of bytes, operate the flags. So a function that has an appropriate UnitTest is the more robust way to do it. And there’s another good benefit of doing this with dedicated functions. That’s that you have the logica detached from your interpreter. And can reuse it in other contexts. And in case of OO quickly override methods to facilitate a minimally different CPU. Like the 8080 and Z80. But indeed each their own, but I know that most companies, I’ve worked for, wouldn’t accept this in their code reviews :)
@@DavePooThe reason 6502 is not RISC, is because it’s not a load/store architecture and has many addressing modes. Some RISC ISA’s have complex instructions, but are still considered RISC because they have load/store architecture and few addressing modes.
@@DavePoobut still a lot simpler than the contemporary z80. DJNZ.. shadow registers.. funky stuff. Possibly more powerful for the assembly programmer, though.
My first CPU emulator in C was for a configurable VLIW-CPU back in the mid/late 80s ;) and that was not considered to be enough for my thesis .... Where do you study? ;)
It's very rare that I comment on the RUclips video, but this was an amazing find! Very clear and concise explaination on how to get started programming your own CPU emulator. Thank you for putting this together!
I was 14 when I tought myself to program on a C64. It took me 1 week to figure out that Basic was crap. So I basically learned programming using 6502 Assembler. Today I am a computer scientist, still having the C64 ROM listing from 1984 in my bookshelf. I learned so much from it.
BASIC is slow, so it's handy to know some commands to speed up a few things. There was a game for the TRS-80 CoCo which I ported across to GW-BASIC and I sped up one part of the game from taking roughly 30 seconds to taking almost no time at all, simply by choosing to use BSAVE and BLOAD instead of using serial file output and input as the original version of the game did. For anyone who wants more details: The game used an integer array to store the "levels" of the game, with the numbers representing what was in each location, so I just used BSAVE along with VARPTR to find the start of the numeric array along with calculating the size of the array based on its dimensions and the size of integer variables (not forgetting to add one to each dimension to account for array location 0). Once saved, the data could be reloaded into the array by using BLOAD along with VARPTR to find the start of the array. The size of the array was always the same, so there was no need to limit how much data was loaded.
In the early 80s I wrote several little programs for the 6502 in assembly language, just for fun, on my Apple II. It was always amazing how much faster this was than the same program in Basic language. The 6502 was really a simple design and easy to understand.
I did that on an ATARI 800 XL. While the C64 hat a 65C02 processor, the Atari had a 6502C (the C was a speed grading and denoted that the CPU could handle more than 2 MHz. Yet the Atari ran at 1.79 MhZ, almost twice the speed of the C64).
You use the term "clock cycle" for what is actually a machine cycle. Early processors such as the 6502 required multiple clock cycles to execute one machine cycle. The 68HC11 for example needed 4 clock cycles for each machine cycle.
what are you considering a "machine cycle"? i don't know much about the 68hc11, but one of the things that made the 6502 so awesome was one machine cycle (aka doing a thing, whether that thing be a memory fetch, instruction decode, alu operation, whatever) happened in one clock cycle- there was even a tiny bit of pipelining, though i'm fuzzy on the details- i think a memory fetch would happen in parallel with the instruction decode, so if an operand was needed it was already there by the time the instruction was ready for it. so a 2mhz 6502 was actually pretty close to a 8mhz 68k (at least in terms of fetching and executing instructions, ignoring differences in complexity of those instructions...)
@@MrFukyutube a clock cycle is the actual oscillator frequency, or crystal frequency, while a machine cycle is the internal cycle count which depending on the processor is between 1 clock cycle per machine cycle and I think the worst I saw was around 7. Intel Architectures (8080/Z80/8051 etc.) used to have higher counts where as Motorola and others including 6502 used to use multiple edges of the clock and so appear to be much faster on paper (in a instructions per clock cycle way) but ultimately those devices always had lower maximum clock (crystal) frequencies so ultimately difference was much lower. This link has a reasonable description: en.wikipedia.org/wiki/Cycles_per_instruction
On the 6502 if you go from an instruction which does 8 bit (zp) addressing to 16 bit , it needs one additional cycle. I dunno what the crystal has to with this. Typically, a TV crystal had 15 MHz. Bipolar JT reduce it before feeding the 6502 pin.
It’s been my opinion for a while that the designers of “C” made a mistake when they didn’t define the sizes of the int types. I mean what good is an “int” if you don’t even know if it can hold a value of 75000?
Thanks. I said in this video i wasn't going to write the whole thing. But i realised nobody had done the whole thing on video before, and i thought it would be good for people to see how much work it could be to get something like this working.
not only that the SP is only a Byte it also "grows" from top to bottom. so you need to decrement the SP when putting things on the stack and increment when pulling data. also the Reset vector is an indirect jump, so you don't start executing at 0xfffc but you take the address that is stored in 0xfffc/0xfffd and start executing at this address. it is the same for the other vectors as well.
I handled the SP as a byte in a later episode, as well as the direction the stack shrinks. I never really handled the reset vector properly. I think i would get round to that once i started emulating a whole computer system.
@@DavePoo As I saw the video it was not clear to me that this is a series. I've done a lot with the 65(C)02 in the past. I built my own computer from scratch called MOUSE and even used a 6502 emulator on Arduino boards to create an emulated version of my computer (MOUSE2Go). I like the "start simple and evolve" approach, because if our overplan it you might never start due to the complexity. But starting simple gets you into getting simple things work and than improve. SO now I'm curious how this ends :-).
I actually wrote a 6502 emulator in C on my Atari-ST (68000 CPU) in 1987. I was quite proud of it. It used a kind of jump table for the actual instructions. I made an array of pointers to functions, and used the content of the instruction register as an offset into this array to call each Op-code function. For example at A9 was a pointer to the LDA immediate mode function. I started off writing a cross-assembler, and then wanted to test the resulting machine code and so wrote the emulator for it. Amazingly, after all these years I still have the source code!
@@NOBODYUSEFULL Well, 34 years on, I expect there are quite a few embarrassing things about it, and remember, a state-of-the art 68000 Lattice-C from 1987 is going to have issues. But here goes...
@@NOBODYUSEFULL /* a 6502 simulator and assembler in Lattice C M.Stent Dec. 1987 */ long *jt[256]; /* jump table */ unsigned char memory[0x10000]; /* cpu registers */ unsigned char a; /* accumulator */ unsigned char x; /* index reg x */ unsigned char y; /* index reg y */ unsigned short pc; /* program counter */ unsigned char sp; /* stack pointer */ unsigned char p; /* status reg */ unsigned char ir; /* instruction register */ unsigned short fpaddr; /* front panel address reg. */ unsigned char fpdata; /* front panel data reg. */ unsigned short ruaddr; /* run stop-on-address */ unsigned char ruinst; /* run stop-on-instruction */ int ruclk; /* run stop-on-clock */ /* definitions for status reg. p */ #define CMASK 1 #define ZMASK 2 #define IMASK 4 #define DMASK 8 #define BMASK 16 #define VMASK 64 #define SMASK 128 /* inverse masks */ #define NCMASK 0xfe #define NZMASK 0xfd #define NIMASK 0xfb #define NDMASK 0xf3 #define NBMASK 0xef #define NVMASK 0xbf #define NSMASK 0X3f long time; /* cpu clock */ int clock; /* display clock */ /* here I leave out a lot of stuff connected to the display on an Atari ST but here is the core of the matter...*/ void execute(func) void (*func)(); { (*func)(); }
This is an easy and fun way to get a handle on how microprocessors work. There were no books on the Motorola 6800 except for the one intended for computer specialists when I started. I must have read that book twenty times before I had a clue what it was talking about. No hobby machines existed, and I was a Mechanical Engineer with a final year Degree project to control a machine using a D2 Evaluation Kit. To say it was a struggle would be a hell of an understatement. With no assembler, the opcodes had to be looked up and entered using a Hex keypad using a debug monitor program. A hard way to learn, but something you never forget. You guys have it so easy!
Thanks, i think i said in this first video somewhere that i wasn't going to write the whole CPU emulator, but in the end i went through and started doing the whole thing. I think one of the main purposes is to show that when you are writing a program, what you end up with is not always what you started with. The emulator code evolves and changes as the videos progress.
I did a lot of assembly programming on this architecture. Since then I've designed many more complicated processors, and each of them is first done by creating a program to emulate it's function on a clock by clock basis to turn the design into real logic. What you've done here is a behavioral simulator but cool to see it done on the fly
65C02 got branch always instruction. All microprocessor jump relative for small instructions despite large total memory. Fast page memory came 1987 with the 386. Also: is cycle time part of the ISA? The instructions set lacks all the goodies from 6900 . 16 bit stack. B register. I want “do for A and then repeat for B” versions of LDA ADC STA . And TAS and TSA. ADC AB, imm8 sign extended ( like branch).
Love this. When I was in University in the 1980's, we had to write a microcode engine to implement instructions for the 6809 and get a simple program to execute on it. We had to write the microcode for each instruction. We were given the microcode instructions for the RTL Register transfer language. You could create a microcode engine that could then run any instruction set on top of it! Set the microcode engine up as a state machine to make life a bit easier. At the time we were actually using an IBM/370 and the VM operating system so we each had our own virtual machine. but the microcode engine had to be writeent in 370/assembler and boot as a virtual machine on the mainframe! These days the average PC is capable of this with relative ease!
Great story. Yeah, not only an average PC is capable of this, but even a below average smart phone could emulate this now. I think it's amazing that we now all walk around with super-computers in our pockets and totally take it for granted.
@@DavePoo The best part was we never realize that this super special virtualization technology would become so prevalent back then. It was just what we had to use to get the assignment done. We never sat back and thought about just how much power we had or what would happen to it!
29:00 $fffc is a vector, so if those bytes are loaded there the 6502 will load the PC with $42a9 and try to execute that memory, which contains $00 (BRK) at the moment.
Really interesting! Thanks for the video. I see people have mentioned about std::uint8_t and std::uint16_t, but in C++17 onwards there is also std::byte in the cstdef header which you can use. It also can be passed to a non-member function std::to_integer(std::byte b) for a more numeric output if you're debugging the byte values.
Surprisingly it worked despite a bug in FetchWord with cycles++ when it should be cycles- - Also you should implement the instruction vs mode table to simplify it dramatically. By masking on the opcode bits you can then use a switch statement for the addressing mode. It would reduce the combinations to 23 instruction switch statement and 8 addressing functions. Btw the pc++ wrapping to 0x0000 is legal so as long as mem is mem[16k] it’s fine. I hope this isn’t taken as armchairing. The video was fun to see.
I did something similar to learn about the 6502, specifically the 65C02. but i didn't write an emulator, i built the whole CPU in a Logic Simulator. the end result is the same, you get a better understanding of the hardware. and it was quite fun.
@@DavePoo thanks. something a bit more direct to the video: around 5:13 why did you define a byte and a word instead of just using uint8_t and uint16_t? the "_t" types are made to be universal across all C/C++ compilers and architectures. also, the endianness of the platform shouldn't matter if you just have 2 temporary 8 bit variables instead of a single 16 bit one. and i assume in later videos you fixed the thing where the CPU starts executing from 0xFFFC? because that's not where the PC starts at, but rather at 0xFFFC and 0xFFFD is the address that gets loaded into PC before the CPU starts executing. it's like a hardwired jump indirect. either way your video made we want to try this for myself as well, but i'll try it in C instead of C++.
On a similar theme, back in the 1980's I wrote an assembler/disassembler pair for the Z80 microprocessor that ran on a Pyramid minicomputer. I used it to work out the full functionality of UHF radio scanner that had a Z80 and associated IO chips as it's central control. I dumped the radio's 16k byte EPROM into a file containing a long string of HEX pairs, disassembled it and printed the result. Then spent a few days looking at the printout and filling it with comments. Made my modifications, also adding all the comments to the disassembled program and used my Assembler to create a new HEX file ready for EPROM programming. Started up the radio and all my mods were working as planned. They were fun days. I doubt I could do what I did back then with today's systems.
I have tried to do similar thing for years, not with UHF radio but with another kind of ROM. Anyway I couldn’t to do. My level of knowledge is low and I think I am a bit lazy . . . he he he
Some embedded devices leave debug ports open that can be exploited to read/write data from the system, but it's certainly a lot harder than pulling out the EPROM and dumping the code out. Now you have to get lucky to even be able to see the code without very specialized tools. Low Level Learning has a video where he reads/writes data onto a baby monitor using an Arduino on its debug port, then used that to run arbitrary C code.
Takes me back to my youth where I used to dabble in M680x0 assembler. Not only was M68k assembler fun to work with but it was a blast to cycle (instruction order, instruction type, addressing modes) and instruction pipeline optimize (mainly try and prevent pipeline flushes that would eat cycles due to having to load a stream of new instructions from memory) the code in order to make it as fast as possible. With the advent of caches, branch/jump prediction, vector instructions etc. things have gotten quite more complicated of course. I wouldn't bother to hand optimize assembler code nowadays and let the compiler do it instead. Never the less, I'm still of the opinion that getting to know how a processor works on such a low level is still very valuable for any programmer and can not only help in debugging but also improve the understanding of high level languages and how they translate into assembler.
I tried to hand-optimize assembly code on a risc PPC601 (after learning on a 6502 and then a 68020). It was very complicated, and I am sure I didn't handle all the interdependencies correctly, but trying to achieve this teaches a lot. So trying to do it a couple times is quite worth it, I think. I am now, after a hiatus of 15+ years, playing with assembly on the ARM Cortex-A (in my Samsung tablet), and while the risc approach is familiar, the complexity of the processor has become astounding. The manuals covering a high-level view of the processor alone are hundreds of pages.
Yeah, some industrial coding standards actually require using it. If you make it a habit, your code will always be portable between different architectures - at least concerning POD type sizes.
@@272zub Ah yeah? Proof that! This mechanism is to aid in platform independent programming, because it is in fact NOT standardized and machine dependant what Dave uses. These storage modifier keywords are platform dependant. What you call "Miserable" (uint_8t, etc.) does in fact translate to the same instructions on Dave's machine. So your efficiency claim is a fallacy. It is just good style to use them. Especially when emulating foreign hardware. For example, look at the code from the professionals at github.com/stuartcarnie/vice-emu/blob/master/vice/src/arch/XXX/types.h. They have to define for EVERY system what to use. That was a design decision from the start and that project is very mature. In contrary look at the very new github.com/commanderx16/x16-emulator project. They use proper platform-independent code and save a huge amount of code. @272zub you can't generalize it this way. If this facility is there, why don't use it? What you say is an edge case and is only true in special cases. In addition, what you tell affects the code running on the (compiler-)TARGET. But here the function of this facility is a data-type related to the emulated machine (on the HOST). You are wrong on several levels. See stackoverflow.com/questions/6144682/should-i-use-cstdint ... I think that is what is related to your thoughts and what doesn't apply here. minasteros: #include ... Its C++ :) en.cppreference.com/w/cpp/header/cstdint
@@dieSpinnt Hold your horses. :) I think you missed the "where it's not needed" part in my reply. If you need a fixed-size integer, e.g. a 16-bit unsigned integer, then by all means do use cstdint (or stdint.h when in C). It's so much more better than either using an unsigned short because it happens to be 16 bits on your platform, or than making your own half-baked stdint. Clearly when writing an emulator, like it's done in this video, you will often needs such fixed-size types. I am not disputing that at all. In reality the types from cstdint are just the correct typedefs to some of the "normal" integer types, e.g. on some platforms it is that uint16_t is a typedef of unsigned short. So using the uintNN_t type of course is exactly the same as using the correct "normal" type. What I didn't like was @minastaros' suggestion to use the fixed-size types everywhere. In the extreme this means don't use an int at all, always use (u)int_NN. And that is where my "less efficient" comment applies: godbolt.org/z/7obs5s - if you use uint16_t when you don't explicitly need it, and an int would have been a good choice - you can see that the 16bit version is actually more complex than the 32 bit one. And that the normal int version is the same as the 32bit one. And by "Miserable" standard, I didn't mean the C++ standard. I meant en.wikipedia.org/wiki/MISRA_C and especially it's C++ evil cousin, which, to me, is how C programmers, who don't know C++, get their revenge on C++ programmers... By the way there are also the types (u)int_fastNN_t and (u)int_leastNN_t which could offer the best from the both worlds: Guaranteed minimal size while still being as efficient as possible. As their size is not guaranteed, that can't be used when a specific memory layout is needed though.
Very interesting. Earlier this year I was wanting to expand my knowledge of Java and went through a similar exercise. I had a Heathkit ET-3400A microcomputer a long time ago, and I wrote a functional ET-3400A emulator that runs the ET-3400A ROM in an emulation of a Motorola 6800.
At 5 minutes into the video, it is stated "So this is where you have to know exactly the size of a certain type on your platform or compiler". Doing this creates platform-specific code, which only works on platforms with the same type sizes. Instead, it is better to use the platform-independent types that are declared within . Specifically, the followings lines in the author's code: using Byte = unsigned char; using Word = unsigned short; should be something like: typedef unit8_t byte_t; typedef uint16_t word_t; It's debatable whether a using or typedef statement should be used, but the key thing is the use of uint8_t and uint16_t from .
Yep, i could have used those, but i was careful to use my aliases everywhere so it's trivial to fix them later. I prefer Word & Byte to everything having an _t on the end. Not sure why they did that. Would it have been so bad to call them uint8 and uint16 instead? There is no difference in typedef and using in this case other than the syntax, but i prefer "using" to typedef. unsigned char is guaranteed to be 1 byte anyway by the spec.
I think it probably was for me too, but only via BASIC, i think it was a BBC Micro from school where i wrote the classic "i was here" then made it loop. The first chip i programmed in machine code was actually the 68000 (the Amiga)
@@PWingert1966 oh, the 6809 was a REALLY nice cpu to code in assembler. Really good support for higher languages too. Used to code on a 6809 system with MMU so it had 512 KB ram, and run a multi task OS called OS9. We run 8 concurrent users on each system, we got two. Could dynamically load and unload drivers, way advanced system in the mid 1980:th. :-) I think that only nicer CPU I have worked with in this low level was PDP 11 which had a really nice orthogonal and symmetrical instruction set, much like 6809.
Back in the 80's I purchased a computer kit from a company named "Southwestern Technical Products", out of California. It was the first and only computer I ever built. Had to solder every component (capacitors, diod's , resister's, and even the ram chips, a whole 4k worth. It took about a month to get it all done. I never built another computer since.
Memory-mapped I/O is still very much in use. A large amount of memory address space on a modern PC is used, e.g., by your video card, which is why 32-bit windows would only have ~3 GB available to applications on a system with 4 GB of RAM installed.
Virtual memory is just at the user program level. The OS kernel still has access to the physical address space (IIRC by mapping that to the _kernel's_ virtual address space) and it manages assigning the I/O memory to device drivers.
4 GB limit was for 32 bit processor... its much higher now. Anyway it was lower 2GB of memory (addresses 0x00000000 through 0x7FFFFFFF) for application programs in windows rest above 2GB (addresses 0x80000000 through 0xFFFFFFFF) was system space normally.. where kernel or all I/O ports or DMA memory resides. But u can specify a boot time option in windows so that lower 3GB is for applications and just upper 1 GB for system. Now from processor point of view 32 bit processor supports 4 GB of memory normally but with the help of virtual memory mechanisum (paging) and use of PAE or PSE flags it can address 64GB of memory. Windows servers might be supporting that mode I guess. Basically last four digits are assumed 0 ...so all in all 36 bits instead of 32 bit.
Seriously impressive video and very informative. And someone who actually does know C++.I've been around software dev alooong time and when asked 'do you know any C++ devs' I always reply 'I know quite a few who *claim* to be c++ devs'. The rarest of beasts I think.
I would argue that there is no person on the planet anymore that can truthfully say "I know c++", considering the language isn't even _designed_ by one person anymore. Even making the question more constrained, eg "do you know the _syntax_ of c++?", even then, the answer will always be "no". Besides, what is the point of asking such questions when there is no reference, 100% compliant, verified, compiler? (there exists exactly one verified _c_ compiler that supports _almost_ all of c11)
Long ago, I wrote a Z80 emulator in x86 assembly. That's a good way to gain a thorough understanding of two different processors at once. I did it a lot like you did in this video but I put the more commonly used opcodes near the top so the emulator wouldn't have to do as many checks on average as it went through the list. I've since wondered if it would've been faster to check the opcode one bit at a time, thus guaranteeing that there are eight comparisons per opcode rather than fewer comparisons for common opcodes but over a hundred comparisons for rare ones. (Unlike with the 6502, Z80 opcodes aren't all 8-bit, but you get the point hopefully.) Or maybe there's an in-between solution that's ideal. I'm glad you have more of these videos so I can see how you do it. The eventual goal was to make a Sega Master System emulator, but I realized I was in over my head. Emulating the processor seems pretty easy compared to emulating a graphics adapter that's totally different than the machine on which you're running your emulator. Old games would often be timed to the h-sync and v-sync signals from the CRT, which don't exist on modern computers, and sometimes the program would write to the background color register just as the electron beam was at a certain horizontal position on the screen to make the background more than one color. How do you get your emulator to realize the background color was changed when the hypothetical electron beam was halfway across the screen, so the left half needs to be one color and the right half needs to be the other color? Things like that are why it's really hard to make an emulator that works with all software.
Thanks, i've got several comments on this now. I made sure to use my own "using" definition for all the types, so it's pretty trivial to change this at any point.
Did a FULL implementation back in 1986 in C to simulate machine tools controlled by a 6502. It was cheaper to test code on a PC before putting it into a tool than it was to put code on the tools and have it break something. Probably would have been easier in C++ as you could model the various components of the CPU as objects.
Those were the days... Throwing code that actually DOES something is so much more rewarding than crunching rows in a DB and spitting out a PDF report. Tony; just curious, long did it take you to do the emulation code?
@@sempertard better part of a couple of months. I was working part time and still working on my CS and EE degrees. So maybe 20 hours a week working while going to school.
ha! I wrote this program for 65816 back in 90's when I was interested in some aspects of snes workings. Really one thing I learned is all the addressing modes I didn't know about in 6502
Pretty cool. I did the same thing back in the early 90s using Borland Turbo C++ 1.0. I based it on the book 22 microcontroller project you can do at home, or something like that.
Great content and great video! 👏👏👏 Here are a few suggestions and things I noticed and would like to point out: 💡 Instead of relying on the width of primitive types for the host platform, using with its uint8_t, uint16_t and so on will make your code more elegant and platform agnostic; 💡 The stack pointer (SP) should actually be 8 bits wide. The 6502 will add 0x100 to it, thus making it able to range from 0x100 to 0x1FF; 💡 Upon reset, the 6502 will not set PC to 0xFFFC and execute from there. Actually, it will set PC to what memory location 0xFFFC points to (least significant byte first); 💡 For your FetchWord() implementation, you don't really have to worry about the endianness of the machine you're compiling your program for. That because endianness affects how numbers are laid out in memory only, and the 6502 will be little endian regardless. Numbers _per se_ and how you handle them will be the same regardless, thus (v
Thanks. The stack pointer was fixed in a later episode -> ruclips.net/video/i5JVCHSNxJY/видео.html . I don't think I ever got the reset correct, but it wouldn't really affect this implementation as I'm not actually getting a working computer (just a CPU). You are correct about the Endian thing, it was fixed here -> ruclips.net/video/i5JVCHSNxJY/видео.html
Ive coded my 6502 emulator in C, so very similar to yours. You've probably fixed this later, but just mentioning that the stack is an 8 bit register and it starts from FF downwards. The actual memory used is from 1FF to 100 (so the processor adds 100 to the register value). And remember, you will need to store the stack pointer register in the stack itself, as a single 8 bit byte. Looking forward to watching your next videos.
Great :) I remember doing it as a Msc. comp-sci student - we were emulating the Motorola 6800 using C and assembler. Deep nerdery but fun and very satisfying being able to run stuff on the finished emulator.
of course that only works if the software(usually written to the documentation) works as the actual hardware does, certainly not the case with the original 8086, the manual perfect 86 chip clones weren't perfect to the actual intel chips. not a enviable position imo, to try and insist to the hardware lot they messed up as i can fully see they would swear black and blue its the emulator that's bugged.
And now we all have a *hugely* greater appreciation of the folks that have written the C64 emulators and NES emulators and PS1 emulators that we run on our RetroPie machines! :-)
I spent four years programming a 6502. One of my last application versions would overflow the 2K EEPROM by one byte, but I could manage to shrink the last program by one byte.... by modifying a jump back so that it jumped to the previous byte, which was the second byte of a 2-byte instruction but happened to be the right opcode I needed next! This chancy patch let me deliver the application without a thorough revision of programs to find out whether I could squeeze a byte off one of them. For subsequent versions, I had to modify the hardware and two 2K EEPROMs were installed in the one EEPROM socket, one above the other, all pins but one (the strobe pin, there working as the 2K-page address bit) correspondingly soldered together.
when I was in college (1980s), the assembler class I was taking didn't include how to do output but instead had us dumping memory [to paper] and then highlighting and labeling the registers, and key memory locations. I do recall reading files at some point because, due to a bug, I corrupted my directory structure and lost access to my home dir. Thanks to a brilliant Lab Tech (he was like 14 or so and attending college), my directory was restored. I couldn't say if that was from a backup or if he fiddled with the bits to correct the directory but I'm pretty sure it was the former.
Excellent, I think it's good to know at least a little about how a computer works inside. It takes away a little of the mystery but once you realise all the things it's doing and even a CPU like this which is so old is doing operations at lightning speed. Computers are really a modern miracle.
the interrupt vectors dont actually get executed when they get triggered, but instead go to the address that they point to. So for example, when you were testing LDA ZP, you have 0xFFFC as 0xA5 and 0xFFFD as 0x42, the processor after resetting would real that and set the program counter to 0x42A5 and then begin program execution.
This is really cool. I think I'll try this myself. One idea I had that I think will make things a lot easier is to make an array of void pointers for the instructions which you can then assign to addresses of c++ functions that you write to do what each instruction needs to do. Then all you need to do is call the function at that address. Since all of the required information is going to be at the program counter, you can just have the function itself grab the data it needs, advance the program count, and decrement cycles accordingly. You could actually then stub out every instruction available to the 6502 without actually implementing them and just implement them one by one, only having to fill in the function and nothing else.
Yep, that's one way to do it but it wouldn't be much different to the switch statement in the end. You could create function pointers to the relevant addresssing mode and cycle counts and the actual instruction and then decode the instruction and lookup the correct functions to call. If i was to implement a more complex CPU (such as the 68000) then this would be the way to go as the number of addressing modes and registers goes up dramatically, which would make the switch statement approach too cumbersome.
@@DavePoo So after playing around with this a bit, I decided the best way to do this may actually be to create a control word lookup table. Then I only really need to write functions for the instructions that use the ALU. I've starting writing to table and handling for microcodes in my CPU class but I'm trying to keep the control word to 16 bits and I currently have 19 flags I'd like to have. I can't seem to find any documentation on how the 6502 handles control logic only that it apparently handles microcodes "differently than modern cpus" (Yeah the article I found was very vague about that). Any ideas on how I might be able to handle this? Personally the switch statement approach is already too cumbersome for my liking and the cycle counts already add quite a bit to that complexity so I'd like to avoid it if possible.
14:00(ish) The 6502 does no initialisation itself outside of using the vector ($fffc) to provide the code to start executing (and set the I-flag): its registers (and memory) can only be assumed to hold random values (except the I-flag which is set to prevent any IRQs) - it is up to the start code to set whatever is necessary, eg clearing memory. The first instruction of the called program should be to [re]set the stack pointer: LDX #$FF (or whatever value the system designer wants during reset), TXS.
I was just about to post this. He shows 6502 code in the rest routine to initialize the stack pointer and decimal flag, and proceeds to hard code this into the power-on reset hardware sequence instead. Just wrong.
20 years ago, the very famous device in China which called 文曲星(Wen Quxing) NC2000 series, 6502 CPU with GvBasic app, which I started the opcode from. So cool
if you ever want to understand C , low level programming , CPU architecture and how to read their specification documents , all this in one project , writing a 6502 emulation (eventually add PPU emulation ) software is just the perfect thing to do... I'm even astonished we didn't have this as a 5 months project in university ... it just reveals every flaws in your understanding of pointers and bit operations . Like , if you don't consider yourself so much a beginner and start to get a good grasp of basic concepts in C , I very highly suggest trying to emulate a NES ... first , because there's massive documentation on how to do it , there's a community that can help you , and you are going to gain a massive knowledge from it. There's two projects that made me very comfortable in C , I did a CPU nes emulation back in the days , also in C++ and SFML , and an OS kernel. trust me , the time you put in those kinds of project will massively pay , and it will be waaaaaaay better than any of the projects you get from schools and university.
08:30 You are missing the unused (expansion) flag of the 6502 in bit 5 between B and V (I'm assuming your compiler assigns the bits from the least significant bit 0 upwards). Without this bit some processor status manipulations (such as PHP, PLA, play with bits, PHA, PLP) could fail as V and N would be stored in the wrong bits.
This is the start of a 6502 emulator. Just a few points to make. If PC is a 16-bit value the wrapping around of addresses will be taken care of automatically. The handling of the reset vector at 29:32 is wrong. On reset the PC is set to the address stored at FFFC/D and execution starts at that address. The comments said that the SP wasn't incremented in the JSR implementation. It needs to be decremented. The stack works from the top down. I'm sure these things will be caught and fixed in the next video.
Brother i built this with your instructions and i didn't understand a THING. but when i ran this code in VICE, man did it feel good. im going to be doing this alone now, everyday all the time bro. thanks
I've done something like this before, just watched to see how someone else would do it. Instead of writing code for every instruction you can find a pattern in the bits of all the instructions, and make lookup table to indicate which instruction and addressing mode and flags and cycles are used. Then you don't need code for every instruction. That's how the real chip actually works I think.
There is a table of instructions here, and maybe you can see that most of them are organized in a pattern, just a few look out of place. www.masswerk.at/6502/6502_instruction_set.html
@@DavePoo I would probably argue that it is easier to see the pattern on an early 8-bit CPU like 6502 then on a 68k, Intel 8086 etc or a modern RISC V CPU. ;-) (Even though the RISC V is a orthogonal CPU design, like 6809, 68K and PDP 11. Real nice CPU to code machine code in. :-) )
thanks mr Poo. this is right in the crosshairs of what i needed to learn. it seems 6502 was everywhere back in the day. didn't realize the atari and NES were the same freaking processor
I like your opening about how knowing the 6502 is relevant to modern processors. I learned assembly on the Commodore 64, and when I moved to a PS/2 and the 80386, apart form learning segmented memory and real mode vs. protected mode, everything was about the same!
41:46 The endianess of the host is irrelevant unless you are using non-char types to read or write the "memory" (an array of char). On a bigendian system the memory byte array is still going to be in little endian form - it is just a byte stream. If you were going to use the endianess of the host cpu I would expect you to cast the memory from char to word and then read/write the word in one go.
Coding style beauty is in the eye of the beholder, but man, I dropped in as a first time viewer and can't forgive the spacing, the bracing, the TitleCasing... oh my!
Nice, I wrote my own 6502 emulator in c# in spare time in just two days. One class contained 151 methods corresponding to the mnemonics which emits code to the memory. So JSR emits three bytes and increments the PC. Then the emulator can emulate the actual 6502 code (without cycle counts). Works like a dream.
Don't call it 'wierd' either! :) It was a creature of its day. The Computer History Museum has audio histories with both Bill Mensch and Chuckle Peddle about why they did things the way they do. For example Peddle was given a 'budget' of 3000 transitors. The whole thing is remarkable for 1975 and dropped the price of a micro-controller (they didn't 'say' processor) to one tenth of Motorola/Intel. Peddle had also read work (later relied on by Stanford RISC) that suggested a collection of about 50 'useful' instructions for which you could do EVERYTHING. They were right. Fascinating video.
Good memories - one of my first CO-OP jobs after my 2nd year of comp sci was to create a Z80 emulator for an aerospace company. I seem to recall it being able to run about 1000 instructions a second .. back in the dark ages ;) You are right you really learn... I still remember the DAA instruction .. god...
@@DavePoo It was used by that company to test their software before moving it onto real hardware. I dont know if they ever sold it. Was a fun project, my supervisor just left me alone and I gave a demo every friday to the team. Wrote a users guide when I left and had an office overlooking Vancouver from Burnaby Mountain.
Wow. This took me back. Haven't used C±± in over 20 years. Very similar to c sharp but watching this video, reminded me of all the differences. Thanks for this.
@@DavePoo Man I gotta say I'm impressed. I've recently been getting involved with microcontrollers and this project seems like a lot of fun! It makes me remember why I love Computer Science :) (Although this project may be beyond my current skillset)
Some nice bare virtual metal action! It should help some of the younger people get a feeling for how a CPU basically works. (I grew up on bare metal coding (not 6502), I had a NASCOM 1 (Z80), with about 900 bytes of available RAM ... and no luxuries like BASIC, or even assembler! - lots of hand-coding resulted in me memorizing all the Z80 opcodes!) I think you are overcomplicating things by passing so many parameters with most of the function calls. It would be cleaner for CPU to keep them as internal properties (along with cycles, current opcode etc ... and anything else useful for CPU to manage its state). The memory array could be created internally, or passed during initialization. I also think it makes more sense having cycles count the cycles executed (by incrementing), not cycles to execute (and decrementing). And ... that switch() is going to get very large! ... maybe time for an array of functions if someone is going to implement all the 6502 instructions :)
@@DavePoo At 13:45 you made the memory thing and I am not used to the C++ language so I would like to know how it works because I am watching this and making my own version in C# based on yours in C++
The 6502 has three user registers (A, X, and Y); however, it also has several internal registers including the flags register, the address register, the instruction pointer, the stack pointer, and a scratchpad. The most important thing about the 6502's zero page is the zero page addressing modes; these modes basically turn the zero page into 128 16-bit pointers. The stack pointer is only 8 bits because the high byte is always hard wired $01... and the stack pointer counts down from #$FF.
The Reset (and BRK and IRQ) Vector addresses are where a Word containing the address to load the PC should be; not where the PC is set to execute from... So - the program should be loaded elsewhere in Memory e.g. 0x1000 (0xA9, 0x42...) 0xFFFC = 0x00, 0xFFFD = 0x10 (little endian) - PC is loaed from the reset vector = 0x1000 to start execute!
8:15 does that work nicely with memory management? I remember back back BACK when I was young, we used to do structs with was very careful with where we placed nibbles and bit flags, so that we kept them in chucks on 8 when definding the struct, so if we did `bit, byte, bit` it would take up 3 bytes in memory, but if we did `bit, bit, byte` it would only use 2 bytes. Does `Byte C : 1; Byte Z: 1; ` do the same niceity where it knows you are only using 1 bit of a whole byte, it can squeeze them all into the same byte?
BTW, the SP (stack pointer) should only be a Byte (8bits) not a Word (16bits)
Correct! and I got around to fixing it in#8 ruclips.net/video/i5JVCHSNxJY/видео.html
@@DavePoo Cool, keep up the great work!
O.O Pinned? Wow! I feel honored! Especially on a 2-month-old comment!
Doesn't the "stack" idiom work in reverse to what's implemented?
SP starts at top of stack memory. A "push" writes decrements SP, then writes; a "pop" reads from SP, then increments. (A byte or word at a time, appropriately)!
Maybe getting Z8/680000 & 6502 mixed up - but I thought Stacks in general were always that way round?
@@SeanPearceUK Yeah, that's what I'm familiar with. Probably fixed later (maybe in #8 with the Byte vs Word issue?)
The CPU is happily executing code and admiring the amazing world around it, when suddenly thinks to itself, "What if I'm living in a simulation?"
Meanwhile, from the mind of the CPU of the higher plane: "What if I'm a simulation?"
@Check the new Futurama
Or, "What if I am hosting a simulation"? And, "Let's find out how hospitable my guest OS really is..."
You are.
The entire universe is also on the head of a pin that's sitting on a table in the middle of an abandoned mental hospital.
OOGA BOOGA
Anyway, think about game theory and take a look at some of the systems and social ideals that you live under.
You'll begin to feel the world unravel around you and see how meaningless and pointless your life has been.
@@incognit01233weirdo
How astonishing to find this YT suggestion ! I wrote a 6502/6503 emulator in 1987 in C on a PC-XT (8086). Both clocks of 6502 and 8086 were at 4Mhz. The emulation was 400 times slower than the real processor, but it was embedded in a debugger (MS C4-like) and it was possible to set breakpoints, survey memory values, execute step by step, a.s.o... Ahh ! nostalgia...
That's awesome!
Did u use it to crack games?
@@migueld2456 No, only to help development
@@philippelepilote7946 Now if you wrote it in QuickBasic 4.5 , it would be even more impressive :lol:
Wheres your youtube channel showing us! We need to know!
To anyone thinking about coding their own...
Most processors, internally, use predictable bits of the instruction opcode to identify the addressing modes - because, the processor really needs to be able to decode opcodes fast, without having to 'think' about it! Understanding this strategic bit pattern can make writing a CPU emulator SO much easier!
It's been a long time since I coded for 6502 ASM ... but, if you were to plot each instruction in a table, you'd likely notice that the addressing modes fall into very neat predictable columns. This means that you can identify the 'instruction' and 'mode' separately, which then lets you decouple the Instruction logic from it's Addressing logic.
This 'decoupling of concerns' can really help shorten your code and reduce errors _(less code, as every Instruction-Type is "addressing agnostic" ... and less repetition, as each "Addressing logic" is only written once and is shared across all instructions)_
Just an idea for future exploration : )
Unfortunately, sometimes this bit-masking strategy isn't perfect, so you might have to handle some exceptions to the rule.
*My experiences, for what it's worth...*
Last time I emulated an 8-bit fixed-instruction-length processor... I wrote each instruction handler as a function, then mapped them into a function-pointer array of 256 entries. That way (due to ignoring mode differences) several opcodes in an instruction group all called the same basic handler function. I then did the same thing with the modes, in a separate array ... also of 256 entries.
So, every Instruction was invariably a call to : fn_Opcode[memory[PC]] ... using the mode handler : fn_Mode[memory[PC]]
That got rid of any conditionals or longwinded case statements... just one neat line of code, that always called the appropriate Opcode/Mode combination... because the two tables encoded all the combinations.
Hope that makes sense ; )
Obviously, to ensure that this lookup always worked - I first initialised all entries of those tables to point at the 'Bad_Opcode' or 'Bad_Mode' handler, rather than starting life as NULLPTRs. This was useful for debugging ... and for spotting "undocumented" opcodes ; )
It also meant I knew I could ALWAYS call the function pointers ... I didn't have to check they were valid first ; ) It also meant that unimplemented opcodes were self-identifying and didn't crash the emu ; ) As I coded each new Instruction or Mode, I'd just fill out the appropriate entries in the lookup arrays.
But the real beauty of this approach was brevity!
If my Operation logic was wrong, I only had to change it in one place... and if my Addressing Mode code was wrong, I only had to change it in one place. A lot less typing and debugging... and a lot less chance for errors to creep in.
Not a criticism though... far from it!
I just thought I'd present just one more approach - from the millions of perfectly valid ways to code a virtual CPU : )
Understanding how the CPU, internally, separates 'Operation' from 'Addressing' quickly and seamlessly... is damned useful, and can help us emulate the instruction set more efficiently : ) But, ultimately, you might have to also handle various "ugly hacks" the CPU manufacturer used to cram more instructions into the gaps.
By using two simple lookup tables, one for Operation and another for Mode ... you can encode all of this OpCode weirdness in a simple efficient way... and avoid writing the mother of all crazy Switch statements XD
I do agree with you. But I would say that even with my huge switch statement method, i do only write the address mode functions once and then reuse them. Secondly, it's still possible to screw up the lookup table method just as easily as the giant switch statement method (as you still have to fill the tables correctly) , so you would still have to do the unit testing for each instruction that i'm doing to really be sure. I would say the giant switch statement method i have here, is only really good for a small processors like this (150 instructions), it's very easy to read and easy to reason about. If i tried to do this for anything more complex like the Motorola 68000 then i would no way attempt it this way, i would certainly be using the method you are describing above. If you make each opcode into a switch case on the 68000 i'm pretty sure you would have multiple 1000's of switches.
@@DavePoo Oh, absolutely : )
But isn't that the wonderful thing about this project. There's a lot of ways you can go... all with their own little tradeoffs. I think the important thing is that people have a go - and don't be afraid to stray from the path and see where it leads : ))))
I love that there are channels like yours, encouraging people to tackle things like this. 8-bit particularly tickles me, because it's where I got my start in game dev back in the mid-late 80's. Happy times : )
Ahhh! I just found my old code! The Generic 8-bit fixed-length CPU frame... using the opcode call-table so that you could load up CPU personalities as plugins.
The plan at the time was to emulate all the 8-bit fixed-length families and their variants... but I guess life got in the way. I had a couple of almost identical z80 variants, an i8008 and i8080 and a classic (non-C) 6502 ... and, at some point I'd added a simple 8-instruction BF (brainf**k) machine from esolang.
I'd completely forgotten writing most of this : )))) I love exploring old drives, some of it makes me cringe : )
@@garychap8384 At least your old drives work (or exist). I went back to my Amiga 600 to see if the first machine code game i ever attempted was on there, but the hard disk was missing, i think i must have sold the drive or the Amiga in the past and completely forgot. It turns out 30 years is a long time.
@@DavePoo Oh, that's such a shame : (
It's so sad to think of all the things we lose along the way. Not just the code, the files and the hardware... but the childlike wonder when we got our first 8-bit, or the thrill of making a modem connection and manipulating some machine at a distance. Even just groking peripheral ICs for the first time... poking at some addresses or registers and making things happen (or not)
Bah! Growing up sucks : ) Still, I'm so glad I got to do it in the 70s/80s when computers were still a wild frontier and understanding ASM, and the bits on the bus, was the only way to get anything done in a reasonable time. Heroic days :D
Now we all walk around with supercomputers in our pockets, and never give it a moments thought : /
There's that quote about advanced technology being indistinguishable from magic...
... the unfortunate corollary of it is that the more ubiquitous advanced technology becomes - the less 'magic' there is in the world.
Thanks for making your videos... stuff like this is slowly becoming a new field ... digital archaeology : )
Heh, I guess that makes me a relic XD
Anyway, thanks for doing what you do.
are you kidding me? that's a "holy grail" over all the RUclips for the people who studying a CS. Dam, u r an amazing person!
I need a programmer with a heart of gold to get game logic to execute after injected passes. Eventually i wont.
This is a comment you would expect from 1y student.
Why only people who study cs? lol, self-taught here and I already know this stuff, but still very entertaining.
Except that the whole switch statement is NOT THE WAY TO DO IT!
You would create an array with function pointers to each opcode. Or even a map
@Brad Allen but it is nicely unit testable in a function. And INC is simple but it’s not just that you need to advance the PC the right amount of bytes, operate the flags. So a function that has an appropriate UnitTest is the more robust way to do it.
And there’s another good benefit of doing this with dedicated functions. That’s that you have the logica detached from your interpreter. And can reuse it in other contexts. And in case of OO quickly override methods to facilitate a minimally different CPU. Like the 8080 and Z80.
But indeed each their own, but I know that most companies, I’ve worked for, wouldn’t accept this in their code reviews :)
The 6502 is one of the best processors to learn on. Nice and simple and covers most of the concepts.
It is simple, considering that it's actually a complex instruction set processor.
@@DavePooThe reason 6502 is not RISC, is because it’s not a load/store architecture and has many addressing modes. Some RISC ISA’s have complex instructions, but are still considered RISC because they have load/store architecture and few addressing modes.
@@DavePoobut still a lot simpler than the contemporary z80. DJNZ.. shadow registers.. funky stuff. Possibly more powerful for the assembly programmer, though.
I learned everything I ever needed to know about computer science on my Atari 800.
my thesis has to do with writing a 8085 emulator and i find your videos really useful! you earned my subscription! keep it up :)
Thanks, i've never written a CPU emulator before so this is me going through the process. There are probably many different ways to do it.
@@DavePoo that's pretty obvious! but your approach is really user friendly and easy to understand, so probably i'll stick to it for now.
My first CPU emulator in C was for a configurable VLIW-CPU back in the mid/late 80s ;) and that was not considered to be enough for my thesis .... Where do you study? ;)
This is the kind of shit that keeps me up at 4AM. Thank you!
lol, I'm exactly seeing this at 4AM
3:42 am here lol
4:51 here xD
3:52 PM, had a nice nap watching this :D
3:20AM here so its not too late it seems! :D
It's very rare that I comment on the RUclips video, but this was an amazing find! Very clear and concise explaination on how to get started programming your own CPU emulator. Thank you for putting this together!
1 hour video. I can say I finally learned how computer works. thank you so much
I was 14 when I tought myself to program on a C64. It took me 1 week to figure out that Basic was crap. So I basically learned programming using 6502 Assembler. Today I am a computer scientist, still having the C64 ROM listing from 1984 in my bookshelf. I learned so much from it.
I still remember... poke 53280,0
@@twizz223 you mean
LDA 0x00
STA $D020
and zapp.... back border
BASIC is slow, so it's handy to know some commands to speed up a few things.
There was a game for the TRS-80 CoCo which I ported across to GW-BASIC and I sped up one part of the game from taking roughly 30 seconds to taking almost no time at all, simply by choosing to use BSAVE and BLOAD instead of using serial file output and input as the original version of the game did.
For anyone who wants more details: The game used an integer array to store the "levels" of the game, with the numbers representing what was in each location, so I just used BSAVE along with VARPTR to find the start of the numeric array along with calculating the size of the array based on its dimensions and the size of integer variables (not forgetting to add one to each dimension to account for array location 0). Once saved, the data could be reloaded into the array by using BLOAD along with VARPTR to find the start of the array. The size of the array was always the same, so there was no need to limit how much data was loaded.
In the early 80s I wrote several little programs for the 6502 in assembly language, just for fun, on my Apple II. It was always amazing how much faster this was than the same program in Basic language. The 6502 was really a simple design and easy to understand.
Yeah, i agree a very well designed and very successful processor. A marvel of the modern age.
yeah, me too on a C64.
I did that on an ATARI 800 XL.
While the C64 hat a 65C02 processor, the Atari had a 6502C (the C was a speed grading and denoted that the CPU could handle more than 2 MHz. Yet the Atari ran at 1.79 MhZ, almost twice the speed of the C64).
@@enantiodromia My C64 had a 6510 CPU. Maybe they changed it in later models?
@@toby9999 I don't know really, it's the first time I heard of this processor being used in a C64. What year was the computer built?
You use the term "clock cycle" for what is actually a machine cycle. Early processors such as the 6502 required multiple clock cycles to execute one machine cycle. The 68HC11 for example needed 4 clock cycles for each machine cycle.
what are you considering a "machine cycle"? i don't know much about the 68hc11, but one of the things that made the 6502 so awesome was one machine cycle (aka doing a thing, whether that thing be a memory fetch, instruction decode, alu operation, whatever) happened in one clock cycle- there was even a tiny bit of pipelining, though i'm fuzzy on the details- i think a memory fetch would happen in parallel with the instruction decode, so if an operand was needed it was already there by the time the instruction was ready for it. so a 2mhz 6502 was actually pretty close to a 8mhz 68k (at least in terms of fetching and executing instructions, ignoring differences in complexity of those instructions...)
@@MrFukyutube a clock cycle is the actual oscillator frequency, or crystal frequency, while a machine cycle is the internal cycle count which depending on the processor is between 1 clock cycle per machine cycle and I think the worst I saw was around 7. Intel Architectures (8080/Z80/8051 etc.) used to have higher counts where as Motorola and others including 6502 used to use multiple edges of the clock and so appear to be much faster on paper (in a instructions per clock cycle way) but ultimately those devices always had lower maximum clock (crystal) frequencies so ultimately difference was much lower. This link has a reasonable description: en.wikipedia.org/wiki/Cycles_per_instruction
@@MrFukyutube To my knowledge, this is not true. The CPU needed multiple clock cycles per instruction for instruction loading, decoding etc
@@petermuller608 Except RISC (Reduced Intruction Set Cpu) processors like the 6502 where 1 clock cyle is the only unit
On the 6502 if you go from an instruction which does 8 bit (zp) addressing to 16 bit , it needs one additional cycle.
I dunno what the crystal has to with this. Typically, a TV crystal had 15 MHz. Bipolar JT reduce it before feeding the 6502 pin.
If you use uint8_t and uint16_t for your CPU types (in ) you make your code basically platform agnostic, now you depend on 16 bit shorts.
Thanks, quite a few people have suggested that.
all hail stdint
. Best friend of low-level programmers and people who want to *know* for sure how wide their data types are.
@@benjaminmelikant3460 When i program a microcontroller, over 98%, often 100%, of integers i use are uint_t
It’s been my opinion for a while that the designers of “C” made a mistake when they didn’t define the sizes of the int types. I mean what good is an “int” if you don’t even know if it can hold a value of 75000?
Been looking for a series like this for years, bloody brilliant work mate! Keep it up!
Thanks. I said in this video i wasn't going to write the whole thing. But i realised nobody had done the whole thing on video before, and i thought it would be good for people to see how much work it could be to get something like this working.
Thanks Dave. Im a fresher just started working as a softare developer . This is a very interesting side-project. Thanks for sharing it.
Is that link in the description working for you?
@@kartikanand5374 nope
Just found you channel, awesome stuff my guy. You earned a subscriber
I have completed this video and I have learned many things. Much support!!
The instructions are simple to understand as well!!
not only that the SP is only a Byte it also "grows" from top to bottom. so you need to decrement the SP when putting things on the stack and increment when pulling data. also the Reset vector is an indirect jump, so you don't start executing at 0xfffc but you take the address that is stored in 0xfffc/0xfffd and start executing at this address. it is the same for the other vectors as well.
I handled the SP as a byte in a later episode, as well as the direction the stack shrinks. I never really handled the reset vector properly. I think i would get round to that once i started emulating a whole computer system.
@@DavePoo As I saw the video it was not clear to me that this is a series. I've done a lot with the 65(C)02 in the past. I built my own computer from scratch called MOUSE and even used a 6502 emulator on Arduino boards to create an emulated version of my computer (MOUSE2Go). I like the "start simple and evolve" approach, because if our overplan it you might never start due to the complexity. But starting simple gets you into getting simple things work and than improve. SO now I'm curious how this ends :-).
I actually wrote a 6502 emulator in C on my Atari-ST (68000 CPU) in 1987. I was quite proud of it. It used a kind of jump table for the actual instructions. I made an array of pointers to functions, and used the content of the instruction register as an offset into this array to call each Op-code function. For example at A9 was a pointer to the LDA immediate mode function. I started off writing a cross-assembler, and then wanted to test the resulting machine code and so wrote the emulator for it. Amazingly, after all these years I still have the source code!
Care to share it with us? I would like to take a look at it
@@NOBODYUSEFULL Well, 34 years on, I expect there are quite a few embarrassing things about it, and remember, a state-of-the art 68000 Lattice-C from 1987 is going to have issues. But here goes...
@@NOBODYUSEFULL
/* a 6502 simulator and assembler in Lattice C M.Stent Dec. 1987 */
long *jt[256]; /* jump table */
unsigned char memory[0x10000];
/* cpu registers */
unsigned char a; /* accumulator */
unsigned char x; /* index reg x */
unsigned char y; /* index reg y */
unsigned short pc; /* program counter */
unsigned char sp; /* stack pointer */
unsigned char p; /* status reg */
unsigned char ir; /* instruction register */
unsigned short fpaddr; /* front panel address reg. */
unsigned char fpdata; /* front panel data reg. */
unsigned short ruaddr; /* run stop-on-address */
unsigned char ruinst; /* run stop-on-instruction */
int ruclk; /* run stop-on-clock */
/* definitions for status reg. p */
#define CMASK 1
#define ZMASK 2
#define IMASK 4
#define DMASK 8
#define BMASK 16
#define VMASK 64
#define SMASK 128
/* inverse masks */
#define NCMASK 0xfe
#define NZMASK 0xfd
#define NIMASK 0xfb
#define NDMASK 0xf3
#define NBMASK 0xef
#define NVMASK 0xbf
#define NSMASK 0X3f
long time; /* cpu clock */
int clock; /* display clock */
/* here I leave out a lot of stuff connected to the display on an Atari ST but here is the core of the matter...*/
void
execute(func)
void (*func)();
{
(*func)();
}
void
init_jump_table()
{
void adc(),and(),asl(),bcc(),bcs(),beq(),bit(),bmi(),bne(),bpl();
void brk(),bvc(),bvs(),clc(),cld(),cli(),clv(),cmp(),cpx(),cpy();
void dec(),dex(),dey(),eor(),inc(),inx(),iny(),jmp(),jsr(),lda();
void ldx(),ldy(),lsr(),nop(),ora(),pha(),php(),pla(),plp(),rol();
void ror(),rti(),rts(),sbc(),sec(),sed(),sei(),sta(),stx(),sty();
void tax(),tay(),tsx(),txa(),txs(),tya(),xxx();
/* 65c02 */
void bra(),phx(),phy(),plx(),ply(),stz(),trb(),tsb(),bbr(),bbs(),rmb(),smb();
register int i;
for(i=0;i
thank you :D
@@martinstent5339 thanks for sharing, you should put this in a github repository, not in youtube comment
You might wanna take a look at the header. It defines portable integer types of fixed width.
Definitely. I was surprised to see "unsigned short" and whatnot, but I guess it's probably an "old habits" sort of thing.
Yeah, working at this level with “using Word = unsigned short” is gross
This is an easy and fun way to get a handle on how microprocessors work. There were no books on the Motorola 6800 except for the one intended for computer specialists when I started. I must have read that book twenty times before I had a clue what it was talking about. No hobby machines existed, and I was a Mechanical Engineer with a final year Degree project to control a machine using a D2 Evaluation Kit.
To say it was a struggle would be a hell of an understatement. With no assembler, the opcodes had to be looked up and entered using a Hex keypad using a debug monitor program. A hard way to learn, but something you never forget. You guys have it so easy!
Very interesting ... busy making my way though all the emulator videos, very nice indeed.
Well done and well commented as well.
Thanks, i think i said in this first video somewhere that i wasn't going to write the whole CPU emulator, but in the end i went through and started doing the whole thing. I think one of the main purposes is to show that when you are writing a program, what you end up with is not always what you started with. The emulator code evolves and changes as the videos progress.
This video is my kinda ASMR. Thanks for this great resource.
I did a lot of assembly programming on this architecture. Since then I've designed many more complicated processors, and each of them is first done by creating a program to emulate it's function on a clock by clock basis to turn the design into real logic. What you've done here is a behavioral simulator but cool to see it done on the fly
My last project in school was a 32bit cpu on FPGA, and later I wrote an emulator for it in C. Very very fun project. Great video thanks!
I would hit the 'like' button a thousand times if I could.
Just make sure you press it an odd number of times
@@DavePoo nice
Do it
create a python script that creates a thousand channels, and then like from each channel
I love the 6502 instruction set. Make you very handy at packing code into small pages for efficiency.
weird
65C02 got branch always instruction. All microprocessor jump relative for small instructions despite large total memory. Fast page memory came 1987 with the 386.
Also: is cycle time part of the ISA?
The instructions set lacks all the goodies from 6900 . 16 bit stack. B register. I want “do for A and then repeat for B” versions of LDA ADC STA . And TAS and TSA.
ADC AB, imm8 sign extended ( like branch).
The Z80 with some of its more complex instructions allows for more compact code.
@@cigmorfil4101true, but the 6502 was ideal for writing interpreters, compilers, and disassemblers easily, as was my very first computer, the LINC.
Love this. When I was in University in the 1980's, we had to write a microcode engine to implement instructions for the 6809 and get a simple program to execute on it. We had to write the microcode for each instruction. We were given the microcode instructions for the RTL Register transfer language. You could create a microcode engine that could then run any instruction set on top of it! Set the microcode engine up as a state machine to make life a bit easier. At the time we were actually using an IBM/370 and the VM operating system so we each had our own virtual machine. but the microcode engine had to be writeent in 370/assembler and boot as a virtual machine on the mainframe! These days the average PC is capable of this with relative ease!
Great story. Yeah, not only an average PC is capable of this, but even a below average smart phone could emulate this now. I think it's amazing that we now all walk around with super-computers in our pockets and totally take it for granted.
@@DavePoo The best part was we never realize that this super special virtualization technology would become so prevalent back then. It was just what we had to use to get the assignment done. We never sat back and thought about just how much power we had or what would happen to it!
29:00
$fffc is a vector, so if those bytes are loaded there the 6502 will load the PC with $42a9 and try to execute that memory, which contains $00 (BRK) at the moment.
Yep, i never got round to handling the reset vector correctly.
Really interesting! Thanks for the video. I see people have mentioned about std::uint8_t and std::uint16_t, but in C++17 onwards there is also std::byte in the cstdef header which you can use. It also can be passed to a non-member function std::to_integer(std::byte b) for a more numeric output if you're debugging the byte values.
Interesting..I wonder if it fixes the longstanding issue in C++ where std::cout
Thank you for making this tutorial! I could not find a single resource on emulation on google but thankfully, this video came in my recommended :)
Surprisingly it worked despite a bug in FetchWord with cycles++ when it should be cycles- -
Also you should implement the instruction vs mode table to simplify it dramatically. By masking on the opcode bits you can then use a switch statement for the addressing mode. It would reduce the combinations to 23 instruction switch statement and 8 addressing functions.
Btw the pc++ wrapping to 0x0000 is legal so as long as mem is mem[16k] it’s fine.
I hope this isn’t taken as armchairing. The video was fun to see.
Yeah, i was confused at it working as well, but that shows why thorough testing is required of even the simplest of programs.
I did something similar to learn about the 6502, specifically the 65C02.
but i didn't write an emulator, i built the whole CPU in a Logic Simulator.
the end result is the same, you get a better understanding of the hardware. and it was quite fun.
Pretty cool, good work.
@@DavePoo thanks. something a bit more direct to the video:
around 5:13 why did you define a byte and a word instead of just using uint8_t and uint16_t? the "_t" types are made to be universal across all C/C++ compilers and architectures.
also, the endianness of the platform shouldn't matter if you just have 2 temporary 8 bit variables instead of a single 16 bit one.
and i assume in later videos you fixed the thing where the CPU starts executing from 0xFFFC? because that's not where the PC starts at, but rather at 0xFFFC and 0xFFFD is the address that gets loaded into PC before the CPU starts executing. it's like a hardwired jump indirect.
either way your video made we want to try this for myself as well, but i'll try it in C instead of C++.
On a similar theme, back in the 1980's I wrote an assembler/disassembler pair for the Z80 microprocessor that ran on a Pyramid minicomputer. I used it to work out the full functionality of UHF radio scanner that had a Z80 and associated IO chips as it's central control. I dumped the radio's 16k byte EPROM into a file containing a long string of HEX pairs, disassembled it and printed the result. Then spent a few days looking at the printout and filling it with comments. Made my modifications, also adding all the comments to the disassembled program and used my Assembler to create a new HEX file ready for EPROM programming. Started up the radio and all my mods were working as planned. They were fun days. I doubt I could do what I did back then with today's systems.
It’s amazing. Congratulations for a very interesting job!👏👏👏
I have tried to do similar thing for years, not with UHF radio but with another kind of ROM. Anyway I couldn’t to do. My level of knowledge is low and I think I am a bit lazy . . . he he he
Some embedded devices leave debug ports open that can be exploited to read/write data from the system, but it's certainly a lot harder than pulling out the EPROM and dumping the code out. Now you have to get lucky to even be able to see the code without very specialized tools. Low Level Learning has a video where he reads/writes data onto a baby monitor using an Arduino on its debug port, then used that to run arbitrary C code.
Takes me back to my youth where I used to dabble in M680x0 assembler. Not only was M68k assembler fun to work with but it was a blast to cycle (instruction order, instruction type, addressing modes) and instruction pipeline optimize (mainly try and prevent pipeline flushes that would eat cycles due to having to load a stream of new instructions from memory) the code in order to make it as fast as possible. With the advent of caches, branch/jump prediction, vector instructions etc. things have gotten quite more complicated of course. I wouldn't bother to hand optimize assembler code nowadays and let the compiler do it instead. Never the less, I'm still of the opinion that getting to know how a processor works on such a low level is still very valuable for any programmer and can not only help in debugging but also improve the understanding of high level languages and how they translate into assembler.
I tried to hand-optimize assembly code on a risc PPC601 (after learning on a 6502 and then a 68020). It was very complicated, and I am sure I didn't handle all the interdependencies correctly, but trying to achieve this teaches a lot. So trying to do it a couple times is quite worth it, I think.
I am now, after a hiatus of 15+ years, playing with assembly on the ARM Cortex-A (in my Samsung tablet), and while the risc approach is familiar, the complexity of the processor has become astounding. The manuals covering a high-level view of the processor alone are hundreds of pages.
5:20 There is header file which defines precise types like uint8_t, uint16_t instead of things like "unsigned short".
Thanks, quite a few people have suggested that.
Yeah, some industrial coding standards actually require using it. If you make it a habit, your code will always be portable between different architectures - at least concerning POD type sizes.
@@minastaros ... and if you use it where it's not needed (just because a Miserable standard says so), you just make the code less efficient.
@@272zub Ah yeah? Proof that! This mechanism is to aid in platform independent programming, because it is in fact NOT standardized and machine dependant what Dave uses. These storage modifier keywords are platform dependant. What you call "Miserable" (uint_8t, etc.) does in fact translate to the same instructions on Dave's machine. So your efficiency claim is a fallacy. It is just good style to use them. Especially when emulating foreign hardware.
For example, look at the code from the professionals at github.com/stuartcarnie/vice-emu/blob/master/vice/src/arch/XXX/types.h. They have to define for EVERY system what to use. That was a design decision from the start and that project is very mature. In contrary look at the very new github.com/commanderx16/x16-emulator project. They use proper platform-independent code and save a huge amount of code.
@272zub you can't generalize it this way. If this facility is there, why don't use it? What you say is an edge case and is only true in special cases. In addition, what you tell affects the code running on the (compiler-)TARGET. But here the function of this facility is a data-type related to the emulated machine (on the HOST). You are wrong on several levels.
See stackoverflow.com/questions/6144682/should-i-use-cstdint ... I think that is what is related to your thoughts and what doesn't apply here.
minasteros: #include ... Its C++ :) en.cppreference.com/w/cpp/header/cstdint
@@dieSpinnt Hold your horses. :) I think you missed the "where it's not needed" part in my reply. If you need a fixed-size integer, e.g. a 16-bit unsigned integer, then by all means do use cstdint (or stdint.h when in C). It's so much more better than either using an unsigned short because it happens to be 16 bits on your platform, or than making your own half-baked stdint. Clearly when writing an emulator, like it's done in this video, you will often needs such fixed-size types. I am not disputing that at all.
In reality the types from cstdint are just the correct typedefs to some of the "normal" integer types, e.g. on some platforms it is that uint16_t is a typedef of unsigned short. So using the uintNN_t type of course is exactly the same as using the correct "normal" type.
What I didn't like was @minastaros' suggestion to use the fixed-size types everywhere. In the extreme this means don't use an int at all, always use (u)int_NN. And that is where my "less efficient" comment applies: godbolt.org/z/7obs5s - if you use uint16_t when you don't explicitly need it, and an int would have been a good choice - you can see that the 16bit version is actually more complex than the 32 bit one. And that the normal int version is the same as the 32bit one.
And by "Miserable" standard, I didn't mean the C++ standard. I meant en.wikipedia.org/wiki/MISRA_C and especially it's C++ evil cousin, which, to me, is how C programmers, who don't know C++, get their revenge on C++ programmers...
By the way there are also the types (u)int_fastNN_t and (u)int_leastNN_t which could offer the best from the both worlds: Guaranteed minimal size while still being as efficient as possible. As their size is not guaranteed, that can't be used when a specific memory layout is needed though.
Very interesting. Earlier this year I was wanting to expand my knowledge of Java and went through a similar exercise. I had a Heathkit ET-3400A microcomputer a long time ago, and I wrote a functional ET-3400A emulator that runs the ET-3400A ROM in an emulation of a Motorola 6800.
wow
This video is amazing. One of the first people who dont explain first then quickly blurp out some code. You explain while writing. That is amazing.
I didn't realise i was doing that.
At 5 minutes into the video, it is stated "So this is where you have to know exactly the size of a certain type on your platform or compiler". Doing this creates platform-specific code, which only works on platforms with the same type sizes. Instead, it is better to use the platform-independent types that are declared within . Specifically, the followings lines in the author's code:
using Byte = unsigned char;
using Word = unsigned short;
should be something like:
typedef unit8_t byte_t;
typedef uint16_t word_t;
It's debatable whether a using or typedef statement should be used, but the key thing is the use of uint8_t and uint16_t from .
Yep, i could have used those, but i was careful to use my aliases everywhere so it's trivial to fix them later. I prefer Word & Byte to everything having an _t on the end. Not sure why they did that. Would it have been so bad to call them uint8 and uint16 instead?
There is no difference in typedef and using in this case other than the syntax, but i prefer "using" to typedef.
unsigned char is guaranteed to be 1 byte anyway by the spec.
Best 50 minutes I've spent all day.
6502 was the first chip I ever programmed. I had the most fun with it. Good memories...
I think it probably was for me too, but only via BASIC, i think it was a BBC Micro from school where i wrote the classic "i was here" then made it loop. The first chip i programmed in machine code was actually the 68000 (the Amiga)
Cool. For me it was the 8088, then Z80, then 68000 then atmel avr
I learned 6502 assembler by disassembling space invaders on the commodore pet, after writing a disassembler! I was 15 at the time.
I liked the Kim-1 SBC and there was a nice academic trainig board with a 6809 on it that was great fun too.
@@PWingert1966 oh, the 6809 was a REALLY nice cpu to code in assembler. Really good support for higher languages too. Used to code on a 6809 system with MMU so it had 512 KB ram, and run a multi task OS called OS9. We run 8 concurrent users on each system, we got two. Could dynamically load and unload drivers, way advanced system in the mid 1980:th. :-)
I think that only nicer CPU I have worked with in this low level was PDP 11 which had a really nice orthogonal and symmetrical instruction set, much like 6809.
Back in the 80's I purchased a computer kit from a company named "Southwestern Technical Products", out of California. It was the first and only computer I ever built. Had to solder every component (capacitors, diod's , resister's, and even the ram chips, a whole 4k worth. It took about a month to get it all done.
I never built another computer since.
Memory-mapped I/O is still very much in use. A large amount of memory address space on a modern PC is used, e.g., by your video card, which is why 32-bit windows would only have ~3 GB available to applications on a system with 4 GB of RAM installed.
I suppose it is, however nowadays its all virtual memory. You don't even know where in RAM you actually writing to.
Virtual memory is just at the user program level. The OS kernel still has access to the physical address space (IIRC by mapping that to the _kernel's_ virtual address space) and it manages assigning the I/O memory to device drivers.
@@DavePoo I guess u cant do DMA through virtual paged memory... they got to be physical pages and they always locked pages.
4 GB limit was for 32 bit processor... its much higher now. Anyway it was lower 2GB of memory (addresses 0x00000000 through 0x7FFFFFFF) for application programs in windows rest above 2GB (addresses 0x80000000 through 0xFFFFFFFF) was system space normally.. where kernel or all I/O ports or DMA memory resides. But u can specify a boot time option in windows so that lower 3GB is for applications and just upper 1 GB for system.
Now from processor point of view 32 bit processor supports 4 GB of memory normally but with the help of virtual memory mechanisum (paging) and use of PAE or PSE flags it can address 64GB of memory. Windows servers might be supporting that mode I guess. Basically last four digits are assumed 0 ...so all in all 36 bits instead of 32 bit.
Good work! I wrote an x86 emulator in javascript, learnt so much in the process!
Seriously impressive video and very informative. And someone who actually does know C++.I've been around software dev alooong time and when asked 'do you know any C++ devs' I always reply 'I know quite a few who *claim* to be c++ devs'. The rarest of beasts I think.
I would argue that there is no person on the planet anymore that can truthfully say "I know c++", considering the language isn't even _designed_ by one person anymore. Even making the question more constrained, eg "do you know the _syntax_ of c++?", even then, the answer will always be "no".
Besides, what is the point of asking such questions when there is no reference, 100% compliant, verified, compiler?
(there exists exactly one verified _c_ compiler that supports _almost_ all of c11)
Long ago, I wrote a Z80 emulator in x86 assembly. That's a good way to gain a thorough understanding of two different processors at once. I did it a lot like you did in this video but I put the more commonly used opcodes near the top so the emulator wouldn't have to do as many checks on average as it went through the list. I've since wondered if it would've been faster to check the opcode one bit at a time, thus guaranteeing that there are eight comparisons per opcode rather than fewer comparisons for common opcodes but over a hundred comparisons for rare ones. (Unlike with the 6502, Z80 opcodes aren't all 8-bit, but you get the point hopefully.) Or maybe there's an in-between solution that's ideal. I'm glad you have more of these videos so I can see how you do it.
The eventual goal was to make a Sega Master System emulator, but I realized I was in over my head. Emulating the processor seems pretty easy compared to emulating a graphics adapter that's totally different than the machine on which you're running your emulator. Old games would often be timed to the h-sync and v-sync signals from the CRT, which don't exist on modern computers, and sometimes the program would write to the background color register just as the electron beam was at a certain horizontal position on the screen to make the background more than one color. How do you get your emulator to realize the background color was changed when the hypothetical electron beam was halfway across the screen, so the left half needs to be one color and the right half needs to be the other color? Things like that are why it's really hard to make an emulator that works with all software.
cstdint has fixed-width types that are supposed to typedef to the implementation-defined awkward-width types
Thanks, i've got several comments on this now. I made sure to use my own "using" definition for all the types, so it's pretty trivial to change this at any point.
6:21 isn't the stack pointer only an 8 Bit register?
Did a FULL implementation back in 1986 in C to simulate machine tools controlled by a 6502. It was cheaper to test code on a PC before putting it into a tool than it was to put code on the tools and have it break something.
Probably would have been easier in C++ as you could model the various components of the CPU as objects.
That’s awesome godamn!!
Those were the days... Throwing code that actually DOES something is so much more rewarding than crunching rows in a DB and spitting out a PDF report. Tony; just curious, long did it take you to do the emulation code?
@@sempertard better part of a couple of months. I was working part time and still working on my CS and EE degrees. So maybe 20 hours a week working while going to school.
liar.
@@CleoKawisha-sy5xt Who do you think is lying, and why? Everything above sounds reasonable to me...
ha! I wrote this program for 65816 back in 90's when I was interested in some aspects of snes workings. Really one thing I learned is all the addressing modes I didn't know about in 6502
Pretty cool. I did the same thing back in the early 90s using Borland Turbo C++ 1.0. I based it on the book 22 microcontroller project you can do at home, or something like that.
Good old Borland :) Do they still exist?
Your awesome man, keep your content coming!
Great content and great video! 👏👏👏
Here are a few suggestions and things I noticed and would like to point out:
💡 Instead of relying on the width of primitive types for the host platform, using with its uint8_t, uint16_t and so on will make your code more elegant and platform agnostic;
💡 The stack pointer (SP) should actually be 8 bits wide. The 6502 will add 0x100 to it, thus making it able to range from 0x100 to 0x1FF;
💡 Upon reset, the 6502 will not set PC to 0xFFFC and execute from there. Actually, it will set PC to what memory location 0xFFFC points to (least significant byte first);
💡 For your FetchWord() implementation, you don't really have to worry about the endianness of the machine you're compiling your program for. That because endianness affects how numbers are laid out in memory only, and the 6502 will be little endian regardless. Numbers _per se_ and how you handle them will be the same regardless, thus (v
Thanks. The stack pointer was fixed in a later episode -> ruclips.net/video/i5JVCHSNxJY/видео.html . I don't think I ever got the reset correct, but it wouldn't really affect this implementation as I'm not actually getting a working computer (just a CPU). You are correct about the Endian thing, it was fixed here -> ruclips.net/video/i5JVCHSNxJY/видео.html
Thank you so much, Dave! It's exactly what I have been looking for!
Ive coded my 6502 emulator in C, so very similar to yours. You've probably fixed this later, but just mentioning that the stack is an 8 bit register and it starts from FF downwards. The actual memory used is from 1FF to 100 (so the processor adds 100 to the register value). And remember, you will need to store the stack pointer register in the stack itself, as a single 8 bit byte. Looking forward to watching your next videos.
Thanks. You are right about the stack pointer and I got around to fixing it in#8 ruclips.net/video/i5JVCHSNxJY/видео.html
Great :) I remember doing it as a Msc. comp-sci student - we were emulating the Motorola 6800 using C and assembler. Deep nerdery but fun and very satisfying being able to run stuff on the finished emulator.
The first job my son had at Intel (15 or so Yrs ago) was debugging cpu design by emulating hardware with software
did with amd haha
of course that only works if the software(usually written to the documentation) works as the actual hardware does, certainly not the case with the original 8086, the manual perfect 86 chip clones weren't perfect to the actual intel chips. not a enviable position imo, to try and insist to the hardware lot they messed up as i can fully see they would swear black and blue its the emulator that's bugged.
@@mystsnake indeed. only to discover that they made a mistake in the specifications
This is really cool, I've always wondered how to write something like this. Now I know where to start! Thanks :)
And now we all have a *hugely* greater appreciation of the folks that have written the C64 emulators and NES emulators and PS1 emulators that we run on our RetroPie machines! :-)
Yeah, those emulators are usually the collective work of quite a few poeple
I spent four years programming a 6502. One of my last application versions would overflow the 2K EEPROM by one byte, but I could manage to shrink the last program by one byte.... by modifying a jump back so that it jumped to the previous byte, which was the second byte of a 2-byte instruction but happened to be the right opcode I needed next! This chancy patch let me deliver the application without a thorough revision of programs to find out whether I could squeeze a byte off one of them. For subsequent versions, I had to modify the hardware and two 2K EEPROMs were installed in the one EEPROM socket, one above the other, all pins but one (the strobe pin, there working as the 2K-page address bit) correspondingly soldered together.
Those crazy days when you were just trying to save a single byte! Nowadays, i could leak a gigabyte of memory and probably not notice.
when I was in college (1980s), the assembler class I was taking didn't include how to do output but instead had us dumping memory [to paper] and then highlighting and labeling the registers, and key memory locations. I do recall reading files at some point because, due to a bug, I corrupted my directory structure and lost access to my home dir. Thanks to a brilliant Lab Tech (he was like 14 or so and attending college), my directory was restored. I couldn't say if that was from a backup or if he fiddled with the bits to correct the directory but I'm pretty sure it was the former.
whats your story have to do with anything
Great, didn't think I would understand anything of what you said, but actually everything makes sense. Great video thank you a lot!
Excellent, I think it's good to know at least a little about how a computer works inside. It takes away a little of the mystery but once you realise all the things it's doing and even a CPU like this which is so old is doing operations at lightning speed. Computers are really a modern miracle.
the interrupt vectors dont actually get executed when they get triggered, but instead go to the address that they point to. So for example, when you were testing LDA ZP, you have 0xFFFC as 0xA5 and 0xFFFD as 0x42, the processor after resetting would real that and set the program counter to 0x42A5 and then begin program execution.
Correct. There was too many major mistakes like this, that I lost interest.
This is really cool. I think I'll try this myself. One idea I had that I think will make things a lot easier is to make an array of void pointers for the instructions which you can then assign to addresses of c++ functions that you write to do what each instruction needs to do. Then all you need to do is call the function at that address.
Since all of the required information is going to be at the program counter, you can just have the function itself grab the data it needs, advance the program count, and decrement cycles accordingly.
You could actually then stub out every instruction available to the 6502 without actually implementing them and just implement them one by one, only having to fill in the function and nothing else.
Yep, that's one way to do it but it wouldn't be much different to the switch statement in the end. You could create function pointers to the relevant addresssing mode and cycle counts and the actual instruction and then decode the instruction and lookup the correct functions to call. If i was to implement a more complex CPU (such as the 68000) then this would be the way to go as the number of addressing modes and registers goes up dramatically, which would make the switch statement approach too cumbersome.
@@DavePoo So after playing around with this a bit, I decided the best way to do this may actually be to create a control word lookup table. Then I only really need to write functions for the instructions that use the ALU. I've starting writing to table and handling for microcodes in my CPU class but I'm trying to keep the control word to 16 bits and I currently have 19 flags I'd like to have. I can't seem to find any documentation on how the 6502 handles control logic only that it apparently handles microcodes "differently than modern cpus" (Yeah the article I found was very vague about that). Any ideas on how I might be able to handle this? Personally the switch statement approach is already too cumbersome for my liking and the cycle counts already add quite a bit to that complexity so I'd like to avoid it if possible.
14:00(ish)
The 6502 does no initialisation itself outside of using the vector ($fffc) to provide the code to start executing (and set the I-flag): its registers (and memory) can only be assumed to hold random values (except the I-flag which is set to prevent any IRQs) - it is up to the start code to set whatever is necessary, eg clearing memory.
The first instruction of the called program should be to [re]set the stack pointer: LDX #$FF (or whatever value the system designer wants during reset), TXS.
I was just about to post this. He shows 6502 code in the rest routine to initialize the stack pointer and decimal flag, and proceeds to hard code this into the power-on reset hardware sequence instead. Just wrong.
Very impressive and most interesting. My C/C++ is pretty rusty, but most of this made sense, I'm keen to play with this idea
20 years ago, the very famous device in China which called 文曲星(Wen Quxing) NC2000 series, 6502 CPU with GvBasic app, which I started the opcode from. So cool
if you ever want to understand C , low level programming , CPU architecture and how to read their specification documents , all this in one project , writing a 6502 emulation (eventually add PPU emulation ) software is just the perfect thing to do...
I'm even astonished we didn't have this as a 5 months project in university ... it just reveals every flaws in your understanding of pointers and bit operations .
Like , if you don't consider yourself so much a beginner and start to get a good grasp of basic concepts in C , I very highly suggest trying to emulate a NES ... first , because there's massive documentation on how to do it , there's a community that can help you , and you are going to gain a massive knowledge from it.
There's two projects that made me very comfortable in C , I did a CPU nes emulation back in the days , also in C++ and SFML , and an OS kernel.
trust me , the time you put in those kinds of project will massively pay , and it will be waaaaaaay better than any of the projects you get from schools and university.
Well done Mr Poo.
Love it. I can't imagine being this clever.
08:30
You are missing the unused (expansion) flag of the 6502 in bit 5 between B and V (I'm assuming your compiler assigns the bits from the least significant bit 0 upwards).
Without this bit some processor status manipulations (such as PHP, PLA, play with bits, PHA, PLP) could fail as V and N would be stored in the wrong bits.
Yes, but it was added at a later date
Definitely will go though the playlist! Hope to learn a lot!
I've started doing this exact same thing but in VHDL for an FPGA.
This is the start of a 6502 emulator. Just a few points to make. If PC is a 16-bit value the wrapping around of addresses will be taken care of automatically. The handling of the reset vector at 29:32 is wrong. On reset the PC is set to the address stored at FFFC/D and execution starts at that address. The comments said that the SP wasn't incremented in the JSR implementation. It needs to be decremented. The stack works from the top down. I'm sure these things will be caught and fixed in the next video.
Brother i built this with your instructions and i didn't understand a THING. but when i ran this code in VICE, man did it feel good. im going to be doing this alone now, everyday all the time bro. thanks
I've done something like this before, just watched to see how someone else would do it. Instead of writing code for every instruction you can find a pattern in the bits of all the instructions, and make lookup table to indicate which instruction and addressing mode and flags and cycles are used. Then you don't need code for every instruction. That's how the real chip actually works I think.
There is a table of instructions here, and maybe you can see that most of them are organized in a pattern, just a few look out of place. www.masswerk.at/6502/6502_instruction_set.html
Yep, and i think if i was emulating a more complex CPU then that would definitely be the way to go.
@@DavePoo I would probably argue that it is easier to see the pattern on an early 8-bit CPU like 6502 then on a 68k, Intel 8086 etc or a modern RISC V CPU. ;-)
(Even though the RISC V is a orthogonal CPU design, like 6809, 68K and PDP 11. Real nice CPU to code machine code in. :-) )
thanks mr Poo. this is right in the crosshairs of what i needed to learn. it seems 6502 was everywhere back in the day. didn't realize the atari and NES were the same freaking processor
As crazy as it sounds 6502 is actually 45yrs old!!
I like your opening about how knowing the 6502 is relevant to modern processors. I learned assembly on the Commodore 64, and when I moved to a PS/2 and the 80386, apart form learning segmented memory and real mode vs. protected mode, everything was about the same!
41:46
The endianess of the host is irrelevant unless you are using non-char types to read or write the "memory" (an array of char).
On a bigendian system the memory byte array is still going to be in little endian form - it is just a byte stream.
If you were going to use the endianess of the host cpu I would expect you to cast the memory from char to word and then read/write the word in one go.
Yep, i got rid of that comment later
What a great video. Thanks a lot for creating and sharing!
48:00
Using subroutine test address of $4242 could hide a bug of swapping the MSB and LSB.
Good point.
Nicely done. A better and simpler way than when I wrote a MIPS simulator.
Finally someone who formats their code like a real man: opening brackets in newlines and spaces within parentheses! Upvoted just for that fact alone.
Also used 273% zoom for your viewing pleasure.
@@DavePoo seriously though, nice video! Very easy to follow and well explained :)
I love the way everything is kept "on screen" and reformatted for that very reason throughout :D
Allman Indentation.
Coding style beauty is in the eye of the beholder, but man, I dropped in as a first time viewer and can't forgive the spacing, the bracing, the TitleCasing... oh my!
Nice, I wrote my own 6502 emulator in c# in spare time in just two days. One class contained 151 methods corresponding to the mnemonics which emits code to the memory. So JSR emits three bytes and increments the PC. Then the emulator can emulate the actual 6502 code (without cycle counts). Works like a dream.
Thank you for making this video. But (*sigh*) please stop calling the 6502 old! Some of your audience is older than it is! :D
Old compared to my Ryzen 7
@@DavePoo lmaooo
Don't call it 'wierd' either! :)
It was a creature of its day. The Computer History Museum has audio histories with both Bill Mensch and Chuckle Peddle about why they did things the way they do. For example Peddle was given a 'budget' of 3000 transitors. The whole thing is remarkable for 1975 and dropped the price of a micro-controller (they didn't 'say' processor) to one tenth of Motorola/Intel.
Peddle had also read work (later relied on by Stanford RISC) that suggested a collection of about 50 'useful' instructions for which you could do EVERYTHING. They were right.
Fascinating video.
@@rabidbigdog
the turing machine has even fewer instructions than 50 !
It can’t be old if it’s still in production. 😉 Bill Mensch’s company, Western Design Center, is the one making them now.
Good memories - one of my first CO-OP jobs after my 2nd year of comp sci was to create a Z80 emulator for an aerospace company. I seem to recall it being able to run about 1000 instructions a second .. back in the dark ages ;) You are right you really learn... I still remember the DAA instruction .. god...
Was the Z80 emulator you did put into use in a product?
@@DavePoo It was used by that company to test their software before moving it onto real hardware. I dont know if they ever sold it. Was a fun project, my supervisor just left me alone and I gave a demo every friday to the team. Wrote a users guide when I left and had an office overlooking Vancouver from Burnaby Mountain.
That blew my mind (in a very elating manner).
Wow. This took me back. Haven't used C±± in over 20 years. Very similar to c sharp but watching this video, reminded me of all the differences. Thanks for this.
C++ has vastly changed since then. It's definitely worth checking out again.
This video was C though as opposed to C++
I cried at 8:32; not only are the comments not aligned with the top ones, but the lower ones are missing a space!
i'm not sure what
you are
talking about.
@@DavePoo haha
This just popped up in my feed and I subscribed because it's like you knew what I was thinking
It's rude to read other peoples thoughts without their permission.
Your "FetchWord" function is adding 2 cycles instead of subtracting it.
Yep, that got fixed in a later video.
@@DavePoo Man I gotta say I'm impressed. I've recently been getting involved with microcontrollers and this project seems like a lot of fun!
It makes me remember why I love Computer Science :) (Although this project may be beyond my current skillset)
Some nice bare virtual metal action! It should help some of the younger people get a feeling for how a CPU basically works.
(I grew up on bare metal coding (not 6502), I had a NASCOM 1 (Z80), with about 900 bytes of available RAM ... and no luxuries like BASIC, or even assembler! - lots of hand-coding resulted in me memorizing all the Z80 opcodes!)
I think you are overcomplicating things by passing so many parameters with most of the function calls.
It would be cleaner for CPU to keep them as internal properties (along with cycles, current opcode etc ... and anything else useful for CPU to manage its state).
The memory array could be created internally, or passed during initialization.
I also think it makes more sense having cycles count the cycles executed (by incrementing), not cycles to execute (and decrementing).
And ... that switch() is going to get very large! ... maybe time for an array of functions if someone is going to implement all the 6502 instructions :)
Very clear explanation and code was also clean and straightforward. 👍
0:30 it sounds so much like you said “it’s like 4 years old now”
It's 1979 all over again
The amazing thing is that the 6502 is still manufactured.
@@DavePoo At 13:45 you made the memory thing and I am not used to the C++ language so I would like to know how it works because I am watching this and making my own version in C# based on yours in C++
@@patfre it's just an array that is 64kb in size
@@DavePoo Thanks for the answer it helped
The 6502 has three user registers (A, X, and Y); however, it also has several internal registers including the flags register, the address register, the instruction pointer, the stack pointer, and a scratchpad. The most important thing about the 6502's zero page is the zero page addressing modes; these modes basically turn the zero page into 128 16-bit pointers. The stack pointer is only 8 bits because the high byte is always hard wired $01... and the stack pointer counts down from #$FF.
From a web applications developer:
"WHAT?!?"
This might be a bit lower level that you are used to.
@@DavePoo Definitely, but very interesting!
These days you can have 6502 emulated in your web app, why not. I feel it might only debloat it :D
@@DavePoo that is understatement of the day. Most people now have little concept of hardware behaviour
This is absolutely amazing.
The Reset (and BRK and IRQ) Vector addresses are where a Word containing the address to load the PC should be; not where the PC is set to execute from...
So - the program should be loaded elsewhere in Memory e.g. 0x1000 (0xA9, 0x42...)
0xFFFC = 0x00, 0xFFFD = 0x10 (little endian) - PC is loaed from the reset vector = 0x1000 to start execute!
Thanks, i never handled the reset vector properly.
@@DavePoo
Great video, BTW!
Think of the Vectors as the JMP instruction - read addr & set PC :-).
8:15 does that work nicely with memory management?
I remember back back BACK when I was young, we used to do structs with was very careful with where we placed nibbles and bit flags, so that we kept them in chucks on 8 when definding the struct, so if we did `bit, byte, bit` it would take up 3 bytes in memory, but if we did `bit, bit, byte` it would only use 2 bytes.
Does `Byte C : 1; Byte Z: 1; ` do the same niceity where it knows you are only using 1 bit of a whole byte, it can squeeze them all into the same byte?