That was a lot of info to absorb. Kudos to you on your undertaking. I cannot wait to see where all this takes you and the platform. It is awesome that you are thinking ahead to the future potential of the MiSTer project. Great ideas, a great attitude toward the current constraints of the hardware, and your view of the project's potential is fantastic. Thank you!
To be fair, this is super complex stuff. I am not surprised that it only hits a small audience. It is the end-product that will catch more people and get it talked about. o.o
I have gone back to this video every once in a while, dreaming of a potential Saturn core on MiSTer, while hearing it wasn't likely/possible. Now that we actually have a Saturn core in development for MiSTer, it seems your estimations may be very close to what is possible on the platform. I look forward to seeing what you do with your talents, on either MiSTer or a more powerful FPGA gaming device in the future. SGI systems are a monumental goal... best of luck!
Yes, I did the exact same thing, going back and forth to the video and he keeps answering the questions that people leave him. He really knows what he's talking about. This is very interesting to me :)
I'm glad you found it useful! I don't recall what I said in the past about the Saturn, but as of right now, I would basically give it a 50/50 chance of fitting. It's a pretty complex system (when compared to the PS1), though not as complex as the N64. The N64 wouldn't fit without a hybrid emulation approach though (e.g. run the CPU on the ARM via emulation, and the RCP in the FPGA - that would be a feasible implementation, though perhaps overly complicated). I probably will never touch the DE10-Nano version of MiSTer, since I find the "newer" systems more interesting. After all, the FPGA being used in it is over 10 years old (it was released in 2012, but Altera probably started designing it in 2010 or possibly 2009). I am, however, working on something related.
@@RTLEngineering Do you think that in the future, with this high-performance FPGA in a PC, we could pick some consoles in the "too fast" category, emulate some parts in software, and give the other parts to the FPGA? That would make a different way to emulate modern platforms.
@@carlospc223 Possibly. One of the major restrictions will be clock speed though (although, depending on the architecture, you could also run it slower but go wider). Realistically speaking, right now the furthest you can hope to push a high-end FPGA is around 600 MHz (you can push them further with super-pipelining, but that doesn't suit the systems mentioned here). However, when you throw cost into the mix, you're looking at 300-400 MHz (600 MHz would require one of those very expensive binned FPGAs that cost 2x-3x more; in this context that means $4,000 USD easily). I am guessing, though, that you were asking about hybrid emulation. For example, you could put the Xbox system (GPU, sound system, etc.) on the FPGA but implement the P6 CPU via virtualization on an x86 host. That would be a good use case for an FPGA in a PC. That might also open the door to a system like the Wii, but then you start to blur the line between the FPGA and the software emulator side, and at that point you are better off just sticking with software emulation (an Xbox would be different, because you can run the game/kernel code via virtualization on an x86 host and not via emulation). All of the newer systems, though, had GPUs running in the GHz range too, so while you could implement them on an FPGA, they would have to run at a fraction of their original speeds.
@@RTLEngineering Understood! Thanks. I thought some hybrid emulation would be better in some way, but I see that's not the case. Still looking forward to some high-end FPGA for newer hardware (PS2, N64).
I also hope to see some "development kits" made out of MiSTer and more powerful FPGA cards. So, open source games could be developed on those! Good work.
That wouldn't be feasible. The more powerful FPGA wouldn't be compatible with the Cyclone V, and would require that the two be combined in a final version (beyond a development kit). There's also no good way to connect another FPGA card to the DE-10 Nano, the best you could do is give up the SDRAM and the MiSTer hat and replace it with a parallel link to the other FPGA (then you would have to move that I/O to the new FPGA and every core would have to be rewritten to take advantage of that as well as the increased SDRAM latency). Also, if the Cyclone V were included, you're probably looking at the base card cost going up from $500 to $700 (the Cyclone V doesn't have competitive pricing on it). And a dev kit would likely be well over $1,400. Furthermore, the Cyclone V is going to be reaching end of life soon (within the next few years), so including it in any design wouldn't be wise - it could only be produced for a few years and then it's done forever. Tl;dr, including anything to do with the MiSTer here isn't economically viable, nor would it provide much of an engineering incentive as all of the MiSTer cores could be ported to other FPGA architectures (no longer requiring a Cyclone V).
Your work is fantastic. I would love to see you develop an exclusive FPGA based retro gaming hardware platform. Starting off with a 1080p native Nintendo 64 core would definitely get the ball rolling. Hopefully, it would be possible to make it standalone and not require a computer even if this option would be a bit more expensive.
Thanks! The platform that I proposed in this video could also work with a single board computer which has a M.2 slot (it would need an adapter cable, which does exist). So something like the Latte Panda would work, though it would still be a full length PCIe card, so it would be a large form-factor. I have also been considering the idea of using a Zynq or Zynq Ultrascale instead, which is effectively equivalent to the Cyclone V in the DE10-Nano, except it has more logic and runs faster (while being more expensive). That could potentially have the option to attach to a PCIe carrier card (and go into a PC), or be standalone with the ARM cores in the Zynq running a Linux OS (the whole reason for the PC is so that there is an OS running which can handle things like USB, Ethernet, SD Card, game ROM loading, etc.). Though using a Zynq like that could potentially mean compatibility issues, where some cores would require both ARM cores for hybrid emulation, and therefore would require a support computer regardless. Just a few things to think about.
@@RTLEngineering I wish you the best of luck with all of your projects! I feel like the more popular retro gaming gets, enthusiasts would be willing to pay 3x or more for a standalone arm coupled FPGA "all in one solution" that could support N64 and other later generation consoles and handheld cores.
It probably seems impossible to a hobbyist who does this sort of thing on the side, however, my main job is trying to figure out how to get the most performance out of FPGAs and to best utilize the architecture - this is no different. The only issue is that it requires a lot of trial and error / experimentation to achieve the desired result, which takes time. It also means not only attempting to optimize the HDL description, but also how to optimize the architecture (which primitive parts go where). It's for all of those reasons that this may seem impossible, even to more experienced developers, like Kevtris for example (getting something like an N64 to work, let alone a PS2, is far more work than he would be willing to put into such a project, and would require a fundamentally different development method and experience set, so he sees it as impossible).
The future of FPGA re-implementation for the market of retro gaming isn't all that clear; this was just a proposal for somewhere the physical hardware can go, since MiSTer cannot be the future (it's already starting to show its physical limits). There are other people working on similar platforms, however, I don't think that any will ever approach the cost of the DE-10 Nano used by the MiSTer (since Intel effectively subsidizes it). On the other hand, I think that we may start to see more PCIe FPGA cards in the consumer market in the next decade (which will drive prices down), to the point where anyone who has the equivalent of an RTX 2080 Ti (i.e. a high end consumer GPU) will also probably have an FPGA card alongside it (Apple has already started to do this for the new Mac Pro). And then obviously with cheaper, larger, and more mainstream consumer FPGAs, more complex cores will be developed and become feasible for enthusiasts to use. Hopefully that helps add a little more insight into the future of FPGAs in this application space (or at least my thoughts on them).
@@RTLEngineering thank you for the insight. Yes, I agree with you, it all can help; even just suggesting an idea can spark a new way of thinking, so what you are doing is commendable... I'm into game design, and I value ideas very highly, but I'm also aware that ideas are only half of the story, and without the execution nothing happens. Here is the link to my idea called WOR(L)DS: in the link below you can find PDFs with game scripts I wrote, check them out if you have the resources to transform them into games: raulgubertart.wixsite.com/wor-l-ds ''3 NEW MONSTERS AND 3 NEW WEAPONS FOR QUAKE'' ''3 NEW WAYS FOR SUPER METROID'' ''NUKE THE QUAKES! BOMB THE DOOMS!! KARMA KARMA BOOM!!! ''CONFUSED AND BADLY WRITTEN IDEAS FOR VIDEOGAMES BUT UNDOUBTEDLY BRILLIANT'' ''THE FORTRESSES TO INFERNO''
@@RTLEngineering I think the main problem for the MiSTer project is not the limitations of the DE10 but the lack of developers with the skills and passion to make cores. There is a lot of low-hanging fruit out there, from arcade to PC to consoles, that somebody could do. I have been analyzing how the MiSTer project is being run, and it reminds me a lot of how MAME began: little team effort and a lot of big egos.
I agree that there is still a lot of low-hanging fruit out there, but you can only fit so much into an FPGA with 110K LEs. You also have an IO problem with the DE10, where it is severely lacking, especially in terms of attaching external memory. While there is the SDRAM add-on, it doesn't have sufficient bandwidth for any of the 90s era systems / computers. I can't comment on the internal dynamics of the MiSTer team, but I can imagine that there are some political issues there which hinder progress (it always happens in open-source projects). So if that's an issue, then the hardware lacking is just part of the limitation to growth. Either way, the hardware limitations need to be faced at some point. Another issue is probably the complexity of cores which need to run relatively fast (50 MHz+), requiring optimization which takes a lot of time and effort. And then there is the effort to debug complex cores as well (I am still a bit surprised that someone wrote a 486 core). I guess that falls into the category of skilled and passionate developers (ao is clearly one of them though).
@@RTLEngineering I am only in the income bracket of buying a $1,500-ish computer no more often than every 5 years, but I would still put an up-to-$750 PCIe card in to accelerate emulation. It costs less than all the mods/repairs I do to older consoles to keep them going. That being said, I would just move up to repairing the next generation, and so forth and so on, until the day I die.
I'm curious, now that we did get (almost all of) an N64 core on MiSTer, whether you think this platform is more interesting to pursue. In this video you mentioned you'd like to wait for that to come to reality before pursuing this further.
One day, I will have a powerful GPU sitting in my case, but having to share the airflow with an FPGA-based emulation card. I personally hope that FPGA-based emulation becomes more widely available, as it can circumvent issues that current emulation has. Like the odd quirks in the N64 that give Mupen64 quite a run for its money on lower-end platforms... :p. My ideal dream-FPGA-emulation-thing would be an FPGA card that can be driver-controlled by the host OS. This driver would do exactly what it is named to do: flash the FPGA with the core, send some input (like the ROM), maybe write some settings to it, take output from the card and display it - or even let the driver link an existing GPU with the FPGA board in order to do graphics-intensive work to render the actual graphical layer. Sure, I am not an engineer, and this is just dreaming. But how cool would it be to have a gaming PC that doesn't just run modern games, but also retro games at pretty much intended speeds? I would love that. I'd easily throw out some money to have that, and even if I only would get the naked board, I would see if I could invest in a proper and nice cooling solution with a nice 3D case. While I like the looks of traces, transistors and chips, I also like the look of a closed-up product with some proper casing. Really enjoyed the video, thank you very much for sharing this!
Luckily FPGAs are actually significantly more power efficient than GPUs. The one I proposed in this video is rated for something like 6W maximum, compared to a 120W GPU (as I recall, the RTX 2080 Super can pull 210W when modded). As for your ideal FPGA emulation thing, that's pretty much what I described. And if things like SLI were open-source, you could potentially put an FPGA card into an SLI array with a GPU (to share data / video). Obviously that would have a lot of engineering hurdles, but it's conceivable for the future. Though if it got to that stage, it would be more general purpose than for emulation. Silicon may be relatively cheap, and it may be advantageous to put special function hardware blocks all over the chips, but eventually it will become wasteful, especially when the utilization drops to some small number (i.e. if you use the special function block 0.0001% of the time). That's where having driver-controlled FPGA cards (like GPUs) would come in handy - imagine using one specialized in video encoding with something like Adobe Premiere (Apple already does that because they don't get along with NVidia). They also make good targets for neural network evaluation, and ray-tracing calculations (why waste space on RT cores if you could bank on the end user having a configurable logic card that could implement them as needed). I know that's not quite what you had in mind, but if that came to be, then those cards would be far cheaper, and they could then be leveraged for emulating old hardware too!
@@RTLEngineering If this was put into a computer like this, would the emulator have to take up the full screen, or could it pass through the computer and be placed in a window that could be resized and popped in and out of full screen? I am thinking that you would have to flip back and forth between the two outputs unless there is an internal or external solution.
That would depend on how it's configured. To allow the images to be resized, you either need to combine the image as an overlay (which would be quite complicated to manage - likely requiring hooks into the OS window manager), or you would have to copy the framebuffer from the FPGA card into the computer's display controller (i.e. the graphics card). If you did the latter, then you would necessarily have to add multiple frames of lag, and you would stress the PCIe link - there's only so much bandwidth to go around. Flipping back and forth between two different outputs would likely be the better solution. The design shown in the video had a video in port which would allow you to overlay a full-screen output from the FPGA (no window resizing though), similar to the original 3dfx Voodoo1 and 2 cards. The other option there is to use a KVM type switch and swap between the monitor sources. Unfortunately, modern display systems and window management are not well suited to handle a secondary video input source.
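As a rough back-of-envelope aside (not from the reply above): the sketch below estimates how much PCIe bandwidth copying every frame from the FPGA card to the graphics card would consume. The resolutions, refresh rate, pixel size, and the PCIe 3.0 x4 reference figure are illustrative assumptions.

```python
# Estimate the raw link bandwidth needed to stream a framebuffer across PCIe.
# Assumptions: RGBA (4 bytes/pixel), 60 Hz, no compression or protocol overhead.
def framebuffer_bandwidth_mb_s(width, height, bytes_per_pixel=4, fps=60):
    """Raw bandwidth needed to copy every frame across the link, in MB/s."""
    return width * height * bytes_per_pixel * fps / 1e6

for w, h in [(640, 480), (1280, 720), (1920, 1080)]:
    print(f"{w}x{h} @ 60 Hz: {framebuffer_bandwidth_mb_s(w, h):7.1f} MB/s")

# 1920x1080 @ 60 Hz works out to roughly 500 MB/s before any overhead - a noticeable
# slice of, say, a PCIe 3.0 x4 link (~4 GB/s theoretical), on top of the extra
# frame(s) of latency introduced by the copy itself.
```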
This is super interesting. Thanks for posting. I get the impression you want to keep all your progress private until you have a working result. Could you shed some light into why? Also, just out of curiosity, what HDL are you developing this in?
Thanks! That's correct. I actually would prefer to keep all of the implementation details private (I would probably release the board files though for the PCIe card). If you were more commenting on the span since the last update, the reason is that I have put that project on hold in place of another one which is also private (for the same reasons as keeping the implementation private). So a few reasons to keep it private: 1) I don't want to open source it - as soon as you allow random people to start helping out, the entire project loses order and becomes a nightmare. Some people like that aspect, but I'm not one of them. 2) I don't want to provide the ideas for someone else to beat me to it. I can't work on these projects full time as I am a full time academic researcher (that takes priority). 3) Releasing implementation details publicly can be a problem with future job prospects, especially companies that receive classified government contracts (i.e. they may be concerned that you would leak their project details). Hopefully that answers your question. Note that I don't have much of an issue regarding releasing the final bitstream publicly (and would probably open the implementation eventually if I became tired of supporting it). I might also consider releasing an encrypted IP, however, I do know that some of those have been decrypted. As for the HDL: I use both VHDL and Verilog. They both have their strengths and weaknesses, often covering each other. So I use the best tool for the task at hand. Often, the tricky logic parts are done in VHDL (as well as ROM LUTs), and the structural connections are done in Verilog. I don't use SystemVerilog though, or any other language - I prefer to have more control over what the synthesizer implements (I suspect that SystemVerilog, for example, would have problems tuning logic to change path delays, as it's less explicit; same thing with things like Scala-based description methods).
@@RTLEngineering Thanks for the detailed response! Are you familiar with Henry Wong's work on creating a Superscalar Out-of-Order x86 soft processor? There might be some overlap with your work.
Yep, I have read his thesis. There are a few problems with using his work though. 1) It's incomplete to the best of my knowledge. Parts of it are done / work, but not enough to be a full x86 CPU. 2) It consumes a lot of resources... I'm not even sure it would fit in the largest Artix-7 FPGA, and 3) it would never run nearly as fast as would be needed in an Artix-7. So to be used for an original Xbox, it would need to run at 700+ MHz, but the fastest it can run in an Artix-7 will probably be around 80 MHz. I suspect that even in one of the new Ultrascale+ FPGAs, it would only hit a few hundred MHz (certainly not more than 300 MHz). It's a cool project, but not all that useful here. Also, if you are interested in an x86 soft core, there is the ao486 (it won't run all that fast, but it's an i486 CPU). An alternative solution could be to use a RISC-V soft core as a base and rewrite the instruction fetch / decode stages to take in x86 instructions and translate them into RISC-V microcode. I honestly think that may be the best solution to get a "fast" x86 soft core on an FPGA, especially since there are small RISC-V cores that can run at over 150 MHz on the slowest Artix-7 (granted in that case, the x86 would probably be around 120-140 MHz with a very low IPC).
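To make the "translate x86 at fetch/decode" idea concrete, here is a toy software sketch of the same principle. Only two real x86 encodings are handled (B8 = mov eax, imm32 and 05 = add eax, imm32); the register mapping, the scratch register x5, and the emitted RISC-V pseudo-ops (li / add) are illustrative assumptions, and a real front-end would of course do this in hardware, not Python.

```python
# Toy decoder: turn a couple of x86 instructions into RISC-V pseudo-instructions.
import struct

X86_TO_RV = {"eax": "x10"}   # hypothetical register mapping

def translate(code: bytes):
    """Return RISC-V pseudo-instruction strings for a tiny subset of x86."""
    uops, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0xB8:                      # mov eax, imm32
            imm = struct.unpack_from("<i", code, i + 1)[0]
            uops.append(f"li   {X86_TO_RV['eax']}, {imm}")
            i += 5
        elif op == 0x05:                    # add eax, imm32
            imm = struct.unpack_from("<i", code, i + 1)[0]
            uops.append(f"li   x5, {imm}")  # x5 used as a scratch register (assumption)
            uops.append(f"add  {X86_TO_RV['eax']}, {X86_TO_RV['eax']}, x5")
            i += 5
        else:
            raise NotImplementedError(f"opcode {op:#04x} not in this toy decoder")
    return uops

# mov eax, 1000 ; add eax, 234
print(translate(bytes([0xB8, 0xE8, 0x03, 0x00, 0x00,
                       0x05, 0xEA, 0x00, 0x00, 0x00])))
```

Note the expansion factor visible even in this toy case: one x86 add-with-immediate becomes two RISC-V micro-ops, which is part of why the translated core would run at a lower effective IPC.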
The N64 fell short on the MiSTer platform, and now we have an idea of what it takes for the PS1, Saturn, and N64 to run. What are your thoughts on the new platforms like the MARS and the Replay2? Are you still making your own FPGA? It's been almost 2 years; I would like for you to upload a new video. What do you think now?
Unfortunately the price point of using something like that would be too high. People who purchased this platform would be looking at between $900 and $1,200 USD for something as powerful as the $300 Ultra96 board. Some people may buy it at that price, but the majority of those using FPGAs for retro emulation / reimplementation won't go for that price, considering the DE10-Nano is $150 USD at its highest price.
Regardless, it wouldn't be viable. As a SoM, there are many other vendors that have lower price-point SoMs with the same FPGA, and in terms of making a custom board... forget about it (that's the sort of thing that takes a team of 2-3 engineers about 1 year full time to produce; it's incredibly complicated to implement an MPSoC board, especially for an FPGA). And then to produce a board with one of those FPGAs... for what you would really want to use for a project like this, you're looking at about $500 USD per chip for 10K quantity (assuming you could secure and guarantee 10K orders). And that's ONLY the FPGA. You still need the DDR4 (around $40 for the chips), other miscellaneous chips to match the expected quality (another $60), the board and assembly (another $120 most likely), enclosure (anywhere from $20 to $200 per unit depending on volume), packaging, assembly, etc. By the time you're done with everything, the minimum price it could be sold for (at cost) would be around $900, and that's after about 5 man-years of design and production work (all of which would be free labor since it would be at cost... so paying the engineers and adding in some profit gives you probably $1,200 USD per unit for a 10K order, and that's without any HDMI capability, which is an additional cost). Meanwhile, compare all of that to the $150 USD DE10-Nano you can buy, or the new Xilinx offerings for $200 USD or $250 USD with the same chip. It's no longer feasible to roll your own unless price isn't a concern (i.e. if you aren't selling to consumers). I just wanted to give you an idea of the complexity of using a chip like that, and how it translates to the cost and design / production time commitment. I had a similar idea to yours when I started working on this project, but I didn't fully understand the complexities and costs of designing such a system. I do appreciate you mentioning the Antimicro module though, it's always nice to know what various companies are producing.
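Summing the per-unit figures quoted in the reply above (a sketch only: the enclosure is taken at a mid-range value from the $20-$200 span, and the packaging/final-assembly remainder and the $1,200 with-margin figure are the comment's own rough estimates, not a real bill of materials):

```python
# Rough per-unit cost roll-up using the numbers quoted above.
bom = {
    "FPGA (10K qty)":        500,
    "DDR4 chips":             40,
    "misc support chips":     60,
    "board + assembly":      120,
    "enclosure (mid-range)": 100,
}
at_cost = sum(bom.values())
print(f"Itemized hardware total: ~${at_cost}")   # ~$820; packaging and final assembly
                                                 # push it toward the ~$900 quoted
print("With engineering labor and margin: ~$1200 (the comment's own estimate)")
```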
Have you considered using a Xilinx ZYNQ Ultrascale+ device - that would get you a couple of fast Arm cores for emulating the target CPU, a GPU and a bunch of very tightly coupled FPGA gates, as well as a DDR-4 controller and other goodies...
I have considered the Zynq Ultrascale+, however, they are a bit too expensive for a consumer application. Using a Zynq 7000 would probably be a better option (which I have been considering instead of the Artix-7). The biggest problem there is that CPU emulation isn't really a good solution for many of these systems since it will always be slower. Furthermore, you're going to introduce a huge amount of latency by the ARM cores talking to the FPGA logic over AXI (it's a "high-latency" protocol). To actually implement a PS2 reliably, or something faster like a gamecube, you would probably have to move to Ultrascale+ chips regardless though (for the speed increase). The DDR4 wouldn't really be a selling point, even DDR2 would be sufficient for most of these systems, it's the bus width that's the problem, although a physical controller would be nice as it doesn't use logic gates.
@@RTLEngineering Yes, the Zynq Ultrascale+ are a bit more expensive, but are you sure you couldn't get decent CPU emulation with a pair of 1GHz+ 64-bit ARM cores? And AXI really isn't a high-latency bus - especially when it's running at 100's of MHz. I don't know how much logic you need on the FPGA side, but if you can squeeze into a XCZU2CG, there seem to be some interesting little modules for $250 odd e.g. www.enclustra.com/en/products/system-on-chip-modules/mars-xu3/ . Anyway, looks like you've got an interesting project on your hands, I look forward to seeing how you get on.
The biggest problem there is that cost is a limitation for such a project. If the final product costs $1000, then that's going to greatly limit how many people can buy and use it. It would be great to have an N64 core on a high end Ultrascale+, where the Mali GPU is used to upscale to HD. But that would be a lot more expensive, and it also begs the question of "is this really that much different than a Raspberry Pi for 28x the price?" FPGA re-implementation already has an image problem when it comes to emulation, and that would just end up blurring the lines. As for the 1GHz+ ARM cores, it all comes down to how good the emulator is. So if you wanted to do an original Xbox implementation without using original chips, then yes, you would have to emulate the Pentium CPU on a faster chip like an ARM core - the same thing is true for the PPC CPU in the GameCube. But for 1 GHz, you may have to write the entire Pentium emulator in ARM assembly, otherwise it will be too slow. If the Pentium executed 1 instruction per cycle at 700 MHz, and 1 instruction takes on average 5 ARM instructions, then the ARM chip would have to run at at least 3.5 GHz. And to get it down to just 5 ARM instructions, it would almost certainly need to be done in assembly. Obviously the ARM CPU is going to be more efficient (out-of-order execution and branch prediction, as well as close to 1 IPC if not more), whereas the Pentium probably had an IPC of 1/2? So it might be doable, just not easy. That's the biggest problem with emulating that sort of hardware on a CPU. On the other hand, some of the Ultrascale+ silicon can run at 700 MHz+, so you could potentially put the entire Pentium chip in there, but that comes back to the final product cost. Anyway, just some things to think about.
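Working through the emulation-speed arithmetic from the reply above as a small sketch. The IPC figures and the 5x instruction-expansion factor are the same rough assumptions used there, not measurements:

```python
# Host clock needed to emulate a guest CPU in real time, given rough IPC assumptions.
def required_host_mhz(guest_mhz, guest_ipc, expansion, host_ipc):
    """Host clock (MHz) needed to keep up with the guest CPU in real time."""
    guest_inst_per_sec = guest_mhz * 1e6 * guest_ipc
    host_inst_per_sec = guest_inst_per_sec * expansion   # host instructions per guest one
    return host_inst_per_sec / (host_ipc * 1e6)

# Worst case from the comment: Pentium at 700 MHz retiring 1 IPC, 5 ARM instructions each
print(required_host_mhz(700, 1.0, 5, 1.0))   # 3500.0 -> the "3.5 GHz" figure
# More forgiving case: Pentium IPC closer to 1/2, ARM IPC slightly above 1
print(required_host_mhz(700, 0.5, 5, 1.2))   # ~1458  -> plausible on a ~1.5 GHz core
```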
I would love to see a card I could put in my computer that would let me leverage an FPGA. I wish you would have a way to transfer files back and forth, such as having the ROMs saved on the computer where the FPGA could read them, or a way to transfer backups of save files and carts with saves on them for backup purposes, and a way to use the computer's video outputs or run the program in a window, along with using peripherals attached to the computer. I know these are probably just dreams, but to have a seamless FPGA/software hybrid emulator all in one system would be great. All the benefits of both. Makes me think of the old 3DO Blaster.
Transferring files would require an agent on the FPGA side, though that's doable with an FPGA+ARM SoC hybrid (like what the MiSTer uses). If it were only an FPGA on the card, there wouldn't really be anything to transfer files from/to, since the host CPU would be doing all of the work / management. One of the challenges that I haven't been able to solve is the video output, as you mentioned. The problem being that a PCIe link would have too little bandwidth / too much latency to send video data from the FPGA to a GPU, and you have the opposite problem if the FPGA acts as the sink. The only solution I have come up with is to use a KVM switch / multi-input monitor, and switch sources where appropriate (alternatively, you could use the FPGA output as another monitor - though not for the desktop environment on the host). I think an FPGA+software hybrid emulator is likely the solution for many systems, especially if you want higher quality video output. Although, it does pose the question of: at what point is it mostly a software emulator / would a software emulator by itself make more sense?
@@RTLEngineering Indeed, the display being picture-in-picture would be OK if not a full window, and I could swap the inputs or KVM them when needed. I am keeping my eyes open on all options and an open mind. FPGA is the hotness now and many are really hating on software approaches to emulation. I just want all options explored and let the results speak for themselves.
Unfortunately the MiSTer is too small and too slow to run a Dreamcast. It might work if you put some of it on the ARM CPU (hybrid emulation), but the best you could hope to get is about 7FPS max for a game, which would be unplayable. That's not to say that a Dreamcast is impossible or impractical. The problem is that the FPGA used in the MiSTer is small and old (I believe it was designed in 2009). There are newer chips that are in a similar price range that can run circles around the Cyclone V on the DE-10 (one core that I was testing got a 4.5x speed improvement on the newer chip, which is the difference between 7 FPS and 31 FPS, it's also 4x the internal size).
That depends on the FPGA (Altera/Intel vs Xilinx/AMD). MiSTer cores also require a support interface from the ARM CPU, so that would have to be ported too, as well as pretty much all of their software from the ARM CPUs. It's possible, but it would be a lot of work. In terms of the pure core itself though, that would probably be easier to port, but it would require replacing the IO interfaces as well as any FPGA specific macros (i.e. for the Cyclone V - if moving to a Cyclone 10, then that would be easier). I was working on benchmarking a RISCV CPU, which was not meant for the MiSTer, but it ran at 80 MHz there and 345 MHz on an FPGA that I am planning to use in my current project. Note that 345 MHz is faster than the Dreamcast CPU and GPU, and while the RISCV core is optimized for an FPGA, it could be optimized further to probably add another 100 MHz (you would never get above the required 200 MHz threshold on the Cyclone V though).
@@RTLEngineering Would it even be too slow if one tried to put the whole Dreamcast CPU in the ARM section and just implemented the GPU and the rest on the FPGA? BTW, would it in theory be possible for the FPGA to do some JIT compilation for the ARM CPU (for instance, for a different CPU implemented in software on the ARM part) to help speed up some non-native instructions? Maybe in this way one could also get a faster IBM PC core, or are hybrid approaches just too cumbersome?
No mention of the Sega Saturn? It's just grouped in with the other 5th-gen consoles. Would it actually be possible on the current MiSTer? It used 2 SH2 CPUs and 2 custom graphics chips, I believe. It is notoriously difficult to emulate; would an FPGA core even be possible?
The Sega Saturn is pretty much a given (that's why I didn't mention it). It may be technically challenging to emulate, but that's not due to the speed (i.e. the N64 runs far faster / has tighter timing requirements) - so someone will eventually do it. As for if it will fit in the MiSTer... I'm not sure. The 2x SH2s + the vector DSP for 3D math would be pushing that chip. It would probably fit into the largest Artix-7 as proposed in this video (it has roughly 2x the logic elements as the FPGA in the MiSTer has). My guess is that the order of "impossible to emulate" systems will be PS1, N64, and then the Saturn. And maybe down the line a PS2 or Dreamcast. The only one of those that has a chance of fitting in the MiSTer though is the PS1.
I haven't talked to anyone in the MiSTer community about this, and I suspect that they may be less than thrilled. For now, it's just an idea on where it could go once the limits of the Cyclone V in the DE-10 are reached. Eventually they will run out of cores that can be implemented.
Hi, first I want to say thank you for your contributions to the MiSTer project :) Second: are you the guy behind ULTRAFP64? Because that project is progressing very well :)
Thanks, though I haven't actually contributed to the MiSTer project. This was just a proposal of how it can expand. I'm unrelated to the ULTRAFP64, though I know the guy working on it. He has indeed made impressive progress.
I feel like once you get to 3D games, the issue isn't accuracy and latency like it is with earlier platforms, but the overall presentation: aliasing, resolution, slow-down, draw distance, etc. Could, say, an FPGA PS2 then work in conjunction with a graphics card to handle post-processing and get an accurate, upscaled, and prettied-up PS2 game?
It's both actually, though there are several software emulators for the PS2 that do a pretty good job. There are only a handful of games that are known to be unplayable with PCSX2, which most likely rely on hardware quirks that you don't get in emulation. On the other hand, the only feasible way to implement a PS2 core would be to use a black-box approach, which could fail to capture some of those hardware quirks as well. As for using a real GPU, that's a possibility. Presumably, once the draw commands are sent to the GS (the hardware component that drew the 2D graphics to the screen), the only other quirks would come from bus behavior and bottlenecks, which hopefully no games rely upon. So you could potentially treat the GS as a data / command sink, and abstract it out of the core and implement it on a modern GPU. After all, once you know the triangle positions on the screen, neither the actual resolution of the screen nor the method used to render them matters, as long as it executes the commands given to it. You would, however, run into issues with data synchronization. For example, if the GS was writing to a framebuffer to later be used as a texture, then you would have to somehow recognize that and pull it out of the modern GPU. Similarly, if a texture was updated, then you would need to send the update to the modern GPU. So the synchronization could be challenging. There are also SoC-type FPGAs that have ARM Mali GPUs in them, which could be used to do the final rendering at a higher resolution, and would allow for unified memory. However, those are significantly more expensive. Tl;dr: using a modern GPU could work, but the logistics of doing so would be challenging.
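A minimal sketch of the "treat the GS as a command/data sink" idea above: forward draw commands to a modern GPU backend, but watch for the synchronization hazards mentioned (a framebuffer later re-used as a texture, or a texture updated on the core side). The command names, address fields, and the stub GPU interface here are hypothetical illustrations, not the real GS protocol.

```python
# Hypothetical command sink that flags render-to-texture hazards before forwarding.
class StubGPU:
    """Stand-in for a real GPU backend; just logs the calls."""
    def set_render_target(self, addr): print(f"render target -> {addr:#x}")
    def bind_texture(self, addr):      print(f"bind texture  <- {addr:#x}")
    def resolve(self, addr):           print(f"resolve/read back {addr:#x}")
    def upload(self, addr, data):      print(f"upload {len(data)} bytes to {addr:#x}")
    def draw(self, verts):             print(f"draw {len(verts)} vertices")

class GSCommandSink:
    def __init__(self, gpu):
        self.gpu = gpu
        self.dirty_render_targets = set()   # framebuffer addresses the GPU has drawn to

    def submit(self, cmd):
        if cmd["op"] == "set_framebuffer":
            self.dirty_render_targets.add(cmd["addr"])
            self.gpu.set_render_target(cmd["addr"])
        elif cmd["op"] == "set_texture":
            if cmd["addr"] in self.dirty_render_targets:
                # Hazard: the texture lives in a buffer rendered by the GPU,
                # so it must be resolved (or read back) before it can be sampled.
                self.gpu.resolve(cmd["addr"])
            self.gpu.bind_texture(cmd["addr"])
        elif cmd["op"] == "upload_texture":
            # Texture memory changed on the core side; push the update to the GPU.
            self.gpu.upload(cmd["addr"], cmd["data"])
        elif cmd["op"] == "draw":
            self.gpu.draw(cmd["vertices"])

sink = GSCommandSink(StubGPU())
sink.submit({"op": "set_framebuffer", "addr": 0x100000})
sink.submit({"op": "draw", "vertices": [(0, 0), (1, 0), (0, 1)]})
sink.submit({"op": "set_texture", "addr": 0x100000})   # triggers the readback path
```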
You would probably have to convert the HDMI to composite to connect to a CRT. I wasn't planning on targeting the CRT market directly, since most people would use an HDMI compatible device.
Well, the Spartan-7 wouldn't be user programmable (it was to combine the two HDMI signals) - if you can't solve the problem with 1 FPGA, just throw more at it!
It would most likely look quite strange if it was based on a SBC like a Latte Panda. Effectively, the Latte Panda would sit under the FPGA (being about 1/4 the size of the PCIE card), with a short PCIE ribbon cable connecting the two. It wouldn't be the prettiest solution, but it would certainly have more power than the current MiSTer. I could probably do a quick render if you would like, but as I said, it's going to look quite strange.
The s478 seemed to be the most readily available, also I was unable to find any good docs from Intel on the s370. Honestly, I don't think it will matter that much, though it may require a custom kernel (in which case using an Atom may be a better choice - unfortunately doing so would mean that the project couldn't be open-sourced since the Atom is under an NDA).
A Coppermine CPU similar to the one used in the Xbox is socket 370, not 478. Socket 478 supports Pentium 4 and later. Even if you source a real Pentium 3 CPU, the clock speed and cache would need to be very nearly identical to the original Xbox to ensure proper compatibility. Xbox enthusiasts have been soldering faster Pentium 3 models into their consoles for a long time. It requires a clock switch to get close to real hardware speed, but even then, depending on the model, you often don't get identical performance. Even a very small difference in clock speed of a few megahertz can cause glitches or anomalies on real hardware. If you are only interested in providing a real x86 onboard to process Xbox native instructions, then a 478 socket is not ideal either. Any Atom or similar x86 chip could serve as a stand-in for similar cost and much lower power consumption. Either way, you will have to deal with the speed issue, which may not be possible with an FPGA approach where things are usually clock synced. It is also unclear to me how a Wii has a clock speed that is too high for an FPGA, but an Xbox does not. The Xbox CPU runs at a 733 MHz clock speed and the NV2A GPU runs at 233 MHz. The bus speed on the mainboard is either 100 MHz or 133 MHz, with SDRAM refreshing at that cycle, as well as a PCI link of 66 MHz, in addition to other devices like the onboard audio, storage controller, etc. Any component that an FPGA tries to implement is likely going to be running or syncing at a frequency of 66 MHz or greater.
Someone else had commented about the socket 478 mistake. The reason for choosing the 478 was that Pentium 4s are easier to find than Pentium 3s. That is a valid concern, but I think that would be a drawback that would have to be accepted when running on non-original hardware. The only other solution to the problem, once all of the original Xboxes have failed, would be software emulation, which will most likely have similar issues (unless software patches are applied). Unfortunately, the extent to which this is an issue would not be known until it is attempted. I did look into using an Atom, however, they are no longer available in a usable form (the Z510 probably would have worked). The issue is that unless you want to rewrite the entire kernel and BIOS, you need to have a system that presents an identical memory map to the original Xbox. This is something that only the FSB-based Intel Atoms (now discontinued) provide. So that led me to the idea of using a Pentium instead of an Atom. I'm not sure which speed issue you are specifically referring to? The FSB operates at between 66 MHz and 150 MHz, depending on the CPU, and that entire range is possible with the Artix-7. As for the clock syncing, that's true, but FPGAs have multiple clock networks, so as long as the FSB interface is synced to the FSB, and the internal implementation is synchronized to another clock, they can be interfaced using one of many different techniques, which also work for variable-frequency clock domains. I thought I mentioned it in the video, but perhaps it wasn't clear. The whole point of the on-board x86 was to offload the part of the Xbox that was too fast for the FPGA. The same thing is true for the GameCube. In both cases, only the chipset, which operates at a maximum of 250 MHz, is implemented in the FPGA, which is more than doable. I haven't really looked into the Wii, so it's possible that it could be implemented the same way, if the GPU and chipset are less than 250 MHz; otherwise, it would be too fast for a "low cost" FPGA. In terms of synchronization, you seem to be under the assumption that the entire FPGA uses the same clock. While an FPGA usually uses the same input clock, it contains many PLLs which each can have up to 6 independent outputs which do not need to be integer multiples. These outputs are then connected to internal clock networks (domains), and can interface with each other regardless of a frequency mismatch. Typically, this interfacing is done via a dual-port block RAM, in the form of a FIFO. There are also methods such as double-buffering and clock gating which can be used. For the example of an Xbox, you would probably have different clock domains driven by a different PLL output for: the FSB interface, the DRAM controller, the PCI controller, the HDMI output IP, the NV2A (might require several clocks), the DSPs (might require several clocks), an IO emulator (to drive the PCI controller), the BIOS loader (from external SRAM), and the interconnecting switch / bus fabric between them. None of which are required to be integer multiples of the others. Hopefully that makes sense.
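A small behavioral sketch of the dual-clock FIFO idea described above: two clock domains whose frequencies are not integer multiples of each other exchange data through a FIFO, each side acting only on its own clock edges. This is a timestamped event simulation for illustration, not HDL, and it ignores synchronizer latency; the 133 MHz / 233 MHz figures are simply the FSB and NV2A clocks mentioned in this thread.

```python
# Model two clock domains passing words through a bounded FIFO.
from collections import deque

def simulate_cdc(producer_mhz, consumer_mhz, words=8, depth=4):
    fifo = deque()
    t_prod = t_cons = 0.0                             # next clock edge of each domain, in us
    prod_period, cons_period = 1.0 / producer_mhz, 1.0 / consumer_mhz
    sent, received = 0, []
    while len(received) < words:
        if t_prod <= t_cons:                          # producer-domain clock edge
            if sent < words and len(fifo) < depth:    # push only if the FIFO has room
                fifo.append(sent)
                sent += 1
            t_prod += prod_period
        else:                                         # consumer-domain clock edge
            if fifo:                                  # pop only if data is available
                received.append((round(t_cons, 4), fifo.popleft()))
            t_cons += cons_period
    return received

# (time in us when each word crossed into the 233 MHz domain, word index)
print(simulate_cdc(133.0, 233.0))
```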
@@RTLEngineering Thanks for the response, I am excited for your project. My intention was just to give you information you might not already have, not to discourage you or tell you what is or isn't possible.
Thanks, I do appreciate that. There are obviously points that I had not thought of / addressed, especially since I am not actively working on an FPGA Xbox implementation. It also occurred to me that the issue with soldering faster Pentium 3s may have been the chipset not playing nice with them (since the chipset was most likely designed for that specific CPU). After all, that would exhibit timing and glitch-like behavior. Though you probably have a better idea as to whether or not that was the case. My thought was that it's possible, however, I do acknowledge the fact that I could be mistaken. If that's the case, the socket 478 could be adapted to fit any other CPU that has a similar communication spec, include another FPGA on a daughter board, or simply be used as IO. Anyway, further development on this FPGA card is about 3 years off, and an Xbox implementation probably closer to 10 years, so there may be a much better solution by then.
That's correct, it's possible that I implied something else though. The Cyclone Vs do have a DDR3 PHY (with the SoC variants having at least two). Additionally, the Cyclone 10 LP is slower and smaller than the Cyclone Vs, so it wouldn't be useful in this application. That's one of the main reasons that I started looking at Xilinx chips instead (the main competitor to Intel / Altera).
Very fascinating video about future retro FPGA development options. I'm a retro gaming enthusiast, not a developer, so a lot of the details are above my pay grade, but I'm still able to follow the basics. I have all the parts to assemble a full MiSTer arriving this week, but I've been following the retro emulation scene closely for 20 years and it's fun to get a glimpse of the future from a developer such as yourself. You mentioned that an N64 core would not be able to fit onto the DE10-Nano while outputting at 1080p. Forgetting 1080p for a moment, do you think it would be feasible to build an N64 core on MiSTer that outputs at the original native system resolution? I read in another comment here that you're targeting HDMI on your new proposed platform as "most people" would be using an HDMI device. However, it would seem that anybody who is buying DIY FPGA platforms for retro gaming purposes inherently sits outside the categorization of "most people", as most people either are perfectly happy buying the new "official" mini-consoles or running software emulators on their cell phones, and don't care one jot about the likelihood of adding 4-to-11 variable frames of latency to their classic gaming experience. For anybody considering paying upwards of $400 on this proposed FPGA platform for PS1/N64/Xbox cores, analog output options seem like the most obvious make-or-break feature. But getting back to the MiSTer for a moment... if an N64 core could be made to fit on the DE10-Nano, but could only output at the native resolution, this would still be sufficient for anybody using an analog display and, of course, pairing the MiSTer up with a device like the OSSC could create a satisfying result for somebody using a digital display, could it not? I realise, of course, that running a 3D system like the N64 and rendering polygons at native resolution may not be as sexy as rendering them natively at 1080p or whatever - an issue which does not affect 2D pixel games on older systems - but at least it would be accurate? Or, do you think that an N64 core on MiSTer with full logic implementation is just impossible, period? I'm curious, is all.
Thanks, I have been thinking about what comes next for a system like MiSTer for quite some time, and that's what I came up with. Though, this no longer actually seems like the best option... building an add-on FPGA daughter board for the DE10-Nano is probably more beneficial in the near future. That's correct, an N64 core which is not overclocked will probably require between 120-200K logic elements; the DE10 has 115K. That means that stripping away the on-screen display (OSD) and upscaling hardware will not be sufficient. Basically, the only way to get an N64 to fit would be to strip out the texture unit (so no textures), as well as the floating-point unit in the CPU. And at that point, it will probably just barely fit (with the OSD, but no upscalers). With that said, however, the RCP itself should fit in the DE10 with an upscaler and the OSD (just no CPU). So, in theory, if a daughter board was created for just the CPU on the N64, which communicated with the DE10 via the MIPS SysAD bus, you could run a full native N64 on the MiSTer. That's a good point about "most people". However, it's highly doubtful that an "official" N64 mini will be released, or if it is, it will be using a Virtual Console-like emulator (so it won't play general N64 games). Analog can always be achieved via a converter box though (I know it's not the same, but if the goal is 1080p native output, that's not something that an older CRT can handle). But that's another reason to go with an expansion daughter board instead of a new system altogether - leverage MiSTer's current analog support. Note that I don't think that an RCP on the DE10 could output 1080p native resolution - the Cyclone V is too slow for that, but it can do 480p. So doing 240p or 480p (basically implementing what the N64 could do natively) would be far easier than implementing 1080p. My concern is that people are spoiled from running software emulators, and having more modern GPUs redraw the triangles in 1080p. In order to achieve that, you basically need to pixel multiply. The N64 actually only drew one pixel per cycle, and in some modes, only drew one pixel every two cycles (there was a fast fill mode where it could draw 2 / 1 cycle). So if you are drawing a 240p screen, and multiply that by 4 (2x wide and 2x high), you get a 480p screen. That means drawing 4 pixels per cycle. Then you could multiply that by 4 again, to get 960p, and draw 16 pixels per cycle (that's the highest you could reasonably go). But drawing 16 pixels is a tall order, since you basically need 16 copies of the RDP in the N64 to draw 16 at a time, and that's going to be a massive resource usage. There is a saving trick, where you can actually get away with overclocking it to 120 MHz and only have 8 copies of the RDP (since your cycle becomes 1/2 of what it was on the N64). Obviously, though, if the N64 wouldn't fit natively, a version with 8 RDPs wouldn't fit either. So if we only have 1 RDP, that would be the best bet to getting it to fit (with an expansion daughter board). Hopefully that wasn't too much detail. It's possible that an Xbox core may fit in the MiSTer too (if the DE10 only has the GPU / Audio DSP / system controller). However, the bandwidth of the DE10 is too low for that to be practical (and there is no way to expand that). But it might fit. In comparison, there is no hope of getting a PS2 to fit, even with a daughter board (a PS1 should fit though, and so would a Saturn - hopefully).
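The pixel-multiplication arithmetic from the reply above, written out as a small sketch. Everything here follows the comment's own reasoning (1 pixel per cycle baseline, 4x the pixels for each 2x linear scale step, and the roughly 2x overclock to 120 MHz); it is an illustration, not a resource estimate.

```python
# How many RDP copies are needed to redraw at a multiplied resolution in real time.
def rdp_copies_needed(linear_scale, base_pixels_per_cycle=1, overclock_factor=1):
    # 2x wide and 2x high per scale step -> pixels per cycle grows with the square
    pixels_per_cycle = base_pixels_per_cycle * linear_scale * linear_scale
    return pixels_per_cycle / overclock_factor

print(rdp_copies_needed(2))                      # 480p at stock clock   -> 4 RDPs
print(rdp_copies_needed(4))                      # 960p at stock clock   -> 16 RDPs
print(rdp_copies_needed(4, overclock_factor=2))  # 960p at ~2x the clock -> 8 RDPs
```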
As an aside, it may be possible to also fork the development of the MiSTer into the DE10-Nano and the DE10-Standard (a bit more expensive), where the DE10-Standard has a high-speed expansion header. Though that would probably result in $500-$600 for a full "MiSTer"-like setup, which is probably not practical. Just some ideas - there are many possible directions for the project to take, so hopefully that gives you some hope.
@@RTLEngineering Thanks for the very detailed and considered reply. I certainly wasn't expecting an FPGA N64 any time soon, but this gives me a good idea about the challenges involved. Either way, I got my MiSTer set up late last night and it's working like a dream - FPGA tech is an amazing addition to the retro gaming hobby and preservation efforts! Cheers ;)
The problem is that even if you have all the logic gates available, the difficulty of implementing those consoles post-N64 increases tenfold. I can't imagine a single person implementing, say, a Dreamcast in FPGA within a reasonable time frame.
Well, you have two problems there: 1) can you fit such a design into the FPGA, and 2) can someone realistically implement such a design in a reasonable timeframe. If you look at them, the more important problem is (1), because even if you could implement the design in a reasonable timeframe and it doesn't fit, you're sort of stuck... Even an N64 is 10x - 100x more complicated than a Genesis / SNES. I would say that an N64 is about 10x more complicated than a PS1. And yes, the Dreamcast is 10x more complicated than the N64, and the PS2 is 100x more complicated than the N64, etc. So then the next question is, what do you define as "reasonable"? I would argue that whoever gets it done first sets the definition of reasonable for the system. So if it takes me 10 years to make a working N64 core, and no one else releases one first, then my 10-year-long project is the definition of reasonable for the N64, especially since the timespan of the project would not deter me. Though, I probably won't release the first N64 core - mine will probably just be the first one that doesn't rely on other IPs (for example, there is one in the works that is using a modified version of a MIPS32 core). Obviously, more people working on a single project will typically allow it to complete sooner. The problem there, though, is that N64+ systems require high levels of optimization to run on "cheap" FPGAs ($200 for a chip might not seem cheap, but it's a lot less than the high end ones which cost $10,000 per chip). And the high levels of optimization require time and skill, so the pool of people who would be able to achieve that result efficiently is quite small - it's likely that many of the more prominent retro core FPGA engineers are not in that pool. Though I wouldn't use that as an argument to discourage someone from working on one of these cores, it may turn out that optimization doesn't really matter - I'm not sure.
@@RTLEngineering That's what I mean, can we realistically expect someone to dedicate a decade to writing a core? Surely one would have to be really motivated to sacrifice that amount of time in their life to implement something. I'm just trying to have realistic expectations, that is all. PS1 seems like the best we can expect in a reasonable time frame (2-3 yrs), and thankfully someone is already on that. He's only put in 7 months of work, but so far it looks like he's taking this seriously if you read the GitHub.
I think you are thinking about this in the wrong way. We can't realistically expect someone to dedicate a decade to write a core; however, if someone is motivated enough to write such a core, they will gladly spend a decade on it. Basically, the difference is you are thinking about it in terms of the community making a request, whereas I think it will be done by someone who chooses to do so regardless. At the rate I am going, it will probably take me two decades to do, but that won't stop me from doing it, regardless of what the community expects / requests. If no one else beats me to it, then that's when the first core will be completed. Keep in mind that I have been working on an N64 implementation on and off for the past 4 years. There are several people working on a PS1 core actually, so I'm not sure to whom you are specifically referring. I believe that both developers I am aware of are basing their main CPU off of the aoR3000 though, which could potentially lead to problems - we shall see.
The problem with analog video output is that it's not as simple as you may think. If you think about output resolutions, you basically have HD (720/1080p+) and LD/SD (analog 240/360/480p). If you build a system to output in LD/SD, then it can be upscaled to HD easily (though upscaling doesn't add any new information, so an N64 game at 1080p would have giant pixel blocks). If you build a system to output in HD, however, it can't be downscaled without losing information (i.e. some pixels won't be displayed). Additionally, HD introduces additional signal integrity issues, where for example, a 1080p pixel bus must be point-to-point, so you can't send the same wires to an analog output driver - you would need 2x the pixel buses. MiSTer is able to output analog because it targets LD/SD output, however, my idea for this platform was to instead target HD output. So then the options that I can currently see are: 1) Don't target HD output, but then PS1/N64/PS2 games will look terrible, and that sort of defeats the purpose of this new platform. 2) Add an extra output for analog which could decimate the video output (i.e. draw every other or every fourth pixel). Due to the signal integrity issues though, that would mean having another output from the video combiner FPGA, which doesn't have any free IO, so then it needs to be a larger FPGA which costs more - so the price might go up from $400 USD per platform to $450 USD per platform, regardless of whether you want analog or not. Is that fair? (I don't know, maybe.) 3) Use an external HDMI to analog converter (probably not the option that you would want). 4) Stick with the MiSTer if you want analog output (but that means little or no 3D system options for the MiSTer). It's an annoying and frustrating aspect of engineering in general - often there is no practical solution that satisfies all of the design requirements, and compromises on features have to be made. Hopefully this helps you understand the complexity of what you are asking for / why it wouldn't be trivial to implement.
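A tiny sketch of the "decimate the video output" idea in option (2) above: producing an SD-rate pixel stream from an HD framebuffer by keeping every Nth pixel in each direction. Plain Python lists stand in for the pixel bus, purely for illustration of the data reduction involved.

```python
# Keep every `step`-th pixel horizontally and vertically (information is discarded).
def decimate(frame, step=2):
    return [row[::step] for row in frame[::step]]

# 4x4 dummy frame of (x, y) tuples
frame = [[(x, y) for x in range(4)] for y in range(4)]
print(decimate(frame, 2))   # 2x2 result: every other pixel in both directions
```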
This is still being worked on, but has changed form. The only viable option from a value proposition is to use a Zynq Ultrascale+ SoM (any other solution would have a significantly lower performance/price ratio). So that's the current plan. Since it's using a SoM, the main RAM will be soldered to the SoM and thus can't be upgradable, though since that RAM is higher latency, there is a plan to have a secondary set attached directly to the FPGA logic. Also, there won't be a 486 socket, since there isn't enough IO, but there's another solution to that problem. Finally, it will most likely not be a PCIe card because the MPSoC part of the Zynq is more than capable of performing the required host functions - i.e. it's going to be an mini-ITX or micro-ATX form factor. In terms of the PCIe spec, you can use the main system RAM, but the latency is terrible (up to 10us for a read) and the bandwidth quite low (as low as 1 GB/s), so it's not practical. It may be more practical in an environment with something like PCIe 5.0/CXL, but only the new Versal FPGAs support that.
@@RTLEngineering Darn I really was hoping for it to be a PCIe Card. Having it connect to my PC would mean I could experiment connecting it to other PCIe devices, like graphics cards and other older cards.
So I was considering a PCIe card still with the new FPGA, but there is one major problem.... drivers. It would be almost impossible to produce a consistent set of drivers across many different system configurations (look at how much NVidia and AMD struggle across Windows alone). If it was a PCIe card, it would probably only work on Linux, and would be compiled from source by the end user and supplied AS IS. So any bugs or compilation errors would probably be up to the end user or community to figure out... I don't think it would be viable. (I originally thought it was doable, but looking into it more just shows how complex it is.) Although... the new design I am working with (which uses a SoM) has the ability for the FPGA to act as a PCIe host. I had looked into the possibility of being able to plug a GPU directly into the FPGA (so you could use a GTX 1650, for example, with the ARM CPU), but that turns out to have problems at the distribution level (i.e. NVidia applies a user license to the driver, which means that the end user must be the one to install it and debug any issues as a result). I guess the TL;DR there is that it's very complicated to do PCIe things in a non-standardized way. M.2 SSD PCIe is easy, M.2 WiFi is easy, PCIe devices are not easy (that's not a standard, only PCIe is), and GPUs are definitely not easy (they're not a standard either).
So how long before you become a monopoly on great Retro Hardware emulation? Jokes aside, you really explained the concepts very well. Can't wait to put an order ;)
Thanks! I doubt that I will become a monopoly. And besides, I don't plan to do what Analogue did (the platform presented here would be open source, just expensive for someone to make themselves), though I might sell the pre-compiled cores. I can't wait until I have something that can be ordered / produced and actually works!
@Fun. No Commitment. I don't think that Analogue would have any interest in this platform, since it would be reconfigurable unlike their current products. And in terms of working with them for an N64 core, I don't think that would be in either of our best interest (it would be too expensive for them to sell due to the FPGA cost, and I don't particularly like the idea of selling a new product for each core). That's the main reason that I was trying to appeal to the MiSTer community, one up-front cost to an open platform where you easily reconfigure it to play games from any classic console.
@soloM81 I have only spoken to one member of their team (before releasing the video), whom I was able to convince. So it's likely that if this is produced at the price point that I mentioned, and it's as easy to convert existing cores as I think, then there will be support from the MiSTer community.
A fitting FPGA already exists; I think it was mentioned in the video (the Artix-7 200T). The Artix-7 35T would not work, since it lacks sufficient resources (i.e. you may be able to fit a C64 core, possibly an SNES core, in it, but nothing more). Comparatively speaking, the 35T is roughly 1/3 the size of the MiSTer's FPGA. However, I have actually been thinking more about the idea of using a Zynq Ultrascale instead of an Artix-7, since it could allow for "hybrid emulation", and could presumably be far larger than an Artix-7 200T (at more cost of course).
Which core are you referring to? The idea is to allow the MiSTer cores to be ported over to this new system, so those wouldn't be re-written. As for an N64 / PS2 core, the only way to get the required performance is to manually write and optimize them in HDL (I have other videos on the channel discussing optimizations for both cores - since they use a similar CPU architecture). I use a combination of VHDL and Verilog, mostly VHDL though, since it's not prone to the same sort of typo errors that Verilog is (Verilog is nice for connecting blocks though). A RISC-V core would have no place in this design, since none of the existing ROMs are compatible (i.e. you can't make MIPS machine code run on a RISC-V - you could only do so by turning it into a MIPS, at which point it's no longer a RISC-V). However, that's not to say that you couldn't use it for something new, which does use a RISC-V. A Zynq Ultrascale, however, has hard ARM processors built in (at least the one I was thinking of), but you could always use the ARM to act as a system interface between a host and a soft-core RISC-V.
@@RTLEngineering Yes, indeed referring to the VR4300 from your other videos. I mentioned RISC-V because that is also a RISC chip and it seems people make nice implementations of it with Chisel, so maybe it works for this. But I guess you can't optimize to the same level as with VHDL.
Correct, using a tool to generate the core will not produce optimal results. Honestly, I had never heard of Chisel before you mentioned it. It looks like it's useful for general FPGA implementations, but not for this specific task. Furthermore, RISC-V being a RISC chip doesn't mean much when you are talking about a VR4300. The context of my channel has been implementations for "Archival Preservation", which means running existing software. You can't run VR4300 code on a RISC-V chip any more than you can drive a cruise ship on a highway. They may be similar, but when it comes down to it, similar doesn't count for anything, except blowing out transistors (it has to be a perfect match - you can't run N64 code on a PS1 either, even though they both use a MIPS processor). If you wanted to create something new, however, using Chisel and RISC-V on an Artix-7 FPGA would be a feasible option.
I don't think that's a viable solution. First of all, it uses M.2 which means you would somehow have to pass a framebuffer via PCIe (huge bandwidth bottleneck + latency). Secondly, the FPGA doesn't appear to be specified, so there is no way to know what the FPGA is capable of. Thirdly (most importantly), it appears that the platform is ONLY for crypto mining, i.e. they abstract away the FPGA so that you can only load bitstreams that they provide via their API. And fourthly, it's not cheap considering what it is... It would depend on the FPGA in it, but if it's the FPGA I suspect, you are paying 2x what the hardware is actually worth (the cost is in their software and API which wouldn't be useful for this type of application). Thanks for the suggestion though. It's always interesting to see how other markets are using FPGA accelerators.
@@RTLEngineering Bandwidth is 4Gbit/s, latency should be fairly low considering all GPUs are PCIe too. The FPGA is an Artix7-200T. JTAG headers are exposed and I can load and flash bitstreams with a generic FT2232H board with OpenOCD. You can get them very cheap second hand for $70 + shipping ($40 + shipping for the lower-cost and lower-noise model with an Artix7-100T). But agreed, the integrated solution will perform better, you'll get what you pay for.
4Gbit/s isn't sufficient in this case, since a 1080p60 stream takes 3Gbit/s of bandwidth. Also, PCIe latency is quite high, especially when talking directly to an FPGA. GPUs are able to tolerate it better since they can handle the incoming packets more efficiently than the FPGA can. Also, having to rely on a JTAG header and OpenOCD is not a viable solution if you want to make it accessible to the general public (the whole point of the proposed system). Same thing for buying second hand mining solutions. Alternatively, there is nothing stopping an individual from taking a MiSTer core for example, and porting it over to the 100T/200T for this miner module and running the core there. The drivers and infrastructure would still have to be developed from scratch though, so the effort is likely not worth it.
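(For reference on the bandwidth point above: assuming 24-bit color, 1920 x 1080 pixels x 24 bits x 60 frames/s ≈ 2.99 Gbit/s, before any blanking or protocol overhead.)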
A PS2 or Xbox sounds kinda overkill. If it can be done I'd be surprised. I'd be surprised by an N64 core, as even emulation has trouble with that. The PS1 has very good emulation; it just has trouble with a few things that aren't right or mess up, but most games are playable and it surprisingly runs fine on lower-end devices like a Raspberry Pi. I wonder if you can attach 2 DE10-Nanos together through the expansion slot, or via HDMI if there is enough bandwidth.
Anything can be done with enough effort. I'm not sure if it's practical though. At the very minimum, you would need emulators on a PC to be able to play those systems' games flawlessly, which as far as I am aware is not currently possible (they're pretty good, but not flawless). You could always make the argument that the new Xbox is able to play original Xbox games; however, that platform will become obsolete at some point in the future as well, so the target hardware must be something that is continually being produced and can therefore always be obtained (the benefit of FPGAs - even when the DE10 is no longer made, the cores can be ported to a new FPGA). An N64 isn't trivial to implement in an FPGA, but it's nowhere near the limits of the DE10 for example - the problem is mostly space (lots of logic gates). There are currently at least 3 people working on an N64 FPGA implementation, and one has gotten a simple test program to display via VGA (so it's going to happen). You could theoretically connect two DE10-Nanos together, but that would be a dumb idea. The DE10-Nano has extra dead weight on it (the ARM cores), which you wouldn't need in an expansion. I have thought about developing an expansion FPGA board for the DE10-Nano, which uses either a Cyclone V E or an Artix-7, both of which could be programmed from the ARM CPUs on the DE10-Nano. Unfortunately, connecting the two would mean 1) no SDRAM (it would need those pins). The SDRAM could be attached to the secondary FPGA though, but that's another hop. And 2) limited bandwidth, as you guessed. The expansion connector on the DE10-Nano is a 2.54mm header, which will not allow for high-speed signals to pass through it, so any communication link will be limited to below 100 MHz. That would probably be fine for most of the current cores, and may even be okay for the PS1, but it would probably be a bit slow for an N64 (unless it were partitioned correctly), and would be far too slow for a PS2. There would also be the issue of link latency - it may introduce more issues than it solves.
I'm no tech, but it's a nice presentation... :-) Regarding Xbox emulation, I'm surprised. The actual state of emulation on PC is... "weak", to be polite. I was prepared to give up on Xbox "classic" emulation, even on PC. I read - a long time ago - that we didn't know -precisely- some elements of the Xbox (the GPU chipset?? it's a special version of an nvidia GPU)... and it was a problem to recreate a good emulation.... maybe there is light at the end of the tunnel, as we say??
The XQEMU project claims that it has made great progress in the past few years (I have not really looked into it). It's true that we don't know all of the elements of the Xbox chipset, but we have some idea of how they were composed. It may require leaked documents, or some old-fashioned reverse engineering, to figure it out though. In terms of developing a chipset, the actual architecture of the NV2A GPU is pretty straightforward at an architectural level; the issue comes down to how precisely to control it. It may not be possible to implement a PC-emulated version of the NV2A though, since it's basically an early version of a modern GPU (multiple shader cores, running at a few hundred MHz). It should also be possible to hook up a high-speed logic analyzer to a development Xbox, and sniff the front-side bus while the CPU talks to the chipset running known operations. That could provide the missing information on where / how to communicate with the chipset components (perhaps that has already been done though). An FPGA implementation may be an alternative option there (running the CPU emulation on a PC, and using the FPGA to just do the chipset stuff). I may have given the impression that it had all been figured out, when my main intention was to suggest that it is feasible from an engineering standpoint (i.e. the details still have to be worked out). Anyway, I wouldn't give up on being able to run classic Xbox games on a newer general-purpose system. And at the very least, I recall hearing the CEO of AMD (Lisa Su) mention that the newest Xbox was built with backwards compatibility for the original Xbox games (though that just kicks the can down the road).
Several University degrees (for the theory), and many projects (for the practical). I am still learning though, since the electronics design space is huge - I usually learn something new every day.
Ohh wow okay, what do you mean by several university degrees? Multiple masters? Doctorates? And how are you learning every day? With papers? I really enjoy watching your videos and want to get involved in the field as well.
Multiple MSc (done concurrently), and I'm currently in a Ph.D. program (i.e. doing research in a related field). Then there are papers (which don't actually teach you that much, but often have little nuggets of useful ideas), textbooks (which I have found the most helpful), blog posts / articles, courses (there are a lot of good architecture courses on youtube for free), conference proceedings (Hotchips has a lot of great talks - SGI presented a talk on the N64 RCP back in 1997), open source projects (dissecting code / examples on github), etc. A more concrete example is that when I was doing my B.Sc, Intel published several papers and the code for their Smoke game engine architecture. I learned quite a bit from those papers and their code, including a C++ style that I still use, design patterns, low-level multi-threading, etc. And then I started to implement my own game engine with what I learned (that was before UE4 was even released). Learning the basic theory from courses and textbooks really helped me when trying to understand their architecture, and reading other textbooks on design patterns helped even more. As for learning things every day, it could be as simple as coming across some small bit of information that puts a large amount of other information into perspective. For example, learning that the Xilinx FPGA tool will automatically re-align timing regions if you manually instantiate a BUFG primitive (the tool doesn't always infer that a BUFG is needed - see the small example below). After learning that, timing greatly improved on many of the other projects I was working on. Thanks! I wish I had more time to upload videos. It would depend on what part of the field you want to get involved in, and what your background is. For example, if you want to get into PCB design but don't have an EE background, it's going to be much harder. Architecture and FPGAs are much easier to get into without the EE background, because digital circuits are much simpler than analog (designing an FPGA PCB uses a lot of analog knowledge for power supplies, loads, signal integrity, etc., whereas using an FPGA dev board takes care of all of that for you). In other words, the closer you are to software, the easier it is to get into. That's not to say that it's impossible, but you would be better off taking at least an intro EE course (there are a lot of free ones on MIT OCW - you don't need to do it for a degree, but you have to take it seriously).
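To make the BUFG point concrete, here is a minimal, generic VHDL sketch (the entity and signal names are made up for illustration, not taken from any of my projects) of manually instantiating the Xilinx global clock buffer primitive from the UNISIM library, rather than hoping the tool infers it:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library UNISIM;
use UNISIM.VComponents.all;

entity clk_buffer_example is
    port (
        clk_in  : in  std_logic;
        clk_out : out std_logic
    );
end entity clk_buffer_example;

architecture rtl of clk_buffer_example is
begin
    -- Manually instantiated global clock buffer. The tool places the
    -- buffered net on a global routing resource and re-aligns the clock
    -- region timing accordingly, instead of (maybe) inferring a BUFG.
    bufg_inst : BUFG
        port map (
            I => clk_in,
            O => clk_out
        );
end architecture rtl;
```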
Oh wow, very impressive. I’m at my hard limit with only one MSc at the moment. May I ask which degrees did you complete? I am currently studying Computer Engineering. Is that too little for that field? What was your approach to plan your future? Is experience working at companies necessary when I want to get into research?
Not really, most of those had a lot of overlap (i.e. take 2 extra courses and get another degree). I wouldn't recommend doing multiple degrees at once without having significant overlap (that's just too much work). Often however, many degrees will have "electives", where you can substitute required courses for each degree in those slots. Note that you pretty much need to get a PhD though to do academic research (you can do R&D research in the private sector without one). And if you do that, then you want to make sure you find a University with faculty members that are doing research in a field you are interested in. A few levels of Physics degrees, Applied Mathematics (numerical stuff - i.e. what's needed for HPC), and a few engineering degrees (one of which is Computer Engineering). So yes, Computer Engineering is very helpful here - digital logic falls squarely within that sort of program. For a future approach, I didn't really have one - I sort of went with the flow. I knew that I wanted to teach at a University and do research, so I followed the simplest path to get to that goal. And in terms of working for companies, that's not required to do research, though many people in academia recommend getting some experience there (many companies have paid summer R&D internships). I can't really give you much advice there though, since I came from a science background (physics) and not engineering, so all of my non-university research experience is with government funded laboratories.
I have considered it, but I don't think it would make sense until I have around 1k subscribers. Though I am not sure that I would have enough time per month to spend on this project, so I'm not sure if that would be a good idea / option. Thanks for asking though!
It wasn't relevant for 5th and 6th gen, so I figured it was better to leave it out than add an extra line to the table making everything smaller and harder to read. There is a MiSTer Coleco core, so that's already covered.
Nope, I meant to say Pentium 4 for the entire video, even though the xbox used a Pentium 3 (they are close enough in architecture that there shouldn't be any issues).
@@RTLEngineering Xbox games use `rdtsc` (assuming 733.333..MHz) to synchronize the CPU and GPU. So you need the proper timings or you have to patch games. I'm not sure what changed in Pentium 4, but I'd assume it could have bad consequences. Also, if you struggle with the 733MHz Pentium 3, you might also struggle with the 2x DSP56k and the NV2A GPU. I also assume that software solutions (XQEMU in particular) will be fast enough in the future (similar to Dolphin). A major roadblock in usability of Xbox emulation is that you need the MCPX ROM / crypto-keys - but a FPGA implementation wouldn't change this.
@Jannik Vogel I am pretty sure that you can under-clock a P4 to operate at that frequency, though I would have to assume that there is some other form of synchronization, since the two clocks can become out of sync by a cycle very easily. I haven't probed an xbox to check this, or tried running a P4 at 733 MHz, but there should be a solution other than patching the games. As for the struggle with 733 MHz, it's not possible to run logic in a low-cost FPGA that high; the maximum global clock that I have seen is on the Cyclone 10 at around 600 MHz. However, didn't the DSPs and NV2A operate at around 230 MHz? If that's the case, then that is completely doable in the Artix-7. Eventually Cxbx will be fast enough, since the xbox was an x86 machine; I doubt that XQEMU will be fast enough, since it runs a complete software model for the CPU. That doesn't negate the benefit of a hardware re-implementation, otherwise that same argument could be made for any of the retro consoles, making this platform moot (i.e. it's not just an issue with the xbox). It was my understanding that the major roadblock was getting the hardware components to work together correctly at a reasonable speed (the problem with XQEMU); if the issue were the MCPX ROM / crypto-keys, I would think that none of the working / partly working games in emulators would work at all. Besides, there are ways around that problem, especially if you are able to utilize a DMCA loophole / exception.
@@RTLEngineering Yes - DSP and NV2A run around those clock speeds (I believe 160MHz DSP and 230MHz GPU). I was more concerned with the number of LUTs and internal pipeline states which might be hard to recreate (I could imagine requiring a higher clock rate for naive implementations). For the record, I'm a XQEMU developer; there are experimental XQEMU branches which reach full-speed (or very close to it) in many games. The main bottleneck for performance is GPU emulation (which those branches address); for users a critical issue is lack of audio emulation (which also exists in experimental branches). XQEMU does *not* only have a software CPU. XQEMU has supported CPU virtualization (KVM) for more than 5 years, and has lately also been getting better support through HVF, WHPX and HAXM on Windows, macOS and Linux. There's also hardfloat in TCG (CPU interpreter) now, so even the software interpreter on a strong single-core machine will work fine for some games. Cxbx is a radically different emulation approach which has many drawbacks (such as not being able to support all Xbox games as it depends on code-pattern detection, which won't work for link-time-optimized titles). So if you are going to compare a hardware re-implementation to a software-emulator, your only references should be MAME and XQEMU. The MCPX / crypto-keys (required for boot-code decryption) are dumpable and known - so they don't block progress. But they are obviously illegal to distribute and you need to modify hardware to dump them (which end-users can't do); there's also no freely available tools to dump them. So it's an end-user *usability* issue that prevents growth of XQEMU [LLE] compared to something like Cxbx [UHLE / HLE] (among other factors) despite XQEMU being a superior emulator in many ways.
@Jannik Vogel Clearly you have a bias for XQEMU. That sounds like it has made great progress though! I briefly looked at it about 2 years ago, and convinced myself that it would be helpful if I ever tried to do an xbox implementation on an FPGA. How is XQEMU solving the MCPX / crypto-key problem? Is it relying on dumps? If so, then the usability argument for an FPGA implementation vs XQEMU is identical. I think in either case, dumping the ROMs would be illegal, unless you are doing it on behalf of a digital preservation organization or university. You do raise valid concerns over an FPGA implementation, but this sort of platform would be the best hope for such an implementation if it is possible. Though part of the benefit here is that you could, for example, implement just the NV2A on the FPGA, and do the rest with XQEMU. I'm not sure what your concern with the DSP and NV2A is. In this example, the FPGA is only implementing the DSPs, the NV2A, the DRAM controller (which is small), and the FSB controller. The NV2A GPU was pretty simple, and is actually very similar to the VU cores within the PlayStation 2, so I don't think LUTs will be an issue (sure it will be large, but there are a lot of LUTs in the XC7A200T). As for the internal pipeline states, I don't think they need to be recreated exactly, as long as the end functional behavior is reproduced (i.e. it follows the spec that the software expected). If the concern is meeting the speed requirement due to the LUTs and pipeline states, that's exactly how you meet the speed requirement - LUT decomposition and pipelining. I think it's possible to do, but the actual feasibility won't be known until it's attempted (a downside to FPGA implementations - you can put a lot of work into them, only to find out that they are too slow or don't fit).
@RTL Engineering Can you comment on this board numato.com/product/aller-artix-7-m-2-fpga-module For FPGA Game Console and Retro PCs (Amiga, Atari etc) Emulation The idea would be to implement a virtual vga capture card within the fpga and send the game video directly to the host PC memory (or even directly pcie to pcie into the gpu). The PC would be the host co-ordinating the emulator and handling all external io. While interested in retro gaming I'm also interested in all the other applications listed on their product page and don't really want another standalone device. i.e. I want a general purpose fpga solution.
I haven't used the Aller, but it does have the potential to do what you are suggesting. Someone had mentioned a similar platform specifically geared towards crypto-mining; however, I think the Numato one is cheaper, with a larger / faster off-chip memory. It should be able to easily handle an Amiga (a stock version), as well as any of the MiSTer cores. I also think that you could fit a virtual capture card in there along with the emulation core, since it's a 200T. I have never implemented a capture card, so I am not exactly sure how that would work. But it should be simple enough to have a framebuffer in the DDR3 which is streamed to the system RAM of a host computer, and then drawn to the screen via some graphics library (like SDL or OpenGL) - a rough sketch of the FPGA-side packing stage is below. I don't think that you will have much luck dumping the framebuffer directly into the GPU though. If your primary interest revolves around using a host machine (either an SBC or a PC), then this would be a good choice for an Artix-7. Be warned that getting the PCIe interface working is tricky, though it should allow you to essentially connect to it over an AXI bus like a memory master. Another word of caution: it's not clear on their website if the JTAG connection is via the PCIe connector, or via the unpopulated header. My guess is that it's the unpopulated header, so you would need to solder a connector as well as buy a Xilinx-compatible JTAG programmer to use the Aller. The only other Artix-7 solutions that I know of are stand-alone, which don't have PCIe lanes, though they do have a lot more IO (mostly for stand-alone development).
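To sketch the idea (purely illustrative - the entity and signal names are invented, and a real design would still need a DMA engine plus the PCIe/AXI plumbing), the FPGA side of such a "virtual capture card" could start with a small stage like this, which packs incoming pixels into 64-bit words for a framebuffer writer:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Packs two 32-bit (e.g. 8:8:8:8) pixels into one 64-bit word so a
-- downstream DMA engine can burst them into the DDR3 framebuffer.
entity pixel_packer is
    port (
        clk         : in  std_logic;
        rst         : in  std_logic;
        pixel_valid : in  std_logic;
        pixel_data  : in  std_logic_vector(31 downto 0);
        dma_valid   : out std_logic;
        dma_data    : out std_logic_vector(63 downto 0)
    );
end entity pixel_packer;

architecture rtl of pixel_packer is
    signal have_low : std_logic := '0';
    signal low_word : std_logic_vector(31 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            dma_valid <= '0';
            if rst = '1' then
                have_low <= '0';
            elsif pixel_valid = '1' then
                if have_low = '0' then
                    -- First pixel of the pair: just hold onto it.
                    low_word <= pixel_data;
                    have_low <= '1';
                else
                    -- Second pixel: emit the packed 64-bit word.
                    dma_data  <= pixel_data & low_word;
                    dma_valid <= '1';
                    have_low  <= '0';
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```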
That was stated in the video, though may not have been all that clear. It looks like the cost would be around $400 USD, where $230 of that is going to the FPGA chip alone.
I would, but I am not going to join Subscribe Star or Patreon until I have 1k+ subscribers (arbitrary number, I know). Also, I am not currently working on the project presented in this video (I am working on something potentially more interesting), but it wouldn't feel right accepting donations for something that I am not currently working on.
I will consider it. My main reservation is that I can't guarantee regular content, which goes against the terms of those platforms. But I can look into what's involved in the process.
The current MiSTer is based on a mass-produced (and significantly discounted for the educational market) board. Will this "new" board be significantly cheaper than the existing low cost MiSTer board?
I am aware of why the MiSTer uses the DE10-Nano. As mentioned in the video, the new board would be around $400 USD (~4x more expensive than the DE10-Nano). So it wouldn't exactly "replace" the MiSTer, but the goal would be to have backwards compatibility, so that those who want to use cores which cannot run on the DE10-Nano can still use the original MiSTer cores. As stated in the video, you basically have 3 options for a system like the N64: 1) don't emulate it and give up, 2) buy a more expensive emulation platform with a more expensive and beefier FPGA, or 3) stick with buggy software emulation. Many people may not care about being able to run an N64 FPGA core though, and will be able to stick with using the DE10-Nano (the same argument can be made for the PS2 / Dreamcast, etc.).
What is the point of implementing an FPGA version of PS2, Wii and up? They are designed around a frame buffer, unlike the cycle accurate timings of prior generations, so the benefit of FPGA seems wasted on it, when modern computers can emulate those consoles perfectly.
That's a good question. Though the PS1, N64, and Sega Saturn are also framebuffer designs. So by that logic, why bother trying to get those to work on an FPGA? Cycle-accurate timings aren't needed for any of those. Although, many devices still struggle to emulate the N64 or don't do so correctly (there's a similar issue with the PS2, where it requires a relatively powerful system to run). Perhaps this is a more philosophical question though... Why bother with an FPGA at all? You can do cycle-accurate emulation of a C64 on a high-end x86 machine for example (a low-level, timing-accurate emulator). After all, MAME has pretty good compatibility and it's not LLE. I think the answer to your question has an academic side, a practical side, and a psychological side. For the academic... software preservation. We want to be able to run every piece of software ever released (perhaps ever written) for archeological reasons. While emulators are a tool that can help, as the system complexity increases, it becomes harder for a single computer to replicate all of the required behaviors, especially if you want them to be true to the original hardware (i.e. rendering a PS1 game at 480i with all the flaws and quirks). For the practical... cost. If you need a $1200 PC to run a PS2 emulator, vs a $600 FPGA system, wouldn't the FPGA system make more sense for those who do not have a $1200 PC? For the psychological... there's something about running on actual hardware vs a software emulator. Maybe it has to do with the fact that it's the only thing running. I for one would prefer to play PS2 games on an original PS2 rather than an emulator. Same thing for N64, etc. And an FPGA implementation doesn't feel any different to me than the real thing, unlike an emulator. That same thing applies even more so with a computer... (which in this context has applications to the SGI systems like the Indy)
Another reason to go FPGA vs software is dealing with "software rot". That has implications for the academic side, but also for other cases.... Here's an example for you: let's say the PCSX2 team vanishes and decides to stop working on the emulator. Microsoft releases Windows 11, and PCSX2 doesn't run on it, nor does it compile. Then someone has to go through and update the PCSX2 code to make it compile and run on Windows 11. If the team vanishes, then that may be a nearly impossible task. If it were implemented on an FPGA instead though... there may be no reason to update the FPGA core as long as it runs on a specific device. So all of the MiSTer cores for example will always run on the DE10-Nano, as long as you can still get that board. The Linux support side could have rot issues; however, the need to update the Linux kernel on the ARM processor is not the same as for a desktop PC. Who cares if MiSTer is running a Linux kernel from 2003? It could have tons of security flaws, but you aren't going to do anything security critical on it. But if you have to keep your PC running Windows XP in the year 2035 to run PCSX2, then you probably don't want to connect that PC to the internet, and you wouldn't want to do anything security critical (like online banking) on it. Whether or not software rot is a big enough issue to justify an FPGA over emulation is uncertain. But FPGAs do last much longer than PCs do, and they don't require constant updates that can break massive software dependency chains. Note that in this case, something like the PlayStation Classic would fall into the same category as the FPGAs (though the FPGA will probably be able to run for a longer lifetime than the SoC in the PS Classic). However, it would be economically infeasible to release a software emulation system like the PS Classic for a PS2. The only way to do that.... would be to use an FPGA.
@@RTLEngineering Thank you for your very thoughtful explanation. Makes a lot of sense, when you look at it from this nuanced perspective. I’m curious about the point you made about FPGA lasting much longer than PCs. Is this primarily due to your aforementioned points about software rot and future compatibility issues, or do you also mean the hardware itself has higher MTBF ratings than modern CPUs and SOCs? I’m just now getting into and learning about FPGA, having just ordered my first Mister kit last week. One thing I’ve been most curious about, is how the system will hold up after years of moderate to heavy use, constantly writing different cores.
I'm glad it made sense. It was a difficult but very relevant question to answer. So FPGAs typically do not suffer from the same software rot issue as PCs, but that doesn't mean they can't. For example, if the MiSTer github repo gets deleted and all of the source code is lost, then it can never be ported to another FPGA when it reaches end-of-life. Luckily though, porting FPGA code is much easier than software (if written correctly, which most of the MiSTer HDL is not - though it's still probably easier than software). The reason for that is that there are only a couple of things that can go wrong with a port... IO interfaces (which are external to the core), weird hardware quirks (things like internal tri-state busses, which FPGAs don't have anymore), and vendor-specific IPs (using an Intel macro to declare part of the FPGA to use rather than inferred code - these can often be translated though). For the IO interfaces, the MiSTer cores all use a HPS bus coming from the ARM processor, which I believe was changed to AXI on the Cyclone 10. The Xilinx chips all use AXI, and I think the Microchip SoCs use AXI or AHB. Anyway, all of those buses can be converted to HPS using an intermediate core, so that's a lot easier than trying to get a bunch of dynamically loaded libraries to recompile or run on a new Linux kernel. In my experience, the MTBF for FPGAs is far higher than CPUs and SoCs, mostly because of their use mode. The first thing to keep in mind is the expected device lifetime. FPGAs are designed to last for 20+ years, and the automotive and aerospace grade ones are designed for a much longer lifetime. Your desktop CPU on the other hand is only meant for a 5 year lifespan, same thing with your GPU, and even the Raspberry Pi (general SoCs are designed to last longer, but the Pi is cut down in cost, so it's likely that the MTBF is much shorter). But I think the more important aspect is related to software rot... you wouldn't want to use a PC from 2004 to do emulation today when you can have a new one that's much more responsive. The FPGA on the other hand doesn't have quite the same concept, because the design was implemented to do exactly what it did when it was designed. That's why you can still buy brand new products with FPGA designs from 20 years ago, and they work exactly as advertised. As for the FPGA holding up with reprogramming, don't worry about that. The FPGAs use the same type of SRAM as in a CPU cache, and the CPU cache is written to millions if not billions of times per second. The part of the MiSTer that will probably fail first is the SD card, if you continually write / save games to it. There are some FPGAs like the Max 10 which have internal flash for the configuration, and those will have a limited number of flash program cycles. But those will still have a virtually unlimited number of power-on cycles (every time the FPGA is powered on, the flash is read and the FPGA configuration SRAM written). A similar thing happens with the DE10 on the MiSTer, except the FPGA SRAM is written via the ARM within the FPGA by reading from the SD card. If you use the USB Blaster interface to program the FPGA, then it's writing the FPGA SRAM via the USB. The one thing you have to worry about are IO constraints though. Make sure you don't set an IO to be "out" when something else is driving "in", and make sure that the voltages on the IO are correct (both with configuration and anything plugged in). Those are easy ways to destroy the IO buffers / transistors.
Other than that, I suspect that the other components on the DE10 board will fail before the FPGA itself.
The Replay 2 has very few cores to mess with. Do you have a link to the PS2 core WIP page? Can it play MiSTer cores, and if it can, why hasn't he ported those cores over?
The Replay2 uses a Spartan 7, and I don't think the claimed 1.5 GB/s DDR3 would be sufficient for an N64 with a 1080i framebuffer. If that person is able to get a PS2 core working on that at full speed, they should get a medal, because 1) it won't fit, and 2) the memory bandwidth is too low. There was a reason that an x64 SODIMM was chosen, as well as the Artix-7. As it is, it's likely that only the Emotion Engine will fit in a single Artix-7, let alone a Spartan-7 at 2/3 the size (if it even uses the largest Spartan-7).
@@RTLEngineering Not sure where you're getting the information on the Spartan 7. MikeJ stated they are using the Ultra96 platform, as the Spartan 7 is end of life; you must be looking at an out-of-date part of the website, not the current 2019 plans for Replay 2.
@Matthew Langtry The information on the Spartan 7 is directly from the Replay 2 website: www.fpgaarcade.com/replay2/ The information regarding the memory is also on that website. If the plans for the Replay 2 have changed from what is on the website, there is no way for me to have known that. Feel free to update me with the currently planned FPGA. If instead you were asking about the information specific to the Spartan-7 vs Artix-7, that can be readily found within the Xilinx datasheets.
@@RTLEngineering The website is a little hard to navigate; here are more details of the Ultrascale design: www.fpgaarcade.com/replay2-first-mock-up/ I was targeting the S75 or S100 in Spartan-7, which is as large as the second biggest Artix-7. Using a 64-bit memory system, the performance is very similar to the Artix-7. Pretty much all the cores which are available on MiSTer/MiST will be available on R2 - most on R1 shortly. The reason for shifting to the Ultrascale was to get the tight CPU integration for hybrid emulation, and potentially the GPU as well, which will be handy for the Amiga etc.
Reality often sucks because what we wish to be possible and what actually is differ. I can't tell you how many times I have run into this issue as an academic researcher. At least I am giving you hope that an N64 core is possible on another FPGA, unlike Kevtris (the engineer who worked on the Analogue FPGA cores), who claimed flat out that an N64 core would be impossible. Also, to put things into perspective, the FPGA in the MiSTer was designed back in 2009, so it's 12 years old. And at the time, it was 20% slower than the Xilinx 7-series which was designed in 2008. There were several poor choices made along the way by Altera, Intel, and the MiSTer community that led to the bad news you spoke of (i.e. the community could have chosen a better FPGA to use; there are even ones now around the same price as the DE10-Nano that could run the relevant part of a PlayStation 2, or a Dreamcast). Unless you were referring to the bad news being that other people are working on N64 cores, where one person supposedly has Mario 64 and GoldenEye running on an Artix-7?
This is seriously interesting work. I would love to write games for such a platform, not just for emulation.
Great job !
I'm a little bit surprised that this kind of content doesn't get more views. Keep going.
this sounds awesome. looking forward to more news of it.
Ur channel is underrated... Great video
I also hope to see some "development kits" made out of MiSTer and more powerful FPGA cards. So, open source games could be developed on those! Good work.
That wouldn't be feasible. The more powerful FPGA wouldn't be compatible with the Cyclone V, and would require that the two be combined in a final version (beyond a development kit). There's also no good way to connect another FPGA card to the DE-10 Nano, the best you could do is give up the SDRAM and the MiSTer hat and replace it with a parallel link to the other FPGA (then you would have to move that I/O to the new FPGA and every core would have to be rewritten to take advantage of that as well as the increased SDRAM latency). Also, if the Cyclone V were included, you're probably looking at the base card cost going up from $500 to $700 (the Cyclone V doesn't have competitive pricing on it). And a dev kit would likely be well over $1,400.
Furthermore, the Cyclone V is going to be reaching end of life soon (within the next few years), so including it in any design wouldn't be wise - it could only be produced for a few years and then it's done forever.
Tl;dr, including anything to do with the MiSTer here isn't economically viable, nor would it provide much of an engineering incentive as all of the MiSTer cores could be ported to other FPGA architectures (no longer requiring a Cyclone V).
Your work is fantastic. I would love to see you develop an exclusive FPGA based retro gaming hardware platform. Starting off with a 1080p native Nintendo 64 core would definitely get the ball rolling. Hopefully, it would be possible to make it standalone and not require a computer even if this option would be a bit more expensive.
Thanks! The platform that I proposed in this video could also work with a single board computer which has a M.2 slot (it would need an adapter cable, which does exist). So something like the Latte Panda would work, though it would still be a full length PCIe card, so it would be a large form-factor. I have also been considering the idea of using a Zynq or Zynq Ultrascale instead, which is effectively equivalent to the Cyclone V in the DE10-Nano, except it has more logic and runs faster (while being more expensive). That could potentially have the option to attach to a PCIe carrier card (and go into a PC), or be standalone with the ARM cores in the Zynq running a Linux OS (the whole reason for the PC is so that there is an OS running which can handle things like USB, Ethernet, SD Card, game ROM loading, etc.). Though using a Zynq like that could potentially mean compatibility issues, where some cores would require both ARM cores for hybrid emulation, and therefore would require a support computer regardless. Just a few things to think about.
@@RTLEngineering I wish you the best of luck with all of your projects! I feel like the more popular retro gaming gets, enthusiasts would be willing to pay 3x or more for a standalone arm coupled FPGA "all in one solution" that could support N64 and other later generation consoles and handheld cores.
2 years ago when I asked why this wasn't a thing, I was told it was impossible!
It probably seems impossible to a hobbyist who does this sort of thing on the side, however, my main job is trying to figure out how to get the most performance out of FPGAs and to best utilize the architecture - this is no different. The only issue is that it requires a lot of trial and error / experimentation to achieve the desired result, which takes time. It also means not only attempting to optimize the HDL description, but also how to optimize the architecture (which primitive parts go where). It's for all of those reasons that this may seem impossible, even to more experienced developers, like Kevtris for example (getting something like an N64 to work, let alone a PS2, is far more work than he would be willing to put into such a project, and would require a fundamentally different development method and experience set, so he sees it as impossible).
Fascinating concepts well presented. Subbed.
Up to GameCube would be great. Wii and Wii U are modern and will run just fine in software emulation.
that's very interesting...I was looking for someone at least talking about the future of FPGA...thank you!
The future of FPGA re-implementation for the market of retro gaming isn't all that clear, this was just a proposal for somewhere the physical hardware can go, since MiSTer cannot be the future (it's already starting to show its physical limits). There are other people working on similar platforms, however, I don't think that any will ever approach the cost of the DE-10 Nano used by the MiSTer (since Intel effectively subsidizes it).
On the other hand, I think that we may start to see more PCIe FPGA cards in the consumer market in the next decade (which will drive prices down), to the point where anyone who has the equivalent of an RTX 2080 Ti (i.e. a high end consumer GPU) will also probably have an FPGA card alongside it (Apple has already started to do this for the new Mac Pro).
And then obviously with cheaper, larger, and more mainstream consumer FPGAs, more complex cores will be developed and become feasible for enthusiasts to use.
Hopefully that helps add a little more insight into the future of FPGAs in this application space (or at least my thoughts on them).
@@RTLEngineering Thank you for the insight. Yes, I agree with you, it all can help; even just suggesting an idea can spark a new way of thinking, so what you are doing is commendable... I'm into game design, I value ideas very highly, and I'm also aware that ideas are only half of the story and without the execution, nothing happens.
Here's the link to my idea called WOR(L)DS: in the link below you can find PDFs with game scripts I wrote, check them out if you have the resources to transform them into games:
raulgubertart.wixsite.com/wor-l-ds
''3 NEW MONSTERS AND 3 NEW WEAPONS FOR QUAKE''
''3 NEW WAYS FOR SUPER METROID''
''NUKE THE QUAKES! BOMB THE DOOMS!! KARMA KARMA BOOM!!!''
''CONFUSED AND BADLY WRITTEN IDEAS FOR VIDEOGAMES BUT UNDOUBTEDLY BRILLIANT''
''THE FORTRESSES TO INFERNO''
@@RTLEngineering I think the main problem for the MiSTer project is not the limitations of the DE10 but the lack of developers with the skills and passion to make cores. There is a lot of low-hanging fruit out there, from arcade to PC to consoles, etc., that somebody could do. I've been analyzing how the MiSTer project is being run, and it reminds me a lot of how MAME began: little team effort and a lot of big egos.
I agree that there is still a lot of low hanging fruit out there, but you can only fit so much into an FPGA with 110K LEs. You also have an IO problem with the DE10, where it is severely lacking, especially in terms of attaching external memory. While there is the SDRAM add-on, it doesn't have sufficient bandwidth for any of the 90s era systems / computers.
I can't comment on the internal dynamics of the MiSTer team, but I can imagine that there are some political issues there which hinder progress (it always happens in open-source projects). So if that's an issue, then the hardware lacking is just part of the limitation to growth. Either way, the hardware limitations need to be faced at some point.
Another issue is probably the complexity of cores which need to run relatively fast (50 MHz+), requiring optimization which takes a lot of time and effort. And then there is the effort to debug complex cores as well (I am still a bit surprised that someone wrote a 486 core). I guess that falls into the category of skilled and passionate developers (ao is clearly one of them though).
@@RTLEngineering I am only in the income bracket of buying a $1500-ish computer no more often than every 5 years, but I would still put an up-to-$750 PCI-E card in to accelerate emulation. It costs less than all the mods/repairs I do to older consoles to keep them going. That being said, I would just move up to repairing the next generation, and so forth and so forth, until the day I die.
Brilliant stuff
I'm curious now that we did get (almost all) of an N64 core on Mister if you think this platform is more interesting to pursue. In this video you mentioned you'd like to wait for that to come to reality before pursuing this further.
One day, I will have a powerful GPU sitting in my case, but having to share the airflow with an FPGA based emulation card.
I personally hope that FPGA based emulation becomes more widely available, as it can circumvent issues that current emulation has. Like the odd quirks in the N64 that give Mupen64 quite a run for its money on lower-end platforms... :p. My ideal dream-FPGA-emulation-thing would be an FPGA card that can be driver-controlled by the host OS. This driver would do exactly what it is named to do: flash the FPGA with the core, send some input (like the rom), maybe write some settings to it, take output from the card and display it - or even let the driver link an existing GPU with the FPGA board in order to do graphics-intensive work to render the actual graphical layer.
Sure, I am not an engineer, and this is just dreaming. But how cool would it be to have a gaming PC that doesn't just run modern games, but also retro games at pretty much intended speeds? I would love that. I'd easily throw out some money to have that, and even if I only got the naked board, I would see if I could invest in a proper and nice cooling solution with a nice 3D case. While I like the looks of traces, transistors and chips, I also like the look of a closed-up product with some proper casing.
Really enjoyed the video, thank you very much for sharing this!
Luckily FPGAs are actually significantly more power efficient than GPUs. The one I proposed in this video is rated for something like 6W maximum, compared to a 120W GPU (as I recall, the RTX 2080 Super can pull 210W when modded).
As for your ideal FPGA emulation thing, that's pretty much what I described. And if things like SLI were open-source, you could potentially put an FPGA card into an SLI array with a GPU (to share data / video). Obviously that would have a lot of engineering hurdles, but it's conceivable for the future. Though if it got to that stage, it would be more general purpose than for emulation. Silicon may be relatively cheap, and it may be advantageous to put special function hardware blocks all over the chips, but eventually it will become wasteful, especially when the utilization drops to some small number (i.e. if you use the special function block 0.0001% of the time). That's where having driver-controlled FPGA cards (like GPUs) would come in handy - imagine using one specialized in video encoding with something like Adobe Premiere (Apple already does that because they don't get along with NVidia). They also make good targets for neural network evaluation, and ray-tracing calculations (why waste space on RT cores if you could bank on the end user having a configurable logic card that could implement them as needed).
I know that's not quite what you had in mind, but if that came to be, then those cards would be far cheaper, and they could then be leveraged for emulating old hardware too!
@@RTLEngineering If this was put into a computer, would the emulator have to take up the full screen, or could it pass through the computer and be placed in a window that could be resized and popped in and out of full screen? I am thinking that you would have to flip back and forth between the two outputs unless there is an internal or external solution.
That would depend on how it's configured. To allow the images to be resized, you either need to combine the image as an overlay (which would be quite complicated to manage - likely requiring hooks into the OS window manager), or you would have to copy the framebuffer from the FPGA card into the computer's display controller (i.e. the graphics card). If you did the latter, then you would necessarily have to add multiple frames of lag, and you would stress the PCIe link - there's only so much bandwidth to go around.
Flipping back and forth between two different outputs would likely be the better solution. The design shown in the video had a video in port which would allow you to overlay a full-screen output from the FPGA (no window resizing though), similar to the original 3dfx Voodoo1 and 2 cards. The other option there is to use a KVM type switch and swap between the monitor sources.
Unfortunately, modern display systems and window management are not well suited to handle a secondary video input source.
@@RTLEngineering Basically what I imagined, but I wanted someone more in the know to confirm. Thank you.
This is super interesting. Thanks for posting. I get the impression you want to keep all your progress private until you have a working result. Could you shed some light into why? Also, just out of curiosity, what HDL are you developing this in?
Thanks! That's correct. I actually would prefer to keep all of the implementation details private (I would probably release the board files though for the PCIe card). If you were more commenting on the span since the last update, the reason is that I have put that project on hold in place of another one which is also private (for the same reasons as keeping the implementation private).
So a few reasons to keep it private:
1) I don't want to open source it - as soon as you allow random people to start helping out, the entire project loses order and becomes a nightmare. Some people like that aspect, but I'm not one of them.
2) I don't want to provide the ideas for someone else to beat me to it. I can't work on these projects full time as I am a full time academic researcher (that takes priority).
3) Releasing implementation details publicly can be a problem with future job prospects, especially companies that receive classified government contracts. (i.e. they may be concerned that you would leak their project details).
Hopefully that answers your question. Note that I don't have much of an issue regarding releasing the final bitstream publicly (and would probably open the implementation eventually if I became tired of supporting it). I might also consider releasing an encrypted IP, however, I do know that some of those have been decrypted.
As for the HDL: I use both VHDL and Verilog. They both have their strengths and weaknesses, often covering each other, so I use the best tool for the task at hand. Often, the tricky logic parts are done in VHDL (as well as ROM LUTs - a small example is below), and the structural connections are done in Verilog. I don't use SystemVerilog though, or any other language - I prefer to have more control over what the synthesizer implements (I suspect that SystemVerilog, for example, would have problems tuning logic to change path delays, as it's less explicit; same thing with things like Scala-based description methods).
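For a trivial example of the kind of ROM LUT I mean (the contents and names here are just placeholders), a small constant table in VHDL that the synthesizer maps onto LUTs or block RAM:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity rom_lut is
    port (
        clk  : in  std_logic;
        addr : in  unsigned(3 downto 0);
        data : out std_logic_vector(7 downto 0)
    );
end entity rom_lut;

architecture rtl of rom_lut is
    type rom_t is array (0 to 15) of std_logic_vector(7 downto 0);
    -- Placeholder contents; a real core would fill this with microcode,
    -- palette data, lookup tables, etc.
    constant ROM : rom_t := (
        x"00", x"11", x"22", x"33", x"44", x"55", x"66", x"77",
        x"88", x"99", x"AA", x"BB", x"CC", x"DD", x"EE", x"FF"
    );
begin
    process(clk)
    begin
        if rising_edge(clk) then
            data <= ROM(to_integer(addr));  -- registered read
        end if;
    end process;
end architecture rtl;
```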
@@RTLEngineering Thanks for the detailed response! Are you familiar with Henry Wong's work on creating a Superscalar Out-of-Order x86 soft processor? There might be some overlap with your work.
Yep, I have read his thesis. There are a few problems with using his work though. 1) It's incomplete to the best of my knowledge; parts of it are done / work, but not enough to be a full x86 CPU. 2) It consumes a lot of resources... I'm not even sure it would fit in the largest Artix-7 FPGA. And 3) it would never run nearly as fast as would be needed in an Artix-7. To be used for an original Xbox, it would need to run at 700+ MHz, but the fastest it can run in an Artix-7 will probably be around 80 MHz. I suspect that even in one of the new Ultrascale+ FPGAs, it would only hit a few hundred MHz (certainly not more than 300 MHz).
It's a cool project, but not all that useful here.
Also, if you are interested in an x86 soft core, there is the ao486 (it won't run all that fast, but it's an i486 CPU). An alternative solution could be to use a RISC-V soft core as a base and rewrite the instruction fetch / decode stages to take in x86 instructions and translate them into RISC-V microcode. I honestly think that may be the best solution to get a "fast" x86 soft core on an FPGA, especially since there are small RISC-V cores that can run at over 150 MHz on the slowest Artix-7 (granted, in that case the x86 would probably be around 120-140 MHz with a very low IPC).
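To make the fetch / decode idea a bit more concrete, here is a deliberately tiny VHDL sketch (the names and micro-op encoding are made up, and a real front end would also have to deal with prefixes, ModRM/SIB bytes, variable instruction length, and flag handling) that maps a couple of one-byte x86 opcodes onto RISC-V-style micro-ops:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity x86_decode_sketch is
    port (
        opcode      : in  std_logic_vector(7 downto 0);
        uop_valid   : out std_logic;
        uop_is_addi : out std_logic;                 -- ADDI rd, rd, imm
        uop_rd      : out std_logic_vector(4 downto 0);
        uop_imm     : out signed(11 downto 0)
    );
end entity x86_decode_sketch;

architecture rtl of x86_decode_sketch is
begin
    process(opcode)
    begin
        -- Defaults: nothing decoded.
        uop_valid   <= '0';
        uop_is_addi <= '0';
        uop_rd      <= (others => '0');
        uop_imm     <= to_signed(0, 12);

        if opcode = x"90" then
            -- x86 NOP: emit a micro-op that does nothing.
            uop_valid <= '1';
        elsif opcode(7 downto 3) = "01000" then
            -- 0x40..0x47 = INC r32: translate to ADDI rd, rd, 1
            -- (ignoring that the real INC also updates most of EFLAGS).
            uop_valid   <= '1';
            uop_is_addi <= '1';
            uop_rd      <= "00" & opcode(2 downto 0);
            uop_imm     <= to_signed(1, 12);
        end if;
        -- Everything else would fall through to a slower / microcoded path.
    end process;
end architecture rtl;
```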
The N64 fell short on the MiSTer platform, and now we have an idea what it takes for the PS1, Saturn, and N64 to run. What are your thoughts on the new platforms like the MARS and the Replay2? Are you still making your own FPGA board? It's been almost 2 years - I would like for you to upload a new video. What do you think now?
Take a look at the UltraScale+ Processing Module from Antimicro. It is open source and will likely be good reference material for you.
Unfortunately the pricepoint of using something like that would be too high. People who purchased this platform would be looking at between $900 and $1200 USD for something as powerful as the $300 Ultra96 board. Some people may buy it at that price, but the majority of those using FPGAs for retro emulation / reimplementation won't go for that price, considering the DE10-Nano is $150 USD at its highest price.
@@RTLEngineering Ya, it was a reference for inspiration. I wasn't expecting you to use the same FPGA.
Regardless, it wouldn't be viable. As a SoM, there are many other vendors that have lower pricepoint SoMs with the same FPGA, and in terms of making a custom board... forget about it (that's the sort of thing that takes a team of 2-3 engineers about 1 year full time to produce; it's incredibly complicated to implement an MPSoC board, especially for an FPGA).
And then to produce a board with one of those FPGAs... for what you would really want for a project like this, you're looking at about $500 USD per chip at 10K quantity (assuming you could secure and guarantee 10K orders). And that's ONLY the FPGA. You still need the DDR4 (around $40 for the chips), other miscellaneous chips to match the expected quality (another $60), the board and assembly (another $120 most likely), enclosure (anywhere from $20 to $200 per unit depending on volume), packaging, and so on. By the time you're done with everything, the minimum price it could be sold for (at cost) would be around $900, and that's after about 5 man-years of design and production work (all of which would be free labor since it would be at cost... so paying the engineers and adding in some profit gives you probably $1200 USD per unit for a 10K order, and that's without any HDMI capability, which is an additional cost).
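Summing those rough figures per unit: $500 + $40 + $60 + $120 + ($20 to $200) comes to roughly $740 to $920 before packaging, which is where the approximately $900-at-cost estimate comes from; paying for the engineering labor and adding margin then pushes it toward $1200.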
Meanwhile, compare all of that to the $150 USD DE10-Nano you can buy, or the new Xilinx offerings for $200 USD or $250 USD with the same chip. It's no longer feasible to roll your own unless price is no object (i.e. if you aren't selling to consumers).
I just wanted to give you an idea of the complexity of using a chip like that, and how it translates to the cost and design / production time commitment. I had a similar idea to yours when I started working on this project, but I didn't fully understand the complexities and costs with designing such a system.
I do appreciate you mentioning the Antimicro module though, it's always nice to know what various companies are producing.
Have you considered using a Xilinx ZYNQ Ultrascale+ device - that would get you a couple of fast Arm cores for emulating the target CPU, a GPU and a bunch of very tightly coupled FPGA gates, as well as a DDR-4 controller and other goodies...
I have considered the Zynq Ultrascale+, however, they are a bit too expensive for a consumer application. Using a Zynq 7000 would probably be a better option (which I have been considering instead of the Artix-7). The biggest problem there is that CPU emulation isn't really a good solution for many of these systems since it will always be slower. Furthermore, you're going to introduce a huge amount of latency by the ARM cores talking to the FPGA logic over AXI (it's a "high-latency" protocol).
To actually implement a PS2 reliably, or something faster like a GameCube, you would probably have to move to Ultrascale+ chips regardless though (for the speed increase). The DDR4 wouldn't really be a selling point - even DDR2 would be sufficient for most of these systems; it's the bus width that's the problem. A hard (physical) memory controller would be nice though, as it doesn't use logic gates.
@@RTLEngineering Yes, the Zynq Ultrascale+ are a bit more expensive, but are you sure you couldn't get decent CPU emulation with a pair of 1GHz+ 64-bit ARM cores? And AXI really isn't a high-latency bus - especially when it's running at hundreds of MHz. I don't know how much logic you need on the FPGA side, but if you can squeeze into a XCZU2CG, there seem to be some interesting little modules for $250 odd e.g. www.enclustra.com/en/products/system-on-chip-modules/mars-xu3/ . Anyway, it looks like you've got an interesting project on your hands; I look forward to seeing how you get on.
The biggest problem there is that cost is a limitation for such a project. If the final product costs $1000, then that's going to greatly limit how many people can buy and use it. It would be great to have an N64 core on a high-end Ultrascale+, where the Mali GPU is used to upscale to HD. But that would be a lot more expensive, and it also raises the question of "is this really that much different than a Raspberry Pi for 28x the price?" FPGA re-implementation already has an image problem when it comes to emulation, and that would just end up blurring the lines.
As for the 1GHz+ ARM cores, it all comes down to how good the emulator is. So if you wanted to do an original Xbox implementation without using original chips, then yes, you would have to emulate the Pentium CPU on a faster chip like an ARM core - the same is true for the PPC CPU in the GameCube. But at 1GHz, you may have to write the entire Pentium emulator in ARM assembly, otherwise it will be too slow. If the Pentium executed 1 instruction per cycle at 700 MHz, and 1 instruction takes on average 5 ARM instructions, then the ARM chip would have to run at at least 3.5 GHz. And to get it down to just 5 ARM instructions, it would almost certainly need to be done in assembly. Obviously the ARM CPU is going to be more efficient (out-of-order execution and branch prediction, as well as close to 1 IPC if not more), whereas the Pentium probably had an IPC of 1/2? So it might be doable, just not easy. That's the biggest problem with emulating that sort of hardware on a CPU. On the other hand, some of the Ultrascale+ silicon can run at 700 MHz+, so you could potentially put the entire Pentium chip in there, but that comes back to the final product cost.
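To put that reasoning in one line (using the same assumed numbers): $f_{\mathrm{ARM}} \gtrsim f_{\mathrm{P3}} \times \mathrm{IPC}_{\mathrm{P3}} \times N = 700\,\mathrm{MHz} \times 1 \times 5 = 3.5\,\mathrm{GHz}$, where $N$ is the average number of ARM instructions per emulated x86 instruction. Halving $N$ (hand-written assembly) or halving the effective x86 IPC halves the required ARM clock, which is why those two factors matter so much.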
Anyway, just some things to think about.
I would love to see a card I could put in my computer that would let me leverage an FPGA. I wish you would have a way to transfer files back and forth - such as having the ROMs saved on the computer for the FPGA to read, or a way to transfer backups of save files and carts with saves on them for backup purposes, and using the computer's video outputs or running the program in a window, along with using peripherals attached to the computer.
I know it's probably just a dream, but to have a seamless FPGA/software hybrid emulator all in one system would be great. All the benefits of both.
Makes me think of the old 3DO blaster.
Transferring files would require an agent on the FPGA side, though that's doable with a FPGA+ARM SoC hybrid (like what the MiSTer uses). If it were only a FPGA on the card, there wouldn't really be anything to transfer files from/to, since the host CPU would be doing all of the work / management.
One of the challenges that I haven't been able to solve is the video output, as you mentioned. The problem is that a PCIe link would have too little bandwidth / too much latency to send video data from the FPGA to a GPU, and you have the opposite problem if the FPGA acts as the sink. The only solution I have come up with is to use a KVM switch / multi-input monitor, and switch sources where appropriate (alternatively, you could use the FPGA output as another monitor - though not for the desktop environment on the host).
I think a FPGA+Software Hybrid emulator is likely the solution for many systems, especially if you want higher quality video output. Although, it does pose the question of: at what point is it mostly a software emulator / would a software emulator by itself make more sense.
@@RTLEngineering Indeed - the display being picture-in-picture would be OK if not a full window, and I could undo the inputs and swap or KVM them when needed.
I am keeping an open eye on all options and an open mind. FPGA is the hotness now and many are really hating on software approaches to emulation. I just want all options explored and let the results speak for themselves.
This answers my question whether Dreamcast would be on MiSTer
Unfortunately the MiSTer is too small and too slow to run a Dreamcast. It might work if you put some of it on the ARM CPU (hybrid emulation), but the best you could hope to get is about 7 FPS max for a game, which would be unplayable.
That's not to say that a Dreamcast is impossible or impractical. The problem is that the FPGA used in the MiSTer is small and old (I believe it was designed in 2009). There are newer chips in a similar price range that can run circles around the Cyclone V on the DE-10 (one core that I was testing got a 4.5x speed improvement on the newer chip, which is the difference between 7 FPS and 31 FPS; it's also 4x the internal size).
@@RTLEngineering Wow, how easily could I get a dev board with a newer FPGA and port an existing core to be faster?
That depends on the FPGA (Altera/Intel vs Xilinx/AMD). MiSTer cores also require a support interface from the ARM CPU, so that would have to be ported too, as well as pretty much all of their software from the ARM CPUs. It's possible, but it would be a lot of work. In terms of the pure core itself though, that would probably be easier to port, but it would require replacing the IO interfaces as well as any FPGA specific macros (i.e. for the Cyclone V - if moving to a Cyclone 10, then that would be easier).
I was working on benchmarking a RISCV CPU, which was not meant for the MiSTer, but it ran at 80 MHz there and 345 MHz on an FPGA that I am planning to use in my current project. Note that 345 MHz is faster than the Dreamcast CPU and GPU, and while the RISCV core is optimized for an FPGA, it could be optimized further to probably add another 100 MHz (you would never get above the required 200 MHz threshold on the Cyclone V though).
Yup, Robert, the guy behind the awesome GBA core, said that 200 MHz on the DE-10 is impossible. He said that in a discussion about a PSP core, IIRC.
@@RTLEngineering Would it even be too slow if one tried to put the whole Dreamcast CPU in the ARM section and just implemented the GPU and the rest on FPGA?
BTW, would it be in theory possible for FPGA to do some JIT compilation for the ARM CPU (for instance for a different CPU implemented in the ARM part in software) to help speed up some non-native instructions? Maybe in this way one could also get a faster IBM PC core, or are hybrid approaches just too cumbersome?
No mention of the sega saturn? It's just grouped in with the other 5th gen consoles. Would it actually be possible on the current mister? It used 2 sh2 cpus, and 2 custom graphics chips I believe. It is notoriously difficult to emulate, would an fpga core even be possible?
The Sega Saturn is pretty much a given (that's why I didn't mention it). It may be technically challenging to emulate, but that's not due to the speed (i.e. the N64 runs far faster / has tighter timing requirements) - so someone will eventually do it. As for if it will fit in the MiSTer... I'm not sure. The 2x SH2s + the vector DSP for 3D math would be pushing that chip. It would probably fit into the largest Artix-7 as proposed in this video (it has roughly 2x the logic elements as the FPGA in the MiSTer has).
My guess is that the order of "impossible to emulate" systems will be PS1, N64, and then the Saturn. And maybe down the line a PS2 or Dreamcast. The only one of those that has a chance of fitting in the MiSTer though is the PS1.
Oh wow this is a cool platform. Have you talked to Sorlieg and others in the MiSTer community about this?
I haven't talked to anyone in the MiSTer community about this, and I suspect that they may be less than thrilled. For now, it's just an idea of where things could go once the limits of the Cyclone V in the DE-10 are reached. Eventually they will run out of cores that can be implemented.
Question: if you were to do this today, what would you change?
Hi, first I want to say thank you for your contributions to the MiSTer project :)
Second: are you the guy behind ULTRAFP64? Because that project is progressing very well :)
Thanks, though I haven't actually contributed to the MiSTer project. This was just a proposal of how it can expand.
I'm unrelated to the ULTRAFP64, though I know the guy working on it. He has indeed made impressive progress.
I feel like once you get to 3D games, the issue isn't accuracy and latency like it is with earlier platforms, but the overall presentation - aliasing, resolution, slow-down, draw distance, etc. Could, say, an FPGA PS2 then work in conjunction with a graphics card to handle post-processing and get an accurate, upscaled, and prettied-up PS2 game?
It's both actually, though there are several software emulators for the PS2 that do a pretty good job. There are only a handful of games that are known to be unplayable with PCSX2, which most likely rely on hardware quirks that you don't get in emulation. On the other hand, the only feasible way to implement a PS2 core would be to use a black-box approach, which could fail to capture some of those hardware quirks as well.
As for using a real GPU, that's a possibility. Presumably, once the draw commands are sent to the GS (the hardware component that drew the 2D graphics to the screen), the only other quirks would come from bus behavior and bottlenecks, which hopefully no games rely upon. So you could potentially treat the GS as a data / command sink, abstract it out of the core, and implement it on a modern GPU. After all, once you know the triangle positions on the screen, neither the actual resolution of the screen nor the method for rendering them matters, as long as it executes the commands given to it. You would, however, run into issues with data synchronization. For example, if the GS was writing to a framebuffer to later be used as a texture, then you would have to somehow recognize that and pull it out of the modern GPU. Similarly, if a texture was updated, then you would need to send the update to the modern GPU. So the synchronization could be challenging. There are also SoC-type FPGAs that have ARM Mali GPUs in them, which could be used to do the final rendering at a higher resolution and would allow for unified memory. However, those are significantly more expensive.
TL;DR: Using a modern GPU could work, but the logistics of doing so would be challenging.
This is great. This would be good for n64 and ps1 since you could increase the frame rate. I would only buy it if it could connect to a crt.
You would probably have to convert the HDMI to composite to connect to a CRT. I wasn't planning on targeting the CRT market directly, since most people would use an HDMI compatible device.
An Artix-7 plus an Spartan-7 ... works for me :-)
Well, the Spartan-7 wouldn't be user programmable (it was to combine the two HDMI signals) - if you can't solve the problem with 1 FPGA, just throw more at it!
Would love to see a 3D rendered mock up of what a standalone console with your build inside would look like?
It would most likely look quite strange if it was based on a SBC like a Latte Panda. Effectively, the Latte Panda would sit under the FPGA (being about 1/4 the size of the PCIE card), with a short PCIE ribbon cable connecting the two. It wouldn't be the prettiest solution, but it would certainly have more power than the current MiSTer. I could probably do a quick render if you would like, but as I said, it's going to look quite strange.
RTL Engineering I’m sure a nice shell with a little style would look good. Just shove it in a mega drive shell lol Jkn
Definitely looking forward to your work
Why s478? Isn't that a later-gen Pentium 4 socket? Pretty sure s370 was for Coppermine and Tualatin PIII/Celeron.
The s478 seemed to be the most readily available, also I was unable to find any good docs from Intel on the s370. Honestly, I don't think it will matter that much, though it may require a custom kernel (in which case using an Atom may be a better choice - unfortunately doing so would mean that the project couldn't be open-sourced since the Atom is under an NDA).
A coppermine CPU similar to the one used in the Xbox is socket 370, not 478. Socket 478 supports Pentium 4 and later.
Even if you source a real Pentium 3 CPU, the clock speed and cache would need to be very nearly identical to the original Xbox to ensure proper compatibility. Xbox enthusiasts have been soldering faster Pentium 3 models into their consoles for a long time. It requires a clock switch to get close to real hardware speed, but even then, depending on the model, you often don't get identical performance. Even a very small difference in clock speed of a few megahertz can cause glitches or anomalies on real hardware.
If you are only interested in providing a real x86 onboard to process Xbox-native instructions, then a 478 socket is not ideal either. Any Atom or similar x86 chip could serve as a stand-in for similar cost and much lower power consumption. Either way, you will have to deal with the speed issue, which may not be possible with an FPGA approach where things are usually clock-synced.
It is also unclear to me how a Wii has a clock speed that is too high for an FPGA, but an Xbox does not. The Xbox CPU runs at a 733 MHz clock speed and the NV2A GPU runs at 233 MHz. The bus speed on the mainboard is either 100 MHz or 133 MHz, with SDRAM refreshing at that cycle, as well as a PCI link at 66 MHz, in addition to other devices like the onboard audio, storage controller, etc. Any component that an FPGA tries to implement is likely going to be running or syncing at a frequency of 66 MHz or greater.
Someone else had commented about the socket 478 mistake. The reason for choosing the 478 was that Pentium 4s are easier to find than Pentium 3s.
That is a valid concern, but I think that would be a drawback that would have to be accepted when running on non-original hardware. The only other solution to the problem, once all of the original Xboxes have failed, would be software emulation, which will most likely have similar issues (unless software patches are applied). Unfortunately, the extent to which this is an issue won't be known until it is attempted.
I did look into using an Atom, however, they are no longer available in a usable form (the Z510 probably would have worked). The issue is that unless you want to rewrite the entire kernel and bios, you need to have a system that presents an identical memory map to the original xbox. This is something that only the FSB based Intel Atoms (now discontinued) provide. So that led me to the idea of using a Pentium instead of an Atom.
I'm not sure which speed issue you are specifically referring to? The FSB operates at between 66 MHz and 150 MHz, depending on the CPU, that entire range is possible with the Artix-7. As for the clock syncing, that's true, but FPGAs have multiple clock networks, so as long as the FSB interface is synced to the FSB, and the internal implementation is synchronized to another clock, they can be interfaced using one of many different techniques, which also work for variable frequency clock domains.
I thought I mentioned it in the video, but perhaps it wasn't clear. The whole point to the on board x86 was to offload the part of the xbox that was too fast for the FPGA. The same thing is true for the GameCube. In both cases, only the chipset which operates at a maximum of 250 MHz is implemented in the FPGA, which is more than doable. I haven't really looked into the Wii, so it's possible that it could be implemented the same way, if the GPU and chipset is less than 250 MHz, otherwise, it would be too fast for a "low cost" FPGA.
In terms of synchronization, you seem to be under the assumption that the entire FPGA uses the same clock. While an FPGA usually uses the same input clock, it contains many PLLs, each of which can have up to 6 independent outputs which do not need to be integer multiples. These outputs are then connected to internal clock networks (domains), and can interface with each other regardless of a frequency mismatch. Typically, this interfacing is done via a dual-port block RAM, in the form of a FIFO. There are also methods such as double-buffering and clock gating which can be used.
For the example of an xbox, you would probably have different clock domains driven by a different PLL output for: the FSB interface, the DRAM controller, the PCI controller, the HDMI output IP, the NV2A (might require several clocks), the DSPs (might require several clocks), an IO emulator (to drive the PCI controller), the BIOS loader (from external SRAM), and the interconnecting switch / bus fabric between them. None of which are required to be integer multiples of the others. Hopefully that makes sense.
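As a sketch of what one of those dual-clock FIFOs might look like (the standard textbook structure with an inferred dual-port block RAM and gray-coded pointers - not taken from any particular core, just illustrative):

    module async_fifo #(parameter DW = 32, AW = 4) (
        input  wire          wr_clk, wr_en,
        input  wire [DW-1:0] wr_data,
        output wire          full,
        input  wire          rd_clk, rd_en,
        output reg  [DW-1:0] rd_data,
        output wire          empty
    );
        reg [DW-1:0] mem [0:(1<<AW)-1];            // inferred dual-port block RAM
        reg [AW:0] wr_ptr = 0, rd_ptr = 0;         // binary pointers (1 extra wrap bit)
        reg [AW:0] wr_gray = 0, rd_gray = 0;       // gray-coded copies for crossing
        reg [AW:0] rd_gray_w1 = 0, rd_gray_w2 = 0; // read pointer synced into wr_clk
        reg [AW:0] wr_gray_r1 = 0, wr_gray_r2 = 0; // write pointer synced into rd_clk

        wire [AW:0] wr_ptr_nxt = wr_ptr + (wr_en && !full);
        wire [AW:0] rd_ptr_nxt = rd_ptr + (rd_en && !empty);

        always @(posedge wr_clk) begin
            if (wr_en && !full) mem[wr_ptr[AW-1:0]] <= wr_data;
            wr_ptr  <= wr_ptr_nxt;
            wr_gray <= wr_ptr_nxt ^ (wr_ptr_nxt >> 1);         // binary -> gray
            {rd_gray_w2, rd_gray_w1} <= {rd_gray_w1, rd_gray};  // 2-flop synchronizer
        end

        always @(posedge rd_clk) begin
            if (rd_en && !empty) rd_data <= mem[rd_ptr[AW-1:0]];
            rd_ptr  <= rd_ptr_nxt;
            rd_gray <= rd_ptr_nxt ^ (rd_ptr_nxt >> 1);
            {wr_gray_r2, wr_gray_r1} <= {wr_gray_r1, wr_gray};
        end

        // full: gray write pointer equals the synced read pointer with its two MSBs
        // inverted; empty: gray read pointer equals the synced write pointer
        assign full  = (wr_gray == {~rd_gray_w2[AW:AW-1], rd_gray_w2[AW-2:0]});
        assign empty = (rd_gray == wr_gray_r2);
    endmodule

The same structure works regardless of how the two clocks are related, which is what makes it the go-to mechanism for moving data between those independent PLL outputs.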
@@RTLEngineering Thanks for the response, I am excited for your project. My intention was just to give you information you might not already have, not to discourage you or tell you what is or isn't possible.
Thanks, I do appreciate that. There are obviously points that I had not thought of / addressed, especially since I am not actively working on an FPGA xbox implementation.
It also occurred to me that the issue with soldering in faster Pentium 3s may have been the chipset not playing nice with them (since the chipset was most likely designed for that specific CPU). After all, that would exhibit timing and glitch-like behavior. Though you probably have a better idea as to whether or not that was the case.
My thought was that it's possible, however, I do acknowledge that I could be mistaken. If that's the case, the socket 478 could be adapted to fit any other CPU that has a similar communication spec, be replaced with another FPGA on a daughter board, or simply be used as IO.
Anyway, further development on this FPGA card is about 3 years off, and an Xbox implementation probably closer to 10 years, so there may be a much better solution by then.
The cyclone 10 LP doesn't have a DDR3 PHY and Quartus Prime Lite doesn't support the GX variant.
That's correct; it's possible that I implied something else though. The Cyclone Vs do have a DDR3 PHY (with the SoC variants having at least two). Additionally, the Cyclone 10 LP is slower and smaller than the Cyclone Vs, so it wouldn't be useful in this application. That's one of the main reasons that I started looking at Xilinx chips instead (the main competitor to Intel / Altera).
Very fascinating video about future retro FPGA development options. I'm a retro gaming enthusiast, not a developer, so a lot of the details are above my pay grade, but I'm still able to follow the basics. I have all the parts to assemble a full MiSTer arriving this week, but I've been following the retro emulation scene closely for 20 years and it's fun to get a glimpse of the future from a developer such as yourself. You mentioned that an N64 core would not be able to fit onto the DE10-Nano while outputting at 1080p. Forgetting 1080p for a moment, do you think it would be feasible to build an N64 core on MiSTer that outputs at the original native system resolution? I read in another comment here that you're targeting HDMI on your new proposed platform as "most people" would be using an HDMI device. However, it would seem that anybody who is buying DIY FPGA platforms for retro gaming purposes inherently sits outside the categorization of "most people", as most people either are perfectly happy buying the new "official" mini-consoles or running software emulators on their cell phones, and don't care one jot about the likelihood of adding 4-to-11 variable frames of latency to their classic gaming experience. For anybody considering paying upwards of $400 on this proposed FPGA platform for PS1/N64/Xbox cores, analog output options seem like the most obvious make-or-break feature. But getting back to the MiSTer for a moment... if an N64 core could be made to fit on the DE10-Nano, but could only output at the native resolution, this would still be sufficient for anybody using an analog display, and, of course, pairing the MiSTer up with a device like the OSSC could create a satisfying result for somebody using a digital display, could it not? I realise, of course, that running a 3D system like the N64 and rendering polygons at native resolution may not be as sexy as rendering them natively at 1080p or whatever - an issue which does not affect 2D pixel games on older systems - but at least it would be accurate? Or, do you think that an N64 core on MiSTer with full logic implementation is just impossible, period? I'm curious, is all.
Thanks, I have been thinking about what comes next for a system like MiSTer for quite some time, and that's what I came up with. Though, this no longer actually seems like the best option... building an add-on FPGA daughter board for the DE10 Nano is probably more beneficial in the near future.
That's correct - an N64 core which is not overclocked will probably require between 120-200K logic elements, and the DE10 has 115K. That means that stripping away the on-screen display (OSD) and upscaling hardware will not be sufficient. Basically, the only way to get an N64 to fit would be to strip out the texture unit (so no textures), as well as the floating-point unit in the CPU. And at that point, it will probably just barely fit (with the OSD, but no upscalers).
With that said, however, the RCP itself should fit in the DE10 with an upscaler and the OSD (just no CPU). So, in theory, if a daughter-board was created for just the CPU on the N64, which communicated with the DE10 via the MIPS SysAD bus, you could run a full native N64 on the MiSTer.
That's a good point about "most people". However, it's highly doubtful that an "official" N64 mini will be released, or if it is, it will use a Virtual Console-like emulator (so it won't play general N64 games). Analog can always be achieved via a converter box though (I know it's not the same, but if the goal is 1080p native output, that's not something that an older CRT can handle). But that's another reason to go with an expansion daughter board instead of a new system altogether - leverage MiSTer's current analog support. Note that I don't think that an RCP on the DE10 could output 1080p native resolution - the Cyclone V is too slow for that, but it can do 480p. So doing 240p or 480p (basically implementing what the N64 could do natively) would be far easier than implementing 1080p.
My concern is that people are spoiled from running software emulators, and having more modern GPUs redraw the triangles in 1080p. In order to achieve that, you basically need to pixel multiply. The N64 actually only drew one pixel per cycle, and in some modes, only drew one pixel every two cycles (there was a fast fill mode where it could draw 2 / 1 cycle). So if you are drawing a 240p screen, and multiply that by 4 (2x wide and 2x high), you get a 480p screen. That means drawing 4 pixels per cycle. Then you could multiply that by 4 again, to get 960p, and draw 16 pixels per cycle (that's the highest you could reasonably go). But drawing 16 pixels is a tall order, since you basically need 16 copies of the RDP in the N64 to draw 16 at a time, and that's going to be a massive resource usage. There is a saving cheat, where you can actually get away with overclocking it to 120 MHz and only have 8 copies of the RDP (since your cycle becomes 1/2 of what it was on the N64). Obviously, though, if the N64 wouldn't fit natively, a version with 8 RDPs wouldn't fit either. So if we only have 1 RDP, that would be the best bet to getting it to fit (with an expansion daughter board). Hopefully that wasn't too much detail.
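To put numbers on the pixel-multiply idea (assuming a 320x240 native frame as the baseline): $\frac{640 \times 480}{320 \times 240} = 4$ pixels per cycle for a 2x-per-axis upscale, and $\frac{1280 \times 960}{320 \times 240} = 16$ for 4x per axis - hence the 4 or 16 RDP copies at the original clock, or half as many if the RDP block is overclocked to twice its original rate.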
It's possible that an Xbox core may fit in the MiSTer too (if the DE10 only has the GPU / audio DSP / system controller). However, the bandwidth of the DE10 is too low for that to be practical (and there is no way to expand that). But it might fit. In comparison, there is no hope of getting a PS2 to fit, even with a daughter board (a PS1 should fit though, and so would a Saturn - hopefully). As an aside, it may be possible to also fork the development of the MiSTer into the DE10-Nano and the DE10-Standard (a bit more expensive), where the DE10-Standard has a high-speed expansion header. Though that would probably result in $500-$600 for a full "MiSTer"-like setup, which is probably not practical. Just some ideas - there are many possible directions for the project to take, so hopefully that gives you some hope.
@@RTLEngineering Thanks for the very detailed and considered reply. I certainly wasn't expecting an FPGA N64 any time soon, but this gives me a good idea about the challenges involved. Either way, I got my MiSTer set up late last night and it's working like a dream - FPGA tech is an amazing addition to the retro gaming hobby and preservation efforts! Cheers ;)
The problem is even if you have all the logic gates available the difficulty to implement those consoles post N64 increases ten fold. I can't imagine a single person implementing say Dreamcast in FPGA within a reasonable time frame.
Well, you have two problems there. 1) Can you fit such a design into the FPGA, and 2) can someone realistically implement such a design in a reasonable timeframe. If you look at them, the more important problem is (1), because even if you could implement the design in a reasonable timeframe and it doesn't fit, you're sort of stuck...
Even an N64 is 10x - 100x more complicated than a Genesis / SNES. I would say that an N64 is about 10x more complicated than a PS1. And yes, the Dreamcast is 10x more complicated than the N64, and the PS2 is 100x more complicated than the N64, etc.
So then the next question is, what do you define as "reasonable"? I would argue that whoever gets it done first defines what is reasonable for the system. So if it takes me 10 years to make a working N64 core, and no one else releases one sooner, then my 10-year-long project is the definition of reasonable for the N64, especially since the timespan of the project would not deter me. Though, I probably won't release the first N64 core - mine will probably just be the first one that doesn't rely on other IPs (for example, there is one in the works that is using a modified version of a MIPS32 core).
Obviously, more people working on a single project will typically allow it to complete sooner. The problem there though, is that N64+ systems require high levels of optimization to run on "cheap" FPGAs ($200 for a chip might not seem cheap, but it's a lot less than the high end ones which cost $10,000 per chip). And the high levels of optimization require time and skill, so the pool of people who would be able to achieve that result efficiently is quite small - it's likely that many of the more prominent retro core FPGA engineers are not in that pool. Though I wouldn't use that as an argument to discourage someone from working on one of these cores, it may turn out that optimization doesn't really matter - I'm not sure.
@@RTLEngineering That's what I mean - can we realistically expect someone to dedicate a decade to writing a core? Surely one would have to be really motivated to sacrifice that amount of time in their life to implement something.
I'm just trying to have realistic expectations, that is all. PS1 seems like the best we can expect in a reasonable time frame (2-3 yrs) and thankfully someone is already on that. He's only put in 7 months of work, but so far it looks like he's taking this seriously if you read the GitHub.
I think you are thinking about this in the wrong way. We can't realistically expect someone to dedicate a decade to writing a core; however, if someone is motivated enough to write such a core, they will gladly spend a decade on it. Basically, the difference is that you are thinking about it in terms of the community making a request, whereas I think it will be done by someone who chooses to do so regardless. At the rate I am going, it will probably take me two decades, but that won't stop me from doing it, regardless of what the community expects / requests. If no one else beats me to it, then that's when the first core will be completed. Keep in mind that I have been working on an N64 implementation on and off for the past 4 years.
There are several people working on a PS1 core actually, so I'm not sure to whom you are specifically referring. I believe that both developers that I am aware of are basing their main CPU off of the aoR3000 though, which could potentially lead to problems - we shall see.
What about analog video output? This is what sold the Mister for me.
The problem with analog video output is that it's not as simple as you may think. If you think about output resolutions, you basically have HD (720/1080p+) and LD/SD (analog 240/360/480p).
If you build a system to output in LD/SD, then it can be upscaled to HD easily (though upscaling doesn't add any new information, so an N64 game at 1080p would have giant pixel blocks).
If you build a system to output in HD, however, it can't be downscaled without losing information (i.e. some pixels won't be displayed). Additionally, HD introduces additional signal integrity issues - for example, a 1080p pixel bus must be point-to-point, so you can't send the same wires to an analog output driver; you would need 2x the pixel buses.
MiSTer is able to output analog because it targets LD/SD output, however, my idea for this platform was to instead target HD output.
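For a sense of the scale difference (standard CEA-861 timings, quoted from memory): 480p60 needs roughly a $27\,\mathrm{MHz}$ pixel clock, 720p60 about $74.25\,\mathrm{MHz}$, and 1080p60 about $148.5\,\mathrm{MHz}$ - and at $148.5\,\mathrm{MHz}$, a wide parallel RGB bus really does have to be treated as a point-to-point high-speed interface.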
So then the options that I could currently see are:
1) Don't target HD output, but then PS1/N64/PS2 games will look terrible, and that sort of defeats the purpose of this new platform
2) Add an extra output for analog which could decimate the video output (i.e. draw every other or every fourth pixel). Due to the signal integrity issues though, that would mean having another output from the video combiner FPGA, which doesn't have any free IO, so it would need to be a larger FPGA which costs more - so the price might go up from $400 USD per platform to $450 USD per platform, regardless of whether you want analog or not. Is that fair? (I don't know, maybe)
3) Use an external HDMI to Analog converter (probably not the option that you would want)
4) Stick with the MiSTer if you want analog output (but that means little or no 3D system options for the MiSTer)
It's an annoying and frustrating aspect of engineering in general - often there is no practical solution that satisfies all of the design requirements, and compromises on features have to be made. Hopefully this helps you understand the complexity of what you are asking for / why it wouldn't be trivial to implement.
2 things: Is this still being worked on? Why do you need to have ram on the board if the PCIe spec allows you to use the system ram?
This is still being worked on, but has changed form. The only viable option from a value proposition is to use a Zynq Ultrascale+ SoM (any other solution would have a significantly lower performance/price ratio). So that's the current plan.
Since it's using a SoM, the main RAM will be soldered to the SoM and thus won't be upgradable, though since that RAM is higher latency, there is a plan to have a secondary set attached directly to the FPGA logic. Also, there won't be a 478 socket, since there isn't enough IO, but there's another solution to that problem. Finally, it will most likely not be a PCIe card because the MPSoC part of the Zynq is more than capable of performing the required host functions - i.e. it's going to be a mini-ITX or micro-ATX form factor.
In terms of the PCIe spec, you can use the main system RAM, but the latency is terrible (up to 10us for a read) and the bandwidth quite low (as low as 1 GB/s), so it's not practical. It may be more practical in an environment with something like PCIe 5.0/CXL, but only the new Versal FPGAs support that.
@@RTLEngineering Darn I really was hoping for it to be a PCIe Card. Having it connect to my PC would mean I could experiment connecting it to other PCIe devices, like graphics cards and other older cards.
So I was considering a PCIe card still with the new FPGA, but there is one major problem.... drivers. It would be almost impossible to produce a consistent set of drivers across many different system configurations (look at how much NVidia and AMD struggle across windows alone). If it was a PCIe card, it would probably only work on Linux, and would be compiled from source by the end user supplied AS IS. So any bugs or compilation errors would probably be up to the end-user or community to figure out... I don't think it would be viable. (I originally thought it was doable, but looking into it more just shows how complex it is).
Although... the new design I am working with (which uses a SoM) has the ability for the FPGA to act as a PCIe host. I had looked into the possibility of being able to plug a GPU directly into the FPGA (so you could use a GTX1650 for example with the ARM CPU), but that turns out to have problems at the distribution level (i.e. NVidia applies a user license to the driver which means that the end-user must be the one to install it and debug any issues as a result).
I guess the TL;DR there is that it's very complicated to do PCIe things in a non-standardized way. M.2 PCIe SSDs are easy, M.2 WiFi is easy, arbitrary PCIe devices are not easy (the devices themselves aren't standardized, only PCIe itself is), and GPUs are definitely not easy (they're not standardized either).
So how long before you become a monopoly on great Retro Hardware emulation? Jokes aside, you really explained the concepts very well. Can't wait to put an order ;)
Thanks! I doubt that I will become a monopoly. And besides, I don't plan to do what Analogue did (the platform presented here would be open source, just expensive for someone to make themselves), though I might sell the pre-compiled cores.
I can't wait until I have something that can be ordered / produced and actually works!
@@RTLEngineering I wouldn't mind a Analogue N64 solution that implemented this. Maybe you and Analogue should talk about it.
@Fun. No Commitment. I don't think that Analogue would have any interest in this platform, since it would be reconfigurable, unlike their current products. And in terms of working with them on an N64 core, I don't think that would be in either of our best interests (it would be too expensive for them to sell due to the FPGA cost, and I don't particularly like the idea of selling a new product for each core). That's the main reason that I was trying to appeal to the MiSTer community: one up-front cost for an open platform where you can easily reconfigure it to play games from any classic console.
Any word from the MiSTer team on your idea? What about sound? @@RTLEngineering
@soloM81 I have only spoken to one member of their team (before releasing the video), whom I was able to convince. So it's likely that if this is produced, at the price-point that I mentioned, and as easy to convert existing cores as I think, then there will be support from the MiSTer community.
Fascinating, I hope some fitting fpga comes out some time in the future. Would this already work on the Artix-7 35t?
A fitting FPGA already exists, I think it was mentioned in the video (the Artix-7 200T). The Artix-7 35T would not work, since it lacks sufficient resources (i.e. you may be able to fit a C64 core, possibly an SNES core in it, but nothing more). Comparably speaking, the 35T is roughly 1/3 the size of the MiSTer.
However, I have actually been thinking more about the idea of using a Zynq Ultrascale instead of an Artix-7, since it could allow for "hybrid emulation", and could presumably be far larger than an Artix-7 200T (at more cost of course).
@@RTLEngineering Interesting. Are you implementing the core in Verilog or using something like chisel (also used for Risc-v cores).
Which core are you referring to? The idea is to allow the MiSTer cores to be ported over to this new system, so those wouldn't be re-written.
As for an N64 / PS2 core, the only way to get the required performance is to manually write and optimize them in HDL (I have other videos on the channel discussing optimizations for both cores - since they use a similar CPU architecture). I use a combination of VHDL and Verilog, mostly VHDL though, since it's not prone to the same sort of typo errors that Verilog is (Verilog is nice for connecting blocks though).
A Risc-V core would have no place in this design, since none of the existing ROMs are compatible (i.e. you can't make MIPS machine code run on a Risc-V - you could by turning it into a MIPS, at which point it's no longer a Risc-V). However, that's not to say that you couldn't use it for something new, which does use a Risc-V. A Zynq Ultrascale, however, has hard ARM processors built in (at least the one I was thinking of), but you could always use the ARM to act as a system interface between a host and a soft-core Risc-V.
@@RTLEngineering Yes, indeed, referring to the VR4300 from your other videos. I mentioned RISC-V because that is also a RISC chip and it seems people make nice implementations of it with Chisel, so maybe it works for this. But I guess you can't optimize to the level of VHDL.
Correct, using a tool to generate the core will not produce optimal results. Honestly, I had never heard of Chisel before you mentioned it. It looks like it's useful for general FPGA implementations, but not for this specific task. Furthermore, RISC-V being a RISC chip doesn't mean much when you are talking about a VR4300. The context of my channel has been surrounding implementations for "Archival Preservation", which means running existing software. You can't run VR4300 code on a RISC-V chip any more than you can drive a cruise ship on a highway. They may be similar, but when it comes down to it, similar doesn't count for anything except blowing out transistors (it has to be a perfect match - you can't run N64 code on a PS1 either, even though they both use a MIPS processor). If you wanted to create something new, however, using Chisel and RISC-V on an Artix-7 FPGA would be a feasible option.
Excellent work! Hope it happens.
Have you considered the Acorn CLE-215? Very cheap
I don't think that's a viable solution.
First of all, it uses M.2 which means you would somehow have to pass a framebuffer via PCIe (huge bandwidth bottleneck + latency).
Secondly, the FPGA doesn't appear to be specified, so there is no way to know what the FPGA is capable of.
Thirdly (most importantly), it appears that the platform is ONLY for crypto mining, i.e. they abstract away the FPGA so that you can only load bitstreams that they provide via their API.
And fourthly, it's not cheap considering what it is... It would depend on the FPGA in it, but if it's the FPGA I suspect, you are paying 2x what the hardware is actually worth (the cost is in their software and API which wouldn't be useful for this type of application).
Thanks for the suggestion though. It's always interesting to see how other markets are using FPGA accelerators.
@@RTLEngineering Bandwidth is 4 Gbit/s; latency should be fairly low considering all GPUs are PCIe too. The FPGA is an Artix-7 200T. JTAG headers are exposed and I can load and flash bitstreams with a generic FT2232H board using OpenOCD. You can get them very cheap second hand for $70 + shipping ($40 + shipping for the lower-cost and lower-noise model with an Artix-7 100T). But agreed, the integrated solution will perform better; you get what you pay for.
4Gbit/s isn't sufficient in this case, since a 1080p60 stream takes 3Gbit/s of bandwidth. Also, PCIe latency is quite high, especially when talking directly to an FPGA. GPUs are able to tolerate it better since they can handle the incoming packets more efficiently than the FPGA can.
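For reference, the raw pixel data alone is $1920 \times 1080 \times 24\,\mathrm{bit} \times 60\,\mathrm{Hz} \approx 2.99\,\mathrm{Gbit/s}$, before blanking intervals or any PCIe packet overhead - so a 4 Gbit/s link leaves almost no headroom.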
Also, having to rely on a JTAG header and OpenOCD is not a viable solution if you want to make it accessible to the general public (the whole point of the proposed system). Same thing for buying second hand mining solutions.
Alternatively, there is nothing stopping an individual from taking a MiSTer core for example, and porting it over to the 100T/200T for this miner module and running the core there. The drivers and infrastructure would still have to be developed from scratch though, so the effort is likely not worth it.
@@RTLEngineering Agree. Points taken. You can program the FPGA via PCIe too, but that needs coding
Very interesting
A PS2 or Xbox sounds kinda overkill. If it can be done I'd be surprised.
I'd be surprised by an N64 core, as even emulation has trouble with that. PS1 has very good emulation - it just has trouble with a few things that aren't right or mess up, but most games are playable and it surprisingly runs fine on lower-end devices like a Raspberry Pi.
I wonder if you can attach 2 DE10-Nanos together through the expansion slot, or via HDMI if there is enough bandwidth.
Anything can be done with enough effort. I'm not sure if it's practical though. You would need for at the very minimum, emulators on a PC to be able to play those system games flawlessly, which as far as I am aware, is not currently possible (they're pretty good, but not flawless). You could always make the argument that the new xbox is able to play original xbox games, however, that platform will become obsolete at some point in the future as well, so the target hardware must be something that is continually being produced and can therefore always be obtained (the benefit of FPGAs - even when the DE10 is no longer made, the cores can be ported to a new FPGA).
An N64 isn't trivial to implement in an FPGA, but it's nowhere near the limits of the DE10, for example - the problem is mostly space (lots of logic gates). There are currently at least 3 people working on an N64 FPGA implementation, and one has gotten a simple test program to display via VGA (so it's going to happen).
You could theoretically connect two DE10-Nanos together, but that would be a dumb idea. The DE10-Nano has extra dead weight on it (the ARM cores), which you wouldn't need in an expansion. I have thought about developing an expansion FPGA board for the DE10-Nano, which uses either a Cyclone VE or an Artix-7, both of which could be programmed from the ARM CPUs on the DE10-Nano. Unfortunately, connecting the two would mean 1) no SDRAM (it would need those pins) - the SDRAM could be attached to the secondary FPGA though, but that's another hop. And 2) limited bandwidth, as you guessed. The expansion connector on the DE10-Nano is a 2.54mm header, which will not allow for high-speed signals to pass through it, so any communication link will be limited to below 100 MHz. That would probably be fine for most of the current cores, and may even be okay for the PS1, but it would probably be a bit slow for an N64 (unless it were partitioned correctly), and would be far too slow for a PS2. There would also be the issue of link latency - it may introduce more issues than it solves.
I'm no tech, but it's a nice presentation... :-)
Regarding the Xbox emulation, I'm surprised. The actual state of emulation on PC is... "weak", to be polite.
I was prepared to give up on Xbox "classic" emulation, even on PC.
I read - a long time ago - that we didn't know precisely some elements of the Xbox (the GPU chipset? it's a special version of an Nvidia GPU)... and it was a problem to recreate a good emulation... maybe there is light at the end of the tunnel, as we say?
One of the XQEMU developers claims that it has made great progress in the past few years (I have not really looked into it).
It's true that we don't know all of the elements of the Xbox chipset, but we have some idea of how they were composed. It may require leaked documents, or some old-fashioned reverse engineering, to figure it out though. In terms of developing a chipset, the actual architecture of the NV2A GPU is pretty straightforward at an architectural level; the issue comes down to how precisely to control it. It may not be possible to implement a PC-emulated version of the NV2A though, since it's basically an early version of a modern GPU (multiple shader cores, running at a few hundred MHz). It should also be possible to hook up a high-speed logic analyzer to a development Xbox, and sniff the front-side bus while the CPU talks to the chipset running known operations. That could provide the missing information on where / how to communicate with the chipset components (perhaps that has already been done though).
An FPGA implementation may be an alternative option there (running the CPU emulation on a PC, and using the FPGA to just do the chipset stuff). I may have given the impression that it had been figured out though, when my main intention was to suggest that it is feasible from an engineering standpoint (i.e. the details still have to be worked out).
Anyway, I wouldn't give up on being able to run classic XBox games on a newer general purpose system. And at the very least, I recall hearing the CEO of AMD (Lisa Su) mentioning that the newest XBox was built with backwards compatibility for the original XBox games (though that just kicks the can down the road).
where did you get all that design experience from?
Several University degrees (for the theory), and many projects (for the practical). I am still learning though, since the electronics design space is huge - I usually learn something new every day.
Ohh wow, okay - what do you mean by several university degrees? Multiple master's? Doctorates? And how are you learning every day? With papers?
I really enjoy watching your videos and want to get involved in the field as well
Multiple MSc (done concurrently), and I'm currently in a Ph.D. program (i.e. doing research in a related field). Then there are papers (which don't actually teach you that much, but often have little nuggets of useful ideas), textbooks (which I have found the most helpful), blog posts / articles, courses (there are a lot of good architecture courses on youtube for free), conference proceedings (Hotchips has a lot of great talks - SGI presented a talk on the N64 RCP back in 1997), open source projects (dissecting code / examples on github), etc.
A more concrete example is that when I was doing my B.Sc, Intel published several papers and the code for their Smoke game engine architecture. I learned quite a bit from those papers and their code, including a C++ style that I still use, design patterns, low-level multi-threading, etc. And then I started to implement my own game engine with what I learned (that was before even UE4 was released). Learning the basic theory though from courses and textbooks did really help me when trying to understand their architecture, and then reading other textbooks on design patterns helped even more.
As for learning things every day, it could be as simple as coming across some small bit of information that puts large amounts of information into perspective. For example, learning that the Xilinx FPGA tool will automatically re-align timing regions if you manually instantiate a BUFG primitive. And then timing greatly improves on many of the other projects I was working on as a result. (the tool doesn't always infer that a BUFG is needed)
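For anyone curious, the manual instantiation is only a couple of lines (BUFG is a Xilinx primitive with ports I and O; the surrounding signal names here are made up for illustration):

    wire clk_global;

    // Drive the clock through a dedicated global buffer instead of hoping the
    // tool infers one - this is the case where timing noticeably improved.
    BUFG u_bufg (
        .I (clk_from_pll),  // clock as it comes out of the PLL/MMCM
        .O (clk_global)     // buffered clock, now on a global clock network
    );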
Thanks! I wish I had more time to upload videos.
It would depend on what part of the field you want to get involved in, and what your background is. For example, if you want to get into PCB design, but don't have an EE background, it's going to be much harder. Architecture and FPGAs are much easier to get into without the EE background, because digital circuits are much simpler than analog (designing an FPGA PCB uses a lot of analog knowledge for power supplies, loads, signal integrity, etc., whereas using an FPGA dev board takes care of all of that for you). In other words, the closer you are to software, the easier it is to get into. That's not to say that it's impossible, but you would be better off taking at least an intro EE course (there are a lot of free ones on MIT OCW though - you don't need to do it for a degree, but you have to take it seriously).
Oh wow, very impressive. I’m at my hard limit with only one MSc at the moment.
May I ask which degrees did you complete?
I am currently studying Computer Engineering. Is that too little for that field?
What was your approach to plan your future?
Is experience working at companies necessary when I want to get into research?
Not really, most of those had a lot of overlap (i.e. take 2 extra courses and get another degree). I wouldn't recommend doing multiple degrees at once without having significant overlap (that's just too much work). Often however, many degrees will have "electives", where you can substitute required courses for each degree in those slots. Note that you pretty much need to get a PhD though to do academic research (you can do R&D research in the private sector without one). And if you do that, then you want to make sure you find a University with faculty members that are doing research in a field you are interested in.
A few levels of Physics degrees, Applied Mathematics (numerical stuff - i.e. what's needed for HPC), and a few engineering degrees (one of which is Computer Engineering). So yes, Computer Engineering is very helpful here - digital logic falls squarely within that sort of program.
For a future approach, I didn't really have one - I sort of went with the flow. I knew that I wanted to teach at a University and do research, so I followed the simplest path to get to that goal. And in terms of working for companies, that's not required to do research, though many people in academia recommend getting some experience there (many companies have paid summer R&D internships). I can't really give you much advice there though, since I came from a science background (physics) and not engineering, so all of my non-university research experience is with government funded laboratories.
Thought about starting a patreon?
I have considered it, but I don't think it would make sense until I have around 1k subscribers. Though I am not sure that I would have enough time per month to spend on this project, so I'm not sure if that would be a good idea / option. Thanks for asking though!
No Coleco?
It wasn't relevant for 5th and 6th gen, so I figured it was better to leave it out than add an extra line to the table making everything smaller and harder to read. There is a MiSTer Coleco core, so that's already covered.
There was a s478 Pentium 3?
Nope, I meant to say Pentium 4 for the entire video, even though the xbox used a Pentium 3 (they are close enough in architecture that there shouldn't be any issues).
@@RTLEngineering Xbox games use `rdtsc` (assuming 733.333..MHz) to synchronize the CPU and GPU. So you need the proper timings or you have to patch games. I'm not sure what changed in Pentium 4, but I'd assume it could have bad consequences.
Also, if you struggle with the 733MHz Pentium 3, you might also struggle with the 2x DSP56k and the NV2A GPU.
I also assume that software solutions (XQEMU in particular) will be fast enough in the future (similar to Dolphin).
A major roadblock in usability of Xbox emulation is that you need the MCPX ROM / crypto-keys - but a FPGA implementation wouldn't change this.
@Jannik Vogel I am pretty sure that you can under-clock a P4 to operate at that frequency, though I would have to assume that there is some other form of synchronization, since the two clocks can become out of sync by a cycle very easily. I haven't probed an xbox to check this, or tried running a P4 at 733 MHz, but there should be a solution other than patching the games.
As for the struggle with 733 MHz, it's not possible to run logic in a low-cost FPGA that high, the maximum global clock that I have seen is on the Cyclone 10 @ around 600 MHz. However, didn't the DSPs and NV2A operate at around 230 MHz? If that's the case, then that is completely doable in the Artix-7.
Eventually CXBX will be fast enough, since the Xbox was an x86 machine. I doubt that XQEMU will be fast enough, since it runs a complete software model for the CPU. That doesn't negate the benefit of a hardware re-implementation though, otherwise the same argument could be made for any of the retro consoles, making this platform moot (i.e. it's not just an issue with the Xbox).
It was my understanding that the major roadblock was getting the hardware components to work together correctly at a reasonable speed (the problem with XQEMU). If the issue were the MCPX ROM / crypto keys, I would think that none of the working / partly-working games in emulators would work at all. Besides, there are ways around that problem, especially if you are able to utilize a DMCA loophole / exception.
@@RTLEngineering Yes - DSP and NV2A run around those clock speeds (I believe 160MHz DSP and 230MHz GPU). I was more concerned with the number of LUTs and internal pipeline states which might be hard to recreate (I could imagine requiring a higher clock rate for naive implementations).
For the record, I'm a XQEMU developer; there are experimental XQEMU branches which reach full-speed (or very close to it) in many games. The main bottleneck for performance is GPU emulation (which those branches address); for users a critical issue is lack of audio emulation (which also exists in experimental branches).
XQEMU does *not* only have a software CPU. XQEMU has supported CPU virtualization (KVM) for more than 5 years, and has lately also been getting better support through HVF, WHPX and HAXM on Windows, macOS and Linux. There's also hardfloat in TCG (CPU interpreter) now, so even the software interpreter on a strong single-core machine will work fine for some games.
Cxbx is a radically different emulation approach which has many drawbacks (such as not being able to support all Xbox games as it depends on code-pattern detection, which won't work for link-time-optimized titles). So if you are going to compare a hardware re-implementation to a software-emulator, your only references should be MAME and XQEMU.
The MCPX / crypto-keys (required for boot-code decryption) are dumpable and known - so they don't block progress. But they are obviously illegal to distribute, and you need to modify hardware to dump them (which end-users can't do); there are also no freely available tools to dump them. So it's an end-user *usability* issue that prevents growth of XQEMU [LLE] compared to something like Cxbx [UHLE / HLE] (among other factors), despite XQEMU being a superior emulator in many ways.
@Jannik Vogel Clearly you have a bias for XQEMU. That sounds like it has made great progress though! I briefly looked at it about 2 years ago, and convinced myself that it would be helpful if I ever tried to do an xbox implementation on an FPGA.
How is XQEMU solving the MCPX / crypto-key problem? Is it relying on dumps? If so, then the usability argument for an FPGA implementation vs XQEMU is identical. I think in either case, dumping the ROMs would be illegal, unless you are doing it on behalf of a digital preservation organization or university.
You do raise valid concerns over an FPGA implementation, but this sort of platform would be the best hope for such an implementation if it is possible. Part of the benefit here is that you could, for example, implement just the NV2A on the FPGA and do the rest with XQEMU.
I'm not sure what your concern with the DSP and NV2A is. In this example, the FPGA is only implementing the DSPs, the NV2A, the DRAM controller (which is small), and the FSB controller. The NV2A GPU was pretty simple, and is actually very similar to the VU cores within the PlayStation 2, so I don't think LUTs will be an issue (sure, it will be large, but there are a lot of LUTs in the XC7A200T). As for the internal pipeline states, I don't think they need to be recreated exactly, as long as the end functional behavior is reproduced (i.e. it follows the spec that the software expected). If the concern is meeting the speed requirement due to the LUTs and pipeline states, that's exactly how you meet the speed requirement - LUT decomposition and pipelining. I think it's possible to do, but the actual feasibility won't be known until it's attempted (a downside to FPGA implementations - you can put a lot of work into them, only to find out that they are too slow or don't fit).
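To make the pipelining point above concrete, here is a toy C sketch (my own illustration, not from the thread): a multiply-accumulate split across two register stages, modeled the way a cycle-based simulator might, so each clock edge only covers one stage's worth of logic rather than the whole multiply-add.

```c
/* Toy sketch: a multiply-accumulate split across two pipeline stages.
 * Adding register stages like this shortens the critical path, which is the
 * idea behind pipelining a design to close timing at a higher clock. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t s1_prod;    /* register between stage 1 and stage 2: product   */
    uint32_t s1_c;       /* register between stage 1 and stage 2: operand c */
    uint32_t s2_result;  /* stage 2 output register                         */
} pipe_t;

/* One clock edge: stages are evaluated back-to-front so each reads the
 * registers written on the previous cycle, like real flip-flops would. */
static void clock_edge(pipe_t *p, uint32_t a, uint32_t b, uint32_t c)
{
    p->s2_result = p->s1_prod + p->s1_c;  /* stage 2: add                 */
    p->s1_prod   = a * b;                 /* stage 1: multiply            */
    p->s1_c      = c;                     /* stage 1: carry c forward     */
}

int main(void)
{
    pipe_t p = {0};
    /* A new operand set enters every cycle; results emerge two cycles later. */
    for (uint32_t i = 1; i <= 5; i++) {
        clock_edge(&p, i, i + 1, 100);
        printf("cycle %u: a*b+c result = %u\n", i, p.s2_result);
    }
    return 0;
}
```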
Hope this becomes a reality
What happens to this
@RTL Engineering
Can you comment on this board: numato.com/product/aller-artix-7-m-2-fpga-module for FPGA game console and retro PC (Amiga, Atari, etc.) emulation? The idea would be to implement a virtual VGA capture card within the FPGA and send the game video directly to the host PC memory (or even directly PCIe to PCIe into the GPU). The PC would be the host, coordinating the emulator and handling all external IO. While interested in retro gaming, I'm also interested in all the other applications listed on their product page and don't really want another standalone device, i.e. I want a general-purpose FPGA solution.
I haven't used the Aller, but it does have the potential to do what you are suggesting. Someone had mentioned a similar platform specifically geared towards crypto-mining; however, I think the Numato one is cheaper, with a larger / faster off-chip memory.
It should be able to easily handle an Amiga (a stock version), as well as any of the MiSTer cores. I also think that you could fit a virtual capture card in there along with the emulation core, since it's a 200T.
I have never implemented a capture card, so I am not exactly sure how that would work. But it should be simple enough to have a framebuffer in the DDR3 which is streamed to the system RAM of a host computer, and then drawn to the screen via some graphics library like SDL or OpenGL (see the sketch after this reply). I don't think that you will have much luck dumping the framebuffer directly into the GPU though.
If your primary interest revolves around using a host machine (either an SBC or PC), then this would be a good choice for an Artix-7. Be warned, though, that getting the PCIE interface working is tricky, although it should allow you to essentially connect it to an AXI bus as a memory master.
Another word of caution. It's not clear on their website if the JTAG connection is via the PCIE connector, or via the unpopulated header. My guess is that it's the unpopulated header, so you would need to solder a connector as well as buy a Xilinx compatible JTAG programmer to use the Aller.
The only other Artix-7 solutions that I know of are stand-alone, which don't have PCIE lanes, though do have a lot more IO (mostly for stand-alone development).
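As a rough sketch of the framebuffer-streaming idea mentioned a few paragraphs up (my own illustration; `fpga_read_frame`, the 640x480 size, and the ARGB pixel format are placeholders, and the real transfer would go through whatever PCIe/DMA interface the board's driver exposes), the host side could look something like this with SDL2:

```c
/* Rough sketch of the host side of a "virtual capture card": copy a frame out
 * of the FPGA board's DDR3 into host RAM, then draw it with SDL2. */
#include <stdint.h>
#include <SDL2/SDL.h>

#define FB_W 640
#define FB_H 480

/* Stand-in for the PCIe/DMA read from the board; fills a test pattern so the
 * sketch runs without hardware. */
static void fpga_read_frame(uint32_t *dst)
{
    for (int y = 0; y < FB_H; y++)
        for (int x = 0; x < FB_W; x++)
            dst[y * FB_W + x] = 0xFF000000u | ((uint32_t)(x & 0xFF) << 16)
                                            |  (uint32_t)(y & 0xFF);
}

int main(void)
{
    static uint32_t pixels[FB_W * FB_H];   /* host-side copy of the frame */

    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window   *win = SDL_CreateWindow("fpga-capture", SDL_WINDOWPOS_CENTERED,
                                         SDL_WINDOWPOS_CENTERED, FB_W, FB_H, 0);
    SDL_Renderer *ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
    SDL_Texture  *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                          SDL_TEXTUREACCESS_STREAMING, FB_W, FB_H);

    for (;;) {
        SDL_Event ev;
        while (SDL_PollEvent(&ev))
            if (ev.type == SDL_QUIT) goto done;

        fpga_read_frame(pixels);                          /* DDR3 -> host RAM */
        SDL_UpdateTexture(tex, NULL, pixels, FB_W * (int)sizeof(uint32_t));
        SDL_RenderClear(ren);
        SDL_RenderCopy(ren, tex, NULL, NULL);             /* host RAM -> GPU  */
        SDL_RenderPresent(ren);
    }
done:
    SDL_DestroyTexture(tex);
    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```

The point of this arrangement is that the FPGA side stays simple (it only exposes a finished frame in DDR3), and the host handles presentation, so SDL could be swapped for OpenGL or anything else later.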
What would it cost, though?
That was stated in the video, though may not have been all that clear. It looks like the cost would be around $400 USD, where $230 of that is going to the FPGA chip alone.
@@RTLEngineering 😬
Do it. Amazing!
Sold! Open up a Subscribe Star account and I would gladly throw you some dollars.
I would, but I am not going to join Subscribe Star or Patreon until I have 1k+ subscribers (arbitrary number, I know). Also, I am not currently working on the project presented in this video (I am working on something potentially more interesting), but it wouldn't feel right accepting donations for something that I am not currently working on.
@@RTLEngineering You are now beyond 1k, good going!
I will consider it. My main reservation is that I can't guarantee regular content, which goes against the terms of those platforms. But I can look into what's involved in the process.
@@RTLEngineering In any case I wish you the best you are doing good work here.
The current MiSTer is based on a mass-produced (and significantly discounted for the educational market) board. Will this "new" board be significantly cheaper than the existing low-cost MiSTer board?
I am aware of why the MiSTer uses the DE10-Nano. As mentioned in the video, the new board would be around $400 USD (~4x more expensive than the DE10-Nano). So it wouldn't exactly "replace" the MiSTer, but the goal would be to have backwards compatibility, so that those who want to use cores which cannot run on the DE10-Nano can still use the original MiSTer cores.
As stated in the video, you basically have 3 options for a system like the N64: 1) Don't emulate it and give up, 2) buy a more expensive emulation platform with a more expensive and beefier FPGA, or 3) stick with buggy software emulation. Many people may not care about being able to run an N64 FPGA core though, and will be able to stick with using the DE10-Nano (the same argument can be made for the PS2 / Dreamcast, etc.).
What is the point of implementing an FPGA version of PS2, Wii and up? They are designed around a frame buffer, unlike the cycle accurate timings of prior generations, so the benefit of FPGA seems wasted on it, when modern computers can emulate those consoles perfectly.
That's a good question. Though the PS1, N64, and Sega Saturn are also framebuffer designs. So by that logic, why bother trying to get those to work on an FPGA? Cycle accurate timings aren't needed for any of those. Although, many devices still struggle to emulate the N64 or don't do so correctly (there's a similar issue with the PS2, where it requires a relatively powerful system to run). Perhaps this is a more philosophical question though... Why bother with an FPGA at all? You can do cycle accurate emulation of a C64 on a high end x86 machine for example (a low level, timing accurate emulator). After all, MAME has pretty good compatibility and it's not LLE.
I think your question has an academic answer, a practical answer, and a psychological answer.
For the academic... software preservation. We want to be able to run every piece of software ever released (perhaps ever written) for archeological reasons. While emulators are a tool that can help, as the system complexity increases, it becomes harder for a single computer to replicate all of the required behaviors, especially if you want them to be true to the original hardware (i.e. rendering a PS1 game at 480i with all the flaws and quirks).
For the practical... cost. If you need a $1200 PC to run a PS2 emulator, vs a $600 FPGA system, wouldn't the FPGA system make more sense for those who do not have a $1200 PC?
For the psychological... there's something about running on actual hardware vs a software emulator. Maybe it has to do with the fact that it's the only thing running. I for one would prefer to play PS2 games on an original PS2 rather than an emulator. Same thing for N64, etc. And an FPGA implementation doesn't feel any different to me than the real thing, unlike an emulator. The same thing applies even more so with a computer... (which in this context has applications to the SGI systems like the Indy)
Another reason to go FPGA vs software is dealing with "software rot". That has implications for the academic side, but also for other cases.... Here's an example for you:
Let's say the PCXS team vanishes and decides to stop working on the emulator. Microsoft releases Windows 11, and PCXS doesn't run on it, nor does it compile. Then someone has to go through and update the PCXS code to make it compile and run on Windows 11. If the team vanishes, then that may be a nearly impossible task.
If it were implemented on an FPGA instead though... there may be no reason to update the FPGA core as long as it runs on a specific device. So all of the MiSTer cores, for example, will always run on the DE10-Nano, as long as you can still get that board. The linux support side could have rot issues; however, the need to update the linux kernel on the ARM processor is not the same as for a desktop PC. Who cares if MiSTer is running linux 2.0 from 2003? It could have tons of security flaws, but you aren't going to do anything security-critical on it. But if you have to keep your PC running Windows XP in the year 2035 to run PCXS, then you probably don't want to connect that PC to the internet, and you wouldn't want to do anything security-critical (like online banking) on it.
Whether or not software rot is a big enough issue to justify an FPGA over emulation is uncertain. But FPGAs do last much longer than PCs do, and they don't require constant updates that can break massive software dependency chains.
Note that in this case, something like the Playstation Classic would fall into the same category as the FPGAs (though the FPGA will probably be able to run for a longer lifetime than the SoC in the PS Classic). However, it would be economically infeasible to release a software emulation system like the PS Classic for a PS2. The only way to do that... would be to use an FPGA.
@@RTLEngineering Thank you for your very thoughtful explanation. Makes a lot of sense, when you look at it from this nuanced perspective.
I’m curious about the point you made about FPGA lasting much longer than PCs. Is this primarily due to your aforementioned points about software rot and future compatibility issues, or do you also mean the hardware itself has higher MTBF ratings than modern CPUs and SOCs?
I’m just now getting into and learning about FPGA, having just ordered my first Mister kit last week. One thing I’ve been most curious about, is how the system will hold up after years of moderate to heavy use, constantly writing different cores.
I'm glad it made sense. It was a difficult but very relevant question to answer.
So FPGAs typically do not suffer from the same software rot issue as PCs, but that doesn't mean they can't. For example, if the MiSTer github repo gets deleted and all of the source code is lost, then it can never be ported to another FPGA when it reaches end-of-life. Luckily though, porting FPGA code is much easier than software (if written correctly, which most of the MiSTer HDL is not - though it's still probably easier than software). The reason for that is that there are only a couple of things that can go wrong with a port... IO interfaces (which are external to the core), weird hardware quirks (things like internal tri-state busses, which FPGAs don't have anymore), and vendor specific IPs (using an Intel macro to declare part of the FPGA to use rather than inferred code - these can often be translated though). For the IO interfaces, the MiSTer cores all use a HPS bus coming from the ARM processor, which I believe was changed to AXI on the Cyclone 10. The Xilinx chips all use AXI, and I think the Microchip SoCs use AXI or AHB. Anyway, all of those buses can be converted to HPS using an intermediate core, so that's a lot easier than trying to get a bunch of dynamically loaded libraries to recompile or run on a new linux kernel.
In my experience, the MTBF for FPGAs is far higher than CPUs and SOCs mostly because of their use mode. The first thing to keep in mind is the expected device lifetime. FPGAs are designed to last for 20+ years, and the automotive and aerospace grade ones are designed for a much longer lifetime. Your desktop CPU on the other hand is only meant for a 5 year lifespan, same thing with your GPU, and even the raspberry pi (general SoCs are designed to last longer, but the pi is cut down in cost, so it's likely that the MTBF is much shorter).
But I think the more important aspect is related to software rot... you wouldn't want to use a PC from 2004 to do emulation today when you can have a new one that's much more responsive. The FPGA on the other hand doesn't have quite the same concept, because the design was implemented to do exactly what it did when it was designed. That's why you can still buy brand new products with FPGA designs from 20 years ago, and they work exactly as advertised.
As for the FPGA holding up with reprogramming, don't worry about that. The FPGAs use the same type of SRAM as in a CPU cache, and the CPU cache is written to millions if not billions of times per second. The part of the MiSTer that will probably fail first is the SD card if you continually write / save games to it. There are some FPGAs like the Max 10 which have internal flash for the configuration, and those will have a limited number of flash program cycles. But those will still have a virtually unlimited number of power on cycles (every time the FPGA is powered on, the flash is read and the FPGA configuration SRAM written). A similar thing happens with the DE10 on the MiSTer, except the FPGA SRAM is written via the ARM within the FPGA by reading from the SD card. If you use the USB Blaster interface to program the FPGA, then it's writing the FPGA SRAM via the USB.
The one thing you do have to worry about is IO constraints, though. Make sure you don't set an IO to be "out" when something else is driving "in", and make sure that the voltages on the IO are correct (both in the configuration and for anything plugged in). Those are easy ways to destroy the IO buffers / transistors. Other than that, I suspect that the other components on the DE10 board will fail before the FPGA itself.
rofl, already done: Replay 2 by Mike J
Replay 2 has very few cores to mess with. Do you have a link to the PS2 core WIP page? Can it play MiSTer cores? If it can, why has he not ported these cores over?
The Replay2 uses a Spartan 7, and I don't think the claimed 1.5 GB/s DDR3 would be sufficient for an N64 with a 1080i framebuffer. If that person is able to get a PS2 core working on that at full speed, they should get a medal, because 1) it won't fit, and 2) the memory bandwidth is too low. There was a reason that a x64 SODIMM was chosen, as well as the Artix-7. As it is, it's likely that only the Emotion Engine would fit in a single Artix-7, let alone a Spartan-7 at 2/3 the size (if it even uses the largest Spartan-7).
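For a back-of-the-envelope feel for the bandwidth concern above, here is a small C calculation (my own rough figures and assumptions, not from the thread): 1080i scan-out, the N64's RDRAM peak, and a write of the upscaled frame already land near the 1.5 GB/s peak before accounting for DDR3 efficiency losses on small, scattered accesses.

```c
/* Rough bandwidth tally (my own assumptions): 1080i scan-out, an assumed
 * write-back of the upscaled frame, and the N64's RDRAM peak, versus a
 * 1.5 GB/s DDR3 peak. Real DDR3 throughput on small random accesses is
 * well below peak, so the margin is thinner than this suggests. */
#include <stdio.h>

int main(void)
{
    /* 1080i scan-out: 1920 x 540 pixels per field, 60 fields/s, 4 bytes/pixel */
    double scanout_read  = 1920.0 * 540.0 * 60.0 * 4.0;   /* ~0.25 GB/s */

    /* assume the scaler also writes the upscaled frame it later scans out */
    double upscale_write = scanout_read;                    /* ~0.25 GB/s */

    /* N64 RDRAM peak: 9-bit bus at roughly 500 MT/s -> ~0.56 GB/s */
    double rdram_traffic = 500e6 * 9.0 / 8.0;

    double total = scanout_read + upscale_write + rdram_traffic;
    printf("scan-out read : %.2f GB/s\n", scanout_read  / 1e9);
    printf("upscale write : %.2f GB/s\n", upscale_write / 1e9);
    printf("RDRAM peak    : %.2f GB/s\n", rdram_traffic / 1e9);
    printf("total         : %.2f GB/s of a 1.5 GB/s peak (before DDR3 overheads)\n",
           total / 1e9);
    return 0;
}
```

Even this optimistic tally leaves little headroom once refresh cycles, row misses, and any CPU-side traffic share the same DDR3.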
@@RTLEngineering Not sure where you're getting the information on the Spartan 7. Mike J stated they are using the Ultra 96 platform, as the Spartan 7 is end-of-life; you must be looking at an out-of-date part of the website, not the current 2019 plans for Replay 2.
@Matthew Langtry The information on the Spartan 7 is directly from the Replay 2 website: www.fpgaarcade.com/replay2/
The information regarding the memory is also on that website.
If the plans for the Replay 2 have changed from what is on the website, there is no way for me to have known that. Feel free to update me with the currently planned FPGA.
If instead you were asking about the information specific to the Spartan-7 vs Artix-7, that can be readily found within the Xilinx datasheets.
@@RTLEngineering The website is a little hard to navigate; here are more details of the ultrascale design: www.fpgaarcade.com/replay2-first-mock-up/ I was targeting the S75 or S100 in Spartan-7, which is as large as the second biggest Artix-7. Using a 64-bit memory system, the performance is very similar to the Artix-7. Pretty much all the cores which are available on Mister/Mist will be available on R2 - most on R1 shortly. The reason for shifting to the ultrascale was to get the tight CPU integration for hybrid emulation, and potentially the GPU as well, which will be handy for the Amiga etc.
Do an internet archive search
At archive.org
For ique source code
You’re welcome
Thanks, but from what I have heard, a lot of those docs were released over 10 years ago. I'm not going anywhere near them for legal reasons though.
RTL Engineering
Understood, lol
I am giving this video a thumbs down because it gave me bad news about an N64 FPGA core
Reality often sucks, because what we wish to be possible and what actually is possible differ. I can't tell you how many times I have run into this issue as an academic researcher.
At least I am giving you hope that an N64 core is possible on another FPGA, unlike Kevtris (the engineer who worked on the Analogue FPGA cores), who claimed flat out that an N64 core would be impossible. Also, to put things into perspective, the FPGA in the MiSTer was designed back in 2009, so it's 12 years old. And at the time, it was 20% slower than the Xilinx 7-series, which was designed in 2008. There were several poor choices made along the way by Altera, Intel, and the MiSTer community that led to the bad news you spoke of (i.e. the community could have chosen a better FPGA to use; there are even ones now around the same price as the DE10-Nano that could run the relevant part of a PlayStation 2, or a Dreamcast).
Unless you were referring to the bad news being that other people are working on N64 cores, where one person supposedly has Mario 64 and Golden Eye running on an Artix-7?
@@RTLEngineering just to be clear...I didn't give the video a thumbs down.
Too complicated, too expensive.