i wrote a compiled imperative language in rust :) writing a compiled language is probably the most educational project there is, because you learn (on a deep level):
- how memory works (pointers, allocator kinds aka stack, region, heap, temp, gc, etc, and how stack frames work)
- how compilers work (lexer -> parser -> pre-IR for semantic analysis -> compilation to IR -> asm -> object code -> linked into an executable)
- how to maintain a larger-scale project (with potentially 10k+ lines of code)
- how to structure a larger-scale project (project file names, splitting code into different files/functions, etc)
- best practices in terms of data structures (Array vs HashMap vs HashSet, null-terminated string vs sized string, etc)
- abstract syntax trees and grammars (including different parsing algorithms for operator precedence if you write your own parser (as you should), such as recursive descent or shunting yard)
- how to debug potentially thousands of lines of code
- how modern language features actually work under the hood (for example, loops and conditionals being compiled into just jumps, monomorphization vs dynamic dispatch for generics on structs and/or functions, capturing lambdas and their environments, etc)
- how to use tooling to your advantage
- how to effectively test your code
among others. i think there is no project that ticks more boxes at once than writing a compiled language. plus its really fun seeing it slowly expand and being able to do more things over time :3
Do you have any recommended resources for a project like this? I'm currently going through the 'Writing An Interpreter In Go' book (which I will follow with its sequel 'Writing A Compiler In Go') and I was wondering if you had some more insights. Thanks! :)
@@HumaniNihil-c8k I'd recommend just looking into the parser part first. Start by poking around with existing languages (eg Python's "ast" module) and learn about how a stream of source code gets tokenized, then converted into a syntax tree, and finally executed. Most languages "execute" by converting into a series of instructions (machine code, or a higher level bytecode), but for learning purposes, directly interpreting the Abstract Syntax Tree is a lot easier to get your head around. Design your own language with a very simple grammar. Parse it into its corresponding source tree. Then run it by taking the root node of the source tree and recursively doing what it says! Mastering that will give you a great insight into how programming languages work.
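A minimal sketch of that last step, with node shapes (tuples tagged "num", "add", "mul") invented purely for illustration: the whole interpreter is one recursive function over the syntax tree.

```python
# A minimal tree-walking evaluator: each node is a tuple of
# (kind, *children); evaluation is just recursion from the root.
def evaluate(node):
    kind = node[0]
    if kind == "num":  # leaf: a literal number
        return node[1]
    if kind == "add":
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError(f"unknown node kind: {kind}")

# (2 + 3) * 4, written out as a syntax tree
tree = ("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))
print(evaluate(tree))  # → 20
```

Once this feels natural, swapping the hand-built tuple for the output of a real parser is a small step.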
I like to follow, and implement when possible, this way of sharing knowledge when introducing a topic: - No frameworks - No add ons - Code something practical, pause, ask questions, implement the answers. Very nice explanation Dr. Tratt
A fun exercise - I always thought such things would be purely academic, but I wound up writing a custom sort of query language for a work project. It was to allow non-technical or less technical staff to write their own custom tests for some tooling I'd created. Devilishly hard to actually get right end to end - huge amounts of considerations around data types, parsing, things like brackets, ANDs and ORs. But I got there and it works well.
I mean... that's nice, but wouldn't it be easier to actually teach them the very basics of a high level language? Maybe write a custom library that's easy to use, but still most of heavy lifting done by the tried and tested language?
@@jakistam1000 Nope, not at all. There's a reason DSLs, or Domain Specific Languages, are very useful. Even for someone who programs, they can be very useful. If you've ever used SQL or jq to query a database or JSON, you've used a DSL.
@@Chex_Mex While I have some VERY basic knowledge of SQL, I mostly employ the approach "there's always a Python library for this". Maybe it's a habit, maybe a bad one (?), maybe just the consequence of no need to optimize for speed - but it's kinda working for me so far. (I'm still learning, though - I used to write everything myself. At work I transitioned somewhat out of necessity, but in my off time I still prefer that, even if I get worse code.) The point is, it's difficult for me to believe that writing a programming language is a better solution than teaching an existing one. You're always going to have more tools and flexibility at your disposal, at relatively low cost (imo). I mean, if you say that DSLs are "better" for some applications, I believe it, but I don't really understand it.
@@muzzletov "There is a wonderful happy little accident at line 102. I am sure you meant to put a close parenthesis instead of a closed square bracket, didn't you now? Don't worry, this sort of mistake happens to everybody! Let's not even call it a mistake, it was just a twitch of the finger. You are such a great programmer, I really mean it!".
Small piece of advice: write out full variable names. For somebody just starting out in programming, toks and ev don't mean anything. tokens and eval are more expressive, and it's just a few more keystrokes.
@@ZT1ST Of course the early C standard library functions names were limited by what the early linkers could handle. These days we don't have to live with such limitations.
It is short-sighted to limit your programming style to what is easily understood by beginners. Those people will either quickly become experienced, or they will leave the field. Either way, there is no sense in catering to them.
Nice quick introduction to writing a programming language! Although it has an inception problem, kinda like the one I wrote for a project in 1978. It was called SIMPLE (Student-Implemented Programming Language Experiment), and it implemented a subset of the BASIC language in BASIC, along with a bare-bones operating environment to save and load files.
A long time ago, I made a simple stack-based esolang (esoteric programming language) called "Noy", because it used every letter of the English alphabet except Y. (Alternatively, I also called it Alphabet Soup.) It had two stacks: a main stack and a temp/scratch stack that you could transfer to and from. Operations were a single letter each, attempting to be mnemonic (all I remember was "k" for "kick to temp stack" and "u" for "unkick from temp stack"). Digits were also a single letter each ("a" through "i"), and to get bigger numbers you'd have to just operate on the digits you pushed. Every program, therefore, was just a string of letters, without any symbols, numbers, whitespace, or line breaks. I made an interpreter for it in JavaScript, C++, and PHP (JS and PHP were my go-to languages at the time). Totally useless, but also really fun to make 😁
that sounds super fun! I definitely want to make some esolangs
20:35 "No one uses reverse polish notation in a real language"... Forth begs to differ! A quirky but really fun and powerful language once you get your head around it 😆
@@Acorn_Anomaly It's on TIOBE's top 100 languages, it comes with Debian, EA released several games written in it (Worms? and Lords of Conquest, among others). What more do you want?
This is a lot closer to real, practical language development than the more academic treatment you'll usually see online that spends forever talking about parsing and leaves doing anything interesting as an exercise for the reader. There is so much you can do quick and easy with this kind of approach that will give you wonderful results. Even something like Java is way more complicated than what people actually need.
In my experience I actually never really see this kind of programming. This is fine for simple stuff, but for most cases a standard regex will work better (albeit a little slower). If you want something "serious", you're going to want something more purpose-built, like ANTLR, for your lexing/parsing.
@@lylerolleman1564 Parser generators are the worst way to handle this, and tend to indicate a very small amount of thought was put into a language's design. Write more complex parsers by hand if you really need them (you don't). They're trivial.
@@theshermantanker7043 Crafting Interpreters is one of the best resources for learning the conventional wisdom around basic compiler construction, but that's a low bar as far as I'm concerned. You'll learn a lot more practical knowledge from building a FORTH and playing with macro assemblers.
@@MisterFanwank I've done them by hand. It can work fine for simple stuff, or for stuff that really needs to perform well, but for complex (though not THAT complex) stuff, a parser generator works very well, provided of course you understand the pitfalls (same as everything else). That all said, you also seem to be assuming that the developer is the one in charge of the language design. I have never actually seen this be the case in the real world. Far more likely you've got a designer or client who wants all sorts of shinies and will change their mind every 4 hours. In this scenario, I very much appreciate having to maintain 50-100 lines of grammar, not 2k lines of code.
Interpreter being interpreted by an interpreter... - C/C++ programmers fuming - integrated circuits screaming in fear - memory controller in shambles - LLVM crying
:) I actually made an expression evaluator in C# quite some time back, but it's an actual infix evaluator. I parse it in infix and don't convert it to post- or prefix :) It's not really "that" efficient, but it actually creates a custom expression tree which can be executed / evaluated as often as you like with changed variables. The actual insight and main idea was to look at the issue in reverse. Instead of thinking about the highest priority first, I was looking for the lowest priority first and simply split on that. The only issue was brackets. So I made a bracket substitution step first and the brackets were evaluated separately. The whole thing simply ran recursively, split on the lowest priority first and evaluated the individual parts again. At the end were numbers, variables and functions. It's quite extensible as you can add custom functions as well. Also, once the expression tree was parsed, the expression could be evaluated quickly. Each operation was just a class instance. In the end the only fundamental operations were addition, multiplication and power. There were additional unary operations like negate and reciprocal. So a subtraction is actually an addition with the second argument negated. A division is a multiplication with the reciprocal of the second argument. Later I rewrote the whole thing to include boolean operators and boolean logic, and it turned into my LogicExpressionParser. It all came out of a question that was asked on Unity Answers years ago about a parametric Lindenmayer system. The L-system is actually way simpler than the parsing of the parametric expression :P It's all on GitHub (MIT license). Just look for "Bunny83 LogicExpressionParser". The whole parser is a single C# file in just under 1000 lines of code. The only thing that it does not handle well is an actual unary minus. It needs to have parentheses.
So you can not do "5 * -3" but you have to do "5 * (-3)" I made some WebGL examples, one expression parser example that essentially draws a line sequence of I think 2000 segments according to the expression entered in realtime and a somewhat graphical calculator that manipulates the height of a square mesh.
3:42: You might want to use .isdecimal() instead of .isdigit()!! The latter actually returns True on "¹", "²" and "³" even though you cannot convert them via int() 🤷♀ while the former only returns True for 0-9!
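The difference is easy to check in any Python 3 interpreter:

```python
# str.isdigit() accepts superscript digits that int() cannot parse;
# str.isdecimal() accepts only characters usable in base-10 numbers.
print("7".isdigit(), "7".isdecimal())  # → True True
print("²".isdigit(), "²".isdecimal())  # → True False

try:
    int("²")
except ValueError:
    print("int() rejects superscript two")  # this branch is taken
```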
I love this! One thing to watch out for... I think that interpreter only allows you to have one while statement in the program. If there are two (even not nested), the "end" handler for the second while loop will jump back to the top of the first while loop.
→ 7:45 Reverse Polish expressions for arithmetic, but then infix notation for assignments? Not quite consistent. (But I guess otherwise you'd need some way to differentiate the variable being assigned from the ones you evaluate.) → 8:34 Throwing away the `=` sign looks strange. So »x = 2 4 +« is equivalent to »x + 2 4 +« or even »x y 2 4 +«. Maybe better to use (name, expr) = split(" = ") here?
I strongly recommend the two books by Thorsten Ball, "Writing an interpreter in Go" and "Writing a compiler in Go"! These two make interpreters and compilers understandable, while implementing a full language.
Even if no one ever uses the programming language you create, the process of building it is worth it. Writing an interpreter or compiler forces you to really understand the language you're working in, not just in theory but in practice.
@@JohannaMueller57 I don't see why not. It's conceptually simpler than normal notation in every way. The majority of the world's languages even use a subject-object-verb word order, so it's not even unnatural. It takes some relearning to instinctively understand it, sure, but at least you can always understand it by applying very simple evaluation rules, unlike infix notation, which requires some very complicated evaluation rules, to the point that developers often have to look up operator precedence, and proactively use parentheses to avoid confusion. I've read a fair amount of Forth code and it's not difficult to grasp, it's pretty intuitive as far as source code goes.
@@JohannaMueller57 Which one requires more knowledge to process? You have to separate what you're used to from what's simplest. Of course you're used to the second. Does that mean the second is simpler? No. Does it mean that the first will still be difficult to process even when you're used to it? No. I can understand a b * c d * + just fine because I'm used to Forth and so I know how the operators and operands relate. I still process a*b + c*d faster, but that's only because I've seen that exact form a million times. If I had seen the Forth form a million times, I would be able to process it just as fast if not faster.
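The "very simple evaluation rules" really do fit in a few lines. Here is a sketch of a postfix evaluator over space-separated tokens (the token format and operator set are my own choices for illustration):

```python
# Evaluate a postfix (RPN) expression with a single stack:
# push operands, pop two on each operator, push the result.
def rpn_eval(expr):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack.pop()

# a*b + c*d with a=2, b=3, c=4, d=5
print(rpn_eval("2 3 * 4 5 * +"))  # → 26
```

No precedence table, no parentheses: the order of the tokens is the order of evaluation.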
I wrote a stack machine called tapescript for embedding access controls in distributed applications/data types. It compiles to a byte code that is then interpreted, and it has a bunch of useful tools and advanced cryptography implementations.
Like many here I tried a few times, though I used Bison/YACC because I was at university, we were learning compilers, and I decided to go a step ahead and start learning Bison and all that. I also uploaded it to SourceForge. This was for a strategy game where you would write a script and feed it to the game, which would execute it and play the game until the end; then you would rewrite the rules to handle events, attacks and invasions and keep improving the script. Writing the parser itself was the easiest part, though, once Bison clicked. Unfortunately (as with most things) once I reached the point of writing graphics code I gave up.
I've done this to create my own functional programming language. I wrote it in C to run fast. Rather than use dirty functions like split I just spin the file in a loop so the loop gives me one character at a time. This then goes to a select statement where I can do something for each character, or for groups of them in some instances. I use this to handle the basic syntax like separating out words that mean something, your string literals, your various symbols and in some instances branch off according to the character before. The idea of doing it like this is you can accomplish quite a lot in just one loop of the file. It gets a bit more complicated as one needs to process brackets and ensure execution is done in the right order. The system then goes on to parse the data several times for the various things one has to do, e.g. like handling higher level issues like function calls in the language. Anyway, it was great fun to do and my new language has been put to useful work in running programs on a micro controller and using OTA updates. Compile times are measured in milliseconds.
That's crazy! Nice work! I'm really curious though: How long did it take you? Do you feel like you learned a lot of things and the experience was worth it? I'm now considering writing one in C too since I'm already learning C at school
@@vladimirnicolescu1342 It's about 5000 lines for the compiler and about the same again for the rest of it. I built it in stages and had some experience of trying it once before. My first attempt was to use a split function, but soon realised it was a bad idea.
One of my C tutorial books had a problem where you created SML (simple machine language). It was a fun project to work on and extend. It reminded me of programming the TI-58 when I was a kid. I have nothing against interpreted languages, I learned how to code in one, BASIC.
I was hoping to see something written in Bison / YACC. Many years ago I decided to write my own BASIC interpreter in Bison for Linux. I got it to the point where it’s able to run several of the programs in the book of “BASIC Computer Games” either as-is or with minimal changes; the features I hadn’t got to yet include handling arrays and graphics commands. Adding X11 graphics is hard though, so I put the project on indefinite hold.
Making programming languages has always fascinated me as I always thought "But don't you already need a programming language to make one. And how is your programming language a language if it needs the original language to work?"
The base programming language is basically your interface with the CPU/ALU. Then you create wrappers or abstractions using programming languages for each layer. If you want to learn more, search for Nand2Tetris. It's an amazing project that shows how a NAND gate and clever abstraction can be used to create an OS with a Tetris game...
The language is written in itself. Oh, the first iteration of it will have to be written in some other language, you can use assembly if you want the bragging rights. But then the first goal is to get a compiler that can compile itself. After that it's just adding features. :)
i made a programming language out of english words and numbers. any words it doesn't know, it just ignores. its purpose is to make 2d drawings, or 3d models of things you describe, with or without motion, with or without text and/or audio captions. it has dolists, loops, paths, sequences, and transforms. i recently added sound and conditional flags with switches.
I once wrote a DSL between 9 AM Friday morning and 6 PM Sunday. I was in a fury from beginning to end because a delivery failure put my project at risk. My boss and I eventually got a patent for the interpreter. I'm emulating a Control Data 160-A for a computer museum and wrote a simple assembler to support testing. The assembler is a dictionary mapping instruction names to tiny code generators. It works a treat.
Years ago I wrote a stack based expression evaluator as part of a data analysis program used by others. For that same system of software I used Yacc and Lex to create a simple language that was used to guide the installation of the software. I am currently implementing a version of FORTH that will run on some older microprocessors.
My favorite toy project I ever did was defining a pseudo assembly language with like 2 instructions, and then implementing a few simple functions in it. I wrote an _incredibly_ slow recursive factorial function with it.
Just an FYI for anyone who might want to make a programming language: parsing and evaluating expressions are actually the easiest parts of a compiler/interpreter, even though academia tries its hardest to make it seem like they're not.
Just finished a PEMDAS algorithm for the text parser for a shell I am working on, and it is definitely an interesting problem to solve. Since some operators hold equal precedence, I ended up using a 2D array to hold the operator strings, with the 'Y' axis being the precedence and the 'X' axis being each operator at a given precedence level.
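One possible reading of that 2D-array scheme, sketched below: operators grouped by precedence level, with the parser splitting on the rightmost operator of the lowest level first and recursing into the halves. The names and the restriction to spaceless input are my simplifications, not the commenter's actual implementation.

```python
# One row per precedence level, lowest-binding operators first
# (the "Y axis" of the table); each row lists the operators
# sharing that level (the "X axis").
PRECEDENCE = [["+", "-"], ["*", "/"]]
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def parse(expr, level=0):
    """Split on the rightmost lowest-precedence operator first."""
    if level == len(PRECEDENCE):
        return float(expr)  # no operators left: a bare number
    # rightmost operator at this level, so left-assoc ops group correctly
    i, op = max((expr.rfind(o), o) for o in PRECEDENCE[level])
    if i <= 0:  # not found, or a leading sign: try the next level
        return parse(expr, level + 1)
    return OPS[op](parse(expr[:i], level), parse(expr[i + 1:], level + 1))

print(parse("1+2*3-4"))  # → 3.0
```

Operators of equal precedence fall out naturally because the whole row is scanned before choosing the split point.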
I've done 3 small programming languages. The first one was in college for the compiler class final project - but I went way over and beyond what was needed. The school used it as a teaching language for years after that. the second was a scripting language for controlling the process of building an application, running the unit tests, and building the distribution media way back before there were such tools available online or as open source. That language and tool was used for many years by several companies in the area and I think there's one small company still using it 30 years later (it was written for Win95, it still works today - go figure). The third one was just recently and it is a domain specific language for an expert system I built at work. Off and on I've dabbled with another language that I think is an embodiment of the saying "just because you can doesn't mean you should". It has some interesting traits but I've never really seen where they'd be that useful.
Nice video! Seeing a nice little interpreter in Python is super refreshing and fun bc my brain is so fried rn. I'm currently trying to make a lang with a tiny syntax and a really algebraic type system... it's insanely hard but I do have a roadmap of things that I need to write so maybe in like 3 more years it will be real
I use entr (eradman software) for instant feedback from the terminal. It's great for TDD-like development or leetcode/beecrowd checking. It would make the video even smoother: just a Ctrl+S and there's no need to re-type the python3 command.
For a programming language to be useful, you need: 1. PC (program counter) 2. Conditional branching 3. Indirection (pointers) With those, you can implement loops, variables, and arrays. Then, move on to stacks, queues, and lists. Operator precedence isn't that hard to do. Well, Dijkstra's algorithm (shunting yard) is hard to understand. I personally used a 3-stack algorithm; 2 stacks if you don't mind right-to-left parsing. I don't understand why Tiny BASIC has FOR loops but not WHILE loops. From experience, I can say that implementing WHILE/REPEAT loops is easy. FOR loops may be easy to compile, but not to interpret! Tiny BASIC is surprisingly close to assembly language, imo; I'm surprised that not more people are working on it!
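For reference, a minimal 2-stack pass in the spirit of the shunting-yard approach mentioned above, converting infix to postfix (the operator set and space-separated token format are illustrative assumptions):

```python
# Shunting yard, two-stack variant: an output list and an operator
# stack. Higher-precedence operators are flushed to the output
# before a lower-precedence one is pushed.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(expr):
    out, ops = [], []
    for tok in expr.split():
        if tok in PREC:
            # pop operators that bind at least as tightly (left-assoc)
            while ops and PREC.get(ops[-1], 0) >= PREC[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            out.append(tok)  # operand goes straight to the output
    return " ".join(out + ops[::-1])

print(to_postfix("a * b + c * d"))  # → a b * c d * +
```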
WHILE/WEND was never part of the specification (ECMA-55, Minimal BASIC, 1978). I guess it was introduced with Microsoft Quick BASIC. There is a DO/WHILE loop in ECMA-116 (1986). Tiny BASIC (as the name suggests) uses only a subset of the full language. Take a look at how Microsoft BASIC implements FOR loops on a Commodore 64, for example. It breaks the specification in many ways. It's pretty easy to interpret, and you would have to emulate the interpreter's behaviour when compiling it.
Using RPN is not cheating, it's doing it right. :-D Just saying. Loved my HP48sx. Everything is so much more natural and simple, and clear, with RPN, it's not cheating. 😀
Forth uses reverse polish notation, so i would imagine Forth programmers might take issue with the "no serious programmers use reverse polish notation" quip. 😂 This video is actually very interesting. Thank you for doing it!
I didn't call it reverse Polish. It was postfix Polish. I also noted that function notation func1(arg1,arg2, arg3) is like a prefix Polish, but Polish notation would not have parenthesis.
I am actually currently in the middle of defining a general-purpose programming language and implementing a compiler for it as a proof-of-concept. Writing the parser/interpreter is the easy part. Writing the evaluator is the real nightmare. The hardest part seems to be evaluating function calls.
What kind of problems are you running into? I remember it being a little bit of a puzzle but nothing too hard. Implementing the call stack and stack frames helped facilitate it all in the end a lot.
@@Clank-j6w If you only pass values in and out of functions then a stack-based approach is quite straightforward. The problem is that in my case I have the concept of value ownership (similar to Rust) and functions can take full ownership of a value (a.k.a. sinking a value) or temporary ownership (a.k.a. borrowing a value) which is returned after the function finishes. This complicates things a lot. Also, the language I'm developing is compiled. So only things that can be evaluated at compile-time are evaluated. For the rest of the code I have to generate actual machine code, which I haven't even started. I'm planning on using C++ as an IR, since it suits my needs and it's the language I know the best.
Also want to do a little programming language. In my case it should be a transpiler cause I can't do the low level stuff as I am not a computer scientist just a hobby programmer.
@@muesique I don't believe a transpiler is much easier, since you still have to go through all of the steps of making a regular compiler, but you also have to worry about how to map your semantics to the target language and how to desugar constructs in your language (if you've got syntax sugar). You are doing 95% of the work of a real compiler; only the code generation is different.
@stefanalecu9532 Worth thinking about... 🤔 But at the moment it's much easier for me to do it that way. If you wanna know: I fell in love with LDPL, which is a much cleaner subset of COBOL. But there are issues with errors; the developer has to go some way to catch all the rough edges. Because my C++ is... basic, I want to translate with Tcl to C or even Pascal (which is much easier to read and think in).
I very recently made a stack-based reverse Polish notation "language" for creating bytebeat, where instructions are single characters. I'm still working on it, but it has functions, labels, conditionals, and embedding.
Nice vid. My SwissMicros DM42n calculator is RPN just like my old HP's. Good enough for CERN, bonkers good for me. It's great when someone asks to borrow it for a minute... and then the grey clouds of confusion drift over their face
I wrote a compiler as a project in university. I'm a bit embarrassed to say I used libraries to make my lexer and parser and to do code generation. But I don't think I would have had the time to hand roll them for a language I managed to make Turing-complete
Can you make a video on creating your own database management system? I don't mean downloading SQLite or Postgres and creating a database, I mean actually coding your own database technology from scratch. I want to know the learning path and the recommended languages; it's something I'm trying to look into, but it is really hard to find resources. I've used databases for years and it's a project I'd like to try to work on.
I agree. As soon as I saw the function name was "ev", I lost interest and saw where this was going. The interesting part comes from defining the language grammar and how that grammar gets implemented.
To be fair, one reason to do this, especially if you are planning to write globally accessible functions, is to ensure that you don't have conflict with user defined functions and classes. Though yeah - there is plenty of room to improve in what they wrote, it's understandable why they would do it that way.
I looked up "it's all Greek to me" in Wikipedia; one of the German versions is "rückwärts polnisch" (backwards Polish). So I looked up "reverse Polish" in Wiktionary, and it's "umgekehrte polnische" (turned-around Polish). Forth and PostScript both use reverse Polish.
Reverse polish notation with longer formulas that would have parentheses is a mindf..k. I can understand how you dig into the inner parens and break them down into two terms on each line, like in simple scripting languages. But when you condense them down all on the same line, I can't read it. There was an AviSynth plugin requiring RPN, I think it was for applying curves.
Not long ago I wrote my own simple language in Rust. It was moderately hard (I fought the borrow checker 90% of the time), but the whole compiler and interpreter fit in about 700 lines. The language itself is stack-oriented and as simple as possible to implement, but I managed to demonstrate Turing completeness by adding Rule 110 to the examples. I even added an extremely crude semblance of a template system, though only for the basic commands. I also added function inlining (only in compile mode, at the assembly-translation level), since function calls in this language, because of its stack-oriented nature, require some contortions. For anyone interested: it's on GitHub as "jalgo"; no link, go find it yourself. Is anyone even interested in this?
Depends how you define simplest. Any Turing-complete language should be able to interpret itself. There are languages like Brainf**k (name censored) which is incredibly simple, it uses only eight different tokens including input and output, but it has self-hosted interpreters and even compilers.
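For scale, here is roughly what a complete interpreter for that eight-instruction language looks like (the 30,000-cell tape and wrapping byte cells are common conventions, not part of any formal spec):

```python
# A complete interpreter for the eight instructions > < + - . , [ ]
# operating on a tape of wrapping byte cells.
def bf_run(code, inp=""):
    # Pre-match brackets once so jumps are O(1) during execution.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    tape, ptr, pc, out, inp = [0] * 30000, 0, 0, [], list(inp)
    while pc < len(code):
        c = code[pc]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(inp.pop(0)) if inp else 0
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]  # skip loop
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]  # repeat loop
        pc += 1
    return "".join(out)

# 8 * 8 + 1 = 65, the ASCII code for "A"
print(bf_run("++++++++[>++++++++<-]>+."))  # → A
```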
I created a similar interpreter in Java just for fun, but it reads ahead only the tokens it needs at the moment, and then I created an API to get the rest of the input from the code, so I could define expressions with operator priorities and parentheses already in the language itself.
If someone actually wants to create a language I recommend using the language workbench JetBrains MPS. It works very differently from other language creation methods and I think it's very nice.
For so many years the standard tools were LEX and YACC (I used slightly improved versions called FLEX and Bison). They were powerful but I wouldn't call them user friendly. We have such better tools today.
Honestly, parsing is straightforward enough relative to the rest of creating a language that I never got the desire to automate that part. You switch on the next character to see what type of token you're parsing, then skip while it's valid for that kind, and you've got a lexer. Write a function for each syntax rule that returns a value and the remaining input and calls the functions for other rules, and you have a recursive descent parser. Use a Pratt parser, which is just a couple of stacks and a loop, and you can parse infix expressions. It's simple to read, lets you provide much nicer errors, and you have a real language at every point if you need to do something fancier. Compared to type analysis, optimization, and emit, parsing is the easiest bit by far.
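As a rough illustration of the Pratt idea (binding powers and token handling here are simplified assumptions: numbers only, no parentheses or unary operators):

```python
# A tiny Pratt-style expression parser: each operator has a binding
# power, and the parser recurses only while the next operator binds
# tighter than the current minimum.
BINDING = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_bp=0):
    left = int(tokens.pop(0))  # assume expressions start with a number
    while tokens and BINDING.get(tokens[0], 0) > min_bp:
        op = tokens.pop(0)
        # parse the right side with this operator's binding power,
        # which makes same-level operators group left-associatively
        right = parse_expr(tokens, BINDING[op])
        left = (op, left, right)  # build a tree node
    return left

print(parse_expr("1 + 2 * 3".split()))
# → ('+', 1, ('*', 2, 3))
```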
@@SimonBuchanNz In MPS, you don't create a parser at all, you do it the opposite way.
More replying to @xlerb2286, but MPS is also solving the same issue, from what I can tell: that "parsers are hard". Possibly true for small DSLs, but if you're talking about implementing a full language, they're a trivial part of the effort.
This should even be Turing complete, right? It supports while loops (WHILE is Turing complete). Well, not nested loops directly, but you should be able to emulate them: e.g. a while loops until break != 0, and inside the body every assignment becomes x = x * (1 - nested) + (changed value from nested loop body) * nested. In this case nested would be set to one when the inner loop should do its work, and it could also set itself back to zero when it's done, effectively skipping all the changes it would make while running. This process has no limit; maybe someone should build a transpiler which translates nested loops into this form. You also still have if-conditions by using the same multiply-by-{1,0} trick. With this many nested loops it would be no problem to define a power function as well as a floored log, which lets you encode a stack or even a tree on an arbitrarily sized integer (which I'm definitely going to allow, since Python supports them by default and being limited by hardware is no requirement for Turing completeness). With stacks you could then simulate a Turing machine or run Game of Life or whatever you like. It would be very tedious though.
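The multiply-by-{1,0} trick described above, as a standalone snippet:

```python
# A branch-free conditional: multiplying by a 0/1 flag selects
# between the old value and the new one without any if statement.
x = 10
new_value = 99

flag = 1  # 1 means "take the branch"
x = x * (1 - flag) + new_value * flag
print(x)  # → 99

flag = 0  # 0 means "skip the branch": x is left unchanged
x = x * (1 - flag) + 123 * flag
print(x)  # → 99
```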
This code will have some problems with nested loops. Not sure what the best way to handle this is; probably a stack data structure to keep track of which WHILE each END belongs to.
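One way to do that, sketched in Python with hypothetical keyword spellings: pre-scan the program once, pushing the line number of each `while` and popping it at each `end`, so every loop gets matched correctly even when nested or when there are several in a row:

```python
def match_loops(lines):
    """Map each END line to its WHILE line (and vice versa) using a stack."""
    stack, jumps = [], {}
    for pc, line in enumerate(lines):
        words = line.split()
        tok = words[0] if words else ""
        if tok == "while":
            stack.append(pc)          # remember where this loop started
        elif tok == "end":
            start = stack.pop()       # the most recent unmatched WHILE
            jumps[pc] = start         # END jumps back to its WHILE
            jumps[start] = pc         # WHILE can skip past its own END
    return jumps

prog = ["while a", "while b", "end", "end", "while c", "end"]
print(match_loops(prog))
```

The interpreter then consults the `jumps` table instead of scanning for the nearest `end` at runtime.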
What's interesting to see here is how he starts with doing things a cheaty way, and as features are added to the language, things get more and more complicated, until you see the need for a better way of doing the whole thing. It makes you realise why lexers and parsers are so complex.
When designing a computer language, it might be helpful to first determine the programming paradigm. You chose an imperative paradigm, and proceeded to describe a procedural language, but did not mention that. There are, of course, several other possibilities. However, it was a great introduction to the simplicity of Forth! 🤭
If anyone has played around with this and is looking for a bigger more complicated version of it as a project to try out, 'Crafting Interpreters' is a beginner-friendly guide.
Made a small one in my teens, a commenter wrote on sourceforge I had no idea what I was doing ... they were right
Don't let that stop you. The next one you'll know more, and the 3rd one even more :)
@@xlerb2286 it's not like he's a teen right now.
You probably knew more than 99% of other teens. Or other people for that matter. But the commenter? Well he (oh, it was a he, for sure) sounds like an absolute genius.
Make a worse one
Was it Python?
Proving the Turing completeness of this programming language is left as an exercise to the viewer.
It might be. It is definitely surprising what you can do with just that while loop, for sure 😁
For example it can be used to make a rather clunky sort of if:
...
if = x
I'm pretty sure it isn't. You can't allocate arbitrarily sized working space, and it doesn't have scope local variables or recursive function calls, so it would be difficult to get around that limitation.
@@brandonmack111 The problem is that it can only store data in a finite set of variable names that are determined at program start. For a general Turing machine, it needs to be able to grow its working space dynamically depending on the program input (and how much that is can't in general be determined without running the program on that input). I think that may be the only thing that is missing though.
Just proved it, easy.
@@TheOriginalJohnDoe Proof left as an exercise to the commenter
i wrote a compiled imperative language in rust :) writing a compiled language is probably the most educational project that exists ever, because you learn (on a deep level):
- how memory works (including pointers, allocator kinds aka stack, region, heap, temp, gc, etc, how stack frames work)
- how compilers work (lexer -> parser -> pre-IR for semantic analysis -> compilation to IR -> asm -> object code -> linked into an executable)
- how to maintain a larger scale project (with potentially 10k+ lines of code)
- how to structure a larger scale project (with project file names and splitting of code into different files/functions etc)
- best practices in terms of data structures to use (Array vs HashMap vs HashSet, null terminated string vs sized string, etc)
- abstract syntax trees and/or grammar (including different parsing algorithms for operator precedence if you write your own parser (as you should) such as recursive descent or shunting yard)
- how to debug potentially thousands of lines of code,
- how modern language features actually work under the hood (for example, loops and conditionals being compiled into just jumps, monomorphization vs dynamic dispatch for generics on structs and/or functions, capturing lambdas and their environments, etc etc)
- how to use tooling to your advantage
- how to effectively test your code
among others. i think there is not a project that ticks more boxes at once than writing a compiled language. plus its really fun seeing it slowly expand and being able to do more things over time :3
Do you have any recommended resources for a project like this? I'm currently going through the 'Writing An Interpreter In Go' book (which I will follow with its sequel 'Writing A Compiler In Go') and I was wondering if you had some more insights. Thanks! :)
I'm just commenting in case you answer the fella above me's question
@@HumaniNihil-c8k "Crafting Interpreters" is free online and it goes through writing an implementation of the Lox language in both Java and C.
@@HumaniNihil-c8k I'd recommend just looking into the parser part first. Start by poking around with existing languages (eg Python's "ast" module) and learn about how a stream of source code gets tokenized, then converted into a syntax tree, and finally executed. Most languages "execute" by converting into a series of instructions (machine code, or a higher level bytecode), but for learning purposes, directly interpreting the Abstract Syntax Tree is a lot easier to get your head around.
Design your own language with a very simple grammar. Parse it into its corresponding source tree. Then run it by taking the root node of the source tree and recursively doing what it says! Mastering that will give you a great insight into how programming languages work.
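That take-the-root-node-and-recurse approach is a tree-walking interpreter; here's a toy sketch in Python, with invented node shapes:

```python
def run(node, env):
    """Evaluate an AST node; nodes are ("kind", ...) tuples."""
    kind = node[0]
    if kind == "num":                              # literal number
        return node[1]
    if kind == "var":                              # variable lookup
        return env[node[1]]
    if kind == "add":                              # recurse into both operands
        return run(node[1], env) + run(node[2], env)
    if kind == "assign":                           # evaluate, then bind the name
        env[node[1]] = run(node[2], env)
        return env[node[1]]
    raise ValueError(f"unknown node kind: {kind}")

env = {}
run(("assign", "x", ("add", ("num", 2), ("num", 4))), env)
print(env["x"])  # 6
```

Every language feature becomes one more `kind` branch; once that feels cramped, you're ready to learn about bytecode.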
^
I like to follow, and implement when possible, this way of sharing knowledge when introducing a topic:
- No frameworks
- No add ons
- Code something practical, pause, ask questions, implement the answers.
Very nice explanation Dr. Tratt
A fun excercise - I always thought such things would be purely academic, but I wound up writing a custom sort of query language for a work project. It was to allow non or less technical staff to write their own custom tests for some tooling I'd created. Devilishly hard to actually get right end to end - huge amounts of considerations around data types, parsing, things like brackets, ands ors. But I got there and it works well.
I mean... that's nice, but wouldn't it be easier to actually teach them the very basics of a high level language? Maybe write a custom library that's easy to use, but still most of heavy lifting done by the tried and tested language?
@@jakistam1000 In the experience of everyone that's tried: no, absolutely not.
@@jakistam1000 Nope, not at all. There's a reason DSLs, or Domain Specific Languages, are very useful.
Even for someone who programs, that can be very useful. If you've ever used SQL or jq to query a database or JSON, you've used a DSL
@@jakistam1000 Even if that were the case ... why not snaffle some $$$ to do a fun project? Gotta live a little in this life.
@@Chex_Mex While I have some VERY basic knowledge of SQL, I mostly employ the approach "there's always a Python library for this". Maybe it's a habit, maybe a bad one (?), maybe just the consequence of no need to optimize for speed - but it's kinda working for me so far. (I'm still learning, though - I used to write everything myself. At work I transitioned somewhat out of necessity, but in my off time I still prefer that, even if I get worse code.)
The point is, it's difficult for me to believe that writing a programming language is a better solution than teaching an existing one. You're always going to have more tools and flexibility at your disposal, at relatively low cost (imo). I mean, if you say that DSLs are "better" for some applications, I believe it, but I don't really understand it.
Q: Why do programmers have such big egos?
A: So there's something left after the compiler gets done telling them how bad they are at their job!
well, i write the compiler. and since i do, it only tells me nice things.
@@muzzletov motivated to start working on mine.
@@muzzletov "There is a wonderful happy little accident at line 102. I am sure you meant to put a close parenthesis instead of a closed square bracket, didn't you now? Don't worry, this sort of mistake happens to everybody! Let's not even call it a mistake, it was just a twitch of the finger. You are such a great programmer, I really mean it!".
6:40 "Programming is just a continual reminder that I can't do anything correctly"
😭
Small piece of advice: write out full variable names. For somebody just starting out in programming, toks and ev don't mean anything. Tokens and eval is more expressive and it's just a few more keystrokes.
Sad response to that: you're going to see variable names like that in some languages - IIRC, the function to tokenize strings in C is `strtok`.
@@ZT1ST Of course the early C standard library functions names were limited by what the early linkers could handle. These days we don't have to live with such limitations.
It is short-sighted to limit your programming style to what is easily understood by beginners. Those people will either quickly become experienced, or they will leave the field. Either way, there is no sense in catering to them.
Short variable names make programs much, much easier to read.
Just starting out programming and checking a video on how to implement a small interpreter with RPN?
The comment section makes me so happy. So many people sharing their own language design experiences :)
Nice quick introduction to writing a programming language! Although it has an inception problem, kinda like the one I wrote for a project in 1978. It was called SIMPLE (Student-Implemented Programming Language Experiment), and it implemented a subset of the BASIC language in BASIC, along with a bare-bones operating environment to save and load files.
A long time ago, I made a simple stack-based esolang (esoteric programming language) called "Noy", because it used every letter of the English alphabet except Y. (Alternatively, I also called it Alphabet Soup.) It had two stacks: a main stack and a temp/scratch stack that you could transfer to and from. Operations were a single letter each, attempting to be mnemonic (all I remember was "k" for "kick to temp stack" and "u" for "unkick from temp stack"). Digits were also a single letter each ("a" through "i"), and to get bigger numbers you'd have to just operate on the digits you pushed. Every program, therefore, was just a string of letters, without any symbols, numbers, whitespace, or line breaks.
I made an interpreter for it in JavaScript, C++, and PHP (JS and PHP were my go-to languages at the time).
Totally useless, but also really fun to make 😁
that sounds super fun! I definitely want to make some esolangs
20:35 "No one uses reverse polish notation in a real language"... Forth begs to differ! A quirky but really fun and powerful language once you get your head around it 😆
"real language" 😜
@@Acorn_Anomaly It's on TIOBE's top 100 languages, it comes with Debian, EA released several games written in it (Worms? and Lords of Conquest, among others). What more do you want?
PostScript comes to mind
@@Acorn_Anomaly It is Turing complete, so ....
@@deadmarshal bf is turing complete
This is a lot closer to real, practical language development than the more academic treatment you'll usually see online that spends forever talking about parsing and leaves doing anything interesting as an exercise for the reader. There is so much you can do quick and easy with this kind of approach that will give you wonderful results. Even something like Java is way more complicated than what people actually need.
In my experience I actually never really see this kind of programming.
This is fine for simple stuff, but for most cases, a standard regex will work better (albeit a little slower). If you want something "serious", you're going to want something more purpose-built, like ANTLR, for your lexing/parsing
Crafting Interpreters is an excellent book on how to implement an Interpreter for an example language, if you're looking for one
@@lylerolleman1564 Parser generators are the worst way to handle this, and tend to indicate a very small amount of thought was put into a language's design. Write more complex parsers by hand if you really need them(you don't). They're trivial.
@@theshermantanker7043 Crafting Interpreters is one of the best resources for learning the conventional wisdom around basic compiler construction, but that's a low bar as far as I'm concerned. You'll learn a lot more practical knowledge from building a FORTH and playing with macro assemblers.
@@MisterFanwank I've done them by hand. It can work fine for simple stuff, or for stuff that really needs to perform well but for complex but not THAT complex stuff, a parser generator works very well, provided of course you understand the pitfalls (same as everything else)
That all said, you also seem to be assuming that the developer is the one in charge of the language design. I have never actually seen this be the case in the real world. Far more likely you've got a designer or client who wants all sorts of shinies and will change their mind every 4 hours. In that scenario, I very much appreciate maintaining 50-100 lines of grammar rather than 2k lines of code.
Forth you love if honk then.
"let's try this and see how I've gone wrong"
my entire programming career, right there
The language should've been called Splits because it's an excellent demo on how far split carries software design 😊
😂 love it! Oh! How about SplitSplat! Because every time it threw an error he called it going "splat".
Split++
he only used it to implement a quick demo. don't be naive.
Splat-- might be more appropriate.
Interpreter being interpreted by an interpreter...
- C/C++ programmers fuming
- integrated circuits screaming in fear
- memory controller in shambles
- LLVM crying
exactly
:) I actually made an expression evaluator in C# quite some time back, but it's an actual infix evaluator: I parse it in infix and don't convert it to post- or prefix :) It's not really "that" efficient, but it builds a custom expression tree which can be executed / evaluated as often as you like with changed variables. The main insight was to look at the issue in reverse: instead of thinking about the highest priority first, I look for the lowest priority first and simply split on that. The only issue was brackets, so I added a bracket substitution step first and the brackets were evaluated separately. The whole thing simply runs recursively, splitting on the lowest priority first and evaluating the individual parts again. At the leaves were numbers, variables and functions. It's quite extensible, as you can add custom functions as well. And once the expression tree is parsed, the expression can be evaluated quickly; each operation is just a class instance.
In the end the only fundamental operations were: addition, multiplication and power. There were additional unary operations like negate and reciprocal. So a subtraction is actually an addition with the second argument negated. A division is a multiplication with the reciprocal of the second argument. Later I rewrote the whole thing to include boolean operators and boolean logic and it turned into my LogicExpressionParser. It all came out of a question that was asked on Unity Answers years ago about a parametric Lindenmayer system. The L-System is actually way simpler than the parsing of the parametric expression :P It's all on github (MIT license). Just look for "Bunny83 LogicExpressionParser". The whole parser is a single C# file in just under 1000 lines of code.
The only thing that it does not handle well is an actual unary minus. It needs to have parentheses. So you can not do "5 * -3" but you have to do "5 * (-3)"
I made some WebGL examples, one expression parser example that essentially draws a line sequence of I think 2000 segments according to the expression entered in realtime and a somewhat graphical calculator that manipulates the height of a square mesh.
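The lowest-priority-first idea can be sketched in a few lines (translated to Python here rather than the commenter's C#, with all names invented): scan right to left at bracket depth zero, split on the first lowest-priority operator found, and recurse on the two halves.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parse(expr):
    """Split on the lowest-priority operator first, recursing on the halves."""
    expr = expr.strip()
    for level in (("+", "-"), ("*", "/")):        # lowest priority first
        depth = 0
        for i in range(len(expr) - 1, 0, -1):     # right-to-left: left-assoc
            c = expr[i]
            if c == ")":
                depth += 1
            elif c == "(":
                depth -= 1
            elif depth == 0 and c in level:
                return (c, parse(expr[:i]), parse(expr[i + 1:]))
    if expr.startswith("(") and expr.endswith(")"):
        return parse(expr[1:-1])                  # brackets handled separately
    return float(expr)

def evaluate(node):
    if isinstance(node, float):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(parse("2*(3+4)-1")))   # 13.0
```

True to the original comment, `5 * (-3)` works but a bare unary minus like `5 * -3` does not, and real operands (variables, functions) would need more cases at the leaves.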
I just wanted to comment like now create a programming lang from scratch
@@ognjenjakovljevic494 First you have to create a universe
@@qwfp Wasn't Virtual Universe an IBM product?
3:42: You might want to use .isdecimal() instead of .isdigit()! The latter returns True on "¹", "²" and "³" even though you can't cast them via int() 🤷‍♀ while the former only returns True for 0-9!
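A quick demonstration of the difference:

```python
# "²" (superscript two) counts as a digit but not a decimal,
# and int() refuses it -- so .isdecimal() is the safer guard before int().
print("2".isdigit(), "2".isdecimal())    # True True
print("²".isdigit(), "²".isdecimal())    # True False
try:
    int("²")
except ValueError:
    print("int() rejects superscripts")
```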
I love this! One thing to watch out for... I think that interpreter only allows you to have one while statement in the program. If there are two (even not nested), the "end" handler for the second while loop will jump back to the top of the first while loop.
→ 7:45 Reverse polish expressions for arithmetic, but then infix notation for assignments? Not quite consistent. (But I guess otherwise you'll need some differentiation for variables to assign from the ones you evaluate.)
→ 8:34 Throwing away the `=` sign looks strange. So »x = 2 4 +« is equivalent to »x + 2 4 +« or even »x y 2 4 +«. Maybe better to use (name, expr) = split(" = ") here?
my thoughts exactly, consistent left-to-right evaluation and assignment, like "2 3 + -> x" would have been better
I strongly recommend the two books by Thorsten Ball, "Writing an interpreter in Go" and "Writing a compiler in Go"!
These two make interpreters and compilers understandable, while implementing a full language.
Even if no one ever uses the programming language you create, the process of building it is worth it. Writing an interpreter or compiler forces you to really understand the language you’re working in-not just in theory, but in practice.
Excellent explanation. Thank you.
I also recommend Ruslan Spivak's serie “let's build a simple interpreter”.
Reverse Polish Notation is awesome. I love the consistency of not needing parentheses. Begone, operator precedence.
i guess it's awesome for implementing something like this, but is it awesome to use it?
@@JohannaMueller57 ask all Forth developers and they'll give you their opinion, you won't find many negative complaints
@@JohannaMueller57 I don't see why not. It's conceptually simpler than normal notation in every way. The majority of the world's languages even use a subject-object-verb word order, so it's not even unnatural. It takes some relearning to instinctively understand it, sure, but at least you can always understand it by applying very simple evaluation rules, unlike infix notation, which requires some very complicated evaluation rules, to the point that developers often have to look up operator precedence, and proactively use parentheses to avoid confusion. I've read a fair amount of Forth code and it's not difficult to grasp, it's pretty intuitive as far as source code goes.
@@P-39_Airacobra so you're saying a b * c d * + is easier and more intuitive than a*b + c*d?
@@JohannaMueller57 Which one requires more knowledge to process? You have to separate what you're used to and what's simplest. Of course you're used to the second. Does that mean the second is simpler? No. Does it mean that the first will still be difficult to process even when you're used to it? No. I can understand a b * c d * + just fine because I'm used to Forth and so I know how the operators and and operands relate. I still process a*b + c*d faster, but that's only because I've seen that exact form a million times. If I had seen the Forth form a million times, I would be able to process it just as fast if not faster.
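Those "very simple evaluation rules" really are just a stack and a loop; here's a minimal RPN evaluator in Python with a handful of operators (a sketch, not any particular language's semantics):

```python
import operator

def eval_rpn(tokens):
    """Evaluate postfix tokens: push numbers, pop two for each operator."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # note the operand order
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

print(eval_rpn("2 3 * 4 5 * +".split()))  # 26.0, i.e. 2*3 + 4*5
```

No precedence table, no parentheses; the operand order on the stack is the only thing you have to get right.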
I wrote a stack machine called tapescript for embedding access controls in distributed applications/data types. It compiles to a byte code that is then interpreted, and it has a bunch of useful tools and advanced cryptography implementations.
Thought that was a hair on my phone screen at the start lol
As many here, I tried a few times, though I used Bison/YACC because I was at university, we were learning compilers, and I decided to go a step ahead and start learning Bison and all that. I also uploaded it to Sourceforge. It was for a strategy game where you would write scripts and feed them to the game; the game would execute them and play until the end, then you would rewrite the rules to handle events, attacks and invasions and keep improving the script. Writing the parser itself was the easiest part, though, once Bison clicked. Unfortunately (as with most things) once I reached the point of writing graphics code I gave up.
A very long time ago I wrote a language for fun. It wasn't very good but I enjoyed figuring out how to make it work.
Great intro to writing your own programming language!
Good to see you also have the Great I Key Press that starts every good vim session
officially one of my favorite videos in this channel!
I've done this to create my own functional programming language. I wrote it in C to run fast. Rather than use dirty functions like split I just spin the file in a loop so the loop gives me one character at a time. This then goes to a select statement where I can do something for each character, or for groups of them in some instances. I use this to handle the basic syntax like separating out words that mean something, your string literals, your various symbols and in some instances branch off according to the character before. The idea of doing it like this is you can accomplish quite a lot in just one loop of the file. It gets a bit more complicated as one needs to process brackets and ensure execution is done in the right order. The system then goes on to parse the data several times for the various things one has to do, e.g. like handling higher level issues like function calls in the language.
Anyway, it was great fun to do and my new language has been put to useful work in running programs on a micro controller and using OTA updates. Compile times are measured in milliseconds.
That's crazy! Nice work! I'm really curious though:
How long did it take you?
Do you feel like you learned a lot of things and the experience was worth it?
I'm now considering writing one in C too since I'm already learning C at school
@@vladimirnicolescu1342 It's about 5000 lines for the compiler and about the same again for the rest of it. I built it in stages and had some experience of trying it once before. My first attempt was to use a split function, but soon realised it was a bad idea.
One of my C tutorial books had a problem where you created SML (simple machine language). It was a fun project to work on and extend. It reminded me of programming the TI-58 when I was a kid. I have nothing against interpreted languages, I learned how to code in one, BASIC.
I was hoping to see something written in Bison / YACC.
Many years ago I decided to write my own BASIC interpreter in Bison for Linux. I got it to the point where it’s able to run several of the programs in the book of “BASIC Computer Games” either as-is or with minimal changes; the features I hadn’t got to yet include handling arrays and graphics commands. Adding X11 graphics is hard though, so I put the project on indefinite hold.
Making programming languages has always fascinated me as I always thought "But don't you already need a programming language to make one. And how is your programming language a language if it needs the original language to work?"
And that's where paper tape comes in.
The base programming language is basically your interface with the CPU/ALU.
Then you create wrappers or abstraction using programming languages for each layer.
If you want to learn more, search for Nand2Tetris. It's an amazing project that shows how a NAND gate and clever abstraction can be used to build an OS with a Tetris game...
Ikr! And then you read about language bootstrapping and your mind is blown
The language is written in itself. Oh, the first iteration of it will have to be written in some other language, you can use assembly if you want the bragging rights. But then the first goal is to get a compiler that can compile itself. After that it's just adding features. :)
Sometimes. PHP, for example, is mostly written in C. But read the essay "Reflections on Trusting Trust" for more about self-hosting.
His enthusiasm is really infectious :D
yeah you are right. hes great
i made a programming language out of english words and numbers.
any words it doesnt know, it just ignores.
its purpose is to make 2d drawings, or 3d models
of things you describe, with or without motion,
with or without text and or audio captions.
it has dolists, loops, paths, sequences,
and transforms. i recently added sound
and conditional flags with switches.
20:30 Bitcoin script kind-of uses reverse polish notation, because it uses the stack for evaluating expressions, too!
finally the next Porth video after years
Welcome to yet another recreational programming session by Mr. Zozin
Can your assembly language do that!?
Very happy to know Prof. Laurie is a man of great taste! Gotta love Gruvbox
for viewers that want to get their hands dirty i HIGHLY recommend Crafting Interpreters
its a hands on book teaching main concepts of compilers
I once wrote a DSL between 9 AM Friday morning and 6 PM Sunday. I was in a fury from beginning to end because a delivery failure put my project at risk. My boss and I eventually got a patent for the interpreter.
I'm emulating a Control Data 160-A for a computer museum and wrote a simple assembler to support testing. The assembler is a dictionary mapping instruction names to tiny code generators. It works a treat.
Years ago I wrote a stack based expression evaluator as part of a data analysis program used by others. For that same system of software I used Yacc and Lex to create a simple language that was used to guide the installation of the software. I am currently implementing a version of FORTH that will run on some older microprocessors.
Great stuff! And yes that split function was a really useful addition to a language.
My favorite toy project I ever did was defining a pseudo assembly language with like 2 instructions, and then implementing a few simple functions in it.
I wrote an _incredibly_ slow recursive factorial function with it
Just a FYI for anyone who might want to make a programming language: parsing and evaluating expressions are actually the easiest parts of a compiler/interpreter, academia tries its hardest to make it seem like it's not.
Wat? Clearly defining the semantics and proving soundness of the type system are the interesting and challenging bits.
Look at you implementing a PostScript interpreter right on screen!
Why not use a stack for the while-loop? Then you could just pop the base pc/ip off the stack when you hit the end of an iteration
That would be the "assume it's not nested" part; you still need to handle finding the end.
“Programming is just a basic continual reminder that I can’t do anything properly..” - couldn’t agree more!
Just finished a PEMDAS algorithm for the text parser for a shell I am working on, and it is definitely an interesting problem to solve. Since some operators hold equal precedence, I ended up using a 2D array to hold the operator strings, with the 'Y' axis being the precedence and the 'X' axis being each operator at a given precedence level.
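That 2D-array layout might look something like the following sketch (the operator set here is invented; the shell's real table will differ):

```python
# Row index = precedence level (lower index binds looser);
# each row lists the operators of equal precedence.
PRECEDENCE = [
    ["||"],               # level 0: logical or
    ["&&"],               # level 1: logical and
    ["==", "!="],         # level 2: equality
    ["+", "-"],           # level 3: additive
    ["*", "/", "%"],      # level 4: multiplicative
]

def precedence_of(op):
    for level, ops in enumerate(PRECEDENCE):
        if op in ops:
            return level
    return -1             # unknown operator

print(precedence_of("*") > precedence_of("+"))  # True
```

A parser can then split an expression on the loosest-binding operator it finds, or drive a shunting-yard loop off the same table.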
Very entertaining video. I coded along and learnt something.
Quickly coded, but very clearly explained. Well done!
I've done 3 small programming languages. The first one was in college for the compiler class final project - but I went way over and beyond what was needed. The school used it as a teaching language for years after that. the second was a scripting language for controlling the process of building an application, running the unit tests, and building the distribution media way back before there were such tools available online or as open source. That language and tool was used for many years by several companies in the area and I think there's one small company still using it 30 years later (it was written for Win95, it still works today - go figure). The third one was just recently and it is a domain specific language for an expert system I built at work. Off and on I've dabbled with another language that I think is an embodiment of the saying "just because you can doesn't mean you should". It has some interesting traits but I've never really seen where they'd be that useful.
"interesting"
I'm following along him and implementing features he does! Currently at branches
Imagine, for a second, that this is the first Computerphile video you've ever seen.
this is my first video ever seen by this guy I understand zero
@@jespensonson7351why?
Nice video! Seeing a nice little interpreter in Python is super refreshing and fun bc my brain is so fried rn. I'm currently trying to make a lang with a tiny syntax and a really algebraic type system... it's insanely hard but I do have a roadmap of things that I need to write so maybe in like 3 more years it will be real
I use entr (eradman software) for instant feedback from terminal. It's great for tdd-like development or leetcode/beecrowd checking.
It would make the video even smoother: just a ctrl+s in the editor, no need to re-type the python3 command.
For reading the file lines you can just iterate over the result of open directly. A lot of people don't seem to know this for some reason.
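For example (writing a small demo file first so the snippet is self-contained; the filename is made up):

```python
# Create a tiny demo "program" to read back.
with open("demo.txt", "w") as f:
    f.write("x = 2 4 +\nprint x\n")

lines = []
with open("demo.txt") as f:
    for line in f:                    # the file object is itself a line iterator
        lines.append(line.rstrip("\n"))
print(lines)
```

This streams the file line by line instead of loading it all with `.read().split("\n")` or `.readlines()`.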
For a programming language to be useful, you need:
1. PC
2. Conditional Branching
3. Indirection (pointer)
With those, you can implement loops, variables, and arrays. Then, move on to stacks, queues, and lists.
Operator precedence isn't that hard to do. Well, Dijkstra's shunting-yard algorithm is hard to understand; I personally used a 3-stack algorithm, or 2 stacks if you don't mind right-to-left parsing.
I don't understand why Tiny BASIC has FOR loops but not WHILE loops. From experience, I can say that implementing WHILE/REPEAT loops is easy. FOR loops may be easy to compile, but not to interpret!
TinyBASIC is surprisingly close to assembly language, imo, that I'm surprised that not more people are working on it!
WHILE/WEND was never part of the specification (ECMA-55, Minimal BASIC, 1978). I guess it was introduced with Microsoft Quick BASIC. There is a DO/WHILE loop in ECMA-116 (1986). Tiny BASIC (as the name suggests) uses only a subset of the full language.
Take a look at how Microsoft BASIC implements FOR loops on a Commodore 64, for example. It breaks the specification in many ways. It's pretty easy to interpret, and you would have to emulate the interpreter's behaviour when compiling it.
You aren't programming until you're speaking English and typing reverse Polish simultaneously.
I do love the elegance of reverse Polish notation, even if it's not the most intuitive as a human code author
Using RPN is not cheating, it's doing it right. :-D Just saying. Loved my HP48sx. Everything is so much more natural and simple, and clear, with RPN, it's not cheating. 😀
This is exactly what i was looking for !!
Forth uses reverse polish notation, so i would imagine Forth programmers might take issue with the "no serious programmers use reverse polish notation" quip. 😂
This video is actually very interesting. Thank you for doing it!
This should have been computerphile's first video :)
I didn't call it reverse Polish. It was postfix Polish. I also noted that function notation func1(arg1,arg2, arg3) is like a prefix Polish, but Polish notation would not have parenthesis.
I am actually currently in the middle of defining a general-purpose programming language and implementing a compiler for it as a proof-of-concept. Writing the parser/interpreter is the easy part. Writing the evaluator is the real nightmare. The hardest part seems to be evaluating function calls.
What kind of problems are you running into? I remember it being a little bit of a puzzle but nothing too hard. Implementing the call stack and stack frames helped facilitate it all in the end a lot.
@@Clank-j6w If you only pass values in and out of functions then a stack-based approach is quite straightforward. The problem is that in my case I have the concept of value ownership (similar to Rust) and functions can take full ownership of a value (a.k.a. sinking a value) or temporary ownership (a.k.a. borrowing a value) which is returned after the function finishes. This complicates things a lot. Also, the language I'm developing is compiled. So only things that can be evaluated at compile-time are evaluated. For the rest of the code I have to generate actual machine code, which I haven't even started. I'm planning on using C++ as an IR, since it suits my needs and it's the language I know the best.
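For the plain values-in/values-out case mentioned above, the stack-of-frames idea can be sketched like this (all names invented; real interpreters store much more per frame):

```python
def call(func, args, frames):
    """func = (param_names, body_fn); each call gets its own frame of locals."""
    params, body_fn = func
    frames.append(dict(zip(params, args)))   # push: the callee's locals
    result = body_fn(frames[-1])             # the body sees only its own frame
    frames.pop()                             # pop: locals vanish on return
    return result

frames = []
square = (["n"], lambda f: f["n"] * f["n"])
print(call(square, [7], frames))  # 49

# Recursion works because every call pushes a fresh frame:
fact = (["n"], lambda f: 1 if f["n"] == 0 else
        f["n"] * call(fact, [f["n"] - 1], frames))
print(call(fact, [5], frames))  # 120
```

Ownership and borrowing, as described in the thread, would add move/return bookkeeping on top of this push/pop discipline.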
I also want to make a little programming language. In my case it would have to be a transpiler, because I can't do the low-level stuff: I'm not a computer scientist, just a hobby programmer.
@@muesique I don't believe a transpiler is much easier, since you still have to go through all the steps of a regular compiler, but you also have to worry about how to map your semantics onto the target language, and how to desugar constructs in your language (if you've got syntax sugar). You're doing 95% of the work of a real compiler; only the code generation is different.
@stefanalecu9532 worth thinking about it... 🤔
But at the moment it's much easier for me to do it that way.
If you wanna know: I fell in love with LDPL, which is a much cleaner subset of COBOL. But there are issues with errors; the developer still has some way to go to catch all the rough edges. Because my C++ is... basic, I want to use Tcl to translate to C, or even Pascal (which is much more readable and easier to think in).
Adobe Postscript uses RPN.
I very recently made a stack-based reverse Polish notation "language" for creating bytebeat, where instructions are single characters. I'm still working on it, but it has functions, labels, conditionals, and embedding.
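For anyone curious what the core of such a thing looks like, a stack-based RPN evaluator fits in a few lines. This is a generic sketch in Python, not the commenter's actual language (which uses single-character instructions):

```python
def eval_rpn(src):
    """Evaluate a space-separated RPN expression, e.g. "3 4 + 2 *"."""
    stack = []
    for tok in src.split():
        if tok.lstrip("-").isdigit():
            stack.append(int(tok))  # operand: push it
        else:
            b, a = stack.pop(), stack.pop()  # operator: pop two, push result
            stack.append({"+": a + b, "-": a - b, "*": a * b}[tok])
    return stack.pop()

eval_rpn("3 4 + 2 *")  # (3 + 4) * 2 = 14
```

Conditionals and functions then become jumps over this same token stream, with the stack doubling as the argument-passing mechanism.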
Where is the source available? 😊
Nice vid. My SwissMicros DM42n calculator is RPN, just like my old HPs. Good enough for CERN, bonkers good for me. It's great when someone asks to borrow it for a minute... and then the grey clouds of confusion drift over their face.
Some people, it seems, know the difference between RPN and a stack 😂
I wrote a compiler as a project in university. I'm a bit embarrassed to say I used libraries to build my lexer and parser and to do code generation, but I don't think I would have had the time to hand-roll them for a language I managed to make Turing-complete.
"This is never going to work the first time."
That's how you know he's an experienced programmer, and not just a theoretical teacher.
Can you make a video on creating your own database management system? I don't mean downloading SQLite or Postgres and creating a database. I mean actually coding your own database technology from scratch. I want to know the learning path and the recommended languages; it's something I'm trying to look into, but it's really hard to find resources. I've used databases for years and it's a project I'd like to try working on.
Writing your own database and management software sounds like one of the hardest things ever tbh
Reverse Polish Notation Made My Year!
One thing that I know from learning Forth is that Forth is best at creating your own Forth.
The way that guy is enjoying what he's doing makes me want to create a programming language myself 😅
Really nice and motivating video!
This is a field I've been really interested in for a long time. Enjoyed the video, keep it up! 👍
Hooray for Reverse Polish Notation!
"Polish Reverse Notation" "Hooray" for
I started using the HP48g back in '99; to this day I still have trouble if I have to use a regular calculator.
@@AloisMahdal 😂
hsiloP all day ✊
Why is it that academics always seem to write the least readable code? It's no longer 1983; we have space for variable names longer than two characters.
I agree. As soon as I saw the function name was "ev", I lost interest and saw where this was going.
The interesting part comes from defining the language grammar and how that grammar gets implemented.
Also, two-space indents...
(I would like them, but it ain't convention.)
Also seems scared of using vertical white-space!
To be fair, one reason to do this, especially if you are planning to write globally accessible functions, is to ensure that you don't have conflicts with user-defined functions and classes.
Though yeah - there is plenty of room to improve in what they wrote, it's understandable why they would do it that way.
It's totally readable to me.
More videos with Dr. Tratt please!
1. This guy is great.
2. Fellow `split()` fans rise up.
I looked up "it's all Greek to me" in Wikipedia; one of the German versions is "rückwärts polnisch" (backwards Polish). So I looked up "reverse Polish" in Wiktionary, and it's "umgekehrte polnische" (turned-around Polish).
Forth and PostScript both use reverse Polish.
Reverse Polish notation with longer formulas that would have parentheses is a mindf..k. I can understand how you dig into the inner parens and break them down into two terms on each line, like in simple scripting languages, but when you condense them all down onto the same line, I can't read it. There was an AviSynth plugin requiring RPN; I think it was for applying curves.
In my college compiler course, the professor wanted the class to use recursive algorithms for the token reader as well as for other parts of the compiler.
"No one uses reverse Polish notation in a real programming language."
Picking a fight with all the Forth fans, I see!
NEOVIM BTW
Nice to see that Tratt is a cultured person
Aaaaa, now one of my favourite topics. I will take a pit stop here for a while...
Next assignment: write that split function using this language
Not long ago I wrote my own simple language in Rust. It was moderately difficult (I spent 90% of the time fighting the borrow checker), but the whole compiler and interpreter fit into about 700 lines. The language itself is stack-oriented and as simple to implement as possible, but I managed to prove Turing completeness by adding rule110 to the examples.
I even added a very crude semblance of a template system, but only for the basic commands.
I even added function inlining (only in compile mode, at the level of translation to assembly), because function calls in this language, due to its stack-oriented nature, require some contortions.
For anyone interested: it's on GitHub as "jalgo"; no link, go find it yourself.
Who's even interested in this?
What would be the shortest/simplest programming language that can interpret itself?
Depends how you define simplest. Any Turing-complete language should be able to interpret itself. There are languages like Brainf**k (name censored) that are incredibly simple: it uses only eight different tokens, including input and output, yet it has self-hosted interpreters and even compilers.
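To give a sense of how little an interpreter for such a language needs, here is a hypothetical minimal Brainf**k interpreter in Python (my own sketch, not from the thread):

```python
def run_bf(code, inp=""):
    """Interpret Brainf**k: 8 commands, a tape of byte cells, and a pointer."""
    tape, ptr, out = [0] * 30000, 0, []
    # Pre-pass: match each '[' with its ']' using a stack.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    pc, it = 0, iter(inp)
    while pc < len(code):
        c = code[pc]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(next(it, "\0"))
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]  # skip loop body
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]  # repeat loop body
        pc += 1
    return "".join(out)

run_bf("++++++[>++++++++<-]>+++.")  # prints "3" (6*8+3 = 51 = ord('3'))
```

The whole language is one dispatch loop plus bracket matching, which is why self-hosting it is feasible despite how painful writing in it is.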
I do love building little engines! it would help readability to have more human-readable names for the classes, methods, and variable names, though!
I created a similar interpreter in Java just for fun, but it reads ahead only the tokens it needs at the moment. I then created an API to get the rest of the input from the code, so I could define expressions, with operator priorities and parentheses, in the language itself.
If someone actually wants to create a language I recommend using the language workbench JetBrains MPS. It works very differently from other language creation methods and I think it's very nice.
For so many years the standard tools were LEX and YACC (I used slightly improved versions called FLEX and Bison). They were powerful but I wouldn't call them user friendly. We have such better tools today.
Honestly, parsing is so straightforward relative to the rest of creating a language that I never felt the desire to automate that part.
You switch on the next character to see what type of token you're lexing, then skip ahead while characters are valid for that kind, and you've got a lexer. Write a function for each syntax rule that returns a value and the remaining input, calling the functions for the other rules, and you have a recursive descent parser. Use a Pratt parser, which is just a couple of stacks and a loop, and you can parse infix expressions. It's simple to read, lets you produce much nicer errors, and at every point you have a real language you can extend if you need to do something fancier.
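The Pratt-parser part of that recipe really is small. Here's a hypothetical sketch in Python; the token set and binding powers are illustrative, not from any particular language:

```python
import re

def tokenize(src):
    """Trivial lexer: integers and single-character operators/parens."""
    return re.findall(r"\d+|[+\-*/()]", src)

BP = {"+": 10, "-": 10, "*": 20, "/": 20}  # binding powers (precedence)

def parse(tokens, min_bp=0):
    """Minimal Pratt parser: returns a nested-tuple AST for infix arithmetic."""
    tok = tokens.pop(0)
    if tok == "(":
        lhs = parse(tokens, 0)
        tokens.pop(0)  # consume ")"
    else:
        lhs = int(tok)
    # Keep folding operators that bind tighter than our caller's context.
    while tokens and tokens[0] in BP and BP[tokens[0]] > min_bp:
        op = tokens.pop(0)
        rhs = parse(tokens, BP[op])
        lhs = (op, lhs, rhs)
    return lhs

parse(tokenize("1+2*3"))  # ('+', 1, ('*', 2, 3))
```

Precedence and associativity both fall out of one integer comparison, which is why people keep recommending this over parser generators for expression grammars.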
Compared to type analysis, optimization, emit, parsing is the easiest bit by far.
@@SimonBuchanNz In MPS, you don't create a parser at all, you do it the opposite way.
The parser creates you?
More replying to @xlerb2286, but MPS is also solving the same issue, from what I can tell: that "parsers are hard". Possibly true for small DSLs, but if you're talking about implementing a full language, they're a trivial part of the effort.
This should even be Turing Complete, right?
It supports while loops, and WHILE alone is Turing-complete. Well, there are no nested loops directly, but you should be able to emulate them: an outer while loops until break != 0, and inside the body every assignment becomes x = x * (1 - nested) + (changed value from the nested loop body) * nested. Here nested would be set to one when the inner loop should do its work, and it could set itself back to zero when it's done, effectively skipping all the changes it would make while running. This process has no limit; maybe someone should build a transpiler that translates nested loops into this form. You also still have if-conditions by using the same multiply-by-{1,0} trick. With enough nested loops it would be no problem to define a power function as well as a floored log, which lets you encode a stack or even a tree in an arbitrarily sized integer (which I'm definitely going to allow, since Python supports them by default, and being limited by hardware is no requirement for Turing completeness). With stacks you could then simulate a Turing machine, or do some Game of Life, or whatever you like. It would be very tedious, though.
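The multiply-by-{1,0} trick described above can be shown directly. A small Python sketch (the names are my own):

```python
def select(old, new, flag):
    """Branchless 'if': flag is 0 or 1; when flag is 0 the update is a no-op."""
    return old * (1 - flag) + new * flag

x = 10
x = select(x, 99, 0)  # flag 0: x stays 10
x = select(x, 99, 1)  # flag 1: x becomes 99
```

Wrapping every assignment in the emulated inner loop this way makes the body execute "for real" only while the flag is 1, which is how a single flat WHILE can fake nesting.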
This code will have some problems with nested loops. Not sure what's the best way to handle this; probably using a stack data structure to keep track of which WHILE matches which END.
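One common way to do that is a single pre-pass with a stack that records where each WHILE starts and pairs it with its END. A sketch in Python (token names are assumed, not from the video's code):

```python
def match_loops(tokens):
    """Map each WHILE index to its matching END index (and vice versa)."""
    stack, pairs = [], {}
    for i, tok in enumerate(tokens):
        if tok == "WHILE":
            stack.append(i)           # remember where this loop starts
        elif tok == "END":
            start = stack.pop()       # innermost open WHILE
            pairs[start], pairs[i] = i, start
    return pairs
```

At runtime the interpreter can then jump straight between paired positions, and arbitrary nesting falls out for free.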
What's interesting to see here is how he starts by doing things in a cheaty way, and as features are added to the language, things get more and more complicated, until you see the need for a better way of doing the whole thing. It makes you realise why lexers and parsers are so complex.
When designing a computer language, it might be helpful to first determine the programming paradigm. You chose an imperative paradigm, and proceeded to describe a procedural language, but did not mention that. There are, of course, several other possibilities.
However, it was a great introduction to the simplicity of Forth! 🤭
Since I still use an HP48, I use reverse Polish notation and love stack machines.
If anyone has played around with this and is looking for a bigger, more complicated version of it as a project to try, 'Crafting Interpreters' is a beginner-friendly guide.