super interesting deep-dive, the only downside is that now I want to build a compiler too 😳
compile me instead 😳
@@senzmaki😳 😳
@@senzmaki wtf 💀
@@senzmaki de fuk 😳
that's not a downside :)
it is a huge achievement for 25 days
neovim, tmux and i3/sway user spotted
nice choices
Also the explanations were really good
Thanks! 😃
I watched some of your videos and they felt great. Keep it up, I'm a big fan.
Happy to hear that!
You don't need subtitles, your English is fine
I prefer subtitles always. I watch movies in my native language with subtitles.
Awesome, brother! Keep going!
I recently wrote a tokenizer and parser for RESP for an implementation of Redis, and I think I'd like to build on that to write my own interpreter and then a compiler for a language too. Good video. 👍
Really cool!
39:06 Hey, first of all, great job! I think the list/array implementation is usually done the other way around - so lists/arrays are primitives built into the compiler and a string is then added later in the language itself. Think about it, a string is just an array of bytes or a pointer and length!
I agree with that. Was too lazy to do proper arrays :)
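To make the "pointer and length" idea concrete, here is a minimal sketch in C. The names `Str`, `str_from_cstr`, `str_slice` and `str_eq` are made up for illustration; this is not how the compiler in the video represents strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string type: just a pointer plus a length.
 * The bytes are not required to be NUL-terminated. */
typedef struct {
    const char *data;
    size_t len;
} Str;

/* Wrap an ordinary C string without copying it. */
static Str str_from_cstr(const char *s) {
    Str r = { s, strlen(s) };
    return r;
}

/* Return the sub-range [start, end), clamped to the string's bounds.
 * No bytes are copied; the result points into the original buffer. */
static Str str_slice(Str s, size_t start, size_t end) {
    if (end > s.len) end = s.len;
    if (start > end) start = end;
    Str r = { s.data + start, end - start };
    return r;
}

/* Two strings are equal if they have the same length and the same bytes. */
static bool str_eq(Str a, Str b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With a representation like this, slicing is free, and a general array type in the language could share the same layout with a different element type.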
Thank you so much for making a compiler in 25 days!
Nice video bro!!!!
Underrated content
I started a calculator project in C about a week ago and I'm working pretty much full time on that shit. I currently have a simple CLI, the tokenizer and the lexer. It has predefined math constants and simple math functions like sqrt. The only things missing are the parser and the evaluator for it to be fully usable. I also need to generate the AST, but I think that's part of the parser. I really don't know how you did so much in so little time. Great project!
Thank you :) you are pretty close so keep going :D
@AlexTheRealDev thank you very much. I'm very happy that i found your channel
You're a genius, bro, my respects!!!
absolutely loved the series, followed everyday, it would be helpful if you could mention references and books or papers etc
Thank you :) check the description for references
Very impressive :)
dang this is such an interesting video. I feel like I am so limited by just sticking to doing CRUD apps when people are building mf COMPILERS. Cool shit man please keep making stuff I love it
Damn, this is really interesting. I made a compiler too a while ago, but it looks very different from what you came up with. I did it in Python (full soy disclosure) though, so that's probably why lol. Cool vid sir
Also I'm glad you went the same way with IC, I found I was basically just doing that for dogmatic reasons instead of because it was particularly useful for me. I did about 1 optimization (constant folding like you mentioned in the vid) using it in the first place that could've probably been done w/o it tbh.
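For anyone wondering what constant folding without a separate intermediate code can look like, here is a small sketch that folds directly on an AST. The node layout and the names `Node`, `node_new` and `fold` are assumptions for illustration, not the structures from the video; integer overflow is ignored to keep it short.

```c
#include <stdlib.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

/* Hypothetical expression node. */
typedef struct Node {
    NodeKind kind;
    int value;              /* used only by NODE_CONST */
    struct Node *lhs, *rhs; /* used only by NODE_ADD and NODE_MUL */
} Node;

static Node *node_new(NodeKind kind, int value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Fold bottom-up: fold both children first, then, if both turned out
 * to be constants, replace the operator node with its computed value. */
static Node *fold(Node *n) {
    if (n->kind != NODE_ADD && n->kind != NODE_MUL) return n;
    n->lhs = fold(n->lhs);
    n->rhs = fold(n->rhs);
    if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST) {
        int a = n->lhs->value, b = n->rhs->value;
        int v = (n->kind == NODE_ADD) ? a + b : a * b;
        free(n->lhs);
        free(n->rhs);
        n->kind = NODE_CONST;
        n->value = v;
        n->lhs = n->rhs = NULL;
    }
    return n;
}
```

Under these assumptions, `(2 + 3) * 4` collapses to a single constant node, while `x + (2 * 3)` keeps the addition and only folds the right operand.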
very cool 🚀
Vertical slicing such as creating enough of the language to write assembly for a program that exits with a specific number has done me well so I don't get in my own head about perfecting every little stage one stage at a time. There's at least enough boilerplate code that each state can morph well for my needs because of that vertical slicing and writing lots of test cases for the target language.
Thorsten Ball's books on the subject are worth considering as primary material. He may not exactly get to the point right away, so skipping around a bit to see what he had as the final result per chapter is recommended over just reading it linearly.
Same, when I was writing mine I oriented all my efforts towards making it able to output a basic "hello, world" before trying to make it do anything more complicated - just to make sure all the main infra was working.
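A first vertical slice like the "exit with a specific number" one can be tiny. The sketch below is only the back end of such a slice: it assumes x86-64 Linux and GNU assembler (AT&T) syntax, and the function name `emit_exit_program` is made up for illustration.

```c
#include <stdio.h>

/* Write assembly for a program that exits with `code` into `out`.
 * Assumes x86-64 Linux, where syscall number 60 is exit and the
 * status goes in %edi. Returns whatever snprintf returns. */
static int emit_exit_program(char *out, size_t cap, int code) {
    return snprintf(out, cap,
        "    .globl _start\n"
        "_start:\n"
        "    mov $60, %%eax\n"  /* exit syscall number */
        "    mov $%d, %%edi\n"  /* exit status */
        "    syscall\n",
        code & 0xff);           /* the OS keeps only the low 8 bits */
}
```

Writing the buffer to a file such as `out.s` and running something like `as out.s -o out.o && ld out.o -o out && ./out; echo $?` should print the chosen number, which gives an end-to-end test before any real parsing exists.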
7:25 agree, i dislike C enums too.
Grouping constant values by prefixes and __ is annoying, and doing it with the preprocessor is annoying too.
C++ is a little bit better; at least it assures you they will be constants instead of constant variables.
Hope to try cpp2 syntax with the new enums to overload functions and operators on them.
I am curious to try the c3 language, I heard it is the new "C replacement"
I am building a compiler for my own language in Rust, and Rust is amazing for parser stuff!
Fancy seeing you here
@@Speykious Hehe. I am everywhere and nowhere at the same time.
Really awesome for just 25 days
Man i really liked this video! Can you tell us the tools you used to create and edit the video with the nice source code and captions?
Sure, I used kdenlive for video editing. It also has a feature of speech recognition that generates captions. For the source code I just screenshot my code from neovim :)
Hi, Alex! Awesome personal challenge and even more awesome that you had the grit to finish it in just 25 days 👏🏻
I'm curious what resources you used for the theoretical aspects. I think you're mentioning Thorsten Ball's "Writing An Interpreter In Go" in the Lexer section of the video, but did you use anything else besides that? Like how to represent an AST in C, how to structure lexing and parsing, different forms of IRs, etc. The README file in the repo is lacking too in this respect.
Also, you're right about type checking resembling an interpreter. I had the exact same feeling when I implemented my first type checker. Some folks even say that type checking is _abstract_ interpretation (see "Interpreting types as abstract values" by Oleg Kiselyov & Chung-chieh Shan).
Yes, for the lexing stage I made use of Thorsten Ball's "Writing An Interpreter In Go". But past that I honestly just experimented with stuff until it worked. I have tried to do parsers in the past, so I kind of already knew what might work. But I just tried to follow the language's manual (for the structure of the AST). I did not come up with COOL by myself, I think it is a well known language for compiler courses :D
@@AlexTheRealDev Thanks. Yes, I knew about COOL and Alex Aiken's compiler course. All very... well, cool stuff :)
Amazing 🔥🔥🔥🔥🔥🔥
Super interesting. I have a question: I looked at the repository and found the lib folder. How do you implement libraries?
I added some docs + you can watch the networking videos to see how i did it for pthreads or just copy paste existing modules
ive been planning for so long but just delaying. currently trying to complete ben eater 6502
Great stuff, bro! What book were you using as reference?
I am currently trying my own compiler / interpreter and following the guide on Crafting Interpreters by Robert Nystrom.
book that helped me build the parser: Writing An Interpreter In Go - Thorsten Ball :D good luck with the compiler!!
6:53 Why should the tokenizer never fail? What's wrong with reporting an unexpected closing bracket or eof in the middle of a string literal?
I mean, I agree that the tokenizer should be able to recover after an invalid token, but I don't think it's good to mix valid tokens with errors in a single type.
I’m still handling the errors that you mentioned, it’s just that the lexer outputs tokens (invalid is one of them) and the parser is supposed to report the errors. At the end of the lexing stage you want to see a stream of tokens, not a mix of tokens and error messages
@@AlexTheRealDev I see your point. However, I didn't mean returning an error message specifically, just returning an object that encapsulated all information necessary to report an error, which is in your case an invalid token. I guess it makes sense to postpone error messages until parsing if you're collecting all tokens first, it's just that in my case I return either a token or an error, and stop immediately tokenizing if an error is encountered.
On a side note, wouldn't it be better to only begin parsing if all tokens are valid? I'm starting to think that reporting tokenizer errors is not really among parser's responsibilities 🤔
I see. I just wanted to be able to report as many errors as possible (e.g. all invalid tokens, and then all the syntax errors, so this way you will show both the “eof in string” and the “missing semicolon” in one run)
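The approach discussed in this thread (the lexer never fails, invalid input becomes a token, errors are reported later) can be sketched roughly like this. The token kinds and the names `next_token`, `lex_all` and `count_invalid` are assumptions for illustration; the real lexer in the repo will differ.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_INT, TOK_PLUS, TOK_SEMI, TOK_STRING, TOK_INVALID, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start; /* points into the source text */
    size_t len;
    const char *error; /* set only for TOK_INVALID */
} Token;

/* Scan one token starting at *p and advance *p past it.
 * Bad input never aborts lexing; it becomes a TOK_INVALID token. */
static Token next_token(const char **p) {
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;
    Token t = { TOK_EOF, s, 0, NULL };
    if (*s == '\0') { *p = s; return t; }
    if (isdigit((unsigned char)*s)) {
        t.kind = TOK_INT;
        while (isdigit((unsigned char)*s)) s++;
    } else if (*s == '+') {
        t.kind = TOK_PLUS; s++;
    } else if (*s == ';') {
        t.kind = TOK_SEMI; s++;
    } else if (*s == '"') {
        s++;
        while (*s != '\0' && *s != '"') s++;
        if (*s == '"') { t.kind = TOK_STRING; s++; }
        else { t.kind = TOK_INVALID; t.error = "EOF in string"; }
    } else {
        t.kind = TOK_INVALID; t.error = "unexpected character"; s++;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}

/* Lex the whole input first. Returns the number of tokens written,
 * including the final TOK_EOF when it fits in `out`. */
static size_t lex_all(const char *src, Token *out, size_t cap) {
    size_t n = 0;
    while (n < cap) {
        out[n] = next_token(&src);
        if (out[n++].kind == TOK_EOF) break;
    }
    return n;
}

/* A later stage can walk the stream and report every invalid token. */
static size_t count_invalid(const Token *toks, size_t n) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (toks[i].kind == TOK_INVALID) bad++;
    return bad;
}
```

The alternative raised above (return either a token or an error and stop at the first error) is simpler, but it reports only one problem per run.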
What is the name of the book that he is talking about during the lexer section?
I'm pretty sure it's Thorsten Ball's "Writing An Interpreter In Go". It's that book that defines a language called Monkey (Alex brings up Monkey in the video).
"Never reinvent the wheel" is a dumb phrase. If everyone followed that we wouldn't have options and we wouldn't discover new and interesting wheel designs.
Amazing project! Could you please tell me what font you are using?
thanks. I use Iosevka Term
@@AlexTheRealDev Ohh thank you so much. I swear I did not realize it was a variant of Iosevka 🤣
Sometimes I don’t even notice the difference between fonts, and I am always surprised when people ask about the font 😂
@@AlexTheRealDev 🤣
Fucking awesome bro, keep going
imagine not just delimiting by space and parsing by simple rpn syntax for looking up dictionary of macros for generating relevant target assembly
this reply was written by 1 afternoon compiler developers union and approved by forth gang
stupid stackbrained jokes aside, good job! I rub my head a lot thinking of the why and how of the concept of lexing and advanced syntax, so it is always interesting to see what people do
Sir, may I ask how you are setting the line height?
I use lazyvim/nvim and the code looks too cramped
that must be a setting in your terminal emulator. which one do you use?
Would you be able to share which book you used to learn how to build a compiler? I am also interested in building my own language for the sake of learning.
I have used "Writing An Interpreter In Go - by Thorsten Ball" for the lexer stage. Then most of my knowledge for parser syntax etc comes from the courses I took in uni on compilers + experimenting around with parser for other things (like JSON, ini)
I understand your English perfectly. Please don't put captions on the video when the YouTube captions will work just fine and those can be turned off.
thank you for letting me know :D
@@AlexTheRealDev Hey man, appreciate it. I don't intend to be negative as this is good content. Just felt I had to say it.
Where did you learn C mate? Can you provide some resources to learn C and compiler development, I'm pretty interested in making my own compiler.
And thanks in advance
You should check out craftinginterpreters
You are helpless
The more interesting question is what problem do you want to solve and how do you want people using the language to feel compared to them using other languages that have pain points you hope to solve with your particular language.
That should be the guiding compass for you to better research how you want to use C or any other systems programming language for the sake of compiler engineering.
Wish you good luck and godspeed!
Without a doubt, the first resource I would point you towards would be “Crafting Interpreters” by Robert Nystrom, a wonderful book and available free online. The second half is also in C. It will give you tools and knowledge around language design through to execution.
Now, the compiler part is trickier:
The “Dragon Book” (Compilers: Principles, Techniques, and Tools) is the venerable text of yore. It’s got good stuff in there no doubt, but techniques and implementations have changed. It’s very good but it’s no longer the holy grail it was.
I read Andrew Appel’s “Modern Compiler Implementation in ML” and enjoyed it. He has a C version available, I can’t comment on that publication, but I imagine it’s very similar.
No Starch have “Writing a C Compiler” in Early Release, with final publication set for August. The author, Nora Sandler, wrote a series of blog posts on writing a compiler a good few years back. I imagine the book is a fleshed-out version of these, so either get the early access or read those posts.
The last option is also the most lucrative in my opinion, but also easily the most information dense. It assumes a lot of knowledge you’ll have to pick up from the previous resources, and it uses C++, not strictly just C. It is LLVM’s official tutorial series on implementing “Kaleidoscope”, a language made up for the tutorial.
It’s a very fun topic, with so much to sink your teeth into, from optimising assembly to high-level type systems. Don’t be daunted, and expect to have many years of illuminating moments (“oh this is why this language does X… this is why seemingly unrelated features of language Y and Z feel familiar”)
Check out the ”Crafting Interpreters” book by Robert Nystrom. It goes over how to create a simple interpreter and compiler in both Java and C.
This is crazy. I’m making a compiler in Python, and it will be self-hosted at some point. I made the lexer in C++ and realized my summer is almost up, so I ain’t gonna make it in C++. I might move to TypeScript possibly, but yeah
I agree, if you plan on self hosting the compiler you should make a “just works” implementation in something simple like python or ts and then focus on the self hosting part
@@AlexTheRealDev as in skip things like the AST and error handling? So it works and I can go straight to making it self-hosted?
@@devaughntimoll9493 that can save you some time for sure
@@AlexTheRealDev I have made integers, strings, variables and exit codes. It's quite basic but I’m getting somewhere
I also had the compiler course in my last sem and the teacher was so easy to fool that most students just wrote code in any language and fooled them into thinking it's the compiler. Even after seeing the code they got fooled man 😭, it was pretty funny but it def felt bad. So, i actually learned it on my own, great experience, Compilers are fun forreal!
But yeah, that course sucked ass because of that teacher, and i am happy i did not indulge in the fooling (i did that in the first class tho, infact it was started by me 💀, i just never did it again)
That is crazy 😂 glad you decided to learn tho that is great 👍
They were cheating themselves though. The purpose of college is to learn not to pass classes (well obviously you need to pass the class but that won't help you long term aside from getting a degree)
Interested to see how the gwee game goes.
How did you learn all this before building the compiler?
That’s a good question. I would say just by writing a lot of small projects. But I also have a CS degree, so I am a bit privileged in that way :)
im also writing my own language currently and I've been stuck on expression parsing for the past 3 weeks do you have any advice? the most difficult part so far is parentheses
I am really curious how that happened. Usually parentheses are the easiest ones to solve :) You can look into "Parsing expression grammar" on Wikipedia or think about how you can split your grammar into "Factor", "Term" and "Expression" (this is just for */+- but can be extended easily)
@@AlexTheRealDev thanks, I'll give you some updates here if I remember lol
@@AlexTheRealDev thanks again, I managed to make it work
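For readers stuck at the same spot, the "Factor / Term / Expression" split can be sketched as a small recursive-descent evaluator. This is an illustrative sketch only: the names `Parser`, `parse_factor`, `parse_term`, `parse_expression` and `eval` are made up, it evaluates directly instead of building an AST, and it handles only non-negative integer literals.

```c
#include <ctype.h>

/* Grammar assumed by this sketch:
 *   Expression := Term   (('+' | '-') Term)*
 *   Term       := Factor (('*' | '/') Factor)*
 *   Factor     := NUMBER | '(' Expression ')'
 */
typedef struct {
    const char *p; /* cursor into the source text */
    int ok;        /* cleared on the first error */
} Parser;

static long parse_expression(Parser *ps);

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Parentheses are handled here: '(' simply restarts at Expression. */
static long parse_factor(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expression(ps);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++;
        else ps->ok = 0; /* missing closing parenthesis */
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) { ps->ok = 0; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*ps->p)) v = v * 10 + (*ps->p++ - '0');
    return v;
}

/* '*' and '/' bind tighter than '+' and '-', so they live one level down. */
static long parse_term(Parser *ps) {
    long v = parse_factor(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return v;
        ps->p++;
        long r = parse_factor(ps);
        if (op == '*') v *= r;
        else if (r != 0) v /= r;
        else ps->ok = 0; /* division by zero */
    }
}

static long parse_expression(Parser *ps) {
    long v = parse_term(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return v;
        ps->p++;
        long r = parse_term(ps);
        v = (op == '+') ? v + r : v - r;
    }
}

/* Evaluate `src`. Returns 1 on success and stores the result in *out. */
static int eval(const char *src, long *out) {
    Parser ps = { src, 1 };
    long v = parse_expression(&ps);
    skip_ws(&ps);
    if (*ps.p != '\0') ps.ok = 0; /* trailing garbage */
    *out = v;
    return ps.ok;
}
```

Precedence falls out of which function calls which, and parentheses need no special machinery beyond the one branch in `parse_factor`.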
What resources did you refer to while making this?
Writing An Interpreter In Go - Thorsten Ball
Zig lang would've probably been optimal both for readability and ability
Using classes and namespaces for functions is basically all I do when I have to work with a 100% OOP language rsrsrsrs. The "OOP style" just doesn't click for me
completely agree
You talked about a book. What is it?
check the description :D
Good job, meet me at my office
- Josh
"enuf yappin" 😭😭
“.cl” is a common lisp extension
I noticed that too, but that is what the author of the language decided to use, so I tried to follow that
i heard a quote that said there are two types of coders: those who wrote compilers and those who didnt
Hey, as a viewer, I’d recommend trimming down this video. I’m two minutes in and haven’t gotten to the part that I clicked for. I want to learn about building a compiler, not so much about your language choice.
I actually left the video but decided to come and comment this since I know feedback is really important for initial channel growth.
“If I had more time, I would have written a shorter letter” - Blaise Pascal
Thanks man, I tried to make use of the chapters for that
Go make content on India's YouTube knockoff
this fad of cutting out silence between sentences is the stupidest and most annoying thing. I can not listen.
aight bet, will try to do that better. tried to please the majority :P
@@AlexTheRealDev the majority don’t want onsinglesentenceutteredwithoutstoppingforbreathitsoundsveryveryirritating
@@osogrande4999 analytics says otherwise, but I get your point. I am not an editor so my other videos are just raw content. Hope that you will find them better :D
@@AlexTheRealDev I don’t give a fuck about analytics. Blocked