i always enjoy raw coding videos.
Thanks! Me too. Maybe it means I'm weird, but I like watching Tsoding or Handmade Hero or similar where they are actually working real problems, not preparing a PowerPoint presentation.
This is super cool +1
Thanks!
What resources did you use to start making this? I've made a lot of lexers and parsers but never a generator, and it honestly sounds fun. I want to implement a parser generator with ANTLR's ALL(*) algorithm at some point.
For code resources ... there are no libraries other than what comes in with C++17 standard libraries.
Of course the definitive work is probably "Compilers: Principles, Techniques, & Tools" by Aho, Lam, Sethi and Ullman. (Non-affiliate link: www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_1). I don't think this is a good place for most people to start. It is dense, academic and poorly explained in places. I slogged through this book for quite a long time before I felt like I was understanding the pieces I needed. And there's a lot more there that I haven't bothered to dig into seriously.
This isn't a bad place to start for NFA/DFA construction:
- en.wikipedia.org/wiki/Thompson%27s_construction
But this was way more helpful than anything else to get me over the understanding hump for how to go from a pattern all the way to a DFA:
- swtch.com/~rsc/regexp/regexp1.html
If I remember right, this uses a postfix regex pattern instead of the infix we are more accustomed to seeing. It also shows only limited regex features. But from this you can really build the rest.
Of course most modern regex engines use backtracking algorithms and not a DFA. But I like DFAs and don't need the features that backtracking enables.
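For anyone who wants to see the pattern-to-DFA path in one place, here is a minimal C++17 sketch. It is not code from the article or from the video; it only illustrates the general idea under some assumptions: the pattern is already in postfix form, `.` is an explicit concatenation operator, only `|` and `*` are supported besides literals, and the names `build_nfa`, `build_dfa` and `matches` are made up for this example. The pattern is assumed to be well-formed and non-empty; there is no error handling.

```cpp
#include <map>
#include <set>
#include <stack>
#include <string>
#include <vector>

// NFA state: one optional labelled edge (next) plus up to two epsilon edges.
struct State { char label = 0; int next = -1, eps1 = -1, eps2 = -1; };
struct Nfa { std::vector<State> states; int start = -1, accept = -1; };

// Thompson's construction from a well-formed, non-empty postfix pattern.
// Assumed syntax: '.' concatenation, '|' alternation, '*' star, else literal.
Nfa build_nfa(const std::string& postfix) {
    Nfa nfa;
    struct Frag { int start, end; };
    std::stack<Frag> frags;
    auto add = [&] { nfa.states.emplace_back(); return static_cast<int>(nfa.states.size()) - 1; };
    auto pop = [&] { Frag f = frags.top(); frags.pop(); return f; };
    for (char c : postfix) {
        if (c == '.') {                      // concatenation: a then b
            Frag b = pop(), a = pop();
            nfa.states[a.end].eps1 = b.start;
            frags.push({a.start, b.end});
        } else if (c == '|') {               // alternation: a or b
            Frag b = pop(), a = pop();
            int s = add(), e = add();
            nfa.states[s].eps1 = a.start;  nfa.states[s].eps2 = b.start;
            nfa.states[a.end].eps1 = e;    nfa.states[b.end].eps1 = e;
            frags.push({s, e});
        } else if (c == '*') {               // zero or more repetitions of a
            Frag a = pop();
            int s = add(), e = add();
            nfa.states[s].eps1 = a.start;      nfa.states[s].eps2 = e;
            nfa.states[a.end].eps1 = a.start;  nfa.states[a.end].eps2 = e;
            frags.push({s, e});
        } else {                             // literal character
            int s = add(), e = add();
            nfa.states[s].label = c;  nfa.states[s].next = e;
            frags.push({s, e});
        }
    }
    nfa.start = frags.top().start;
    nfa.accept = frags.top().end;
    return nfa;
}

// Add s and everything reachable from it through epsilon edges.
void closure(const Nfa& nfa, int s, std::set<int>& out) {
    if (s < 0 || !out.insert(s).second) return;
    closure(nfa, nfa.states[s].eps1, out);
    closure(nfa, nfa.states[s].eps2, out);
}

struct Dfa {
    std::vector<std::map<char, int>> next;  // next[state][ch] -> state
    std::vector<bool> accepting;
};

// Subset construction: each DFA state is a set of NFA states; state 0 is the start.
Dfa build_dfa(const Nfa& nfa) {
    Dfa dfa;
    std::map<std::set<int>, int> ids;
    std::vector<std::set<int>> work;
    auto intern = [&](const std::set<int>& subset) {
        auto it = ids.find(subset);
        if (it != ids.end()) return it->second;
        int id = static_cast<int>(work.size());
        ids.emplace(subset, id);
        dfa.next.emplace_back();
        dfa.accepting.push_back(subset.count(nfa.accept) > 0);
        work.push_back(subset);
        return id;
    };
    std::set<int> start;
    closure(nfa, nfa.start, start);
    intern(start);
    for (int i = 0; i < static_cast<int>(work.size()); ++i) {
        std::map<char, std::set<int>> moves;
        for (int s : work[i]) {
            const State& st = nfa.states[s];
            if (st.next >= 0) closure(nfa, st.next, moves[st.label]);
        }
        for (const auto& [ch, target] : moves) {
            int id = intern(target);  // may grow dfa.next, so index afterwards
            dfa.next[i][ch] = id;
        }
    }
    return dfa;
}

// Run the DFA over the whole input; a missing transition means no match.
bool matches(const Dfa& dfa, const std::string& text) {
    int state = 0;
    for (char c : text) {
        auto it = dfa.next[state].find(c);
        if (it == dfa.next[state].end()) return false;
        state = it->second;
    }
    return dfa.accepting[state];
}
```

As a usage example, the infix pattern `a(b|c)*` becomes `abc|*.` in this postfix form, so `build_dfa(build_nfa("abc|*."))` gives a DFA that accepts strings like "a" and "abcb". A generator would then emit the `next` table as code instead of walking it at runtime.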
For UTF8 encoding:
- en.wikipedia.org/wiki/UTF-8
But for looking up characters this has been super useful:
- design215.com/toolbox/ascii-utf8.php
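As a rough illustration of what the UTF-8 page describes, here is a small sketch that encodes a single code point into its 1 to 4 byte form. The function name `encode_utf8` and the choice to return an empty string for invalid input are assumptions of this example, not anything from the project.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates (U+D800..U+DFFF) and for
// values above U+10FFFF, which are not valid Unicode scalar values.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {             // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For a byte-oriented DFA this matters because a single non-ASCII character in a pattern turns into a short chain of byte transitions. For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC.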
I hope that gets you started! I'd love to see what you come up with. This is pretty niche and I haven't found many that share my interest. :)
For the LALR parser generator I'll make Some Day ... I just got that from slogging through the Purple Dragon Book ("Compilers: Principles, Techniques, & Tools" mentioned above) the hard way.
5 hours to write a simple tokenizer? also have you not heard of raw string literals?
Thank you for your comment.
If you read the title of the video, you would see that I am not writing a Tokenizer; I am writing a Tokenizer Generator which is a whole different beast. Also, if you cared to actually watch the video, I note multiple times per stream that I DON'T recommend doing this. For context, this is just a project that I enjoy working on for variety one day each week, so I feel no pressure to do anything other than meander where I want to on the project.
@thediscouragerofhesitancy83 I think raw string literals would make it a lot easier and more efficient to generate the C++ where you're just emitting large blocks of code. Just a suggestion. Something like R"_()_"
@joshnjoshgaming You are probably right. It probably would be easier to write with raw string literals. Maybe it's a sign that my brain is petrifying from old age that I haven't switched that code over yet. Or I'm becoming too curmudgeonly. :)
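To make the raw string literal suggestion concrete, here is a small sketch comparing the two styles for emitting a block of generated C++. The functions `emit_escaped` and `emit_raw` and the generated `newline()` function are hypothetical; they are not taken from the stream's code.

```cpp
#include <string>

// Ordinary string literals: every quote and backslash in the generated
// code has to be escaped, and each line needs its own "\n".
std::string emit_escaped() {
    return "const char* newline() {\n"
           "    return \"\\n\";\n"
           "}\n";
}

// Raw string literal with the delimiter "_": the generated code is
// written exactly as it should appear in the output file.
std::string emit_raw() {
    return R"_(const char* newline() {
    return "\n";
}
)_";
}
```

Both functions return the same text. The delimiter (here `_`) only has to be a sequence that never appears right after a `)` and before a `"` in the emitted code.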
terrible audio
Hm. I can't say I'm surprised; I'm a programmer in a home office. I haven't taken much time to learn the finer points of audio recording. Too busy programming.
Out of curiosity, I compared the audio on Twitch, YouTube, Rumble and my local recording. YouTube has the worst rendition of it. Rumble is probably the second worst, then Twitch, and my local recording sounds okay to me. I'm not sure why there's so much variation between the sites, since I upload from the same file.