Compiler From Scratch: Phase 1 - Tokenizer Generator 008: Code generating Context and Token

  • Published: Nov 5, 2024

Comments • 16

  • @emptycode1782 1 month ago

    I always enjoy raw coding videos.

    • @thediscouragerofhesitancy83 1 month ago

      Thanks! Me too. Maybe it means I'm weird, but I like watching Tsoding or Handmade Hero or similar, where they are actually working on real problems, not preparing a PowerPoint presentation.

  • @jamesmorris3756 1 month ago +1

    This is super cool +1

  • @havenselph 1 month ago

    What resources did you use to start making this? I've made a lot of lexers and parsers but never a generator, and it honestly sounds fun. I want to implement a parser generator with ANTLR's ALL(*) algorithm at some point.

    • @thediscouragerofhesitancy83 1 month ago

      For code resources ... there are no libraries other than the C++17 standard library.
      Of course the definitive work is probably "Compilers: Principles, Techniques, & Tools" by Aho, Lam, Sethi and Ullman. (Non-affiliate link: www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_1). I don't think this is a good place for most people to start. It is dense, academic and poorly explained in places. I slogged through this book for quite a long time before I felt like I was understanding the pieces I needed. And there's a lot more there that I haven't bothered to dig into seriously.
      This isn't a bad place to start for NFA/DFA construction:
      - en.wikipedia.org/wiki/Thompson%27s_construction
      But this was way more helpful than anything else to get me over the understanding hump for how to go from a pattern all the way to a DFA:
      - swtch.com/~rsc/regexp/regexp1.html
      If I remember right, this uses a postfix regex pattern instead of the infix we are more accustomed to seeing. It also shows only limited regex features. But from this you can really build the rest.
      Of course most modern regex engines use backtracking algorithms and not a DFA. But I like DFAs and don't need the features that backtracking enables.
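The pattern-to-automaton step described above can be sketched roughly as follows. This is a minimal illustration in the spirit of the swtch.com article, not the code from the stream: it assumes a well-formed postfix pattern in which `.` is explicit concatenation, `|` is alternation, and `*`, `+`, `?` are repetition (so those characters cannot be literals), and all names (`Nfa`, `Frag`, `compilePostfix`, `matches`) are hypothetical. It builds an NFA and simulates it by tracking a set of states; caching those state sets is essentially what turns this into a DFA.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; a sketch, not the generator from the video.
enum { Split = 256, Match = 257 };           // special labels outside the byte range

struct State { int c; int out; int out1; };  // -1 means "not connected yet"
using Arrow = std::pair<int, int>;           // (state index, 0 = out, 1 = out1)
struct Frag { int start; std::vector<Arrow> dangling; };

struct Nfa {
    std::vector<State> states;
    int start = -1;

    int add(int c, int out = -1, int out1 = -1) {
        states.push_back({c, out, out1});
        return static_cast<int>(states.size()) - 1;
    }
    // Point every dangling arrow at `target`.
    void patch(const std::vector<Arrow>& arrows, int target) {
        for (const Arrow& a : arrows) {
            if (a.second == 0) states[a.first].out = target;
            else               states[a.first].out1 = target;
        }
    }
};

// Build an NFA from a postfix pattern. Assumes the pattern is well formed.
Nfa compilePostfix(const std::string& postfix) {
    Nfa nfa;
    std::vector<Frag> stack;
    auto pop = [&] { Frag f = std::move(stack.back()); stack.pop_back(); return f; };
    for (char ch : postfix) {
        switch (ch) {
        case '.': {                          // concatenation: e1 then e2
            Frag e2 = pop(), e1 = pop();
            nfa.patch(e1.dangling, e2.start);
            stack.push_back({e1.start, e2.dangling});
            break;
        }
        case '|': {                          // alternation: e1 or e2
            Frag e2 = pop(), e1 = pop();
            int s = nfa.add(Split, e1.start, e2.start);
            std::vector<Arrow> d = e1.dangling;
            d.insert(d.end(), e2.dangling.begin(), e2.dangling.end());
            stack.push_back({s, d});
            break;
        }
        case '?': {                          // zero or one
            Frag e = pop();
            int s = nfa.add(Split, e.start);
            std::vector<Arrow> d = e.dangling;
            d.push_back({s, 1});
            stack.push_back({s, d});
            break;
        }
        case '*': {                          // zero or more: loop back to the split
            Frag e = pop();
            int s = nfa.add(Split, e.start);
            nfa.patch(e.dangling, s);
            stack.push_back({s, {{s, 1}}});
            break;
        }
        case '+': {                          // one or more: enter e first, then loop
            Frag e = pop();
            int s = nfa.add(Split, e.start);
            nfa.patch(e.dangling, s);
            stack.push_back({e.start, {{s, 1}}});
            break;
        }
        default: {                           // literal byte
            int s = nfa.add(static_cast<unsigned char>(ch));
            stack.push_back({s, {{s, 0}}});
            break;
        }
        }
    }
    Frag e = pop();
    int m = nfa.add(Match);
    nfa.patch(e.dangling, m);
    nfa.start = e.start;
    return nfa;
}

// Add state s to the set and follow Split arrows; the set doubles as a visited marker.
void addState(const Nfa& nfa, int s, std::set<int>& set) {
    if (s < 0 || !set.insert(s).second) return;
    if (nfa.states[s].c == Split) {
        addState(nfa, nfa.states[s].out, set);
        addState(nfa, nfa.states[s].out1, set);
    }
}

// Whole-string match by stepping a set of NFA states over each input byte.
bool matches(const Nfa& nfa, const std::string& input) {
    std::set<int> current;
    addState(nfa, nfa.start, current);
    for (char ch : input) {
        std::set<int> next;
        for (int s : current)
            if (nfa.states[s].c == static_cast<unsigned char>(ch))
                addState(nfa, nfa.states[s].out, next);
        current.swap(next);
    }
    for (int s : current)
        if (nfa.states[s].c == Match) return true;
    return false;
}
```

For example, the infix pattern `a(bb)+a` is `abb.+.a.` in this postfix form, so `matches(compilePostfix("abb.+.a."), "abba")` should accept while `"aba"` is rejected.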
      For UTF-8 encoding:
      - en.wikipedia.org/wiki/UTF-8
      But for looking up characters this has been super useful:
      - design215.com/toolbox/ascii-utf8.php
      I hope that gets you started! I'd love to see what you come up with. This is pretty niche and I haven't found many that share my interest. :)
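On the UTF-8 side mentioned above, the byte layout from the Wikipedia page can be written out as a small encoder. This is only a sketch to make the bit patterns concrete; the function name `encodeUtf8` is hypothetical, and returning an empty string for invalid input is an assumption made here for simplicity, not something taken from the video.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for values that are not encodable
// (surrogates U+D800..U+DFFF or anything above U+10FFFF).
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate: not encodable
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, `encodeUtf8(0x20AC)` (the euro sign) yields the three bytes E2 82 AC, which is the kind of value the lookup table linked above lets you double-check by hand.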

    • @thediscouragerofhesitancy83 1 month ago

      For the LALR parser generator I'll make Some Day ... I just got that from slogging through the Purple Dragon Book ("Compilers: Principles, Techniques, & Tools" mentioned above) the hard way.

  • @공정환-n1q 1 month ago

    Jones Joseph Williams Carol Jones George

  • @PerriPaprikash 1 month ago

    5 hours to write a simple tokenizer? Also, have you not heard of raw string literals?

    • @thediscouragerofhesitancy83 1 month ago +1

      Thank you for your comment.
      If you read the title of the video, you would see that I am not writing a Tokenizer; I am writing a Tokenizer Generator which is a whole different beast. Also, if you cared to actually watch the video, I note multiple times per stream that I DON'T recommend doing this. For context, this is just a project that I enjoy working on for variety one day each week, so I feel no pressure to do anything other than meander where I want to on the project.

    • @joshnjoshgaming 27 days ago

      @thediscouragerofhesitancy83 I think raw string literals would make it a lot easier and more efficient to generate the C++ where you're just emitting large blocks of code. Just a suggestion, like R"_()_"
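To make the suggestion concrete, here is a small side-by-side sketch. It assumes the generator emits a fixed chunk of C++ as a string; the emitted `Token` struct and the function names `emitTokenEscaped` and `emitTokenRaw` are made up for illustration and are not taken from the video.

```cpp
#include <string>

// Escaped version: every newline and embedded quote has to be written by hand.
std::string emitTokenEscaped() {
    return "struct Token {\n"
           "    int type;\n"
           "    std::string text; // e.g. \"while\"\n"
           "};\n";
}

// Raw string literal version: the text between R"_( and )_" is taken verbatim,
// so the generated code appears in the source exactly as it will be emitted.
// The `_` delimiter only needs to be something that never occurs as )_" in the body.
std::string emitTokenRaw() {
    return R"_(struct Token {
    int type;
    std::string text; // e.g. "while"
};
)_";
}
```

Both functions return the same text, so switching from one style to the other should not change the generated output, only how readable the generator's own source is.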

    • @thediscouragerofhesitancy83 27 days ago +1

      @joshnjoshgaming You are probably right. It probably would be easier to write with raw string literals. Maybe it's a sign that my brain is petrifying from old age that I haven't switched that code over yet. Or I'm becoming too curmudgeonly. :)

  • @nccnm 1 month ago

    terrible audio

    • @thediscouragerofhesitancy83 1 month ago

      Hm. I can't say I'm surprised; I'm a programmer in a home office. I haven't taken much time to learn the finer points of audio recording. Too busy programming.

    • @thediscouragerofhesitancy83 1 month ago

      Out of curiosity, I compared the audio on Twitch, YouTube, Rumble and my local recording. YouTube has the worst rendition of it. Rumble is probably the second worst, then Twitch; my local recording sounds okay to me. I'm not sure why there's so much variation between the sites, since I upload from the same file.