Regex Library in Rust from Scratch (Finite-State Machines)

Поделиться
HTML-код
  • Опубликовано: 4 июн 2021
  • References:
    - Finite-state Machine: en.wikipedia.org/wiki/Finite-...
    - Turing Machine: en.wikipedia.org/wiki/Turing_...
    - Regex Source Code: github.com/tsoding/regex-stream
    Support:
    - Patreon: / tsoding
    - Twitch Subscription: / tsoding
    - Streamlabs Donations: streamlabs.com/tsoding/tip
    Feel free to use this video to make highlights and upload them to RUclips (also please put the link to this channel in the description)
  • НаукаНаука

Комментарии • 34

  • @titaniumtomato7247
    @titaniumtomato7247 Год назад +93

    This "recreational coding" is exactly what I want to spend my life doing

  • @omarmagdy1075
    @omarmagdy1075 2 года назад +15

    Just found this channel and it's an absolute gem mine.
    your content is just precious and very educational too.

  • @nilstrieb
    @nilstrieb 3 года назад +61

    You can use Enums for indexing, you just have to set the values for the enum like
    enum A { B = 0, C = 1}
    and then cast it
    arr[A::B as usize]

    • @zaneoblaneo7624
      @zaneoblaneo7624 2 года назад

      I might be dumb for asking, but can't you also repr(C) the enums now-a-days? While it's less readable and less "rust"y, it looks like it would do the right thing. ¯\_(ツ)_/¯

    • @relacibo
      @relacibo 2 года назад +29

      you could also do #[repr(usize)] enum A { B, C }

  • @fr3fou
    @fr3fou 3 года назад +45

    this seemed really interesting, hope you finish working on the groups next time!

  • @martinfrances107
    @martinfrances107 2 года назад +5

    For array initialization, the lazy_static module is a good option.

  • @kyuantym
    @kyuantym Год назад

    This is an absolute masterpiece.

  • @erenmetesar8427
    @erenmetesar8427 2 года назад +5

    seeing this, even though i am great at understanding and writing code, i suck at algorithms and all the other mathematical shit so i don't have what it takes to be a full blown developer. i now understand why entrance exams require mathematical knowledge now

    • @legion_prex3650
      @legion_prex3650 2 года назад +1

      if you like coding, its just a little jump over to math and algorithms. a really good course for algorithms is by princeton professor Rob Sedgewick (by hisself student of the great donald knuth, look them up, if you dont know those guys): ruclips.net/video/1QZDe28peZk/видео.html

  • @erenmetesar8427
    @erenmetesar8427 2 года назад +14

    You could've done something like:
    ('a'..='z').map(|a| a as u8 ).collect::()
    range is mappable, you just need to call collect method on it

  • @JMCe6a
    @JMCe6a 3 года назад +14

    Seems like this kind of engine does not work properly with regex like ".*bc", doesn't it? It will stuck on FsmColumn of *, which will never propagate to latter columns. I guess we need some kind of lookahead for it.

    • @TsodingDaily
      @TsodingDaily  3 года назад +14

      Oh yeah! That is true! Thank you so much for noticing! I'll think what we can do about it.

    • @iuppiterzeus9663
      @iuppiterzeus9663 8 месяцев назад

      ​@@TsodingDailycompile your regex to an NFA (non-deterministic finite automaton) which is equivalent to a DFA (deterministic finite automaton, what you called "finite state machine" thoughout the video). The operations that compose the parts of regexes are concatenation, branching/option and repeating (Kleene-closure). All other regex features (like +) are syntactic sugar for the basic operations. These operations are fairly easy to implement over NFAs. Therefore, compile the parts of your regex to separate NFAs and apply the corresponding operations. Then convert your NFA to a DFA (there are fairly stright-forward algorithms for that) and you have it.
      That's how I wrote a regex library myself, also using rust and for recreational purposes.
      quick note on your finite state machine terminology:
      turing machines, push-down-automatons, NFAs and DFAs are all finite state machines.
      NFAs and DFAs are equivalent and can accept regular languages (regexes define a regular language).
      push-down-automatons are basically NFAs with a stack, the top of which is an argument to the transition function, and can accept regular and context-free languages (all programming languages are context-free languages).
      turing machines can compute anything computable (*the* famous paper by Alan Turing) and, as you mentioned, modify some memory which is also their input and output.

  • @boboxxrbt
    @boboxxrbt Год назад +1

    Great content here! But does it work with the regex "a.*bc"? I guess that the .* will consume all remaining characters and fail on valid inputs as "adfgbc"... Am I wrong?
    BTW, I would love to see a follow up video where you tackle this problem and the nested regexps you were mentioning at the end.

  • @rodelias9378
    @rodelias9378 3 месяца назад

    Cool video. Thanks mr zozin

  • @DMWatchesYoutube
    @DMWatchesYoutube 2 года назад +1

    I saw a good video on command pattern for a more complex state machine

  • @andresalejandrogarciahurta5856
    @andresalejandrogarciahurta5856 2 года назад +3

    you could also match the combinations of all the dimensions of the state machine inside a tuple, so that you get flat arms (rather than nested) and the code is "cleaner" and more manageable, like: `match (state, event, other_dim)`. For a few number of dimensions I think this approach is more idiomatic and less "hacky". Thanks for the video anyways, really enjoy it thoroughly.

  • @MACAYCZ
    @MACAYCZ Год назад +3

    1:35:55 You should rename your channel to T-rex Daily

  • @GPandzik
    @GPandzik 2 года назад +6

    Unexpectedly excited to see another emacs user writing Rust! 😀 It seems everybody is using (or at least demoing) VSCode, which I don't use for a lot of reasons.

    • @legion_prex3650
      @legion_prex3650 2 года назад +2

      i really wonder, why so many people use VSCode these days. It is by far not an good editor (as far as i know, i just tried it out once). I don't use emacs either, i am only a little vim user and quite happy with it (and cannot even program properly in common lisp).

    • @JohnDoe-sp3dc
      @JohnDoe-sp3dc Год назад +1

      @@legion_prex3650 so you can't program yet you're complaining what code editor other people use? Cognitive dissonance in action here folks.

    • @belaolson8172
      @belaolson8172 Год назад

      @@JohnDoe-sp3dc as someone who can program reasonably alright, I agree with the sentiment that VSC is a pain simply because ease of access. You install Vim, all you gotta do is read the docs and you know what you're doing. For VSC you have to download 15 packages to do one task, then you need internet to update those packages just to get them to work properly, it's all a big fuss. I probably would have less to complain about if it wasn't so finnicky with everything, but as it serves now it's just not nice to use unlike something like Vim or emacs

    • @syth-1
      @syth-1 10 месяцев назад

      ​@@belaolson8172download everything? I mean usually when it's a new language sure, and you get a pop up asking you to download all major packages for that language,
      Once it's set up, that's it - you never need to worry about that sort of thing again, updating is done automatically when you get internet connection, your not forced to do so nor does it render your editor useless by not updating said packages (dunno how you got that issue - never had any thing of that sort)
      I'm not saying Vs code is great, but the points mentioned seemed trivial and not highlighting actual problems with Vs code - and painting a worst picture than it rlly is (yes downloading extensions for new languages suck, I get that point, but it's not as* bad as you made it sound to be where it's problem ridden etc - it's a automated process the way I see it)
      Vs code has a slow start up that's for sure - I've not used many other editors, but Vs code is for my purpose, very usable - sure it's ease of access for things may not be the same - where using vim will be faster cause kB shortcut (n powerful commands that can come with that) vs navigating a UI, plus mem intensive compared to running in a terminal etc etc

    • @lainiwakura3741
      @lainiwakura3741 10 месяцев назад +4

      ​@@JohnDoe-sp3dc If you want to insult people you should at least try to properly understand what they're saying.. They specified that he cannot program in lisp, which is the language you use to configure Emacs, which is why they stick to vim. There was absolutely no cognitive dissonance in that sentence. I guess you can still be angry with them for having an opinion on vscode, but that seems silly imo.

  • @yapayzeka
    @yapayzeka Год назад +1

    16:20 thank you but you didn't inform user what to write?! also enum means already const.

  • @simonfarre4907
    @simonfarre4907 2 года назад +12

    This non-edge lord attitude in this video is infinitely more pleasant to watch. I know, super gay. But it is the truth.

    • @abujessica
      @abujessica 2 года назад +6

      you mean him not trying to be edgy is good? hell yea

    • @simonfarre4907
      @simonfarre4907 2 года назад +5

      @@abujessica yes

    • @yes-ni1od
      @yes-ni1od 2 года назад +2

      if you really know him then he is edgy. He banned many people from his chat for just trying to help him

    • @user-ck8kp8vb4l
      @user-ck8kp8vb4l 7 месяцев назад

      gay? what are you talking about?