Regex Library in Rust from Scratch (Finite-State Machines)
HTML-код
- Опубликовано: 4 июн 2021
- References:
- Finite-state Machine: en.wikipedia.org/wiki/Finite-...
- Turing Machine: en.wikipedia.org/wiki/Turing_...
- Regex Source Code: github.com/tsoding/regex-stream
Support:
- Patreon: / tsoding
- Twitch Subscription: / tsoding
- Streamlabs Donations: streamlabs.com/tsoding/tip
Feel free to use this video to make highlights and upload them to RUclips (also please put the link to this channel in the description) Наука
This "recreational coding" is exactly what I want to spend my life doing
me too
@@travaaa8432God's plan
go for it
Just found this channel and it's an absolute gem mine.
your content is just precious and very educational too.
You can use Enums for indexing, you just have to set the values for the enum like
enum A { B = 0, C = 1}
and then cast it
arr[A::B as usize]
I might be dumb for asking, but can't you also repr(C) the enums now-a-days? While it's less readable and less "rust"y, it looks like it would do the right thing. ¯\_(ツ)_/¯
you could also do #[repr(usize)] enum A { B, C }
this seemed really interesting, hope you finish working on the groups next time!
For array initialization, the lazy_static module is a good option.
This is an absolute masterpiece.
seeing this, even though i am great at understanding and writing code, i suck at algorithms and all the other mathematical shit so i don't have what it takes to be a full blown developer. i now understand why entrance exams require mathematical knowledge now
if you like coding, its just a little jump over to math and algorithms. a really good course for algorithms is by princeton professor Rob Sedgewick (by hisself student of the great donald knuth, look them up, if you dont know those guys): ruclips.net/video/1QZDe28peZk/видео.html
You could've done something like:
('a'..='z').map(|a| a as u8 ).collect::()
range is mappable, you just need to call collect method on it
Seems like this kind of engine does not work properly with regex like ".*bc", doesn't it? It will stuck on FsmColumn of *, which will never propagate to latter columns. I guess we need some kind of lookahead for it.
Oh yeah! That is true! Thank you so much for noticing! I'll think what we can do about it.
@@TsodingDailycompile your regex to an NFA (non-deterministic finite automaton) which is equivalent to a DFA (deterministic finite automaton, what you called "finite state machine" thoughout the video). The operations that compose the parts of regexes are concatenation, branching/option and repeating (Kleene-closure). All other regex features (like +) are syntactic sugar for the basic operations. These operations are fairly easy to implement over NFAs. Therefore, compile the parts of your regex to separate NFAs and apply the corresponding operations. Then convert your NFA to a DFA (there are fairly stright-forward algorithms for that) and you have it.
That's how I wrote a regex library myself, also using rust and for recreational purposes.
quick note on your finite state machine terminology:
turing machines, push-down-automatons, NFAs and DFAs are all finite state machines.
NFAs and DFAs are equivalent and can accept regular languages (regexes define a regular language).
push-down-automatons are basically NFAs with a stack, the top of which is an argument to the transition function, and can accept regular and context-free languages (all programming languages are context-free languages).
turing machines can compute anything computable (*the* famous paper by Alan Turing) and, as you mentioned, modify some memory which is also their input and output.
Great content here! But does it work with the regex "a.*bc"? I guess that the .* will consume all remaining characters and fail on valid inputs as "adfgbc"... Am I wrong?
BTW, I would love to see a follow up video where you tackle this problem and the nested regexps you were mentioning at the end.
Cool video. Thanks mr zozin
I saw a good video on command pattern for a more complex state machine
you could also match the combinations of all the dimensions of the state machine inside a tuple, so that you get flat arms (rather than nested) and the code is "cleaner" and more manageable, like: `match (state, event, other_dim)`. For a few number of dimensions I think this approach is more idiomatic and less "hacky". Thanks for the video anyways, really enjoy it thoroughly.
1:35:55 You should rename your channel to T-rex Daily
Unexpectedly excited to see another emacs user writing Rust! 😀 It seems everybody is using (or at least demoing) VSCode, which I don't use for a lot of reasons.
i really wonder, why so many people use VSCode these days. It is by far not an good editor (as far as i know, i just tried it out once). I don't use emacs either, i am only a little vim user and quite happy with it (and cannot even program properly in common lisp).
@@legion_prex3650 so you can't program yet you're complaining what code editor other people use? Cognitive dissonance in action here folks.
@@JohnDoe-sp3dc as someone who can program reasonably alright, I agree with the sentiment that VSC is a pain simply because ease of access. You install Vim, all you gotta do is read the docs and you know what you're doing. For VSC you have to download 15 packages to do one task, then you need internet to update those packages just to get them to work properly, it's all a big fuss. I probably would have less to complain about if it wasn't so finnicky with everything, but as it serves now it's just not nice to use unlike something like Vim or emacs
@@belaolson8172download everything? I mean usually when it's a new language sure, and you get a pop up asking you to download all major packages for that language,
Once it's set up, that's it - you never need to worry about that sort of thing again, updating is done automatically when you get internet connection, your not forced to do so nor does it render your editor useless by not updating said packages (dunno how you got that issue - never had any thing of that sort)
I'm not saying Vs code is great, but the points mentioned seemed trivial and not highlighting actual problems with Vs code - and painting a worst picture than it rlly is (yes downloading extensions for new languages suck, I get that point, but it's not as* bad as you made it sound to be where it's problem ridden etc - it's a automated process the way I see it)
Vs code has a slow start up that's for sure - I've not used many other editors, but Vs code is for my purpose, very usable - sure it's ease of access for things may not be the same - where using vim will be faster cause kB shortcut (n powerful commands that can come with that) vs navigating a UI, plus mem intensive compared to running in a terminal etc etc
@@JohnDoe-sp3dc If you want to insult people you should at least try to properly understand what they're saying.. They specified that he cannot program in lisp, which is the language you use to configure Emacs, which is why they stick to vim. There was absolutely no cognitive dissonance in that sentence. I guess you can still be angry with them for having an opinion on vscode, but that seems silly imo.
16:20 thank you but you didn't inform user what to write?! also enum means already const.
This non-edge lord attitude in this video is infinitely more pleasant to watch. I know, super gay. But it is the truth.
you mean him not trying to be edgy is good? hell yea
@@abujessica yes
if you really know him then he is edgy. He banned many people from his chat for just trying to help him
gay? what are you talking about?