Worth mentioning that Cloudflare isn't just a CDN. Most websites use it predominantly as a web proxy that handles the majority, if not all, of requests to the origin.
@@emeraldbonsai Typically a CDN serves static or at least mostly static data. A CDN may be implemented as a caching web proxy, but a web proxy can do a lot more than what usually falls under the definition of "CDN". In CloudFlare's case, they basically offer both and blur the line about which is which, which is fine since it's a blurry line, but the person you are replying to isn't wrong.
@@VeggieRice DNS has nothing to do with what's being discussed here (aside from being an earlier step in the chain that would take you to the page's configured web proxy or CDN, of course, but equivalently so to saying "the user can elect their own browser").
The distinction is useful here, as the outage is much more impactful if your web pages won't even load themselves (because the web proxy is down) rather than just CDN assets not loading (which could be only large assets).
It's important to note that re2 actually has other downsides compared to other regex engines, such as being unable to handle lookaheads and lookbehinds. This isn't just an implementation issue either: adding these operations actually makes regex strictly stronger than a finite state machine (instead it becomes a pushdown automaton). There's also a lot of fun math with finite state machines, where it turns out they're strictly equivalent to generating functions, which are basically power series where you don't care about convergence!
I think "look-around" assertions could still be implemented to run in linear time. As far as I know, backreferences are the only feature that can make the run time go exponential. In fact, matching regexes with backrefs is proven NP-hard.
Lacking some of the more advanced PCRE features in order to make guarantees about the maximum runtime seems like the right compromise to make for a high-volume security frontend that sits between the global population and a large swath of the internet.
@@hoo2042 That's actually why Russ Cox developed RE2 in the first place. He made it for Google Code Search (now defunct). You can't really be Google and expect tech people to only input well-behaved regexes. He has a very interesting series of articles named "Implementing Regular Expressions". I really recommend that every developer read them.
@@hoo2042 The problem now is that they use it on every Google product. RE2 is the regex engine of BigQuery and I'm stuck with these limitations. It doesn't make sense in a data warehouse.
Same here dude. I can't understand how people are still relying on regex for such important aspects of the code. It's just mind-blowing that a firewall rule is managed with that in 2023.
lol just write your own parser. You are acting as if that's a hard problem to solve and as if customers are not important. You just want to make your lives as "programmers" easier. Have some responsibility for the unnecessary amount of code that runs on users machines.
This is why as a general rule I NEVER use .* in my regexes. If I want to match everything before an equals sign, I'd use [^=]*= rather than .*= because it's always better to be as explicit as possible.
But that would match just the first '=', not all of them. If you have a lot of parameters in a URL, you will have a lot of '=' and you will want to search all of them for certain things.
@@framegrace1 That's why you don't anchor the expression to the end of the string in this case. We don't care what else is at the end of the URL if we find a "bad thing" near the start. Also, most regex engines have a shortcut implementation for regexes ending in ".*"/".*$", so the one at the end is of no concern. And BTW, the issue was mostly the ".*.*", not so much the ".*=". Backtracking the latter isn't so expensive; it doesn't really matter if the engine has to search for the = from the start or the end of the remaining string. It most likely has a shortcut for "fixed character after match all" anyway. There's a good chance that ".*?=" is faster than "[^=]*?="/"[^=]*=" as it can scan the string using a simple "equals" comparison and be done. This, however, all goes out the window once there are multiple ways to match, like the infamous ".*.*". So when using this optimisation on purpose, it makes sense to manually commit after the "=" (e.g. with "(*COMMIT)").
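To make the thread's point concrete, here's a small Python sketch (Python's stdlib `re` is itself a backtracking engine, so it shows the greedy/explicit/lazy behaviour directly; the sample URL is made up):

```python
import re

url = "foo=bar&baz=qux"

# Greedy ".*" first swallows the whole string, then backtracks until the
# LAST '=' can match; "[^=]*" can never cross an '=' so it stops at the
# first one, as does the lazy ".*?".
print(re.match(r'(.*)=', url).group(1))    # foo=bar&baz
print(re.match(r'([^=]*)=', url).group(1)) # foo
print(re.match(r'(.*?)=', url).group(1))   # foo
```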
It's amazing how casually things are actually handled behind the scenes in the IT world. I once wrote some software for a bank, did a 3 hour audit of the code with 5 of their top developers, after which they installed a pre-compiled earlier test version on their prod system. smh
I once fixed a globally crashing iOS app by hacking the backend to send out technically incorrect data. The app passed all tests because the test suites didn't include any data that would reveal division-by-zero bugs. This was especially bad since the time to get Apple to review and deploy an updated version could take a week or more, IIRC. After conversing with the dev responsible, I asked him how he handled fractional numbers, and he was sure that fractional numbers were always displayed as integers, so I changed the API to send instances of 0 as 0.001, effectively circumventing the bug while displaying calculated numbers (and 0s) correctly in the app. I think it's the most hacky fix I've ever deployed. It felt terrible and exhilarating and awesome all at the same time 😂 I'm actually a little proud 😇
Someone wrote a paper ages ago about backtracking vs non-backtracking regex engines and the state of software slowness... The title is "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)" written by a Russ Cox in 2007. I bet he's feeling vindicated
I work as a sysadmin, yet I wish I had this much insight into how all the technologies I use daily work. I never finished college, so I had only ever heard of DFAs, but despite that, your video explained it very well, and showed how much of an issue a simple regex can be when executed thousands of times a second. Please make more videos, I cannot wait for more
It is interesting to note that any regex can be represented as a nondeterministic finite automaton (NFA) and any NFA can be converted into a DFA using a simple algorithm. The only downside is that the DFA may end up with exponentially more states than the NFA, which can take up a lot of memory.
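For anyone curious, that conversion is the classic subset construction. A toy sketch in Python (the NFA encoding and the example machine are made up for illustration, not from any particular library):

```python
from collections import deque

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is the SET of NFA states you
    could currently be in. nfa maps (state, symbol) -> set of next states.
    In the worst case the number of reachable subsets is exponential."""
    start_set = frozenset([start])
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        cur = queue.popleft()
        for sym in alphabet:
            nxt = frozenset(t for s in cur for t in nfa.get((s, sym), ()))
            dfa[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa, start_set

# Toy NFA over {0,1} accepting strings that end in '1': on '1', state 'a'
# can either stay in 'a' or guess "that was the last char" and go to 'b'.
nfa = {('a', '0'): {'a'}, ('a', '1'): {'a', 'b'}}
dfa, start = nfa_to_dfa(nfa, 'a', '01')
print(len({s for (s, _) in dfa}))  # number of distinct DFA states reached
```

Here the blowup doesn't happen (only 2 DFA states), but pathological NFAs can force close to 2^n subsets.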
@@Nayayombro same. I hated it. But understanding them is pretty good. We can see how google's devs used automata and formal grammar theory to develop a useful practical application with regex.
@pav431 even without formal education, I'd say anyone working as a sysadmin should know something about algorithms and complexity theory, especially when writing code that systems used by others depend on. And shell scripts _are_ code. Knowing that there are regular expressions, and highly irregular expressions that are just _called_ something like "Perl compatible regular expressions" or "extended regexp" or whatever, is important. So is not writing scripts that unnecessarily nest three or more loops, and work fine on small test data, but take "forever" with realistic sizes of data. Know and understand the various O-notations. Just because quicksort is usually quick doesn't prevent it from having O(n²) worst-case complexity. You may be fine with that, but you will want to know why you can live with it. There has to be a metric ton of good books on this, so it's possible to learn. Enjoy!
@@lhpl As someone who's inherited, and then had to completely rewrite from scratch, core automation scripts for clusters that were written by novice sysadmins, I concur that learning these things is important. Some sysadmins learn that awk and grep exist and that's the end of their training. Rewriting from scratch saved me the hours upon hours it would have cost me to try to maintain the poorly made code inherited from novice sysadmins.
I don't blame you for doing Cloudflare again, their RCAs are always excellent. This is such an excellent channel, you deserve far more subs! These are exceedingly entertaining and interesting for software engineers (and probably most other folks too!)
Absolutely crazy videos you're pumping out. Love your comedic editing style too! Every video of yours makes me feel like the entire internet could break at any moment lol
Yeah, as much as I love Cloudflare for smaller stuff there's a reason a lot of large enterprises use Akamai. A little overpriced for a simple growth phase startup and not as transparent as Cloudflare when something breaks on their end, but that massive bucket list of features available with Ion Premier, Cloudlets, and many more, especially Datastreams and their web security analytics portal, is an absolute lifesaver. Hell, it helps us debug all sorts of broken stuff upstream of it too, although I wouldn't be surprised if Cloudflare offered something like Akamai Reference IDs for easy, enterprise-friendly tracing. Specifically, Akamai is really particular about having identical Staging and Production sections with really fast rollback when production error rates increase even a little.
All those rules are stored locally on each node, and you cannot roll back a machine that is dead or so high on CPU that it can't even handle a connection. I presume they globally disabled the WAF and restarted the nodes, so when they came up they didn't try to apply the WAF rules and were free to be rolled back/forward. Then they re-enabled the WAF (very slowly, I presume :) ) and all was back to normal.
It's kinda funny that the Internet was designed to be a `web` that hopefully would prevent failures of a single node taking down the whole system, but nowadays we heavily rely on a handful of service providers just to run the Internet.
Regex is great like shell scripts: works everywhere and does its job... up until a certain script size, when the chance of bugs starts increasing and you should think of using another tool instead or in conjunction. Also this sounds like GitOps to the extreme: when you can only change your state via your repo and all the triggers that come with it, you might as well replace your CD with a single bash script (see above).
I had a regex blow up on me like that once. Not **quite** as silly as .*(?:.*=.*), but pretty close. The regex library we were using implemented backtracking with recursion, so instead of eating CPUs like a bag of chips it would instead masticate for a while before eventually running out of stack, whereupon it would puke Pringles. This was an especially fun one to fix because if you google “regex stack overflow” you’ll find that there are zillions of questions on stackoverflow about regexes that have nothing to do with stack overflows. And yes, in shame I must admit the regex in question did not fit on my screen all at once. In my defense, however, that was because a year or so earlier I had torn the line noise apart and put 3 to 5 characters of actual regex on each line, followed by a comment. Only two lines had // I have no idea, this shouldn’t do anything, but it doesn’t work without it.
The reason we predominantly use backtracking (NFA-style) regular expression engines is not just because they're usually faster if we don't throw degenerate expressions at them, but also because they support expressions that exceed the capabilities of a regular grammar, such as backreferences to a specific capture group that has been seen previously.
@@MH_VOID For a normal case the performance is generally similar, but the difference is that these linear engines like RE2 are more predictable and less likely to blow up in your face. If you don't have control of the pattern and the input, they are *much* safer, and losing features that depend on backtracking is generally not a big deal. If it's really performance critical, just don't use regex at all if you can avoid it.
NFAs and DFAs are computationally equivalent and recognise exactly the regular languages. So an NFA-backed RE engine would have to implement additional functionality, as languages with backreferences are not regular.
I've learned in the meantime that the biggest speed advantage is actually due to unrelated technologies such as a JIT compiler in PCRE2, which is in fact a top-down parser that happens to accept regex-like expressions. The only thing that is definitely faster about NFA is compiling regular expressions.
Ken Thompson is crying really hard. His work has been around for decades. As a CS guy who has specialized in algorithms, this hurts right in the heart.
Awesome and informative video! A small correction: NFA matching is still linear in the input string. You just have to store the configuration as a set of NFA states, rather than a single state. You don't get exponentially many paths in the way you describe in the video because paths ending at the same state are merged in this set representation.
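In code, the set-of-states simulation the commenter describes looks roughly like this (a toy sketch with a made-up two-state machine, not any real engine's implementation):

```python
def run_nfa(trans, start, accept, s):
    """Simulate an NFA in O(len(s) * |states|) time by carrying the whole
    SET of reachable states forward, instead of exploring each path
    separately. Paths that land on the same state merge automatically."""
    states = {start}
    for ch in s:
        states = {t for st in states for t in trans.get((st, ch), ())}
    return bool(states & accept)

# Toy machine: accepts strings over {0,1} that end in '1'.
trans = {('a', '0'): {'a'}, ('a', '1'): {'a', 'b'}}
print(run_nfa(trans, 'a', {'b'}, '0101'))  # True
print(run_nfa(trans, 'a', {'b'}, '0110'))  # False
```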
Except it's wrong. You use non-capturing groups for performance reasons, to consume characters in some group (this is simply parentheses syntax reuse). In this particular case it was so obviously wrong I can't imagine anyone familiar with regex not spotting it, but in general you shouldn't capture what's not required after the match is done.
I stumbled across your videos yesterday, and I find them really entertaining and interesting to watch! Thank you for explaining these topics in a clear way; even though I know nothing about regex or Cloudflare, I can still follow along and understand the video :)
Love these videos Kevin! Your amazing storytelling, editing, animations, and everything else comes together in an amazing way! Love watching every video you put out, keep it up :)
Well, from the thumbnail image, the regexp (.*=.*) says "find the LARGEST chunk of text possible before a literal = sign, then find the largest chunk after it, including other = signs if they exist", and it will walk the entire chunk of data many times to ensure it gets ALL of them. They probably meant to do (.*?=.*?), which would have found the SMALLEST chunks of text around literal = signs, and would stop as soon as it found even a single = sign.
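A quick way to see the greedy vs lazy difference described above, using Python's `re` on a made-up query string:

```python
import re

qs = "a=1&b=2&c=3"
# Greedy: the first ".*" backtracks from the end, so it pairs with the LAST '='.
print(re.search(r'(.*)=(.*)', qs).groups())    # ('a=1&b=2&c', '3')
# Lazy: each ".*?" grabs as little as possible, stopping at the FIRST '='.
print(re.search(r'(.*?)=(.*?)', qs).groups())  # ('a', '')
```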
I'm starting to hope another outage somewhere on the internet occurs just to see a Kevin Fang video with hundreds of explosion effects in my recommendations yet again
Non-capturing groups (unlike lookahead and lookbehind) do get included in the match result (think $0), they just don't create an additional sub-match. E.g. at 6:35, that would match $0: $1: $2: while removing the ?: would make it $0: $1: $2: $3:
I mean they could have just done .*?=.* but I guess RE2 is safer long-term. Still this screams "I don't understand regex, it's just magic to me" on the part of that developer.
Which is fair honestly, regex is basically just magic and once you understand the syntax you don't question its ways. Though I'm surprised nowhere else along the development process was anybody concerned about it. Apparently nobody that looked over it had any idea what it was doing.
@11:50 You don't need a DFA to guarantee time linear in the input length because you don't actually need to follow every path, you just need to keep track of the set of states that the NFA is in. This does make it more expensive proportional to the size of the NFA, but it's still linear in the input length.
While technically every NFA can be converted into a DFA, the algorithm to do so (the subset construction algorithm) has an exponential worst-case runtime. This is probably why people try to approximate the DFA.
You are a gem, thanks for the detailed information. You didn't only explain complex information, you also explained how WAF companies work. Thanks again. Salam!!
Didn't expect to hear about theoretical computer science (a subject I'm taking this year) in this video, but nice work. It's nice to see actual real-world usage of converting ε-NFAs to DFAs. I wish our prof would have included this video in his lecture...
The line at the end could be revised; it isn't that "convoluted" regex should be avoided. In fact, regex that looks more complicated often ends up being better, because it's more specific. Really bad regex happens when people write something quick and dirty that consumes all valid cases without considering constraints, which was part of the issue here. I know this is probably what you meant by "convoluted", but what's missing is a bit about what you should write instead; people unfamiliar with good vs bad regex might not come away with the right idea.
Cloudflare's free tier has the most value of any free tier on the internet. They give you access to almost everything that the big companies have access to, just with certain limitations like max rule count. Amazing company
God I love watching these vids, I love the duality of high quality, digestible, information coupled with a nice sprinkling of "don't be a dipsh*t" commentary over issues and causation. Developer wise, nothing brings me more joy about my job than someone pointing out how much of an imbecile I *could* have been on that one day.
I'm currently working on my thesis where I'm also dealing with regular expressions and their internal NFA representation quite a bit. And I recently encountered some papers about these risks of the naive backtracking implementation most engines use. Very interesting to now see one of these problems occur in practice. This happening in practice also gives quite some validity to the approach Rust is taking, which ensures good asymptotic characteristics. Only thing I want to point out is that you saying "the increase in steps can potentially be exponential" is fairly misleading. Since it makes it sound like this particular case has an exponential asymptotic runtime, while it only has a quadratic one.
Define "most". All Unix utilities use DFAs. The problem is interpreted languages: they use PCRE so they can use the =~ operator dynamically, since the recursive regex doesn't need compilation and the implementation is easier (even bash uses PCRE for the =~ operator). Using compilation caching, for example, all of them could use the normal regex library, but I guess it increases complexity.
@@framegrace1 That's fair. Most regex implementations I've encountered 🙂 But I'm a windows user plus I mostly use higher level interpreted languages, so adds up with what you're saying. The DFA version combined with capture groups just gets much more complex, and I don't know of any implementation that supports lookarounds with this either yet.
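For the quadratic (not exponential) growth mentioned a couple of comments up, here's a toy model of how a backtracking engine tries the splits for ".*.*="; it's a sketch of the blowup, not the real PCRE algorithm:

```python
def steps_for_star_star_eq(s):
    """Toy backtracking model of '.*.*=': each '.*' tries every possible
    split point, so on input with no '=' the number of attempts grows as
    roughly n^2/2. Quadratic rather than exponential, but still deadly
    when every node runs it on every request."""
    n, attempts = len(s), 0
    for i in range(n + 1):          # characters eaten by the first .*
        for j in range(i, n + 1):   # characters eaten by the second .*
            attempts += 1
            if j < n and s[j] == '=':
                return attempts     # '=' matched, engine stops here
    return attempts                 # exhausted every split: no match

print(steps_for_star_star_eq('x' * 10))  # 66 attempts
print(steps_for_star_star_eq('x' * 20))  # 231 attempts (~4x for 2x input)
```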
I still have questions as to why there are so many non-capturing groups in that regex and isn't the second ".*" before the "=" redundant? Could not this regex have been simplified? Edit: Also considering parameter names are typically shorter than their values, even if you had to do this I would assume (.*?=.*) would be more efficient on average.
The following regex would return the exact same match count: = If the purpose of that regex was to consume an entire line containing "=" (another thing it does, badly) then we could do something similar to what ya wrote: (?:.*?=.*) Though, I'm unsure if the non-capturing group is necessary, so maybe the version provided is fine. If we run the original through a regex tester? It doesn't matter how long the string is, it plain fails in every way. There is no use case for that regex except DoSing thyself.
It's a shame that even in 2023, most programmers still don't know how to make correct atomic regex, or even heard of the idea of applying atomicity to regex. That regex has more issues than just .*(?:.*=.*) but, yes, that's the most problematic part.
It does sweep some disadvantages under the table though. First, the DFA can be exponential in size, which means what you gain in execution time, you already lost in building time. Second, not all things which are possible with an NFA can be done with a DFA, or they can only be done at a huge cost. In particular, back references. For instance, matching a sequence of a's, then a b, then a sequence of a's of the same length as the first sequence of a's. As a regex, you write that as /(a*)b\1/. Try turning that into a DFA.
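The backreference example above is easy to try with Python's backtracking `re`, which supports \1 precisely because it isn't a pure DFA:

```python
import re

# \1 must repeat exactly what the group captured. A finite automaton has
# no way to "remember" an unbounded count, so no DFA recognises this language.
pat = re.compile(r'^(a*)b\1$')
print(bool(pat.match('aaabaaa')))  # True: 3 a's, b, 3 a's
print(bool(pat.match('aaabaa')))   # False: counts differ
```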
Wow great explanation of the matching process, I already did not like regex but this is next level... especially because one could easily write UNDERSTANDABLE code that does the job, but runs in O(n)...
If the input is arbitrary and especially comes from untrusted source, regex is almost never the answer, unless you have already limited the input size and complexity. This kind of logic should've called for a deterministic parser instead. While it would possibly have taken longer to write, it would have been more readable. On top of that, I would also consider using property-based testing framework to test with arbitrary input. Good learning experience, and great that Cloudflare shared detailed explanation of their mistakes.
You cannot (practically) use a deterministic parser for WAF, but you can compile the requests as said in the video which is what they should have done instead of using PCRE (I bet they used PCRE because it doesn't need compilation and "makes things easier")
Checking the code with property-based testing, e.g. with Hypothesis, should easily and relatively quickly find that the regexp doesn't work, because it times out for relatively small strings. Of course, at the time of the disaster this wasn't widespread, and you also need to remember to set the timeout to a relatively short time (because the problem here is that you have to handle many requests per second, so a usual timeout of some seconds might keep the problem undetected at worst).
The original incarnation of regular expressions was very efficient and fast. But then people started adding features to it - features people liked, so they've stuck around. But - they open the door to VERY inefficient expressions. It doesn't take a lot of study to understand why some of these things break your performance.
This is why I pretty much always avoid * and + in regex without the question mark. Adding the question mark makes it very efficient. Take this regex for example: /.*?end/. This matches any string ending in "end", but it will terminate the first time it sees "end". So it will only match the first 4 characters of "0endend". As soon as it is able to match the full regex once, it stops backtracking and trying to eat more characters. If we were to use the greedy version instead, without the question mark... /.*end/ ...it will match the full string "0endend", because the ".*" tries to consume as many characters as possible, resulting in tons of unnecessary backtracking, and it is almost never my intent to consume as many characters as possible. If I want to check for the beginning or end of a line, I can just use ^ and $ respectively.
I do a fuckton of regexing, but I rarely make them lazy, and just leave it as the default eager unless I actually need to in order to get the correct matches (which is pretty rare IME). Does it really matter most of the time though if you make it lazy? Surely not, because if it did, regex by default would be lazy and the question mark would make it eager, no?
@@MH_VOID No idea why it's eager by default. I just know that the question mark improves performance because it doesn't have to search as much of the string. I can't say exactly how much it improves performance, or how often it's relevant, but it's basically guaranteed to be faster some of the time because it literally performs fewer operations. I imagine this is more relevant when the repeated group is very vague and will match lots of things, like /.*/. On the other hand, something like /o*/ would stop as soon as it found something that wasn't an "o". But here's an example of where the greedy version is really bad. Let's go with this regex again: /.*end/. And let's run it on this hypothetical string, "hello end, [pretend there are a fuckton of characters here], etc, etc, bender". So instead of just matching "hello end", it has to go through this insanely long string, and ends up matching everything up to "bend" because "end" is in there. And frankly, that probably isn't even what you wanted. Even worse though, it would still have to go through this enormous string even if it didn't end with the word "bender". Even if your actual match is some tiny thing at the beginning of the string, the greedy version forces the engine to go through the ENTIRE string. It's just silly and 99% of the time isn't what you want. This is why I pretty much always use the question mark unless I absolutely need it to be greedy for some reason. It's more likely to be efficient, and less likely to result in confusing bugs.
@@iamstickfigure The question mark helps if the normal path of your program is a match, but what if it's a non-match in 90% of cases? The regex with the question mark will perform the same amount of work. In my practice I often worked with regexes matching the entire string, /^...something...$/, and question marks won't help much with those either.
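Both sides of this sub-thread can be checked quickly with Python's backtracking `re` (the sample strings are made up):

```python
import re

s = "0endend"
print(re.match(r'.*?end', s).group())  # '0end'    - lazy stops at the first 'end'
print(re.match(r'.*end', s).group())   # '0endend' - greedy backtracks to the last

# And in the non-match case the reply points out, lazy saves nothing:
# both variants must still scan the entire string before giving up.
print(re.match(r'.*?end', 'x' * 50))   # None
```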
I know that I'm late to the party, and that maybe this has already been written, but I need to point out (because I'm studying it right now) that at the end it's not (AFAIK) completely correct to call it an NFA. Technically, because every transition consumes an item, it's a Petri net. Aside from that, good job, and beautiful content!
Reminds me of when I first started at SAP and my sole job was to hunt down expensive SQL statements in clients' DBs. Loved being able to reduce server load and response times while learning client systems...
TL;DR: They added new code to stop a new type of attack. In a hurry, they skipped testing. The new code was so badly broken it was mistaken for an attack. It was later rolled back. New internal systems should reduce the risk of the same happening again.
If I had a nickel for every time quadratic complexity passed testing but blew up in prod I'd be rich
What if every time it quadratic complexity passed testing, you got a nickel for every time quadratic complexity passed testing?
If I had a nickel for every time I got a nickel...
If I got a nickel for every time someone says "If I got a nickel" I would be a gabbagillionaire!
This is why we have and should have computer science and not just computer engineering - math is important.
This is exponential, not quadratic. If it were quadratic, there wouldn't be any problems.
2:51 I like how the 1000x engineer just foreshadows all the events that are about to happen, and then approves the change.
par for the course tbh
Yeah average senior software engineer moment tbh
@EgonFreeman From experience... had they spoken up, they would almost certainly have been met with one of "who asked you?", "shut up, we're doing this!", or just ignored. Dozens of others didn't notice any issues, so who's going to listen to "that guy"?
a true 1000x enginerd would not use regex lol
@@jfbeam sounds like the culture in your organisation is arse. A "let's be careful" should be taken seriously even coming from a day-1 intern.
"Some programmers run into a problem and think, 'I will use regex to solve this!' Now they have two problems."
- Zawinski
I was actually at Cloudflare in the room for Cloudbleed and this issue, in SF for Cloudbleed and happened to be in London for this one. The real story is much better than this. We were at lunch doing a tech talk in the lunchroom when someone grabbed the mic and announced we were having a P0. We stampeded back to our desks and got to work fixing it. The issue was obviously related to the WAF from the start and it was just a matter of cleaning up. Keep up the videos they are great
So what's it like working at Cloudflare? :)
@@Nayayom It's actually really awesome. Tons of really smart, kind, curious people. Everything internally is about transparency, execution, and learning. Definitely engineering-centric, but also super product-focused in that the customer is always considered during meetings/talks/decisions.
@@0xggbrnr sounds like a good place to work! Glad to hear that
Ok, but it's not BETTER than the video story lol
What happened to the employee who made the regex?
Despite all this, I still very much love Cloudflare especially because of their transparency. They always go into great depth explaining what happened, what they did, and how they resolved it.
Many companies can learn a thing or two from them in that regard. Customers tend to have more faith in a company that just owns up to its mistakes rather than trying to have a PR department cover it up in nice words.
@@KabodankiCaptcha? Google the "Privacy Pass" extension. It lets you skip the tests by doing tests beforehand.
@@StkyDkNMeBlz they don't have captcha anymore, they use turnstile
I love my state sponsored man in the middle
I hate Cloudflare because it's trying to become a monopoly on internet ethics. It's not your job to pass judgement on what is allowed on the internet. Banning something that is illegal is fair. But banning because it's immoral according to them... yeah, I hate Cloudflare.
@@StkyDkNMeBlz I really want to be tracked by Google across the internet with their corpo issued cryptographic IDs. You do understand what you are shilling?
2:37 laughed my ass off at "delete master after the pull request is merged"
and there are people that would click that checkbox.
Chill, "master" is the dev branch here while "main" is the master branch of course.
@@Blast-Forward Let's just agree that it's a funny easter egg in the video and laughing is justified.
Glad to see others had seen that too 😂❤
Yeah that was such a nice touch!😂
I work in a cybersecurity administration space where regex is used all the time as a necessity. This is a story we tell people all the time to make sure they understand how important it is to make efficient regex.
The regex is fine; there was no reason for the engine to backtrack on it
@@Wyvernnnn I disagree. The pattern didn't make much sense. It was clearly missing something between the initial wildcard and the non-capturing group. There's never any reason to put two wildcards next to each other like that.
@@Zei33 Yeah, that was weird, but it should still be O(n) in the end; that's the whole point of regular expressions (as long as you don't have backreferences to capture groups, which are what force backtracking)
I can understand how it happened, even an experienced programmer can struggle to parse a regex by eye and it's easy to make something that's a resource hog without realising. It's certainly a lesson to test your regex thoroughly before release.
@@Croz89 oh yeah I’m constantly making mistakes. No one is perfect, that’s why debugging and beta testing exists. After over a decade of programming, I don’t make a lot of mistakes, but when I do they’re usually obscure cases or subtle logic errors. When you’re working with tens of thousands of lines of code and looking at them for 8 to 20 hours a day, mistakes are gonna happen.
I guarantee all the engineers who reviewed it didn't even look at the regex. You don't go poking someone else's regex
Or even your own if you wrote it more than 1 week ago.
Lol I hate that I’m relating to this😂🤣😂
Worth mentioning that Cloudflare isn’t just a CDN. It’s predominantly used by most websites as a web proxy responsible for the majority if not all requests to the origin.
the web proxy is a CDN, last I checked
users can elect their own dns service
@@emeraldbonsai Typically a CDN serves static or at least mostly static data. A CDN may be implemented as a caching web proxy, but a web proxy can do a lot more than what usually falls under the definition of "CDN". In CloudFlare's case, they basically offer both and blur the line about which is which, which is fine since it's a blurry line, but the person you are replying to isn't wrong.
@@VeggieRice DNS has nothing to do with what's being discussed here (aside from being an earlier step in the chain that would take you to the page's configured web proxy or CDN, of course, but equivalently so to saying "the user can elect their own browser").
The distinction is useful here, as the outage is much more impactful if your web pages won't even load themselves (because the web proxy is down) rather than just CDN assets not loading (which could be only large assets).
It's important to note that re2 actually has other downsides compared to other regex engines, such as being unable to handle lookaheads and lookbehinds. This isn't just an implementation issue either: adding these operations actually makes regex strictly stronger than a finite state machine (instead it becomes a pushdown automaton). There's also a lot of fun math with finite state machines, where it turns out they're strictly equivalent to generating functions, which are basically power series where you don't care about convergence!
I think "look-around" assertions could still be implemented to run in linear time. As far as I know, backreferences are the only feature that can make the runtime go exponential. In fact, matching regexes with backrefs is provably NP-hard.
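For anyone who hasn't used them, here's a quick illustrative Python sketch of a lookaround, one of the features RE2 deliberately omits to keep its linear-time guarantee (Python's `re` is a backtracking engine, so it accepts this):

```python
import re

# A lookbehind asserts context without consuming characters:
# match digit runs only when immediately preceded by '$'.
print(re.findall(r"(?<=\$)\d+", "costs $42 and $7"))  # ['42', '7']
```
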
Lacking some of the more advanced PCRE features in order to make guarantees about the maximum runtime seems like the right compromise to make for a high-volume security frontend that sits between the global population and a large swath of the internet.
@@hoo2042 That's actually why Russ Cox developed RE2 in the first place. He made it for google code search (now defunct). You can't really be Google and expect tech people to only input well-behaved regexes. He has a very interesting series of articles named "Implementing Regular Expressions". I really recommend every developer to read them.
shouldn't regex expanded like this be called cfex instead, since it's, well, no longer a regular expression
@@hoo2042 The problem now is that they use it in every Google product. re2 is the regex engine of BigQuery and I'm stuck with its limitations. It doesn't make sense in a data warehouse.
my brain automatically shut down when you started explaining the regex...
Same here dude. I can't understand how people are still relying on regex for such important aspects of the code. It's just mind-blowing that a firewall rule is managed with that in 2023.
@@TMRick1 what alternative is there that is universally supported and has the same level of flexibility for how "compact" it is?
@@TMRick1 you just don't know the pain of using anything else to do what regex can do. What do you suggest? awk?
lol just write your own parser. You are acting as if that's a hard problem to solve and as if customers are not important. You just want to make your lives as "programmers" easier. Have some responsibility for the unnecessary amount of code that runs on users machines.
@@Игор-ь9щ I use arch btw lol
These stories are so cathartic. Thanks for applying your storytelling to these niche topics!
upside down cloudflare logo for australia was gold
This is why as a general rule I NEVER use .* in my regexes. If I want to match everything before an equals sign, I'd use [^=]*= rather than .*= because it's always better to be as explicit as possible.
But that would match just the first '=', not all of them. If you have a lot of parameters in a URL, you will have a lot of '=' and you will want to search all of them for certain things.
@@framegrace1 That's why you don't anchor the expression to the end of the string in this case. We don't care what else is at the end of the URL if we find a "bad thing" near the start. Also, most regex engines have a shortcut implementation for regexes ending in ".*"/".*$", so the one at the end is of no concern.
And BTW, the issue was mostly the ".*.*", not so much the ".*=". Backtracking the latter isn't so expensive; it doesn't really matter whether the engine has to search for the = from the start or the end of the remaining string. It most likely has a shortcut for "fixed character after match all" anyway. There's a good chance that ".*?=" is faster than "[^=]*?="/"[^=]*=", as it can scan the string using a simple "equals" comparison and be done. This, however, all goes out the window once there are multiple ways to match, like the infamous ".*.*". So when using this optimisation on purpose, it makes sense to manually commit after the "=" (e.g. with "(*COMMIT)").
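To make the alternatives concrete, a quick Python sketch (illustrative only, using a made-up query string):

```python
import re

s = "user=alice&token=xyz"

# Three ways to grab "everything up to an '='":
greedy   = re.match(r".*=",    s).group()  # backtracks from the end
lazy     = re.match(r".*?=",   s).group()  # expands one char at a time
explicit = re.match(r"[^=]*=", s).group()  # can never backtrack past '='

print(greedy)    # 'user=alice&token='  (greedy grabs the LAST '=')
print(lazy)      # 'user='
print(explicit)  # 'user='
```
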
@@framegrace1 You can still get the last '=' by being more explicit: /([^=]*=)*=/ or /=[^=]*$/
It's amazing how casually things are actually handled behind the scenes in the IT world. I once wrote some software for a bank, did a 3 hour audit of the code with 5 of their top developers, after which they installed a pre-compiled earlier test version on their prod system. smh
😂
I once fixed a globally crashing iOS app by hacking the backend to send out technically incorrect data. The app passed all tests because the test suites didn't include any data to reveal division-by-zero bugs.
This was especially bad since the time to get Apple to review and deploy an updated version could take a week or more, IIRC.
After conversing with the dev responsible, I asked him how he handled fractional numbers, and he was sure that fractional numbers were always displayed as integers, so I changed the API to send instances of 0 as 0.001, effectively circumventing the bug while displaying calculated numbers (and 0s) correctly in the app.
I think it's the most hacky fix I've ever deployed. It felt terrible and exhilarating and awesome all at the same time 😂 I'm actually a little proud 😇
@@DanielSmedegaardBuus That's awful, I love it.
this is my new favorite channel. explaining everything clearly, and being humorous with small jokes here and there
And explosions, lots of explosions.
Same. And the upside-down cloudflare logo on Australia killed me. 😂
"Which you may notice is not linear."
This is one of those comp. sci. campfire horror story jumpscares.
Right? I just saw 22, 33, 44+1 and immediately thought "oh no". 😆
They cannot be DDOSed, but they can DDOS themselves...
Someone wrote a paper ages ago about backtracking vs non-backtracking regex engines and the state of software slowness...
The title is "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)", written by Russ Cox in 2007. I bet he's feeling vindicated.
That article, and several others, should be mandatory reading for anyone using regular expressions.
3:50 Putting Cloudflare upside down above Australia was a hilarious touch!
I work as a sysadmin, yet I wish I had this much insight into how the technologies I use daily actually work. I never finished college, so I had only ever heard of DFAs, but your video explained them very well and showed how much of an issue a simple regex can be when executed thousands of times a second.
Please make more videos, I cannot wait for more
It is interesting to note that any regex can be represented as a nondeterministic finite automaton (NFA), and any NFA can be converted into a DFA using a simple algorithm. The only downside is that the DFA may end up with exponentially more states than the NFA, which can take up a lot of memory.
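A toy sketch of that blow-up (illustrative Python with a hypothetical minimal NFA encoding, not production code): the textbook language "the n-th symbol from the end is 'a'" has an (n+1)-state NFA, but subset construction produces 2**n DFA states:

```python
def nfa_for(n):
    """NFA for 'the n-th symbol from the end is a' over {a, b}.

    States are 0..n; start is 0, accept is n. State 0 loops forever
    and nondeterministically guesses that the current 'a' is the one
    n positions from the end.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta

def dfa_states(delta):
    """Subset construction: return the set of reachable DFA states,
    each of which is a frozenset of NFA states."""
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for sym in "ab":
            T = frozenset(q for s in S for q in delta.get((s, sym), ()))
            if T and T not in seen:
                seen.add(T)
                todo.append(T)
    return seen

for n in (2, 3, 4, 5):
    print(n, len(dfa_states(nfa_for(n))))  # 4, 8, 16, 32, i.e. 2**n
```
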
Good thing you enjoyed it, i have not so fun memories doing DFAs and NFAs by hand on college. :(
@@Nayayom bro same. I hated it. But understanding them is pretty good. We can see how Google's devs used automata and formal grammar theory to develop a useful practical application with regex.
@pav431 even without some formal education, I'd say anyone working as sysadmin should know something about algorithms and complexity theory. Especially when writing code that systems used by others depend on. And shell scripts _are_ code.
Knowing that there are regular expressions, and highly irregular expressions that are just _called_ something like "Perl compatible regular expressions" or "extended regexp" or whatever, is important. So is not writing scripts that unnecessarily nest three or more loops and work fine on small test data, but take "forever" with realistic sizes of data. Know and understand the various O-notations. Just because quicksort is usually quick doesn't prevent it from having O(n²) worst case complexity. You may be fine with that, but you will want to know why you can live with it. There has to be a metric ton of good books on this, so it's possible to learn. Enjoy!
@@lhpl As someone who's inherited, and then had to completely rewrite from scratch, core automation scripts for clusters that were written by novice sysadmins, I concur that learning these things is important.
Some sysadmins learn that awk and grep exist, and that's the end of their training. Rewriting from scratch saved me hours upon hours it would have cost to try to maintain the poorly made code inherited from novice sysadmins.
I don't blame you for doing Cloudflare again; their RCAs are always excellent. This is such an excellent channel, you deserve far more subs! These are exceedingly entertaining and interesting for software engineers (and probably most other folks too!)
Absolutely crazy videos you're pumping up. Love your comedic editing style too!
Every video of yours makes me feel like the entire internet could break at any moment lol
I kept wondering why they didn't just do a rollback to fix the issue, thanks for addressing that at the end.
Yeah, as much as I love Cloudflare for smaller stuff there's a reason a lot of large enterprises use Akamai. A little overpriced for a simple growth phase startup and not as transparent as Cloudflare when something breaks on their end, but that massive bucket list of features available with Ion Premier, Cloudlets, and many more, especially Datastreams and their web security analytics portal, is an absolute lifesaver. Hell, it helps us debug all sorts of broken stuff upstream of it too, although I wouldn't be surprised if Cloudflare offered something like Akamai Reference IDs for easy, enterprise-friendly tracing. Specifically, Akamai is really particular about having identical Staging and Production sections with really fast rollback when production error rates increase even a little.
All those rules are stored locally on each node, and you cannot roll back a machine that is dead or so high on CPU that it can't even handle a connection. I presume they globally disabled the WAF and restarted the nodes, so when up, they didn't try to apply the WAF rules and were free to be rolled back/forward. Then they re-enabled the WAF (very slowly, I presume :) ) and all was back to normal.
It's kinda funny that the Internet was designed to be a `web` that hopefully would prevent failures of a single node taking down the whole system, but nowadays we heavily rely on a handful of service providers just to run the Internet.
Regex is great like shell scripts: works everywhere and does it jobs... up until a certain script size when the chance of bugs starts increasing and you should think of using another tool instead or in conjunction.
Also this sounds like GitOps to the extreme: when you can only change your state via your repo and all the triggers that come with it you might as well replace your CD with a single bash script (see above).
out of the 10,000 times that this topic has been covered on YouTube in this exact amount of detail,
this is so far the most recent.
kudos!
I had a regex blow up on me like that once. Not **quite** as silly as .*(?:.*=.*), but pretty close. The regex library we were using implemented backtracking with recursion, so instead of eating CPUs like a bag of chips it would instead masticate for a while before eventually running out of stack, whereupon it would puke Pringles. This was an especially fun one to fix because if you google “regex stack overflow” you’ll find that there are zillions of questions on stackoverflow about regexes that have nothing to do with stack overflows.
And yes, in shame I must admit the regex in question did not fit on my screen all at once. In my defense, however, that was because a year or so earlier I had torn the line noise apart and put 3 to 5 characters of actual regex on each line, followed by a comment. Only two lines had // I have no idea, this shouldn’t do anything, but it doesn’t work without it.
lmaoo man i feel you
Come to the dark side. We indent our regexes.
@@tehlaser jesus christ man
The reason we predominantly use NFA regular expression engines is not just because they're usually faster as long as we don't throw degenerate expressions at them, but also because they support expressions that exceed the capabilities of a regular grammar, such as backreferences to a specific capture group seen earlier in the match.
I was under the impression that they're generally slower
@@MH_VOIDfor a normal case the performance is generally similar, but the difference is that these linear engines like RE2 are more predictable and less likely to blow up in your face.
If you don't have control of the pattern and the input, they are *much* safer, and losing features that depend on backtracking is generally not a big deal.
If it's really performance critical just don't use regex at all if you can avoid it.
NFAs and DFAs are computationally equivalent and recognise exactly the regular languages. So an NFA-backed RE engine would have to implement additional functionality, as languages with backreferences are not regular.
I've learned in the meantime that the biggest speed advantage is actually due to unrelated technologies such as a JIT compiler in PCRE2, which is in fact a top-down parser that happens to accept regex-like expressions. The only thing that is definitely faster about NFA is compiling regular expressions.
@@D0Samp Certain things that regex engines/matchers support aren't true regex and can't be covered by a DFA/NFA. That's a bit BS to me.
Ken Thompson is crying really hard. His work has been around for decades. As a CS guy who has specialized in algorithms, this hurts in the middle of the heart.
When in doubt, implement your expression in a delayed loop so it doesn't murder everything.
Even without being into coding I understand how this could work and it puzzles me why they didn't do this lol.
Awesome and informative video! A small correction: NFA matching is still linear in the input string. You just have to store the configuration as a set of NFA states, rather than a single state. You don't get exponentially many paths in the way you describe in the video because paths ending at the same state are merged in this set representation.
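Sketched in Python (illustrative only, with a tiny hand-built NFA), the set-of-states simulation looks roughly like this:

```python
# One pass over the input, carrying the set of currently-possible NFA
# states. Paths that converge on the same state merge in the set, so
# runtime is O(len(text) * number_of_states), never exponential.

def run_nfa(delta, start_states, accept_states, text):
    current = set(start_states)
    for ch in text:
        # advance EVERY live state at once; duplicates merge in the set
        current = {q for s in current for q in delta.get((s, ch), ())}
    return bool(current & accept_states)

# Tiny NFA for /ab*a/ : 0 --a--> 1, 1 --b--> 1, 1 --a--> 2
delta = {(0, "a"): {1}, (1, "b"): {1}, (1, "a"): {2}}
print(run_nfa(delta, {0}, {2}, "abbba"))  # True
print(run_nfa(delta, {0}, {2}, "abbb"))   # False
```
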
> Delete master after the pull request is merged
LMAOO
I like the part of explaining non-capturing groups and then throwing them out the window immediately after
Except it's wrong. You use non-capturing groups for performance reasons: to consume characters in a group without storing a capture (it's simply a reuse of the parentheses syntax). In this particular case it was so obviously wrong I can't imagine anyone familiar with regex not spotting it, but in general you shouldn't capture what isn't needed after the match is done.
The process seems awfully familiar...but how......
0:36 oh australia
The entire world runs on regular expressions that were written in rage.
Your videos are amazing, keep going you are bound to blow up
I stumbled across your videos yesterday, and I find them really entertaining and interesting to watch! Thank you for explaining these topics in a clear way; even knowing nothing about regex or Cloudflare, I can still follow along and understand the video :)
This is an incredible video. You took very complex and difficult to understand concepts and simplified it well. Well done.
Love these videos Kevin! Your amazing storytelling, editing, animations, and everything else comes together in an amazing way! Love watching every video you put out, keep it up :)
2:37 I love that he checks the "delete master branch after merging" box
love the style keep it up man
Well, from the thumbnail image, the regexp (.*=.*) says "find the LARGEST chunk of text possible before a literal = sign, then find the largest chunk after it, including other = signs if they exist", and it will walk the entire chunk of data many times to ensure it gets ALL of them.
They probably meant to do (.*?=.*?), which would have found the SMALLEST chunks of text around literal = signs, and would stop as soon as it found even a single = sign.
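Illustrating the greedy/lazy difference in Python (a hypothetical sketch on a toy input):

```python
import re

s = "a=b=c"
# Greedy: each .* grabs as much as possible, so the group spans
# the whole string (the '=' it settles on is the LAST one).
print(re.search(r"(.*=.*)", s).group(1))    # 'a=b=c'
# Lazy: each .*? grabs as little as possible, so the group stops
# right after the FIRST '='.
print(re.search(r"(.*?=.*?)", s).group(1))  # 'a='
```
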
I love the graphics, depicting real processes very well but hilariously funny at the same time!
The Lemmino music really rounds this video off
I'm starting to hope another outage somewhere on the internet occurs just to see a Kevin Fang video with hundreds of explosion effects in my recommendations yet again
Non-capturing groups (unlike lookahead and lookbehind) do get included in the match result (think $0); they just don't create an additional sub-match.
E.g. at 6:35, that would yield $0, $1, $2, while removing the ?: would make it $0, $1, $2, $3.
The animation and comedic aspect of this video is great. Plus it's explained extremely well. Nice
paging kevin fang, your services are needed once again
I'm a simple man. The upside down Cloudflare over Australia made me laugh. Thanks.
I mean they could have just done .*?=.* but I guess RE2 is safer long-term. Still this screams "I don't understand regex, it's just magic to me" on the part of that developer.
which is fair honestly, regex is basically just magic, and once you understand the syntax you don't question its ways.
though I'm surprised nobody else along the development process was concerned about it. Apparently nobody who looked it over had any idea what it was doing.
these videos are so well produced, the jokes in the imagery are so on point.
A rollback being a special case of a rollforward had me kekking heartily
the upside down Australian Cloudflare server was a nice touch
Dude your illustrations are so good and funny! :D
@11:50 You don't need a DFA to guarantee time linear in the input length because you don't actually need to follow every path, you just need to keep track of the set of states that the NFA is in. This does make it more expensive proportional to the size of the NFA, but it's still linear in the input length.
While technically every NFA can be converted into a DFA, the algorithm to do so (the subset construction algorithm) has an exponential worst-case runtime. This is probably why people try to approximate the DFA.
I appreciate that the Australian Cloudflare is upside down at 4:25
Things exploding in your videos is the entire reason I wake up every morning, thank you friend, it's freakin hilarious
Another nice dev story with interesting storytelling, really enjoyable to watch. Thanks, we are waiting for more : )
You just explained regex, which I literally couldn't grasp before, in 30 seconds
I can't believe I've only just run across your content - it's really well done and humorous, you're going places!
You are a gem,
thanks for the detailed information.
You didn't only explain complex information, you also explained how WAF companies work.
Thanks again.
Salam!!
I understood more with this video than in half a semester of languages and automata
Didn't expect to hear about theoretical computer science (which is a subject I take this year) in this video but nice work. It's nice to see actual real-world usage of converting e-NFA to DFA's. I wish our prof would have included this video in his lecture...
The line at the end could be revised; it isn't that "convoluted" regex should be avoided. In fact, regex that appears more complicated often ends up being better, because it's more specific. Really bad regex happens when people write something quick and dirty that will consume all valid cases without considering constraints, which was part of the issue here. I know this is probably what you meant by "convoluted", but a bit about what to write instead is missing, and people unfamiliar with good vs bad regex might not come away with the right idea.
another based regex enjoyer
Cloudflare's free tier has the most value of any free tier on the internet. They give you access to almost everything that the big companies have access to, just with certain limitations like max rule count. Amazing company
God I love watching these vids, I love the duality of high quality, digestible, information coupled with a nice sprinkling of "don't be a dipsh*t" commentary over issues and causation. Developer wise, nothing brings me more joy about my job than someone pointing out how much of an imbecile I *could* have been on that one day.
It MAY break the internet… approved 😊
I'm currently working on my thesis where I'm also dealing with regular expressions and their internal NFA representation quite a bit. And I recently encountered some papers about these risks of the naive backtracking implementation most engines use. Very interesting to now see one of these problems occur in practice. This happening in practice also gives quite some validity to the approach Rust is taking, which ensures good asymptotic characteristics.
Only thing I want to point out is that you saying "the increase in steps can potentially be exponential" is fairly misleading. Since it makes it sound like this particular case has an exponential asymptotic runtime, while it only has a quadratic one.
Define "most". All Unix utilities use DFAs. The problem is interpreted languages: they use PCRE so they can offer a dynamic =~ operator, since a recursive backtracking regex doesn't need compilation and the implementation is easier (even bash uses PCRE for the =~ operator).
Using compilation caching, for example, all of them could use the normal regex library, but I guess it increases complexity.
@@framegrace1 That's fair. Most regex implementations I've encountered 🙂
But I'm a windows user plus I mostly use higher level interpreted languages, so adds up with what you're saying.
The DFA version combined with capture groups just gets much more complex, and I don't know of any implementation that supports lookarounds with this either yet.
I love your videos on the internet blowing up. Perfect blend of programming, memes, and good graphics
11:00 Wow, you've explained the usefulness of using DFA way better than my professor! Now it all makes sense!
While I appreciate you inverting the logo for Cloudflare Australia, our monitors are already upside down, you're good bro 👌
I still have questions as to why there are so many non-capturing groups in that regex, and isn't the second ".*" before the "=" redundant? Couldn't this regex have been simplified?
Edit: Also considering parameter names are typically shorter than their values, even if you had to do this I would assume (.*?=.*) would be more efficient on average.
The following regex would return the exact same match count: =
If the purpose of that regex was to consume an entire line containing "=" (another thing it does, badly) then we could do something similar to what ya wrote: (?:.*?=.*)
Though, I'm unsure if the non-capturing group is necessary, so maybe the version provided is fine.
If we run the original through a regex tester? It doesn't matter how long the string is, it plain fails in every way. There is no use case for that regex except DoSing thyself.
I love your videos and goofy animations please never stop doing these
It's a shame that even in 2023, most programmers still don't know how to write correct atomic regexes, or have even heard of the idea of applying atomicity to regex. That regex has more issues than just .*(?:.*=.*), but yes, that's the most problematic part.
To be fair regex is not the first or main thing people learn.
Second, if I were them, I'd just make a simple working regex.
The explanation of the NFA and DFA was really good, would’ve been nice to see this before my Theory of Computing class
It does sweep some disadvantages under the table though. First, the DFA can be exponential in size, which means what you gain in execution time, you already lost in building time. Second, not all things which are possible with an NFA can be done with a DFA, or they can only be done at a huge cost. In particular, back references. For instance, matching a sequence of a's, then a b, then a sequence of a's of the same length as the first sequence of a's. As a regex, you write that as /(a*)b\1/. Try turning that into a DFA.
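Here's that exact example run through Python's backtracking engine (illustrative): the pattern matches, but no DFA could, since a DFA has no memory of how many a's it saw.

```python
import re

# /(a*)b\1/ : a run of a's, a 'b', then the SAME run of a's again.
pat = re.compile(r"(a*)b\1")
print(bool(pat.fullmatch("aaabaaa")))  # True  (three a's on each side)
print(bool(pat.fullmatch("aaabaa")))   # False (runs differ in length)
```
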
This is a great video. You made it very easy to follow. Took me back to my computational models class.
Wow, great explanation of the matching process. I already did not like regex, but this is next level... especially because one could easily write UNDERSTANDABLE code that does the job, but runs in O(n)...
The regex that ChatGPT creates:
9:24 I love how you explain how NFAs work on a basic level, wish I had this for my required Theory of Computing class in my undergrad lmao
Hahah, where did you get the animation at 4:30 from? Looks really nice.
If the input is arbitrary, and especially if it comes from an untrusted source, regex is almost never the answer, unless you have already limited the input size and complexity. This kind of logic should have called for a deterministic parser instead. While it would possibly have taken longer to write, it would have been more readable. On top of that, I would also consider using a property-based testing framework to test with arbitrary input.
Good learning experience, and great that Cloudflare shared detailed explanation of their mistakes.
You cannot (practically) use a deterministic parser for a WAF, but you can compile the regexes as said in the video, which is what they should have done instead of using PCRE (I bet they used PCRE because it doesn't need compilation and "makes things easier")
Almost the exact same bug took down stack overflow once. A catastrophically bad backtracking regex choked on a weirdly written piece of text.
12:19 I like how "made by Google" is both a pro and a con 💀
Checking the code with property-based testing, like with Hypothesis, should easily and relatively quickly find that the regexp doesn't work, because it times out for relatively small strings. Of course, at the time of the disaster this wasn't widespread, and you also need to be aware to set the timeout to a relatively short time (because the problem here is handling many requests per second, so a usual timeout of a few seconds might keep the problem undetected, or worse).
Dude your graphics are so funny 💀 I just about died when you edited the flow chart for WAF deployments
This channel concept is brilliant. Thank you so much.
"Delete `master` after the pull request is merged" I'm dead thank you so much for this lmao
not me wildly trying to recall every regex I've written directly into prod and wondering if I've ever had a dirt eating moment😳
I feel comforted by the fact that even CloudFlare employees can be utterly destroyed by regex issues as well. They're just like me fr fr
The original incarnation of regular expressions was very efficient and fast. But then people started adding features to it - features people liked, so they've stuck around. But - they open the door to VERY inefficient expressions. It doesn't take a lot of study to understand why some of these things break your performance.
it's wild that I had learned and used regex for years before I actually took a course on automata in uni
I have implemented multiple regex solutions that I never really understood. They seem to be working until they don't. 😬
This is why I pretty much always avoid * and + in regex without the question mark. Adding the question mark makes it very efficient. Take this regex for example /.*?end/
This matches any string ending in "end", but it will terminate the first time it sees "end". So it will only match the first 4 characters of "0endend". As soon as it is able to match the full regex once, it stops backtracking and trying to eat more characters.
If we were to use the greedy version instead without the question mark though... /.*end/
It will match the full string "0endendendendend" because the ".*" tries to consume as many characters as possible, resulting in tons of unnecessary backtracking, and it is almost never my intent to consume as many characters as possible. If I want to check for the beginning or end of a line, I can just use ^ and $ respectively.
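A minimal Python demonstration of exactly this behaviour:

```python
import re

s = "0endend"
print(re.match(r".*?end", s).group())  # '0end'    (lazy stops at the first "end")
print(re.match(r".*end", s).group())   # '0endend' (greedy scans to the end, then backtracks)
```
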
I do a fuckton of regexing, but I rarely make them lazy, and just leave it as the default eager unless I actually need to in order to get the correct matches (which is pretty rare IME). Does it really matter most of the time though if you make it lazy? Surely not, because if it did, regex by default would be lazy and the question mark would make it eager, no?
@@MH_VOID No idea why it's eager by default. I just know that the question mark improves performance because it doesn't have to search as much of the string. I can't say exactly how much it improves performance, or how often it's relevant, but it's basically guaranteed to be faster some of the time because it literally performs fewer operations. I imagine this is more relevant when the repeated group is very vague and will match lots of things, like /.*/. On the other hand, something like /o*/ would stop as soon as it found something that wasn't an "o".
But here's an example of where the greedy version is really bad. Let's go with this regex again. /.*end/
And lets run it on this hypothetical string,
"hello end, [pretend there are a fuckton of characters here], etc, etc, bender"
So instead of just matching "hello end", it has to go through this insanely long string, and ends up matching everything up to "bend" because "end" is in there. And frankly, that probably isn't even what you wanted. Even worse though, it would still have to go through this enormous string even if it didn't end with the word "bender". Even if your actual match is some tiny thing at the beginning of the string, the greedy version forces the engine to go through the ENTIRE string. It's just silly and 99% of the time isn't what you want.
This is why I pretty much always use the question mark unless I absolutely need it to be greedy for some reason. It's more likely to be efficient, and less likely to result in confusing bugs.
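The "bender" scenario above, sketched in Python (a long filler string stands in for the "fuckton of characters"):

```python
import re

s = "hello end, " + "x" * 100_000 + ", etc, etc, bender"

# Greedy: .* runs to the end of the string, then backtracks to the
# last "end" it can find - which is the "end" inside "bender".
greedy = re.search(r'.*end', s)
assert greedy.group().endswith("bend")

# Lazy: stops at the first "end", ignoring the huge tail entirely.
lazy = re.search(r'.*?end', s)
assert lazy.group() == "hello end"
```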
@@iamstickfigure The question mark helps if the normal path of your program is a match. But what if it's a non-match in 90% of cases? A regex with a question mark will perform the same amount of work.
In my practice I often worked with regexes matching the entire string, /^...something...$/ - question marks will not help much with those either.
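A small illustration of that point, using a made-up anchored pattern (both the pattern and the inputs here are hypothetical):

```python
import re

# Hypothetical anchored log-line pattern: method, path, numeric status.
greedy = re.compile(r'^(\S+) (.*) (\d+)$')
lazy = re.compile(r'^(\S+) (.*?) (\d+)$')

line = "GET /index.html 200"
# On a successful anchored match, both produce the same groups.
assert greedy.match(line).groups() == lazy.match(line).groups()

junk = "this line does not match at all %%%"
# On a non-match, both engines must exhaust the same failed
# possibilities before reporting failure - laziness buys nothing.
assert greedy.match(junk) is None
assert lazy.match(junk) is None
```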
I know I'm late to the party, and maybe this has already been written, but I need to point out (because I'm studying it right now) that in the end it's not (AFAIK) completely correct to call it an NFA. Technically, because every transition consumes an item, it's a Petri net. Aside from that, good job and beautiful content!
Reminds me of when I first started at SAP and my sole job was to hunt down expensive SQL statements in clients' DBs. Loved being able to reduce server load and response times while learning client systems.
TL;DR:
They added new code to stop a new type of attack.
In a hurry, they skipped testing.
The new code was so badly broken it was mistaken for an attack. It was later rolled back. New internal systems should reduce the risk of the same happening again.
...been trying to grok coding the way I do geometry. After years of intermittent searching, you helped me see some of said "theory"... thank you!