Regex is HARD!

  • Published: 4 Jan 2025

Comments • 58

  • @ArjanCodes
    @ArjanCodes  10 months ago

    💡 Get my FREE 7-step guide to help you consistently design great software: arjancodes.com/designguide.

  • @ToddVanyo
    @ToddVanyo 11 months ago +2

    Always remember the adage: if you think regex is the solution to your problem, you now have 2 problems.

  • @twelvethis3979
    @twelvethis3979 11 months ago +25

    Very nice video, thank you, Arjan. I was just wondering: What specifically is bad about REGEX_1, and why are REGEX_2 and REGEX_3 better?

    • @PanduPoluan
      @PanduPoluan 10 months ago +1

      Unclassed greedy operators.

  • @randomdude2540
    @randomdude2540 11 months ago +2

    Love the new format! Thanks for the content.

    • @ArjanCodes
      @ArjanCodes  11 months ago +1

      I'm glad you're enjoying the new content! :)

  • @todd.mitchell
    @todd.mitchell 11 months ago +6

    Thanks! More on optimizing regex please.

  • @obinnaokonkwo2465
    @obinnaokonkwo2465 11 months ago +2

    PYtips.
    Great job Arjan.

  • @maleldil1
    @maleldil1 11 months ago +1

    My recommendation from experience is to have _a tonne_ of tests for your regex, especially if it's very important for your application like email checking is. In Python with pytest, you can use parametrised tests to load valid and invalid emails from text files and just check that the output is correct. As time goes by and you find some false positives and negatives, you can add them to your test data to ensure you've fixed the bug.

    • @edgeeffect
      @edgeeffect 1 month ago

      Absolutely! Treat regex as if it's a malicious enemy.
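
The data-driven testing idea above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual test suite: the pattern `EMAIL_RE`, the helper names, and the sample addresses are all hypothetical, and the two multi-line strings stand in for the text files the comment mentions.

```python
import re

# Hypothetical, deliberately loose pattern: something, one "@", something.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+")

# Stand-ins for the contents of valid/invalid text files (one case per line).
VALID_TEXT = """
user@example.com
first.last@sub.example.org
"""
INVALID_TEXT = """
no-at-sign.example.com
two@@example.com
@missing-local-part
"""


def load_cases(text):
    """Return the non-empty, stripped lines of a test-data file's contents."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_valid_email(candidate):
    """True if the whole candidate string matches the pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None


def misclassified(valid, invalid):
    """Return (false_negatives, false_positives) for the given cases."""
    false_negatives = [c for c in valid if not is_valid_email(c)]
    false_positives = [c for c in invalid if is_valid_email(c)]
    return false_negatives, false_positives


print(misclassified(load_cases(VALID_TEXT), load_cases(INVALID_TEXT)))
```

With pytest, each loaded line would typically become one parameter of `pytest.mark.parametrize`, so every address is reported as its own test case. Newly discovered false positives and negatives are then just extra lines in the data files.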

  • @cheweh842
    @cheweh842 11 months ago +6

    Valid email addresses aren't really possible to match with regular expressions, anyway. At least, not all possible addresses as allowed by the RFC. For that reason I don't know what regex I should use for email addresses, if anything at all.

    • @Draggeta
      @Draggeta 11 months ago

      From what I've heard, it is best just to check for only one @ sign with something before and something after. Now I wonder how to check for only one @ character in a string....

    • @MichaelONeillIrish
      @MichaelONeillIrish 11 months ago +1

      What I've had the most success with is doing a trivial check on the input for optional form validation, and then actually trying to send an email to the address.
      For the pattern, checking for non-whitespace characters, then an @, followed by more non-whitespace characters, then a period, then more non-whitespace characters is generally sufficient. A false positive match isn't really all that harmful, and you shouldn't get any false negatives, so it tends to ensure users have put in something vaguely, potentially correct before submitting the form. Or, like @Draggeta said, just check for an @ symbol in the string and be done with it.
      Then you try sending an email, and if it's successfully delivered, it's valid. If it fails to deliver, you can either stop there, or look into more robust retry logic e.g. using a pending registrations table in the DB that you try to verify several times before removing to avoid cluttering up your user table.

    • @Draggeta
      @Draggeta 11 months ago +2

      @MichaelONeillIrish I like the check to send an email.
      However, domains don't need to have a period and spaces are allowed in the email address. That is what makes validating email addresses such a pain.
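
The two checks discussed in this thread could look something like the sketch below. The function and constant names are made up for illustration, and neither check is RFC-complete: both are only meant as a cheap pre-filter before actually sending a verification email.

```python
import re


def has_exactly_one_at(address):
    """One "@" with at least one character on each side of it."""
    local, _, domain = address.partition("@")
    return address.count("@") == 1 and bool(local) and bool(domain)


# Non-whitespace, "@", non-whitespace, ".", non-whitespace.
# Note that \S also matches "@", so this does not limit the number of "@" signs.
LOOSE_EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


def looks_like_email(address):
    """True if the whole string fits the loose pattern above."""
    return LOOSE_EMAIL_RE.fullmatch(address) is not None


for candidate in ["user@example.com", "admin@localhost", "a@b@c", "@x"]:
    print(candidate, has_exactly_one_at(candidate), looks_like_email(candidate))
```

The `admin@localhost` case shows the trade-off raised in the last reply: the pattern that requires a period rejects a dotless domain, while the plain count check accepts it.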

  • @Tesfamichael.G
    @Tesfamichael.G 11 months ago +1

    Tip of the week

  • @Djellowman
    @Djellowman 11 months ago

    Good video format! Short and informative.

  • @hcubill
    @hcubill 11 months ago

    Very very interesting! Sparked many thoughts ❤

  • @silkogelman
    @silkogelman 11 months ago +1

    Tuesday tips? Did you mean to say Code Snippets by ArjanCodes? 😁

  • @uwegenosdude
    @uwegenosdude 11 months ago +1

    Great video again. Thanks a lot. Is there a way in Python to limit the execution time of a regex to prevent such a scenario like a ReDoS attack?

    • @edgeeffect
      @edgeeffect 1 month ago

      Write A LOT of tests!
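
On the question of limiting execution time: as far as I know, the functions in the standard-library `re` module take no timeout argument, so one possible workaround is to run the match in a child process and kill it if it takes too long. The sketch below assumes that approach; `fullmatch_with_timeout` and `_CHILD_CODE` are hypothetical names, and starting a process per match is far too slow for hot paths.

```python
import json
import subprocess
import sys

# Code run by the child interpreter: read [pattern, text] as JSON from stdin,
# print whether the whole text matches.
_CHILD_CODE = (
    "import json, re, sys\n"
    "pattern, text = json.load(sys.stdin)\n"
    "print(json.dumps(re.fullmatch(pattern, text) is not None))\n"
)


def fullmatch_with_timeout(pattern, text, seconds=1.0):
    """Return True/False for the match, or None if it exceeded the time limit."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=json.dumps([pattern, text]),
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return json.loads(done.stdout)


print(fullmatch_with_timeout(r"[a-z]+", "hello", seconds=10.0))
print(fullmatch_with_timeout(r"(a+)+", "a" * 64 + "!", seconds=1.0))
```

Capping the length of the input before matching is a cheaper complementary guard, since backtracking cost grows with input size.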

  • @ikari3k
    @ikari3k 11 months ago

    In terms of regex readability, isn't adding comments to your regex using re.VERBOSE and a raw string just the standard thing to do? Do you find it helpful when coding complex matches?
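
For readers who have not seen the combination, here is a small sketch of a commented pattern using `re.VERBOSE` with a raw triple-quoted string. The date pattern and the name `DATE_RE` are only an illustrative example, not something taken from the video.

```python
import re

# In VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

match = DATE_RE.fullmatch("2025-01-04")
print(match.group("year"), match.group("month"), match.group("day"))
```

Named groups pair well with this style, because the code that consumes the match reads as `match.group("year")` instead of a positional index.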

  • @dragonfly-7
    @dragonfly-7 11 months ago

    On the naming question for the new series: Stay with "Tuesday tips". Reason? The first thought is the best one most of the time.

  • @johnabrossimow
    @johnabrossimow 11 months ago

    Writing Regex is kind of like writing raw sql, why does sql have abstraction libraries but regex doesn't?

  • @cfk-oz
    @cfk-oz 11 months ago

    Maybe do a video on Panel or Panel vs Dash

  • @MartinPHellwig
    @MartinPHellwig 11 months ago

    I'd say it should be "Arjan in shorts", I trust you can come up with your thumbnails 😊

  • @lxathu
    @lxathu 11 months ago +1

    And be careful when you write your own parsing algorithm to avoid using a regular expression: hand-written routines can be hard to read, can be sub-optimal, and can contain an infinite loop. Or even a lot of them.

  • @joaopedrorocha5693
    @joaopedrorocha5693 11 months ago

    Why not "Arjan Tips"? Nice, simple and reminds the channel name kk

  • @slawek6302
    @slawek6302 11 months ago

    Codjan Tips or Code-jan Tips

  • @OradWasTaken
    @OradWasTaken 11 months ago +3

    arjan's ardvice

  • @d3stinYwOw
    @d3stinYwOw 11 months ago

    Since we are in the realm of web, maybe some review on usage of HTMX for python folks out there? Would be great to see that here! :)

  • @wilk85
    @wilk85 11 months ago

    Maybe everyday tips?

  • @copperfield42
    @copperfield42 11 months ago

    how about calling it:
    Arjan's Tips
    weekly tips
    developers tips

  • @rrwoodyt
    @rrwoodyt 11 months ago +2

    Cue the XKCD about regular expressions….

    • @edgeeffect
      @edgeeffect 1 month ago

      As long as it's "Perl problems" or "regex golf" and not "save the day"

  • @dynamicgrad3820
    @dynamicgrad3820 11 months ago

    Automata Theory (DFA) helps to write better regex expressions ;)

    • @maleldil1
      @maleldil1 11 months ago

      Only very simple regexes can be directly converted to/from DFAs. As soon as you get into stuff like backtracking and non-greedy matching, the conversion becomes convoluted. I believe it's rarely worth it.

  • @Casimistico
    @Casimistico 11 months ago +2

    Can't catch which regex is evil, just like my regexes can't catch valid strings

  • @anonapache
    @anonapache 11 months ago +3

    If you use such a long regex, you probably shouldn't use one.
    Also, to catch all possible triggers for an "infinite" loop, use a timeout.

    • @QwDragon
      @QwDragon 11 months ago

      Length is not a problem. Regex can't be infinite, but it can be exponential, even a short one.

    • @anonapache
      @anonapache 11 months ago +2

      @QwDragon Length might not be a technical problem, but for sure a human one.
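
The point that a short regex can still be exponential can be seen with the textbook nested-quantifier pattern `(a+)+$`. The sketch below is an assumed illustration (the pattern is not one of the regexes from the video): it times a failing match on slightly longer inputs each time. Absolute numbers depend on the machine, so none are shown here.

```python
import re
import time

# Only six characters long, but the nested "+" lets the engine split a run of
# "a"s into groups in exponentially many ways before it can report failure.
NESTED = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Seconds spent failing to match n "a"s followed by a "!"."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = NESTED.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" means there is never a match
    return elapsed


timings = {n: time_failed_match(n) for n in (12, 15, 18, 21)}
for n, seconds in timings.items():
    print(f"n={n:2d}  {seconds:.4f}s")
```

Each additional character roughly doubles the work, which is why a timeout or an input-length cap matters more than the length of the pattern itself.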

  • @juanjoseexpositogonzalez1126
    @juanjoseexpositogonzalez1126 11 months ago +3

    Easy Python Pills (to swallow) for the name?

  • @QwDragon
    @QwDragon 11 months ago

    It's obvious that the first is bad because of the construction before the @ sign, of the form ([smth]?[other]*)*

  • @masterofIich
    @masterofIich 11 months ago

    I don't think REGEX_1 will catch all email addresses.

  • @ramb0lxmb
    @ramb0lxmb 11 months ago +3

    Regex is the perfect AI use case: something fairly easy for a computer to translate from plain written language, but that looks crazy to a human.

    • @maleldil1
      @maleldil1 11 months ago +5

      There's a significant chance that the AI will hallucinate the regex, and it's really hard for the human to catch that.

    • @edgeeffect
      @edgeeffect 1 month ago

      ... if the AI could be trusted to get it right... which it can't.

  • @Ruskialt
    @Ruskialt 10 months ago

    Regexes are so tedious that they need to be verified with unit tests.

  • @edgeeffect
    @edgeeffect 1 month ago

    A "clever" regex is a guaranteed way to show-off what a 10X ninja rockstar developer you are... and your team will "thank" you for it for many many years after you write it.

  • @EvenTheDogAgrees
    @EvenTheDogAgrees 11 months ago

    Regexes are OK, but I consider them write-only code except in the most trivial cases. They're easy to write, but hard to read.

  • @diegol_116
    @diegol_116 11 months ago

    Then the double videos per week started... Great!

  • @PanduPoluan
    @PanduPoluan 10 months ago

    Greedy operators are nearly always bad, except when they're at the very end.
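
One way to read the concern about greedy operators is the difference between a greedy `.*`, a lazy `.*?`, and a negated character class. That reading is an assumption on my part, and the tag-like sample string is invented for the demonstration.

```python
import re

text = "<a><b>"

# Greedy: ".*" grabs as much as it can, then backtracks to the LAST ">".
greedy = re.findall(r"<.*>", text)

# Lazy: ".*?" grows one character at a time and stops at the FIRST ">".
lazy = re.findall(r"<.*?>", text)

# Negated class: "[^>]*" can never run past a ">", so no backtracking is needed.
classed = re.findall(r"<[^>]*>", text)

print(greedy)   # ['<a><b>']
print(lazy)     # ['<a>', '<b>']
print(classed)  # ['<a>', '<b>']
```

The negated class gives the same result as the lazy version here while stating explicitly which characters may appear inside the brackets.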

  • @ashleymcgee3536
    @ashleymcgee3536 11 months ago

    Oooh yeah. We've ReDoS'd ourselves before.

  • @RajveerSingh-vf7pr
    @RajveerSingh-vf7pr 11 months ago

    DickVanDyne Tips