it took me 1 month to fix this compiler bug

Поделиться
HTML-код
  • Опубликовано: 25 авг 2024

Комментарии • 310

  • @leddoo
    @leddoo  Год назад +58

    IF you're about to comment that you think the result should be `4`, please consider:
    `a + { a += 1; a }` in different languages:
    rust: 3, specified left to right.
    js: 3, specified left to right.
    python: 3, specified left to right.
    java: 3, specified left to right.
    c#: 3, specified left to right.
    scala: 3, specified through method call semantics.
    dart: 3, specified through method call semantics.
    swift: 3, left to right according to some forum post 😔
    bash: 3, unclear.
    lua: 3 with globals/upvals, 4 with locals, unspecified.
    c/c++: 3 or 4 depending on compiler, unspecified.
    go: 4 on 1.20, unspecified.

    • @ehsnils
      @ehsnils Год назад +8

      I get it that it should be 3, but I also see that the given code that failed isn't really how I see that clean, conclusive and maintainable code should be written so I'd prefer to write the code in more steps to ensure that there aren't corner cases like this occurring.
      If this compiler got it wrong and it's corrected now there will be another compiler or interpreter that will fail this in the future.

    • @musaran2
      @musaran2 Год назад +23

      You should have explained early WHY you considered this a bug.
      Languages must either define order of evaluation of expressions, or forbid ambiguous constructs. Else there is no "expected result".
      C/C++ in particular *defines* it to be implementation-dependent. This is to leave opportunities for compiler optimisation. Not the same as "oops, dun goofed".

    • @cmilkau
      @cmilkau Год назад

      This is why using the same symbol for values and references is a bad idea.

    • @boelwerkr
      @boelwerkr Год назад

      To mention Perl5 again. Perl can also do inline:
      ~perl -le '$x=1; print $x + do{ $x+=1; $x }'~ ==> 4 (i didn't use "$a" because it's a special predefined variable)
      It is a know and wanted behavior.

    • @ITR
      @ITR Год назад

      Third opinion: don't allow people to write statements like this.

  • @scialomy
    @scialomy Год назад +127

    I'd say 4 is a perfectly valid result as long as it is consistent and documented. This is in fact what I expected when I read the thumbnail. Ideally a+(something modifying a) should be an ill-defined expression resulting in a compilation error.

    • @Lantalia
      @Lantalia Год назад +3

      Depends on the sequence points and order of operations. left to right should be 3, right to left should probably be 4. But in any case, if you have this bug, your IL is ill defined

    • @ethanyalejaffe5234
      @ethanyalejaffe5234 Год назад +5

      I agree. You can do a similar thing to the example in python, except with lists. At least on my python interpreter, the following code prints['X','X','X','X'] (notice the 4 X's!). You need to know ahead of time how python handles ints, because the code with lists is almost identical and gives a totally different answer.
      L = ['X']
      def bar():
      global L
      L += ['X']
      return L
      print(L + bar())

    • @mattshnoop
      @mattshnoop Год назад +6

      I agree, too. We covered basically this exact example in my university's course on programming language design: what *should* happen, intuitively, when you have `a + something that mutates a`? The professor asked the class, and we were split pretty much down the middle. It basically just comes down to how the language designers decided to do things; as long as the language follows its own (hopefully documented) semantics, I don't see any foul play.

    • @scialomy
      @scialomy Год назад

      @@mattshnoop Yes. I always tell junior developers: try and reduce to a minimum the wtf/line of your codebase. This should be applied down to language design.

    • @dungusberryrocks
      @dungusberryrocks Год назад

      It's impossible to determine if a is being mutated inside of more complex expressions that can't be a heuristic

  • @stevencowmeat
    @stevencowmeat Год назад +109

    I was reverse engineering a scripting language a game was using. I found a file that was called like scripttest or something and it had a large amount of really weird statements like the one shown in the video. I thought it was odd but now I know why.

    • @wutaf0ck
      @wutaf0ck Год назад +1

      Care to share which game it was? I'm curious

    • @stevencowmeat
      @stevencowmeat Год назад +5

      @@wutaf0ck Apex Legends

    • @ITR
      @ITR Год назад +2

      can you share the rest of the strange expressions?

  • @stanleydodds9
    @stanleydodds9 Год назад +26

    I think a much more difficult question to answer (for people who just think it should be 4) is the case that you apply two different operations to the variable in two different blocks, which don't commute with each other.
    For example: set a = 3. Then evaluate {a += 1; a} + {a *= 2; a}
    If you evaluate left to right, you get 4 + 8 = 12
    If you evaluate right to left, you get 7 + 6 = 13
    If you evaluate both blocks, and then use the value in a for addition, it still depends whether you go left to right or right to left:
    left to right: 8 + 8 = 16
    right to left: 7 + 7 = 14
    So for everyone who says "it should be 4" just be aware that if you want a well defined result, it still matters whether you evaluate left to right, or right to left, and you have to choose one of them.

    • @christopheriman4921
      @christopheriman4921 9 месяцев назад

      I evaluated from left to right and am fully aware that it would be dependent on order of evaluation in this context since you are mutating the original value and as such changing how the result is computed. I guess there is also the possibility that this could just be run as you see it where it does both sides of the operation at once but that would be annoying because you would have data races and have largely unpredictable results for the same operation across multiple runs.

  • @neoplumes
    @neoplumes Год назад +7

    Mutability strikes again 😅

  • @sinom
    @sinom Год назад +13

    Honestly when just going through it in my head it equating to 4 was what I got, but it depends on the exact specifications you want your language to have I guess

  • @leddoo
    @leddoo  Год назад +75

    hey 👋
    quite a few people have commented that they think `4` should indeed be the answer or that it should be “unspecified”.
    lua apparently doesn’t define an evaluation order, so this is technically not a bug.
    but what i didn’t show is that, in lua, you only get `4`, if `a` is a local variable. if it is a global variable instead or a captured local (upvalue), the result is also `3`.
    the technical reason for that is that, in a binary expression, lua first generates code for the LHS, then the RHS. globals and upvalues require a kind of load instruction, which is emitted before the function call, so you get the “old” value of `a`, `1`.
    but lua doesn’t generate code for locals, it just “uses the register, where the value already is”.
    again, technically that’s fine, because lua doesn’t define what it should be, so the implementation can do what’s convenient.
    but i find it very strange to put undefined behavior into a managed language that really doesn’t need it. especially if, like in my case, the language aims to be beginner friendly.
    i’ve recently discovered another, much simpler technique to solve this issue, and i’ll soon post an article about it, link will be here. the post will also be announced on the discord.

    • @mihiguy
      @mihiguy Год назад +5

      C also does not specify evaluation order within an expression (so even the same expression in multiple places of the code can have different results, or when compiled for different architectures). Linden Lab's LSL version 1 was also a language where the + operator had weird (but specified) execution order. "list = (list = []) + list + [e]; " was, in fact, the least memory consuming way of appending an element to a list, due to the fact that arguments passed to the "+" function were copied before the list concatenation was executed, and the nested assignment would free one copy of the list before the concatenated copy was created. As each script had a memory limit of a few kilobytes only, these kinds of hacks were needed back then.

    • @dercooney
      @dercooney Год назад +6

      i'd argue that, while this is undefined behavior, it's also sinful. this is the sort of corner case i wouldn't want in anything remotely production, even if we should have a defined and consistent behavior

    • @vanlepthien6768
      @vanlepthien6768 Год назад +1

      @@mihiguy The testing manager where I once worked had a code example where 9 different C compilers (multiple architectures) generated 8 different results, using simple integer variables and operations..

    • @goldenalt3166
      @goldenalt3166 Год назад +2

      If you want to be beginner friendly, you should warn the user not to do this at all.

    • @0LoneTech
      @0LoneTech Год назад +1

      Why is it beginner friendly to have the same name refer to multiple different values within one expression, then?

  • @xxbomelxx874
    @xxbomelxx874 Год назад +107

    Is it just me, or is this guy just hypnotic to watch? 😅

    • @leddoo
      @leddoo  Год назад +24

      😵‍ i'm just gonna assume that's a good thing :P

    • @CroneKorkN
      @CroneKorkN Год назад +37

      It is. Your videos have a great flow. They are structured, have the right amount of information density. Your voice is calm, but still emphazises the content.

    • @leddoo
      @leddoo  Год назад +15

      @@CroneKorkN damn, thanks man! :D

    • @mapron1
      @mapron1 Год назад

      For me he just annoying to watch.

  • @samwise1491
    @samwise1491 Год назад +22

    I have struggled to understand phi in my own llvm targeting compiler for ages and you explained it in a way that I understood in minutes!

    • @leddoo
      @leddoo  Год назад +4

      that's awesome to hear!
      i wasn't sure if i should include that part, as the video was already so long.
      but i guess it payed off :D

    • @samwise1491
      @samwise1491 Год назад +3

      @@leddoo Nah it was super interesting, thanks for the high quality compiler content, good luck with your language!

    • @leddoo
      @leddoo  Год назад +3

      @@samwise1491 awesome, thanks man!

    • @lhpl
      @lhpl Год назад +1

      I too think this was a great explanation of SSA. But I'm still not sure I understand phi. Isn't it just a matter of requiring that (strikeout)empty or implicit(/strikeout) _all_ branches of conditionals _that do not modify a variable modified in some other branches_ should all copy a[n-1] to a[n]? Maybe I should just watch the video again.

    • @lhpl
      @lhpl Год назад

      Ah, I think I may get it now.
      a1 := 0
      if foo → a2 := 1 fi
      phi: a3 := if foo → a2 | a1 fi
      pr(a3)
      But there's nothing wrong in _combining_ the two (or more) if's, is there? So it would just become:
      a1 := 0
      if foo → a2 := 1 ; a3 := a2
      | a3 := a1 fi
      pr(a3)
      Is this right? Or are there traps that cause this to not work always? If it's right, I believe it is simpler to understand than a "magical" phi function, which I think is usually even written as just a3 := phi(a2,a1), glossing over the fact that it depends very much on the condition foo also.

  • @Ratstail91
    @Ratstail91 Год назад +148

    My compiler says it should equal 4... and I agree, honestly.

    • @eps-nx8zg
      @eps-nx8zg Год назад +28

      in languages like c++ it's undefined behavior because the evaluation order is not determined by the spec. So the response you get is rng based on what compiler you use.

    • @ExediceWhyNot
      @ExediceWhyNot Год назад +7

      ​@@eps-nx8zg Doesn't the evaluation order of arguments work when there are multiple arguments?
      a + ... is technically a one single argument, the problem is actually in the evaluation order of the + operator. I see how it's undefined behavior though. Mathematically, if you change the order you add numbers in, the sum shouldn't change. And in this example it does depend on whether the function will be called first. Writing stuff like this isn't and shouldn't be defined.
      I agree that it should equal 4 as well, though.

    • @leddoo
      @leddoo  Год назад +18

      @@ExediceWhyNot `+`, the function, is still commutative.
      like you say, the question is in which order the operands are evaluated.
      you can either choose something simple like left to right, and then you'll end up with 3.
      or you use a more complicated rule, like "compound expressions first", and then you'll get 4.
      but if you then introduce a function call to the identity function, suddenly you get 3 again: `id(a) + { a += 1; a }`
      "variables last" is actually a more complicated rule.
      _the thing to note is that variables also need to be evaluated_ - why should the order of evaluation depend on whether the operands are variables or other kinds of expressions?
      i agree that initially, 4 seems like the more reasonable result, but once you think about the implementation & the consequences, you (or i at least), realize that the result should actually be 3.
      (assuming we don't want to introduce UB into an otherwise safe & well defined language)
      edit: by "think[ing] about the implementation" i mean, making the rules more concrete & realizing that "variables last" actually requires more logic and is therefore the more complicated rule.

    • @aprildenham5376
      @aprildenham5376 Год назад +3

      I would argue that "evaluate blocks" is the simpler logical method, and therefore evaluate the block expression first (and treat the addition as if its operands were swapped as a result).
      In that situation, we get 4 as well. (In this case, to force 3 as an answer, the identity function needs to inline itself from a->{a}, at which point the fallback is left-to-right eager evaluation.)

    • @leddoo
      @leddoo  Год назад +3

      @@aprildenham5376
      here's the logic for regular LTR evaluation:
      def eval_binop(op):
      l = eval_expr(op.left)
      r = eval_expr(op.right)
      return op.op(l, r)
      and here's the logic for blocks first:
      def eval_binop(op):
      if is_block(op.r):
      l = eval_expr(op.right)
      r = eval_expr(op.left)
      else:
      l = eval_expr(op.left)
      r = eval_expr(op.right)
      return op.op(l, r)
      would you say the second one is simpler?

  • @driden1987
    @driden1987 Год назад +27

    Wow this is super good, well job! I don't really have much experience writing compilers, only wrote some compiler code for university and it's tough and a lot of weird bugs arise.

  • @draakisback
    @draakisback Год назад +9

    Interesting video. I've implemented many compilers in my day and I've run into similar issues. This is why for the most part when I create a compiler these days I usually try to make it so that the variables are immutable by default. When you assume there's no mutability, you can get rid of a lot of these issues and of course there are instances where users will want to have mutable variables but in those cases you can have very specific logic in the IR / AST piece of the compiler.

    • @PanduPoluan
      @PanduPoluan Год назад

      Heh I don't have any experience writing compilers ... but before leddoo mentioned SSA, a thought occurred to me: Probably make the registers immutable, will that help?

    • @draakisback
      @draakisback Год назад

      @@PanduPoluan hmm, that's an interesting thought but it would probably lead to a lot of spilling (Data being allocated to memory instead of registers). You only have so many registers and if all of them are immutable then you are definitely going to run out of them in situations where your program uses many variables. There are some register allocating algorithms that use a concept called coalescing where you limit the amount of moves between registers. So basically the compiler would figure out what pieces of data can live in a single register as opposed to be pushed from one register to another or to RAM. It's a very complicated subject matter though, register allocation and I would be lying if I said I knew everything about it. They're very well could be an algorithm that tries to keep the registers as immutable as possible.

  • @foxquirk
    @foxquirk Год назад +2

    At every point in this video I was thinking: The solution to this is **.
    And every time that was quickly followed by you saying: The solution to this is **.
    Very satisfying.

  • @toyuyn
    @toyuyn Год назад +18

    My intuition from primary school maths was to do whatever is in the brackets first, so 4 was correct for me as well
    Ultimately my solution is stop writing cursed code like that

  • @TechSY730
    @TechSY730 Год назад +10

    I guess which is more intuitive depends on whether one thinks of sub-blocks should be like parenthesis with regard to order of operations (so they are evaluated earlier in a statement), or just "normal" left to right wise.
    There really isn't a right answer everyone will agree with (which makes it suspect whether arbitrary code blocks as sub-expressions is even a good design decision at all).
    In dynamic languages at least. In static, compiled languages, it think most would agree "3" is the right answer so long as '=' is reassignment.

    • @haphazard1342
      @haphazard1342 Год назад +1

      Allowing arbitrary sub-blocks in expressions is the root of the problem, but only as long as they do not have their own scope. If you have globals, then it's only weird if you don't enforce LTR evaluation.

  • @BRLN1
    @BRLN1 Год назад +16

    funny ... while watching I was thinking: is this even a bug?
    Who says the way python or lua handle this piece of code is the right way to do it?
    Who says any of them is wrong?
    Even though python and lua offer different results for "what appears to be the same algorithm". It really comes down to how you intend and therefore specify your language and order of term resolving. I have implemented your example in c and got 4.

    • @leddoo
      @leddoo  Год назад +7

      it is I, who says that anything other than left-to-right evaluation is "wrong" :P
      someone else made a similar point, perhaps i should have said that it's my opinion :D

    • @davidc.890
      @davidc.890 Год назад +1

      You could try a different compiler and see if you get the same results. AFAIK the evaluation order of function arguments in C is unspecified and differs between gcc and clang. I don't know if this also applies to arithmetic expressions.

    • @etopowertwon
      @etopowertwon Год назад +2

      @@davidc.890 gcc for the same code returns 4, though I used its extension to be close to original code:
      return a + ({ a += 1; a; });
      (gcc supports statement exprs like this and nested function that can modify outer variables)
      You can of course go usual "a + (a += 1), but it's not close to original clode.
      In O3 it compiles to `lea eax, 2[rdi+rdi]` (rdi+rdi+2, i.e. "double original argument increased by one").
      Rust's borrow checker would whip you if you tried to do that. Such contrived example actually demonstrates why having read and write references at the same time confusing.

    • @Yotanido
      @Yotanido Год назад

      @@etopowertwon I used a global and a function call to try this. With clang, I get 3. With GCC I get 4.
      I tried multiple different versions of each and the result was consistent.
      I also tried it in rust, using a block expression. I got 3.

  • @nulano
    @nulano Год назад +5

    My Lua compiler does in fact print 3, since I took a lot of inspiration from Python's compiler while implementing it, but I do recall looking into this issue. Since Lua does not define the evaluation order, I decided that copying Lua's implementation exactly in this case was not needed for my compiler.

  • @alexthomas7754
    @alexthomas7754 Год назад +6

    I have never code my own compiler, and wouldn't even know where to start, but you break this down super well! new sub

  • @monsieuralexandergulbu3678
    @monsieuralexandergulbu3678 Год назад +3

    In my compiler I used the same approach, but I didn't know it existed. I thought i was a genius, but I guess there's nothing new under the sun 😂

    • @leddoo
      @leddoo  Год назад +1

      i mean, coming up with ssa from scratch is pretty impressive :D

  • @L1Q
    @L1Q Год назад +3

    my initial judgement for what the result should be was more of a pointer/register based operation. you reference the same value during addition, so it's not a huge surprise the value is different after the increment. now that you showed the stack based approach python uses, I can't definitely tell if one is "more correct" than the other

    • @leddoo
      @leddoo  Год назад

      well, like i described in the pinned comment, in lua, you only get this pointer like behavior for locals, not for globals.
      the pointer like semantics would also imply that introducing a call to the id function would change the program semantics, cause you presumably would expect calls to be evaluated left to right. (`id(a) + { a += 1; a }`) (not saying you should write that ofc)

  • @mbgdemon
    @mbgdemon Год назад +10

    I don't understand why either 3 or 4 is right or wrong. It looks to me like this should be undefined behavior. Either is arguably correct, the situation is ambiguous. PEMDAS is unambiguous, but it doesn't apply here since there's a change of state.

    • @freyja5800
      @freyja5800 Год назад +4

      it is not necessarily undefined behaviour, it depends on how the operator is defined. e.g. if it is defined as always evaluating the left side first, 3 is correct, if always the right side first, 4 is correct.
      If it is undefined, neither is correct (UB)
      (fun fact: it is UB in C, and gcc 13.1.1 gives 4 while clang 15.0.7 gives 3, but clang warns about it even at default warn level)

    • @leddoo
      @leddoo  Год назад +3

      you see, i'm rust pilled 🌚
      i think it's kinda wild to leave such things undefined. "you don't specify in which order side effects take place, in a language *about* side effects (an imperative language), really?"
      but maybe that's just me :D

    • @freyja5800
      @freyja5800 Год назад

      @@leddoo oh I agree that of the 3 solutions, ub is the weirdest one.
      but for what I do I am basically bound to c, so you kinda have to learn to live with it
      (though I definitely wish I could use rust for a number of things)

    • @allesklarklaus147
      @allesklarklaus147 Год назад

      @@freyja5800 So how did you test this in C? It appears to be only a compiler extension and gcc even warns about it being illegal in ISO C if you use any -stdcxx flag.

    • @freyja5800
      @freyja5800 Год назад

      @@allesklarklaus147 I used a preincrement, since that is the equivalent operation to the other code pieces. to be precise, the code was
      ```c
      int main(int argc, char *argv[]) {
      int a = 1;
      printf("%d
      ", a + (++a));
      }
      ```

  • @arisweedler4703
    @arisweedler4703 8 месяцев назад

    Great video. Love the comparisons of various stages of IR at the end.
    Since this video has come out, godbolt has added the ability to do this!

  • @downstream0114
    @downstream0114 Год назад +1

    "Lua doesn't get this right either" - Shocked Pikachu

  • @Jan_Jan_
    @Jan_Jan_ 2 месяца назад

    Super clear explanation and approach to solving! Great work 👍

  • @Pixel__Man
    @Pixel__Man Год назад +21

    Really cool video, learning about technical details without diving deep down into the code helps a lot to understand the logic ! Nicd job :)

  • @ForsakenDAemon
    @ForsakenDAemon Год назад +1

    Given how long it took me to figure out why 4 could be a valid result, I’ve clearly been staring at closures far too long

    • @leddoo
      @leddoo  Год назад

      hehe same, it just didn't occur to me, that 4 could be valid, or that someone would support the decision to leave this undefined.

  •  Год назад +14

    Python: 2+2=5, for extremely large values of 2
    Lua: 2+2=6, for extremely large values of 2

  • @nathankamenchu1239
    @nathankamenchu1239 Год назад +6

    Wonderful video. Well explained and I love the progress

  • @bigjango6888
    @bigjango6888 Год назад

    I would love a video of you explaining more about SSA and register allocation. I really enjoy your compiler and optimization videos.

  • @Xevion
    @Xevion Год назад +4

    Just me who evaluated this expression depth first? I did the expression on the right, got a = 2, then inlined a in a + a (2 + 2) and got 4. I don't know if I would call this a 'bug' in my mind.

    • @leddoo
      @leddoo  Год назад

      if you evaluate it depth first (in-order), you evaluate the LHS (`a`) first, before the block that increments `a`, so you get `1 + ...`
      :^)

    • @Xevion
      @Xevion Год назад +1

      @@leddoo Okay, well, RTL + Depth First and technically, it's not a bug then. I mostly said 'depth first' in a LITERAL sense, not a programmatic sense. I.e. evaluating only if there are no other inner statements to evaluate first.

    • @OMGclueless
      @OMGclueless Год назад +1

      @@leddoo If your mental model is itself of registers rather than of values + bindings, then the LHS (`a`) means "the register containing `a`" and the RHS means "the register returned by `{ a += 1; a }`" (which may or may not be the same depending on whether blocks always evaluate their return arguments), and the operator `+` is defined as operating on two registers rather than on two values.
      To a modern computer scientist who has has been steeped in the notions of lambda calculus with its bright lines between values and name bindings it might seem natural that the expressions `a` and `{ a }` must each be an evaluation, and that the operator `+` must operate on values, but it's not the only option. In fact either, both, or neither may involve evaluating `a`. This was a hotly-debated topic early on in computing when it was not clear whether Fortran (Call-by-Reference by default) or C (Call-by-Value by default) would win. It turns out C "won" and by such a margin that any other option even seems a bit alien to a well-trained computer scientist or engineer. But from first principles they are both reasonable models of computation.

  • @LiraNuna
    @LiraNuna Год назад

    As a past compiler engineer, I couldn't stop screaming "SSA" for almost the entire video

  • @spaghettiking653
    @spaghettiking653 Год назад +1

    Brilliant, I wrote a basic compiler but I never got round to this heavy lifting. Thanks for this knowledge, when I give it my second try I'll be sure to steal these ideas :')

  • @revengerwizard
    @revengerwizard Год назад +5

    Hi, interesting video as always! Just wanted add that Lua’s parser/compiler doesn’t actually use any AST for bytecode generation (one pass).

    • @leddoo
      @leddoo  Год назад +3

      oh interesting! i saw something like `expr` in the code generator & assumed it was some kind of ast. looks like that was `expdec`, which is created by the parser while parsing expressions, before calling into the code generator.
      thanks for pointing that out 👌

  • @maksymiliank5135
    @maksymiliank5135 Год назад +2

    If you pass the operands by reference then it should be equal to 4. If you pass them by copy (which you should do with ints), then it depends on the evaluation order of the arguments. When I first saw the thumbnail I thought that the result should be equal to 4, because it matches both of the cases if you pass arguments by reference, and if you first evaluate the block expressions, and then the variables. This type of code is very similar to stuff like (++i++ + i) in other languages. First of all you shouldn't write this type of code, and second of all this probably should be a warning or even a compile time error.

  • @anon_y_mousse
    @anon_y_mousse Год назад +1

    I would argue that the braces work equivalently to parentheses in grouping operations, thus what's inside the braces should be evaluated first as they would for a parenthetical operation. So for my language, which braces would actually be a syntax error in this situation as blocks don't evaluate to a scalar, 4 would be correct, if you used parentheses and a comma to delimit the parts of the expression anyway. As for tuples, I went back and forth on whether I should use parentheses or braces, but ultimately I decided on a tick syntax, so it's always obvious what's a composite object and what's not.

  • @DavidsKanal
    @DavidsKanal Год назад +1

    Damn this video is very focused and to the point. Was very interesting and I learned a lot! Keep up the good compiler AND video work!

  • @dynpallomah5918
    @dynpallomah5918 Год назад +10

    I think the Lua code makes sense than the Python one imo. The call operator f(args) should be parsed before the addition operator a+b. Maybe it's just me.

    • @sagitswag1785
      @sagitswag1785 Год назад

      this is 100% more initiative in my eyes as well

    • @DFPercush
      @DFPercush Год назад +1

      I guess the thinking is that the terms of a mathematical expression, of similar precedence, should be evaluated in left to right order.

  • @xlerb2286
    @xlerb2286 Год назад +1

    Good video, and a good approach. And so compilers go from small and simple to big and ... not simple ;) I'm writing a little specialized scripting language at work right now and I'm taking an approach I've never tried before. It wouldn't be appropriate for a real language but it's been working well for me. I do miss working on real compilers though.

  • @itay1232
    @itay1232 Год назад +18

    I don't like how you called your compiler's old behaviour - and especially lua's - 'wrong'. The compiler is the one to dictate what is the correct behaviour of any program. I'm no expert on the matter, but I'd imagine the lua compiler/interpreter follows lua's specification in terms of evaluation order and copies. In C++, for example, this code would be UB, because the order of evaluation of most built in operators is unspecified.

    • @leddoo
      @leddoo  Год назад +7

      yeah, that's fair.
      though i personally find it "wrong" to leave such basic parts of the language unspecified.
      why even support such a feature, if users can't rely on it? (and why not *at least* warn them, if they use them in unreliable ways?)
      i can understand that it made sense for C. similar to how `short` and `long` made sense for C.
      but i'm very much against new languages leaving such things unspecified. that's just asking for confused users.
      perhaps i should have mentioned that that's my opinion, not fact.

    • @nbas09
      @nbas09 Год назад +5

      You make a fair point. The reason the order of evaluation is unspecified in C/C++ is to enable some sorts of optimizations by re-ordering arguments evaluation.
      But @leddoo still has a strong point as some compilers will simply forbid situations where a variable could be written to when it could be read from (e.g. Rust) in the same expression (in C lingo between two sequence points). And for the long term that's better since the optimizer could still re-order evaluations for optimizations purposes given the language already forces the programmer to write code that's not sensitive to such re-ordering.
      But in Lua's case for this particular example, I will also submit that this behavior violates POLA (principle of least astonishment) so as a programmer, I would be unhappy when that happens. BUT Lua does leave the order of evaluation undefined so it is not a bug!
      Who doesn't love having footguns? ;-)
      Edit: removed erroneous paragraph that was from not reading the original comment well.

    • @leddoo
      @leddoo  Год назад +6

      @@nbas09 i mostly agree with you here. though i'm having a hard time with the optimization argument.
      i think it's fair to say that optimizations must preserve program semantics (except for stuff like fastmath; though that merely changes program semantics, optimizations must still preserve these modified semantics).
      now the issue is, the order of side effects, in general, affects program semantics. that is why we are using an imperative language in the first place. if we didn't want to use time/sequencing to express our programs, we'd be using declarative languages.
      therefore, leaving the order of side effects undefined (in certain cases) seems like a fundamental inconsistency in the language design.
      again, i think it made sense for C. because with a strict evaluation order, the compiler has to do more work, to prove that two expressions are in fact order independent.
      but modern compilers frequently run much more expensive program analyses than mere "expression level reordering". it's my understanding that gcc and clang simply choose some evaluation order, like right-to-left, and then leave the actual reordering optimizations up to the backend (which does function-wide reordering).
      so yeah, i think leaving the ordering of side-effect undefined in a modern imperative language is a design flaw.
      (btw: lua only "prints 4" when using local variables. this does not happen with globals or upvalues. quite the footgun indeed!)

    • @diketarogg
      @diketarogg Год назад +1

      No way. I was so sure that in c++ the result would be 4. My reasoning was that int& a is summed with int returned from a function. This is so unexpected. I tested, gcc produce 4 under linux for both a + func() and func() + a (which makes a lot of sense for me, that it should be defined behaviour).

    • @itay1232
      @itay1232 Год назад

      @@diketarogg What exact expression are you referring to here?

  • @YotaXP
    @YotaXP Год назад +3

    I love getting to look inside compilers like this.
    Makes me wonder, though. What would be the "correct" result if the operands were swapped?
    `print({a += 1; a} + a)` Should this still be 3?
    On another note, if both operands were replaced with impure function calls too complex to be inlined, would the behavior here still be definable?

    • @leddoo
      @leddoo  Год назад +1

      languages like rust and mine simply define evaluation order to be "left to right". so you'd get 4, if you swapped the operands.
      optimizations like inlining are done by the backend & have to obey program semantics.

  • @ErikBongers
    @ErikBongers Год назад +2

    When implementing blocks for my language, I had to decide what a block inherits. I opted to NOT inherit variables. Watching this video, I now realize I really dodged a bullet there!
    Of course this means that a block is only useful as a function body where values are passed as by-val parameters.
    ...come to think of it: my blocks can't be used in expressions, so this probably isn't an issue for me anyway.
    But still, I now know what to stay clear of.

    • @leddoo
      @leddoo  Год назад

      interesting, so those blocks are only useful for their side effects? 🤔

    • @ErikBongers
      @ErikBongers Год назад +1

      To be honest, they are just function bodies. Input is parameters, output a return value. Although...my language has settings, like the date format (dmy, mdy) and which predefined functions (trig,...) and constants (PI, c, G,...) are visible. A stand-alone block could have it's own settings without disturbing the parent block's settings. Thus a block without side effects would have some use, but admittedly, limited.
      I have to clarify that my language is more like math lab. Every statement is stand-alone and the result of each statement is immediately output, so the concept of side effects is different.

  • @irina_divine
    @irina_divine Год назад +1

    1:16
    This is not a bug, for lua function calls are first priority.
    so a + f() is actually f() + a
    to fix this and make it the way you want
    local a = 1
    function f ()
    a = a + 1
    return a
    end
    print( (a * 1) + f() )
    now it will do a * 1 first, then call the function in the interpreter.

    • @leddoo
      @leddoo  Год назад

      not true, you can read the pinned comment to find out why.

  • @xelspeth
    @xelspeth Год назад +1

    I think the solution should be 2. That's just how I defined it.
    Jokes aside I really like the video apart from you insisting on it being "correct" or "incorrect" because as many have stated, 4 is a perfectly valid result of this. Compiled code also doesn't have to make sense as long as it's either properly documented or explicitly stated to not be specified. and you can absolutely make up how it works (hence why while joking when I said 2, if you could argue why it does that and have it consistently do so, it would be a perfectly reasonable response)

    • @leddoo
      @leddoo  Год назад

      hehe yeah, calling it "correct/incorrect" wasn't "correct" in retrospect :D

  • @herrkatzegaming
    @herrkatzegaming Год назад +1

    Interestingly, the modified Lua 5.1 that CraftOS-PC uses outputs 3 in this example

  • @gameofpj3286
    @gameofpj3286 Год назад +7

    I personally find the "buggy way" actually more intuitive, but it's interesting to see how you changed it to suit your needs :D Nice work!

    • @sagitswag1785
      @sagitswag1785 Год назад +6

      I also feel like 4 should be correct. Feels like functions calls in an expression should evaluate first, hence because "a" is global, it gets incremented for both operands of the addition.

    • @leddoo
      @leddoo  Год назад +2

      it’s really interesting how many people have said they’d find the buggy way more intuitive :D
      what would you expect if the LHS was also a function call that returned the value of `a` instead? 🤔

    • @sagitswag1785
      @sagitswag1785 Год назад +2

      @@leddoo imo in that case the answer should be 3, since the function would return the value of a, and not a itself. If the function returns a pointer to a, and then you dereference it for the addition, then the answer would be 4.

    • @sockymanthesock32
      @sockymanthesock32 Год назад

      ​@@sagitswag1785 What if the LHS function modifies "a" as well? Would "{ a *= 3; a } + { a += 1; a }" evaluate to 7 or 8?

    • @sagitswag1785
      @sagitswag1785 Год назад +2

      @@sockymanthesock32 I would evaluate it as 7 for that example, and 8 for the example where the LHS returns a pointer to a. My logic is you first evaluate from left to right, and if either of the operands contain a reference to a, rather than a value, then the references get evaluated after the functions get resolved.

  • @Starwort
    @Starwort Год назад +2

    3:12 this explanation is slightly misleading to watch, only because the numbers in the bytecode happen to be the same as the results (they are actually name IDs)

  • @user-zd9wd
    @user-zd9wd Год назад +1

    I don’t know if the added complexity was worth it

  • @ABaumstumpf
    @ABaumstumpf Год назад

    Just depends on what the goal of that specific language is and how the behaviour is defined.
    There is an argument for multiple different results - including compiletime or runtime errors on such constructs.
    C for example opted for not giving such expression a definition - your code would be expressly undefined behaviour and as such not valid C(++) and should be avoided.
    Why did they do that? Performance. For the program as well as the compiler. Explicitly defining the behaviour means that the rules need to be more complex or restrictive, the compiler needs to be more complex, and the runtime might be worse. For a language that was designed to have as little overhead as possible that is a no go.
    Another approach would be to make this an expressly illegal construct as the meaning is ambiguous.
    Or to have a strict scoping and ordering so that blocks are evaluated first, or it is evaluated left to right, or based on some other presidency.
    Personally i am more for making it ill-defined or implementation defiened - with required diagnostic messaging:
    Any assumptions upon what others assume about the language are most likely nit correct.
    I know people that are adamant that C++ is evaluating a simple "a+b+c+d" left to right cause addition is defined as left-associative - but associativeness says nothing about evaluation order.

  • @JannyJan-ik5rr
    @JannyJan-ik5rr 2 месяца назад

    Amazing video, thank you for sharing! :D

  • @isotoxal
    @isotoxal Год назад +4

    Another infomatory and interesting video!!!

  • @AlexAegisOfficial
    @AlexAegisOfficial Год назад +4

    My human intuition says the answer should be 4. The + operator can't commence until both operands are reduced to a number. So that's why the block should happen first, before evaluating 'a' to a number. Incrementing a to 2, then doing 'a + a', which is now 2 + 2.
    But now what I've written that out, why should a blocks evaluation come earlier than a variables.. I could argue for the result '3' with the exact same argument...
    I think whats important to keep the mathematical properties of +, like commutativity. { a += 1 } + a should be the same as a + { a+= 1 }, this property can be kept as long as block and variable resolving is ordered, and not done strictly from left to right. A decision has to be made which has to come first, the block or the variable, regardless of order.
    Since in math we deal with parentheses first, I'd say do the block first, and thus the answer should be 4.

    • @anon_y_mousse
      @anon_y_mousse Год назад

      This is why I may start reading the comments before commenting because that was basically the same conclusion I came to.

    • @qwertyqwerty-jp8pr
      @qwertyqwerty-jp8pr Год назад

      The problem then becomes if both sides are blocks how would you order then (if that's allowed)

    • @SurmenianSoldier
      @SurmenianSoldier Год назад

      @@qwertyqwerty-jp8pr left to right

    • @qwertyqwerty-jp8pr
      @qwertyqwerty-jp8pr Год назад

      @@SurmenianSoldier we all know left to right work but please explain how this improve it bruh
      It is very counterintuitive and a mess if you combine left to right with op solution so God please do not suggest non sense

    • @SurmenianSoldier
      @SurmenianSoldier Год назад

      @@qwertyqwerty-jp8pr how else would you order it then?

  • @QwDragon
    @QwDragon Год назад +1

    What if variable is assigned multiple things in the loop? How to allocate a register per loop iteration?

    • @OMGclueless
      @OMGclueless Год назад +3

      The variable inside the loop gets a version number, and it is assigned with a phi function that uses the version in the outer block if entering the loop for the first time, and uses itself if the loop body repeats.

  • @itachi2011100
    @itachi2011100 Год назад

    Lol man just rediscovered LLVM

  • @sceniagaming3886
    @sceniagaming3886 Год назад +1

    The real issue here (and the reason there's even a disagreement about what should be the result) is globals. There's a reason people don't like globals, and it's somewhat complicated, but boils down to unintended side effects. If a core feature of your language operates on globals without making it explicit, it might not actually be beginner friendly, so rather than forcing this expression to behave in a certain way, I'd argue the better approach would be to rethink how it should work in the first place.
    If your block expressions were capturing the outer variables as inner copies and never modifying outer variables (so, proper scoping), the result would be 3 and a would be 1 after your statement has been evaluated because the a that was incremented is a local copy scoped to the block. This fixes not just what you perceive as a bug (which is really arguably a design decision) and makes the question of evaluation order entirely obsolete in the best way, but also teaches your target audience about scopes. If they really want to use globals, make it hard and explicit. Globals should be an advanced feature that you only unlock once you've really understood a language.

    • @leddoo
      @leddoo  Год назад

      > which is really arguably a design decision
      yes though in my mind, strict left to right evaluation is the only sane choice. but yeah technically that's a design decision :D
      RE blocks copying values.
      interesting, someone else suggested that too. i think such a feature can be useful (or at least a feature to restrict captures).
      but i don't think it would be a good idea to have it enabled by default. because for consistency, this would also have to apply to blocks in conditionals and loops. and if that's the case, you can now no longer just type `if foo { a += 1 }` and all variables modified in a loop would have to be captured explicitly. again, i think it can be useful to impose such restrictions, but as the default, in an imperative language, they don't make sense.
      i'm also not sure what you mean by "proper scoping", because to me, accessing and modifying variables from outer scopes is entirely valid and something that makes imperative languages more flexible than functional ones.

    • @leddoo
      @leddoo  Год назад

      "global" variables are different than accessing an outer scope, because they outlive the function, and thus introduce persistent state. (you could argue that variables in outer scopes outlive the inner scopes. but if your function is too large to reason about the nested scoping, just split it up a bit.)

    • @sceniagaming3886
      @sceniagaming3886 Год назад

      If I understand your blocks correctly, they're basically shorthand notation for anonymous functions, so the a in the video would outlive this anonymous function, making it a global from that function's perspective. I realize "global" might not be the best word for it, but basically, your anonymous function is allowed to modify a variable outside of its own scope, which leads to behavior that is technically undefined until you decide more or less arbitrarily what it should do.
      Strict left to right evaluation in this case means holding a copy of a value's state. Essentially, "a + f()" is copying a into a temporary value, then evaluating f (which modifies the original a). If you instead consider variables equivalent to pointers, modifying a in f() should in fact affect the a to which you're adding your result.
      Intuitively, a variable is more likely to be understood like a pointer because it's giving a name to something different, but it's not inherently becoming a copy of that thing. An analogy of your program would be something like this:
      "You have 2kg of sugar. Baking a cake reduces your sugar supply by 1kg and results in a 3kg cake. Print the weight of your sugar supply + the result of baking one cake."
      Note that in this analogy, f() decrements a instead of incrementing it. If you present this problem to people, I'd argue they will give the answer as "4 kg" (the new sugar supply plus the cake) more often than "5kg" (the old sugar supply plus the cake) because the mere fact that baking a cake affects your sugar supply will also cause them to think the result should reflect the change despite the sugar supply coming first in the calculation.

  • @Kualinar
    @Kualinar Год назад

    That's a nice case a side effect.

  • @billionai4871
    @billionai4871 Год назад +1

    I'm only halfway through, but this bug doesn't sound as out there as you make it seem (unless I misunderstood something). even though the code a + {a += 1; a} will hopefully never be used, I could totally see someone using some function call like equal (A[i], B[i++]) if they were iterating and checking if pairs of elements were equal. After all, in our human reading, we can understand that the user wants the i-th element of A and B, then incrementing i, so I can see some people deciding to write that.
    Maybe I've spent too long teaching new programmers? maybe, but that doesn't change the fact that they'd do that.

    • @leddoo
      @leddoo  Год назад

      the bug mattered to me, cause macros & generated code can be somewhat hard to reason about.
      i don’t want to have to think, is it safe to insert the user specified code here, or might that cause UB, cause i haven’t defined an evaluation order (or implemented it properly).

    • @anon_y_mousse
      @anon_y_mousse Год назад +1

      @@leddoo That would make an interesting addendum to this subject. However, that's also why macros generally have to copy their arguments when they do any complex processing, such as the perniciously simple looking max(a,b) macro. Generally they advise that it should be a function, but I'd argue for delayed evaluation of macro arguments as I do for my language.

  • @marius2k8
    @marius2k8 Год назад

    I know it says you're German, but I can't get over how much you look like my Polish (Bilnoski/Chovanec) relatives. It's uncanny.

  • @arjix8738
    @arjix8738 Год назад +1

    Lua has anonymous functions, which are not that different from blocks with a return value.

  • @nightfox6738
    @nightfox6738 Год назад +1

    Interesting thing I came across. This seems to exist in gcc as well, but it could just be because gcc is prioritizing parentheses (as it should by order of operations) and the only way to actually make this work in c is with parentheses, however this hypothesis proves to be false when simple (a++) wrapped in parentheses results in 3. It seems a += 1 is not treated identically to a++ as expected.
    int a = 1;
    printf( "%d
    ", a + (a += 1, a) ); // 4
    a = 1;
    printf( "%d
    ", a + (++a, a) ); // 4
    a = 1;
    printf( "%d
    ", a + (a++, a) ); // 4
    a = 1;
    printf( "%d
    ", a + ++a ); // 4
    a = 1;
    printf( "%d
    ", a + a++ ); // 3
    a = 1;
    printf( "%d
    ", a + (++a) ); // 4
    a = 1;
    printf( "%d
    ", a + (a++) ); // 3
    a = 1;
    printf( "%d
    ", a + (a+=1) ); // 4
    It seems the compound assignment operator is closer to the pre-increment operator than the post-increment, which honestly for me is somewhat counter intuitive because I would expect x++ and x+=1 to do the same thing, just as I would expect (x++)++ to be equal to x+=2 which is where you'd start running into some less obscure problems.

    • @anon_y_mousse
      @anon_y_mousse Год назад +1

      The answer for why C works like that is because of precedence, in part, but also because the comma operator creates a sequence point which forces the increment to occur fully before the value of the parenthetical expression is evaluated. It's essentially a separate statement at that point as opposed to continuing a single one.

    • @nightfox6738
      @nightfox6738 Год назад +1

      @@anon_y_mousse yes but (x+=1) is still evaluated before the assignment as opposed to (x++)

    • @anon_y_mousse
      @anon_y_mousse Год назад +1

      @@nightfox6738 Indeed, because of precedence, += has a higher precedence than post-increment.

    • @nightfox6738
      @nightfox6738 Год назад +1

      @@anon_y_mousse It's not that it has a higher precedence than post-increment, it's that intuitively I expect it to have a lower precedence than the assignment operator, but instead it behaves more like pre-increment

    • @anon_y_mousse
      @anon_y_mousse Год назад

      @@nightfox6738 Again, yes, because of operator precedence, = has one of the lowest precedence of all the operators.

  • @chrissalch693
    @chrissalch693 Год назад

    What is the priority order for the execution of expressions here? Mathematically, this seems like undefined behavior because the evaluation order of things being added is not usually meaningful. Here the result is completely dependent on the order of execution. The problem comes form a being modified on the expression rather than just returned. The behavior you're seeing is only a bug because this code is trying to match the behavior of another system/ specification that dictates the execution order. Otherwise, there is no inherently correct way to do this.

  • @Rumplestiltzchen
    @Rumplestiltzchen Год назад

    seeing line number 5729 gives me anxiety

  • @oubracode
    @oubracode Год назад +4

    Nice video, i can understand why lang dev is very annoying and hard, sometimes i need to rewrite my whole code for some simple bug again and again, and end up doing nothing, i want to get an advice of how i can get my motivation and continue working on my language?

    • @leddoo
      @leddoo  Год назад +1

      hey! i can't really know what will keep you motivated, but i can give you some "general tips":
      building any kind of large project is hard and takes lots of time. you'll pretty much always need multiple iterations until it works and until it is "not horrible".
      for long term motivation, perhaps remind yourself of why you started the project in the first place, what your mission is.

    • @oubracode
      @oubracode Год назад +2

      @@leddoo Thank you so much for this reply, i'll keep it in mind :)

  • @NoNameAtAll2
    @NoNameAtAll2 Год назад +1

    it's an old joke that only sysadmins know value of i++ + i++

  • @noredine
    @noredine Год назад +1

    4 seemed normal to me

  • @8ightfold
    @8ightfold Год назад

    IMO you should evaluate the stuff in braces before everything else, it's just the way most people assume it'll work. Despite my disagreement I still enjoyed the video :) keep it up!

  • @Richard_Knusprig
    @Richard_Knusprig Год назад +1

    This is one of the videos ever made!

  • @taurasandaras4699
    @taurasandaras4699 Год назад

    12:43 Never knew notepad could get so bright in light mode

  • @ExCyberino
    @ExCyberino Год назад +1

    I'd like to see an SSA video from you.

  • @williamdrum9899
    @williamdrum9899 Год назад

    Laughs in assembly

  • @colly6022
    @colly6022 Год назад

    i would have assumed it was 4, since, at least intuitively to me, blocks / scopes / parenthesis should all take prescience in the order of operations.
    a + { a += 1; a }
    should be evaluated in this order:
    A = { a = 1 + 1; a } = 2
    B = a = 2
    result = A + B = 2 + 2 = 4

    • @leddoo
      @leddoo  Год назад +1

      yup, many people have said that :D
      this intuition comes from math, where we do parentheses first and variables last.
      in math, we can do that, because the order of evaluation doesn't matter, and keeping variables around longer can lead to simpler expressions.
      but in imperative programming languages, the order of evaluation does matter, because there are side effects.
      and if you think about it, the plus operator just has a left operand and a right operand, which both need to be evaluated before the addition can be performed. the simplest way to evaluate them would be to first evaluate the left and then the right operand (alternatively you could do the right one first, but left to right corresponds more closely to how the code is read - top to bottom, left to right). any other evaluation order, like blocks first, actually requires more complicated logic - inspecting what kinds of expressions the operands are, and depending on that, deciding which to evaluate first. this makes the language spec more complicated, and that makes it harder to reason about the language.
      imperative programming languages are not math, and i think i need to make a follow up video about that :D

  • @Manuel3451000
    @Manuel3451000 Год назад +1

    Looking at the LUA code, it should indeed return 4, you're declaring the variable "a" in the same scope as the function "f", meaning any operation performed on "a" inside "f" aren't constrained inside it. You're incrementing "a" to 2, then returning it. Change the variable "a" you're using inside "f" to be local, so it doesn't affect the "a" outside and your problem disappears. This is intended behaviour and not a compiler bug. This doesn't need to be fixed and in fact, some code might rely on this behaviour.

    • @OMGclueless
      @OMGclueless Год назад

      If Lua used left-to-right evaluation order for its operators, then in the expression "a + f()" first "a" would evaluate to 1, and only then would "f()" evaluate to 2. It shouldn't matter that "a" is incremented, because it should have already been evaluated.

    • @leddoo
      @leddoo  Год назад

      as you may have read in the pinned comment, making `a` an upvalue or global makes the program return 3.

    • @Manuel3451000
      @Manuel3451000 Год назад +1

      @@leddoo making "a" an upvalue just declares a new local variable with the same name inside "f", precisely one of the fixes I suggested to get your expected result. As for making "a" global, I think it might be due to compiler optimization because at the print call, "a" has been a constant and the compiler can just preprocess it to just directly its value

  • @SethPentolope
    @SethPentolope Год назад +1

    If you insist on having a sequence point at each operand that defines the order of evaluation (which you are) you are limiting the optimization potential.
    What might be more beginner friendly is to have the compiler find situations where order of evaluation matters and then print a warning (or an error) that tells the user that they have created confusing code that may not do what they intended.

    • @leddoo
      @leddoo  Год назад

      doesn’t seem to be enough optimization potential for rust 🤷🏼‍♀️

    • @SethPentolope
      @SethPentolope Год назад +1

      @@leddoo 🧐Interesting… I’m aware that the amount of additional optimization is small. It can matter more for unusual processor architectures and large/unusual expressions that call externally linked functions (so the compiler doesn’t know if the function has side effects).
      int *a;
      b = *a + foo() + *a; // example in C. If there is no sequence point for +, there are less pointer dereferences needed.

    • @leddoo
      @leddoo  Год назад

      @@SethPentolope in your example, i assume you as the programmer know whether `foo` can modify `*a`, so you could factor out `*a` yourself. if it actually mattered, which it probably wouldn't. except, like you said, on unusual processors. but are those edge cases really worth the UB for everyone else?
      i learned the hard way that `++` isn't evaluated sequentially in c++. i had something like `color = Color(*p++, *p++, *p++)`
      which looks completely reasonable, if you don't know about that weird exception.

    • @anon_y_mousse
      @anon_y_mousse Год назад

      @@leddoo An interesting if horrific example. I'm curious how you would handle such a case in your language? I would prefer to just do something like Color( *p, p[1], p[2] ); p += 3; or use some array slicing with decomposition which would both look cleaner and be easier to understand at a glance.

    • @leddoo
      @leddoo  Год назад

      @@anon_y_mousse with left to right evaluation, it just works (™)
      it's not as horrific of an example as you might think.
      in the code for the "do less" video, i'm parsing the input like this:
      [...]
      let geode_robot = (next(), next());
      where `next` is a closure that finds the next regex match in the input & returns it.
      i don't think this could be written in a "cleaner" way. but in c++, you'd have to introduce temporary variables, cause the evaluation order is unspecified.

  • @holz_name
    @holz_name Год назад +1

    Interesting. Java avoids all of this by stipulating that the variables outside lamda expressions must be final. So { a += 1; a } would produce a compiler error because a is changed but it must be a final variable. Final variable means that you can't assign a new value to it.

    • @ABaumstumpf
      @ABaumstumpf Год назад

      Java avoids this by specifying an evaluation order

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f Год назад

      @@ABaumstumpf ??? This problem stems from the evaluation order.

    • @ABaumstumpf
      @ABaumstumpf Год назад

      @@user-dh8oi2mk4f ... no. The problem arises from the LACK of a well-defined evaluation order.

  • @gblargg
    @gblargg Год назад

    The only solid argument I can come up with for one or the other is to keep consistency with how something behaved in a previous version. Neither seems compelling as "correct", as it's a perverse situation. Really the best thing would be a compiler warning that the result depends on something subtle. I'd rather the language spec leave it unspecified (like the C family) so the compiler's optimizer doesn't have to worry about this nonsense.

    • @leddoo
      @leddoo  Год назад

      i've talked to several people who've implemented languages, and to all of them, `3` was unambiguously the "correct" answer (if you didn't want to leave it undefined for optimization purposes). (see pinned comment for what popular languages do. notably, all of the most popular languages (incl js & python) require the result to be `3` by their spec.)
      `3` is also what you'll get if you implement a naive tree walking interpreter.
      the consistency is consistency with similar expressions. so `a + { a += 1; a }` should behave the same as `id(a) + { a += 1; a }` (where `id = x => x`) or `{ a } + { a += 1; a }`.

  • @itsTyrion
    @itsTyrion 9 месяцев назад

    meta question: how did you get that "laser pointer" cursor?

  • @davidt9902
    @davidt9902 Год назад

    Side effect code, when I saw the code in the thumbnail I expected the answer was 4. If you have already released the compiler fixing the bug may introduce bugs into anyone code who depends on original calculation order.

  • @borone8573
    @borone8573 Год назад

    At the first look it doesn't look like a bug. But even if it is not intended, you can say that such code has "unspecified behavior".

  • @OldGameAcc
    @OldGameAcc Год назад

    And that's why one needs to use parameters if he doesn't know what he's doing.
    Still, if a+f() is a sum, then f()+a should yield the same result. 4 makes more sense here in that regard.
    With 3 you break the math.

  • @vitalyl1327
    @vitalyl1327 Год назад +1

    This is why lexical scoping is important. And if you have a well-defined scoping, name resolution is a trivial, mechanical problem.

  • @jimwinchester339
    @jimwinchester339 Год назад

    Does your language have a 'volatile' qualifier?

  • @user-kh4uf8jm8j
    @user-kh4uf8jm8j Год назад +1

    It's a feature not a bug.

  • @max_ishere
    @max_ishere Год назад +1

    👉 you! Are wrong. a += should replace the value of a before the plus gets evaluated.

    • @leddoo
      @leddoo  Год назад

      what does your favorite language do?
      if it’s python/rust/javascript, they’ll all give you 3 (ie += after +).
      c/c++ will give you undefined behavior :P

  • @VinyJones2
    @VinyJones2 Год назад

    It often a bad idea to change variable value, try to avoid it and most of the bug go away

  • @twiinner1135
    @twiinner1135 Год назад

    Undefined behaviour. Nothing guarantee the computer would evaluate it the block right before the actual value of a, and that block had modified a. In such case, you need a way to tell the compiler that “the order does matter!”, so that he would not alter the order.

  • @KayOScode
    @KayOScode Год назад

    I don’t even think your result is unexpected. It’s the same thing as doing a += (a+= 1) in c which also gives 4. If your code is going to modify a variable as part of the expression, you should expect wonky results. I wouldn’t even fix this, I’d just document it. My compiler doesn’t have block expressions right now, but it does have if expressions, and if you did something stupid like a += if (a += 1) a++ else a; I’d have to say that’s on you. Mine does have an evaluation order, so my result would not be undefined, I think it would be 4 if a starts at zero. If statements are resolved last, post incs are first and += is resolved right before if expressions

    • @leddoo
      @leddoo  Год назад

      as long as it's well defined, i think it's fine.
      personally, i prefer the simplicity of a left-to-right depth-first-search.
      which is basically what you end up with, if you write a simple code generator for a stack based vm.

  • @RandomGeometryDashStuff
    @RandomGeometryDashStuff Год назад

    13:55 the window before final bytecode contains phi, where is phi removed?

    • @leddoo
      @leddoo  Год назад +1

      not 100% sure how other compilers do it, but mine does it during regalloc/codegen. it first tries to assign the same “physical” register to phi source/dest nodes. during codegen, it generates (parallel) copies for all phis, whose src/dst are in different registers.

  • @ShawnHCorey
    @ShawnHCorey Год назад +2

    Are these two equivalent? Remember that addition is commutative.
    a + { a += 1; a }
    and
    { a += 1; a } + a

    • @OMGclueless
      @OMGclueless Год назад +4

      Addition *of integers* is commutative. If the two operands are not integers but are instead themselves expressions, then evaluation order matters, and left-to-right is probably the most sensible which makes these different.

    • @anon_y_mousse
      @anon_y_mousse Год назад

      @@OMGclueless I can agree with that. Which is why both orderings should yield 4 because the braced expression should be evaluated first in a complex expression such as that in each instance.

    • @Fulmenify
      @Fulmenify Год назад +1

      @@anon_y_mousse What I found interesting in this regard is an example someone else gave in another comment here, where both left and right side are braced expressions. No matter how you evaluated that example, it always mattered which order you did it in.
      So, as soon as you allow expressions like this that reassign a variable used in the calculation, you can throw commutative behaviour of addition out.
      The example was this (by @stanleydodds9):
      For example: set a = 3. Then evaluate {a += 1; a} + {a *= 2; a}
      If you evaluate left to right, you get 4 + 8 = 12
      If you evaluate right to left, you get 7 + 6 = 13
      If you evaluate both blocks, and then use the value in a for addition, it still depends whether you go left to right or right to left:
      left to right: 8 + 8 = 16
      right to left: 7 + 7 = 14

    • @anon_y_mousse
      @anon_y_mousse Год назад

      @@Fulmenify Yep, and it makes sense to evaluate the braced expressions first, regardless of which order or how the value of the variable gets used, so in the example from the video it makes sense if the braced portion is evaluated first. As I stated in another comment, in a language like C, which doesn't evaluate blocks as a singular value, but does have parentheses which can sort of do the same, commas inside parenthetical expressions cause sequence points which cause such expressions to be evaluated first.

  • @liorean
    @liorean Год назад

    I would argue that both answers 3 or 4 are wrong. Because addition is commutative, you should have a guarantee that if you swap the augend with the addend, you will get the same answer. Or said another way, the evaluation of augend and addend must be independent of each other. The culprit is the reassignment which is something that should not be allowed in a mathematical statement.

    • @leddoo
      @leddoo  Год назад +1

      it's not a mathematical statement. it's an expression in an imperative language.
      and variables in imperative languages are quite different from regular math variables. (note that the addition operation itself, performed after operand evaluation, is still commutative)

    • @liorean
      @liorean Год назад

      @@leddoo I guess my argument here would be that it really should be, though. Once down at the mathematics, follow the rules of mathematics. But then I'm heavily leaning in the push-effects-to-the-surface-and-keep-the-innards-pure direction of programming, whether in imperative or in functional languages. You're embedding a side effect in a subexpression to what should have been a pure expression, thereby making the entire expression impure.

  • @antadefector
    @antadefector Год назад

    Disclaimer: I'm not a programmer, so please bear with me. This looks to me like a problem of environment handling in lamba functions, say lisp, scheme, maybe ocaml. a, a variable in block should be copied, used and discarded withing { } block, not influencing outer a, if a is not global. After that it is up to optimization to f-up the code so we don't see all steps, if many are ommited. Read this whole sentence as a question, and please disregard my bad english. Best regards, thanks for nice videos.

    • @leddoo
      @leddoo  Год назад

      a block like the one in the example is no different from the block in an `if`.
      so if you couldn’t modify “outer” variables from blocks, you couldn’t conditionally change the value of some variable. (like you say, that’s closer to how functional languages work, but isn’t practical for imperative languages)

  • @ushiocheng
    @ushiocheng Год назад

    this is probably exactally why c/cpp throw it to undefined behavior and be like screw whoever does this stuff
    I learned this stuff as "value labeling" and I would just do it at AST level by keep track of the information in symbol table

    • @chri-k
      @chri-k Год назад

      C ( at least GCC ) actually does not show a warning if you use a function instead of ++ ( but it’s still undefined behavior )

  • @incription
    @incription Год назад

    why is addition non commutative in programming languages?

    • @leddoo
      @leddoo  Год назад +1

      only in languages with side effects. because side effects & state are non-commutative.

  • @kees-janhermans910
    @kees-janhermans910 Год назад

    I just stumbled upon this video. Which language is this?

    • @leddoo
      @leddoo  Год назад

      uh, probably mine gh:leddoo/kibi
      though it's on hiatus for a few more months.

  • @treym2k
    @treym2k Год назад +1

    Tbh it should be 4 imo...

  • @yash1152
    @yash1152 Год назад

    12:23 why not use ssa only for within expressions rather than everyehere?
    -or say use some form of actual version control where u can update the -_-suffix-_- itself?- (well, it won't work; sorry)

    • @leddoo
      @leddoo  Год назад

      hmm, you'd still need a "full ssa implementation".
      you can't really skip out on control flow cause of cancelling operators like `a and b`.
      so you'd have the same complexity, but a less powerful optimizer.
      unless i'm missing something 🤔

  • @crackwitz
    @crackwitz Год назад

    Compiler should nope but gathering all modified visible variables might be a chore

  • @fisharepeopletoo9653
    @fisharepeopletoo9653 Год назад

    Someone told me one + one could never equal 4 so I told them
    var one = 2
    print(one + one)

  • @tamertamertamer4874
    @tamertamertamer4874 Год назад

    Imo that code just isn’t clear in the first place so it shouldn’t be written that way. But yeah I agree it’s a bug

    • @leddoo
      @leddoo  Год назад

      ye agreed.
      it's just a concise way to illustrate the bug.
      there's similar nonsense in the java spec to explain evaluation order.

  • @christianmartinez2179
    @christianmartinez2179 Год назад +1

    I find the "buggy" to be more intuitive and the "correct" one to be buggy.
    It feels like the "correct" one is buggy because it implies that the compiler is creating a temporary copy of the global variable before executing the expresion that updates its value which does not feel correct, or otherwise it implies that it executes from left to right and executes expressions afterwards. It feels like maths where if there's something ambiguous, wrapping an expression in parenthesis and solving that first helps get rid of that ambiguity, in this case the parenthesis would be the inline expression.
    Anyway I wrote this before finishing the video so we'll see.

    • @christianmartinez2179
      @christianmartinez2179 Год назад +1

      Yeah, just what I had guessed, creating a copy feels incorrect when you are dealing with global variables, the code is dumb anyways but 4 is what I would expect when stupidly using a global variable that way

  • @cmilkau
    @cmilkau Год назад

    wouldn't it be easier to generate the copies and remove them later in an optimization pass?
    EDIT: X-D

  • @KryptLynx
    @KryptLynx Год назад

    This is the code you shouldn't use. It reeks of "undefined behaviour". It modifies the same var multiple times in single sentence.