Laziness in Haskell - Part 3: Demand

  • Published: Nov 6, 2024

Comments • 54

  • @DeGuerre
    @DeGuerre 1 year ago +12

    I'm old. I used Miranda before I used Haskell. Example4 would not be correct in Miranda (or Orwell, or probably Miracula), because Miranda's semantics are that pattern matching works as if it was top-to-bottom, left-to-right, and that a reduction is not fired if the redex fails to conform to the left-hand side.
    I suspect that many people still think this way. It arguably violates the principle of least surprise that the argument of addMaybe can be evaluated before the conformality check. But a better shortcut mental model for Haskell is "all bottoms are equal". (To be honest, this is a rule of life that can work in many contexts.)
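
    A plausible reconstruction of the addMaybe example discussed throughout this thread (the exact code from the video may differ; the "bang" message appears in later comments, the other message is a guess):

    ```haskell
    addMaybe :: Int -> Maybe Int -> Int
    addMaybe x (Just y) = x + y
    addMaybe _ Nothing  = error "missing argument"

    example4 :: Int
    example4 = addMaybe (error "bang") Nothing
    -- Under Miranda-style top-to-bottom, left-to-right matching, the Nothing
    -- branch's error would fire first. GHC's demand analysis may instead treat
    -- the first argument as strict and raise "bang", which is fine precisely
    -- because "all bottoms are equal".
    ```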

  • @adrusi
    @adrusi 1 year ago +25

    This was a fantastic video and I feel like I just learned something as important to writing Haskell as understanding CPU cache and branch prediction behavior is to writing C. Super grateful!

  • @valcron-1000
    @valcron-1000 1 year ago +25

    Amazing work as usual. Eagerly waiting for the next video 😅

    • @DrewryPope
      @DrewryPope 1 year ago +4

      I am not strictly waiting for the next video. 😅

  • @ha.alamin
    @ha.alamin 1 year ago +10

    I like the term greedy instead of eager, as used in regex terminology; there's a nice symmetry in that they're both essentially “character flaws”; it's like the computer can't win with us, no matter what it does.

    • @Kram1032
      @Kram1032 1 year ago

      "Do as I wish, not as I say!"

    • @nesssel
      @nesssel 1 year ago

      Lazy as a word is catchy but it doesn't market itself too well at the start. In my FP introduction, our prof used the German term for it which translates to on-demand evaluation. While it doesn't exactly roll off the tongue, it presents the concept much better imo.
      The German word is "bedarfsgesteuerte Auswertung" for those curious :D

  • @jks612
    @jks612 1 year ago +3

    This series has been incredible. Thank you.

  • @lander18000
    @lander18000 1 year ago +11

    Thank you for explaining how seq works. I had exactly the misconception that you described. I still don't fully understand what the NOINLINE and INLINE pragmas do and why they changed the behavior of the program in GHCi.

    • @SuperManitu1
      @SuperManitu1 1 year ago +10

      GHC inlines small functions by default. This means that it replaces the function call with its definition. You usually want that because it opens up new optimizations and because the function is small there is a low risk of exploding the size of the program.
      However, big functions are not inlined, because if they are called more than once this would drastically increase the size of the program. Alexis simulates this behavior by forbidding the compiler from inlining. Conversely, if you have a big function that GHC would normally not inline, you can force GHC to inline it with the INLINE pragma.
      The change in GHCi was mostly due to enabling optimizations at all, but preventing inlining makes sure that the error really comes from the demand analysis and not some other optimization.
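
      A minimal sketch of what the pragmas look like (illustrative names, not the exact code from the video):

      ```haskell
      module Pragmas where

      -- NOINLINE keeps every call to this function as a real call, so any change
      -- in behaviour under -O comes from demand analysis rather than inlining.
      {-# NOINLINE addStrictly #-}
      addStrictly :: Int -> Int -> Int
      addStrictly x y = x + y

      -- INLINE asks GHC to inline a function it might otherwise leave alone.
      {-# INLINE double #-}
      double :: Int -> Int
      double n = n * 2
      ```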

    • @thoperSought
      @thoperSought 1 year ago +1

      @@SuperManitu1
      thanks, that's super helpful

    • @Kram1032
      @Kram1032 1 year ago

      @@SuperManitu1 what determines whether a function is big or small?

    • @SuperManitu1
      @SuperManitu1 1 year ago

      @@Kram1032 There is no hard limit, GHC has a heuristic on when to inline, depending on multiple factors.

  • @kasozivincent8685
    @kasozivincent8685 1 year ago +1

    Oh no. Why did it take me so long to find this channel? I am coming here from Twitter 😊

  • @FatihAltnok
    @FatihAltnok 1 year ago +1

    I really like the explanation of demand analysis and writing specifications for GHC to generate a program.
    With the AI hype, people think it's okay to give a high-level specification to an AI to write a program, but don't trust a high-level language compiler to do the same. To me, the problem is that the AI's output is inspectable, whereas GHC's output is not, at least to most people.
    One simple solution would be to make demand analysis results observable to the developer. For example, haskell-language-server can allow us to hover over function parameters to see if they are lazy or strict. That would help us validate our understanding of the generated program and reduce adding bang patterns blindly until things are fast again.

    • @pedrovasconcelos7674
      @pedrovasconcelos7674 1 year ago +1

      You can inspect GHC output by reading Core -- see Alexis's own videos on the subject!

  • @RandCode
    @RandCode 1 year ago +6

    How long until we get a FreeCodeCamp lecture by Alexis? What an amazing lecturer!! 🙌🙌

  • @NNOTM
    @NNOTM 1 year ago +3

    Is it conceivable that the function arrow could be parameterized (possibly in a way that's hidden from the user) by whether or not the function demands its argument? GHC could then automatically generate two versions of foldl and use one or the other depending on whether the supplied function is strict. Of course, the naive version of this would produce a number of specializations exponential in the number of arguments, so that's probably not ideal, but maybe the issues could be worked out/around.

  • @EricTorreborre
    @EricTorreborre 1 year ago +5

    No reason to flame you, quite the contrary 😀. I had never really thought of the evaluation from the demand/data dependency point of view, starting from a strict evaluation and then adding thunks. This is enlightening. You did not come back to the evaluation of addMaybeNothing under optimization. I guess that GHC sees two paths leading to bottom values and chooses to return the "highest" one in the call graph?

    • @SuperManitu1
      @SuperManitu1 1 year ago +4

      She did come back to it. GHC sees that x is always demanded (or rather, that the semantics do not change if it is treated as always demanded), so addMaybe is evaluated like in a strict language: the arguments are evaluated before entering the function. This makes the "bang" error fire first.

  • @isaacvandoren4733
    @isaacvandoren4733 1 year ago

    This is fantastic!

  • @BethKjos
    @BethKjos 1 year ago

    @10:44 you ask the meaning of the program. My answer is that the program has no meaning, because `error` exactly denotes the meaningless computation. The question of which specific error "wins" is presumably moot. An implementation is presumably welcome to flip a coin. In practice, fair coins are an expensive computational resource, so it will probably be consistent from run to run.

  • @con-f-use
    @con-f-use 1 year ago

    I'd argue that eager and strict correlate quite tightly and so do lazy and non-strict ('lenient' or 'permissive' better?). So it's alright to simplify to one parameter with a bit of error. Just imagine a maximally eager, yet very non-strict language or a maximally lazy AND strict one. In a lot of cases laziness and strictness work against each other.

  • @pedrovasconcelos8260
    @pedrovasconcelos8260 1 year ago +3

    Strict vs non-strict and eager vs lazy both relate to the semantics, but the former is denotational and the latter is operational. You can implement non-strictness by normal-order evaluation (e.g. macro expansion) or by lazy evaluation (i.e. memoizing results). However, it does not make sense to implement non-strict semantics using eager evaluation, or strict semantics using lazy evaluation. That's why 2 quadrants of the initial diagram were empty.
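
    A tiny illustration of the denotational point (my example, not from the video): a function that never needs its argument is non-strict regardless of which evaluation strategy implements it.

    ```haskell
    ignoreArg :: Integer -> Integer
    ignoreArg _ = 1

    demo :: Integer
    demo = ignoreArg (error "never needed")
    -- demo evaluates to 1: denotationally, ignoreArg ⊥ = 1, whether the
    -- implementation uses normal-order reduction or lazy evaluation with sharing.
    ```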

    • @tobykelsey4459
      @tobykelsey4459 1 year ago

      Your language is strict if it specifies the order of evaluation of all arguments and expressions. If it leaves it to the compiler or changes with optimization it is non-strict. This is independent of eager vs lazy. From this video it appears Haskell is non-strict when evaluating/propagating exceptions, while for example an eager "C"-like language can be non-strict by leaving the order of floating-point operations up to the compiler.

  • @flowname
    @flowname 1 year ago +1

    24:41 OK, granted that it's totally fine to evaluate the left argument given the compiler's inference that the positive position bottoms, it's still unclear from the point of view of the semantics why "totally fine" translates to "it's going to be error bang" as an answer to your twitter puzzle. Is it only because there's no other implementation but GHC, or is it because strictness analysis implies that when the positive position is bottoming, the negative positions take priority (due to implied default strictness as a starting point)?

    • @SuperManitu1
      @SuperManitu1 1 year ago

      Because the arguments are evaluated before entering the function just like in a strict language

    • @tweag
      @tweag 1 year ago +6

      Yes, I probably should have made this more explicit: the correct answer is that _either_ exception may be raised. Any answer that says one of the two exceptions will be raised is incorrect, and we can observe that in GHC by showing that changing optimization settings alters which exception is raised.
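
      A sketch of how one might observe that in practice, using the reconstruction of addMaybe from the top of the thread (exact behaviour depends on the GHC version and flags):

      ```haskell
      module Main where

      {-# NOINLINE addMaybe #-}
      addMaybe :: Int -> Maybe Int -> Int
      addMaybe x (Just y) = x + y
      addMaybe _ Nothing  = error "missing argument"

      main :: IO ()
      main = print (addMaybe (error "bang") Nothing)
      -- Compiled with -O0 one would typically see the Nothing-branch exception;
      -- with -O2, demand analysis can make the first argument strict, and "bang"
      -- is raised instead. Both outcomes are correct under the non-strict
      -- semantics, which is the point of the reply above.
      ```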

  • @Archibalgul
    @Archibalgul 1 year ago

    What IDE/theme are you using for this series? It looks quite nice for Haskell presentations. Probably Visual Studio, but which theme exactly?

    • @tweag
      @tweag 1 year ago +1

      The editor is Pulsar (née Atom), the font is Fira Code, and the color scheme is Base16 Tomorrow Dark.

  • @lukasschneiderbauer6090
    @lukasschneiderbauer6090 1 year ago

    In the foldl example, you are making a point that GHC has to act like it does because it needs to "preserve bottomness", and if it doesn't know the demand structure of f it needs to allocate the whole unwanted chain of thunks.
    Does that mean that if I actually compile that function together with a known definition of f, and the compiler decides to inline f, it actually could decide to not do it that way because it knows the demand structure of f?
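
    For reference, a sketch of the classic foldl definition the question is about (GHC's actual definition differs in detail):

    ```haskell
    myFoldl :: (b -> a -> b) -> b -> [a] -> b
    myFoldl _ acc []     = acc
    myFoldl f acc (x:xs) = myFoldl f (f acc x) xs

    -- myFoldl (+) 0 [1,2,3] builds the nested thunk ((0 + 1) + 2) + 3 before any
    -- addition happens: without knowing the demand structure of f, the
    -- accumulator cannot safely be forced at each step.
    ```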

    • @tweag
      @tweag 1 year ago +4

      It doesn’t make sense to consider whether the compiler might “inline f” because inlining happens where a definition is _used_ , and the use site of f is inside foldl, which must work for a variety of functions. It would make more sense to talk about inlining foldl, but foldl is recursive, so inlining isn’t helpful, and GHC won’t do it: there’s still a use of foldl present after inlining, so it doesn’t really buy you anything.
      I think what you probably mean to ask is whether GHC might somehow _specialize_ foldl to a particular choice of f. Currently, GHC only specializes functions on two classes of arguments: typeclass dictionaries and literal data constructors. Arbitrary functions are neither of those, so GHC will never specialize foldl in that way. It’s not obvious how GHC could predict when specialization like that is likely to be a win, so I don’t expect that to change anytime soon.
      That said, specializing on _demand signatures_ could be a potentially fruitful avenue of exploration. GHC doesn’t currently do that, but maybe it could in the future!

    • @ni-fr
      @ni-fr 1 year ago +1

      If you apply the Static Argument Transformation to foldl, then it can be inlined, which would enable the optimization that I think the original commenter was considering.
      (The static argument transformation would increase allocation when foldl isn't inlined, so it's not necessarily a win.)
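
      A sketch of the static argument transformation applied to that definition (my illustration, not GHC's actual code): the static argument f is captured by a local loop, leaving a small non-recursive wrapper that can be inlined where f is known.

      ```haskell
      foldlSAT :: (b -> a -> b) -> b -> [a] -> b
      foldlSAT f = go
        where
          -- f never changes between recursive calls, so it is captured once;
          -- inlining the wrapper at a call site then exposes the concrete f
          -- to demand analysis.
          go acc []     = acc
          go acc (x:xs) = go (f acc x) xs
      ```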

    • @ni-fr
      @ni-fr 1 year ago +1

      Also, I think there remains a future work ticket for SpecConstr to work on function arguments, not just data.
      Both SAT and SpecConstr seem relevant to the commenter's question. But also a couple tangents away from the video :)

    • @tweag
      @tweag 1 year ago +1

      @@ni-fr Yes, good point; I’d forgotten about the static argument transformation! The fact that I’d forgotten makes some sense, though, as GHC currently does not enable SAT by default at any optimization level; it must be explicitly enabled via -fstatic-argument-transformation. If anything, the fact that it is currently disabled by default emphasizes my point: it’s difficult to determine when it’s a win. But it would definitely be nice to get it working reliably.

  • @ApprendreSansNecessite
    @ApprendreSansNecessite 1 year ago

    I'm not a haskeller but this `addMaybe` case is a little harder for me to parse.
    1- I'm not sure how the function is inlined and further optimised. I imagine GHC substitutes the arguments, realises that `Nothing` is passed and that it bottoms, and calls it a day, removing all the rest, including `(error "bang")`, and not allocating that thunk. Is that what happens?
    2- in the case where the function is not inlined, I imagine that a thunk is created for the first argument if it's not statically known that it bottoms (maybe it raises an exception some of the time), but how far does GHC go dissecting these expressions before giving up and allocating a thunk? Because maybe an expression does systematically raise an exception with a given argument but it's buried very deep. Does the compiler give up at some point?
    In my day-to-day programming it doesn't matter which exception is raised, but in a context where performance is critical and you have different potential causes of failure, you want to be able to predict what is going to happen.

    • @tweag
      @tweag 1 year ago +1

      GHC propagates demand information bottom-up. The optimizer annotates functions with demand signatures that keep track of which arguments are demanded and how. I made a previous video about this if you’d like to know more: ruclips.net/video/XiTO1EGKrhE/видео.html
      What this means is that there is no such thing as a demand being “buried too deep” for GHC to detect it. GHC does not analyze demands top-down, so it does not need to exhaustively explore the entire call graph to determine the demands on an expression. However, of course, demand information is always an approximation: GHC cannot precisely detect whether a value will be demanded at compile-time because that may require knowledge of which branches are taken. If there is any possibility that a value might not be demanded, GHC will conservatively assume it is not demanded.
      If you really want to do nitty-gritty optimization of your program, you can ask GHC to dump the demand signatures for every function in your program; the syntax for demand signatures is documented in the GHC User’s Guide. But most of the time you should not worry about it as long as you write good, simple, straight-line Haskell code and use strict datatypes. Those topics are the subject of the next video, so stay tuned.
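
      A sketch of how to ask for those dumps (flag names from the GHC User's Guide; the signature syntax itself varies between GHC versions):

      ```haskell
      -- Compile with:  ghc -O2 -ddump-str-signatures Demo.hs
      -- (or -ddump-stranal for the full demand-analysis output)
      module Demo where

      sumTo :: Int -> Int -> Int
      sumTo acc n
        | n <= 0    = acc
        | otherwise = sumTo (acc + n) (n - 1)
      -- The dump includes a demand signature for sumTo recording that both
      -- arguments are demanded (used strictly), which is what licenses GHC to
      -- evaluate them eagerly in the generated code.
      ```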

    • @ApprendreSansNecessite
      @ApprendreSansNecessite 1 year ago

      Amazing stuff! Laziness is well worth the effort (...wait)

    • @lukasschneiderbauer6090
      @lukasschneiderbauer6090 1 year ago

      What does it mean for a datatype to be strict/lazy?

    • @ApprendreSansNecessite
      @ApprendreSansNecessite 1 year ago

      @@lukasschneiderbauer6090 Take this with a pinch of salt, but I understand it means basically having a bang on every field. If you look up the video mentioned in the comment, at some point near the end it is shown that when a primitive field is marked as strict, it is not boxed, which removes the step of unboxing it. More generally, I guess having strict fields simply ensures you don't have growing unevaluated thunks in your data structures, which would remove the need to litter your code with strictness annotations.
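
      A small sketch of what that looks like (my example): with bangs on the fields (or the StrictData extension), constructing the value forces the fields, so thunks cannot pile up inside the structure.

      ```haskell
      {-# LANGUAGE StrictData #-}  -- makes every field in this module strict by default

      -- The per-field spelling, without the extension, is a bang on each field:
      data Point = Point !Double !Double

      origin :: Point
      origin = Point 0 0
      ```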

    • @gungunpanda8135
      @gungunpanda8135 1 year ago

      @@ApprendreSansNecessite So are strict datatypes always preferred when writing production code in Haskell?

  • @alurma
    @alurma 1 year ago

    Fantastic!!

  • @NerdFuture
    @NerdFuture 1 year ago

    I thought "eager"meant evaluating things *sooner* than one might think.

  • @m4c13k86
    @m4c13k86 1 year ago +3

    I get that this optimisation doesn't ultimately change the final return value, but it's super unintuitive. The GHC analysis of addMaybe makes sense if you have a straight-up exception as x, but what if you don't? It could potentially be an expensive computation.
    Thinking that Haskell is lazy, I (the caller of addMaybe) would expect x not to be evaluated if it is not *really* needed.
    Thinking that Haskell is strict, I (the caller of addMaybe) would create a thunk myself or put off calculating x until necessary.
    What actually happens is a mix of the worst of the two worlds. Haskell is said to be lazy, but as shown it sometimes isn't. How can I, as a caller of addMaybe, know which way Haskell is going to treat the arguments this time? Hell, even the author of addMaybe might not know: if I change the Nothing case in addMaybe to just return 0, then it behaves like you would expect from a lazy language. People are right to be mystified by this behaviour; it is not a good situation.

    • @tweag
      @tweag 1 year ago +6

      I can assure you that you do not want the compiler to introduce thunks everywhere in the optimistic hope that they might not be evaluated. Thunking really does have a serious overhead. People dislike thunks for a reason. As stated many times, exceptions are supposed to be raised when there is a bug in your program, so it is okay for exception-raising code paths to be slow paths. We want to make the intended operation of our programs the fast paths, and that is what GHC attempts to do.
      In the next video, I will discuss some more about some important aspects of laziness in Haskell that I have not yet covered, and they allow you to ensure something is truly lazy if you really want that. But I completely disagree that this is “the worst of both worlds,” and I think many people would agree.

    • @flowname
      @flowname 1 year ago

      @@tweag > it is okay for exception-raising code paths to be slow paths. We want to make the intended operation of our programs the fast paths, and that is what GHC attempts to do.
      In that particular case with addMaybe, couldn't the compiler see that the right exception is the cheapest one? After all, the compiler should be able to see that the expression on the right is statically defined and there are no runtime unknowns, so why would it prefer evaluating the left argument instead? I guess what I'm trying to understand is the exact meaning of the sentence "it's okay to do something": is it okay, but with freedom to choose and decide differently in an alternative Haskell compiler, or is it okay and therefore the right choice is to always evaluate, because either a strictness analysis algorithm dictates the right choice or there are other GHC-specific reasons (let's pretend there's an alternative compiler)?

    • @tweag
      @tweag 1 year ago

      ​@@flowname Your use of the phrase “the right exception” does not make sense. I assume you do not mean “right” as in “correct”, as you contrast it with “left”, but of the two arguments passed to `addMaybe` in this program, only one of them is a bottom. The other exception is raised inside the definition of `addMaybe`. In this particular program, raising the "bang" exception is definitely the cheaper option, as it does not require allocating a thunk to suspend the evaluation of the first argument to call `addMaybe`!
      Of course, if GHC were to inline `addMaybe`, the choice between the two exceptions would be truly arbitrary, as it could see manifestly that both are demanded. Indeed, that is precisely why the NOINLINE annotation is needed to cajole GHC into raising the "bang" exception! But as I mentioned in the video, in real programs GHC will not just inline everything (that would be a very poor compilation strategy), so the way it behaves when the function call remains a function call is representative of how it behaves on real programs.

  • @canelonism
    @canelonism 1 year ago

    SIIIIIIIIIIIIIIII

  • @gloverelaxis
    @gloverelaxis 1 year ago

    This series is soooooo good! I wish you'd explained the etymology (or at least a mnemonic) for "bottoms" and the verb "to bottom", because that is a horrendously non-descriptive name (as is just about everything in CS, unfortunately). I eventually found Wikipedia's "Bottom type" entry: en.wikipedia.org/wiki/Bottom_type

    • @boblefest
      @boblefest 4 months ago

      If you put all your types in a hierarchy, with a type "branching" out into its supertypes, then if you picture branches growing upwards, "Bottom" would be the type at the bottom, branching into all other types (i.e. being a subtype of all other types).
      In languages with class inheritance such a tree would capture the subclassing relations between classes. Haskell doesn't have subtyping to the same extent, but I'd say that `Bottom` is a subtype of everything here (which is why you can say that the expression `error "poof"` is an `Integer`, and it is also a `Bool`, etc, since `Bottom` is a subtype of every type).
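
      A tiny illustration of that last point (my example): the same bottom expression can be given any type at all.

      ```haskell
      poofInteger :: Integer
      poofInteger = error "poof"

      poofBool :: Bool
      poofBool = error "poof"

      poofList :: [String]
      poofList = error "poof"
      ```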