"I'm not gonna use this..." you say that, then six months later we will see a "Learning the Bend Language" video.
How can we take the language seriously when the promotional videos use light mode??!!
@@dylancode light mode is better for showcase
@@mrs8683 Maybe in some cases... Depends what you prefer I guess!
@@dylancode Light mode is fine actually -Light mode enjoyer
Probably some AoC is my guess
This thing is built on top of another of the guy's projects called HVM, which was a massively parallel runtime originally envisioned as a compilation target for Haskell. But it's a whole platform, you can write any language on top of it. It's pretty cool.
I love that guy: he managed to implement this, design it so it can be incorporated into custom projects, and promote it (at least enough to be heard by Prime), all at the same time.
HASKELL MENTIONED
When somebody mentioned Bend to me, I got immediately reminded of the original HVM - was glad to see it's a continuation of that idea in HVM2. Amazing work, wish I was intelligent enough to understand how it works.
There is a massive market out there for parallel computing, and the current libraries or "toolkits" for using it are a joke. I hope there will be support for AMD GPUs.
If I can run Haskell on this I'd love to do that; Bend's lack of types really hurts understanding for this kind of code tbh. Or heck, just give me ADTs in Bend, that's all I need.
Mutex: To freely quote Kevlin Henney here: "They should really have called them Bottleneck instead. No one would be like - Oh, I think we just need to add two bottlenecks here in those classes and we are fine" :D.
Sure, but there are many cases where not using them is not an option for correctness.
My late wife could have used this in her daily work. She was obsessed with h bar (Planck's constant over 2π, or Dirac's constant) because she used it all the time in her code. When she died, a military officer came to my door and asked me very politely to let them know if she had any notes or material from her work in DC so they could come and collect it. If you're not doing that kind of work, it is unlikely you'll be using Bend.
This language is so advanced that we don't even have a problem for it to solve (YET)
But you can imagine
You're not supposed to create the right tool for the problem, you're supposed to create the problem for the right tool
I think the language could be ideal for creating a solver for linear programming, where you need a lot of parallel calculations. This could be a huge business opportunity.
The problem exist, it's called video games
hobbyist wanting to get into GPU programming without needing to get into cuda
Interaction combinators are super simple. The basic rules are just:
"If two of *the same* combinator meet, connect up their input wires"
"it two *different* combinators meet, copy them"
and somehow that suffices to
- be Turing complete
- calculate things as efficiently as possible
And furthermore, these compute locations can happen completely independently from each other, i.e. in parallel.
There ARE limitations on this. Namely, you have to structure your code in a form that properly takes advantage of this. However, I *think* that's where Bend comes in.
HVM2 is basically typed interaction combinators. Bend probably is a layer that attempts to minimize your having to think about the specific function form. And the design with folds and recursion rather than loops *also* is all about increasing the chance that you write in a style the interaction combinator parallelism can actually take advantage of.
I recommend at least trying to skim the mentioned paper. You don't have to understand every last bit of the theory in order to understand at least roughly how these interaction combinator graphs work. It's really clever stuff, and pretty long-known too. What's new here is this effort to make it run seamlessly on a GPU.
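To make those two rules a bit more concrete, here's a deliberately tiny Haskell sketch of just the rule dispatch (my own toy model, not HVM's actual representation; the ports and the rewiring, which are the interesting part, are exactly what the paper's diagrams show and are left out here):

```haskell
-- Toy sketch only: which rule fires when two combinator nodes meet.
data Agent = Era | Con | Dup   -- eraser, constructor, duplicator
  deriving (Eq, Show)

data Rule
  = Annihilate   -- same kind: their remaining wires get connected pairwise
  | Commute      -- different kinds: each node is copied past the other
  deriving Show

step :: Agent -> Agent -> Rule
step a b
  | a == b    = Annihilate
  | otherwise = Commute
```

The whole runtime is then roughly "find any connected pair, apply the matching rule, repeat", and because each rewrite only touches its own two nodes, many rewrites can safely run at once.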
There is no evidence for the second claim. In fact, the author clearly distorted the meaning of optimal evaluation in the context of lambda calculus. Take a look at what Lévy means by optimal evaluation in his 1978 paper, then read what Asperti says in 1998 about the inefficiency of optimal evaluation.
Super simple he says... proceeds to use words no one with an IQ under 9000 understands.
Specifically:
What is a combinator?
Why can they interact?
What interactions are we talking about?
What does it mean to copy one?
What does it mean "to connect up their input wires"?
What is a compute location?
What is a function form?
It's just pure functional programming to be honest
@@Hector-bj3ls all of those things are best explained through pictures I'm afraid. The diagrams in the paper are pretty good though.
Here are my best tries:
You can think of a combinator as almost like an assembly instruction for computations.
I don't think "why" they interact is a good question: They do so "by definition". The questions is really "how". There are different kinds of combinators that interact in various ways. These interaction combinators are among the simplest though.
I don't actually know why they are called interaction combinators rather than anything else, but I'd say it's because they interact with each other? Whenever two combinators encounter each other, they annihilate or "pass through each other" in a copy step. - What that means is very easy to show in a diagram and very hard to put into words alone.
However, you can see that diagram in this video. It's the one where a pair of combinators turns into four.
The triangles are the combinators. They have one input and two outputs each, given by the lines.
A compute location is whenever two combinators are directly connected along the input.
The term "function form" isn't anything rigoros. That was my handwavy term.
Basically, a seemingly magical property of these combinators is, that you can compute a lot of things in a *sublinear* number of steps. Sometimes even in O(1)
And the programs you write for that often look quite naive.
A toy example is to apply "not" to a bit like quadrillions of times. - Obviously, it's quite easy to say whether the outcome is the same as or the opposite of what you started with, simply by looking at your enormous count in the quadrillions and checking its least significant bit (i.e. whether it's even or odd).
However, *naively* you might just go "oh apply not and then not and then not ..." across those quadrillions of steps. It'd take way too long to do that sequentially, even on modern GPUs.
But by the magic of Interaction Combinators, defined in certain precise ways, you can take even this really stupid naive program and it'll give you the result in essentially as much time as the parity check would've done.
However, whether this works depends on how you define your "not" function.
IIRC it was like this:
If you have your classic church-encoded bools:
True = \t f -> t
False = \t f -> f
basically just two selector functions: the one that returns its first argument is considered True, and the one that returns its second argument is considered False
Then there are two seemingly equivalent versions of Not:
not_a = \b -> b False True
not_b = \b t f -> b f t
The first definition explicitly assigns True to False and False to True.
The second one simply says "swap what you get".
Using the first definition, you will *not* achieve any speedup with repeated application. It will stay linear.
The second definition, however, will allow interaction combinators to discover *sub expressions* that are identical.
So when you get something like
not not not not not not not not True
the first definition will just sequentially do:
not = not_a
not ( not ( not ( not ( not ( not ( not ( not True )))))))
which is linear.
But the second will end up doing something like
not = not_b
(((not . not) . (not . not)) . ((not . not) . (not . not))) True
which can be simplified to:
a = (not . not)
b = (a . a)
x = (b . b) True
effectively using only three steps.
By construction, it is guaranteed that your combinators will *always* find this optimal form with the fewest number of steps, so long as your function is defined in a way that can be turned into a tree like that.
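For anyone who wants to poke at the two definitions, here's a self-contained Haskell sketch of them (using a newtype with a rank-2 type so the Church booleans typecheck cleanly). Note that plain GHC evaluates both variants in linear time; the sub-expression sharing described above is what HVM's interaction-combinator reducer adds on top:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church booleans: a boolean is "pick one of two things".
newtype CBool = CBool { runCBool :: forall a. a -> a -> a }

true, false :: CBool
true  = CBool (\t _ -> t)
false = CBool (\_ f -> f)

-- not_a: inspect the boolean and rebuild the answer from constants.
notA :: CBool -> CBool
notA b = runCBool b false true

-- not_b: just swap the two continuations without inspecting anything.
notB :: CBool -> CBool
notB b = CBool (\t f -> runCBool b f t)

toBool :: CBool -> Bool
toBool b = runCBool b True False

main :: IO ()
main = do
  -- An even number of nots gives back True either way; the difference
  -- is only in how a sharing-aware reducer can group the repeated work.
  print (toBool (iterate notA true !! 8))  -- True
  print (toBool (iterate notB true !! 8))  -- True
```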
The algorithm it finds is equivalent to Exponentiation by Squaring where you go
a = x * x
b = a * a
x^8 = b * b
It'll do that without you having to cleverly figure out the efficient version yourself, rather than falling back on an expensive pow(x, 8).
So long as your function has a reasonably tree-like form (or whatever), it will find the optimal way. And moreover, every single compute step is completely local, so if you ever have multiple places that can be reduced at the same time, those places can be safely evaluated in parallel without fear of race conditions or anything like that.
I think that's also the reason why Bend doesn't directly offer loops and wants you to use recursion instead. It makes it so you are kinda forced into a style that's more likely to take advantage of this kind of speedup. Perhaps with time some syntactic sugar will make that more straightforward.
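For reference, the general form of that squaring pattern written out by hand in Haskell (a standard textbook algorithm, shown only to make the O(log n) structure explicit):

```haskell
-- Exponentiation by squaring: O(log n) multiplications instead of n - 1.
powSq :: Num a => a -> Int -> a
powSq _ 0 = 1
powSq x n
  | even n    = let h = powSq x (n `div` 2) in h * h
  | otherwise = x * powSq x (n - 1)

-- powSq 2 8 == 256, using 4 multiplications rather than 7.
```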
12:59 I usually use `.ok()` to ignore an error instead of assigning it to underscore. This will give you a warning if the type changes from Result to impl Future, etc.
13:45 Clippy warns you about these things. The specific lint rule is called "let_underscore_future". Clippy should be mandatory when writing Rust imo, it has so many helpful warnings.
Lowkey, Bend isn't even that guy's most impressive work (it's HVM).
Real
HVM is actually written really poorly ngl, look at the Rust interpreter source
@@dowiee2694 The better interpreter is written in C and CUDA
Adult education in mathematics ... Fast Python ... Loops that don't suck ass ... Check Check Check. I am delighted, as I'm going to use it for weather and climate analysis... Thank you for the video and your point of view; when I came across this language I thought it was too good to be true. Awesome.
@@bbourbaki Grift ?
You can't use math to make yourself a seer. To predict the weather requires personality.
@oldtools6089 then i said your call is important to us 🤣🤣🤣 yes math is actually a lie because airline companies are a conspiracy as they use flying carpets and magic because we all live on a flat earth... 😉 ... lol 😆 🤣 😂
@@oldtoolslol, no
The creator is a friend of mine. He's called Victor Taelin, he's from Brazil, and he's a very skilled language creator: he's made more than 5 programming languages already and he's a workaholic as sh*t.
As someone who is the target audience of this language, that call out at the end felt super weird, lol
As a Brazilian aware of how ideology works, that also felt weird.
Out of curiosity, how are you the target audience? what do you do?
@@eNtrozx Distributed linear algebra across a variety of devices, with the intent of deploying various MLP and Transformer blocks in edge and consumer-facing applications.
This language is really interesting because it's a lot more ergonomic than writing GPU kernels. Even if it was, say, 10% slower, if it saves you 90% of the dev time it's worth it because CUDA engineers and ML engineers are pretty expensive right now.
Weird in which sense?
Why weird? I thought it was A+ v funny
Coolest thing I’ve tried yet. Can’t wait to see where it goes
I am a prophet: it'll go absolutely nowhere!
Literally in December “Advent of Code in bend”
Perhaps the most impressive part about this all, and something that's not mentioned in the video, is that the HVM compiler can in fact compile from other languages (given that Turing-complete languages are equivalent to the lambda calculus, and lambda calculus terms can then be converted to interaction combinators).
So you can write JavaScript or Python or C or whatnot, compile it with the HVM, and boom, you haven't changed a single line of code and your program runs on gazillion threads and fucking fast.
Another Brazilian language 🇧🇷
BR mentioned, carai
Let's go
Brasil sil sil sil
Go, Taelin!
BRAZIL MENTIONED
"Mutalisks" sounds like a scary flying creature from star craft 😁
That's just hydralisks but they also mutated from all the radiation and stuff (like ninja turtles, y'know), so now they are mutalisks
It's really cool but I noticed that while their relative benchmarks compare against CPU they also mention that their "single core performance is still extremely sub-par". So that 57x speedup may not really be accurate.
I do have an advanced education in mathematics, but I don't really understand how this language works. I'll have to spend time reading through their paper. My initial expectation was that the paper would describe the language in terms of category theory/type theory (especially after seeing the string diagram looking diagrams) but that's not the case.
There are a lot of new domain specific languages (DSLs) being created and studied in research labs all over the world these days (for many applications including concurrency, database updating, parallelism, message passing, etc.). Many of them offer promising specialized features, but most are ultimately just research projects not intended for widespread use. The hope is that in the future they will lead to good and mature languages for many things. I don't know if that's the case with Bend, but it would be good to get an interview with the developers to learn more about it, as there is very little information on their website (e.g., who is behind it, is there a future roadmap, is there funding, etc.).
Also on a sidenote, I don't think the transpiler/compiler distinction is really that meaningful these days. For instance, the Java compiler is really just a transpiler down to Java bytecode, which is then interpreted by the JVM (unless you're running on ARM in which case the JVM is implemented as a stack machine in microcode and in that case the Java compiler really is a compiler).
oh their parallel performance is also still extremely sub-par (compared e.g. to Haskell; I haven't seen a comparison with Python single-core or multiprocess perf yet)
Well, the parallel performance is just a multiple of the single-core performance, so that statement still makes sense. If they improve their single-core performance, all of the metrics will improve, but (ideally) would stay in the same ratio.
Bend compiles Lambda Calculus to Interaction Combinators (see also Interaction Nets), and HVM is a parallel runtime for these.
Guy Steele spent several years in Sun's labs designing precisely this (it was called Fortress then, suggesting a "secure Fortran"). Eventually he got to some problems they couldn't really solve and the whole project was scrapped.
@@VivekYadav-ds8oz Their single-core is in Rust, multi-core in C and massive-core in CUDA, they will have to improve each thing separately.
Those interaction combinators look so cool and mind bending
Oh no, Fireship missed that Python threads aren't parallel... The Global Interpreter Lock means only one thread ever runs at once in Python, so Python threads only help with IO-bound stuff.
21:46 ah yes, the first phase, denial
Finally, something to run my teetering tkinter monolith at faster than 5 seconds per frame
Bend is basically haskell's design and approach to parallelism.
Interesting.
This will be super-useful to me. I was programming in Python for machine learning, and a lot of the time I had to write C++ code and a wrapper, but parallelizing is fucking hard. I only did this when I needed to do some sort of data manipulation that numpy/pandas/matplotlib didn't allow (they cover most data analysis needs), but when you want to modify the sub-arrays inside an image they sometimes fall short. I would need to fall back to Python for the manipulation, and no kidding, Python is AT LEAST 3 orders of magnitude slower than whatever non-parallelized crap I write in C++ to do the same job, and any non-parallelized crap I write in C++ is already considered too slow to be useful on consumer hardware.
Write in OpenCL then. C++ is not suitable.
@@vitalyl1327 The problem was the data transfer: I don't know OpenCL very well, so when I tried that, swapping the data between the OpenGL part of the code and the numpy part was slow. That's the reason I used C++ together with PyCUDA. I believe they refactored my old code to CUDA Python/CuPy; some magician did the manipulation in CuPy directly, if I am not mistaken. I'll probably have to read a lot of code to understand it when I get back to the company in October. Thanks for the suggestion. Also, if you know how to make that transfer faster, or some documentation showing it, that would be very helpful. OpenGL would allow us to use a couple of AMD GPUs we have sitting there idle, even if the performance isn't as good.
Brazilians are really proud of this, Prime.
They missed a great opportunity here. The main process exit command should have been over();
There are programming models that detect dependencies at a "high level": you give a sequence of instructions and it becomes a parallel program (OpenMP kinda gets you there, task-based runtimes, ...).
I’m definitely gonna try using this. I work in data science at a university and I think this could definitely speed up my terribly implemented parallel python scripts for linear algebra and entropy analysis.
Prime: I don't know that lambda calculus terminology.
Also Prime: I believe it's called a catamorphic operation.
😂
Don't call it "massively parallel" unless it is at least multi-node, and tens of thousands of cores (with the potential to go to millions). Like MPI, for example. This is just multi-threading.
Multi-threading is straightforward when you stay out of the "shared mutable state" quadrant.
Another feeling I just got: it may be a magic box analogous to Kolmogorov-Arnold Networks, which apply an analytical approach to perceptron activation functions to reduce multiple simple ones into a single, mathematically more complex one. As long as you can keep within the consistent boundaries of mathematical rigor, analytical calculus can work magic for you, like the Fourier transform, convolutions or Taylor series. Or the Karnaugh maps.
What is analytical calculus in this context?
@@patrickl5290 Whenever the term "calculus" comes up, it mostly gets associated with integrals and derivatives, whereas in fact it has a much broader meaning, as in lambda calculus for example. What I had in mind here, though, was functional analysis as opposed to numerical methods or matrix algebra. To give a more practical example: if there is a known integral of a function, or if a compound or convolution of functions can be analytically transformed into a simpler form over the whole number domain, you win without any doubt. The real-life problem is that reality very rarely conforms to ideals, and the analytical approach has to give way to numerical approximations, which are essentially a brute-force approach. Replacing billions of params with hundreds or thousands that describe more complex analytical building blocks - like B-splines or trig functions (for the potential Fourier transform approach proposed in the comments here) - would be a great reduction in numerical load.
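A concrete textbook instance of the kind of analytical shortcut being described here (nothing Bend-specific): a convolution becomes a pointwise product under the Fourier transform, so a direct O(n^2) computation collapses into two transforms plus an O(n) multiplication.

```latex
% Convolution theorem: convolution in the time domain is pointwise
% multiplication in the frequency domain.
\[
  (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
  \qquad\Longrightarrow\qquad
  \mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
\]
```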
Fold is more than just reduce. It's "universal" in that you can have full Turing completeness without imperative for loops using folds. They map well to modern hardware. This is a promising idea that's complex to execute. I hope he succeeds though. This isn't something that should require Haskell or other exotic tooling in order to use.
All loops can be implemented using just recursion. The lambda calculus is Turing complete. You don't need loops at all. I have no idea what you mean when you say it's more universal.
@@tophy9865 I didn't mention recursion. I was talking about folds. Folds are a universal construct. So are certain combinators. Typically, in toy languages used to study programming language theory, you may have a language with branches and loops, but no recursion. There are other ways to achieve similar results.
@@MaxHaydenChiz I'm still not sure what you mean when you say it's universal. My point is that omitting loops is quite normal for Turing complete languages. All you need is something that can do the same thing. Fold (which can be implemented as a general recursive function) is one way to do this.
@@tophy9865 We are talking past each other. Prime said that fold was "just reduce". Folds are not "just reduce". They are more general. Any loop can be replaced by a fold. (This is more or less how Scala implements loops under the hood IIRC.) Not all loops can be replaced by a reduce.
Yes. We could *also* replace all loops with recursion. But that is not relevant to my statement that you can have an imperative language that only understands folds but not loops and not recursion and still be Turing complete.
We could also use the combinators in Haskell's G-machine for that matter. Or any of a large number of other primitives.
So I don't understand what point you are trying to make. But hopefully I've made mine clear.
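To illustrate the claim that any loop can be rewritten as a fold, here's a generic Haskell sketch (not Bend code); the fold's accumulator plays the role of the loop's mutable state:

```haskell
import Data.List (foldl')

-- Imperative pseudocode:
--   total = 0
--   for i in 1..n: total = total + i * i
-- The same loop as a left fold: the accumulator carries the loop state.
sumSquares :: Int -> Int
sumSquares n = foldl' (\acc i -> acc + i * i) 0 [1 .. n]

-- sumSquares 10 == 385
```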
@@MaxHaydenChiz Can you explain how it's more general? Do you just mean that it works on any stream?
1 problem: If you're someone that programs in Bend, are you then known as a Bender?
Programming languages are becoming as common as JavaScript frameworks
inb4 bend compiles to js
Ah, the `bend` keyword seems to be a sort of generator for recursive datatypes? The HOF `fold` (left and right) consumes recursive data types, and `bend` produces them. The example shown in the video at 20:36 creates a binary tree with depth 3, for example. Really cool!
If only these languages were not Python-like lol. I hate the syntax, as I much prefer curly braces and the flexibility that comes with that over whitespace-scoping in Python-like languages. Also, TYPES! I sincerely hope there is a way to enforce strict typing in this language.
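A rough Haskell analogue of that reading (a sketch only, not actual Bend code): `bend` behaves like an unfold that grows a recursive structure from a seed, and `fold` consumes it again, e.g. for a depth-3 binary tree like the one in the video:

```haskell
-- A binary tree like the one built with `bend` in the video.
data Tree = Leaf Int | Node Tree Tree
  deriving Show

-- "bend": grow the structure from a seed until a stopping condition.
buildTree :: Int -> Int -> Tree
buildTree depth x
  | depth == 0 = Leaf x
  | otherwise  = Node (buildTree (depth - 1) (2 * x))
                      (buildTree (depth - 1) (2 * x + 1))

-- "fold": consume the structure back down to a single value.
sumTree :: Tree -> Int
sumTree (Leaf x)   = x
sumTree (Node l r) = sumTree l + sumTree r

-- sumTree (buildTree 3 1) == 92  (the eight leaves hold 8..15)
```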
There's also a haskell/ml-like syntax you can use
What do you use braces for? In the end you are indenting your code properly, right?
@@agustinpizarro Preference, mostly. I have used a ton of Python and GDScript, but I simply can't get used to indentation-based scoping. It's so hard to tell which block you are in when you have multiple levels of indentation, versus braces, where it's super easy to tell at a glance. While it's just plain bad to have too many levels of indentation, brace scopes are much easier to distinguish with lots of indentation, especially if you are going from an indented line to a non-indented one. Also, accidentally indenting something into a different scope than expected is much easier with indentation, and I've done it far too many times.
Also, if you really need it, you can have an arbitrary scope with braces, which you can't with indentation. Python will just error. And yes, I have used this before.
@@pranavbadrinathan6693 Unless you have syntax highlighting for your braces in your IDE and you are looking for the right color of brace (which doesn't always work), I still wonder how having extra characters added (braces) without any strict indentation rules helps you see that you are in the right scope.
@@agustinpizarro Distinguishing between 3-4 indentation levels on screen at once is easier with braces imo. I can see a closing brace and know the next line is the previous scope. I usually use 4 space indents, and find it harder to figure out scope changes as quickly.
About the Rust underscore issue @13:00 - set the type of the underscore to the return type of the function, and when it changes you get the error as intended. For example: `let _: Result<_, _> = run_fun();`. You can also tell Clippy to bug you if you forget to do so.
LOL, that async comic is like smoking. There were 99 non-smokers in a room. Then a smoker entered. Then there were 100 smokers.
I'm slowly reading through the original paper on Interaction Combinators and it's blowing my mind. If you can comprehend it, Bend represents the future of high-performance software. Moore's law has found a stopgap in the domain of groundbreaking theory.
??? It doesn’t fundamentally speed up anything though, it just seems to represent more things as trees (kinda? Not even though really?)
@@palmberry5576 Have you had a chance to check out the original paper? They've essentially created the parallel equivalent of a Turing machine.
It's not going to get better performance compared to handwritten parallel code. The promise is to make it easier to write fast-ish parallel code.
@@Takyodor2 I don't know, the ability to massively parallelise even basic arithmetic across 16k GPU threads is a massive challenge in any coding language.
@@sentinelav It's really not. If you know C++ and read about the memory model of GPUs for a few hours, you could easily parallelize problems with inherent parallelism in CUDA (or OpenCL). On the other hand, if the problem _doesn't_ trivially translate into massive parallelism, you will either A) not be able to utilize all those GPU threads and get better performance with fewer CPU threads, or B) perform a whole lot of extra work on the GPU, synchronizing and performing duplicate calculations, and get better performance with fewer CPU threads. Many threads does not automatically mean faster!
A similar (sounding, at least) language is Halide.
It's built on top of the dreaded C++, but it is really cool. It's a bit more "manual", but provides a straightforward way to test different layouts of the same data.
"Everyone virtually in this chat will never use language"
We'll see.
Do you use CUDA, OpenMP, or MPI? If not, I doubt you’ll need it. Seems like a language targeted at heavily computational fields
@@Hebruwu Yet, it's not.
@@Hebruwu I use CUDA regularly; I roll my own parallelism on the CPU side, with a little bit of TBB here and there. There's basically a 0% chance that something like this would be competitive with our custom solutions.
Most people in chat write JavaScript, lol. They don't need performance nor GPU programming 😅
Parallelism is only hard if you're worried about linearly scaling RAM and power; otherwise you could just make a hard copy of every process at the time of splitting and respawn on failure.
I would guess the downside is memory usage. I'd assume Bend needs to build the full execution graph (interaction net) in memory and then make at least partial copies as the net is reduced. A more traditional algorithm can loop over structures and modify them in place. However, not having to think so hard about parallelism can produce unexpected speedups in some algorithms.
So a more structured MapReduce.
It's so good because now you get the benefits of a type system along with ADT abstractions, plus the benefit of how much infrastructure is already in place for MapReduce-like environments.
Big if true
I'm absolutely going to use it! Not professionally probably, but just because it's neat. I'll probably write a couple applications then go back to JS
FP with thread optimizations has entered the room.
In all the outrage the one chatter goin "We STUPID?" somehow ... got me right in the chuckles ... seriously, when you are reading this it is still going on...
There is a part of my codebase that has a complexity of O(n^4), which to some degree has been optimised using dynamic programming (i.e., caching) down to O(n^3), but this calculation has to run many times a minute on a growing dataset, so I was considering moving it to CUDA... After a lot of research, I hit the C++ CUDA wall. I know that if I stepped into it, even for just this part of the codebase, it would be a minefield for the rest of the team, including myself if I don't look at it for a while. This sounds interesting to me. Thanks mate!
"Check out this new thing! it's easy! just python!" - proceeds to write a hieroglyphic script that makes python look sane.
Shaders are sick. I delved into them earlier this year; nothing else ever felt so complicated to get started with. In the end you get multiple stages, mainly one processing all the vertices in parallel and one computing all the pixel colors in parallel, and the notation is basically the same as running through a for loop across the x axis with one inside it running across the y axis (or vice versa, I forget which order they actually use - and I'm even more lost on how the vertices are all indexed).
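A minimal Haskell sketch of that "nested loop over x and y" picture (the per-pixel function is what the fragment stage corresponds to; on the GPU every call runs in parallel rather than in loop order):

```haskell
type Color = (Double, Double, Double)

-- One pure function per pixel, like a fragment shader.
shade :: Int -> Int -> Int -> Int -> Color
shade w h x y = ( fromIntegral x / fromIntegral w
                , fromIntegral y / fromIntegral h
                , 0.5 )

-- The CPU rendering of the same idea: a nested loop over y and x.
render :: Int -> Int -> [[Color]]
render w h = [ [ shade w h x y | x <- [0 .. w - 1] ] | y <- [0 .. h - 1] ]
```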
A transpiler is a compiler; it's a subcategory.
w take
Trans piler awareness month coming up
@@nevelis non-binary computation awareness month when?
I hate idiots who even use the term "transpiler". Dumb code monkeys invented it, it has nothing to do with any academic CS.
Thank you. Sometimes I feel like everyone just wants there to be compilers and runtimes. Get freaky. Believe in yourself.
This is a language where easy-to-parallelize things are trivial and hard-to-parallelize things are impossible.
Given the focus on tensors in new hardware, it would be really cool if it bent the code so that anything that could be written as tensor multiplication, is.
One point that's not mentioned at all is that Python doesn't run threads in parallel even if you write parallel code, because of the Global Interpreter Lock.
Multiprocessing?
Until you use 3.12... or numba
The start of this video is an extremely concise summary of learning any programming language
The point about parallelism is SO true.
I mean, it's true of so many interactive elements of code, or even libraries at times. Like, interacting with the raw output from a speaker in Rust: look at the example and it's the easiest thing in the world, but try to do something with it and you run into race conditions, you are moving data around in closures and need to keep track of them while they get multithreaded; you're probably going to end up multithreading both receiving the data and processing it, while needing it to stay in order, and so many other possible issues.
While we probably won't just jump headfirst into this, maybe it lays the groundwork for something amazing in the future. What I meant to say is: Bend me over 'til I fold.
I can't wait for the time where they are done optimizing and the thread creation overhead is the only culprit of performance
I want to learn it to call myself a code bender.
Seems like the peak Advent of Code language
I can see a use for it in trading strategy backtesting. I've been looking better solutions than jumping into the deep end with elixir.
22:48 The idea of the language is exactly the opposite of what you're saying here! Bend's whole reason for existing is to make parallelism simpler, so you _don't_ need the adult education in mathematics to use it.
He meant more in the sense of, if you're not a mathematician you likely don't have a usecase for this no matter how cool it sounds
It would be nice if this led to another form of optimization in compilers. Although it looks like the user is expected to prime the program by writing an operation in a specific form, maybe someone can map common patterns onto patterns that the HVM can then parallelize. The computation kernels might need a further pass that turns the instructions into SIMD-compatible operations so our CPUs can cook hooooot. Haha. But yes, I won't be using it directly. I struggle as it is to convince people to let me write a small utility in Rust.
It's a pure functional language so yes math phds are the only audience.
@@isodoubIet we're trying to make the language so you don't need a math phd to use it! for example, the `fold` operation could be defined mathematically as a catamorphism like primagen said in the video, but we're adding plenty of syntax sugar to make it less daunting
Great final take at the end. Cant wait to put my hands on it and benchmark it against CUDA and Triton!
Even a well known algorithm, name your shit better than single character things, jesus.
With Go I used multithreading for a data extract as a beginner, and honestly it was pretty easy. It was only extracting chunks in parallel because of some dependencies, but goroutines made it pretty simple. I didn't know how lucky I was to be using Go, I guess.
Glasgow Haskell Compiler mentioned! Bend is just functional programming.
if the package manager ends up slapping, this language will take over
The memory footprint would be interesting. More interesting is at which point there is a real benefit, i.e. how to write the code so it really benefits from parallelism. The small example (see formulas) would likely cause more overhead than benefit.
Been waiting for you to go over this.
Something to consider would be the substantial power draw increase for 5x faster by spawning 1000x more threads. An M3 doesn't peak anywhere near a 4090.
This is actually kind of special. Maybe most people don't target algorithms in everyday work, but for those who do, like game physics engineers, or others who need to work with evaluations/simulations etc.,
this makes their work incredibly quicker, MAYBE less reliable, depending on the actual output of its compiler. But I'm here for it; physics rendering in general, if this optimizes instead of just 'facilitating', could honestly change a lot quicker than we've ever seen.
Also, the reduction rules for combinator interactions feel strangely similar to the Conway’s game of life… 😉
Me brain too smooth for this.
immutable by default data and controlled side effects makes parallelization easier
How do we solve the theory of everything? -- Just write a TOE function.
And yeah, Victor Taelin programs in Haskell, C++, TypeScript, Idris, Coq and Agda.
I loved how wonderful the conclusion at the end of the video was 🤣
spent a day trying to write a matrix dot product in this. absolutely infuriating.
I'm gonna make a lib on it to measure weather conditions, it's gonna be called "Air Bending"
It's from the Ethereum dev who already solved all the world's blockchain problems in magical blockchain scam land ...
I'm really hoping there is an "over" keyword in that Bend lang… to make every bend, like, more powerful or sth 😉🙃
"Currently not working on Windows, please use WSL2 as a workaround." 😭
That applies to most new tech! Tends to be macOS and Linux only, from a dev perspective. E.g., blockchain SDKs. And often WSL2 doesn’t quite work and you need to go full Linux VM.
I immediately picked up this language, I have a DNA alignment algorithm I was going to rewrite with threading, but bend could actually be better!
I applied for a Data Science position I may use this
This guy's a fucking genius, he's the one behind HVM
21:44 emotional damage
I can't wait for 100 hours of using bend.
Doesn't this mean that nontraditional accelerator cards are viable now? Like cards from Tenstorrent, which are specifically for AI. The main disadvantage was that they didn't have CUDA, but if you run Bend you would actually take full advantage of all the parallel cores, so now you get better-than-4090 performance for 1/4 the price???
Modern Python is written with type hinting, so this language will most likely end up adding types if it becomes popular.
My counterpoint to your conclusion that we won't use it is that someone will probably build a game engine on it with a cute little visual editor for the noobs to play with. Sounds like a cool idea, anyways. They're already touting its shader capabilities, so that's a solid start to attracting the engine development community.
0:54 single letter variable names should never be used and showing off a demo with them is wild.
I’ll have to read the paper and hope for some >1% grok ratio… 🤯😉 But on the surface this HVM magic feels a bit like BEAM magic - which would be a very promising prognostic 👍🏻
1:25 Microplastics Found in Human Testicles
In C#, the linter will warn you when you have an ignored promise. Why doesn't Rust do that?
It will also eat up all your memory like it is Chrome if you fuck up. FUN!
See the actor-based Pony language. Taking a long time to get to 1.0 though.
Using special languages with hyper-parallelisation without a special need is like premature ej^H^Hoptimisation.
It's always dumb when you can't read the code because types magically don't exist.
Python with one-letter variable names and no type hints drives me bonkers
Well, bad news for you, Prime. Based on statistics, there are multiple JS frameworks being built on top of this, and the React team is also planning to use it in React 20 😂