Thank you for making this video. As a Forth beginner, it's always a pleasure to come across quality videos on Forth.
Seeing your examples opened my eyes to the potential of Forth!
Nicely done, Hans. It shows how radically different Forth is from the conventional programming paradigm.
I think your next video will have to do some "fast-talking" to sell Forth to the masses.
In a way, the restrictions that Forth imposes are a good thing, since they force you to write simple words. I've been programming in Forth on and off for nearly three years now, and one of the things I've learned is to throw it all away and start again once I understand what the problem is, whether that's the first, second or even third time! The final words that emerge are very succinct descriptions of the problem. When you first look at a word, you may not understand what is inside it without tracing through the stack, but its name tells you what it does in the context you are looking at, and that is all you need to know.
I use a deque rather than a stack, so the usual dup, drop, swap & over (the only ones I use) are mirrored at the bottom of the deque. The effect is similar to using the return stack, without the problems of return stack variables crossing loop, branching and word creation boundaries.
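Here is a minimal Python sketch of the deque idea. It assumes the right end of the deque is the top and the left end takes the role the return stack would otherwise play. The bottom-end names (`bdup`, `bdrop`, `bswap`, `bover`, `to_bottom`, `from_bottom`) are hypothetical, because the comment doesn't give the commenter's actual word names.

```python
from collections import deque


class DequeStack:
    """Data stack on a deque: the right end is the top, the left end the bottom."""

    def __init__(self):
        self.d = deque()

    # The usual words, acting on the top (right end)
    def push(self, x):
        self.d.append(x)

    def dup(self):
        self.d.append(self.d[-1])

    def drop(self):
        self.d.pop()

    def swap(self):
        self.d[-1], self.d[-2] = self.d[-2], self.d[-1]

    def over(self):
        self.d.append(self.d[-2])

    # Hypothetical mirror words, acting on the bottom (left end)
    def bdup(self):
        self.d.appendleft(self.d[0])

    def bdrop(self):
        self.d.popleft()

    def bswap(self):
        self.d[0], self.d[1] = self.d[1], self.d[0]

    def bover(self):
        self.d.appendleft(self.d[1])

    # Moving a value between the two ends, much like >R and R>
    def to_bottom(self):
        self.d.appendleft(self.d.pop())

    def from_bottom(self):
        self.d.append(self.d.popleft())


s = DequeStack()
for x in (1, 2, 3):
    s.push(x)
s.to_bottom()    # park 3 at the bottom, like >R
s.swap()         # the top words keep working normally
s.from_bottom()  # fetch it back, like R>
print(list(s.d))  # → [2, 1, 3]
```

Unlike a real return stack, the parked value never mixes with return addresses. That is the property the comment points to.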
I find Forth fun to program in, but I STILL wonder whether it is superior to C. It's easier to debug, but it takes me longer to create a program.
IMHO, in time you'll get faster with Forth. Yes, it's often fiendishly difficult to translate a C program to Forth, but IMHO that's because it was created with a different mindset.
Take *getopt()*, for example. I once tried to translate it to Forth, and it was sheer horror, with loads of nested IFs and stuff. And note I'm talking about the C version here (I discarded whatever was left of the Forth code a long time ago).
That wasn't going to work. So I sat back and thought about how *I* would tackle it if it were a fresh problem. I reduced it to a 50-line program: 6 words (the longest being 8 lines) and a single variable.
It does the same thing. I even find it much easier to use, since every option boils down to a (character, execution token) pair.
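The table-driven idea can be sketched in Python as follows, with a callable standing in for an execution token. This is not the 4tH code described above. `parse_options` and the option letters are hypothetical, and options that take an argument are left out to keep the sketch short.

```python
def parse_options(argv, table):
    """Dispatch single-character options through a table of (char, handler) pairs.

    table maps an option character to a zero-argument callable, standing in
    for an execution token. Returns the non-option arguments that remain.
    """
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":                      # explicit end of options
            i += 1
            break
        if arg == "-" or not arg.startswith("-"):
            break                            # first operand ends option parsing
        for ch in arg[1:]:                   # bundled options such as -vx
            handler = table.get(ch)
            if handler is None:
                raise ValueError(f"unknown option: -{ch}")
            handler()                        # "execute" the token
        i += 1
    return argv[i:]


# Example: two flags, each paired with the action it triggers
settings = {"verbose": False, "extract": False}
table = {
    "v": lambda: settings.update(verbose=True),
    "x": lambda: settings.update(extract=True),
}
rest = parse_options(["-vx", "--", "-file.txt"], table)
print(settings, rest)
# → {'verbose': True, 'extract': True} ['-file.txt']
```

Adding an option means adding one pair to the table. The parsing loop itself never grows another IF.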
I couldn't even imagine why it had seemed to be such a big deal in the first place.
And that's IMHO what Forth does: it changes the way you think about programming. I also noticed it's more fun, because you reach the point where you can actually test something much earlier.
If I manage to "think Forth" rather than "think C", I can move much faster.
@@HansBezemer Thinking about it from a Forth mindset is definitely the way to go. However, Forth is so radically different that it's taking me years to get there😆. Lately, I've begun to think of Forth as a stream of tokens. These tokens come in three types: run-time words (functions that act on the stack), literals (numbers or strings) and parsing words, which 'eat' the token stream until another parsing word tells them to stop. The compiling words ( : & ; ), branching words (if else then) and looping words (begin while) are examples of parsing words. I also have parsing words that do other things. One eats the token stream and places it on the stack as a string. Another eats the token stream and prints it to stdout. Yet another eats decompiled words in the middle of a compiled word (pcode) inside the dictionary and IMMEDIATELY executes them (among other things, this lets me store strings inside compiled words). Thinking of your program as a stream of tokens that you manipulate can result in some quite novel solutions to problems. My Forth is loosely based on Chuck Moore's cmForth, with a separate vocabulary for compile-time words instead of the status bit that most Forths use to flip between compile time and run time. I've used Forth for website development (that's ongoing), theatre DMX lighting control (my biggest success with Forth to date) and building a direct credit schedule for a bank from an old dBase app that was no longer supported by the vendor.
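The Python sketch below makes the three token types concrete. Parsing words pull tokens from the same iterator that the interpreter loop is reading. The word set (`(`, `s"`, `."`, `+`, `*`, `dup`, `.`, `type`) is an illustrative assumption, not the commenter's cmForth-derived system. The sketch also loses the original spacing inside strings, because it rejoins tokens with single spaces; a real Forth parses the raw input buffer instead.

```python
def run_tokens(source, out):
    """Interpret a whitespace-separated token stream.

    Parsing words consume tokens from the same iterator the main loop reads;
    runtime words act on the stack; anything else must be a numeric literal.
    """
    stack = []
    tokens = iter(source.split())

    def eat_string():
        # Eat tokens until one ends with '"'; rejoin them with single spaces.
        words = []
        for tok in tokens:
            if tok.endswith('"'):
                if tok[:-1]:
                    words.append(tok[:-1])
                return " ".join(words)
            words.append(tok)
        raise SyntaxError('missing closing "')

    def eat_comment():
        # Eat tokens until one ends with ')'.
        for tok in tokens:
            if tok.endswith(")"):
                return
        raise SyntaxError("missing )")

    parsing = {
        "(": eat_comment,
        's"': lambda: stack.append(eat_string()),   # string onto the stack
        '."': lambda: out.append(eat_string()),     # string straight to output
    }
    runtime = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        ".": lambda: out.append(str(stack.pop())),
        "type": lambda: out.append(stack.pop()),
    }

    for tok in tokens:
        if tok in parsing:
            parsing[tok]()
        elif tok in runtime:
            runtime[tok]()
        else:
            stack.append(int(tok))   # literal; ValueError if not a number
    return stack


out = []
run_tokens('( square 2 3 + ) 2 3 + dup * . s" hello world" type ." done"', out)
print(out)  # → ['25', 'hello world', 'done']
```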
@@creditscorenz There is a lot to unpack here. First: the tokens. I think you are on to something. I've noticed I use that approach when I write Unix shell scripts. You put in a filter and the rest is transparent to you.
It's the same with Forth. Applying DOES> definitions to tables allows me to convert one value to another quite transparently. In my ETL lib I can apply a filter to a value that has been read in order to transform it, either by translating it or reformatting it.
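Python has no `CREATE ... DOES>`, but a closure gives a loose analogue. The captured data plays the part of what `CREATE` lays down, and the returned function plays the part of the shared `DOES>` behaviour. `make_table`, `apply_filter` and the weekday table are hypothetical. This sketches the idea, not the ETL lib itself.

```python
def make_table(*values):
    """Loose analogue of a CREATE ... DOES> defining word.

    The captured values play the role of the data laid down after CREATE;
    the returned function plays the role of the shared DOES> behaviour.
    """
    body = tuple(values)        # "CREATE" part: the table's data

    def does(index):            # "DOES>" part: index -> stored value
        return body[index]

    return does


def apply_filter(records, word):
    """Pass every value read from some input through a conversion word."""
    return [word(r) for r in records]


# Hypothetical table: weekday number -> abbreviation
day_name = make_table("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
print(apply_filter([1, 3, 6], day_name))  # → ['Mon', 'Wed', 'Sat']
```

Every table made this way shares one behaviour and differs only in its data, which is what makes the conversion transparent to the code that uses it.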
Factor goes even further by manipulating entire quotations. If you're used to Forth, it takes some time to wrap your mind around that, but the idea is very interesting. And in 4tH it works: after defining some words, I could pull in entire Factor definitions without trouble.
From ColorForth I learned you can do *WITHOUT* complex control structures. I applied that to my preprocessor. Essentially, you break off a routine when a flag is false. I kept the flag, so I could evaluate "ELSE" statements as well. With recursion, you can do all kinds of loops.
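The "break off when the flag is false" style, combined with loops by recursion, might look like this in Python. `countdown` and the `Guard` class with its `when`/`otherwise` methods are hypothetical names, not the author's preprocessor. Python does not eliminate tail calls, so deep recursion here runs into the interpreter's recursion limit. That would not happen in a Forth that turns a final call into a jump.

```python
def countdown(n, out):
    """Loop by recursion, breaking off the routine as soon as the flag is false."""
    if not n > 0:           # flag false: exit early, no ELSE needed
        return
    out.append(n)
    countdown(n - 1, out)   # recursion takes the place of BEGIN ... UNTIL


class Guard:
    """Keep the last flag, so that an 'else' clause can test its inverse later."""

    def __init__(self):
        self.flag = False

    def when(self, flag, action):
        self.flag = flag    # remember the flag for a later otherwise()
        if not flag:        # break off when the flag is false
            return
        action()

    def otherwise(self, action):
        if self.flag:       # the remembered flag was true: skip this branch
            return
        action()


log = []
countdown(3, log)
g = Guard()
g.when(len(log) > 5, lambda: log.append("big"))
g.otherwise(lambda: log.append("small"))
print(log)  # → [3, 2, 1, 'small']
```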
@@creditscorenz I'm not a great fan of parsing words, though. In short - it's complex. The main interpretation loop of Forth is simple: discard leading white space, parse a word until the next white space.
A parsing word sneaks into this operation and then does its thing. It doesn't feel like a neat operation.
A recent development is the recognizer: a word that becomes part of the interpretation loop and does its thing. You can do neat things with these when you think about it. Let's say anything enclosed in quotes is a string that goes on a string stack (why not? Floating-point words have their own stack as well). You could write things like:
"This is a comment" //
"( --)" //
"helloworld" : "Hello world!" type cr ;
*//* takes a string from the string stack and discards it;
*:* takes a string from the string stack and starts a definition;
*TYPE* takes a string from the string stack and prints it.
Note I'm not saying this is the way Forth should be transformed (heavens, no!) but it is an interesting idea to explore. I think you could eliminate most parsing words that way.
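The recognizer idea can be roughed out in Python with a separate string stack. This sketch makes several assumptions. The tokenizer treats a double-quoted run as one token; otherwise `"Hello world!"` would be split at the space. `:` simply opens a definition whose name comes off the string stack. A definition's body is kept as a list of tokens and re-interpreted when the word runs. None of this reflects any existing Forth's recognizer API.

```python
import re

# Assumption: a double-quoted run counts as one token, so "Hello world!" stays whole.
TOKEN = re.compile(r'"[^"]*"|\S+')


class Interp:
    def __init__(self):
        self.data = []          # ordinary data stack
        self.strings = []       # separate string stack, like a floating-point stack
        self.out = []
        self.compiling = None   # (name, body) while a definition is open
        self.words = {
            "//": lambda: self.strings.pop(),                    # discard a string
            "type": lambda: self.out.append(self.strings.pop()),  # print a string
            "cr": lambda: self.out.append("\n"),
        }
        # Recognizers are tried in order until one accepts the token.
        self.recognizers = [self.rec_string, self.rec_number, self.rec_word]

    def rec_string(self, tok):
        if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
            self.strings.append(tok[1:-1])
            return True
        return False

    def rec_number(self, tok):
        try:
            self.data.append(int(tok))
        except ValueError:
            return False
        return True

    def rec_word(self, tok):
        if tok in self.words:
            self.words[tok]()
            return True
        return False

    def interpret(self, tokens):
        for tok in tokens:
            if self.compiling is not None:
                name, body = self.compiling
                if tok == ";":
                    self.words[name] = lambda body=body: self.interpret(body)
                    self.compiling = None
                else:
                    body.append(tok)          # compile: just remember the token
            elif tok == ":":
                # The name comes off the string stack, so ':' parses nothing.
                self.compiling = (self.strings.pop(), [])
            elif not any(rec(tok) for rec in self.recognizers):
                raise NameError(f"undefined word: {tok}")

    def evaluate(self, source):
        self.interpret(TOKEN.findall(source))


forth = Interp()
forth.evaluate('''
"This is a comment" //
"( --)" //
"helloworld" : "Hello world!" type cr ;
helloworld
''')
print("".join(forth.out), end="")  # prints: Hello world!
```

In this sketch `:` and `;` only switch between interpreting and collecting tokens. No word reads ahead in the input on its own.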
@@HansBezemer I think it was a quote from Chuck Moore about ColorForth that also made me realise I didn't need complex control structures like CASE statements. However, I am sticking with ELSE and BEGIN UNTIL 😉 I think this sums up what Forth is all about: the relentless drive for simplicity.
Stack manipulation is much simpler with { }, though I'm not sure it's better.
Great video Hans, great fan of the series… One of the challenges of stack-based languages, Forth in particular, has been precisely the use of the stack. While it depends on the Forth implementation, this use of the stack has a major limitation on modern CPUs: it does not allow for Instruction Level Parallelism (ILP), because it creates a data hazard. The system does not know when a function will make use of the data on the stack, nor can it properly resolve dependencies. While the stack made Forth easy to implement in early times, the direction of modern computing and programming requires a means to make the best use of the hardware. Would locals help in this case? If they are implemented correctly (ESP/RSP in the x86 architecture), they would have a fundamental performance benefit. Again, it depends on the Forth, but the stack “might be” a limitation rather than a benefit moving forward. Thoughts?
Most obviously, a stack-based language will profit from a stack-based architecture, performance-wise. However, most of the popular architectures have been register machines. Professional compilers like SwiftForth are quite capable of producing tight code (after optimization).
Also note there are different threading models here, and curiously, subroutine threading is *not* always faster than (in)direct threading. But in general I think it is safe to say that Forth is about 5 times slower than C.
What makes C "slow" is IMHO the creation and destruction of stack frames. I can assure you that function call overhead is something to be reckoned with. It's not for nothing that every high-level Forth VM is implemented as a *switch()* statement (4tH included).
Locals typically reside in stack frames. I'm not much of a CPU expert, but doesn't that mean there's an (essential) hardware stack which has to be monitored by some kind of optimization? So why should that impose a limitation?
In Forth, local variables depend on the implementation: some are created on the return stack. Still, that requires copying data. That said, in practice it doesn't seem to have much effect, at least not in my humble experiments.
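As a rough model of locals kept on the return stack, this Python sketch moves the top cells of the data stack into a frame on a separate return stack. That move is the extra copying mentioned above. The sketch ignores return addresses, and the names (`Machine`, `enter_locals`, `sum_of_squares`) are hypothetical.

```python
class Machine:
    """Data stack plus a return stack that can also hold locals frames."""

    def __init__(self):
        self.data = []
        self.rstack = []

    def enter_locals(self, n):
        """Like { a b ... }: move the top n cells into a new return-stack frame."""
        split = len(self.data) - n
        if split < 0:
            raise IndexError("data stack underflow")
        frame = self.data[split:]   # this copy is the extra data movement
        del self.data[split:]
        self.rstack.append(frame)

    def local(self, i):
        """Fetch local i (0 is the deepest of the moved cells)."""
        return self.rstack[-1][i]

    def exit_locals(self):
        self.rstack.pop()


def sum_of_squares(m):
    """Roughly  : sum-of-squares { a b } a a * b b * + ;  (hypothetical word)."""
    m.enter_locals(2)
    a, b = m.local(0), m.local(1)
    m.data.append(a * a + b * b)
    m.exit_locals()


m = Machine()
m.data.extend([3, 4])
sum_of_squares(m)
print(m.data)  # → [25]
```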
Most important to me is *not* performance. It is the creation and maintenance of the code. What good is the fastest performance when it takes me 10 times longer to create it? In practice a *lot* of jobs are done in minutes or even seconds. Halving that time doesn't give the business an edge; otherwise *NOBODY* would be using Python. 4tH beats Python easily, often by an order of magnitude. Performance is not the deciding factor for a language IMHO.
@@HansBezemer Hello Hans, I can’t disagree with you on any of the well made points.
Your point about the overhead of creating and destroying stack frames in C is well taken. Efficient management of stack frames is indeed crucial. In Forth, if locals can be optimized to reside in registers or managed with minimal overhead, we might achieve better performance.
Moreover, different threading models and optimizations in professional compilers, like those used in SwiftForth, demonstrate that Forth can be made to run efficiently despite its stack-based nature. However, I’m still pondering whether the use of locals could offer a more systematic performance improvement, especially given the hardware stack monitoring and optimization capabilities in modern CPUs.
That said, my primary focus is on the evolving trends in CPU and computing architecture, particularly the increasing pipeline depths and the limitations that stack-based approaches might face in this context.
The challenge with stack-based architectures lies in their inherent limitations regarding instruction-level parallelism (ILP) and data hazards, as modern CPUs strive to maximize performance through parallel execution. Given these trends, I’m “exploring” whether a well-architected use of locals, leveraging CPU registers like ESP/RSP in the x86 architecture, could offer significant benefits not just in terms of performance, but also in code readability and maintainability.
On the success of Python, it’s evident that its popularity is not primarily due to raw performance, but rather to the extensive ecosystem of tools and libraries it offers. The ability to quickly prototype and test ideas in the REPL, coupled with rich data structures, has been immensely rewarding for developers. Interestingly, many modern Python packages have managed to achieve impressive performance through careful optimization, sometimes approaching that of traditionally faster languages like Rust or C, for example in the One Billion Row Challenge (1BRC).
I’m curious to hear your thoughts on whether the strategic use of locals, aligned with modern CPU register architectures, could further enhance Forth. Could this approach address the limitations of stack-based designs in a meaningful way, balancing both performance improvements and code readability?
As you covered well, I find it questionable to consider stack alignment the better way (the Forth way) at the expense of readability, intuitiveness for *noobs*, or performance, compared with the use of locals.
@@jemo_hack I'm actually not a hardware guy. Sure, one has to have a working knowledge of the hardware below to get the best out of it.
Many Forths actually cache the top of stack (TOS) in a register, and claim they get better performance out of it.
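Here is a small Python sketch of TOS caching, where a local variable stands in for the register. The opcodes are hypothetical. The point is that `add` and its siblings need one pop and no push, because the result simply stays in `tos`. The dummy bottom entry is an assumption of this sketch that keeps `tos` always defined.

```python
def run_cached(code):
    """Stack VM that keeps the top of stack in a local variable.

    The local `tos` stands in for a CPU register; `rest` holds the other
    items in memory. A dummy 0 sits below the real stack so that `tos`
    always holds a value.
    """
    rest = []      # second item and below (the bottom entry is the dummy)
    tos = 0        # cached top of stack
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == "lit":
            rest.append(tos)         # spill the old TOS, cache the new one
            tos = code[pc]
            pc += 1
        elif op == "add":
            tos = rest.pop() + tos   # one pop, no push: the result stays cached
        elif op == "sub":
            tos = rest.pop() - tos   # second item minus top
        elif op == "mul":
            tos = rest.pop() * tos
        elif op == "dup":
            rest.append(tos)
        elif op == "drop":
            tos = rest.pop()
        elif op == "halt":
            return (rest + [tos])[1:]   # drop the dummy bottom entry
        else:
            raise ValueError(f"bad opcode {op!r}")


print(run_cached(["lit", 10, "lit", 3, "sub", "halt"]))  # → [7]
```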
If you're talking performance though, note that if I put stuff on the stack, it's being taken *transparently* by the next Forth word. With locals, there has to be some transfer of data - either by some dedicated area or the return stack, where some kind of stack frame is being built. So I'd wonder very much if locals could beat that.
Note my uBasic/4tH *has* locals - and 4tH has a lib to allow for locals. But calling a word *without* locals is quite straightforward. You just put a return address on the stack and jump. No questions asked. Since 4tH always checks where it is jumping to, there is a performance penalty to be paid - and it can be demonstrated that there is one. So, I've always been reluctant to add built-in locals to 4tH.
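The "push a return address and jump" mechanism, with a check on every call target, fits in a few lines of Python. This is a hypothetical toy VM, not 4tH's implementation. It only shows where the extra comparison on each call comes from.

```python
def run_calls(code, start=0):
    """Tiny VM showing a Forth-style call: push the return address, then jump.

    Each cell is an (opcode, argument) pair. Every call target is checked
    before jumping, which costs a comparison on each call.
    """
    data, rstack = [], []
    pc = start
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "call":
            if not 0 <= arg < len(code):     # check where we are jumping to
                raise IndexError(f"call to invalid address {arg}")
            rstack.append(pc)                # push the return address...
            pc = arg                         # ...and jump
        elif op == "exit":
            if not rstack:                   # leaving the top level ends the run
                return data
            pc = rstack.pop()
        elif op == "lit":
            data.append(arg)
        elif op == "dup":
            data.append(data[-1])
        elif op == "add":
            b = data.pop()
            data.append(data.pop() + b)
        else:
            raise ValueError(f"bad opcode {op!r}")


program = [
    ("lit", 5),      # 0: main
    ("call", 4),     # 1: DOUBLE
    ("call", 4),     # 2: DOUBLE
    ("exit", None),  # 3: end of main
    ("dup", None),   # 4: DOUBLE = dup +
    ("add", None),   # 5
    ("exit", None),  # 6
]
print(run_calls(program))  # → [20]
```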
I have been playing with a different idea: offsetting the stack at entry and assigning symbols to those offsets, a bit like uBasic/4tH with its *LOCAL()* and *PARAM()* keywords. It hasn't come to fruition, though. It would combine the advantages of using the stack *and* locals.
But then - you don't want to keep the parameters eternally - and where do you keep the return values? In a local? Now, how do you clean up the whole darn thing when exiting?
In uBasic/4tH, the entire stack frame is discarded and the return value is left on the data stack. So, it's at best a murky design.
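Here is one hypothetical way to model the "offset the stack at entry" idea in Python. Parameters stay on the data stack and are addressed relative to a saved base, and locals get slots above them. Leaving the word discards the whole frame and pushes only the return value, as described for uBasic/4tH. The class and method names are illustrative. Passing the return value to `leave` explicitly is just one possible answer to the "where do you keep the return values?" question.

```python
class FramedStack:
    """Hypothetical sketch: parameters stay on the data stack and are
    addressed by offset from a saved base, so nothing is copied on entry."""

    def __init__(self):
        self.data = []
        self.frames = []    # (base index, parameter count) per active call

    def enter(self, nparams, nlocals=0):
        base = len(self.data) - nparams
        if base < 0:
            raise IndexError("data stack underflow on entry")
        self.frames.append((base, nparams))
        self.data.extend([0] * nlocals)   # reserve local slots above the parameters

    def param(self, i):
        base, _ = self.frames[-1]
        return self.data[base + i]

    def local(self, i):
        base, nparams = self.frames[-1]
        return self.data[base + nparams + i]

    def set_local(self, i, value):
        base, nparams = self.frames[-1]
        self.data[base + nparams + i] = value

    def leave(self, result):
        """Discard the whole frame, leaving only the return value on the stack."""
        base, _ = self.frames.pop()
        del self.data[base:]
        self.data.append(result)


def average3(s):
    """Hypothetical word ( a b c -- avg ), using one local for the sum."""
    s.enter(nparams=3, nlocals=1)
    s.set_local(0, s.param(0) + s.param(1) + s.param(2))
    s.data.append(99)                 # scratch value the body leaves behind
    s.leave(s.local(0) // 3)          # cleanup removes params, local and scratch


s = FramedStack()
s.data.extend([7, 3, 6, 9])
average3(s)
print(s.data)  # → [7, 6]
```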