Functional Parsing - Computerphile
- Published: 27 Sep 2024
- Functional or Combinator Parsing explained by Professor Graham Hutton.
Professor Hutton's Functional Parsing Library: bit.ly/C_FunctP...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottsco...
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Types 20+ lines of Haskell program in a single breath, then gets 0 warnings or errors from GHC, then goes on as if it is totally normal...
Profile picture checks out.
Because it is normal. He's done this kind of code many times before.
And no StackOverflow screen up!
The Graham Haskell Compiler
That's totally normal
say it takes a string as an input and returns a tree as an output one...more...time
... and then the example parser doesn't even return a tree, but just an integer.
@@PauxloE Agreed - I guess the video couldn't get much longer though, but I'd have loved to see his parser return an AST and then implement the logic to interpret it!
@@PauxloE It evaluates the tree as it goes along, and it is walking the tree even though we're not seeing it. To get a tree, you'd modify the grammar so that the do block returns a tree, rather than the result of evaluating the math expression. The biggest problem I see with this toy example is there's no error handling. Malformed input can just get lost without affecting the tree or warning the user.
gi70st I can fix that for you... It takes an array of chars and gives back an AST
It was so confusing how he did that switcheroo. I was there thinking "how does `return (x+y)` make it produce a node `"+"` with branches `x` and `y`?"
And even after acknowledging that this is really evaluation, he kept calling it a parser, even though the beginning of the video he insisted that what a parser is to him is something that produces a tree as output.
Had to pause the video to say, after studying parsers a lot recently, I think the beginning explanation is one of the best explanations of what a parser actually is
I think it sounds like an ideal explanation, when you know the subject well beforehand, interesting.
The videos of this guy are a treasure, thanks a lot!
Regarding the "list of results" remark, maybe a fun example would be a parser that can handle all possible interpretations of "time flies like an arrow, fruit flies like a banana"
That's a cool one.
The key (which IMHO was not so well explained in the video) is that you have a function (parser combinator) that takes other functions (parsers) as input, and produces a function (more complex parser) as a result.
This is rather similar to "normal" recursive-descent parsing, except you don't do the combining yourself, and instead let the parser combinator do it for you.
That's one of the key takeaways of Functional programming, "Higher-Order Functions", or functions that use other functions as parameters.
A short and sweet paper is the Functional Pearl about Sixth-Order Functions. Why would you need a Sixth-Order Function?
When you are using parser combinators!
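The idea in this thread can be sketched in a few lines of Haskell. The names `satisfy`, `andThen`, and `orElse` are invented for illustration; real combinator libraries spell these as `<*>`, `<|>`, and friends:

```haskell
-- Minimal sketch of parser combinators over a plain function type.
-- Names (satisfy, andThen, orElse) are invented for illustration.
type Parser a = String -> [(a, String)]

-- primitive parser: consume one character satisfying a predicate
satisfy :: (Char -> Bool) -> Parser Char
satisfy p (c:cs) | p c = [(c, cs)]
satisfy _ _            = []

-- combinator: run p, then q on the leftover input, pairing results
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q inp = [ ((x, y), rest') | (x, rest)  <- p inp
                                    , (y, rest') <- q rest ]

-- combinator: try p; if it produces no parse, try q
orElse :: Parser a -> Parser a -> Parser a
orElse p q inp = case p inp of
                   [] -> q inp
                   rs -> rs
```

Note that `andThen` and `orElse` never touch characters themselves; they only plumb functions together, which is exactly the higher-order-function point made above.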
Doesn't sound like something I would need a video for. It's clear from the name "parser combinator". It's clearly something that combines parsers into a parser, and then it's obvious those are then combined by further combinators and that's how you can build the whole root parser.
- This video shows how to do that with monadic combinators, and the (<|>) operator, which I guess is just a concat function for the lists.
The operator is most likely a custom operator defined in the package.
@@cparks1000000 It's an alternative operation. If the first parser succeeds, it returns the result of the first; if the first fails, it returns the result of the second; or nothing if both fail.
Thanks, that was very illuminating. And it’s nice to see Haskell being used in a practical setting.
WOW! Anyone saw the beautiful recursion in the definition of these expression parsers? I don't have the kind of brain to quickly think recursively like this
Why don't you use Maybe monad instead of List monad?
I think it would be better and more intuitive.
I mean:
type Parser a = String -> Maybe (a, String)
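For reference, a minimal sketch of that Maybe-based variant (the names `ParserM`, `parseM`, and `itemM` are made up here). The trade-off is that a Maybe parser can report at most one result, so it cannot represent an ambiguous grammar's multiple parses the way the list version can:

```haskell
-- Sketch of a Maybe-based parser, as the comment suggests.
-- Names (ParserM, parseM, itemM) are invented for illustration.
newtype ParserM a = PM (String -> Maybe (a, String))

parseM :: ParserM a -> String -> Maybe (a, String)
parseM (PM p) = p

-- consume one character, failing (Nothing) on empty input
itemM :: ParserM Char
itemM = PM (\inp -> case inp of
                      []     -> Nothing
                      (c:cs) -> Just (c, cs))
```

With Maybe, failure is `Nothing` rather than `[]`, which is arguably harder to confuse with a legitimate result.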
"It doesn't matter if you know any Haskell, because I'll be explaining everything as I go along. So you won't understand a thing either way."
> claims we don't need to know Haskell
> isn't explicit about which words are imported from his module and which are keywords.
As of the finished module at 19:35:
The import declaration allows use of the exported definitions from module Parsing. There's also a standard library module Prelude which is imported by default if not mentioned.
Equals (=) defines values.
Do-notation is sugar for the (>>=) operator, i.e.: do { b <- p; c } means p >>= (\b -> c).
Module Prelude exports typeclass methods add (+), multiply (*), alternate (<|>), bind (>>=), and return, and also exports the implementations of (+) and (*) for whichever of the various number types the end program might use. The defaulting rules in the REPL session seem to select the Integer type.
Module Parsing defines type Parser, function char, and method implementations for (<|>), (>>=), and return. The bind (>>=) method does all the heavy lifting of combining two parsers into one larger parser.
The main module defines functions expr, term, and factor.
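The structure described above can be approximated with plain functions, no newtype or typeclass instances. This sketch follows the video's grammar (expr/term/factor over + and *), but it is not Professor Hutton's actual library; `charP`, `nat`, and `orElse` are names invented here:

```haskell
import Data.Char (isDigit)

-- Plain-function approximation of the video's grammar; the real
-- library wraps this type in a newtype with Monad instances.
type Parser a = String -> [(a, String)]

charP :: Char -> Parser Char
charP x (c:cs) | c == x = [(c, cs)]
charP _ _               = []

nat :: Parser Int
nat inp = case span isDigit inp of
            ([], _)    -> []
            (ds, rest) -> [(read ds, rest)]

orElse :: Parser a -> Parser a -> Parser a
orElse p q inp = case p inp of
                   [] -> q inp
                   rs -> rs

-- expr ::= term '+' expr | term
expr :: Parser Int
expr = orElse addCase term
  where addCase s = [ (t + e, s2) | (t, s0) <- term s
                                  , (_, s1) <- charP '+' s0
                                  , (e, s2) <- expr s1 ]

-- term ::= factor '*' term | factor
term :: Parser Int
term = orElse mulCase factor
  where mulCase s = [ (f * t, s2) | (f, s0) <- factor s
                                  , (_, s1) <- charP '*' s0
                                  , (t, s2) <- term s1 ]

-- factor ::= '(' expr ')' | nat
factor :: Parser Int
factor = orElse parenCase nat
  where parenCase s = [ (e, s2) | (_, s0) <- charP '(' s
                                , (e, s1) <- expr s0
                                , (_, s2) <- charP ')' s1 ]
```

The list comprehensions here are doing by hand what the (>>=) method does in the library: thread the unconsumed input from one parser into the next.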
One thing here is that the parser he implemented evaluates binary operators right-to-left. Left-to-right (left-associative) binary operators are more involved.
Graham Hutton. Your Haskell book was incredibly useful!! Never clicked so fast on a computerphile video. Parsing in Haskell was a bit hard in the beginning though.
what is the name of the book? okok i just google it. =D
@@keestv3116 Programming in Haskell :)
I, too, used the book. Excellent resource for starting Haskell.
> Parsing in Haskell was a bit hard in the beginning though.
Getting used to all those new operators (<$>, <*>, <|>, among others) was brain-breaking, initially.
Oh he wrote that book. That book got me through first year in uni. I bought it in desperation for my resit. It was the only module I had to resit and I still harbour a special hatred of Haskell.
Pleased to observe that he uses the vi editor - a REAL man :-)
That's not a sign of manhood. It's a sign that he has a clinical case of needing to be different. ;-)
@@lepidoptera9337Yeah, "different". Because he uses the text editor that comes with every computer ever. He is so quirky :/
@@sebastiangudino9377 It doesn't come with Windows, child. Why would it? Windows has specialized IDEs of all sorts for all kinds of R&D. I am currently using Eclipse with an embedded microcontroller environment. The hardware debugger is fully integrated with the code editor and I can step through my program line by line. I could never do that with vi.
OK, now you got your two minutes of attention. I hope your basement has gotten a little warmer. ;-)
@@lepidoptera9337 WSL is a POSIX system that comes with windows from Windows 10 onwards. It includes vi
Having trouble parsing Graham's accent. Sounds like it's from everywhere in the UK at once! Fascinating
He sounds unambiguously Irish to me, but as far as I know, he is actually Scottish.
"4 and half screen fulls and is full fledged and can basically implement any parser you like" - mic drop
Excellent explanation, very interesting to watch!
Wow! So cool how much you can do in a few lines of Haskell. Like defining a grammar, and a parser for it!
Sorry, but back in the day, I used Lex and YACC. This takes all the fun out of it
More videos with prof Hutton and functional programming. My favorite topic
More of this, please! Graham is an outstanding teacher.
why this sir doesn't have a single playlist at all? do computerphile forgot to make one? "The functional programming series maybe" ?
12:56 "And then I'm gonna *simply* return it"
Ah yes, that's what happens there, no magic at play at all :v
Yeees :v
Thanks for such visual and enlightening video. Next time, however, I'd rather see this kind of video (coding) as a screen cast, rather than camera jumping back and forth. Professors head talking just obscured the point he wanted to make about the primitives library. Overall, it was surprising how the parsing library neatly applied to the problem.
WOW... I just had to write a parser and evaluator in Kotlin, and I had 3 classes working in conjunction to get the final answer. While this feels like magic.
Great video! I've written some parsers in Haskell, but Prof. Hutton makes it look easy.
I already know Haskell and quite a number of monads already but haven't got time to learn parser as a monad. So this video is perfect for me.
It's effectively the State Monad.
@@Dan-gs3kg If all parsers always produce singleton list, then parser monad = state monad as you said. So parser monad could be used as state monad. *But can you show the converse?* because that is what it means for me to say 'parser monad is effectively state monad' and I don't yet see how.
@@ancbi I'm a bit fast and loose here, but you can see the correspondence between say:
> Parser a = P (String -> (a, String))
> State s a = S (s -> (a, s))
Where (s ~ String), and (~) is Type Equality
The more general statement:
> Parses a = P (String -> [(a, String)])
Is a bit different, but then we can also make a more general statement about what the State effect is.
> StateT s m a = S (s -> m (a, s))
Where (s ~ String, m ~ List), we get (Parses)
Where (s ~ String, m ~ Identity), we get (Parser)
If we want zero or one parse results, we replace List with Maybe, if we want literate parse errors, then we use Either.
All of this hinges off of the initial State Effect.
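The correspondence described above can be written down directly. This is a sketch; real code would import `StateT` and `Identity` from the transformers package rather than redefining them:

```haskell
-- Sketch of the correspondence: a parser is the State effect over
-- a choice of result monad. Real code would use the transformers
-- package instead of redefining these types.
newtype Identity a = Identity { runIdentity :: a }

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

-- the video's parser: many results via the list monad
type Parses a = StateT String [] a

-- zero or one result
type ParserMaybe a = StateT String Maybe a

-- exactly one result, no failure
type ParserId a = StateT String Identity a

-- the single-token primitive, in the list-monad version
item :: Parses Char
item = StateT (\s -> case s of
                       []     -> []
                       (c:cs) -> [(c, cs)])
```

Swapping the inner monad (List, Maybe, Either, Identity) changes the failure and ambiguity behaviour without touching the state-threading machinery.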
@@Dan-gs3kg Hey. Thanks for making it clear. It was my ignorance of StateT making me say what I said.
@@ancbi you're welcome.
If you want to look at how "stacked" a parser combinator library gets look for the packages: megaparsec, parser-combinators, and yoctoparsec.
The first is a fully featured library that generally dominates in performance, and expressiveness in both parsing and error reporting. Yet it operates on the same principle seen previously.
The second is a library that can interface with any parser combinator library. This is possible because of the algebraic structure of parser combinators. In other words, those principles are very permissive.
And the third is very reductive of what a parser combinator library is. It's effectively a pile of free theorems (invariants) and one primitive: The single token matching primitive. At some point you figure out that a principled outlook is the most constructive.
Neat but you lost me when you started implementing the grammar.
Also would have liked to see the tree before it was evaluated.
The implementation reflects the grammar. Alternative parse paths are separated by (<|>) in the code, where alternate expressions of the same term are separated by (|) in the grammar.
Strange instance of applicative where he uses only head of the list
I feel like I need to take a ten week course on this.
I felt that way when I first saw Haskell, too. I've been learning it as a hobby since August and it's totally revitalized my interest in computer science on the whole.
Nooo, the list scares me 😭
Why couldn't you just use a normal Maybe
this looks like DCG in Prolog, but DCG is more declarative
There is a channel on YouTube called Low Level JavaScript and the guy has a series building a parser combinator from scratch. It is super interesting and I highly recommend it; he's helping me a lot with my parsing studies
Yes, that guy is a genius
A thousand upvotes! I'm building a simple parser in SCSS of all languages. Let's say it's an interesting adventure. Especially considering that my programming skills are rusty.
Graham Hutton wasn't even born when the early papers on parsing were written. Is this dude even real?????
More haskell please, I love it
Cutting back and forth between the computer, the screen, and the speaker is not the best decision. It's very frustrating.
So interesting! Thank you so much! That is quality teaching.
Nice video. Would like it if you'd switch screens less to make it feel more "dynamic" while there's code being shown.
So, your definition of parser includes both the tokenizer and the syntax checker?
You can say that a tokeniser is a parser from a String to a List of Tokens. Rinse and repeat with the List of Tokens until you have what you want.
So if we wrote some code that ran parsers in an infinite loop and then buried that computer would it produce infinite trees and thus solve deforestation?
Okay, so expr ::= term + expr | term. Right. Now, let's add minus, shall we?
expr ::= term + expr | term - expr | term. Whoops, now what? Oh, and btw, is the order of operands significant? How about left recursion?
Is Hascal some sort of hybrid language?
Brilliant! Thank you!
How would one use this with left recursion and left associativity?
One of the limitations of top-down parsing methods such as functional parsing is that left recursion results in parsers that loop. One possible solution is to transform the grammar first to eliminate the left recursion. Another is to define special-purpose combinators to parse operators that associate to the left.
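A sketch of the second approach: a special-purpose combinator, commonly called `chainl1`, that parses a left-associative operator with no left recursion in the grammar. The plain-function parser type here is illustrative, not the library's:

```haskell
import Data.Char (isDigit)

type Parser a = String -> [(a, String)]

nat :: Parser Int
nat inp = case span isDigit inp of
            ([], _)    -> []
            (ds, rest) -> [(read ds, rest)]

-- parse a '-' as the subtraction function
minusOp :: Parser (Int -> Int -> Int)
minusOp ('-':cs) = [((-), cs)]
minusOp _        = []

-- chainl1: one or more p's separated by op, folded left-to-right,
-- so "7-2-1" is parsed as (7-2)-1 without any left recursion
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op inp = concatMap step (p inp)
  where
    step (x, s) = case [ (f x y, s2) | (f, s1) <- op s
                                     , (y, s2) <- p s1 ] of
                    [] -> [(x, s)]
                    rs -> concatMap step rs
```

The loop lives inside the combinator, accumulating the result left-to-right, instead of in a left-recursive grammar rule that would never terminate.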
Idear
parzer
Maybe it's because I don't know haskell but it's unclear to me how expr chooses whether to do the '+' branch or not
It backtracks. It tries to do the '+' case and if it can't it backtracks and does just the term case.
I'm wondering, is a parser a function that takes a string as an input and gives a tree as an output?
amazing speaker
> has got a Mac
> codes on Vi
List of results? Wouldn't an Either be better? Then you either get the tree, or an error back
Just wrote an abstract syntax tree in my utils package for tokenizing symbolic math expressions, would be cool if you guys went over how ASTs are used in compilers / interpreters as well
That gets into recursion schemes. Instead of consuming strings, you consume and rebuild trees. There's quite a bit involved there as different schemes have drastically different capabilities.
The important bit is that those pieces compose well, and have some very nice optimisations.
So Parser is just ReadS?
Okay so now let's build this in C
I thought I was having a stroke at 1:40!
I had to google the pronunciation of parsing. I didn't know that there were words pronounced differently in en-US and en-GB.
There are all kinds of words that are pronounced differently in each dialect
Hascal
No Tripod?
import argparse
Did prof Hutton's code check for leading zeroes being invalid for an integer? E.g. input "000123" would be kind of invalid input for an integer. Even "022" would be invalid in a traditional sense for an integer.
Thank you for this video. but why do I get this exception?
I'm running it on GHCi 7.6.3
*Parsing> parse (some digit) "123"
*** Exception: Parsing.hs:38:10-21: No instance nor default method for class operation GHC.Base.return
I also get this warning when I load Parsing.hs
[1 of 1] Compiling Parsing ( Parsing.hs, interpreted )
Parsing.hs:38:10: Warning:
No explicit method or default declaration for `return'
In the instance declaration for `Monad Parser'
Ok, modules loaded: Parsing.
Apple, definitely upgrade your GHC. Try not to use the versions of GHC that come with the package managers of various OS choices as they tend to be quite out of date. Use the curl (or wget) install scripts that you’ll find on the websites for either of the things I’m about to mention in quotes. Either install the Haskell “Stack” tool for managing isolated sandboxed versions of GHC on your system (so you can have different projects operating with different GHC versions) or use “ghcup” to install Cabal for you to do something similar. But yeah, avoid whatever is in your package manager like the plague. And GHC 7.6.3 is pretty out of date with how monads are implemented. There’s been some significant language shifts since then.
Apple also add the line “return = pure” within the Monad instance declaration (...just above the definition for (>>=) is where you’d normally put it).
NWN thank you very much! Unfortunately I need to use specifically this version for my university coursework
Apple that’s a shame. What university/subject is this? Lecturers should really update their course materials + backend. In GHC 7.10 the “Functor-Applicative-Monad Proposal” was implemented that made all Monads have Applicative as a superclass, which is a pretty big change and something that should be part of your learning ecosystem.
haskell is so much like black magic lol I should learn it
Vim is based
recursive descent(?)
I have no Idea how this worked...
Haskell can be a bit of a 'culture shock' to people like me who are used to procedural programming, but I definitely want to learn more about it.
vdisj ::= "either" vp "or" vp
ndisj ::= "either" np "or" np
"An expression can either be a term plus an expression or a term."
Parse error!
Take a shot everytime he says to take a string as an input and a tree as an output
5:01 noooo i thought the output was going to be a tree
A parser for things is a function ftom strings to lists of pairs of things and strings
Thanks
Using C to implement a separate lexer and parser, using the shunting-yard algorithm for infix expressions, is more didactic IMO
You should make a video. I would like to see how it works
parzer
Interesting. How do combinator parsers compare with table parsers like Earley parsers? Is it possible yo combine approaches?
s/yo/to/ !
In your first example you basically use parsers as regular expressions.
The cuts in the video are a bit annoying, would've been better to stay longer on the screen and scrap all the views on the laptop from further away... yes I can pause, but I would like to see what he is talking about while he is talking about it. There is no need to show me his laptop.
why does he say everything twice or thrice?
I'm confused by the fact that he is saying everything twice or thrice.
is it fun for him to say sentences twice or thrice?
3:50 "a parser may not always succeed..."
Ah, so we're going to have it throw a value error.
"...so we'll just let it return an empty list"
Wat
"Never return an error value that is indistinguishable from legitimate data" is a mistake too often made. In this case an empty list is not expected.
It is generally accepted that when you apply a parser to an empty stream, it is correct to have the parser not fail and produce an empty tree. If you accept this as correct, then producing an empty tree on invalid input is semantically incorrect.
yeah this parser is just a joke
As someone who knows Haskell and knows how to parse in it:
This has to do with the implementation of the choice operator (<|>). Graham says it returns either the left parsed value or the right parsed value, but what the operator does is let BOTH sides parse and then concatenate the results. This is possible because the return value of the parser functions is a (tuple in a) list, so when one side fails it returns an empty list, which can be appended to the other list (even though that operation doesn't actually do anything).
This means that it doesn't matter whether left, right, or both fail. Either way, the results get concatenated and THAT becomes the final result.
If there's anything that isn't clear, let me know.
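Both conventions discussed in this thread exist in real libraries. As a sketch over the plain list-of-results type (names `orBoth` and `orFirst` invented here):

```haskell
type Parser a = String -> [(a, String)]

-- concatenating choice: keep every parse from both alternatives,
-- so an ambiguous grammar shows up as a longer result list
orBoth :: Parser a -> Parser a -> Parser a
orBoth p q inp = p inp ++ q inp

-- biased choice: the second parser is tried only when the first
-- fails, so two succeeding alternatives are never mixed together
orFirst :: Parser a -> Parser a -> Parser a
orFirst p q inp = case p inp of
                    [] -> q inp
                    rs -> rs
```

With the biased version there is no "concatenated mess": if both alternatives would succeed, only the first one's results are kept.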
@@DutchmanDavid, so if they both succeed you'll get a concatenated mess?
It doesn't sound very safe.
I don't understand the appeal for Haskell
of Haskell or functional programming in general?
so I guess I dislike pure functional programming languages
This video is confusing.
It's confusing how he doesn't put `Parsing.` before the things imported from that library. Which words are keywords of the language and which words are imported?
It's confusing how he has words differing only in case. `char` and `Char`.
The library has `P` quite a few times. What is `P`? In some lines it looks like it's the name of an argument to a function being defined, but some other functions don't have `P` in the place where I'd expect arguments to be named.
It's confusing how he started with a bit of defining the parser, then jumped to high-level use of his already finished parser library, then briefly showed us a flash of the library for a few seconds, then went back to using it again. How does it even work?
18:60 It's confusing that he keeps referring to this as parsing, giving us the idea of the tree mentioned earlier being the output. "How does `return (x+y)` build up that tree?" I asked. - It doesn't; it's evaluating the expression.
`char` is a function, `Char` is a type. Types and constructors start with upper case.
`P` is a data constructor, which is a function. If you think of `Parser a` as a type wrapping a `String -> [(a,String)]`, then `P` is the constructor that has type `P :: (String -> [(a,String)]) -> Parser a`.
In one place (the definition of function `parse`), `P` is on the left hand side, to pattern match and extract the `String -> [(a,String)]` from `Parser a`.
You don't have to prefix unqualified imports.
The bit about the parsing primitives is uninteresting, as it's pretty much bit masking on character literals.
The combinators are... Probably the most magical and important part of it.
To make it construct an AST, replace "+, *" with AST constructors.
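A sketch of that change, over a plain list-of-results parser type rather than the video's library: the grammar shape is identical, but the actions build constructors instead of doing arithmetic, and evaluation becomes a separate tree walk. All names here are invented for illustration:

```haskell
import Data.Char (isDigit)

-- the AST we build instead of evaluating on the fly
data Expr = Lit Int
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Show, Eq)

type Parser a = String -> [(a, String)]

charP :: Char -> Parser Char
charP x (c:cs) | c == x = [(c, cs)]
charP _ _               = []

lit :: Parser Expr
lit inp = case span isDigit inp of
            ([], _)    -> []
            (ds, rest) -> [(Lit (read ds), rest)]

orElse :: Parser a -> Parser a -> Parser a
orElse p q inp = case p inp of
                   [] -> q inp
                   rs -> rs

-- same grammar as the video, but "return (x+y)" becomes Add x y
expr, term, factor :: Parser Expr
expr = orElse addCase term
  where addCase s = [ (Add t e, s2) | (t, s0) <- term s
                                    , (_, s1) <- charP '+' s0
                                    , (e, s2) <- expr s1 ]
term = orElse mulCase factor
  where mulCase s = [ (Mul f t, s2) | (f, s0) <- factor s
                                    , (_, s1) <- charP '*' s0
                                    , (t, s2) <- term s1 ]
factor = orElse parenCase lit
  where parenCase s = [ (e, s2) | (_, s0) <- charP '(' s
                                , (e, s1) <- expr s0
                                , (_, s2) <- charP ')' s1 ]

-- evaluation is now a separate walk over the tree
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b
```

Separating the tree from its evaluation also makes room for other tree walks: pretty-printing, optimisation, or compiling.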
can anyone tell me a practical use for parsers? like I watched the entire video and know what they do, but I feel a little unguided, in what common situation you'd use specifically a parser and not just a for loop
I mean, a parser could be built on a for-loop. Parsing is basically just the process of reading some data, like a mathematical expression, and understanding it.
Math expressions can't always be calculated left-to-right (i.e. in the order you or your program would read them), like (1 + 3 * (7 - 1)), so one approach you may take is to decode the expression into objects -- a tree structure -- and then, after you've read and understood the whole expression, you'd use that data to calculate it in the right order.
The video's approach is to use functions that are chained in sequence, and those functions could in theory use a for loop. For example, we may have a parser function for variable names, which would want to use a for-loop to read until it sees a character that can't be a variable name. We may have a parser function for whole numbers rather than single digits, which would take the same approach.
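The "read until the character can't be part of a name" idea maps directly onto `span`; a sketch, with the names `ident` and `number` made up here:

```haskell
import Data.Char (isAlpha, isDigit)

type Parser a = String -> [(a, String)]

-- variable names: one or more letters, the "for loop until a
-- non-letter" described above, expressed with span
ident :: Parser String
ident inp = case span isAlpha inp of
              ([], _)   -> []
              (v, rest) -> [(v, rest)]

-- whole numbers rather than single digits, same shape
number :: Parser Int
number inp = case span isDigit inp of
               ([], _)    -> []
               (ds, rest) -> [(read ds, rest)]
```

Each parser consumes the longest valid prefix and hands back the unconsumed remainder, which is what lets them be chained in sequence.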
Programming language interpreters and compilers. JSON parsers, syntax highlighters, linters, transpilers (like Babel), web browsers that parse HTML and CSS, and web scrapers all use parsers. Whether the average working programmer writes them is another story. Another example would be if you find yourself making a DSL for adding content to your video game and would rather not write the content in C#, you might consider writing a parser. Anytime you take input and transform it to produce some form of structured output, you are in essence writing or using a parser
According to LangSec, all input should be parsed into a well defined structure (i.e. a type that represents only the set of valid input) before being processed. That way you aren't making assumptions about your input that can be exploited by attackers. The type of parser used should match the complexity of the grammar used to define valid input, following Chomsky's hierarchy of grammars.
Does this kind of parser run in O(n), assuming an unambiguous grammar? n being the length of the input string.
I don't think it does, but that's because this is a pretty naive implementation (which is totally fine BTW - that way more people can understand what roughly happens behind the screen).
If you want O(n), take a look at a full parser implementation like Parsec, Megaparsec or Attoparsec (just to name a few).
No. It may have to do a bit of backtracking when the parser should take the second possibility in the sub-parsers. This means re-parsing some stuff, making it non-linear in complexity.
For example, consider the simple case "2" and the parser from the video.
First, it tries to parse this string as 'term+expr', which expands to 'factor*term+expr' and ultimately '(expr)*term+expr'. The '(expr)' fails, so it tries the second alternative of 'factor', which is 'int'.
This succeeds, so it goes back up the tree to parse '*term+expr' out of (2, ""), which fails. Therefore, it tries the second alternative of 'term' in 'term+expr'. This ends up parsing (2, "") again, with '+expr' left over to parse out of the empty string. This again fails, so the second case of the 'expr' parser is used, which eventually succeeds after another round of backtracking in the 'term' and 'factor' parsers to produce the final result: (2, "")
So if you have a structure with leveled information and you break out things that meet specific conditions, is that parsing? Your explanation is so filled with lingo.
I hear you talk about returning trees and pairs; is that a requirement for being a parser?
What about if you split up a string consisting of a leveled record tree, maybe first into substrings, knowing which type of information is in each string, and then iterate over the record structures within them using the record brackets and identifiers? Isn't that parsing top-down?
What is the main definition of a parser? I thought it was breaking out information from structured data using rules.
@@JmanNo42 Yep, and the structured data is always a tree when calculating expressions. Operations always return a single result.
What is the Haskell parsing library without IO, Maybe, Option and Exceptions?
Well, the same I guess.
@@bingloveskoki Nah, this one is just a toy :)
painful
at the last example, I was expecting to either have something parsed or get the "unconsumed" string... but the result was nothing with nothing "unconsumed"
It's giving a list of all the ways the input can match an expr with something left over. The last example gives the answer "it can't be an expr with anything left over". In a context where you could have an expr or something else, it would first pass the whole string to the expr parser, then pass the whole string to the other parser; choices remember the original string, and pass that to each parser, rather than passing the unconsumed part to the next parser like sequences do. If the expr parser gave [(nothing,"(2+4")], that would mean it was okay for an expr to consist of nothing.
But can it divide by zero?
BTW what is the font you used in this video?
vi users 👍🏻
Waste of time, just use nano. Zero learning curve.
@@user255 But very little extensibility. Minimalism, while nice in some regards, comes at a cost.
He also should have mentioned that he is using "vi" and that you can use any text editor (even if you shouldn't).
but why no colours?
@@JNCressey Too user friendly...
@@pounet2, 17:27 *tap tap tap on the spacebar
"So what's a natural number? ..it's just a non negative integer.." No, those are called non negative integers, natural numbers are all non-negative AND non-zero integers, zero is not in the set of natural numbers.
2:30
> functional programming
> "first thing we're gonna do is define a type that does stuff"
This is not like an OOP class. It's like a function signature. He's saying that a function from String to Tree is of type Parser. In other words, he doesn't define a type that "does stuff"
@@jackeown, ah, so instead of it being like "if you're a parser, you will do this" it's like "if you do something like this then I'll call you a parser" ?
@@JNCressey yes.
This seems to be a few levels above my actual level. It would be easier for me if the lowest level of the logic was explained first.
Graham Hutton!
Perhaps Simon Peyton Jones or Connor McBride or Philip Wadler or Robert Atkey next?
Natural numbers cannot begin with a zero. Your parser should account for that.
What's that accent?
Sounds Scottish to me
Haskell
The "Low Level JavaScript" channel has a great series on implementing this type of parsing in.. well.. JavaScript (surprise!)
BODMAS
Can someone explain to me why this language is considered important? Is it doing something new, which other languages haven't achieved? It seems just a different syntax from what has been used before, or am I missing the point?
Steve Kiley as far as I understand it one draw is that it seeks to avoid side effects from operations also it's a functional programming language while there are others this one's also bringing concepts out of modern programming theory into a usable language. really is a whole bunch of things and you kind of just got a Google 'why haskell'
@@TheNewton Thanks for your reply. I shall try to find out more about it. I am old school programming, so I need to find out more about it.
what is this weird language lol
Scottish
I think Notepad is the right tool to program in Haskell.
@K.D.P. Ross I think Vim contributed to the creation of functional programming, it is so hard to copy and edit text, that the guys came out with an approach that limits those operations to a bare minimum.
Is he using spaces instead of tabs?
OMG he is! (17:27) First time I see this behavior in the wild.
When you want your "do notation" indentation to line up like that in Haskell I think you need spaces...
@@osolomons I start to understand why many people hate Haskell
Tabs tend to break Haskell programs. Spacing makes blocks like in Python but unlike Python the number of spaces you have to indent is not consistent. This is because you have to indent past a specific thing like `if' or `[' rather than indent an additional tab or 4 spaces. `do' is not one of those things, IIRC.
@@osolomons, a decent IDE would insert the right number of spaces for you if you pressed the tab key. what rubbish is he using?
That's interesting, because I programmed my own parser that works just like this without ever having seen it. I'm using it to make a grapher.
Imo it would have made a lot more sense to use a more common language so you could just talk about parsing rather than teaching us haskell.
For real. I'm sure Haskell is great, but I'd rather not have a crash course in it while learning about something so abstract.
Haskell's syntax is so unconventional that, at least for me, this was a basically pointless video