I like the name 'missing semester'. It truly holds good for most of us.
Fun stuff! 30 years ago, when I was a UNIX admin, I had a book titled UNIX Powertools. I used it more often than any of my other books until the pages all fell apart. It covered a lot of things you cover in this course. I love this stuff.
if that's the case, probably I should remove that book from my wish list since it is kind of old.
This dude is a monster. His Rust streams are invaluable gems.
What’s his name? And where can I find his Rust streams?
@@harshteck His name is Jon Gjengset and his channel (same name) is right here on the youtubes!
@@ryanleemartin7758 thank you
Super fun to learn about such powerful pipelines that can be done just using terminal. Hats off. Best lecture I have ever watched. Goosebumps!! Love
I have a problem.
I will solve the problem with a regular expression.
Now I have two problems.
lol
lol, regular expression confuses me, too
that was nice xd xd
Very true. But the benefit is that eventually, after enough practise or ROTE learning, you know how to write the regex, and it is widely applicable to many problems.
I am incredibly grateful to this channel. It’s really changing the way I work and it has taken the joy of working in the command line to a whole new level!
Most productive way to spend my quarantine time :) , thanks a lot ❤️
This is a fantastic showcase of the different things that you can do in a cut/sed/grep/awk etc. streaming situation to solve a myriad of problems as the case may warrant. Love the inclusion of R, which in combination with awk seems like it could be used to detect outliers in a file/dataset and flag them with a QA warning. Bad data is an omnipresent danger in data flows and is worse than ever since a lot of data is coming from internet sources (URLs etc.). Some of the worst 'bad' data isn't outright formatting errors or NULLs but data that is valid but not quite right -- it is out of the expected/normal range but doesn't show up in red as a 0/NULL/data omission.
This is art. What a beautiful technology.
This is amazing! I think it's not only useful for students in schools but also veterans who have been working on linux for a long time.
Man, I wish I had done CS. Thanks for making this stuff available to us for free.
These lectures are fantastic, and Jon is particularly brilliant!
Amazing lecture. Thank you. Every software engineer should know this.
This is an extremely educational video, thank you so much for sharing this lecture.
At 18:00 he wrote a very long command that is not very useful. Instead, he could use the cut command, which does precisely what he's trying to do: show us just what we want.
A good example would be:
cat ssh.log | cut -d ' ' -f4
Where:
-d stands for "delimiter" (the single character that separates fields)
-f stands for "field"; think of a field as a column, so in this example it would be the 4th column
The cut command is really good when we are trying to read data that follows some kind of pattern.
For instance, the /etc/passwd file on a Linux system uses ":" as the delimiter, so the info is separated by a colon character.
Obs.: Sorry about my English, just learning xD
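To make the /etc/passwd point concrete, here is a minimal sketch. The two sample lines are made up for illustration (they only imitate the passwd layout), and the variable names are hypothetical.

```sh
# Hypothetical sample in /etc/passwd layout: fields separated by ':'.
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# -d sets the single-character delimiter, -f selects the field(s).
users=$(printf '%s\n' "$sample" | cut -d ':' -f1)
shells=$(printf '%s\n' "$sample" | cut -d ':' -f1,7)

printf '%s\n' "$users"    # login names only
printf '%s\n' "$shells"   # login name and shell, still joined by ':'
```

One caveat when using a space as the delimiter on log files: cut treats every single space as a separator, so a run of two spaces produces an empty field and shifts the field numbers.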
@isaacvicente can you tell me which terminal was used at the start? Vim? Connected to GitHub via a Linux terminal?
Wow. Applause for the quality of content 👏👏👏
ed works very well. Learn ed; it is simple to learn, and in fact it has almost everything you need. "." refers to the current line, and a line containing only "." ends input and returns to command mode. Use "a" (append) to begin entering text after the current line, and "w" to write before "q" to quit. ed is great for building scripts. You also need "p" to print lines, for example after searching, and "n" to print them with line numbers. Quitting without writing will leave the file untouched, no matter the number of edits.
Sam is awesome also
How I wish I could see this 4 or 5 years ago, when I started learning cs.
Love this lecture! I was surprised when his camera capture appeared on screen.
At 35:33 you can use sort's -u flag instead of sort | uniq, i.e. sort -u | ...
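A small sketch of that equivalence, using a made-up list of names. It also shows the case where the swap does not apply: once you want counts (uniq -c), sort -u cannot replace the pair.

```sh
# Hypothetical input: one name per line, with repeats.
names='root
admin
root
guest
admin
root'

# These two produce the same de-duplicated, sorted list.
with_uniq=$(printf '%s\n' "$names" | sort | uniq)
with_flag=$(printf '%s\n' "$names" | sort -u)

# Counting occurrences still needs sort | uniq -c.
counts=$(printf '%s\n' "$names" | sort | uniq -c | awk '{ print $2 "=" $1 }')

printf '%s\n' "$with_flag"
printf '%s\n' "$counts"
```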
really fascinating and helpful
Really helpful classes, glad that you shared it with us! 😀
One tip, if you're giving a lecture with regular expression, or even just bash/csh/tcsh shell - don't use a 'fancy' shell environment, especially one that ends in a nonstandard format of having a "|" pipe character at the end. It makes it confusing for the end-user and not necessary.
This session was gold. Thank you really.
This is a great video. I'm in the middle of this process as part of a machine learning video, thanks!
Great series of lectures, thank you
Incredible class. Thank you
the exercises for this lecture are kinda challenging for someone who is a complete beginner in regex. i know the first exercise they provided was a beginner regex tutorial, but the second question raised the roof on the difficulty.
It's MIT. The learning curve is steep.
In these examples, cat isn't necessary. You can just specify the file name as an argument to sed, and use one less process.
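For anyone who wants to see it, a minimal sketch of the three equivalent forms. The temp file and its contents are made up for the example.

```sh
# Create a small throwaway file to read from (hypothetical contents).
log=$(mktemp)
printf 'user root\nuser admin\n' > "$log"

# 1. cat feeding sed through a pipe (two processes).
with_cat=$(cat "$log" | sed 's/^user //')

# 2. File name passed to sed as an argument (one process).
with_arg=$(sed 's/^user //' "$log")

# 3. Input redirection, for tools that only read standard input.
with_redirect=$(sed 's/^user //' < "$log")

rm -f "$log"
printf '%s\n' "$with_arg"
```

All three give the same output; the difference is only the extra cat process and personal habit.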
Very cool. I knew about command-line utils like sed and awk, but never used them because I thought they were complicated. Now they make a bit more sense. Thanks.
By the way, this doesn't look like a lesson about data wrangling. More like: "how much stuff there is in shell and what can i do with it".
22:30 I saw lots of posts saying that sed doesn’t support non-greedy matching, neither GNU nor BSD. Correct me if I’m wrong, but I think that non-greedy matching won’t actually work on his machine. I tried the regex101 website; by default it uses PCRE mode. Is that why it works there?
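As far as I know that is right: the POSIX regex flavors that sed uses have no lazy quantifier like .*?, while PCRE does. A common workaround in sed, when the stop point is a single character, is a negated character class. A minimal sketch with a made-up input line:

```sh
line='a "one" b "two"'

# Greedy: .* runs to the LAST double quote on the line.
greedy=$(printf '%s\n' "$line" | sed -E 's/".*"/X/')

# Bounded: [^"]* cannot cross a quote, so it stops at the first closing one.
bounded=$(printf '%s\n' "$line" | sed -E 's/"[^"]*"/X/')

printf '%s\n' "$greedy"    # a X
printf '%s\n' "$bounded"   # a X b "two"
```

This trick only helps when the terminator is one character; for a multi-character terminator a PCRE tool is the simpler route.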
this man is not a man but a god
Hey!! Super amazing topic, and super amazing session, thanks!! ;)
Awesome lecturer!
Wait, at 13:13, does regex operate "in place"? In other words, if a regex is to modify a string, does it operate on the modified string as it's being processed? Understanding the behavior of this particular regex isn't terribly difficult, but the lecturer's wording made it sound like it behaves the way my question implies.
Actually, no, this does not appear to be the behavior:
$ echo 'abcaabbcb' | sed -E 's/(ab|bc)//g'
cab
If this were the case I would expect the final 'ab' to be removed as well, since once the inner 'abbc' is removed the remaining 'ab' _should_ be removed too, correct? Could be a case of "look-behind" or something. I'd appreciate input from someone who knows regex better than myself, thanks.
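A sketch that may help here: with the g flag, sed scans the original line once from left to right and never rescans text it has already passed, so matches created by earlier deletions are not seen. To get the "keep going until nothing matches" behavior you have to loop explicitly; the version below uses sed's label and t (branch if a substitution happened) commands.

```sh
input='abcaabbcb'

# One left-to-right pass over the original text.
single_pass=$(printf '%s\n' "$input" | sed -E 's/(ab|bc)//g')

# Remove the leftmost match, then start over on the modified line,
# until no substitution happens any more.
repeated=$(printf '%s\n' "$input" | sed -E -e ':a' -e 's/(ab|bc)//' -e 'ta')

printf '%s\n' "$single_pass"   # cab
printf '%s\n' "$repeated"      # ccb
```

Note that the looped result depends on the order in which matches are removed, so "repeat until stable" is not a single well-defined answer for every pattern.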
So this is why people go to MIT...
A lot of Linux users excluding ubuntu know these tools.
Yeah, and anyone doing sysadmin stuff for Linux should really learn these tools. At least learn awk, sed, grep (-P for Perl regex) and probably cut as well.
@@ezio934 Ubuntu trolls are everywhere 😂, but after using arch I also love arch. But please don't troll.
At [12:00] would it be the same result if we wrote: echo 'abcaba' | sed "s/ab//g"?
I think so. If we had a repeating pattern of 'ab's (e.g. abababababc), the asterisk would replace all of them at once with nothing, as opposed to one 'ab' at a time.
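A quick sketch to check this. When the replacement is empty, the two forms are indistinguishable in the output; the difference only shows once you replace with something visible. The second pair uses + rather than *, an assumption made here to keep empty matches out of the picture.

```sh
# Deleting: one 'ab' at a time vs. whole runs of 'ab' give the same text.
plain=$(echo 'abcaba' | sed 's/ab//g')
starred=$(echo 'abcaba' | sed -E 's/(ab)*//g')

# Replacing with a marker shows the difference.
per_pair=$(echo 'ababc' | sed 's/ab/X/g')        # each 'ab' becomes X
per_run=$(echo 'ababc' | sed -E 's/(ab)+/X/g')   # the whole run becomes one X

printf '%s %s\n' "$plain" "$starred"     # ca ca
printf '%s %s\n' "$per_pair" "$per_run"  # XXc Xc
```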
very wonderful lectures
Protip: If you want to *really* learn Regex, take some kind of parsing course.
Regexes are powerful enough that you could write a parser in it (because it's a state machine, behind the curtains - though this will likely make more sense after learning how to parse)
I'm a dirty Windows pleb, but I can still use regex via *visual studio code* to wrangle textual data. It's great.
Regexes correspond to finite state machines and cannot parse recursive structures. You need some kind of grammar to do that (and a tool like GNU Bison).
A parser parses context-free grammars, which cannot be implemented using regular expressions since regex can only process a regular language (CFGs are more expressive than regular languages). We use regex in creating a token generator i.e. a lexical analyzer.
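To illustrate the limit being described: checking that parentheses are balanced to arbitrary depth needs a counter (in effect a stack), which a pure regular expression does not have. A minimal sketch in plain sh; the function name balanced is made up for this example.

```sh
# Returns success if every '(' in the argument has a matching ')'.
balanced() {
  depth=0
  rest=$1
  while [ -n "$rest" ]; do
    ch=${rest%"${rest#?}"}   # first character of what is left
    rest=${rest#?}           # drop that character
    case $ch in
      '(') depth=$((depth + 1)) ;;
      ')') depth=$((depth - 1))
           # A ')' with nothing open can never be repaired later.
           if [ "$depth" -lt 0 ]; then return 1; fi ;;
    esac
  done
  [ "$depth" -eq 0 ]
}

for s in '(a(b)c)' '(()' ')('; do
  if balanced "$s"; then echo "$s: balanced"; else echo "$s: unbalanced"; fi
done
```

The depth counter is the part that has no equivalent in a finite state machine, because the nesting can be arbitrarily deep.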
At my school they do this sort of stuff in a lab session for a semester.
46:13 to 49:08 literally freaks me out!
That's actually very common for Linux users.
Nice lecture! I miss the professor repeating the question asked. It's hard to hear.
Yeah, I too was flattered that he cares for us internet consumers
MIT quality, huh?
how do we give syntax highlight to the terminal? their terminal looks awesome
Thanks for the lecture. Can you tell me which plugin colorizes the command-line arguments?
github.com/zsh-users/zsh-syntax-highlighting
Great lecture!
04:24 I don't know why no one in the audience applauds and shouts WOW YEAH like they do when they attend Apple conferences!
you are stupid?... IDK
Can anyone tell me how his commands show up in red until he completes them?
The regex at 27:10 scared the ****** out of me
Holy fuck. That completely blew me away. My god.
Thank you so much !! inspiring ❤
Gold!
It seems Regex > SQL because you have much of the same functionality and don't have to construct/manipulate crazy complicated tables, so long as you structure your files correctly (in the regex case)???
Thank you very much.
Which terminal are you using? zsh, fish? Where do we look to get the exact same terminal settings as you?
I think he said in the lecture that he was using fish
thank you for sharing nice lecture
People are commenting on how helpful this lecture was. Helpful how?
If you already know the commands, syntax, options, shell substitution, etc, then you already know this stuff.
If you are learning Linux (and I imagine that the students are there to learn Linux), are they really supposed to retain the blizzard of commands, brackets, braces, pipes, etc, from watching someone whip through several examples?
At the end of his lecture (at 49:44), he asks "Any questions about what we've covered so far?"
Not one question from anyone in the classroom.
No one asking a question is not a sign that they understood. Rather, it is a sign that they are all completely lost.
This instructor's goal was to show off his Linux skills; not impart knowledge to his students.
Be honest. If you were in that classroom, would you have retained anything useful?
Would you be able to sit at the terminal and repeat even 10% of his examples?
The point of taking a class is to learn the subject matter; not to sit in awe of the instructor's wizardry.
I would rather learn and retain 25% of the topics, than fumble through 100% of the topics.
@Perhaps, you are mistaken. The lecturer is not showing off his skills. He is showing what possibilities lie ahead with the tools at hand in Linux. The purpose of a lecture is not to help you memorize course content, but to show you what is available for you to explore.
"Education is not the learning of facts, but the training of the mind to think," as Albert Einstein said.
When you know there are tools to do this stuff, you can think of creative ways to use them in your favour.
@@prasannarajaram If I were paying tens of thousands of $$ to learn a skill, then I want to learn the skill. I do not want to leave, asking myself:
"What did I just sit through?"
"Did I pay big $$ for that?"
"Is there a next step where I will be taught to use those commands?" (it was not mentioned in the video).
When I went to school, every class taught me a tangible skill. I was asked questions, because the instructor took an interest in ensuring that the pupils understood her teachings. After listening to the teacher, I worked on actual examples, so that my brain performed the actions and processed the material, so that it would stick with me. The instructor did not jump from chapter to chapter at breakneck speed.
It is fine for the instructor to show some of the advanced items, and give a general introduction to what it entails. But then he needs to break down each part, and spend some time on each part; giving the students time to absorb the material.
The name of this class was not "Overview of the power of Linux".
This class did not expand the mind. It spun it in circles. He might as well have tossed in some python and some perl scripting. They wrangle data, too.
No one needs to pay the cost of a new car to hear and watch his lecture.
The same "look what you can do with Linux" content can be found in countless, free on-line videos, found all over social media.
@Perhaps Are you worried for what these students pay? Are you the controller of the faire price? Let anyone judge by themselves that. But if you are worried, then there are more important good causes. I'm a poor guy from Argentina :) read the news of my country. The videos are a good first approach to the subject. Thanks to the people who pay for that, and to the people who take the money and share it with all of us.
@@fedemoreno613 .
-- Am I worried for what students pay?
I never gave it a thought, any more than I thought about how much students pay for transportation, clothing, nutrition, etc. I am puzzled by why you asked that question.
Are you asking because you worry, and you want to compare your worries with me?
-- Am I the controller of the faire price?
I do not understand your question.
-- Let anyone judge by themselves that.
Judge what by themselves?
Whatever they are judging is fine with me. Why would you think it (whatever "it" is) would not be fine with me? I don't understand why you would ask?
I am also a poor guy in Argentina.
We live in a small world.
@@NoEgg4u By "fair price" I was referring to the answer you gave to Prassana. What a small world! Maybe we are a block from each other. It was 2am and I could not sleep, and this video was useful to me, and then I read a comment with a thumbs down: your comment. You said "No one needs to pay the cost of a new car to hear and watch his lecture", and I said to you "Laissez-faire".
what font is that? I really like it!
@Bloatman McEmacs Looks like input or firacode. Inconsolata is a sans font. EDIT: So I did some digging because I was also interested in the font haha, and he says in one of his videos from his channel that it's Noto Sans Mono (Noto Mono?)
In his dotfiles, the alacritty config, it says it is Noto Sans Mono. github.com/jonhoo/configs/blob/master/gui/.config/alacritty/alacritty.yml
Computers are beautiful
I'm trying the exercises at the moment and I can't get it working (been trying for some hours now). Are there solutions anywhere? I'm on #2
I figured out a RegEx to find the words that match the criteria: (.*(a|A).*){3}.*[^s]$ but how do I search for these specific words?
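One way to run such a search is to feed the word list through grep. The sketch below uses a made-up six-word list instead of a real dictionary file, and it splits the two criteria into two greps, which is an assumption about what the exercise wants (at least three a's, and not ending in 's). Note that the [^s]$ in the regex above rejects every word ending in s, including plain plurals.

```sh
# Hypothetical stand-in for a dictionary file, one word per line.
words="banana
Alabama
bananas
banana's
cat
Canada's"

# Keep words with at least three a/A, then drop those ending in 's.
matches=$(printf '%s\n' "$words" | grep -iE '(.*a){3}' | grep -v "'s\$")
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

printf '%s\n' "$matches"
echo "count: $count"
```

With a real dictionary you would replace the printf with the file name as grep's last argument, for example a path like /usr/share/dict/words if that file exists on your system.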
I noticed you have a user called "kodi" 29:07
cat ssh.log | less?? Why not less ssh.log?
The cat command is the standard way to convert one or more files into a standard input stream that can then be followed by possibly multiple piped commands.
If you work with piped command lines a lot, "cat input-file" is already typed automatically before you start wondering "what shall I do next with it"...
Daksh is right, but just as ekkotron said and I agree: force of habit.
It's meant for normies. Most intermediate Linux users know this.
It's just like the first time you learn how to declare a var: int a; a = 10;
Sweet Jesus! Cat piped to a sed regex piped to sort piped to uniq piped to awk.... PIPED TO R!
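For readers who want to try a chain like that on something small, here is a minimal sketch. The log lines are invented and only loosely imitate sshd messages, the threshold in awk is arbitrary, and the final R step is left out so that nothing beyond the standard tools is needed.

```sh
# Hypothetical log excerpt.
log='Jan 17 sshd: Invalid user admin from 10.0.0.1
Jan 17 sshd: Invalid user root from 10.0.0.2
Jan 17 sshd: Invalid user admin from 10.0.0.3
Jan 18 sshd: Invalid user admin from 10.0.0.1
Jan 18 sshd: Invalid user guest from 10.0.0.4'

top=$(printf '%s\n' "$log" \
  | sed -E 's/.*Invalid user ([^ ]+) from.*/\1/' \
  | sort | uniq -c \
  | sort -k1,1nr -k2,2 \
  | awk '$1 > 1 { print $2, $1 }')

# sed keeps only the user name, sort | uniq -c counts each name,
# the second sort puts the highest count first, and awk keeps
# names seen more than once.
printf '%s\n' "$top"
```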
Cool!
I'd have clicked Like 100 times if I could.
Others keep talking about the new trend of data mining and data science with Python, while this guy did all of this just by piping a bunch of bash commands together.
You should check out Jeroen Janssens' "Data Science at the Command Line" www.datascienceatthecommandline.com/ He had a YouTube video back when he was still working on the book but I can't find it right now.
@@user-sd2en6pn3z thanks for sharing...this is gold.
Basically, this is sysadmin 101.
Can't you just import the huge file into Python and use a normal programming language and editor rather than typing it into the cmd line?
The tools in Linux are extraordinarily powerful. Don’t let the command line simplicity deceive you.
Weird, I don't see the option -E in the manual for sed.
There are different implementations of sed. The default one on Mac does have the -E option.
Things that my school didn't teach me
Just a resource for learning/testing your regex knowledge: regexcrossword.com/ - I am currently stuck at the Shakespearean ones because I lack the knowledge / English is a 2nd language for me ;)
lots of cat abuse
Amazing lecture. Thank you.