Aside from "AHA!" moments there are "hell yeah!" moments, when you manage to find clear, short educational material that covers the core conceptual model, so that you don't have to read the docs to figure out the gist of it! I love docs, but life is too short to read them all :) Thanks a lot! You've saved two days of my life! Multiply that by the number of views (~13k) and you've got almost one human life saved! Great job!
I cannot stress enough how wonderfully things were explained. Cheers man!
Man we need more of these intermediate in-depth tutorials. So well made.
Every other video tutorial just talks about basic git stuff branches, commits, etc.
This is so well explained! I got a deeper appreciation of how cleverly designed git is.
I never knew each time I do git commit it tracks every single file; imagine a project with hundreds of files!
What did that poor keyboard ever do to you? :)
Wonderful tutorial! But the Whac-A-Mole game with the keyboard is very loud.
This is the best ever low level internals explanation on how git works (like Ben Eater). A gift for us low level fans
Beyond helpful because of its level of detail and precision.
Thank you sir for this video! Your efforts into making a simplified free video is much appreciated by us all curious learners!!
Great video! Explains in detail how Git works under the hood. The last two examples were an icing on the cake. Thanks for putting this together!
Wow, so glad I watched this. These 30 minutes gave me so much insight into git's design (in a hands-on way!!) and made using git instantly easier. THANK YOU!
I send all my new devs this video! SO key in understanding how to use git.
awesome video so far, but I believe around minute 19:20 when you say "this commit points to 2 files" you are actually talking about the new tree pointing to foo.txt and bar.txt
Great video. Exactly what I was looking for! Thank you. Every so called "weird/scratch your head" moment in git makes sense now.
Brilliant demo of the internals of git! thanks for the information...helps with a good foundation knowledge to understand git commands..thank you David for helping understand.
Thanks for the video. Really accessible and useful
Nice and crisp explanation! It was good
I never feared the SHA, but now I understand the SHA. Thank you
Very interesting tutorial on how the .git folder is affected internally by some basic git commands. For me it's as important for a git user as knowing how a CPU works is for a developer :) Thank you David for this introduction, it will for sure allow me to understand and research even deeper plumbing of git in the future :)
I wasn't afraid of the shas before but this took it to a different level. Thank you.
Hey Thomas, you made my day. Thank you so much .
This is CRAZY. Thank you so much!
Awesome explanation!
This took me an hour. 30 minute watching git internals video, 30 minute writing python function handling creating/deleting folder with 100 files :D
So let's say that in my git repo I change every file each time I commit. Would git then have to allocate disk space roughly equal to the size of the repo multiplied by the number of commits?
Also great video!
Thank you! You explained what happens in git internally in a very easy and clear way!
Great video. Is there a software tool to visualize ALL the sha references?
I know you can see the commit DAG, but I want to see the commit+tree+blob DAG, like what was visualized at @10:37
Like, are there any pyviz-like tools that can automatically generate the clean, beautiful diagram you made at @10:37, but for any Git repo?
It’s clear you could make the diagram by crawling through the objects file but that seems frustrating and unnecessary
Beautiful. Looking at the hashes and the contents of each file referring to (or named after those) hashes you begin to gain understanding... of just what the eff they have got inside, what they are supposed to be, and what they are connected to. So explicit that pleases and hurts. Kudos for teaching us the most important git command (and how to refer to each of them by the first 6 digits of the hash.): $ git cat-file -t dc23ab / git cat-file -p dc23ab
Really clearly explained and well presented - thanks for creating this!
Awesome video!!!!! It is exactly what I was looking for.
This was very helpful in understanding git. Thanks
Crazy video amazing explanation 🤩
I'm not even a developer, just an infrastructure guy, but this was an excellent explanation.
Amazing. Cleared a lot of things for me.
Wonderful explanation. That's what I was looking for. Thanks man.
How come git add results in adding blob to the objects directory? Shouldn't it be done after running git commit command?
Goes deep on just one thing ..what's behind the 40 char SHA and makes it easy!! Thanks much!! 😊
Awesome talk, David! Thank you!
Thanks much! Really helps a lot to start understanding the internals and the data structure behind it.
AHA moment worked, thank you!!
Fantastic video!
Fantastic tutorial! Now it's clear to me when we say Git doesn't store diffs.
It kinda does in pack files, but only as a space-saving optimization, not as a part of its core model.
Thank you for this. Clearly understood the basic internal workings of git.
So just to be clear there are duplications in blobs every time we change a file?
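A rough way to see what does and does not get duplicated: the sketch below models the object store as a plain Python dict keyed by blob ID. The helper names (`blob_id`, `commit_snapshot`) and the file contents are made up for illustration; this is a toy model, not git's actual code. It assumes only what the video shows, namely that a blob's ID is derived from its content.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 over git's blob header plus the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def commit_snapshot(store: dict, files: dict) -> dict:
    """Record every file of a snapshot; identical content reuses the same blob."""
    snapshot = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # no new object if the content is already stored
        snapshot[path] = oid
    return snapshot


store = {}
first = commit_snapshot(store, {"a.txt": b"one\n", "b.txt": b"two\n", "c.txt": b"three\n"})
second = commit_snapshot(store, {"a.txt": b"one\n", "b.txt": b"two, edited\n", "c.txt": b"three\n"})

print(len(store))  # 4 blobs: three from the first snapshot, one new blob for the edited b.txt
```

So a changed file gets a whole new blob (a full copy, not a diff), while unchanged files cost nothing extra. If every file changes in every commit, the loose objects really do grow roughly as repo size times number of commits, until pack files delta-compress them.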
Amazing video! RIP keyboard
I'm a little late to the party but:
If this is "the information manager from hell" I can't imagine how the others are..... 🙂
Great job on explaining the object structure and how git does things in a practical and simple manner. Thank You Very Much!
amazing explanation
What I still don't understand is how git is able to change the file system view. When you cd into a directory, you'll see the files/dirs of that commit reference. When you go to a different git branch, the file system view changes along with it. ext4 and other file systems also use references (inodes) to display files. Is git just a sort of overlay file system? If so, how would that work with so many different file systems and operating systems?
It really doesn't make any special changes to the file system itself - it's not FUSE. When you git checkout a branch, it first changes the HEAD to be the new hash or the new branch. Once the HEAD is sorted, it'll go ahead and figure out what changes it needs to make to your worktree. After that, it's a matter of changing all the files in the directory, which can be done without knowledge of the filesystem.
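To make the "figure out what changes it needs to make" step concrete, here is a minimal sketch. It assumes each commit's snapshot can be flattened into a `path -> blob ID` mapping; `plan_checkout` and the short IDs are hypothetical, and real git also consults the index and refuses to overwrite uncommitted local changes.

```python
def plan_checkout(current: dict, target: dict):
    """Compare two snapshots (path -> blob ID) and list the file operations needed."""
    # Files that are new, or whose content differs in the target snapshot.
    to_write = {path: oid for path, oid in target.items() if current.get(path) != oid}
    # Files that exist now but are absent from the target snapshot.
    to_delete = sorted(path for path in current if path not in target)
    return to_write, to_delete


# Placeholder IDs; real ones would be 40-character SHA-1 hashes.
master = {"foo.txt": "aaa111", "bar.txt": "bbb222"}
foobranch = {"foo.txt": "aaa111", "bar.txt": "ccc333", "baz.txt": "ddd444"}

to_write, to_delete = plan_checkout(master, foobranch)
print(to_write)   # files to (re)write from the object store
print(to_delete)  # files to remove from the directory
```

Everything after the plan is ordinary file writes and deletes, which is why it works the same on ext4, NTFS, APFS, and so on.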
Fantastic content. Learnt a lot from this.
Awesome video, loved the exercise, I learned a lot from it, thank you!
Great content! Thanks for sharing!
Great explanation, thanks a lot🌺
Excellent presentation.
In your Git model PDF you showed a single commit having multiple trees following parent-child pattern. However, practically when you showed the contents of a commit it just held reference to one single tree + one single tree only held reference to one/multiple blobs. What are we missing ?
tree objects are flat lists of references. References to both blobs and other trees. If the db needs to represent a sub-directory in your repo you would see a tree reference inside a tree. The example did not have any files in sub-directories.
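Here is a small sketch of that nesting, in case it helps. It builds tree objects in memory in what I understand to be git's on-disk tree layout (`<mode> <name>\0<20 raw hash bytes>` per entry, with sub-directories sorted as if their name ended in `/`). The function names and file names are made up, and only regular files (mode `100644`) are handled.

```python
import hashlib


def hash_object(store: dict, obj_type: str, content: bytes) -> str:
    """Store an object under the SHA-1 of its header plus content and return the ID."""
    payload = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(payload).hexdigest()
    store[oid] = (obj_type, content)
    return oid


def write_tree(store: dict, files: dict) -> str:
    """files maps 'dir/name.txt' style paths to bytes; returns the root tree ID."""
    blobs, subdirs = {}, {}
    for path, content in files.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = content  # belongs to a sub-directory
        else:
            blobs[head] = content
    entries = []
    for name, content in blobs.items():
        entries.append((name, b"100644", hash_object(store, "blob", content)))
    for name, children in subdirs.items():
        # A sub-directory becomes a tree entry that points at another tree object.
        entries.append((name + "/", b"40000", write_tree(store, children)))
    entries.sort(key=lambda entry: entry[0].encode())
    body = b"".join(
        mode + b" " + name.rstrip("/").encode() + b"\0" + bytes.fromhex(oid)
        for name, mode, oid in entries
    )
    return hash_object(store, "tree", body)


store = {}
root = write_tree(store, {"foo.txt": b"foo\n", "src/bar.txt": b"bar\n"})
print(root)
print(sorted(obj_type for obj_type, _ in store.values()))  # two blobs, two trees
```

With only top-level files you get exactly one tree, which matches what the demo showed; the second tree appears only once a file lives in a sub-directory.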
Does anyone know what music was used for the intro?
So many aha moments! And so well explained!
Thanks for this excellent video. Awesome the last surprise part
This video is amazing! Thank you so much.
Great video!
Thank you very much for this video!
Awesome lesson!!
Outstanding presentation of the fundamental git add/commit semantics. Do you have an overview video on fetch/merge/pull?
Suggestion: you mention branches in the intro and towards the end of the presentation you give a simple example of creating a new branch (named foobranch). It would be useful to point out the effect of the first commit of foobranch to HEAD and the pointer values of foobranch and master both before and after the commit. That all begs for a second excellent video on branching.
The subtleties of file timestamps are also interesting to comment on. Hint, Git does not record file timestamps.
BTW I like the sound of your keyboard. (-:
Thank you! [bibs on the wall=trail?] I'll google some more and try it for myself but it would be great to see an example where you rename a file. I think in those cases, we lose the history of the changes to a file? i.e: this would be considered "deleted a file" + "added a file". I'm not sure I understand Linus' philosophy on that one... That's where I'm still confused... Just starting up on git; our team uses SVN and, migrating to git, many are worried that git "loses commit history" when refactoring/renaming files. [ignore svn2git issues, I'm talking about a project already in git]
Such a good video. Thanks ton man!
Amazing video .. cheers !!
Is git also making a chain of hashes, like a blockchain?
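In the hash-chain sense, yes: a commit object names its parent's hash, so each commit ID depends on all the history behind it. The sketch below assumes the usual commit layout (`tree`, `parent`, `author`, `committer` lines, a blank line, then the message); the `commit_id` helper and the author string are placeholders. What git does not have is the consensus or proof-of-work part of a blockchain.

```python
import hashlib

# ID of the empty tree, used so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parents, message, who="A U Thor <author@example.com> 0 +0000"):
    """Hash a commit object built from a tree ID, parent IDs and a message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = "\n".join(lines).encode() + b"\n"
    payload = b"commit " + str(len(body)).encode("ascii") + b"\0" + body
    return hashlib.sha1(payload).hexdigest()


root = commit_id(EMPTY_TREE, [], "first")
child = commit_id(EMPTY_TREE, [root], "second")

# Rewriting the first commit changes its ID, and therefore the ID of every descendant.
edited_root = commit_id(EMPTY_TREE, [], "first (edited)")
child_of_edited = commit_id(EMPTY_TREE, [edited_root], "second")

print(child != child_of_edited)  # True
```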
12:57 this is how the video should start, with the slide show coming after, or in between the code and slides.
The slides are eye glazing material without the code
Thanks a lot. Very informative video.
Awesome. Really useful
Excellent !!
Thx a lot, glorious tutorial!
Thank you, all git commands are just playing with tree, blob, and commit objects :) Thanks for the nice explanation
This is what I need!
Very nice! Thanks.
Awesome video
This is gold
is there any alternative to watch command on windows and mac os x?
very informative . thank you
nice way of explanation.
Confused why he kept re-adding files (git add ....). In all my git experience I add it the first time and never have to add again.
Thank you for this video
I read that sha1 is a cryptographic algo, for which encrypting is easy but decrypting the result is nearly impossible. If git hashes all contents to sha1 hashes, how does it decrypt the contents so fast and correctly when, say, changing branch?
Git doesn’t store the objects in an encrypted form. They are compressed and the hash is used as a name in a phone book to look them up.
You can’t decrypt a sha1 hash into its original input. You can, however, search for another input that gives the same hash, although for a good hash function that is very hard. A match like that is called a collision. Password cracking works by trying variants of known passwords and the dictionaries of various languages to find words with matching hashes. The longer and more random a password is, the more attempts are required to find a match.
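The "phone book" lookup can be sketched in a few lines. This assumes the loose-object layout seen in the video, where the first two hex characters of the hash name a folder under `.git/objects` and the rest name the file, and the file holds the zlib-compressed header plus content. `write_object` and `read_object` are illustrative names, and a temporary directory stands in for a real repo.

```python
import hashlib
import os
import tempfile
import zlib


def write_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Compress header + content and file it under a path derived from its SHA-1."""
    payload = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    oid = hashlib.sha1(payload).hexdigest()
    folder = os.path.join(objects_dir, oid[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, oid[2:]), "wb") as fh:
        fh.write(zlib.compress(payload))
    return oid


def read_object(objects_dir: str, oid: str):
    """Look the object up by hash; nothing is decrypted, only decompressed."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as fh:
        payload = zlib.decompress(fh.read())
    header, _, content = payload.partition(b"\0")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, content


with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_object(objects_dir, "blob", b"hello\n")
    restored = read_object(objects_dir, oid)

print(restored)  # ('blob', b'hello\n')
```

The hash is only the key; the content is read straight back from the file stored under that key, which is why switching branches is fast.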
Start at 3:40.
What utility did you use to split the shell?
1. He's using the screen command. You can install it with "apt install screen".
2. I'd suggest installing terminator on Linux. It's a very smooth terminal splitter and works like a charm.
What is the purpose of local repository?
Let's take a case that we have central repository and our workspace only. What difference will local repository make?
You might want to check out Linus Torvalds' talk on Git, particularly where he emphasizes the distributed nature of Git and how it's superior to centralized version control systems like SVN.
Thank you!
How do you view the content of a blob if it's not a text file? Say it's an image or a zip file or something.
Too good to be true!!
This would be amazing if it would've been correct since you say trees have the complete current state as blobs, but at 11:05 you show trees pointing to other trees.
9:50 "The contents of these files is *encrypted"*
That's not the term you were looking for.
Git storing objects in a compressed, application-specific format is not a form of encryption.
It should be like that right? "Git stores the content in a compressed format and creates the SHA1 based on the compressed contents meaning compressed data are the parameter to produce the sha1 keys"?
@@abdulmatin3208 SHA1 is not a cypher. It doesn't encrypt stuff.
SHA-1 is a cryptographic hash function.
A hash function maps data of a variable size to values of a fixed size (hashes) - 20 bytes long in the case of SHA-1.
The "cryptographic" part means that it's considered to have a set of properties that makes it useful in cryptographic applications.
Perhaps the most important property being that it's very hard to find hash collisions, i.e. two inputs that produce the same output.
Git uses SHA-1 as the hash function of its *content addressable storage* system.
Content addressable storage is a way to organize stored data where the identifier for a piece of data (a git object in our case) is derived from the data itself.
To summarize: git objects are stored in a simple, unencrypted format that happens to use compression to save space. SHA-1 is only used to *identify* the objects in git's content addressable storage system.
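On the question above about whether the SHA-1 is computed from the compressed data: as far as I know it is the other way round. The hash is taken over a short header plus the uncompressed content, and compression is applied only when the object is written to disk. A minimal sketch (function names are mine, not git's):

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 of 'blob <size>\\0' followed by the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes, level: int = 6) -> bytes:
    """Bytes stored on disk for a loose blob; compression does not affect the ID."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content, level)


empty_id = git_blob_id(b"")
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known empty-blob ID
```

Because the ID depends only on the uncompressed bytes, the same file gets the same ID no matter which compression level (or pack file) ends up holding it.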
brilliant!
The famous Butterfly keyboard it is.
Thanks sir 🙏🏼
Thank you
It's crazy I really thought that internally git would store diffs.
stupendous
why you so angry at your keyboard man! For real tho it's distracting :) good video!
20:45
Doug DeMuro This is the new Bmw x5 ......
Voila !
What terminal is he using? Doesn't look like gnome-terminal
1. He's using the screen command. You can install it with "apt install screen".
2. I'd suggest installing terminator on Linux. It's a very smooth terminal splitter and works like a charm.
@@AliAnwarwish thanks.
@@shivanshhanda7553 Hey bro. You can install tmux too. I'm using it rn.
Hey man, I switched back to gnome-terminal as terminator was showing strange characters when using. Will give tmux a try, thanks.
Edit: Also, I found gnome-terminal to be faster than terminator, eg when I cd to a big git repo, terminator takes more time to load.
@@shivanshhanda7553 Yes I do agree. But tmux is better at session management too. Yeah nice, worth a try.
It's "working copy" not working directory