aside from "AHA! moments" there are "hell yeah!" moments when you manage to find a clear and short educational material that covers core conceptual model, so that you wouldn't have to read docs to figure out the gist of it! I love docs, but life is too short to read them all) Thanks a lot! You've saved two days of my life! Multiply that by the number of views ~13k and you've got almost one human life saved!) Great job!
Man we need more of these intermediate in-depth tutorials. So well made.
Every other video tutorial just talks about basic git stuff branches, commits, etc.
I cannot stress enough how wonderfully things were explained. Cheers man!
This is the best ever low level internals explanation on how git works (like Ben Eater). A gift for us low level fans
This is so well explained! I got a deeper appreciation of how cleverly designed git is.
I never knew each time I do git commit it tracks every single file; imagine a project with hundreds of files!
What did that poor keyboard ever do to you? :)
Wonderful tutorial! But the Whac-A-Mole game with the keyboard is very loud.
Great video! Explains in detail how Git works under the hood. The last two examples were an icing on the cake. Thanks for putting this together!
Beyond helpful because of its level of detail and precision.
wow, so glad I watched this. These 30 min gave me so much insight into git's design (in a hands-on way!!) and made using git instantly easier. THANK YOU!
Thank you sir for this video! Your efforts into making a simplified free video is much appreciated by us all curious learners!!
I never feared the SHA, but now I understand the SHA. Thank you
Great video. Exactly what I was looking for! Thank you. Every so called "weird/scratch your head" moment in git makes sense now.
I wasn't afraid of the shas before but this took it to a different level. Thank you.
I'm not even a developer, just an infrastructure guy, but this was an excellent explanation.
Very interesting tutorial on how the .git folder is impacted internally by some basic git commands. For me, this is as important for a git user as knowing how a CPU works is for a developer :) Thank you David for this introduction, it will for sure allow me to understand and research even deeper git plumbing in the future :)
I send all my new devs this video! SO key in understanding how to use git.
Brilliant demo of the internals of git! thanks for the information...helps with a good foundation knowledge to understand git commands..thank you David for helping understand.
awesome video so far, but I believe around minute 19:20 when you say "this commit points to 2 files" you are actually talking about the new tree pointing to foo.txt and bar.txt
Thank you! You explained what happens in git internally in a very easy and clear way!
I'm a little late to the party but:
If this is "the information manager from hell" I can't imagine how the others are..... 🙂
Great job on explaining the object structure and how git does things in a practical and simple manner. Thank You Very Much!
Fantastic tutorial! Now it's clear to me when we say Git doesn't store diffs.
It kinda does in pack files, but only as a space-saving optimization, not as a part of its core model.
Really clearly explained and well presented - thanks for creating this!
Amazing. Cleared a lot of things for me.
This took me an hour. 30 minute watching git internals video, 30 minute writing python function handling creating/deleting folder with 100 files :D
Hey Thomas, you made my day. Thank you so much .
Amazing video! RIP keyboard
Thanks much! Really helps a lot to start understanding the internals and the data structure behind it.
Wonderful explanation. That's what I was looking for. Thanks man.
Thank you for this. Clearly understood the basic internal workings of git.
Awesome talk, David! Thank you!
Beautiful. Looking at the hashes and the contents of each file referring to (or named after those) hashes you begin to gain understanding... of just what the eff they have got inside, what they are supposed to be, and what they are connected to. So explicit that pleases and hurts. Kudos for teaching us the most important git command (and how to refer to each of them by the first 6 digits of the hash.): $ git cat-file -t dc23ab / git cat-file -p dc23ab
Nice and crisp explanation! It was good
Awesome video!!!!! It is exactly what I was looking for.
Thanks for this excellent video. Awesome the last surprise part
Goes deep on just one thing ..what's behind the 40 char SHA and makes it easy!! Thanks much!! 😊
So many aha moments! And so well explained!
This is CRAZY. Thank you so much!
Awesome video, loved the exercise, I learned a lot from it, thank you!
What I still don't understand is how git is able to change the file system view. When you cd into a directory, you'll see the files/dirs of that commit reference. When you go to a different git branch, the file system view changes along with it. ext4 and other file systems also use references (inodes) to display files. Is git just a sort of overlay file system? If so, how would that work with so many different file systems and operating systems?
It really doesn't make any special changes to the file system itself - it's not FUSE. When you git checkout a branch, it first changes the HEAD to be the new hash or the new branch. Once the HEAD is sorted, it'll go ahead and figure out what changes it needs to make to your worktree. After that, it's a matter of changing all the files in the directory, which can be done without knowledge of the filesystem.
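To make the reply above concrete, here is a minimal Python sketch of the "figure out what changes to make, then change the files" step. It is a toy model, not git's actual implementation: snapshots are plain dicts of path to content, and the names `plan_checkout`, `apply_checkout`, `master` and `foobranch` are made up for illustration. The point is only that ordinary file operations are enough, so no special filesystem support is needed.

```python
import tempfile
from pathlib import Path


def plan_checkout(current: dict, target: dict):
    """Compare two snapshots (path -> bytes) and return (to_write, to_delete)."""
    # Write every file that is new or whose content differs in the target.
    to_write = {p: data for p, data in target.items() if current.get(p) != data}
    # Delete every file that exists now but not in the target.
    to_delete = [p for p in current if p not in target]
    return to_write, to_delete


def apply_checkout(worktree: Path, current: dict, target: dict) -> None:
    """Apply the plan with plain file operations (empty dirs are left behind)."""
    to_write, to_delete = plan_checkout(current, target)
    for rel in to_delete:
        (worktree / rel).unlink()
    for rel, data in to_write.items():
        dest = worktree / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)


# Hypothetical snapshots standing in for the trees of two branches.
master = {"foo.txt": b"foo\n", "bar.txt": b"bar\n"}
foobranch = {"foo.txt": b"foo v2\n", "docs/baz.txt": b"baz\n"}

with tempfile.TemporaryDirectory() as tmp:
    worktree = Path(tmp)
    apply_checkout(worktree, {}, master)         # first checkout into an empty dir
    apply_checkout(worktree, master, foobranch)  # switch "branches"
    result = sorted(
        p.relative_to(worktree).as_posix()
        for p in worktree.rglob("*")
        if p.is_file()
    )
    print(result)
```

After the second call the directory holds only the files of the target snapshot, which is all "the file system view changes" really means here.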
Great explanation,Thanks a lot🌺
Outstanding presentation of the fundamental git add/commit semantics. Do you have an overview video on fetch/merge/pull?
Suggestion: you mention branches in the intro and towards the end of the presentation you give a simple example of creating a new branch (named foobranch). It would be useful to point out the effect of the first commit of foobranch to HEAD and the pointer values of foobranch and master both before and after the commit. That all begs for a second excellent video on branching.
The subtleties of file timestamps are also interesting to comment on. Hint, Git does not record file timestamps.
BTW I like the sound of your keyboard. (-:
Thank you, all git commands are just playing with tree, blob and commit objects :) thanks for the nice explanation
Excellent presentation.
Awesome explanation!
This video is amazing! Thank you so much.
Fantastic content. Learnt a lot from this.
AHA moment worked, thank you!!
Awesome lesson!!
Great content! Thanks for sharing!
Great video. Is there a software tool to visualize ALL the sha references?
I know you can see the commit DAG, but I want to see the commit+tree+blob DAG, like what was visualized at @10:37
Like, are there any pyviz-like tools that can automatically generate the clean, beautiful diagram you made at @10:37, but for any Git repo?
It’s clear you could make the diagram by crawling through the objects directory, but that seems frustrating and unnecessary
So let's say I change every file in my git repo each time I commit: would git then have to allocate disk space roughly equal to the size of the repo multiplied by the number of commits?
Also great video!
Fantastic video!
9:50 "The contents of these files is *encrypted*"
That's not the term you were looking for.
Git storing objects in a compressed, application-specific format is not a form of encryption.
It should be like that right? "Git stores the content in a compressed format and creates the SHA1 based on the compressed contents meaning compressed data are the parameter to produce the sha1 keys"?
@@abdulmatin3208 SHA1 is not a cypher. It doesn't encrypt stuff.
SHA-1 is a cryptographic hash function.
A hash function maps data of a variable size to values of a fixed size (hashes) - 20 bytes long in the case of SHA-1.
The "cryptographic" part means that it's considered to have a set of properties that makes it useful in cryptographic applications.
Perhaps the most important property is that it's very hard to find hash collisions, i.e. two inputs that produce the same output.
Git uses SHA-1 as the hash function of its *content addressable storage* system.
Content addressable storage is a way to organize stored data where the identifier for a piece of data (a git object in our case) is derived from the data itself.
To summarize: git objects are stored in a simple, unencrypted format that happens to use compression to save space. SHA-1 is only used to *identify* the objects in git's content addressable storage system.
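A small Python sketch of the "identifier is derived from the data itself" idea. It assumes the usual loose-object layout (a `<type> <size>` header, a NUL byte, then the content); the function names `git_object_id` and `compressed_form` are made up for this example. Note that the ID is the SHA-1 of the uncompressed header plus content, and compression is applied separately for storage.

```python
import hashlib
import zlib


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """SHA-1 over '<type> <size>\\0' + content, computed on uncompressed bytes."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def compressed_form(content: bytes, obj_type: str = "blob") -> bytes:
    """What would be written to disk: the same bytes, zlib-compressed."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return zlib.compress(header + content)


data = b"hello\n"
print(git_object_id(data))

# Compression is reversible and plays no part in the ID:
# decompressing and hashing again gives the same identifier.
restored = zlib.decompress(compressed_form(data))
print(hashlib.sha1(restored).hexdigest() == git_object_id(data))
```

If you have git installed, `echo hello | git hash-object --stdin` should print the same ID as the first line, and identical content always maps to the same ID no matter which file it came from.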
Crazy video amazing explanation 🤩
So just to be clear there are duplications in blobs every time we change a file?
12:57 this is how the video should start, and then the slide show after, or in between the code and slides
The slides are eye glazing material without the code
In your Git model PDF you showed a single commit having multiple trees following parent-child pattern. However, practically when you showed the contents of a commit it just held reference to one single tree + one single tree only held reference to one/multiple blobs. What are we missing ?
tree objects are flat lists of references. References to both blobs and other trees. If the db needs to represent a sub-directory in your repo you would see a tree reference inside a tree. The example did not have any files in sub-directories.
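Here is a rough Python sketch of a tree that references both a blob and another tree, as described in the reply above. It assumes the standard tree encoding (`<mode> <name>`, a NUL byte, then the 20 raw ID bytes per entry, with mode `100644` for regular files and `40000` for sub-trees); the helper names and the `docs/baz.txt` layout are invented for illustration.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Git-style ID: SHA-1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries as one flat list."""
    def sort_key(entry):
        mode, name, _ = entry
        # Sub-trees sort as if their name ended with '/'.
        return (name + "/").encode() if mode == "40000" else name.encode()

    out = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def parse_tree(body: bytes):
    """Read the flat entry list back as (mode, name, hex_id) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, oid))
        pos = nul + 21
    return entries


# Hypothetical repo layout: foo.txt at the top level, docs/baz.txt in a sub-directory.
foo_id = object_id("blob", b"foo\n")
baz_id = object_id("blob", b"baz\n")

docs_body = tree_body([("100644", "baz.txt", baz_id)])
docs_id = object_id("tree", docs_body)

root_body = tree_body([("100644", "foo.txt", foo_id), ("40000", "docs", docs_id)])

for mode, name, oid in parse_tree(root_body):
    kind = "tree" if mode == "40000" else "blob"
    print(mode, kind, oid[:6], name)
```

The root tree stays a flat list; the sub-directory shows up as a single entry whose ID points at another tree object, which is the tree-inside-a-tree from the PDF.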
Thank you! [bibs on the wall=trail?] I'll google some more and try it for myself but it would be great to see an example where you rename a file. I think in those cases, we lose the history of the changes to a file? i.e: this would be considered "deleted a file" + "added a file". I'm not sure I understand Linus' philosophy on that one... That's where I'm still confused... Just starting up on git; our team uses SVN and, migrating to git, many are worried that git "loses commit history" when refactoring/renaming files. [ignore svn2git issues, I'm talking about a project already in git]
This was very helpful in understanding git. Thanks
why you so angry at your keyboard man! For real tho it's distracting :) good video!
Thank you very much for this video!
Such a good video. Thanks ton man!
It's crazy I really thought that internally git would store diffs.
Great video!
amazing explanation
How come git add results in adding blob to the objects directory? Shouldn't it be done after running git commit command?
The famous Butterfly keyboard it is.
Thanks a lot. Very informative video.
Start at 3:40.
Amazing video .. cheers !!
I read that sha1 is a cryptographic algo, for which encrypting is easy, but decrypting from the encrypted form is nearly impossible. If git hashes all contents to sha1 hashes, how does it decrypt the contents so fast and correctly when, say, changing branches?
Git doesn’t store the objects in an encrypted form. They are compressed and the hash is used as a name in a phone book to look them up.
You can’t decrypt a sha1 hash into its original input. You can, however, look for another input that gives the same hash; this is called a collision. Password cracking means trying variants of known passwords and words from the dictionaries of various languages to find ones that have matching hashes. The longer and more random a password is, the more attempts are required to find a match.
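The "phone book" lookup can be sketched in a few lines of Python. This is a simplified model of loose objects only (no pack files), assuming the usual layout where the first two hex characters of the ID name a directory and the remaining 38 name the file; `store_object` and `load_object` are made-up helper names, and the objects directory is a temporary folder rather than a real `.git/objects`.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def store_object(objects_dir: Path, content: bytes, obj_type: str = "blob") -> str:
    """Write a zlib-compressed object under objects/<2 hex>/<38 hex>; return its ID."""
    raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(raw).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def load_object(objects_dir: Path, oid: str):
    """Look the object up by ID and decompress it; nothing is decrypted."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    oid = store_object(objects, b"hello\n")
    loaded = load_object(objects, oid)
    print(oid, loaded)
```

Getting the content back is just a file read plus decompression, which is why switching branches is fast: the hash is never reversed, it is only used as the name to look up.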
This is what I need!
Awesome. Really useful
Confused why he kept re-adding files (git add ....). In all my git experience I add it the first time and never have to add again.
Awesome video
Excellent !!
Thx a lot, glorious tutorial!
Does anyone know what music was used for the intro?
Very nice! Thanks.
Is git also making a chain of hashes, like a blockchain?
This is gold
nice way of explanation.
How do you view the content of a blob if it's not a text file? Say it's an image or a zip file or something.
What is the purpose of local repository?
Let's take a case that we have central repository and our workspace only. What difference will local repository make?
You might want to check out Linus Torvalds's talk on Git, particularly where he emphasizes the distributed nature of Git, and how it's superior to centralized version control systems like SVN etc.
very informative . thank you
Thank you for this video
is there any alternative to watch command on windows and mac os x?
Thank you!
Too good to be true!!
What utility did you use to split the shell?
1. He's using screen command. You need to install it by "apt install screen".
2. I would suggest you install terminator on linux. It's a very smooth terminal splitter and works like a charm.
brilliant!
This would be amazing if it would've been correct since you say trees have the complete current state as blobs, but at 11:05 you show trees pointing to other trees.
Thanks sir 🙏🏼
Thank you
stupendous
It's "working copy" not working directory
What terminal is he using? Doesn't look like gnome-terminal
1. He's using screen command. You need to install it by "apt install screen".
2. I would suggest you install terminator on linux. It's a very smooth terminal splitter and works like a charm.
@@AliAnwarwish thanks.
@@shivanshhanda7553 Hey bro. You can install tmux too. I'm using it rn.
Hey man, I switched back to gnome-terminal as terminator was showing strange characters when using. Will give tmux a try, thanks.
Edit: Also, I found gnome-terminal to be faster than terminator, eg when I cd to a big git repo, terminator takes more time to load.
@@shivanshhanda7553 Yes I do agree. But tmux is better at session management too. Yeah nice, worth a try.
Saying that git is a DAG sounds like a challenge to researchers working on intentional cryptographic hash collisions.
It's all fun and games until someone finds a commit that has itself as a parent. :)
Replacing sha1 with a different hash algorithm wouldn’t be difficult. You can use the bit length of the hashes to distinguish them. Compatibility might be a pain temporarily.
I am not sure if the known attacks on sha1 allow constructing arbitrary input that collides.
20:45
Doug DeMuro This is the new Bmw x5 ......
Voila !