Reading guide
This article interprets and adapts chapter 10.2 of Pro Git, "Git Internals - Git Objects". It covers two things: 1) using Git's low-level (plumbing) commands to create commits by hand, and 2) trying to parse Git objects with NodeJS (the book uses Ruby).
0x001 initialization
Initialize a local repository:
$ mkdir git-test
$ cd git-test
$ git init
Initialized empty Git repository in ...
The resulting file structure:
+ git-test
  + .git
    + branches
    - config
    - description
    - HEAD
    + hooks
    + info
    + objects
      + info
      + pack
    + refs
For now we ignore every folder except objects. At this point objects contains only two folders, info and pack, and we ignore those too: we only watch what else appears inside objects.
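To see which objects appear as we go, a small NodeJS helper can list the loose objects under .git/objects while skipping info and pack. This is a minimal sketch: the function name listLooseObjects is hypothetical, and it only looks at loose objects, not packfiles.

```javascript
const fs = require('fs')
const path = require('path')

// List loose object IDs under <gitDir>/objects, skipping the
// "info" and "pack" folders that this article ignores.
function listLooseObjects(gitDir) {
  const objectsDir = path.join(gitDir, 'objects')
  const ids = []
  for (const dir of fs.readdirSync(objectsDir)) {
    if (dir === 'info' || dir === 'pack') continue
    // Loose-object folders are named with two hex characters
    if (!/^[0-9a-f]{2}$/.test(dir)) continue
    for (const file of fs.readdirSync(path.join(objectsDir, dir))) {
      // Folder name + file name = full object ID
      ids.push(dir + file)
    }
  }
  return ids.sort()
}

// Usage, from the repository root:
// console.log(listLooseObjects('.git'))
```

On a freshly initialized repository this returns an empty array, since objects only holds info and pack.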
0x002 hash-object
This command computes the object ID of some content and may also create a blob file. That raises two questions: 1) what is an object ID, and 2) what is a blob file, and why is it only "possibly" created? The answers follow.
Run the command:
$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
-w tells hash-object to write the object into the object database; without it, the command only prints the computed object ID. --stdin makes it read the content from standard input, so the content hashed here is test content (plus the trailing newline that echo adds).
The command prints a 40-character SHA-1 hash, d670460b4b4aece5915caf5c68d12f560a9fe3e4. Because -w was specified, Git stores the object. Look at the files under objects:
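The same object ID can be reproduced outside Git. Below is a minimal NodeJS sketch, assuming the blob format Git uses (a "blob <size>\0" header followed by the content, hashed with SHA-1); hashBlob is a hypothetical helper name. Note that echo appends a newline, so the content actually hashed is 'test content\n'.

```javascript
const crypto = require('crypto')

// Compute a blob's object ID: SHA-1 over the header
// "blob <byte length>\0" followed by the content itself.
function hashBlob(content) {
  const body = Buffer.from(content)
  const header = Buffer.from(`blob ${body.length}\0`)
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex')
}

// echo appends a newline, so include it here
console.log(hashBlob('test content\n')) // d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Dropping the trailing newline gives a completely different ID, which is a common source of confusion when comparing against `git hash-object`.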
+ objects
  + d6
    - 70460b4b4aece5915caf5c68d12f560a9fe3e4
A new folder, d6, has appeared, and it contains a file named 70460b4b4aece5915caf5c68d12f560a9fe3e4. Together, the folder name and the file name form exactly the object ID just generated.
If we run the command again, the file does not change: an object with that ID already exists, so nothing new is written (and without -w, nothing is written at all). That is why hash-object only "possibly" creates a blob file.
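The "write only if it does not exist yet" behaviour can be sketched as follows. This is a simplified illustration, not Git's own code (real Git does extra bookkeeping); objectPath and writeIfAbsent are hypothetical names, and rawObject is assumed to be the uncompressed header plus content.

```javascript
const fs = require('fs')
const path = require('path')
const zlib = require('zlib')

// A loose object lives at objects/<first 2 hex chars>/<remaining 38>
function objectPath(gitDir, id) {
  return path.join(gitDir, 'objects', id.slice(0, 2), id.slice(2))
}

// Write a loose object only if it is not already there.
// Returns true if a file was created, false if it already existed.
function writeIfAbsent(gitDir, id, rawObject) {
  const file = objectPath(gitDir, id)
  if (fs.existsSync(file)) return false
  fs.mkdirSync(path.dirname(file), { recursive: true })
  fs.writeFileSync(file, zlib.deflateSync(rawObject))
  return true
}
```

Calling writeIfAbsent twice with the same ID creates the file once and returns false the second time, mirroring what repeated `git hash-object -w` runs look like from the outside.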
If the content changes, a new object ID and a new blob file are generated.
0x003 cat-file
So far we have seen how to store content; how do we read it back? Use cat-file (-p pretty-prints the object according to its type):
$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content
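Reading an object by hand means undoing what was done on write: inflate the file, then split the header from the content. A minimal NodeJS sketch that handles loose objects only (packed objects are not covered); readObject is a hypothetical name.

```javascript
const fs = require('fs')
const path = require('path')
const zlib = require('zlib')

// Read a loose object: inflate it, then split the "<type> <size>\0"
// header from the content that follows the first null byte.
function readObject(gitDir, id) {
  const file = path.join(gitDir, 'objects', id.slice(0, 2), id.slice(2))
  const raw = zlib.inflateSync(fs.readFileSync(file))
  const nul = raw.indexOf(0)
  const [type, size] = raw.subarray(0, nul).toString().split(' ')
  const content = raw.subarray(nul + 1)
  // The size in the header must match the content length in bytes
  if (content.length !== Number(size)) throw new Error('size mismatch')
  return { type, size: Number(size), content }
}

// Usage inside the test repository:
// const obj = readObject('.git', 'd670460b4b4aece5915caf5c68d12f560a9fe3e4')
// obj.type is 'blob' and obj.content.toString() is 'test content\n'
```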
0x004 file storage and version recovery
Next, we work with a file instead of raw content:
$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
Then update the file and store it again:
$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
The objects folder now contains:
+ objects
  + 1f
    - 7a7a472abf3dd9643fd615f6da379c4acb3e3a
  + 83
    - baae61804e65cc73a7201a7252750c76066a30
  + d6
    - 70460b4b4aece5915caf5c68d12f560a9fe3e4
Now restore the file to its first version:
$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1
Or to the second version:
$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt
$ cat test.txt
version 2
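Each version is stored as a complete, independent blob rather than as a diff. The sketch below illustrates that with an in-memory Map standing in for the objects folder (an assumption made to keep it self-contained); store and restore are hypothetical names. The IDs in the comments match the ones Git printed above.

```javascript
const crypto = require('crypto')
const zlib = require('zlib')

// In-memory stand-in for the objects folder: object ID -> compressed bytes
const objects = new Map()

// Store content as a blob and return its object ID
function store(content) {
  const body = Buffer.from(content)
  const raw = Buffer.concat([Buffer.from(`blob ${body.length}\0`), body])
  const id = crypto.createHash('sha1').update(raw).digest('hex')
  if (!objects.has(id)) objects.set(id, zlib.deflateSync(raw))
  return id
}

// Restore the content of a blob: inflate and drop the header
function restore(id) {
  const raw = zlib.inflateSync(objects.get(id))
  return raw.subarray(raw.indexOf(0) + 1).toString()
}

const v1 = store('version 1\n') // 83baae61804e65cc73a7201a7252750c76066a30
const v2 = store('version 2\n') // 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
console.log(restore(v1))        // version 1
```

Since every blob holds the full content, restoring any version only needs its ID, which is exactly what the cat-file commands above rely on.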
0x005 tree object and write-tree
Add the file to the staging area (the index). --add is needed because test.txt is not in the index yet, and --cacheinfo registers an object that is already in the database; mode 100644 means a regular file:
$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt
Write the contents of the staging area to a tree object:
$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
Create a new tree object containing the second version of test.txt and a new file:
$ echo 'new file' > new.txt
$ git update-index --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt
$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92 new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
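A tree object is just another object whose content is a list of entries. The sketch below assumes the entry layout "<mode> <name>\0" followed by the 20-byte binary object ID, with entries sorted by name, and handles plain files only; buildTree and sha1 are hypothetical helpers. For the trees above it reproduces the IDs printed by write-tree.

```javascript
const crypto = require('crypto')

// Build the raw bytes of a tree object from { mode, name, id } entries.
// Each entry is "<mode> <name>\0" followed by the binary (not hex) ID.
// Entries are sorted by name bytes; this sketch assumes plain files only
// (Git sorts subdirectory names as if they ended in "/").
function buildTree(entries) {
  const sorted = [...entries].sort((a, b) =>
    Buffer.compare(Buffer.from(a.name), Buffer.from(b.name)))
  const body = Buffer.concat(sorted.map(({ mode, name, id }) =>
    Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(id, 'hex')])))
  return Buffer.concat([Buffer.from(`tree ${body.length}\0`), body])
}

const sha1 = (buf) => crypto.createHash('sha1').update(buf).digest('hex')

console.log(sha1(buildTree([
  { mode: '100644', name: 'test.txt', id: '83baae61804e65cc73a7201a7252750c76066a30' },
]))) // d8329fc1cc938780ffdd9f94e0d364e0ea74f579
```

Note that the tree, not the blob, is where file names and modes are recorded.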
0x006 commit object and commit-tree
With a tree object in hand, we can create a commit that points to it (a unique prefix of the tree ID, here d8329f, is enough):
$ echo 'first commit' | git commit-tree d8329f
b51096bf62fa145c0b95ce18dc3020daa1f2556e
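View this commit (the author and committer lines below are the example from Pro Git; with your own name, email and time, they, and therefore the commit ID, will differ):
$ git cat-file -p b51096bf62fa145c0b95ce18dc3020daa1f2556e
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <schacon@gmail.com> 1243040974 -0700
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit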
Next, commit the second tree object, using -p to name the first commit as its parent:
$ echo 'second commit' | git commit-tree 0155eb4229851634a0f03eb265b69f5a2d56f341 -p b51096bf62fa145c0b95ce18dc3020daa1f2556e
bf41fa3700a67914b3b45eefced02fffcdaf4464
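A commit object is plain text: the tree, zero or more parent lines, author, committer, a blank line and the message, which is what cat-file -p printed above. The sketch below assembles that text and hashes it with a "commit <size>\0" header; buildCommit and its parameter names are hypothetical, and the author string format ("Name <email> <unix time> <timezone>") is taken from the cat-file output.

```javascript
const crypto = require('crypto')

// Assemble a commit object's raw bytes and compute its object ID.
// `author` and `committer` are strings like "Name <email> 1243040974 -0700".
function buildCommit({ tree, parents = [], author, committer = author, message }) {
  const lines = [`tree ${tree}`]
  for (const p of parents) lines.push(`parent ${p}`)
  // A blank line separates the headers from the message
  lines.push(`author ${author}`, `committer ${committer}`, '', message)
  const body = Buffer.from(lines.join('\n'))
  const raw = Buffer.concat([Buffer.from(`commit ${body.length}\0`), body])
  return { raw, id: crypto.createHash('sha1').update(raw).digest('hex') }
}

// echo adds a trailing newline to the message
const first = buildCommit({
  tree: 'd8329fc1cc938780ffdd9f94e0d364e0ea74f579',
  author: 'Scott Chacon <schacon@gmail.com> 1243040974 -0700',
  message: 'first commit\n',
})
console.log(first.raw.toString())
```

Because the author, committer and timestamps are part of the hashed text, the same tree committed by a different person or at a different time gets a different ID. That is why commit IDs in your own repository will not match the ones in this article.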
No branch points to these commits yet, so pass the latest commit ID to git log (--stat adds the per-file change summary) to view the history:
$ git log --stat bf41fa3700a67914b3b45eefced02fffcdaf4464
commit bf41fa3700a67914b3b45eefced02fffcdaf4464
Author: lyxxxx <lyxxxx@yeah.net>
Date:   Sun Nov 17 22:14:36 2019 +0800

    second commit

 new.txt  | 1 +
 test.txt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

commit b51096bf62fa145c0b95ce18dc3020daa1f2556e
Author: lyxxxx <lyxxxx@yeah.net>
Date:   Sun Nov 17 22:07:01 2019 +0800

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)
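git log only has to follow parent pointers back from the commit it is given. A simplified sketch for a linear history: it follows the first parent line only, whereas real git log also handles merges and ordering. walkLog is a hypothetical name, and readCommit is an assumed callback that returns a commit's text (for example, the content part of a readObject-style helper).

```javascript
// Walk a linear history starting from startId, following "parent" lines.
// readCommit(id) is assumed to return the commit object's text.
function walkLog(startId, readCommit) {
  const history = []
  let id = startId
  while (id) {
    const text = readCommit(id)
    // Headers end at the first blank line; the rest is the message
    const [headers, ...messageParts] = text.split('\n\n')
    const parentLine = headers.split('\n').find((l) => l.startsWith('parent '))
    history.push({ id, message: messageParts.join('\n\n').trim() })
    // A root commit has no parent line, which ends the walk
    id = parentLine ? parentLine.slice('parent '.length) : null
  }
  return history
}
```

Starting from bf41fa3 this would visit the second commit and then the first, the same order git log printed above.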
That is the whole process of building Git commit history with low-level commands. Five commands are involved:
- hash-object: compute an object ID and create a blob file
- cat-file: read an object file that has been written
- update-index: update the staging area
- write-tree: write the staging area to a tree object
- commit-tree: create a commit object from a tree object
0x007 object types
There are three types of objects:
- blob: created by hash-object; stores a file's content (the file name is recorded in the tree, not in the blob)
- tree: created by write-tree; lists the files (and subdirectories) in the staging area
- commit: created by commit-tree; records a snapshot by pointing to a tree, together with the parent commit, author, committer and message
Their relationship:
- a commit points to one tree object
- a tree contains multiple blob objects and tree objects
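The "a tree contains blobs and trees" relationship is visible in a tree's raw content: each entry carries a mode that says what kind of object it points to. A minimal parser for the body of a tree object (the bytes after the "tree <size>\0" header); parseTree is a hypothetical name, and the mapping used (40000 for a subtree, 160000 for a submodule commit, every other mode for a blob) follows Git's tree-entry modes.

```javascript
// Mode -> object type; any mode not listed (100644, 100755, 120000) is a blob
const TYPE_BY_MODE = { '40000': 'tree', '160000': 'commit' }

// Parse a tree body: repeated "<mode> <name>\0<20-byte binary ID>"
function parseTree(body) {
  const entries = []
  let i = 0
  while (i < body.length) {
    const space = body.indexOf(0x20, i)  // end of the mode
    const nul = body.indexOf(0, space)   // end of the name
    const mode = body.subarray(i, space).toString()
    const name = body.subarray(space + 1, nul).toString()
    const id = body.subarray(nul + 1, nul + 21).toString('hex')
    entries.push({ mode, name, id, type: TYPE_BY_MODE[mode] || 'blob' })
    i = nul + 21  // skip past the 20-byte ID to the next entry
  }
  return entries
}
```

Applied to the second tree above, this would yield two blob entries, new.txt and test.txt; a subdirectory would show up as an entry of type tree.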
0x008 how the object ID is generated
Next, let's use NodeJS to show how an object ID is generated.
Suppose the content is what is up, doc?:
const content = 'what is up, doc?'
const type = 'blob'
An object is stored in the following format: a header, a null byte, then the content:
// Header "<type> <size in bytes>\0", followed by the content
const store = `${type} ${Buffer.byteLength(content)}\0${content}`
Then compute its SHA-1 hash:
const crypto = require('crypto')
const hash = crypto.createHash('sha1')
hash.update(store)
const objectID = hash.digest('hex')
The result is:
bd9dbf5aae1a3862dd1526723246b20206e5fc37
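One pitfall when building the store string: content.length counts UTF-16 code units, while the size in Git's header is a byte count. They agree for ASCII text such as 'what is up, doc?' but not for non-ASCII text. A small sketch using Buffer.byteLength instead; blobId is a hypothetical name.

```javascript
const crypto = require('crypto')

// String length vs. UTF-8 byte length for a non-ASCII string
const content = 'h\u00e9llo'
console.log(content.length)             // 5
console.log(Buffer.byteLength(content)) // 6 (é takes 2 bytes in UTF-8)

// Object ID of a blob, using the byte length in the header
function blobId(text) {
  const store = `blob ${Buffer.byteLength(text)}\0${text}`
  // hash.update() encodes strings as UTF-8 by default
  return crypto.createHash('sha1').update(store).digest('hex')
}

console.log(blobId('what is up, doc?')) // bd9dbf5aae1a3862dd1526723246b20206e5fc37
```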
Next comes storage. Before being written, the data is compressed with zlib:
const zlib = require('zlib')
const result = zlib.deflateSync(Buffer.from(store))
Finally, the object is stored under the objects folder, split by its object ID: the first two characters name the folder and the remaining 38 name the file:
+ objects
  + bd
    - 9dbf5aae1a3862dd1526723246b20206e5fc37
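Full source (run from the repository root, passing the content as the first argument):
const zlib = require('zlib')
const fs = require('fs')
const crypto = require('crypto')

const type = 'blob'
// Content to store, taken from the first command-line argument
const content = process.argv[2]
// Header "<type> <size in bytes>\0", followed by the content
const store = `${type} ${Buffer.byteLength(content)}\0${content}`

// The object ID is the SHA-1 of header + content
const hash = crypto.createHash('sha1')
hash.update(store)
const objectID = hash.digest('hex')

// Objects are zlib-compressed before being written
const result = zlib.deflateSync(Buffer.from(store))

// objects/<first 2 hex chars>/<remaining 38 chars>
const path = '.git/objects'
const [a, b, ...file] = objectID
const dirPath = `${path}/${a}${b}`
const filePath = `${dirPath}/${file.join('')}`
fs.mkdirSync(dirPath, { recursive: true }) // don't fail if the folder already exists
if (!fs.existsSync(filePath)) fs.writeFileSync(filePath, result) // write the compressed data once
console.log(objectID)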
0x009 resources
- Official text: Pro Git, 10.2 Git Internals - Git Objects
- Introduction to Git principles - Ruan Yifeng
- Source code for this chapter: I am continuing to maintain this project, and slowly trying to write a miniGit in NodeJS for fun.
0x010 a plug
I recently came across a fun library by a very talented author: a phenomenally popular micro-scene editor based on React.