[front end ramble] Git internals - Git objects

Keywords: Programming git zlib SHA1 Ruby

Reading guide
This article is an interpretation and adaptation of Pro Git chapter 10.2, Git Internals - Git Objects. It covers two things: 1) creating commits using Git's low-level commands, and 2) trying to reproduce Git objects with NodeJS (the original chapter uses Ruby).

0x001 initialization

Initialize a local repository:

$ mkdir git-test
$ cd git-test
$ git init
Initialized empty Git repository in ...

View the file structure:

+ git-test
   + .git
       + branches
       - config
       - description
       - HEAD
       + hooks
       + info
       + objects
           + info
           + pack
       + refs

For now we ignore the other folders and look only at objects. At this point it contains just two folders, info and pack, which we also ignore. We only watch what changes in objects apart from info and pack.
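
To watch those changes from NodeJS rather than by eye, the sketch below lists every file stored under objects while skipping info and pack. The function name listLooseObjects and the temporary demo directory are assumptions made for illustration; they are not part of Git.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Return the IDs of all objects stored as individual files under an
// objects directory, skipping the info and pack folders.
function listLooseObjects(objectsDir) {
  const ids = [];
  for (const entry of fs.readdirSync(objectsDir, { withFileTypes: true })) {
    // Object folders are named with exactly two hex characters.
    if (!entry.isDirectory() || !/^[0-9a-f]{2}$/.test(entry.name)) continue;
    for (const file of fs.readdirSync(path.join(objectsDir, entry.name))) {
      ids.push(entry.name + file);
    }
  }
  return ids.sort();
}

// Demo on a throwaway directory that mimics a freshly initialized repository.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'git-objects-'));
fs.mkdirSync(path.join(demoDir, 'info'));
fs.mkdirSync(path.join(demoDir, 'pack'));
const emptyListing = listLooseObjects(demoDir);
console.log(emptyListing);
```

Inside the git-test folder, calling listLooseObjects('.git/objects') after each step of this article should show which objects were added.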

0x002 hash-object

This command computes the object ID for some content and optionally creates a blob file. That raises two questions: 1) what is an object ID? 2) what is a blob file, and why is it only created optionally? Both are answered below.

Execute command:

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

-w tells hash-object to store the data object; without it, the command only prints the computed object ID. --stdin tells it to read the content from standard input, so here the content is test content (plus the trailing newline that echo adds).

Running this command prints a 40-character SHA-1 hash, d670460b4b4aece5915caf5c68d12f560a9fe3e4. Because -w is specified, Git also stores the object. Look at the files under objects:

+ objects
    + d6
        - 70460b4b4aece5915caf5c68d12f560a9fe3e4

There is a new folder d6 containing a file 70460b4b4aece5915caf5c68d12f560a9fe3e4. Joined together, the two names are exactly the object ID just generated.

If we run the command again with the same content, the file does not change because it already exists. That is why the blob file is only possibly created.

If we change the content, a new object ID and a new blob file are generated.
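
The folder split and the "only create it if it is missing" behaviour can be sketched in NodeJS as follows. The helper names looseObjectPath and storeOnce are hypothetical, and the demo writes a placeholder into a temporary directory rather than a real repository.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Split a 40-character object ID into the folder (first 2 characters)
// and the file name (remaining 38 characters) under the objects folder.
function looseObjectPath(objectsDir, objectId) {
  return path.join(objectsDir, objectId.slice(0, 2), objectId.slice(2));
}

// Write the data only if no file exists at that path yet.
// Returns true when a new file was created, false when it was already there.
function storeOnce(filePath, data) {
  if (fs.existsSync(filePath)) return false;
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, data);
  return true;
}

// Demo against a throwaway directory, not a real repository.
const objectsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'objects-'));
const id = 'd670460b4b4aece5915caf5c68d12f560a9fe3e4';
const target = looseObjectPath(objectsDir, id);
const firstWrite = storeOnce(target, 'placeholder');
const secondWrite = storeOnce(target, 'placeholder');
console.log(path.relative(objectsDir, target), firstWrite, secondWrite);
```

The second call finds the file already in place and leaves it alone, which mirrors what happens when the same content is hashed twice.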

0x003 cat-file

We have seen how to store content, so how do we read it back? Use cat-file:

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content
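
For comparison, here is a minimal NodeJS sketch of the reading side. It assumes the object is stored as its own zlib-compressed file that starts with a header made of the type, a space, the size, and a null byte. The function name readLooseObject and the temporary demo file are assumptions for illustration, and objects that Git has moved into the pack folder are not handled.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Read one object file and split it into type, size and content.
function readLooseObject(filePath) {
  const raw = zlib.inflateSync(fs.readFileSync(filePath));
  const nul = raw.indexOf(0); // the null byte ends the "type size" header
  const [type, size] = raw.subarray(0, nul).toString('ascii').split(' ');
  const content = raw.subarray(nul + 1); // everything after the null byte
  return { type, size: Number(size), content };
}

// Demo: write a hand-made object to a temp file, then read it back.
const demoFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'obj-')), 'demo');
fs.writeFileSync(demoFile, zlib.deflateSync(Buffer.from('blob 13\0test content\n')));
const parsed = readLooseObject(demoFile);
console.log(parsed.type, parsed.size, JSON.stringify(parsed.content.toString()));
```

Inside git-test, pointing readLooseObject at .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 should give the same content that cat-file -p prints.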

0x004 file storage and version recovery

Next, we store files instead of content typed on the command line:

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

Then update the file and store it

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

The objects folder now contains:

+ objects
    + 1f
        - 7a7a472abf3dd9643fd615f6da379c4acb3e3a
    + 83
        - baae61804e65cc73a7201a7252750c76066a30
    + d6
        - 70460b4b4aece5915caf5c68d12f560a9fe3e4

Then restore the contents of the file to the first version

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1

Or the second version

$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt
$ cat test.txt
version 2

0x005 tree object and write-tree

Add the file to the staging area:

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

Write the staging area contents to a tree object:

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

Create a new tree object containing the second version of test.txt and a new file:

$ echo 'new file' > new.txt
$ git update-index --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt
$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt
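
What cat-file -p prints is a readable rendering of the tree; on disk each entry is stored in binary. The sketch below rebuilds the two trees above in NodeJS, assuming each entry is the mode, a space, the file name, a null byte, and then the 20 raw bytes of the object ID. The names buildTree and objectId are hypothetical, and the plain name sort is a simplification that only covers trees without subdirectories.

```javascript
const crypto = require('crypto');

// Build the raw body of a tree object from a list of entries.
// Each entry is "<mode> <name>\0" followed by the 20 raw bytes of the ID.
function buildTree(entries) {
  // Simplification: a plain sort by name, enough for files only.
  const sorted = [...entries].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return Buffer.concat(sorted.map(({ mode, name, id }) =>
    Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(id, 'hex')])));
}

// Prefix the body with the "<type> <size>\0" header and hash it with SHA-1.
function objectId(type, body) {
  const header = Buffer.from(`${type} ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

const firstTree = objectId('tree', buildTree([
  { mode: '100644', name: 'test.txt', id: '83baae61804e65cc73a7201a7252750c76066a30' },
]));
const secondTree = objectId('tree', buildTree([
  { mode: '100644', name: 'test.txt', id: '1f7a7a472abf3dd9643fd615f6da379c4acb3e3a' },
  { mode: '100644', name: 'new.txt', id: 'fa49b077972391ad58037050f2a75f74e3671e92' },
]));
console.log(firstTree);
console.log(secondTree);
```

If the assumed layout matches what Git writes, the two printed IDs should equal the ones returned by write-tree above.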

0x006 commit object and commit-tree

With a tree object in hand, you can create a commit that points to it:

$ echo 'first commit' | git commit-tree d8329f
b51096bf62fa145c0b95ce18dc3020daa1f2556e

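View this commit (the author and committer lines shown here are those of the Pro Git example; in your own repository they reflect your Git identity and the time of the commit):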

$ git cat-file -p b51096bf62fa145c0b95ce18dc3020daa1f2556e
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <schacon@gmail.com> 1243040974 -0700
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit

Next, commit the second tree object, using -p to specify the previous commit as its parent:

$ echo 'second commit' | git commit-tree 0155eb4229851634a0f03eb265b69f5a2d56f341 -p b51096bf62fa145c0b95ce18dc3020daa1f2556e
bf41fa3700a67914b3b45eefced02fffcdaf4464
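
A commit object is plain text, so it is easy to assemble by hand. The sketch below builds the two commits in NodeJS, assuming the layout that cat-file -p displays: a tree line, optional parent lines, author and committer lines, a blank line, and the message. The names buildCommit and objectId and the identity string are hypothetical, so the printed ID will not match the ones above, which depend on the real author and timestamp.

```javascript
const crypto = require('crypto');

// Assemble the text body of a commit object: a tree line, zero or more parent
// lines, author and committer lines, a blank line, then the message.
function buildCommit({ tree, parents = [], author, committer, message }) {
  const lines = [`tree ${tree}`];
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`, `committer ${committer}`, '', message);
  return Buffer.from(lines.join('\n') + '\n');
}

// Same ID rule as for blobs: SHA-1 over "<type> <size>\0" plus the body.
function objectId(type, body) {
  const header = Buffer.from(`${type} ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// Hypothetical identity and timestamp, for illustration only.
const who = 'Example User <user@example.com> 1574000000 +0800';
const first = buildCommit({
  tree: 'd8329fc1cc938780ffdd9f94e0d364e0ea74f579',
  author: who, committer: who, message: 'first commit',
});
const second = buildCommit({
  tree: '0155eb4229851634a0f03eb265b69f5a2d56f341',
  parents: [objectId('commit', first)],
  author: who, committer: who, message: 'second commit',
});
console.log(second.toString());
console.log(objectId('commit', second));
```

Because the parent ID is part of the text being hashed, changing anything in the first commit changes the ID of the second one as well.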

Use git log with the --stat option on the second commit to view the history:

commit bf41fa3700a67914b3b45eefced02fffcdaf4464
Author: lyxxxx <lyxxxx@yeah.net>
Date:   Sun Nov 17 22:14:36 2019 +0800

    second commit

 new.txt  | 1 +
 test.txt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

commit b51096bf62fa145c0b95ce18dc3020daa1f2556e
Author: lyxxxx <lyxxxx@yeah.net>
Date:   Sun Nov 17 22:07:01 2019 +0800

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)

That is the whole process of creating Git commit history with low-level commands. It mainly involves five commands:

  • hash-object: compute the object ID and create the blob file
  • cat-file: read the contents of a stored object
  • update-index: update files in the staging area
  • write-tree: write the staging area to a tree object
  • commit-tree: create a commit from a tree object

0x007 object types

Three types of object appear in this walkthrough:

  • blob: created by hash-object, holds the contents of a file
  • tree: created by write-tree, lists the files in the staging area
  • commit: created by commit-tree, records the tree for this commit together with the author, committer, message, and parent commits

Their relationship is:

  • A commit points to one tree object
  • A tree contains multiple blob objects and tree objects
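
The relationship can be pictured as a small graph. The following NodeJS sketch models it in memory and walks from a commit down to every file. The short IDs, the lib folder, and the function listFiles are made up for illustration; real IDs are 40-character SHA-1 hashes and real objects live on disk.

```javascript
// A tiny in-memory object store. The short IDs are made up for illustration;
// real Git IDs are 40-character SHA-1 hashes.
const objects = new Map([
  ['c1', { type: 'commit', tree: 't1', parents: [], message: 'first commit' }],
  ['t1', { type: 'tree', entries: [
    { name: 'test.txt', id: 'b1' },
    { name: 'lib', id: 't2' },
  ] }],
  ['t2', { type: 'tree', entries: [{ name: 'util.js', id: 'b2' }] }],
  ['b1', { type: 'blob', content: 'version 1\n' }],
  ['b2', { type: 'blob', content: '// helper\n' }],
]);

// Walk a tree recursively and collect the path of every blob below it.
function listFiles(treeId, prefix = '') {
  const files = [];
  for (const { name, id } of objects.get(treeId).entries) {
    const target = objects.get(id);
    if (target.type === 'tree') files.push(...listFiles(id, `${prefix}${name}/`));
    else files.push(prefix + name);
  }
  return files;
}

const commit = objects.get('c1');
console.log(listFiles(commit.tree)); // [ 'test.txt', 'lib/util.js' ]
```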

0x008 how to generate an object ID

Next, we use NodeJS to demonstrate how an object ID is generated.

Suppose the content is what is up, doc?:

const content = 'what is up, doc?'
const type = 'blob'

The storage format of an object is a header (the type, a space, the content size in bytes, and a null byte) followed by the content:

const store = `${type} ${Buffer.byteLength(content)}\0${content}`

Then calculate the SHA-1 value:

const crypto = require('crypto');

const hash = crypto.createHash('sha1');
hash.update(store);
const objectID = hash.digest('hex');

The final calculation results are as follows:

bd9dbf5aae1a3862dd1526723246b20206e5fc37
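
One detail is easy to miss: the size in the header is counted in bytes, not characters. For ASCII content like the example the two numbers are the same, but they differ once multi-byte characters appear. The sketch below shows the difference and wraps the steps above in a helper; the name blobId and the sample string are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Compute the object ID of a blob, measuring the size in bytes.
function blobId(content) {
  const body = Buffer.from(content, 'utf8');
  const header = Buffer.from(`blob ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

const ascii = 'what is up, doc?';
const nonAscii = 'h\u00e9llo'; // contains one character that takes two bytes in UTF-8

console.log(ascii.length, Buffer.byteLength(ascii));       // 16 16
console.log(nonAscii.length, Buffer.byteLength(nonAscii)); // 5 6
console.log(blobId(ascii));
```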

Next comes storage. The object is compressed with zlib before it is written:

const zlib = require('zlib');

const result = zlib.deflateSync(Buffer.from(store))

Then split and store it under the objects folder according to the object ID:

+ objects
    + bd
        - 9dbf5aae1a3862dd1526723246b20206e5fc37

Full source:

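const zlib = require('zlib');
const fs = require('fs');
const Buffer = require('buffer').Buffer
const crypto = require('crypto');

const type = 'blob'
// Content to store, passed as the first command-line argument
const content = process.argv[2]

// Header: type, a space, the content size in bytes, and a null byte
const store = `${type} ${Buffer.byteLength(content)}\0${content}`

// The object ID is the SHA-1 of header + content
const hash = crypto.createHash('sha1');
hash.update(store)
const objectID = hash.digest('hex')
// The object is zlib-compressed before being written
const result = zlib.deflateSync(Buffer.from(store))

// The first two characters name the folder, the remaining 38 the file
const path = '.git/objects'
const [a, b, ...file] = objectID
const dirPath = `${path}/${a}${b}`
const filePath = `${dirPath}/${file.join('')}`
// recursive: true avoids an error when the folder already exists
fs.mkdirSync(dirPath, { recursive: true })
fs.writeFileSync(filePath, result)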


Posted by fredted40x on Sun, 17 Nov 2019 23:14:17 -0800