Section IV Design of ipfs File Slices

Keywords: Database git Python Linux

ipfs slicing technology

As we all know, the technology used in ipfs builds on existing technical solutions. That is not to say the ipfs team's own technology is weak; standing on the shoulders of giants simply lets you see further. When it comes to file slicing, git is excellent. git has a distinguished pedigree: it was first implemented by Linus Torvalds and then polished by many experts.

git repository directory

[root@localhost ipfs-cxx]# ll .git/
total 60
drwxr-xr-x.  2 root root  4096 Sep 19 17:06 branches
-rw-r--r--.  1 root root    54 Sep 19 17:45 COMMIT_EDITMSG
-rw-r--r--.  1 root root   260 Sep 19 17:06 config
-rw-r--r--.  1 root root    73 Sep 19 17:06 description
-rw-r--r--.  1 root root    23 Sep 19 17:06 HEAD
drwxr-xr-x.  2 root root  4096 Sep 19 17:06 hooks
-rw-r--r--.  1 root root 12720 Sep 19 17:43 index
drwxr-xr-x.  2 root root  4096 Sep 19 17:06 info
drwxr-xr-x.  3 root root  4096 Sep 19 17:06 logs
drwxr-xr-x. 25 root root  4096 Sep 19 17:45 objects
-rw-r--r--.  1 root root   107 Sep 19 17:06 packed-refs
drwxr-xr-x.  5 root root  4096 Sep 19 17:06 refs
[root@localhost ipfs-cxx]#
  • branches is the branch directory, which records the topology of version branches
  • hooks holds example shell scripts; experts may read them, but ordinary users will probably never use them.
  • index is the version directory file; for each entry it records the file name, the file's sha1() hash value, add or commit status, and date.
  • objects/xy stores the slice files. The xy encoding range is 00~ff. In ipfs, the range of x is [2~8, A-Z], and the range of y is [A-Z]
  • refs records the current branch and remote target branch information
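
The objects/xy layout can be illustrated with a small sketch. In git, the first two hex characters of an object's sha1 name the subdirectory (hence the 00~ff range) and the remaining 38 characters name the file. The helper name object_path is hypothetical, and applying the same layout to the cxx ipfs repository is an assumption rather than something the listing above confirms.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the original code): maps a 40-character
// hex sha1 to its fan-out path under an objects directory, git style.
std::string object_path(const std::string& objects_dir, const std::string& sha1_hex)
{
    assert(sha1_hex.size() == 40);                   // sha1 is 20 bytes = 40 hex chars
    const std::string dir  = sha1_hex.substr(0, 2);  // "xy", one of 00..ff
    const std::string file = sha1_hex.substr(2);     // remaining 38 characters
    return objects_dir + "/" + dir + "/" + file;
}
```

Spreading objects over 256 subdirectories keeps any single directory from growing too large, which matters on file systems where lookups slow down in very large directories.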

ipfs repository directory:

Taking Windows as an example:
C:\Users\Administrator\.ipfs\datastore stores the links files (the metadata), using a leveldb database
C:\Users\Administrator\.ipfs\blocks\XX stores the content data as slices on the ordinary file system

Setting aside the version written in golang, the metadata format designed for the excellent cxx version of ipfs is as follows (the links list is abbreviated: slices_num is 32, but only five entries are shown):

"source file name":
"{
	"type":"tree",
	"file_sha1":"5f01473a8c4d050bd2df5dabaca3d5e31e6b52f6",
	"file_name":"document.zip",
	"size":8236956
}"
"file_sha1":
"{
    "links":[
        {
            "hash":"166ec216e3b0848c17cbb323e9db7e4f96d71a44",
            "size":262144
        },
        {
            "hash":"5d0ac4a0daf189f3a1ef60483221a21b5d18cea4",
            "size":262144
        },
        {
            "hash":"92d3ad03c76c5aad4df7325ad9902164f57c249d",
            "size":262144
        },
        {
            "hash":"a721fb84fbff46920debe9b0d05ba0203899e274",
            "size":262144
        },
        {
            "hash":"fd8a834168dc8f7308a9c917ec4a4dd742a24d64",
            "size":110492
        }
    ],
    "slices_num":32,
    "type":"list",
    "file_name":"document.zip"
}
"

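The numbers in the two records are consistent with slicing the file into fixed 256 KB blocks plus one shorter tail block. A minimal sketch of that arithmetic follows; the function name plan_slices is hypothetical and does not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the size of every slice for a file of
// total_size bytes cut into blocks of block_unit bytes.
std::vector<std::uint64_t> plan_slices(std::uint64_t total_size, std::uint64_t block_unit)
{
    assert(block_unit > 0);
    const std::size_t full_blocks = static_cast<std::size_t>(total_size / block_unit);
    std::vector<std::uint64_t> sizes(full_blocks, block_unit);  // all full-size slices
    const std::uint64_t tail = total_size % block_unit;         // bytes left over
    if (tail != 0) {
        sizes.push_back(tail);                                  // one shorter last slice
    }
    return sizes;
}
```

For the document.zip record above, plan_slices(8236956, 262144) yields 32 sizes: 31 full blocks of 262144 bytes and a last block of 110492 bytes, matching "slices_num" and the final entry in "links".
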
cxx version file repository class

namespace ipfs {
	using namespace file_utils;
	using namespace crc8;
	using namespace db;

	namespace file {
		
		class repository final
		{
			/*
			 * Note:
			 * In the initial stage, a single database with a single table is sufficient. Once the framework is complete, tables can be sharded by hash. The metadata store should be able to support the metadata of up to 10T of files.
			 * Version 1.0 does not consider database or table sharding for the time being.
			 * After the original file is sliced, its index is written to the list.ldb database (leveldb): key is list_name, value is root_list.
			 *
			 * Version 1.1 design idea:
			 * A file object's links metadata is written to the database as a single record. When the file is larger than 100M, the links metadata should be split across multiple records.
			 * For example, one list record with at most 256 links can index a 64M data file (256 * 256KB).
			 * Following the design of the minix file system, and allowing for small files that average 100M in size, we plan to use a two-level list:
			 * 	Level 1 "type"="list_top1"
			 * 	Level 2 "type"="list_top2", supporting at most 256 * 256 * 256KB = 16GB of file slice storage and indexing.
			 */
			bool add_file(const char * filepath)
			{
				if (false == is_file_exist(filepath)) {
					cout << "source file does not exist!" << endl;
					return false;
				}

				int64_t total_size = file_size(filepath);
				if (-1 == total_size) {
					return false;
				}
				// Number of full blocks, plus one partial block if there is a remainder
				size_t slices_cnt = total_size / block_unit;
				size_t last_size = total_size % block_unit;
				size_t all_slices_count = slices_cnt;
				if (last_size) {
					all_slices_count += 1;
				}

				cout << filepath << " Number of slices=" << all_slices_count << " last slice size=" << last_size << endl;
				// ...
				return true;
			}


			bool get_file_block(const char * path, void * buf, size_t offset, size_t len) {
				// 1. Open the leveldb directory (tree) record and get the links key
				// 2. Fetch the links record for that key from leveldb
				// 3. Take each block's sha1 from links and compute the object's absolute path in the repository
				// 4. Read the repository object file and write the data back into buf

				// Read the directory (tree) record
				string key(basename(path));
				string links_str;
				if (false == db->get_item(key, links_str)) {
					cout << "tree metadata does not exist" << endl;
					return false;
				}

				// cout << "get_file_block() read the directory record:" << endl << links_str << endl;

				Json::Reader reader;
				Json::Value root_tree;
				if (!reader.parse(links_str, root_tree)) {
					return false;
				}
				string type = root_tree["type"].asString();
				string file_sha1 = root_tree["file_sha1"].asString();
				string file_name = root_tree["file_name"].asString();
				int64_t size = root_tree["size"].asInt64();
				// ... the remaining steps (fetch links, read blocks into buf) are omitted here
				return true;
			}
			
			bool del_file(const string & sha1);
			bool del_file(const char *path);
			
			static void callback_repo_gc(const boost::system::error_code&,
					boost::asio::deadline_timer* pt)
			{
				std::cout << "Timer fired (every 10 sec): cleaning up garbage files in the repository!" << std::endl;

				pt->expires_at(pt->expires_at() + boost::posix_time::seconds(10)) ;

				pt->async_wait(boost::bind(callback_repo_gc, boost::asio::placeholders::error, pt));
			}

			static void repo_gc(void)
			{
				// unlink files which are marked as removed
				//
				boost::asio::io_service io;
				boost::asio::deadline_timer t(io, boost::posix_time::seconds(10));
				t.async_wait(boost::bind(callback_repo_gc, boost::asio::placeholders::error, &t));
				std::cout << "garbage cleanup timer launched!" << std::endl;
				io.run();
			}
			
		public:
			string path_{};
			string block_{};
			string datastore_{};
			size_t const 	block_unit{1024 * 256};
			Cleveldb * db{nullptr};
		};
	}
}
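
The version 1.1 idea in the class comment (a two-level list with at most 256 links per record over 256 KB blocks) comes down to simple index arithmetic. The sketch below shows how a byte offset could be mapped to a level-1 entry, a level-2 entry, and an offset inside the block. The names block_pos, locate, and the k-prefixed constants are hypothetical; only the 256-link limit and the 256 KB block size come from the original comment and class.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBlockUnit    = 256 * 1024;  // same value as block_unit in the class
constexpr std::uint64_t kLinksPerList = 256;         // "links up to 256" per list record

constexpr std::uint64_t kOneLevelCapacity = kLinksPerList * kBlockUnit;         // 64 MB
constexpr std::uint64_t kTwoLevelCapacity = kLinksPerList * kOneLevelCapacity;  // 16 GB

struct block_pos {
    std::uint64_t top_index;     // entry in the "list_top1" record
    std::uint64_t leaf_index;    // entry in the selected "list_top2" record
    std::uint64_t block_offset;  // byte offset inside the 256 KB block
};

// Maps a byte offset in the file to its position in the two-level list.
block_pos locate(std::uint64_t file_offset)
{
    assert(file_offset < kTwoLevelCapacity);         // beyond this, a third level would be needed
    const std::uint64_t block_no = file_offset / kBlockUnit;
    return block_pos{ block_no / kLinksPerList,      // which level-2 record
                      block_no % kLinksPerList,      // which link inside it
                      file_offset % kBlockUnit };    // where inside the block
}
```

A reader such as get_file_block could use a mapping like this to decide which records to fetch from leveldb before touching any block file, so a read of a few bytes never needs to load the whole links list of a large file.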

Regarding this article:

This article is meant as a technical analysis; in passing, here are a few other thoughts.

C++ is an excellent development language that bridges the high level and the low level. It has all the features of a high-level language as well as the features of C, which works directly against the lowest-level APIs.
By contrast, for so-called high-level languages such as golang and Python, the core implementations were themselves written in C or C++ (the reference Python interpreter, CPython, is written in C, and golang's original compiler was written in C), not the other way around.

There are now some engineers, leaders, or PMs who do not know their own limits. In the spirit of actively and selflessly guiding others, having read only a few misleading technical blogs and knowing neither golang nor Python themselves, they never tire of telling others to use golang/Python. Have they actually learned these languages? Do they know the languages' advantages? Or are they treating information as knowledge, posing as knowledgeable, and misleading their audience?

The linux operating system is still written in pure C at present, and many large-scale software packages and basic components are written in C++.
The development efficiency of a simple project may have something to do with language, but a medium-sized or large-scale project has nothing to do with language. Personally, I think it mainly depends on the level of the engineers.
As long as linux is not written in golang, Python, or java, C/C++ is irreplaceable.

Posted by liljim on Thu, 19 Sep 2019 19:33:56 -0700