MongoDB Distributed File Storage System

Keywords: MongoDB Java less Fragment

Overview

The BSON document is MongoDB's basic unit of storage, and a field value in a document can be of binary type. Based on this feature, we can store files directly in MongoDB documents, but there is a limitation: a single BSON document cannot be larger than 16MB, so we need GridFS to store larger files.

Small File Storage System and GridFS File Storage

Let's start with an example of storing a small file in MongoDB.

First, use MongoDB's mongofiles tool to upload a file:

D:\MongoDB\Server\3.2\bin>mongofiles.exe list
2017-03-06T13:41:03.283+0800    connected to: localhost

D:\MongoDB\Server\3.2\bin>mongofiles.exe put E:\deliveryTask.doc
2017-03-06T13:41:23.535+0800    connected to: localhost
added file: E:\deliveryTask.doc

D:\MongoDB\Server\3.2\bin>mongofiles.exe list
2017-03-06T13:41:30.114+0800    connected to: localhost
E:\deliveryTask.doc     2971

Inspect the stored file in the mongo shell:

> use test
switched to db test
> show collections
fs.chunks
fs.files
restaurants
user
> db.fs.files.find()
{ "_id" : ObjectId("58bcf683afa0fa20bc854a2b"), "chunkSize" : 261120, "uploadDate" : ISODate("2017-03-06T05:41:23.604Z"), "length" : 2971, "md5" : "5434b803306299fff57c8a54d3adf78b", "filename" : "E:\\deliveryTask.doc" }

You can see that the file upload was successful.

This chapter mainly deals with theory and with operation and maintenance practice, so it does not cover application code. Concrete code, using Java as the example language, is introduced in the following chapters.

Try uploading a file larger than 16MB:

D:\MongoDB\Server\3.2\bin>mongofiles.exe put E:\synch.rar
2017-03-06T14:33:11.028+0800    connected to: localhost
added file: E:\synch.rar

D:\MongoDB\Server\3.2\bin>mongofiles.exe list
2017-03-06T14:33:15.265+0800    connected to: localhost
E:\deliveryTask.doc     2971
E:\synch.rar    24183487

Inspect the stored files in the mongo shell:

> db.fs.files.find()
{ "_id" : ObjectId("58bcf683afa0fa20bc854a2b"), "chunkSize" : 261120, "uploadDate" : ISODate("2017-03-06T05:41:23.604Z"), "length" : 2971, "md5" : "5434b803306299fff57c8a54d3adf78b", "filename" : "E:\\deliveryTask.doc" }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "chunkSize" : 261120, "uploadDate" : ISODate("2017-03-06T06:33:12.013Z"), "length" : 24183487, "md5" : "bbfe4d8579372aa0729726185997e908", "filename" : "E:\\synch.rar" }

The upload succeeded.

Now look at the chunks:

> db.fs.chunks.find({},{data:0})
{ "_id" : ObjectId("58bcf683afa0fa20bc854a2c"), "files_id" : ObjectId("58bcf683afa0fa20bc854a2b"), "n" : 0 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b2d"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 0 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b2e"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 1 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b2f"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 2 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b30"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 3 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b31"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 4 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b32"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 5 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b33"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 6 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b34"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 7 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b35"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 8 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b36"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 9 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b37"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 10 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b38"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 11 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b39"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 12 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3a"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 13 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3c"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 15 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3b"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 14 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3e"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 17 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3d"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 16 }
{ "_id" : ObjectId("58bd02a7afa0fa21d4a14b3f"), "files_id" : ObjectId("58bd02a7afa0fa21d4a14b2c"), "n" : 18 }
Type "it" for more

You can see that the large file was split into many chunks. The upload of a file larger than 16MB succeeded because mongofiles stores files in GridFS, which splits each file into chunks of at most chunkSize bytes, so no single document comes close to the 16MB limit.
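As a quick check of the numbers above, the chunk count follows from the length and chunkSize values reported in fs.files. The sketch below is plain Python arithmetic, not a MongoDB API; the function name chunk_layout is made up for illustration, and treating an empty file as having no chunks is an assumption.

```python
def chunk_layout(length, chunk_size):
    """Return (number of chunks, size of the last chunk in bytes)."""
    if length == 0:
        # Assumption: an empty file is stored without any chunk documents.
        return 0, 0
    # Ceiling division: every chunk is full except possibly the last one.
    count = (length + chunk_size - 1) // chunk_size
    last = length - (count - 1) * chunk_size
    return count, last


# Values taken from the fs.files documents shown above.
print(chunk_layout(2971, 261120))      # deliveryTask.doc -> (1, 2971)
print(chunk_layout(24183487, 261120))  # synch.rar        -> (93, 160447)
```

So synch.rar occupies chunks n = 0 to n = 92, which is why the shell stops after the first batch of results and prints "Type "it" for more".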

The following are query, download and delete operations:

D:\MongoDB\Server\3.2\bin>mongofiles.exe search rar
2017-03-06T14:45:31.974+0800    connected to: localhost
E:\synch.rar    24183487

D:\MongoDB\Server\3.2\bin>mongofiles.exe --local D:\mongodb_download.rar get E:\synch.rar
2017-03-06T14:47:17.841+0800    connected to: localhost
finished writing to D:\mongodb_download.rar

D:\MongoDB\Server\3.2\bin>mongofiles.exe delete E:\synch.rar
2017-03-06T14:47:56.649+0800    connected to: localhost
successfully deleted all instances of 'E:\synch.rar' from GridFS

D:\MongoDB\Server\3.2\bin>mongofiles.exe list
2017-03-06T14:48:03.886+0800    connected to: localhost
E:\deliveryTask.doc     2971

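We can also customize the collection prefix (the default is fs; mongofiles accepts a --prefix option for this) and the chunk size. The default chunk size is 255KB, which is the chunkSize of 261120 bytes shown in the fs.files output above.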

Summary

How do we decide which storage scheme to use in a real distributed file storage system? One approach is as follows:
1. For every file uploaded by a user, check its size on the client side.
2. When the file is smaller than 16MB, store it directly in an ordinary MongoDB collection.
3. When the file is larger than 16MB, upload it to GridFS, where it is saved in the fs.files and fs.chunks collections.
4. When a user downloads a file, look it up in the appropriate collection according to the file's size and attributes.
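The rules above can be sketched as a small size check. This is only an illustration under stated assumptions: the function name choose_storage and the METADATA_HEADROOM value are hypothetical. The headroom is there because the 16MB limit applies to the whole document, including the filename and other fields, not only to the file bytes.

```python
BSON_MAX_BYTES = 16 * 1024 * 1024  # MongoDB's limit for a single BSON document

# Hypothetical allowance for _id, filename and other metadata fields that
# share the same document as the file bytes.
METADATA_HEADROOM = 64 * 1024


def choose_storage(file_size):
    """Return 'collection' for inline storage or 'gridfs' for chunked storage."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if file_size + METADATA_HEADROOM <= BSON_MAX_BYTES:
        return "collection"
    return "gridfs"


print(choose_storage(2971))      # size of deliveryTask.doc
print(choose_storage(24183487))  # size of synch.rar
```

Recording the chosen scheme next to the file's metadata would let the download path in step 4 go straight to the right collection instead of probing both.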

In addition, the fs.chunks collection can be sharded, and the indexed field {"files_id"} can be chosen as the shard key. This key keeps all the chunks of a file on the same shard as far as possible. fs.files does not need to be sharded: the collection stores only file metadata, so its data volume is small. The default chunk size (255KB) can also be changed.
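To see why files_id keeps a file's chunks together, here is a toy model of range-based routing. It is not MongoDB code: the split points, the shard names, and the integer files_id values are all made up for illustration (real files_id values are usually ObjectIds).

```python
import bisect

# Hypothetical split points that divide the files_id key space into ranges,
# and the shard that owns each range (one more range than split points).
SPLIT_POINTS = [100, 200]
RANGE_OWNERS = ["shardA", "shardB", "shardC"]


def shard_for(files_id):
    """Route a document by its shard key value; ranges are [lower, upper)."""
    return RANGE_OWNERS[bisect.bisect_right(SPLIT_POINTS, files_id)]


# Five chunks of one file plus a single chunk of another file.
chunks = [{"files_id": 150, "n": n} for n in range(5)]
chunks.append({"files_id": 42, "n": 0})

# Collect the set of shards each file's chunks land on.
placement = {}
for chunk in chunks:
    placement.setdefault(chunk["files_id"], set()).add(shard_for(chunk["files_id"]))

print(placement)
```

Because routing depends only on files_id, every chunk of a given file maps to the same range, and therefore to a single shard in this model.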

Note that GridFS is not well suited to storing small files, because reading a file from GridFS takes two queries: first the fs.files collection is queried, then the fs.chunks collection, and the whole file is obtained only after the chunks are merged.
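The two-step read can be sketched with in-memory stand-ins for the two collections. Nothing here talks to a real database: the lists fs_files and fs_chunks, the sample file, and the function read_file are assumptions for illustration. The chunks are deliberately stored out of order, as in the shell output above where n = 15 appears before n = 14, so the sort by n matters.

```python
# In-memory stand-ins for the fs.files and fs.chunks collections.
fs_files = [
    {"_id": 1, "filename": "demo.bin", "length": 10, "chunkSize": 4},
]
fs_chunks = [
    {"files_id": 1, "n": 2, "data": b"ij"},
    {"files_id": 1, "n": 0, "data": b"abcd"},
    {"files_id": 1, "n": 1, "data": b"efgh"},
]


def read_file(filename):
    """Reassemble a file the way a GridFS read does, in two lookups."""
    # Query 1: find the metadata document in fs.files.
    meta = next((f for f in fs_files if f["filename"] == filename), None)
    if meta is None:
        raise FileNotFoundError(filename)

    # Query 2: fetch the file's chunks from fs.chunks, ordered by n.
    chunks = sorted(
        (c for c in fs_chunks if c["files_id"] == meta["_id"]),
        key=lambda c: c["n"],
    )

    # Merge the chunk payloads and check the result against the metadata.
    data = b"".join(c["data"] for c in chunks)
    if len(data) != meta["length"]:
        raise ValueError("chunks do not add up to the recorded length")
    return data


print(read_file("demo.bin"))
```

For a file of a few kilobytes, a single document holding the bytes needs only one lookup, which is the reason for preferring an ordinary collection for small files.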

Another point to note is that the GridFS chunk size is 255KB, while the chunk size used by sharding defaults to 64MB. The two share a name but are unrelated, so don't confuse them.

Posted by adaminms on Fri, 12 Apr 2019 03:03:32 -0700