MinIO: High-Performance, Kubernetes-Native Object Storage

Keywords: MinIO

  • MinIO: high-performance, Kubernetes-native object storage
    • Features
    • Installation
      • Stand-alone
      • Distributed
    • Client mc installation and use
    • Optimization practice of MinIO in K8S

MinIO is an open-source object storage service, originally released under the Apache License v2.0 (releases since 2021 use the GNU AGPLv3). It is compatible with the Amazon S3 cloud storage API and is well suited to storing large volumes of unstructured data such as pictures, videos, log files, backup data, and container or virtual machine images. An object can be any size, from a few KB up to a maximum of 5 TB. MinIO is a very lightweight service that can easily be bundled with other applications, much like Node.js, Redis, or MySQL.


  • High performance: the official website claims MinIO is the fastest object storage in the world
  • Elastic scaling: clusters are easy to expand
  • Cloud native
  • Open source and free, well suited to enterprise customization
  • De facto S3 standard
  • Simple and powerful
  • Storage mechanism: MinIO uses erasure coding and checksums to protect data from hardware failure and silent data corruption. Even if half (N/2) of the disks are lost, the data can still be recovered


MinIO consists of a server and a client. The server is deployed with the minio binary. The client is a single binary command (mc) that lets you operate on object storage (add, delete, query, and so on). MinIO also provides SDKs for various languages. For details, please refer to the official website.

The server can be installed in stand-alone mode or in distributed mode. Distributed mode is installed in much the same way as stand-alone mode; only the arguments passed to the server differ.



Benefits of distributed mode: Distributed MinIO lets you pool multiple disks (even on different machines) into a single object storage service. Because the disks are spread across different nodes, distributed MinIO avoids a single point of failure.

Data protection: Distributed MinIO uses erasure coding to guard against multiple node failures and bit rot. Distributed MinIO requires at least four disks, and erasure coding is enabled automatically when you use distributed mode.

High availability: A stand-alone MinIO service is a single point of failure. By contrast, with a distributed MinIO of N disks, your data is safe as long as N/2 disks are online. However, you need at least N/2+1 disks online to create new objects. For example, a 16-node MinIO cluster with 16 disks per node remains readable even if 8 servers are down, but you need 9 servers online to write data. Note that as long as you stay within the limits of distributed MinIO, you can combine any number of nodes and disks per node. For example, you can use 2 nodes with 4 disks per node, or 4 nodes with 2 disks per node, and so on.
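The N/2 and N/2+1 rules can be sketched as a little shell arithmetic. This is only an illustration of the rule as stated in this article; the function names are made up for the example and are not part of MinIO or mc.

```shell
#!/bin/sh
# Hypothetical helpers (not MinIO commands) for the quorum rule described above.

# read_quorum N: minimum number of online units needed to read data
read_quorum() {
  echo $(( $1 / 2 ))
}

# write_quorum N: minimum number of online units needed to create new objects
write_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# The 16-server example from the text
echo "read quorum:  $(read_quorum 16)"    # prints "read quorum:  8"
echo "write quorum: $(write_quorum 16)"   # prints "write quorum: 9"
```

With 16 servers this reproduces the numbers from the example: readable with 8 servers online, writable with 9.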

Consistency: In both distributed and stand-alone modes, all MinIO read and write operations strictly follow the read-after-write consistency model.

Erasure code: MinIO uses erasure coding and checksums to protect data from hardware failure and silent data corruption. Even if you lose half (N/2) of your disks, you can still recover the data.

What is erasure code? Erasure code is a mathematical algorithm for recovering lost or damaged data. MinIO uses Reed-Solomon codes to split objects into N/2 data blocks and N/2 parity blocks. This means that with 12 disks, an object is divided into 6 data blocks and 6 parity blocks, and you can lose any 6 disks (whether they hold data blocks or parity blocks) and still recover the data from the remaining disks. Impressive, right? Search online if you want to learn more.
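As a rough sketch of what the N/2 data and N/2 parity split means for capacity, the arithmetic looks like this. The helper names and the 10 GB disk size are assumptions for illustration, and the sketch assumes an even number of disks with the half-and-half split described above.

```shell
#!/bin/sh
# Hypothetical helpers (not MinIO commands); assume an even disk count
# and the N/2 data + N/2 parity split described in the text.

# ec_layout DISKS: print the data/parity split and tolerated disk losses
ec_layout() {
  local disks="$1"
  local half=$(( disks / 2 ))
  echo "data=$half parity=$half tolerated_losses=$half"
}

# usable_gb DISKS SIZE_GB: capacity left for data when half the disks hold parity
usable_gb() {
  local disks="$1" size_gb="$2"
  echo $(( disks / 2 * size_gb ))
}

ec_layout 12       # prints "data=6 parity=6 tolerated_losses=6"
usable_gb 12 10    # prints "60" (12 disks of 10 GB each, an assumed size)
```

The price of tolerating the loss of half the disks is that only half of the raw capacity is available for data.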

Why is erasure code useful? Erasure coding works differently from RAID or replication. For example, RAID 6 can survive the loss of two disks without losing data, whereas MinIO erasure coding keeps data safe even if half the disks are lost. Moreover, MinIO erasure coding works at the object level and can recover one object at a time, while RAID works at the volume level, so recovery takes a very long time. MinIO encodes each object separately. Once the storage service is deployed, it usually does not need disk replacement or repair. The design goal of MinIO erasure coding is to improve performance and make the most of the hardware.

What is bit rot protection? Bit rot, also known as data corruption or silent data corruption, is a serious data loss problem for disks today. Data on a disk may be damaged without anyone noticing, and no error is logged. As the saying goes, a spear in the open is easy to dodge, but an arrow in the dark is hard to guard against. Errors like this, made behind the scenes, are worse than an outright disk failure. But don't be afraid. MinIO erasure coding uses high-speed HighwayHash-based checksums to guard against bit rot.
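The idea of catching silent corruption with a stored checksum can be sketched in a few lines of shell. Note the substitution: this demo uses the POSIX cksum utility (a CRC) purely as a stand-in, whereas MinIO itself uses HighwayHash. The function names are made up for the example.

```shell
#!/bin/sh
# Stand-in demo: cksum (CRC) replaces HighwayHash for illustration only.

# checksum_of FILE: print the checksum of FILE
checksum_of() {
  cksum < "$1" | awk '{print $1}'
}

# is_intact FILE EXPECTED: succeed only if FILE still matches EXPECTED
is_intact() {
  [ "$(checksum_of "$1")" = "$2" ]
}

demo_file=$(mktemp)
printf 'hello object storage\n' > "$demo_file"
stored_sum=$(checksum_of "$demo_file")   # recorded when the data is written

is_intact "$demo_file" "$stored_sum" && echo "before: intact"

# Simulate silent corruption: overwrite the first byte in place; no error is raised.
printf 'j' | dd of="$demo_file" bs=1 count=1 conv=notrunc 2>/dev/null

is_intact "$demo_file" "$stored_sum" || echo "after: corruption detected"
rm -f "$demo_file"
```

The write that corrupts the file succeeds without any error, which is exactly why a checksum recorded at write time is needed to notice the damage on read.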

Distributed deployment (GNU/Linux and macOS). Example 1: start a distributed MinIO instance with 8 nodes and 1 disk per node. Run the following command on all 8 nodes.

minio server \     
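A distributed minio server command takes one endpoint URL per node. As a hedged sketch, the helper below assembles such a command line. The hostnames node1 to node8 and the path /export1 are hypothetical placeholders, not values from this article, and build_minio_cmd is a made-up helper rather than part of MinIO.

```shell
#!/bin/sh
# build_minio_cmd NODES DIR [HOST_PREFIX]
# Prints a "minio server" command with one http://HOST/DIR endpoint per node.
# HOST_PREFIX defaults to "node" (hypothetical hostnames node1, node2, ...).
build_minio_cmd() {
  local nodes="$1" dir="$2" prefix="${3:-node}"
  local cmd="minio server" i=1
  while [ "$i" -le "$nodes" ]; do
    cmd="$cmd http://${prefix}${i}${dir}"
    i=$(( i + 1 ))
  done
  echo "$cmd"
}

# 8 nodes, 1 disk each, using the placeholder path /export1
build_minio_cmd 8 /export1
```

The same printed command would be run on every node, since each server needs to know the full list of endpoints in the cluster.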

#Installing Helm itself is not covered here; search online if needed
helm install minio --set mode=distributed,numberOfNodes=4,imagePullPolicy=IfNotPresent,accessKey=v9rwqYzXXim6KJKeyPm344,secretKey=0aIRBu9KU7gAN0luoX8uBE1eKWNPDgMnkVqbPC,service.type=NodePort,service.nodePort=25557 googleapis/minio -n velero

#After the installation, query the status of the pods. If the pods show READY, the installation succeeded, as shown in the figure below
kubectl get pods -n velero -o wide

#If the pods never become READY, check the logs
kubectl logs minio-0 -n velero

#If the logs say the server is waiting for disks, restart the pods and check again
kubectl delete pods -n velero minio-{0,1,2,3}

#The default service type is ClusterIP. For convenience, I use NodePort

As shown in the figure above, when I create a distributed MinIO with four nodes, the default PVCs provide the storage. By default, one 10G volume is created for each node (this can be customized).

Client mc installation and use


chmod +x mc     
./mc --help     

mc command guide

ls       List files and folders.
mb       Create a bucket or folder.
cat      Display the contents of files and objects.
pipe     Redirect STDIN to an object, file, or STDOUT.
share    Generate a URL for sharing.
cp       Copy files and objects.
mirror   Mirror buckets and folders.
find     Find files based on parameters.
diff     Compare the differences between two folders or buckets.
rm       Delete files and objects.
events   Manage object notifications.
watch    Listen for events on files and objects.
policy   Manage access policies.
session  Manage saved sessions for the cp command.
config   Manage the mc configuration file.
update   Check for software updates.
version  Output version information.

mc command practice

#View the minio server configuration     
mc config host ls     

#Add minio server configuration     
mc config host add minio  v9rwqYzXXim6KJKeyPm344 0aIRBu9KU7gAN0luoX8uBE1eKWNPDgMnkVqbPC --api s3v4     

#List the MinIO buckets
mc ls minio

#Create a bucket
mc mb minio/backup

#Upload a local directory (omit -r for a single file)
mc cp -r  ingress minio/backup/

#Download a remote directory (omit -r for a single file)
mc cp -r  minio/backup .

#Mirror a local folder to MinIO (similar to rsync)
mc mirror localdir/ minio/backup/

#Continuously watch a local folder and mirror changes to MinIO (similar to rsync)
mc mirror -w localdir/ minio/backup/

#Continuously find all JPEG images in the minio bucket and copy them to the "play/bucket" bucket
mc find minio/bucket --name "*.jpg" --watch --exec "mc cp {} play/bucket"

#Delete a directory
mc rm minio/backup/ingress  --recursive --force

#Delete a file
mc rm minio/backup/service_minio.yaml

#Delete all incompletely uploaded objects from mybucket
mc rm  --incomplete --recursive --force play/mybucket

#Delete objects older than 7 days
mc rm --force --older-than=7 play/mybucket/oldsongs

#Output MySQL database dump file to minio     
mysqldump -u root -p ******* db | mc pipe minio/backups/backup.sql     

#Stream a MongoDB backup to MinIO (mongodump takes the port via --port; -p is the password option)
mongodump -h mongo-server1 --port 27017 -d blog-data --archive | mc pipe minio1/mongobkp/backups/mongo-blog-data-`date +%Y-%m-%d`.archive

Optimization Practice of minio in K8S

The MinIO deployment on K8S described above runs in my practice environment. After installing distributed mode through Helm, I used NFS as the StorageClass by default. There are four nodes in total, and four PVCs were created automatically. After I deleted the data of one PVC, MinIO could still read and write normally, and the data was still there. Refer to the following figure.

But there is one big problem. With self-built shared storage such as NFS, the four MinIO nodes appear to protect the data, yet all four PVCs sit on a single NFS disk. If the NFS server goes down or its disk is damaged, all your data is gone. Therefore, to keep data safe, it is recommended to use hostPath and store each node's data on that node. That way, even if a node goes down or its disk is damaged, your data is not lost. Reading and writing data on the local node is also faster. Of course, you then have to manage the nodes' local storage yourself.

Deployment practice of minio in K8S hostPath

Environment: a five-node K8S cluster, four nodes of which run MinIO, all using the node host network.

#1. Label four nodes, because MinIO is deployed by selecting nodes labeled minio-server=true
kubectl get node --show-labels=true
kubectl label nodes node-hostname1  minio-server=true
kubectl label nodes node-hostname2  minio-server=true
kubectl label nodes node-hostname3  minio-server=true
kubectl label nodes node-hostname4  minio-server=true

#2. Add hosts entries on each MinIO host (run as root). If your hostnames already resolve, skip this step. Add all 4 hosts
echo "[IP1] host1" >> /etc/hosts
echo "[IP2] host2" >> /etc/hosts
echo "[IP3] host3" >> /etc/hosts
echo "[IP4] host4" >> /etc/hosts
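Appending with echo adds a duplicate line every time the step is repeated. A possible idempotent variant is sketched below. The helper add_host_entry is made up for this example, and the 192.0.2.x addresses are documentation placeholders standing in for the real node IPs. The demo writes to a temporary file so that it does not touch /etc/hosts.

```shell
#!/bin/sh
# add_host_entry IP HOSTNAME [FILE]
# Appends "IP HOSTNAME" to FILE (default /etc/hosts) unless HOSTNAME
# already appears as a name on a non-comment line.
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  if awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Demo against a scratch file; placeholder IPs, not real node addresses
hosts_demo=$(mktemp)
add_host_entry 192.0.2.11 host1 "$hosts_demo"
add_host_entry 192.0.2.12 host2 "$hosts_demo"
add_host_entry 192.0.2.11 host1 "$hosts_demo"   # repeated call adds nothing
cat "$hosts_demo"
rm -f "$hosts_demo"
```

On a real node the third argument would be omitted and the function run as root, once per MinIO host.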

#3. Create a namespace     
#You can also use another namespace, but then you need to modify the YAML files below
kubectl create ns velero     

#4. Download the headless service, DaemonSet, and service YAML files

#5. Modify and create the corresponding service and DaemonSet
The main file to modify is `minio-distributed-daemonset.yaml`:
hostPath: the local path on the node that you want to use
MINIO_ACCESS_KEY, MINIO_SECRET_KEY: your access key and secret key; change them promptly for security
args: change the URL in the startup arguments to hostname form: http://host{1...4}/data/minio

`minio-distributed-service.yaml` exposes the service externally. It defaults to ClusterIP; you can combine it with an ingress or a NodePort for access, and modify it as needed

kubectl create -f minio-distributed-statefulset.yaml      
kubectl create -f minio-distributed-daemonset.yaml     
kubectl create -f minio-distributed-service.yaml     

Original text:

Posted by Hotkey on Sun, 28 Nov 2021 14:31:33 -0800