2021SC@SDUSC HBase project overview
1. HBase overview
What is HBase
HBase is a database system built on HDFS that offers high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes. It is mainly used to store loosely structured data, that is, unstructured and semi-structured data. HBase uses Hadoop HDFS as its file storage system, Hadoop MapReduc ...
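As a rough illustration of the logical data model behind this description, the Go sketch below stores each cell as a list of timestamped versions under a row key and a `family:qualifier` column, returns the newest version on a plain read, and lists row keys in sorted order, which is how HBase orders rows. It is a toy in-memory model with hypothetical names (`Table`, `Put`, `Get`, `RowKeys`), not the HBase client API, and it ignores regions, HDFS, and persistence entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// Cell is one timestamped version of a value.
type Cell struct {
	Timestamp int64
	Value     string
}

// Table is a toy model of HBase's logical layout:
// row key -> "family:qualifier" -> versions (newest first).
type Table struct {
	rows map[string]map[string][]Cell
}

func NewTable() *Table {
	return &Table{rows: make(map[string]map[string][]Cell)}
}

// Put adds a new version of a cell and keeps versions sorted newest first.
func (t *Table) Put(row, column string, ts int64, value string) {
	cols, ok := t.rows[row]
	if !ok {
		cols = make(map[string][]Cell)
		t.rows[row] = cols
	}
	versions := append(cols[column], Cell{Timestamp: ts, Value: value})
	sort.Slice(versions, func(i, j int) bool {
		return versions[i].Timestamp > versions[j].Timestamp
	})
	cols[column] = versions
}

// Get returns the newest version of a cell, like a default read.
func (t *Table) Get(row, column string) (string, bool) {
	versions := t.rows[row][column] // nil-safe: missing rows yield no versions
	if len(versions) == 0 {
		return "", false
	}
	return versions[0].Value, true
}

// RowKeys returns the row keys in lexicographic order.
func (t *Table) RowKeys() []string {
	keys := make([]string, 0, len(t.rows))
	for k := range t.rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	t := NewTable()
	t.Put("user2", "info:name", 1, "Bob")
	t.Put("user1", "info:name", 1, "Alice")
	t.Put("user1", "info:name", 2, "Alicia")
	v, _ := t.Get("user1", "info:name")
	fmt.Println(v, t.RowKeys()) // Alicia [user1 user2]
}
```

Keeping every version rather than overwriting is what lets a read ask for "the latest" value while older versions remain available.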
Posted by gman-03 on Sun, 24 Oct 2021 20:13:50 -0700
Minio object storage for Java: a Spring Cloud framework that wraps the Minio SDK as an HTTP interface
Preface
Because of project requirements, we needed to store binary files such as audio and images, which cannot be stored in the database. We had previously used HBase for audio storage, but HBase's API is too complex to use.
FastDFS is another option for file storage, but its installation and configuration are very complex. Later, I found ...
Posted by Canadian on Sun, 24 Oct 2021 13:04:21 -0700
Analysis and use of consumer groups in sarama
The most widely used Go client for Kafka is probably sarama, but older versions of sarama did not support consumer-group consumption, so most people used sarama-cluster.
Later, sarama added support for consumer groups and sarama-cluster stopped being maintained. However, there is little analysis of sarama consumer groups online, a ...
Posted by ryeman98 on Thu, 21 Oct 2021 23:24:56 -0700
It took a month to put together this exhaustive Hadoop handbook
This document was compiled with reference to the official Hadoop website and many other materials. For clean typesetting and comfortable reading, blurry and black-and-white figures were redrawn as high-definition color figures.
Hadoop 2.x is currently widely used in enterprises, so this article focuses on Hado ...
Posted by b-real on Thu, 21 Oct 2021 21:42:54 -0700
Building the data warehouse environment
Setting up the Hive environment
Introduction to Hive engines
Hive supports three execution engines: the default MR (MapReduce), Tez, and Spark. Hive on Spark: Hive not only stores the metadata but is also responsible for parsing and optimizing SQL, written in HQL syntax; the execution engine becomes Spark, which is responsible for executing RDDs. Spark on Hive: Hive is only ...
Posted by wha??? on Thu, 21 Oct 2021 19:46:17 -0700
Phoenix installation and use
1. Background introduction
1.1 Phoenix definition
Phoenix is an open source SQL skin for HBase. You can use the standard JDBC API instead of the HBase client API to create tables, insert data, and query HBase data.
1.2 Phoenix features
Easy integration with Spark, Hive, Pig, Flume, and MapReduce. Good per ...
Posted by epukinsk on Tue, 19 Oct 2021 12:40:48 -0700
Using Sqoop + Jenkins to transfer data between MySQL and Hive
1. Foreword
Recently, we have been using Sqoop + Jenkins to transfer data between MySQL and Hive.
We mainly use Sqoop's import command to import MySQL data into Hive and its export command to export Hive data to MySQL.
Jenkins acts as the scheduler, running the sh scripts periodically so that the data is synchronized once a day.
Relevant ...
Posted by reethu on Mon, 18 Oct 2021 15:17:56 -0700
HDFS principles and operations
hadoop fs
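hadoop fs -ls /                              # list the HDFS root directory
hadoop fs -lsr                               # recursively list the home directory (deprecated; use -ls -R)
hadoop fs -mkdir /user/hadoop                # create a directory (add -p to create missing parents)
hadoop fs -put a.txt /user/hadoop/           # upload a local file to HDFS
hadoop fs -get /user/hadoop/a.txt /          # download an HDFS file to the local root directory
hadoop fs -cp src dst                        # copy within HDFS
hadoop fs -mv src dst                        # move or rename within HDFS
hadoop fs -cat /user/hadoop/a.txt            # print a file's contents
hadoop fs -rm /user/hadoop/a.txt             # delete a file
hadoop fs -rmr /user/hadoop/a.txt            # delete recursively (deprecated; use -rm -r)
hadoop fs -text /user/hadoop/a.txt           # print a file as text, decoding formats such as zip and SequenceFile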
hadoop fs -copyFromL ...
Posted by JamieinNH on Sat, 16 Oct 2021 10:26:18 -0700
Solutions to Spark data skew
Data skew caused by Shuffle
When data skew occurs during a shuffle, we generally troubleshoot as follows:
① Check the Web UI to see how the tasks in each stage of each job executed, and whether any task runs obviously longer than the others.
② If a task fails with an error, check the corresponding log excepti ...
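Step ① amounts to looking for tasks whose duration stands far above the rest of the stage. A minimal sketch of that check, assuming the task durations have already been read off the Web UI, flags any task that takes more than a chosen multiple of the stage's median duration. The threshold factor and the sample durations are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying it; assumes len(xs) > 0.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// skewedTasks returns the indexes of tasks whose duration exceeds
// factor times the stage median. factor is a hypothetical threshold.
func skewedTasks(durations []float64, factor float64) []int {
	if len(durations) == 0 {
		return nil
	}
	m := median(durations)
	var out []int
	for i, d := range durations {
		if d > factor*m {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Hypothetical task durations (seconds) for one stage.
	stage := []float64{12, 11, 13, 12, 240, 14}
	fmt.Println(skewedTasks(stage, 3)) // [4]
}
```

The median is used instead of the mean because a single very slow task would inflate the mean and hide itself.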
Posted by dstantdog3 on Sat, 16 Oct 2021 10:15:13 -0700
36-Blog Site Database-Blog Comment Information Data Operation
Project Description
Nowadays, microblogs and blogs have become the main systems for publishing and disseminating information. To manage this data, this project mainly operates on the blog information table and the comment table of a blog website.
Blog site data ...
Posted by Weedpacket on Fri, 15 Oct 2021 09:19:53 -0700